Open source software name: gopa
Open source software URL: https://gitee.com/medcl/gopa
Open source software description:
GOPA, A Spider Written in Go.
Goal
- Lightweight, low footprint; memory requirement should be < 100MB
- Easy to deploy; no runtime or dependencies required
- Easy to use; no programming or scripting ability needed, with features that work out of the box
Screenshot
How to use
Requirements
Setup
First of all, get it. There are two options: download the pre-built package or compile it yourself.
Download Pre-Built Package
Go to the Release or Snapshot page and download the right package for your platform. Note: Darwin is for Mac.
Compile The Package Manually
So far, we have:
- gopa, the main program, a single binary.
- config/, elasticsearch related scripts etc.
- gopa.yml, main configuration for gopa.
Optional Config
By default, Gopa works well except for indexing. If you want to use Elasticsearch for indexing, follow these steps:
- Create an index in Elasticsearch with the script config/elasticsearch/gopa-index-mapping.sh (!important settings!)
Example

    curl -XPUT "http://localhost:9200/gopa-index" -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "doc": {
          "properties": {
            "host": { "type": "keyword", "ignore_above": 256 },
            "snapshot": {
              "properties": {
                "bold": { "type": "text" },
                "url": { "type": "keyword", "ignore_above": 256 },
                "content_type": { "type": "keyword", "ignore_above": 256 },
                "file": { "type": "keyword", "ignore_above": 256 },
                "ext": { "type": "keyword", "ignore_above": 256 },
                "h1": { "type": "text" },
                "h2": { "type": "text" },
                "h3": { "type": "text" },
                "h4": { "type": "text" },
                "hash": { "type": "keyword", "ignore_above": 256 },
                "id": { "type": "keyword", "ignore_above": 256 },
                "images": {
                  "properties": {
                    "external": {
                      "properties": {
                        "label": { "type": "text" },
                        "url": { "type": "keyword", "ignore_above": 256 }
                      }
                    },
                    "internal": {
                      "properties": {
                        "label": { "type": "text" },
                        "url": { "type": "keyword", "ignore_above": 256 }
                      }
                    }
                  }
                },
                "italic": { "type": "text" },
                "links": {
                  "properties": {
                    "external": {
                      "properties": {
                        "label": { "type": "text" },
                        "url": { "type": "keyword", "ignore_above": 256 }
                      }
                    },
                    "internal": {
                      "properties": {
                        "label": { "type": "text" },
                        "url": { "type": "keyword", "ignore_above": 256 }
                      }
                    }
                  }
                },
                "path": { "type": "keyword", "ignore_above": 256 },
                "sim_hash": { "type": "keyword", "ignore_above": 256 },
                "lang": { "type": "keyword", "ignore_above": 256 },
                "screenshot_id": { "type": "keyword", "ignore_above": 256 },
                "size": { "type": "long" },
                "text": { "type": "text" },
                "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } },
                "version": { "type": "long" }
              }
            },
            "task": {
              "properties": {
                "breadth": { "type": "long" },
                "created": { "type": "date" },
                "depth": { "type": "long" },
                "id": { "type": "keyword", "ignore_above": 256 },
                "original_url": { "type": "keyword", "ignore_above": 256 },
                "reference_url": { "type": "keyword", "ignore_above": 256 },
                "schema": { "type": "keyword", "ignore_above": 256 },
                "status": { "type": "integer" },
                "updated": { "type": "date" },
                "url": { "type": "keyword", "ignore_above": 256 },
                "last_screenshot_id": { "type": "keyword", "ignore_above": 256 }
              }
            }
          }
        }
      }
    }'

Note: Elasticsearch version should be >= v5.3
- Enable the index module in gopa.yml and update the Elasticsearch settings:

    - module: index
      enabled: true
      ui:
        enabled: true
      elasticsearch:
        endpoint: http://localhost:9200
        index_prefix: gopa-
        username: elastic
        password: changeme

Start
Gopa doesn't require any dependencies; simply run ./gopa to start the program.
Gopa can be run as a daemon (Note: only available on Linux and Mac):
Example

    ➜ gopa git:(master) ✗ ./bin/gopa --daemon
      ________ ________ __________  _____
     /  _____/ \_____  \\______   \/  _  \
    /   \  ___  /   |   \|     ___/  /_\  \
    \    \_\  \/    |    \    |  /    |    \
     \______  /\_______  /____|  \____|__  /
            \/         \/                \/
    [gopa] 0.10.0_SNAPSHOT
    ///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
    [10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0
    [gopa] started.

Also run ./gopa -h to get the full list of command line options.
Example

    ➜ gopa git:(master) ✗ ./bin/gopa -h
      ________ ________ __________  _____
     /  _____/ \_____  \\______   \/  _  \
    /   \  ___  /   |   \|     ___/  /_\  \
    \    \_\  \/    |    \    |  /    |    \
     \______  /\_______  /____|  \____|__  /
            \/         \/                \/
    [gopa] 0.10.0_SNAPSHOT
    ///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
    Usage of ./bin/gopa:
      -config string
            the location of config file (default "gopa.yml")
      -cpuprofile string
            write cpu profile to this file
      -daemon
            run in background as daemon
      -debug
            run in debug mode, gopa will quit with panic error
      -log string
            the log level,options:trace,debug,info,warn,error (default "info")
      -log_path string
            the log path (default "log")
      -memprofile string
            write memory profile to this file
      -pidfile string
            pidfile path (only for daemon)
      -pprof string
            enable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars

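The usage output above has the layout printed by Go's standard flag package. As a rough illustration of how a few of these options could be declared with that package, here is a minimal sketch. It is not taken from Gopa's source: the options struct and the parseOptions function are hypothetical names, and only a subset of the flags is covered.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds a subset of the settings listed in the usage output.
// The struct itself is hypothetical, for illustration only.
type options struct {
	Config   string
	Daemon   bool
	Debug    bool
	LogLevel string
	LogPath  string
}

// parseOptions parses args (without the program name) into options,
// using the flag names and defaults shown in the usage output.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gopa", flag.ContinueOnError)
	fs.StringVar(&o.Config, "config", "gopa.yml", "the location of config file")
	fs.BoolVar(&o.Daemon, "daemon", false, "run in background as daemon")
	fs.BoolVar(&o.Debug, "debug", false, "run in debug mode")
	fs.StringVar(&o.LogLevel, "log", "info", "the log level, options: trace,debug,info,warn,error")
	fs.StringVar(&o.LogPath, "log_path", "log", "the log path")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		// The flag package has already printed the error and usage to stderr.
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

The flag package treats one and two leading dashes as equivalent, so -daemon and --daemon select the same option.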
Stop
It is safe to press ctrl+c to stop the currently running Gopa. Gopa will handle the rest and save a checkpoint, so you can restore the job later; the world is still in your hands.
If you are running Gopa as a daemon, you may stop it like this:
Configuration
UI
- Search Console http://127.0.0.1:9001/
- Admin Console http://127.0.0.1:9001/admin/
API
Architecture
Contributing
You are sincerely and warmly welcome to play with this project. From UI style to core features, or just a piece of documentation, all contributions are welcome! Let's make it better.
License
Released under the Apache License, Version 2.0.