gaspa93/googlemaps-scraper: Google Maps reviews scraping


Project name:

gaspa93/googlemaps-scraper

Project URL:

https://github.com/gaspa93/googlemaps-scraper

Language:

Python 100.0%

Introduction:

Google Maps Scraper

Scraper of Google Maps reviews. The code extracts the most recent reviews, starting from the URL of a specific Point of Interest (POI) in Google Maps. An additional extension monitors the reviews and stores them incrementally in a MongoDB instance.

Installation

Follow these steps to use the scraper:

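  • Download ChromeDriver (the WebDriver for Google Chrome) from its official download page.

  • Install the Python packages listed in the requirements file, using pip, conda or virtualenv. For example, with conda:

      conda create --name scraping python=3.6 --file requirements.txt

    or, with pip:

      pip install -r requirements.txt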
    

Note: Python >= 3.6 is required.

Basic Usage

The scraper.py script needs two main parameters as input:

  • --i: input file name, containing a list of urls that point to Google Maps place reviews (default: urls.txt)
  • --N: number of reviews to retrieve, starting from the most recent (default: 100)

Example:

python scraper.py --N 50

generates a CSV file containing the 50 most recent reviews of each place listed in urls.txt.
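The expected input file can be illustrated with a small reader. This is a minimal sketch, not the scraper's own code: the helper names `parse_urls` and `load_urls` are hypothetical, and the sketch assumes one URL per line with blank lines ignored, as in the example urls.txt.

```python
from pathlib import Path
from typing import List


def parse_urls(text: str) -> List[str]:
    """Hypothetical helper: one URL per line, surrounding whitespace stripped, blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_urls(path: str = "urls.txt") -> List[str]:
    """Hypothetical helper: read the --i input file (default urls.txt) and parse it."""
    return parse_urls(Path(path).read_text(encoding="utf-8"))
```

Each returned URL would then be scraped for up to `--N` reviews.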

In the current implementation, the CSV output is handled by a separate function, so to change the path and/or name of the output file you need to modify that function.
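A writer that takes the output path as a parameter avoids editing code to change the file name. The following is a minimal sketch using Python's `csv` module; the function name `write_reviews_csv` and the column names in `HEADER` are assumptions for illustration, not the scraper's actual CSV schema.

```python
import csv
from typing import Dict, Iterable, Sequence

# Hypothetical column names; the real scraper's CSV header may differ.
HEADER: Sequence[str] = ("id_review", "caption", "rating", "relative_date", "username")


def write_reviews_csv(rows: Iterable[Dict[str, object]],
                      path: str = "reviews.csv",
                      header: Sequence[str] = HEADER) -> int:
    """Write review dicts to `path` and return how many rows were written.

    Keys missing from a row are written as empty strings; keys not in
    `header` are ignored.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(header), extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

With this shape, the output location becomes a call-site decision, e.g. `write_reviews_csv(reviews, path="data/my_place.csv")`.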

Additionally, other parameters can be provided:

  • --place: boolean flag to scrape POI metadata instead of reviews (default: false)
  • --debug: boolean flag to run the browser with its graphical interface (default: false)
  • --source: boolean flag to store the source URL as an additional field in the CSV (default: false)
  • --sort-by: one of most_relevant, newest, highest_rating or lowest_rating (default: newest); changes the sorting order of the reviews (contributed by @quaesito)
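The documented options map naturally onto an `argparse` parser. This is a sketch of such a CLI, not the parser in scraper.py: modelling the boolean options as `store_true` switches is an assumption (the real script may expect an explicit value), and `build_parser` is a hypothetical name.

```python
import argparse

# The four orderings documented for --sort-by.
SORT_OPTIONS = ("most_relevant", "newest", "highest_rating", "lowest_rating")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI exposing the options documented for scraper.py."""
    parser = argparse.ArgumentParser(description="Google Maps reviews scraper (sketch)")
    parser.add_argument("--i", type=str, default="urls.txt",
                        help="input file with one Google Maps URL per line")
    parser.add_argument("--N", type=int, default=100,
                        help="number of reviews to retrieve, most recent first")
    # Assumption: boolean options as store_true switches (default False).
    parser.add_argument("--place", action="store_true",
                        help="scrape POI metadata instead of reviews")
    parser.add_argument("--debug", action="store_true",
                        help="run the browser with its graphical interface")
    parser.add_argument("--source", action="store_true",
                        help="store the source URL as an extra CSV field")
    # choices= makes argparse reject any ordering not in the documented list.
    parser.add_argument("--sort-by", dest="sort_by", choices=SORT_OPTIONS,
                        default="newest", help="sorting order of the reviews")
    return parser
```

Using `choices` means a typo such as `--sort-by latest` fails immediately with a usage error instead of silently falling back to a default ordering.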

For an overview of the logic and approach behind this software, see the accompanying Medium post.

Monitoring functionality

The monitor.py script runs the scraper incrementally, working around the limit on the number of reviews that can be retrieved in a single run. The only additional requirement is a MongoDB installation on your machine; a detailed installation guide is available on the official MongoDB site.

The script takes two inputs:

  • --i: same as in the scraper.py script
  • --from-date: date string in the format YYYY-MM-DD; the oldest review date the scraper tries to reach

The main idea is to run the script periodically to obtain the latest reviews: the scraper stores them in MongoDB until it reaches either the latest review saved by the previous run or the date given by --from-date.
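The stopping rule described above can be sketched without a database. In this minimal sketch the set `stored_ids` stands in for the review IDs already saved in MongoDB, and the review dicts with `id` and `date` keys are assumptions; the function names are hypothetical and this is not the code of monitor.py.

```python
from datetime import date, datetime
from typing import Dict, Iterable, List, Set


def parse_from_date(value: str) -> date:
    """Parse a --from-date value in YYYY-MM-DD format."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def collect_new_reviews(reviews_newest_first: Iterable[Dict],
                        stored_ids: Set[str],
                        from_date: date) -> List[Dict]:
    """Collect reviews, newest first, until a stop condition is met.

    Assumes each review is a dict with hypothetical keys 'id' (str) and
    'date' (datetime.date). `stored_ids` stands in for the IDs already in MongoDB.
    """
    new_reviews = []
    for review in reviews_newest_first:
        # Stop at the first review saved by a previous run, or older than from_date.
        if review["id"] in stored_ids or review["date"] < from_date:
            break
        new_reviews.append(review)
    return new_reviews
```

Because reviews arrive newest first, the first already-stored or too-old review marks the point where everything after it is either known or out of range. In a real setup, `stored_ids` might be loaded from the collection (for instance with pymongo's `Collection.distinct`) and the returned reviews inserted afterwards.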

See this Medium post for more details about the idea behind this feature.

Notes

URLs must be provided in the expected format; see the example file urls.txt for what a correct URL looks like. To generate a correct URL:

  1. Go to Google Maps and search for a specific place;
  2. Click on the number of reviews shown in parentheses;
  3. Save the URL generated by the previous step.
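Before a long run, a rough sanity check can catch URLs that are clearly not Google Maps place pages. This is a heuristic sketch only, not the scraper's own validation: the assumption that correct URLs are `https` links on a `google.*` host with a path beginning `/maps/place/` is based on the usual shape of place URLs, and `looks_like_maps_place_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def looks_like_maps_place_url(url: str) -> bool:
    """Heuristic (assumption): https URL, google.* host, path starting with /maps/place/."""
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    if host.startswith("www."):
        host = host[len("www."):]
    return (
        parsed.scheme == "https"
        and host.startswith("google.")
        and parsed.path.startswith("/maps/place/")
    )
```

A URL that passes this check may still be rejected by the scraper; the function only filters out obvious mistakes such as search-result links.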
