
khuyentran1401/top-github-scraper: Scrape top GitHub repositories and users based ...


Software name:

khuyentran1401/top-github-scraper

Repository URL:

https://github.com/khuyentran1401/top-github-scraper

Languages:

HTML 80.6%

Description:

Medium article

Top Github Scraper

Scrape top Github repositories and users based on keywords.

I used this tool to analyze the top 1k machine learning users and create an interactive map to search for users based on their location.


Setup

Installation

pip install top-github-scraper

Add Credentials

GitHub rate-limits unauthenticated requests, so to scrape many repositories and users, add your GitHub credentials to a .env file.

touch .env

Add your GitHub username and personal access token to the .env file:

GITHUB_USERNAME=yourusername
GITHUB_TOKEN=yourtoken
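How the package loads these values internally is not described here. Before starting a long scrape, you can run a quick standalone check that the `.env` file is readable and that both keys are set. The sketch below uses only the standard library. `parse_env` and `github_credentials` are hypothetical helper names, not part of top-github-scraper, and the parser handles only simple `KEY=VALUE` lines.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. GITHUB_TOKEN="abc"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def github_credentials(env: dict[str, str]) -> tuple[str, str]:
    """Return (username, token), raising KeyError if either is missing or empty."""
    required = ("GITHUB_USERNAME", "GITHUB_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"missing from .env: {', '.join(missing)}")
    return env["GITHUB_USERNAME"], env["GITHUB_TOKEN"]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        username, _token = github_credentials(parse_env(env_file.read_text()))
        # Never print the token itself
        print(f"Found GitHub credentials for {username}")
    else:
        print("No .env file in the current directory")
```

Failing early with a clear `KeyError` is cheaper than discovering a missing token after many pages have already been requested.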

Usage

See the project's full documentation for more details.

Get Top Github Repositories' URLs

from top_github_scraper import get_top_repo_urls

get_top_repo_urls(keyword="machine learning", stop_page=10)

Output at top_repo_urls_<keyword>_<sort_by>_<start_page>_<end_page>.json:

[
    "/josephmisiti/awesome-machine-learning",
    "/wepe/MachineLearning",
    "/udacity/machine-learning",
    "/Jack-Cherish/Machine-Learning",
    "/ZuzooVn/machine-learning-for-software-engineers",
    "/rasbt/python-machine-learning-book",
    "/lawlite19/MachineLearning_Python",
    "/lazyprogrammer/machine_learning_examples",
    "/trekhleb/homemade-machine-learning",
    "/ujjwalkarn/Machine-Learning-Tutorials"
]
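The saved file holds repository paths such as "/owner/repo", not absolute URLs. This is a minimal sketch for loading that list and turning each path into a full GitHub URL or an (owner, repo) pair. It assumes the file is a flat JSON array of strings, as in the output above. `load_repo_paths`, `to_full_urls` and `owner_and_repo` are illustrative names, not library functions.

```python
import json

GITHUB_BASE = "https://github.com"


def load_repo_paths(filename: str) -> list[str]:
    """Read the JSON list of '/owner/repo' paths written by get_top_repo_urls."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)


def to_full_urls(paths: list[str]) -> list[str]:
    """Prefix each '/owner/repo' path with the GitHub base URL."""
    return [GITHUB_BASE + "/" + path.lstrip("/") for path in paths]


def owner_and_repo(path: str) -> tuple[str, str]:
    """Split '/owner/repo' into ('owner', 'repo')."""
    owner, repo = path.strip("/").split("/", 1)
    return owner, repo


sample = ["/josephmisiti/awesome-machine-learning", "/rasbt/python-machine-learning-book"]
print(to_full_urls(sample))
print([owner_and_repo(p) for p in sample])
```

To process a real output file, pass its name (with your keyword and page range filled in) to `load_repo_paths` and feed the result to either helper.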

Get Top Github Repositories' Information

from top_github_scraper import get_top_repos

get_top_repos("machine learning", stop_page=10)

Output for one repository at top_repo_info_<keyword>_<sort_by>_<start_page>_<end_page>.json:

{
    "stargazers_count": 48620,
    "forks_count": 12155,
    "contributors": {
        "login": [
            "josephmisiti",
            "josephmmisiti",
            "hslatman",
            "0asa",
            "ajkl",
            "ipcenas",
            "cogmission",
            "spekulatius",
            "basickarl",
            "NathanEpstein"
        ],
        "url": [
            "https://api.github.com/users/josephmisiti",
            "https://api.github.com/users/josephmmisiti",
            "https://api.github.com/users/hslatman",
            "https://api.github.com/users/0asa",
            "https://api.github.com/users/ajkl",
            "https://api.github.com/users/ipcenas",
            "https://api.github.com/users/cogmission",
            "https://api.github.com/users/spekulatius",
            "https://api.github.com/users/basickarl",
            "https://api.github.com/users/NathanEpstein"
        ],
        "contributions": [
            671,
            105,
            21,
            12,
            11,
            9,
            8,
            7,
            7,
            7
        ]
    }
}
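The "contributors" field stores parallel lists ("login", "url", "contributions") rather than a list of objects, so the i-th entries of each list describe the same person. Here is a short sketch that zips the lists back together and computes each contributor's share of the listed contributions. It assumes a record shaped like the one above. `contributor_summary` is a hypothetical helper, and the sample record copies the values shown (without the "url" list).

```python
def contributor_summary(repo: dict) -> list[tuple[str, int, float]]:
    """Pair each login with its contribution count and share of the listed total."""
    contributors = repo["contributors"]
    logins = contributors["login"]
    counts = contributors["contributions"]
    total = sum(counts)
    return [(login, count, count / total) for login, count in zip(logins, counts)]


# Record copied from the example output above (the "url" list is omitted here)
repo = {
    "stargazers_count": 48620,
    "forks_count": 12155,
    "contributors": {
        "login": [
            "josephmisiti", "josephmmisiti", "hslatman", "0asa", "ajkl",
            "ipcenas", "cogmission", "spekulatius", "basickarl", "NathanEpstein",
        ],
        "contributions": [671, 105, 21, 12, 11, 9, 8, 7, 7, 7],
    },
}

for login, count, share in contributor_summary(repo)[:3]:
    print(f"{login:<15} {count:>4} {share:.1%}")
```

The shares are relative to the contributors listed in the record, not to every contributor the repository has ever had.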

Get Top Github Contributors' Profiles

from top_github_scraper import get_top_contributors

get_top_contributors("machine learning", stop_page=10)

Output at top_contributor_info_<keyword>_<sort_by>_<start_page>_<end_page>.csv:

login url type name company location email hireable bio public_repos public_gists followers following
0 josephmisiti https://api.github.com/users/josephmisiti User Joseph Misiti Math & Pencil "Brooklyn, NY" True Mathematician & Co-founder of Math & Pencil 229 142 2705 275
1 josephmmisiti https://api.github.com/users/josephmmisiti User 0 0 2 0
2 hslatman https://api.github.com/users/hslatman User Herman Slatman DistributIT 133 20 469 67
3 0asa https://api.github.com/users/0asa User Vincent Botta Belgium "Innovation Engineer @evs-broadcast, previously Data Scientist @kensuio, E-Marketing Tools Manager @Diagenode, cofounder @Antibody-Adviser and photographer" 35 15 25 16
4 ajkl https://api.github.com/users/ajkl User Ajinkya Kale [email protected] 58 1 29 4
5 ipcenas https://api.github.com/users/ipcenas User 79 0 1 0
6 cogmission https://api.github.com/users/cogmission User David Ray Third planet from the sun... [email protected] Humanity's freedom and abundance through the pursuit of technological innovation in the area of cognitive applications - Cognition Mission 30 19 54 44
7 spekulatius https://api.github.com/users/spekulatius User Peter Thaleikis @bringyourownideas 127.0.0.1 True Software engineer focused on solutions using open source and simply filling in the gaps to fulfill the requirements. 42 1 232 920
8 basickarl https://api.github.com/users/basickarl User Karl Morrison "Malmö, Sweden" [email protected] The question is: Will you take me seriously 5 1 12 6
9 NathanEpstein https://api.github.com/users/NathanEpstein User Nathan Epstein "New York, NY" [email protected] True 23 12 208 0
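Because the result is a CSV file with named columns, you can rank contributors with the standard `csv` module, without extra dependencies. The sketch below reads by column name, so an extra leading index column (such as the one shown above) does not matter. `top_by_followers` is an illustrative helper. The sample keeps only the `login` and `followers` columns, using follower counts from the example output.

```python
import csv
import io
from typing import TextIO


def top_by_followers(f: TextIO, n: int = 3) -> list[tuple[str, int]]:
    """Return the n (login, followers) pairs with the most followers."""
    rows = csv.DictReader(f)
    # Empty follower cells are treated as 0
    pairs = [(row["login"], int(row["followers"] or 0)) for row in rows]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


# Follower counts taken from the example output above; the real file has more columns
sample_csv = """login,followers
hslatman,469
josephmisiti,2705
NathanEpstein,208
spekulatius,232
"""

print(top_by_followers(io.StringIO(sample_csv), n=2))
```

For a real output file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the `StringIO`.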

Get Top Github Users' Profiles

from top_github_scraper import get_top_users

get_top_users("machine learning", stop_page=10)

Output at top_user_info_<keyword>_<start_page>_<end_page>.csv:

login url type name company location email hireable bio public_repos public_gists followers following
0 rasbt https://api.github.com/users/rasbt User Sebastian Raschka UW-Madison "Madison, WI" "Machine Learning researcher & open source contributor. Author of ""Python Machine Learning."" Asst. Prof. of Statistics @ UW-Madison." 71 5 13888 35
1 tqchen https://api.github.com/users/tqchen User Tianqi Chen "CMU, OctoML" Large scale Machine Learning 28 1 8611 126
2 halfrost https://api.github.com/users/halfrost User halfrost @Alibaba Shanghai China [email protected]
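The interactive map mentioned at the top starts from each user's free-text `location` field, and many profiles leave it blank. Here is a minimal sketch of that first step: keep only users with a location and count how often each one appears. Geocoding the strings into coordinates is out of scope and not shown. `users_with_location` is a hypothetical helper, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def users_with_location(f: TextIO) -> list[tuple[str, str]]:
    """Return (login, location) for every row whose location cell is non-empty."""
    result = []
    for row in csv.DictReader(f):
        location = (row.get("location") or "").strip()
        if location:
            result.append((row["login"], location))
    return result


# Hypothetical rows for illustration; a real file has all the columns shown above
sample_csv = """login,location
alice,"Madison, WI"
bob,
carol,"Madison, WI"
dave,Shanghai China
"""

pairs = users_with_location(io.StringIO(sample_csv))
print(Counter(location for _, location in pairs).most_common())
```

Locations are compared as raw strings, so "New York, NY" and "NYC" count as different places until they are normalized or geocoded.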
