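Is there a way to access the data in the "Repositories contributed to" module on GitHub profile pages via the GitHub API? Ideally the entire list, not just the top five, which is apparently all you can get on the web.
Using Google BigQuery with the GitHub Archive, I pulled all the repositories I had made a pull request to using: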
SELECT repository_url
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_user_login ='rgbkrk'
GROUP BY repository_url;
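You can use similar semantics to pull out the number of repositories you have contributed to and the number of languages they are written in: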
SELECT COUNT(DISTINCT repository_url) AS count_repositories_contributed_to,
COUNT(DISTINCT repository_language) AS count_languages_in
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_user_login ='rgbkrk';
If you are looking for overall contributions, which includes issues reported, use:
SELECT COUNT(DISTINCT repository_url) AS count_repositories_contributed_to,
COUNT(DISTINCT repository_language) AS count_languages_in
FROM [githubarchive:github.timeline]
WHERE actor_attributes_login = 'rgbkrk';
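The difference here is that actor_attributes_login comes from the Issue Events API.
You may also want to capture your own repos, which may not have issues or PRs filed by you.
With the GraphQL API v4, you can now get these contributed repos using: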
{
  viewer {
    repositoriesContributedTo(first: 100, contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
      totalCount
      nodes {
        nameWithOwner
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
If you have more than 100 contributed repos (including your own), you will have to paginate by specifying after: "END_CURSOR_VALUE" in repositoriesContributedTo for the next request.
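The pagination loop described above could be sketched in Python roughly as follows. This is only an illustration: the names QUERY, run_query, and all_contributed_repos are made up for this example, the query is rewritten to take an $after variable, and a personal access token is assumed.

```python
import json
import urllib.request

# Same fields as the query above, with the cursor passed in as a variable.
QUERY = """
query($after: String) {
  viewer {
    repositoriesContributedTo(first: 100, after: $after,
        contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
      nodes { nameWithOwner }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def run_query(token, after=None):
    """POST one page request to the GraphQL endpoint and return the parsed JSON."""
    body = json.dumps({"query": QUERY, "variables": {"after": after}}).encode()
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={"Authorization": "bearer " + token,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_contributed_repos(fetch_page):
    """Collect nameWithOwner from every page; fetch_page(after) returns one response."""
    names, after = [], None
    while True:
        connection = fetch_page(after)["data"]["viewer"]["repositoriesContributedTo"]
        names.extend(node["nameWithOwner"] for node in connection["nodes"])
        if not connection["pageInfo"]["hasNextPage"]:
            return names
        # Feed the cursor of this page into the next request.
        after = connection["pageInfo"]["endCursor"]
```

With a real token this would be called as all_contributed_repos(lambda after: run_query(token, after)). Passing the fetcher in as a function keeps the loop easy to try against canned responses.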
I tried implementing something like this a while ago for a GitHub summarizer... My steps for getting the repositories a user has contributed to were as follows (using my own user as an example):
https://api.github.com/search/issues?q=type:pr+state:closed+author:megawac&per_page=100&page=1
https://api.github.com/repos/jashkenas/underscore/contributors
repos/:owner/:repo/contributors
https://api.github.com/users/megawac/subscriptions
https://api.github.com/users/megawac/orgs
https://api.github.com/orgs/jsdelivr/repos
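This will miss repos where the user has been added as a contributor but has never submitted a pull request. We can increase our odds of finding those repos by also searching:
1) any issues the user has opened (not just closed pull requests)
2) repos the user has starred
Clearly this requires far more requests than we would like to make, but what can you do when they make you fudge features?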
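As a rough illustration of the first step, the candidate repos can be pulled out of the pull request search results. The sketch below assumes each item in the search response's items array carries a repository_url field of the form https://api.github.com/repos/OWNER/REPO. The function name and the sample data are made up for this example.

```python
API_REPO_PREFIX = "https://api.github.com/repos/"


def repos_from_search_items(items):
    """Return unique 'owner/repo' names, in first-seen order, from search/issues items."""
    names = []
    for item in items:
        url = item.get("repository_url", "")
        if not url.startswith(API_REPO_PREFIX):
            continue  # skip anything that does not look like a repo API URL
        name = url[len(API_REPO_PREFIX):]
        if name not in names:
            names.append(name)
    return names


# Hypothetical sample shaped like the "items" array of a search/issues response.
sample_items = [
    {"repository_url": "https://api.github.com/repos/jashkenas/underscore"},
    {"repository_url": "https://api.github.com/repos/jsdelivr/jsdelivr"},
    {"repository_url": "https://api.github.com/repos/jashkenas/underscore"},
]
candidates = repos_from_search_items(sample_items)
```

Each name in candidates would then be checked against repos/:owner/:repo/contributors as described in the steps above.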
You can use the Search provided by the GitHub API. Your query should look like this:
https://api.github.com/search/repositories?q=%20+fork:true+user:username
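Setting the fork parameter to true ensures that you query all of the user's repos, including forked ones.
However, if you want to make sure that the user not only forked a repository but also contributed to it, you have to iterate over every repo returned by the search request and check whether the user is among its contributors. This is pretty bad, because GitHub only returns 100 contributors and there is no workaround for that...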
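The per-repo check could look something like the sketch below. It assumes the contributors response is a list of objects with a login field, and it takes the fetching function as a parameter so no particular HTTP client is implied. Both function names are invented for this example.

```python
def is_contributor(contributors, username):
    """True if username appears in one page of a contributors response."""
    wanted = username.lower()
    # Logins are compared case-insensitively; entries without a login are ignored.
    return any(entry.get("login", "").lower() == wanted for entry in contributors)


def contributed_repos(candidates, username, get_contributors):
    """Keep candidate 'owner/repo' names whose contributor list includes username.

    get_contributors(name) is a caller-supplied function returning a list of dicts.
    """
    return [name for name in candidates
            if is_contributor(get_contributors(name), username)]
```

Because only the returned page of contributors is inspected, this inherits the limitation mentioned above: a user further down the list than the API returns will be missed.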
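I ran into the same problem. (GithubAPI: Get repositories a user has ever committed in)
One real hack I found is a project called http://www.githubarchive.org/, which has been logging all public activity since 2011. Not ideal, but it can help.
So, for example, in your case: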
SELECT payload_pull_request_head_repo_clone_url
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_base_user_login='outoftime'
GROUP BY payload_pull_request_head_repo_clone_url;
If I am not mistaken, that gives you the list of repos you have made pull requests to:
https://github.com/jreidthompson/noaa.git
https://github.com/kkrol89/sunspot.git
https://github.com/rterbush/sunspot.git
https://github.com/ottbot/cassandra-cql.git
https://github.com/insoul/cequel.git
https://github.com/mcordell/noaa.git
https://github.com/hackhands/sunspot_rails.git
https://github.com/lgierth/eager_record.git
https://github.com/jnicklas/sunspot.git
https://github.com/klclee/sunspot.git
https://github.com/outoftime/cequel.git
You can play with BigQuery here: bigquery.cloud.google.com, and the data schema can be found here: https://github.com/igrigorik/githubarchive.org/blob/master/bigquery/schema.js
I wrote a Selenium Python script to do this:
"""
Get all your repos contributed to for the past year.
This uses Selenium and Chrome to login to github as your user, go through
your contributions page, and grab the repo from each day's contribution page.
Requires python3, selenium, and Chrome with chromedriver installed.
Change the username variable, and run like this:
GITHUB_PASS="mypassword" python3 github_contributions.py
"""
import os
import time
from pprint import pprint as pp
from urllib.parse import urlsplit

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

username = 'jessejoe'
password = os.environ['GITHUB_PASS']

repos = []
driver = webdriver.Chrome()
driver.get('https://github.com/login')

# find_element(By...) is used because the find_element_by_* helpers
# were removed in Selenium 4.3
driver.find_element(By.ID, 'login_field').send_keys(username)
password_elem = driver.find_element(By.ID, 'password')
password_elem.send_keys(password)
password_elem.submit()

# Wait indefinitely for 2-factor code
if 'two-factor' in driver.current_url:
    print('2-factor code required, go enter it')
while 'two-factor' in driver.current_url:
    time.sleep(1)

driver.get('https://github.com/{}'.format(username))

# Get all days that aren't colored gray (no contributions)
contrib_days = driver.find_elements(
    By.XPATH, "//*[@class='day' and @fill!='#eeeeee']")

for day in contrib_days:
    day.click()

    # Wait until done loading
    WebDriverWait(driver, 10).until(
        lambda driver: 'loading' not in driver.find_element(
            By.CSS_SELECTOR, '.contribution-activity').get_attribute('class'))

    # Get all contribution URLs
    contribs = driver.find_elements(By.CSS_SELECTOR, '.contribution-activity a')
    for contrib in contribs:
        url = contrib.get_attribute('href')
        # Only care about repo owner and name from URL
        repo_path = urlsplit(url).path
        repo = '/'.join(repo_path.split('/')[0:3])
        if repo not in repos:
            repos.append(repo)

    # Have to click something else to remove pop-up on current day
    driver.find_element(By.CSS_SELECTOR, '.vcard-fullname').click()

driver.quit()
pp(repos)
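It uses Python and Selenium to automate a Chrome browser: it logs in to GitHub, goes to your contributions page, clicks each day, and grabs the repo name from any contributions. Since this page only shows one year of activity, that is all you can get with this script.
I don't see any way of doing this in the API. The closest I could find is to get the latest 300 events from a public user (300 is the limit, unfortunately), and then you can sort those for contributions to other people's repositories.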
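The URL-trimming step in the script can be tried on its own without a browser. The helper name repo_from_url and the sample URL below are invented for illustration; the logic is the same split-and-slice used in the script.

```python
from urllib.parse import urlsplit


def repo_from_url(url):
    """Reduce a GitHub link to its leading '/owner/repo' path segment."""
    path = urlsplit(url).path
    # The path starts with '/', so the first three pieces are '', owner, repo.
    return '/'.join(path.split('/')[0:3])


# Hypothetical contribution link of the kind found on the profile page.
example = repo_from_url('https://github.com/jessejoe/example-repo/commits?author=jessejoe')
```

Query strings and deeper path segments such as /commits or /pull/12 are dropped, so several links into the same repo collapse to one entry.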
https://developer.github.com/v3/activity/events/#list-public-events-performed-by-a-user
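Sorting those events could look like the sketch below. It assumes each event has a type and a repo.name of the form owner/repo. The set of event types treated as contributions is an assumption made for this example, as is the function name.

```python
# Event types counted as contributions here; adjust to taste (an assumption).
CONTRIBUTION_TYPES = {"PushEvent", "PullRequestEvent", "IssuesEvent"}


def foreign_repos(events, username):
    """Return repos not owned by username that appear in contribution-like events."""
    own_prefix = username.lower() + "/"
    names = []
    for event in events:
        if event.get("type") not in CONTRIBUTION_TYPES:
            continue  # e.g. starring a repo is not a contribution
        name = event["repo"]["name"]
        if name.lower().startswith(own_prefix):
            continue  # skip the user's own repositories
        if name not in names:
            names.append(name)
    return names
```

Since only the most recent 300 events are available, this can only ever see recent activity, not the full history.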
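We need to get GitHub to implement this in their API.
As of now, GitHub API v3 provides no way to get a user's current streak.
You can use this to calculate the current streak: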
https://github.com/users/<username>/contributions.json
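A minimal sketch of the streak calculation follows. The shape of the response is an assumption here: a list of [date, count] pairs ordered oldest first. The function name and the sample data are made up for this example.

```python
def current_streak(days):
    """Count the trailing run of days with at least one contribution.

    days is assumed to be a list of [date_string, count] pairs, oldest first.
    """
    streak = 0
    for _date, count in reversed(days):
        if count == 0:
            break  # the first empty day, walking backwards, ends the streak
        streak += 1
    return streak


# Hypothetical sample data in the assumed shape.
sample_days = [
    ["2014/01/01", 2],
    ["2014/01/02", 0],
    ["2014/01/03", 1],
    ["2014/01/04", 3],
]
streak = current_streak(sample_days)
```

Under this definition a day with zero contributions at the end of the list gives a streak of 0, so a caller who wants to ignore an unfinished "today" would drop the last entry first.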