Beginner scraping project: how to convert Twitter IDs to usernames?

Question

Thanks to @BittoBennichan, I have been able to build this little Python thingy that scrapes the user IDs tagged in media posted on Twitter:

from bs4 import BeautifulSoup
from selenium import webdriver
import time

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to page
driver.get("http://twitter.com/XXXXXX/media")

#You can adjust it but this works fine
SCROLL_PAUSE_TIME = 2

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height


# Now that the page is fully scrolled, grab the source code.
src = driver.page_source

# Paste it into BeautifulSoup
soup = BeautifulSoup(src, 'html.parser')
divs = soup.find_all('div',class_='account')

#PRINT RESULT
#print('printing results')
#for div in divs:
#    print(div['data-user-id'])


#SAVE IN FILE
print('Saving results')
with open('file.txt', 'w') as f:
    for div in divs:
        f.write(div['data-user-id'] + '\n')

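So the program works. It retrieves the IDs and either prints them or writes them to a txt file. I can now paste this list of IDs into Calc and add a pivot table to see how many times each ID was tagged. But I still have a few problems: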
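
The counting step does not have to happen in Calc. As a minimal sketch, assuming the IDs are available as a list of strings (for example the lines of file.txt), collections.Counter from the standard library produces the same tally as the pivot table. The function name count_tags and the sample IDs are made up for illustration.

```python
from collections import Counter

def count_tags(ids):
    """Return (user_id, count) pairs, most frequently tagged first."""
    # Strip trailing newlines left over from the file and skip blank lines.
    cleaned = [i.strip() for i in ids if i.strip()]
    return Counter(cleaned).most_common()

# Made-up sample IDs standing in for the contents of file.txt.
sample = ["111\n", "222\n", "111\n", "\n", "333\n", "111\n", "222\n"]
for user_id, count in count_tags(sample):
    print(user_id, count)
```

With the real data, the list would come from reading file.txt line by line instead of the hard-coded sample.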

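- I only get the IDs, not the usernames. Which would be simpler: collecting the usernames at the same time as the IDs and writing them to the file together, or converting the IDs file into a usernames file afterwards? And how would that latter approach even work?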

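- I cannot scroll down indefinitely. I got back to September 2018, but that's it. It just says "Back to top". Is that because I'm not logged in to Twitter, or because of some built-in limit?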
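
For the second option in the first point (converting the IDs file after the fact), a sensible first step would be to load the IDs back and drop duplicates, so that each account only needs to be looked up once. A minimal sketch, assuming one ID per line as written by the script above; unique_ids is a hypothetical helper name and the sample lines are invented:

```python
def unique_ids(lines):
    """Return the IDs from an iterable of lines, without blanks or duplicates.

    The order of first appearance is preserved.
    """
    stripped = (line.strip() for line in lines)
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates in order.
    return list(dict.fromkeys(s for s in stripped if s))

# Invented sample. With the real file this would be:
#     with open('file.txt') as f:
#         ids = unique_ids(f)
sample_lines = ["111\n", "222\n", "111\n", "\n", "333\n"]
print(unique_ids(sample_lines))
```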

If you have any comments, ideas, etc., any help would be much appreciated. Thanks!

Edit 1: I found this (Tweepy) solution here:

def get_usernames(ids):
    """ can only do lookup in steps of 100;
        so 'ids' should be a list of at most 100 ids
    """
    # 'api' is assumed to be an authenticated tweepy.API instance created elsewhere
    user_objs = api.lookup_users(user_ids=ids)
    for user in user_objs:
        print(user.screen_name)

So, since my list has more than 100 IDs, I should do this:

For a larger set of IDs, you can put this in a for loop and make the calls accordingly while respecting the Twitter API limits.
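
The loop described in that quote could be sketched as follows. This is only an illustration under assumptions: lookup stands for any function that accepts at most 100 IDs and returns their usernames (for instance a wrapper around the Tweepy call above), and chunked, lookup_all and fake_lookup are hypothetical names. The pause between calls is a placeholder, not a value taken from Twitter's documented rate limits.

```python
import time

def chunked(items, size=100):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def lookup_all(ids, lookup, pause=0.0):
    """Call `lookup` once per batch of up to 100 IDs and collect the results."""
    usernames = []
    for batch in chunked(ids, 100):
        usernames.extend(lookup(batch))
        time.sleep(pause)  # placeholder pause; tune it to the real API limits
    return usernames

def fake_lookup(batch):
    """Stand-in for the real API call, used only to show the batching."""
    assert len(batch) <= 100
    return ["user_" + str(i) for i in batch]

ids = [str(n) for n in range(250)]
names = lookup_all(ids, fake_lookup)
print(len(names), [len(b) for b in chunked(ids)])
```

Passing the lookup function in as a parameter keeps the batching logic testable without network access; the real API call can be swapped in later.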

Answer 1

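Your code did not produce any IDs for me, so at first I could not test these solutions. I'm not sure what the problem was because I didn't investigate, but it seems my source HTML has no class='account'. So I changed that in the code to simply say, "find all div tags that have the attribute data-user-id":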

import re  # needed for re.compile below
divs = soup.find_all('div', {"data-user-id": re.compile(r".*")})

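1) To get a CSV, you can write and save the output as csv instead of txt. Another option is to build a DataFrame from the IDs and then write it to CSV with pandas using df.to_csv('path/to/file.csv').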

2) Putting the IDs into a list is also very easy.

Creating the ID list - for loop

#TO PUT INTO LIST (FOR LOOP)
id_list = []
for div in divs:
    id_list.append(div['data-user-id'])

print(id_list)

Creating the ID list - list comprehension

#TO PUT INTO LIST (LIST COMPREHENSION)
id_list = [ div['data-user-id'] for div in divs ]

Writing to CSV

#SAVE IN FILE
import csv
print('Saving results')    
with open('file.csv','w', newline='') as f:
    writer = csv.writer(f)
    for div in divs:
        writer.writerow([div['data-user-id']])
