I can scrape two reddit pages up to a certain point, then I get an error and I don't understand why


I'm trying to do some NLP on subreddit pages. I have a chunk of code that collects a lot of data from two web pages. It scrapes data until it reaches range(40). That works fine, except I know the subreddits I picked have more posts than my code lets me scrape.

Can anyone figure out what's going on here?

import time

import requests

posts_test = []
url = 'https://www.reddit.com/r/TheOnion/.json?after='
for i in range(40):
    res = requests.get(url, headers={'User-agent': 'Maithili'})
    the_onion = res.json()
    for i in range(25):
        post_t = []
        post_t.append(the_onion['data']['children'][i]['data']['title'])
        post_t.append(the_onion['data']['children'][i]['data']['subreddit'])
        posts_test.append(post_t)
    after = the_onion['data']['after']
    url = 'https://www.reddit.com/r/TheOnion/.json?after=' + after

    time.sleep(3)

# Not the onion
url = 'https://www.reddit.com/r/nottheonion/.json?after='

for i in range(40):
    res2 = requests.get(url, headers={'User-agent': 'Maithili'})
    not_onion_json = res2.json()
    for i in range(25):
        post_t = []
        post_t.append(not_onion_json['data']['children'][i]['data']['title'])
        post_t.append(not_onion_json['data']['children'][i]['data']['subreddit'])
        posts_test.append(post_t)
    after = not_onion_json['data']['after']
    url = "https://www.reddit.com/r/nottheonion/.json?after=" + after

    time.sleep(3)


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-57-6c1cfdd42421> in <module>
      7     for i in range(25):
      8         post_t = []
----> 9         post_t.append(the_onion['data']['children'][i]['data']['title'])
     10         post_t.append(the_onion['data']['children'][i]['data']['subreddit'])
     11         posts_test.append(post_t)

IndexError: list index out of range
python web-scraping nlp
1 Answer

The reason you stop at 40 is that you told Python to stop at 40:

for i in range(40):

The good news is that you are already collecting the pointer to the next page here:

after = not_onion_json['data']['after']

Assuming that `after == null` once you reach the end of the pages, I'd suggest a while loop. Something like:

while after is not None:

That will keep going until you run out of pages.
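A minimal sketch of that while-loop pagination, with a hypothetical `fake_pages` dict standing in for `res.json()` so it runs without hitting reddit.com (the `get_posts` helper and the page data are illustrative, not part of the question's code):

```python
# Two fake API responses keyed by the 'after' cursor, mimicking the shape of
# the_onion['data']['children'][i]['data'] in the question's code.
fake_pages = {
    '': {'data': {'children': [{'data': {'title': 'a', 'subreddit': 'TheOnion'}}],
                  'after': 't3_x'}},
    't3_x': {'data': {'children': [{'data': {'title': 'b', 'subreddit': 'TheOnion'}}],
                      'after': None}},
}

def get_posts(fetch):
    """Collect [title, subreddit] pairs until 'after' comes back as None."""
    posts, after = [], ''
    while after is not None:
        page = fetch(after)
        # Iterate over the children actually returned, not a fixed range(25);
        # this also avoids the IndexError when a page has fewer than 25 posts.
        for child in page['data']['children']:
            posts.append([child['data']['title'], child['data']['subreddit']])
        after = page['data']['after']
    return posts

posts = get_posts(lambda after: fake_pages[after])
```

In real use, `fetch` would wrap `requests.get(url + after, headers=...).json()` plus the `time.sleep(3)` delay.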
