抓取数据推特时语法无效?

问题描述 投票:0回答:1

当我为爬网数据推特运行此代码时

from tqdm import tqdm
from bs4 import BeautifulSoup as bs
import re, csv
def html2csv(fData, fHasil, full=True):
    urlPattern=re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
    print('Loading Data: ', flush = True)
    Tweets, Username, waktu, replies, retweets, likes, Language, urlStatus =  [], [], [], [], [], [], [], []
    soup = bs(open(fData,encoding='utf-8', errors = 'ignore', mode='r'),'html.parser')
    data = soup.find_all('li', class_= 'stream-item')
for i,t in tqdm(enumerate(data)):
    T = t.find_all('p',class_='TweetTextSize')[0] # Loading tweet
    Tweets.append(bs(str(T),'html.parser').text)
    U = t.find_all('span',class_='username')
    Username.append(bs(str(U[0]),'html.parser').text)
    T = t.find_all('a',class_='tweet-timestamp')[0]# Loading Time
    waktu.append(bs(str(T),'html.parser').text)
    RP = t.find_all('span',class_='ProfileTweet-actionCountForAria')[0]# Loading reply, retweet & Likes
    replies.append(int((bs(str(RP), "lxml").text.split()[0]).replace('.','').replace(',','')))
    RT = t.find_all('span',class_='ProfileTweet-actionCountForAria')[1]
    RT = int((bs(str(RT), "lxml").text.split()[0]).replace('.','').replace(',',''))
    retweets.append(RT)
    L  = t.find_all('span',class_='ProfileTweet-actionCountForAria')[2]
    likes.append(int((bs(str(L), "lxml").text.split()[0]).replace('.','').replace(',','')))

try:# Loading Bahasa
    L = t.find_all('span',class_='tweet-language')
    Language.append(bs(str(L[0]), "lxml").text)
    except:
        Language.append('')
        url = str(t.find_all('small',class_='time')[0])
        try:
            url = re.findall(urlPattern,url)[0]
            except:
                try:
                    mulai, akhir = url.find('href="/')+len('href="/'), url.find('" title=')
                    url = 'https://twitter.com/' + url[mulai:akhir]
                    except:
                        url = ''
                        urlStatus.append(url)
                        print('Saving Data to "%s" ' %fHasil, flush = True)
                        dfile = open(fHasil, 'w', encoding='utf-8', newline='')
                        if full:
                            dfile.write('Time, Username, Tweet, Replies, Retweets, Likes, Language, urlStatus\n')
                            with dfile:
                                writer = csv.writer(dfile)
                                for i,t in enumerate(Tweets):
                                    writer.writerow([waktu[i],Username[i],t,replies[i],retweets[i],likes[i],Language[i],urlStatus[i]])
                                    else:
                                        with dfile:
                                            writer = csv.writer(dfile)
                                            for i,t in enumerate(Tweets):
                                                writer.writerow([Username[i],t])
                                                dfile.close()
                                                print('All Finished', flush = True)

我收到此错误

File "<ipython-input-4-4a19b18dc90d>", line 27
    except:
         ^
SyntaxError: invalid syntax
}
python html twitter web-crawler
1个回答
0
投票

在Python中,缩进用于分隔代码块。这与许多其他使用大括号{}来分隔块(例如Java,Javascript和C)的语言不同。因此,由于空白很重要,Python用户必须密切注意何时以及如何缩进其代码。

[当Python遇到程序缩进问题时,它会引发一个称为IndentationError或TabError的异常。[1]

在您的情况下,这是问题所在:

try:
    print(x)
    except:              # wrong indentation 
      print("An exception occurred")

您可以像这样简单地解决它:

try:
  print(x)
except:         # correct, try and catch stay at the same level
  print("An exception occurred")

希望这会有所帮助。祝你好运。

© www.soinside.com 2019 - 2024. All rights reserved.