我希望安排每小时运行我的python脚本,并将数据保存在elasticsearch索引中。因此,我使用了我编写的函数set_interval,该函数使用了tweepy库。但这无法正常工作,因为我需要它。它每分钟运行一次,并将数据保存在索引中。即使设置了等于3600的秒数,它也会在每分钟运行一次。但是我想将其配置为每小时运行一次。
我该如何解决?这是我的python脚本:
def call_at_interval(time, callback, args):
while True:
timer = Timer(time, callback, args=args)
timer.start()
timer.join()
def set_interval(time, callback, *args):
Thread(target=call_at_interval, args=(time, callback, args)).start()
def get_all_tweets(screen_name):
# authorize twitter, initialize tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
screen_name = ""
# initialize a list to hold all the tweepy Tweets
alltweets = []
# make initial request for most recent tweets (200 is the maximum allowed count)
new_tweets = api.user_timeline(screen_name=screen_name, count=200)
# save most recent tweets
alltweets.extend(new_tweets)
# save the id of the oldest tweet less one
oldest = alltweets[-1].id - 1
# keep grabbing tweets until there are no tweets left to grab
while len(new_tweets) > 0:
#print
#"getting tweets before %s" % (oldest)
# all subsiquent requests use the max_id param to prevent duplicates
new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
# save most recent tweets
alltweets.extend(new_tweets)
# update the id of the oldest tweet less one
oldest = alltweets[-1].id - 1
#print
#"...%s tweets downloaded so far" % (len(alltweets))
outtweets = [{'ID': tweet.id_str, 'Text': tweet.text, 'Date': tweet.created_at, 'author': tweet.user.screen_name} for tweet in alltweets]
def save_es(outtweets, es): # Peps8 convention
data = [ # Please without s in data
{
"_index": "index name",
"_type": "type name",
"_id": index,
"_source": ID
}
for index, ID in enumerate(outtweets)
]
helpers.bulk(es, data)
save_es(outtweets, es)
print('Run at:')
print(datetime.now())
print("\n")
set_interval(3600, get_all_tweets(screen_name))
摆脱所有计时器代码,只需编写逻辑和cron将执行此操作,因为您需要在crontab -e
0 * * * * /path/to/python /path/to/script.py
0 * * * *
表示每zero分钟运行一次,您可以找到更多的解释here
为什么您需要那么多的复杂性才能每小时完成一些任务?您可以按以下方式每隔一小时运行一次脚本,请注意,脚本运行了1小时+才完成工作:
import time def do_some_work(): print("Do some work") time.sleep(1) print("Some work is done!") if __name__ == "__main__": time.sleep(60) # imagine you would like to start work in 1 minute first time while True: do_some_work() time.sleep(3600) # do work every one hour
如果您想每隔一小时准确运行一次脚本,请在下面执行以下代码:
import time import threading def do_some_work(): print("Do some work") time.sleep(4) print("Some work is done!") if __name__ == "__main__": time.sleep(60) # imagine you would like to start work in 1 minute first time while True: thr = threading.Thread(target=do_some_work) thr.start() time.sleep(3600) # do work every one hour
在这种情况下,thr应该可以比3600秒更快地完成工作,尽管没有完成,您仍然会得到结果,但是结果将来自其他尝试,请参见下面的示例:
import time import threading class AttemptCount: def __init__(self, attempt_number): self.attempt_number = attempt_number def do_some_work(_attempt_number): print(f"Do some work {_attempt_number.attempt_number}") time.sleep(4) print(f"Some work is done! {_attempt_number.attempt_number}") _attempt_number.attempt_number += 1 if __name__ == "__main__": attempt_number = AttemptCount(1) time.sleep(1) # imagine you would like to start work in 1 minute first time while True: thr = threading.Thread(target=do_some_work, args=(attempt_number, ),) thr.start() time.sleep(1) # do work every one hour
您会遇到的结果是:
做一些工作1做一些工作1做一些工作1做一些工作1一些工作完成了! 1个做一些工作2一些工作完成了! 2做些工作3一些工作完成了! 3做一些工作4一些工作完成了! 4做一些工作5一些工作完成了! 5做些工作6一些工作完成了! 6做一些工作7一些工作完成了! 7做些工作8一些工作完成了! 8做一些工作9
我喜欢将subprocess.Popen用于此类任务,如果子子流程由于任何原因未能在一小时内完成工作,则只需终止子流程并开始一个新的子流程。
您还可以使用CRON安排某些进程每1小时运行一次。