I have a JSON file that I am parsing to check whether the domains in it are active.
My code is as follows:
    for i in range(len(json_data)):
        print(i)
        if int(json_data[i]['response']['result_count'])>0:
            for j in range(len(json_data[i]['response']['matches'])):
                try:
                    socket.gethostbyname(json_data[i]['response']['matches'][j]['domain'] )
                except:
                    del json_data[i]['response']['matches'][j]['domain']
I tried using multithreading in the following form:
    def run_half():
        for i in range(0,round(len(data_json)/2)):
            print(i) # make this len(data_json) if NOT testing, range(10) if testing
            if int(data_json[i]['response']['result_count'])>0:
                for j in range(len(data_json[i]['response']['matches'])):
                    try:
                        socket.gethostbyname( data_json[i]['response']['matches'][j]['domain'] )
                    except:
                        del data_json[i]['response']['matches'][j]['domain']

    def run_half_2():
        for i in range(round((len(data_json)/2))+1,len(data_json)):
            print(i) # make this len(data_json) if NOT testing, range(10) if testing
            if int(data_json[i]['response']['result_count'])>0:
                for j in range(len(data_json[i]['response']['matches'])):
                    try:
                        socket.gethostbyname( data_json[i]['response']['matches'][j]['domain'] )
                    except:
                        del data_json[i]['response']['matches'][j]['domain']

    t1 = threading.Thread(target=run_half(),args=(10,))
    t2= threading.Thread(target=run_half_2(),args=(10,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
For some reason, I did not notice any change in the running time.
Any comments or suggestions would be appreciated. Thanks!
Yes, threading is useful here, because this is a network/IO-bound task.
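One likely reason the two-thread attempt showed no speedup: `threading.Thread(target=run_half(), ...)` calls `run_half` right away in the main thread and passes its return value (`None`) as the target, so all the work finishes serially before either thread starts. The minimal sketch below illustrates the difference. Here `time.sleep` stands in for the blocking DNS lookup, and the names `fake_lookup` and `timed` are hypothetical helpers, not part of the question's code.

```python
import threading
import time

def fake_lookup(delay):
    # Stand-in for a blocking network call such as socket.gethostbyname.
    time.sleep(delay)

def timed(make_threads):
    # Build the threads, run them to completion, and return elapsed seconds.
    start = time.perf_counter()
    threads = make_threads()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Wrong: fake_lookup(0.2) executes here, in the main thread; target becomes None.
serial = timed(lambda: [threading.Thread(target=fake_lookup(0.2)) for _ in range(2)])

# Right: pass the function object, and its arguments separately via args.
parallel = timed(lambda: [threading.Thread(target=fake_lookup, args=(0.2,)) for _ in range(2)])

print(f"called eagerly: {serial:.2f}s, passed as target: {parallel:.2f}s")
```

With the second form the two waits overlap, so the elapsed time should be roughly that of a single call rather than the sum of both.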
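Rather than splitting the work into groups as above, a better approach is to treat each hostname check as a separate task and spread the execution across a fixed number of workers.
I suggest using the thread pool executor from the Python standard library to do this.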
https://docs.python.org/3/library/concurrent.futures.html
The idea is to fan out each long-running task into a future, then fan in to collect all the results.
For example,
    from concurrent.futures import ThreadPoolExecutor

    def long_running_task(url):
        # Placeholder: replace with the real work, e.g. a DNS lookup.
        return url

    list_of_work_to_do = ["url1", "url2", "url3"]

    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = []
        # Fan-out work.
        for my_url in list_of_work_to_do:
            future = executor.submit(long_running_task, my_url)
            futures.append(future)
        # Fan-in results.
        results = [future.result() for future in futures]
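Applied to the data layout in the question, the same pattern might look like the sketch below. It assumes each entry has `['response']['result_count']` and `['response']['matches']` as in the question's code. The function names `is_resolvable` and `prune_inactive` and the `resolver` parameter are hypothetical, introduced only so the lookup can be swapped out for testing. The `domain` keys are deleted in the main thread after all lookups finish, so no worker thread mutates the shared data.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def is_resolvable(domain, resolver=socket.gethostbyname):
    """Return True if the domain resolves, False if the lookup fails."""
    try:
        resolver(domain)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names that cannot be IDNA-encoded.
        return False

def prune_inactive(json_data, max_workers=8, resolver=socket.gethostbyname):
    # Flatten every match dict that has a domain into one list of tasks.
    matches = [
        match
        for entry in json_data
        if int(entry['response']['result_count']) > 0
        for match in entry['response']['matches']
        if 'domain' in match
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Fan out one lookup per match; map() yields results in input order.
        results = list(executor.map(
            lambda m: is_resolvable(m['domain'], resolver), matches))
    # Fan in: remove the 'domain' key from matches that did not resolve.
    for match, ok in zip(matches, results):
        if not ok:
            del match['domain']
    return json_data
```

Calling `prune_inactive(json_data)` would then replace the nested loops from the question. The `max_workers` value is a tuning knob: for DNS lookups a number well above the CPU count is often reasonable, since the threads spend most of their time waiting.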