How can I make my Python program faster? Is multithreading useful here?


I have a JSON file that I am parsing to check whether each domain in it is still live.

My code is as follows:

for i in range(len(json_data)):
    print(i)        
    if int(json_data[i]['response']['result_count'])>0:  
        for j in range(len(json_data[i]['response']['matches'])):
            try: 
                socket.gethostbyname(json_data[i]['response']['matches'][j]['domain'] )
            except:
                del json_data[i]['response']['matches'][j]['domain']

I tried using multithreading in the following form:

def run_half():
    for i in range(0,round(len(data_json)/2)):
        print(i)        # make this len(data_json) if NOT testing, range(10) if testing
        if int(data_json[i]['response']['result_count'])>0:  
            for j in range(len(data_json[i]['response']['matches'])):
                try: 
                    socket.gethostbyname( data_json[i]['response']['matches'][j]['domain'] )
                except:
                    del data_json[i]['response']['matches'][j]['domain']
def run_half_2():
    for i in range(round((len(data_json)/2))+1,len(data_json)):
        print(i)        # make this len(data_json) if NOT testing, range(10) if testing
        if int(data_json[i]['response']['result_count'])>0:  
            for j in range(len(data_json[i]['response']['matches'])):
                try: 
                    socket.gethostbyname( data_json[i]['response']['matches'][j]['domain'] )
                except:
                    del data_json[i]['response']['matches'][j]['domain']

t1 = threading.Thread(target=run_half(), args=(10,))
t2 = threading.Thread(target=run_half_2(), args=(10,))

t1.start()
t2.start()

t1.join()
t2.join()

For some reason, I'm not seeing any change in the run time.

Any comments or suggestions would be greatly appreciated. Thanks!

python json multithreading sockets concurrency
1 Answer

Yes, threads are useful here, since this is a network/IO-bound task. (As a side note, your version never actually runs the work concurrently: target=run_half() calls the function immediately in the main thread and passes its return value, None, to Thread; you would need target=run_half, without parentheses.)

Rather than splitting the work into fixed halves as above, a better approach is to treat each hostname check as a separate task and spread the execution across a pool of workers.

I suggest you use the thread pool executor provided by the Python standard library for this.

https://docs.python.org/3/library/concurrent.futures.html

The idea is to fan each long-running task out to a future, then fan in to collect all the results.

For example:

    import socket
    from concurrent.futures import ThreadPoolExecutor

    def long_running_task(my_url):
        # One blocking DNS lookup per task, matching the check in the question.
        try:
            socket.gethostbyname(my_url)
            return my_url, True
        except socket.gaierror:
            return my_url, False

    list_of_work_to_do = ["url1", "url2", "url3"]

    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = []

        # Fan-out work.
        for my_url in list_of_work_to_do:
            future = executor.submit(long_running_task, my_url)
            futures.append(future)

        # Fan-in results.
        results = [future.result() for future in futures]
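
Applied to the JSON structure from the question, the same fan-out/fan-in pattern could look like the sketch below. This is only a sketch: it assumes json_data is already loaded with the same layout as in your loops, and check_domain is a helper name introduced here purely for illustration.

    import socket
    from concurrent.futures import ThreadPoolExecutor

    def check_domain(domain):
        # Hypothetical helper: True if the name resolves, False otherwise.
        try:
            socket.gethostbyname(domain)
            return True
        except socket.gaierror:
            return False

    with ThreadPoolExecutor(max_workers=8) as executor:
        # Fan-out: one future per domain, keyed by its position in json_data.
        futures = {}
        for i, entry in enumerate(json_data):
            if int(entry['response']['result_count']) > 0:
                for j, match in enumerate(entry['response']['matches']):
                    futures[(i, j)] = executor.submit(check_domain, match['domain'])

        # Fan-in: remove the 'domain' key for names that did not resolve,
        # mirroring the del in the original loop.
        for (i, j), future in futures.items():
            if not future.result():
                del json_data[i]['response']['matches'][j]['domain']

Because each lookup spends almost all of its time waiting on the network, you can raise max_workers well above the two threads in your attempt, and the wall-clock time should drop roughly in proportion, up to whatever your DNS resolver tolerates.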