I'm going to use requests Sessions in my application, and I've read a few threads about thread- and process-safety issues with them, but those aren't recent.
My application runs more than 800 processes, so I'm a bit worried about the data coming back from the API getting mixed up.
I came up with the implementation below, and I'd like to know whether this is the correct way to handle a requests Session in a multiprocess program.
import multiprocessing
import time
import sys
import requests
from requests.packages.urllib3.util.retry import Retry

session = None

def initialize_session():
    global session
    if session is None:
        session = requests.Session()
        retry_strategy = Retry(
            total=3,
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["POST"],
            backoff_factor=1
        )
        adapter = requests.adapters.HTTPAdapter(
            max_retries=retry_strategy,
            pool_connections=1,
            pool_maxsize=1
        )
        session.mount("https://", adapter)

def worker(j):
    global session
    initialize_session()
    try:
        for i in range(10):
            print("I am process " + str(j) + " and my cookie is ")
            print(session.cookies.get_dict())
            session.cookies.set('worker', j)
            time.sleep(5)  # do some api work, function calls, etc.
    except:
        raise

processes = []
for j in range(0, 4):
    p = multiprocessing.Process(target=worker, args=(j,))
    processes.append(p)
    time.sleep(0.1)
    p.start()

for p in processes:
    p.join()
The implementation you've shown has some problems with how it handles the global session in a multiprocessing context. Specifically, a global requests.Session can't actually be shared across processes in Python multiprocessing, because each process maintains its own memory space: the global is copied (or re-created) in every child, so mutating it in one process has no effect on the others. Trying to share a mutable object like a requests.Session between processes can lead to race conditions, corrupted state, or other unexpected behavior.
Here's how you can adjust the code so that sessions are handled correctly in a multiprocess setup:
import multiprocessing
import time
import requests
from requests.packages.urllib3.util.retry import Retry

def create_session():
    # Each process gets its own session
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST"],
        backoff_factor=1
    )
    adapter = requests.adapters.HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    return session

def worker(j):
    # Each worker process initializes its own session
    session = create_session()
    try:
        for i in range(10):
            print(f"I am process {j} and my cookie is ")
            print(session.cookies.get_dict())
            # setting a cookie unique to the process
            session.cookies.set('worker', str(j))
            time.sleep(5)
    except Exception as e:
        print(f"Error in process {j}: {e}")

if __name__ == "__main__":
    processes = []
    for j in range(4):  # Creating 4 processes
        p = multiprocessing.Process(target=worker, args=(j,))
        processes.append(p)
        time.sleep(0.1)
        p.start()
    for p in processes:
        p.join()