I'm using RQ workers to process a large number of jobs, and I'm running into the following problem:
work-horse terminated unexpectedly; waitpid returned None
{"message": "my_queue: my_job() (dcf797c4-1434-4b77-a344-5bbb1f775113)"}
{"message": "Killed horse pid 8451"}
{"message": "Moving job to FailedJobRegistry (work-horse terminated unexpectedly; waitpid returned None)"}
To reach the self.kill_horse() line, a HorseMonitorTimeoutException must be raised and the difference utcnow() - job.started_at must be greater than job.timeout (the timeout is huge, btw):

    while True:
        try:
            with UnixSignalDeathPenalty(self.job_monitoring_interval, HorseMonitorTimeoutException):
                retpid, ret_val = os.waitpid(self._horse_pid, 0)
            break
        except HorseMonitorTimeoutException:
            # Horse has not exited yet and is still running.
            # Send a heartbeat to keep the worker alive.
            self.heartbeat(self.job_monitoring_interval + 5)

            # Kill the job from this side if something is really wrong (interpreter lock/etc).
            if job.timeout != -1 and (utcnow() - job.started_at).total_seconds() > (job.timeout + 1):
                self.kill_horse()
                break
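
To check my reasoning about that condition, here is a minimal standalone sketch of the same comparison. The function name horse_should_be_killed and the sample timestamps are made up for illustration and are not part of RQ. It only mirrors the if-statement quoted above, so you can plug in a job's started_at and timeout and see whether the kill branch would fire.

```python
from datetime import datetime, timedelta


def horse_should_be_killed(started_at, timeout, now):
    """Hypothetical helper mirroring the quoted check; timeout == -1 means no limit."""
    if timeout == -1:
        return False
    return (now - started_at).total_seconds() > (timeout + 1)


# Fixed reference time so the example is reproducible.
now = datetime(2020, 1, 1, 12, 0, 0)

# Job started 10 minutes ago with a 1-hour timeout: the check does not fire.
fresh = horse_should_be_killed(now - timedelta(minutes=10), 3600, now)

# Assumed scenario: started_at is 2 hours old, so the same timeout is exceeded.
stale = horse_should_be_killed(now - timedelta(hours=2), 3600, now)

print(fresh, stale)
```

So with a huge timeout the branch should only be taken if started_at is much older than expected, which is what makes the kill surprising to me.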
What should my next step be?
I think the latest RQ release (https://github.com/rq/rq/releases/tag/v1.4.0) has a fix:
Fixed a bug that may cause early termination of scheduled or requeued jobs. Thanks @rmartin48!
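
Before upgrading, a quick way to see whether the installed version is already at or past 1.4.0 might look like the sketch below. The helpers parse_release and has_fix are my own hypothetical names, not RQ functions, and the comparison only looks at the first three numeric parts of the version string. The lookup assumes the package is installed under the distribution name "rq".

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '1.4.0' into (1, 4, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(version_string, fixed_in="1.4.0"):
    """True if version_string is at least the release that mentions the fix."""
    return parse_release(version_string) >= parse_release(fixed_in)


try:
    installed = version("rq")
    print(installed, "is at least 1.4.0:", has_fix(installed))
except PackageNotFoundError:
    print("rq is not installed in this environment")
```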