Why does Popen.poll() return None as the return code even though the subprocess has finished?

Question

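I have some Python code running on Windows that spawns a subprocess and waits for it to complete. The subprocess is not well behaved, so the script makes a non-blocking spawn call and monitors the process on the side. If a timeout threshold is reached, it kills the process, on the assumption that it has gone off the rails.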

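Under some non-reproducible circumstances, the spawned subprocess disappears and the watcher routine does not notice. It keeps monitoring until the timeout threshold is exceeded, tries to kill the subprocess, gets an error, and exits.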

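What could cause the subprocess to disappear without the watcher process detecting it? Why isn't the return code captured and returned by the call to Popen.poll()?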

The code I use to spawn and watch the process is below:

import subprocess
import time

def nonblocking_subprocess_call(cmdline):
    print 'Calling: %s' % (' '.join(cmdline))
    p = subprocess.Popen(cmdline, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return p


def monitor_subprocess(handle, timeout=1200):
    start_time = time.time()
    return_code = 0
    while True:
        time.sleep(60)
        return_code = handle.poll()
        if return_code == None:
            # The process is still running.
            if time.time() - start_time > timeout:
                print 'Timeout (%d seconds) exceeded -- killing process %i' % (timeout, handle.pid)
                return_code = handle.terminate()  # terminate() returns None, so this is always None
                # give the kill command a few seconds to work
                time.sleep(5)
                if not return_code:
                    print 'Error: Failed to kill subprocess %i -- return code was: %s' % (handle.pid, str(return_code))
                # Raise an error to indicate that the process was hung up
                # and we had to kill it.
                raise RuntimeError
        else:
            print 'Process exited with return code: %i' % (return_code)
            break
    return return_code

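What I'm seeing is that, in the cases where the process disappears, the call return_code = handle.poll() on line 15 returns None instead of a return code. I know the process has completely gone away, because I can see that it is no longer listed in Task Manager. And I know the process disappeared long before the timeout value was reached.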

Answer 1

Can you give an example of the cmdline variable? What kind of subprocess are you spawning?

I ran this in a test script, calling a batch file that contains the following command:

ping -n 151 127.0.0.1>nul

This sleeps for about 150 seconds, and the script worked fine.

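It may be that your subprocess isn't terminating properly. Also, try changing the sleep call to something shorter, such as time.sleep(2).

In the past I've found that this works better than a longer sleep (especially if your subprocess is another Python process).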
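The experiment above can be reproduced without a batch file. This sketch, assuming Python 3, replaces the ping call with a hypothetical Python child that sleeps for one second and exits with code 7, and polls every couple of seconds as suggested. The names watch, EXIT7_CHILD and SLOW_CHILD are illustrative. It also uses time.monotonic() instead of time.time(), a deliberate swap so that system clock changes cannot move the deadline:

```python
import subprocess
import sys
import time

# Hypothetical test children: one exits with code 7 after 1 s, one sleeps for 30 s.
EXIT7_CHILD = [sys.executable, "-c", "import sys, time; time.sleep(1); sys.exit(7)"]
SLOW_CHILD = [sys.executable, "-c", "import time; time.sleep(30)"]


def watch(proc, timeout=1200.0, interval=2.0):
    """Poll `proc` every `interval` seconds; return its exit code, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        rc = proc.poll()  # non-blocking; None means the child is still running
        if rc is not None:
            return rc
        if time.monotonic() >= deadline:
            return None  # the caller decides how to stop the child
        # Sleep for one interval, but never past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

A short interval means the exit code is noticed within a couple of seconds of the child exiting, rather than up to a minute later.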

Also, I'm not sure whether your actual script has this, but in the else: branch you have an extra parenthesis:

else:
    #print 'Process exited with return code: %i' % (return_code))
    # There's an extra closing parenthesis
    print 'Process exited with return code: %i' % (return_code)
    break

And why do you reference the global temp_cmdline in the join statement?

print 'Calling: %s' % (' '.join(temp_cmdline))

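I'm not sure whether cmdline is built from the list variable temp_cmdline, or whether temp_cmdline is created by splitting a string on spaces. Either way, if your cmdline variable is a string, wouldn't it make more sense to print it directly?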

print 'Calling: %s' % cmdline
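If cmdline can be either a string or a list, a small helper can print it in both cases. This is a sketch assuming Python 3; describe_cmdline is a hypothetical name, and it relies on subprocess.list2cmdline, a module-level helper that applies the Windows quoting rules but is not part of the documented public API:

```python
import subprocess


def describe_cmdline(cmdline):
    """Return a printable form of `cmdline`, whether it is a string or an argument list."""
    if isinstance(cmdline, str):
        # A string is already a command line: print it as-is.
        return cmdline
    # For a list, list2cmdline applies the Windows (MS C runtime) quoting rules,
    # so arguments containing spaces appear quoted.
    return subprocess.list2cmdline(cmdline)
```

Usage: print('Calling: %s' % describe_cmdline(cmdline)).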

Answer 2

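The poll method on subprocess objects doesn't seem to work very well. I once had the same problem when I spawned some threads to do some work. I suggest using the multiprocessing module instead.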

Answer 3

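Popen.poll doesn't work as expected if stdout is captured by something else. Try removing the ", stdout=subprocess.PIPE" part of the code and see whether that changes anything.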
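One documented reason stdout=subprocess.PIPE matters: if the child writes more to a pipe than the OS pipe buffer holds and the parent never reads it, the child blocks on that write and does not exit, so poll() keeps returning None. A small sketch to check this on your own machine, assuming Python 3.3+; the roughly 1 MB child and the helper name exits_without_reading are hypothetical:

```python
import subprocess
import sys

# Hypothetical child: writes ~1 MB to stdout, far more than a typical OS pipe buffer.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]


def exits_without_reading(stdout_target, seconds=5.0):
    """Return True if the child exits within `seconds` while the parent never reads its output."""
    proc = subprocess.Popen(CHILD, stdout=stdout_target, stderr=subprocess.STDOUT)
    try:
        # wait() only waits; like poll(), it never drains the pipe.
        proc.wait(timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        # The child is still alive, blocked writing into a full pipe.
        return False
    finally:
        # Clean up: kill and reap a still-running child, and close our end of the pipe.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        if proc.stdout is not None:
            proc.stdout.close()
```

With stdout=subprocess.DEVNULL the child exits promptly; with stdout=subprocess.PIPE and no reader it stays alive until it is killed.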

Answer 4

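I ran into a very similar problem with a setup very much like yours. @Ema's answer put me on the right track. I was able to get it working by occasionally calling communicate() on the subprocess. My theory is that if your process produces too much output, the PIPE buffer fills up, which triggers some bug that prevents poll() from setting returncode.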

Try changing your code to:

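import subprocess
import time

def monitor_subprocess(handle, timeout=1200):
    start_time = time.time()
    return_code = 0
    while True:
        time.sleep(60)
        try:
            # communicate() drains the pipe and, if the child has exited, sets handle.returncode.
            handle_stdout, _ = handle.communicate(timeout=0.25)
        except subprocess.TimeoutExpired:
            # Still running after 0.25 s; output read so far is kept for the next call.
            pass

        return_code = handle.returncode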
        # ...
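The periodic communicate() above can also be folded into a single call. The subprocess documentation describes calling communicate(timeout=...) and, on TimeoutExpired, calling kill() followed by a second communicate() to reap the child. A sketch of that pattern applied to this question, assuming Python 3.3+; run_with_timeout, its return tuple, and the two example commands are hypothetical:

```python
import subprocess
import sys

# Hypothetical example commands: one prints and exits with code 3, one sleeps for 30 s.
QUICK_CMD = [sys.executable, "-c", "print('hello'); raise SystemExit(3)"]
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_timeout(cmdline, timeout=1200.0):
    """Run `cmdline`, draining its output; return (returncode, output, timed_out)."""
    proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        # communicate() reads the pipe while it waits, so the child never blocks on a full buffer.
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        # Kill, then communicate() again to reap the child and collect any remaining output.
        proc.kill()
        output, _ = proc.communicate()
        return proc.returncode, output, True
```

Because communicate() keeps reading while it waits, returncode is set as soon as it returns, and the timeout path leaves no unreaped child behind.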
