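I am solving a standard job-shop scheduling problem. The job is launched by Airflow inside a Docker container. Here are the machine parameters:
Once the number of operations per run reaches about 1000, the solver crashes with a memory error (the task exits with return code -9) without finding a solution; increasing the amount of RAM did not help. During testing I found that the crash goes away if I specify a large number of workers (100 or more) in the model settings. However, the more workers I add, the more likely it is that, once the required number of solutions has been found or the timer has fired and stop_search has been called, the solver does not exit and everything hangs.
My callback looks like this: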
from threading import Timer

from ortools.sat.python import cp_model

# GlobalVariables and LocalLogger are project-specific helpers defined elsewhere.


class ObjectiveEarlyStopping(cp_model.CpSolverSolutionCallback):
    def __init__(self, solution_count_limit=GlobalVariables.SOLUTION_COUNT_LIMIT,
                 max_execution_time=GlobalVariables.MIN_EXECUTION_WORKTIME):
        super(ObjectiveEarlyStopping, self).__init__()
        self._solution_count = 0
        self._solution_limit = max(2, solution_count_limit)
        self._max_execution_time = max_execution_time
        self._logger = LocalLogger().logger
        self._timer = None
        self._no_improvement_timer_limit = GlobalVariables.SOLUTION_IMPROVEMENT_TIMEOUT_SEC
        self._total_execution_time = 0

    def on_solution_callback(self):
        # Called by the solver each time it finds a feasible solution.
        self._solution_count += 1
        self._logger.info(f"Feasible solution #{self._solution_count} found.")
        if self._solution_count >= self._solution_limit:
            self._logger.debug(f"Stopping search after {self._solution_count} solutions")
            self._stop_timer()
            self._timer = None
            self._logger.info(f"Stopping search __3")
            super().StopSearch()
            self._logger.info(f"Stopping search __4")
        else:
            self._reset_timer()

    def _stop_timer(self):
        if self._timer:
            self._timer.cancel()

    def _reset_timer(self):
        # Restart the "no improvement" countdown.
        self._total_execution_time += self._no_improvement_timer_limit
        self._stop_timer()
        self._timer = Timer(self._no_improvement_timer_limit, self.StopSearch)
        self._timer.start()

    def StopSearch(self):
        # Runs on the Timer thread when the countdown expires.
        self._logger.debug(f"{self._no_improvement_timer_limit} seconds without improvement")
        self._timer = None
        if self._solution_count >= 2 or self._total_execution_time > self._max_execution_time:
            self._logger.info(f"Stopping search __1")
            super().StopSearch()
            self._logger.info(f"Stopping search __2")
        else:
            self._logger.debug("Not enough solutions, continue search")
            self._reset_timer()
For example, my logs look like this:
[2024-04-12, 13:46:17 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:17 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:22 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:22 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:26 UTC] {solution_callback.py:23} INFO - Feasible solution #1 found.
[2024-04-12, 13:46:27 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:27 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:29 UTC] {solution_callback.py:23} INFO - Feasible solution #2 found.
[2024-04-12, 13:46:32 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:32 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:37 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:37 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:42 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:43 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:48 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:48 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:49 UTC] {solution_callback.py:46} DEBUG - 20 seconds without improvement
[2024-04-12, 13:46:49 UTC] {solution_callback.py:50} INFO - Stopping search __1
[2024-04-12, 13:46:49 UTC] {solution_callback.py:52} INFO - Stopping search __2
[2024-04-12, 13:46:53 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:53 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:46:58 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:46:58 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:47:03 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:47:03 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:47:08 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:47:08 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:47:13 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:47:13 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:47:18 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:47:18 UTC] {job.py:216} DEBUG - [heartbeat]
[2024-04-12, 13:47:23 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: production_capacity_balancing.production_shceduling_optimizer manual__2024-04-12T13:34:02+00:00 [running]> from DB
[2024-04-12, 13:47:23 UTC] {job.py:216} DEBUG - [heartbeat]
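This heartbeat message keeps appearing until I stop the task manually. I also tried replacing super().StopSearch() with super().stop_search(), guarded by a has_response() check before the call, but that did not help either.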
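How can I avoid the hang when the solver is supposed to exit?
What is the best way to choose the number of workers based on the input parameters?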
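I recommend using between 32 and 64 workers, and never more than the number of cores. Each worker you add costs roughly one extra copy of the model in memory.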
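As a rough illustration of that rule, the sketch below caps the worker count at both the number of usable cores and an upper bound of 64. The helper name pick_num_workers and its parameters are made up for this example and are not part of OR-Tools; the result would then be assigned to the solver's worker-count parameter (num_workers, or num_search_workers in older releases).

```python
import os
from typing import Optional


def pick_num_workers(cap: int = 64, cpu_count: Optional[int] = None) -> int:
    """Return a worker count that exceeds neither `cap` nor the usable cores."""
    if cpu_count is None:
        if hasattr(os, "sched_getaffinity"):
            # On Linux this reflects CPU affinity (e.g. a container cpuset),
            # but not CPU time quotas.
            cpu_count = len(os.sched_getaffinity(0))
        else:
            cpu_count = os.cpu_count() or 1
    return max(1, min(cap, cpu_count))


# Hypothetical usage with an existing CpSolver instance named `solver`:
#   solver.parameters.num_workers = pick_num_workers()
print(pick_num_workers())
```

Because each worker holds roughly its own copy of the model, lowering cap is also the first thing to try when the container is killed for running out of memory.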
The best way to help the solver is to give it a feasible solution as a hint, if one is easy to build.
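One simple way to build such a solution for a job shop is a greedy dispatcher: repeatedly schedule whichever pending operation can start earliest. The sketch below is only an illustration under assumed data structures (jobs as ordered lists of (machine, duration) pairs); greedy_schedule and start_var are hypothetical names, and the resulting start times would be passed to the model one variable at a time through its hint method (AddHint, or add_hint in newer releases).

```python
from typing import Dict, List, Tuple

# jobs[j] is the ordered list of (machine, duration) operations of job j.
Job = List[Tuple[int, int]]


def greedy_schedule(jobs: List[Job]) -> Dict[Tuple[int, int], int]:
    """Build a feasible (not optimal) schedule: {(job, op_index): start}."""
    machine_free: Dict[int, int] = {}  # time at which each machine is free
    job_free = [0] * len(jobs)         # time at which each job's last op ends
    next_op = [0] * len(jobs)          # next unscheduled op index per job
    starts: Dict[Tuple[int, int], int] = {}
    remaining = sum(len(job) for job in jobs)
    while remaining:
        # Among the next op of every unfinished job, pick the earliest start.
        best = None
        for j, job in enumerate(jobs):
            k = next_op[j]
            if k == len(job):
                continue
            machine, _ = job[k]
            start = max(job_free[j], machine_free.get(machine, 0))
            if best is None or start < best[0]:
                best = (start, j)
        start, j = best
        k = next_op[j]
        machine, duration = jobs[j][k]
        starts[(j, k)] = start
        job_free[j] = machine_free[machine] = start + duration
        next_op[j] += 1
        remaining -= 1
    return starts


# Hypothetical usage, assuming start_var[(j, k)] holds the model's start variables:
#   for key, value in greedy_schedule(jobs).items():
#       model.AddHint(start_var[key], value)
```

The hint does not have to be good, only consistent with the constraints; the solver can then start improving from it instead of searching for a first feasible point.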
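Also, please state which version you are using.
Finally, on very large models we sometimes do not check the time limit often enough. We have made good progress on those checks, though, which is why I am asking for the version.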
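If it helps, the installed version can be read from inside the container with the standard library alone. This is a small sketch; installed_version is a name chosen for the example, and "ortools" is assumed to be the distribution name the solver was installed under.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "ortools") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```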