数据:
令
X
为大致在域 [0, 2000000]
中的标记整数数组。 |X|
的大小约为 3000 个元素。标签位于域 [A, B, C]
.
例如:
[(13, A), (16, B), (32, A), (84, C), ...]
限制:
每个数据点都可以在轴上移动
±50
,但最终排序必须遵守以下标准:
目标:
有 2 个指标需要通过可变权重进行优化:
我尝试过的:
我最初选择了
scipy.optimize.minimize()
,但事实证明它不适用于仅整数:
def stretch_optimization(initial_array: np.ndarray, weight_variance=1.0, weight_average=0.1):
ms_data = initial_array[:, 0]
def objective_function(ms_data):
gaps = np.diff(ms_data)
# Get gaps variance, we want the gaps to be as uniform as possible.
gaps_variance = np.var(gaps)
# Get gaps average, we want to maximize the gaps size.
gaps_average = np.mean(gaps)
return weight_variance * gaps_variance - weight_average * gaps_average
bounds = [(ms - 50, ms + 50) for ms, _ in initial_array]
def order_constraint(ms):
return np.diff(ms)
constraints = [{'type': 'ineq', 'fun': order_constraint}]
result: np.ndarray = minimize(objective_function, ms_data, bounds=bounds, constraints=constraints)
我不完全确定接下来该去哪里。我读到也许我应该使用
scipy.optimize.milp
(?) 但我不清楚我应该用它做什么。
也许你可以使用pulp模块:
包含 20 个变量的数组的示例(从 0 到 100,约束为 +/-5):
import pulp
import numpy as np
np.random.seed(42)
# X = np.sort(np.random.randint(0, 2_000_000, size=3000))
X = np.sort(np.random.randint(0, 100, size=20))
prob = pulp.LpProblem("stretch", pulp.LpMinimize)
# amounts to add to each variable (to be computed):
variables = [
# pulp.LpVariable(f"x{i:04}", lowBound=-50, upBound=50, cat="Integer")
pulp.LpVariable(f"x{i:04}", lowBound=-5, upBound=5, cat="Integer")
for i, v in enumerate(X)
]
maximize_average_gap_size = pulp.LpVariable("maximize_average_gap_size")
minimize_variance = pulp.LpVariable("minimize_variance")
# out problem: miniminize variance and maximize gap size
prob += (minimize_variance * 1.0) - (maximize_average_gap_size * 0.1)
gaps = [
((X[idx] + variables[idx]) - (X[idx - 1] + variables[idx - 1]))
for idx in range(1, len(variables))
]
mean_gaps = pulp.lpSum(gaps) / len(gaps)
# define "maximize_average_gap_size"
prob += maximize_average_gap_size == mean_gaps
abs_minimize_variance = pulp.LpVariable("minimize_variance")
abs_vars = []
for i, g in enumerate(gaps):
abs_var = pulp.LpVariable(f"abs_var_{i:04}")
prob += (g - mean_gaps) <= abs_var
prob += -(g - mean_gaps) <= abs_var
abs_vars.append(abs_var)
# define "minimize_variance" (as sum of absolute values)
prob += minimize_variance == pulp.lpSum(abs_vars)
# constraint: every next value after adding has to be greater or equal than before
for idx in range(1, len(variables)):
prob += (X[idx - 1] + variables[idx - 1]) <= (X[idx] + variables[idx])
prob.solve()
# print(pulp.LpStatus[prob.status])
# for v in prob.variables():
# print(v.name, "=", v.varValue)
print("BEFORE:")
print(X)
gaps_before = np.diff(X)
print(f"{np.var(gaps_before)=}")
print(f"{np.mean(gaps_before)=}")
print("AFTER:")
after = [int(x + v.value()) for x, v in zip(X, variables)]
print(after)
gaps_after = np.diff(after)
print(f"{np.var(gaps_after)=}")
print(f"{np.mean(gaps_after)=}")
打印:
Welcome to the CBC MILP Solver
Version: 2.10.3
Build Date: Dec 15 2019
...
BEFORE:
[ 1 2 14 20 21 23 29 37 51 52 60 71 74 74 82 86 87 87 92 99]
np.var(gaps_before)=np.float64(17.185595567867033)
np.mean(gaps_before)=np.float64(5.157894736842105)
AFTER:
[2, 7, 12, 18, 23, 28, 34, 40, 46, 54, 60, 66, 69, 74, 77, 82, 87, 92, 97, 102]
np.var(gaps_after)=np.float64(1.1412742382271468)
np.mean(gaps_after)=np.float64(5.2631578947368425)