如何最佳地拉伸和均匀化一维整数点?

问题描述 投票:0回答:1

数据:

X
为大致在域
[0, 2000000]
中的标记整数数组。
|X|
的大小约为 3000 个元素。标签位于域
[A, B, C]
.

例如:

[(13, A), (16, B), (32, A), (84, C), ...]


限制:

每个数据点都可以在轴上移动

±50
,但最终排序必须遵守以下标准:

  • 整数值必须保持递增顺序;
  • 标签必须保持与初始数组相同的顺序。
  • 整数值必须保持整数,没有浮点数。

目标:

有 2 个指标需要通过可变权重进行优化:

  1. 最小化整数的方差。 (强烈加权,比如1.0)
  2. 最大化整数的平均距离。 (轻微加权,比如0.1)

我尝试过的:

我最初选择了

scipy.optimize.minimize()
,但事实证明它不适用于仅整数:

def stretch_optimization(initial_array: np.ndarray, weight_variance=1.0, weight_average=0.1):
    ms_data = initial_array[:, 0]
    
    def objective_function(ms_data):
        gaps = np.diff(ms_data)
    
        # Get gaps variance, we want the gaps to be as uniform as possible.
        gaps_variance = np.var(gaps)
    
        # Get gaps average, we want to maximize the gaps size.
        gaps_average = np.mean(gaps)
    
        return weight_variance * gaps_variance - weight_average * gaps_average
    
    bounds = [(ms - 50, ms + 50) for ms, _ in initial_array]
    
    def order_constraint(ms):
        return np.diff(ms)
    
    constraints = [{'type': 'ineq', 'fun': order_constraint}]
    
    result: np.ndarray = minimize(objective_function, ms_data, bounds=bounds, constraints=constraints)

我不完全确定接下来该去哪里。我读到也许我应该使用

scipy.optimize.milp
(?) 但我不清楚我应该用它做什么。

python arrays optimization variance
1个回答
0
投票

也许你可以使用模块:

包含 20 个变量的数组的示例(从 0 到 100,约束为 +/-5):

import pulp
import numpy as np

np.random.seed(42)
# X = np.sort(np.random.randint(0, 2_000_000, size=3000))
X = np.sort(np.random.randint(0, 100, size=20))

prob = pulp.LpProblem("stretch", pulp.LpMinimize)

# amounts to add to each variable (to be computed):
variables = [
    # pulp.LpVariable(f"x{i:04}", lowBound=-50, upBound=50, cat="Integer")
    pulp.LpVariable(f"x{i:04}", lowBound=-5, upBound=5, cat="Integer")
    for i, v in enumerate(X)
]

maximize_average_gap_size = pulp.LpVariable("maximize_average_gap_size")
minimize_variance = pulp.LpVariable("minimize_variance")

# out problem: miniminize variance and maximize gap size
prob += (minimize_variance * 1.0) - (maximize_average_gap_size * 0.1)

gaps = [
    ((X[idx] + variables[idx]) - (X[idx - 1] + variables[idx - 1]))
    for idx in range(1, len(variables))
]
mean_gaps = pulp.lpSum(gaps) / len(gaps)

# define "maximize_average_gap_size"
prob += maximize_average_gap_size == mean_gaps

abs_minimize_variance = pulp.LpVariable("minimize_variance")

abs_vars = []
for i, g in enumerate(gaps):
    abs_var = pulp.LpVariable(f"abs_var_{i:04}")
    prob += (g - mean_gaps) <= abs_var
    prob += -(g - mean_gaps) <= abs_var
    abs_vars.append(abs_var)

# define "minimize_variance" (as sum of absolute values)
prob += minimize_variance == pulp.lpSum(abs_vars)

# constraint: every next value after adding has to be greater or equal than before
for idx in range(1, len(variables)):
    prob += (X[idx - 1] + variables[idx - 1]) <= (X[idx] + variables[idx])

prob.solve()

# print(pulp.LpStatus[prob.status])
# for v in prob.variables():
#     print(v.name, "=", v.varValue)

print("BEFORE:")
print(X)

gaps_before = np.diff(X)
print(f"{np.var(gaps_before)=}")
print(f"{np.mean(gaps_before)=}")

print("AFTER:")
after = [int(x + v.value()) for x, v in zip(X, variables)]
print(after)
gaps_after = np.diff(after)
print(f"{np.var(gaps_after)=}")
print(f"{np.mean(gaps_after)=}")

打印:

Welcome to the CBC MILP Solver                                                                                                                                                                
Version: 2.10.3                                                                                                                                                                               
Build Date: Dec 15 2019 


... 

BEFORE:                                                                                        
[ 1  2 14 20 21 23 29 37 51 52 60 71 74 74 82 86 87 87 92 99]
np.var(gaps_before)=np.float64(17.185595567867033)                                                                                                                                            
np.mean(gaps_before)=np.float64(5.157894736842105)                                                                                                                                            
AFTER:                                                                                         
[2, 7, 12, 18, 23, 28, 34, 40, 46, 54, 60, 66, 69, 74, 77, 82, 87, 92, 97, 102]                                                                                                               
np.var(gaps_after)=np.float64(1.1412742382271468)      
np.mean(gaps_after)=np.float64(5.2631578947368425)                                        
© www.soinside.com 2019 - 2024. All rights reserved.