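I have a class that evaluates a result for a given list of input parameters. I would like to find the set of parameters that minimizes the result returned by the class. Since the inputs must remain integers, I read that PuLP might be a better fit than SciPy's minimize().
I tried to implement it, but the line prob += ... does not work. Below is a simplified version of the code that reproduces the problem. It seems I am missing something important about PuLP.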
import pandas as pd
import pulp as pl


class ComplexClass():
    def __init__(self, path_data: str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)

    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        # [...] Complex stuff manipulating pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3'] + self.df['col4']
        self.best_value = self.df['col5'].iloc[-1]


def func_to_minimize(input_parm: list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value


path = './path2data/data.csv'
my_class = ComplexClass(path_data=path)

# ===== INIT TEST =====
my_param = [2, 10]
print(f'For input {my_param}\n => Value to minimize = {func_to_minimize(input_parm=my_param, initialized_class=my_class)}')

# ====== OPTIMIZATION WITH PULP =====
# Create the 'prob' variable to contain the problem data
prob = pl.LpProblem("Best_Param", pl.LpMinimize)

# The 2 integer variables param1 and param2 are created with their bounds
x1 = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)
x2 = pl.LpVariable("param2", lowBound=2, upBound=20, cat=pl.LpInteger)

# The objective function is added to 'prob' first
prob += func_to_minimize([x1, x2], my_class)  # <= doesn't work

# The problem data is written to an .lp file
prob.writeLP("Best_Param.lp")

# The problem is solved using PuLP's choice of Solver
prob.solve()
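Evaluating the line
self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
raises the error: ValueError: window must be an integer 0 or greater. Indeed, the self.p1 passed as window is an LpVariable and not an integer.
How can I use PuLP for a minimization problem when the value to minimize has to be computed by a function rather than written as a single expression?
Thank you.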
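The problem is that you are trying to use PuLP, a linear programming (LP) optimization library, with a function whose operations cannot be expressed as a linear or integer program. PuLP is designed for linear programming and integer programming, which means the objective function and the constraints must be formulated as linear (or affine) relationships. Two points explain the error you are seeing: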
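PuLP cannot directly handle an objective function that is evaluated through non-linear operations, or through calls to other Python functions or methods that manipulate data dynamically, such as the rolling window calculations in pandas.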
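PuLP variables (LpVariable) are not actual numbers until the optimization problem has been solved. They therefore cannot be used directly, while the problem is being defined, in calculations or functions that require concrete numeric inputs.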
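The point about symbolic variables can be seen without solving anything. The snippet below is a minimal sketch that reuses the param1 definition from the question; the expression 3 * x1 + 2 is only an illustration and is not part of the original problem.

```python
import pulp as pl

# Same variable definition as in the question.
x1 = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)

# Before solve() is called, the variable carries no number yet.
print(x1.value())            # None
print(isinstance(x1, int))   # False

# Arithmetic on it builds a symbolic affine expression, which is the kind
# of object PuLP expects as an objective or inside a constraint.
expr = 3 * x1 + 2
print(type(expr).__name__)   # LpAffineExpression
```

This is why rolling(window=self.p1) fails: pandas needs an actual integer at call time, while PuLP only has a placeholder at that stage.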
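Since you need to optimize parameters based on an evaluation performed by a class method that processes data dynamically, one possible approach is a metaheuristic algorithm such as a genetic algorithm, simulated annealing, or another method that requires neither the gradient of the objective function nor its linearity. Libraries such as deap, simanneal, or pyeasyga can be useful here. These algorithms can handle arbitrary objective functions and can work with integer constraints.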
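To illustrate how such a method treats the objective as a black box, here is a minimal simulated annealing sketch written with the standard library only. It is not taken from simanneal or any other library. The objective function is a stand-in that assumes the value to minimize is the sum of the last rolling means of col1 and col2, as in the question; the names anneal and neighbour, the cooling schedule, and the step count are all illustrative choices.

```python
import math
import random

COL1 = list(range(100))
COL2 = [i ** 2 for i in range(100)]
BOUNDS = [(2, 10), (2, 20)]  # same bounds as param1 and param2


def objective(params):
    """Stand-in for func_to_minimize: last value of the two rolling means."""
    p1, p2 = params
    return sum(COL1[-p1:]) / p1 + sum(COL2[-p2:]) / p2


def neighbour(params, rng):
    """Move one randomly chosen parameter by +/-1, clipped to its bounds."""
    candidate = list(params)
    i = rng.randrange(len(candidate))
    low, high = BOUNDS[i]
    candidate[i] = min(high, max(low, candidate[i] + rng.choice((-1, 1))))
    return candidate


def anneal(start, steps=2000, t_start=100.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current, current_score = list(start), objective(start)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        candidate = neighbour(current, rng)
        score = objective(candidate)
        delta = score - current_score
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / T) so the search can leave local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


best_params, best_score = anneal(start=[2, 2])
print("Best parameters:", best_params)
print("Best score:", best_score)
```

The parameters stay integers because every move is an integer step clipped to the bounds, so no rounding or relaxation is needed. In real use, objective would call func_to_minimize with the actual class instance.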
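Here is a conceptual outline of how to approach the problem with a genetic algorithm (via the deap library).
Note: make sure the deap library is installed, or install it with pip install deap, before running the example below.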
import random
import pandas as pd
from deap import base, creator, tools, algorithms


class ComplexClass():
    def __init__(self, path_data: str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)

    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        # [...] Complex stuff manipulating pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3'] + self.df['col4']
        self.best_value = self.df['col5'].iloc[-1]


def func_to_minimize(input_parm: list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value


# Define the evaluation function for the genetic algorithm.
# deap expects the fitness as a tuple, hence the trailing comma.
def evalGA(params):
    result = func_to_minimize(params, my_class)
    return (result,)


# Setting up the genetic algorithm
my_class = ComplexClass("")
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)
toolbox = base.Toolbox()

# Register a function named "attribute_param1" that generates random integers
# between 2 and 10. This function will be used to generate values for the
# variable "param1"
toolbox.register("attribute_param1", random.randint, 2, 10)

# Register a second function in the same way for the variable "param2".
# Only the interval changes: "param2" should be between 2 and 20
toolbox.register("attribute_param2", random.randint, 2, 20)

# Register a function that builds one individual by calling the generators
# for "param1" and "param2" once each
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attribute_param1, toolbox.attribute_param2),
                 n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evalGA)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=[2, 2], up=[10, 20], indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

# Create population and run the genetic algorithm
population = toolbox.population(n=50)
result = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, verbose=True)
best_individual = tools.selBest(population, 1)[0]
print('Best Parameters:', best_individual)
print('Best Score:', best_individual.fitness.values[0])

# Prints:
#
# Best Parameters: [10, 20]
# Best Score: 8138.0
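This approach evaluates the class method directly inside the fitness function of the genetic algorithm, so the computation can be as complex as you need.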
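Because the bounds in this example leave only 9 × 19 = 171 integer combinations, the result reported above can be cross-checked by evaluating every pair. The sketch below is not part of the deap example. It uses a pure-Python stand-in named objective, which assumes the value to minimize is the sum of the last rolling means of col1 and col2, as in the question.

```python
from itertools import product

COL1 = list(range(100))
COL2 = [i ** 2 for i in range(100)]


def objective(params):
    """Stand-in for func_to_minimize: last value of the two rolling means."""
    p1, p2 = params
    return sum(COL1[-p1:]) / p1 + sum(COL2[-p2:]) / p2


# Every integer pair inside the bounds used in the question.
grid = product(range(2, 11), range(2, 21))
best_params = min(grid, key=objective)

print("Best parameters:", list(best_params))   # [10, 20]
print("Best score:", objective(best_params))   # 8138.0
```

An exhaustive check like this only scales to small search spaces. With more parameters or wider bounds the number of combinations grows multiplicatively, which is where a metaheuristic becomes the practical choice.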
You can find more information on how to use deap here: https://deap.readthedocs.io/en