How to build a hybrid model of Random Forest and a Particle Swarm Optimizer to find the optimal discount for products?


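I need to find the optimal discount for each product (e.g. A, B, C) so that total sales are maximized. I already have a Random Forest model for each product that maps discount and season to sales. How do I combine these models and feed them to an optimizer to find the optimal discount per product?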

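Reasons for the model selection:

  1. RF: it captures the relationship between the predictors and the response (sales_uplift_norm) better than a linear model does.
  2. PSO: it is recommended in many white papers (available on ResearchGate / IEEE), and it is also available in Python through packages such as pyswarm.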

Input data: the sample data is used to build the models at product level. A glimpse of the data: [image]

Ideas/steps I followed:

  1. Build an RF model per product

     # pre-processed data
     products_pre_processed_data = {key:pre_process_data(df, key) for key, df in df_basepack_dict.items()}
     # rf models
     products_rf_model = {key:rf_fit(df) for key, df in products_pre_processed_data .items()}
    
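  2. Pass the models to the optimizer
    • Objective function: maximize sales_uplift_norm (the response variable of the RF models)
    • Constraints:
      • Total spend (spend of A + B + C <= 20), where spends = total_units_sold_of_products * discount_percentage * mrp_of_products
      • Lower bounds for the products (A, B, C): [0.0, 0.0, 0.0] # lower bound of the discount percentage
      • Upper bounds for the products (A, B, C): [0.3, 0.4, 0.4] # upper bound of the discount percentage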

Pseudo/sample code, since I could not find a way to pass the product_models into the optimizer:

from pyswarm import pso
def obj(x):
    model1 = products_rf_model.get('A')
    model2 = products_rf_model.get('B')
    model3 = products_rf_model.get('C')
    return -(model1 + model2 + model3) # -ve sign as to maximize

def con(x):
    x1 = x[0]
    x2 = x[1]
    x3 = x[2]
    return np.sum(units_A*x*mrp_A + units_B*x*mrp_B + units_C* x *spend_C)-20 # spend budget

lb = [0.0, 0.0, 0.0]
ub = [0.3, 0.4, 0.4]

xopt, fopt = pso(obj, lb, ub, f_ieqcons=con)

How can I use the PSO optimizer (or any other optimizer, if I have not picked the right one) together with RF?

Adding the functions used for the models:

def pre_process_data(df,product):
    data = df.copy().reset_index()
    # print(data)
    bp = product
    print("----------product: {}----------".format(bp))
    # Pre-processing steps
    print("pre process df.shape {}".format(df.shape))
    #1. Reponse var transformation
    response = data.sales_uplift_norm # already transformed
    #2. predictor numeric var transformation
    numeric_vars = ['discount_percentage'] # may include mrp, depth
    df_numeric = data[numeric_vars]
    df_norm = df_numeric.apply(lambda x: scale(x), axis = 0) # center and scale
    #3. char fields dummification
    #select category fields
    cat_cols = data.select_dtypes('category').columns
    #select string fields
    str_to_cat_cols = data.drop(['product'], axis = 1).select_dtypes('object').astype('category').columns
    # combine all categorical fields
    all_cat_cols = [*cat_cols,*str_to_cat_cols]
    # print(all_cat_cols)
    #convert cat to dummies
    df_dummies = pd.get_dummies(data[all_cat_cols])
    #4. combine num and char df together
    df_combined = pd.concat([df_dummies.reset_index(drop=True), df_norm.reset_index(drop=True)], axis=1)
    df_combined['sales_uplift_norm'] = response
    df_processed = df_combined.copy()
    print("post process df.shape {}".format(df_processed.shape))
    # print("model fields: {}".format(df_processed.columns))
    return(df_processed)

def rf_fit(df, random_state = 12):
    train_features = df.drop('sales_uplift_norm', axis = 1)
    train_labels = df['sales_uplift_norm']
    # Random Forest Regressor
    rf = RandomForestRegressor(n_estimators = 500, random_state = random_state, bootstrap = True, oob_score=True)
    # RF model
    rf_fit = rf.fit(train_features, train_labels)
    return(rf_fit)
    
python machine-learning optimization random-forest particle-swarm
1 answer (4 votes)
You can find a complete solution below.

The fundamental differences from your approach are the following:

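  1. Because the Random Forest models take the season feature as input, the optimal discounts must be computed for each season.
  2. According to the pyswarm documentation, the output of the con function must satisfy con(x) >= 0.0. The correct constraint is therefore 20 - sum(...), not the other way around. Also, the units and mrp variables were not given; I simply assumed a value of 1 for each, and you may want to change those values.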
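The two points above can be sketched without any trained model. This is only an illustration under stated assumptions: make_objective and make_budget_constraint are hypothetical helper names, and the lambdas are stand-ins for the fitted per-product models (they are not Random Forests). The objective is built for one fixed season and negated because the optimizer minimizes; the constraint returns budget minus spend, so feasible points give a value >= 0.

```python
def make_objective(models, season):
    """Return obj(x) for one fixed season; x[i] is the discount of product i."""
    products = list(models)

    def obj(x):
        total = sum(models[p](season, x[i]) for i, p in enumerate(products))
        return -total  # negate: the optimizer minimizes, we want the maximum

    return obj


def make_budget_constraint(units, mrp, budget):
    """Return con(x) that is >= 0 exactly when the total spend is within budget."""
    def con(x):
        spend = sum(u * m * d for u, m, d in zip(units, mrp, x))
        return budget - spend

    return con


# Stand-in "models" (assumption): any callable (season, discount) -> predicted uplift.
toy_models = {
    "A": lambda season, d: 1.0 * d,
    "B": lambda season, d: 2.0 * d,
    "C": lambda season, d: (3.0 if season == "summer" else 0.5) * d,
}

obj_summer = make_objective(toy_models, "summer")
con = make_budget_constraint(units=[1, 1, 1], mrp=[1, 1, 1], budget=20)

x = [0.3, 0.4, 0.4]          # the upper bounds from the question
print(obj_summer(x))         # negative of the summed toy predictions
print(con(x) >= 0)           # feasibility check in the con(x) >= 0 convention
```

Building the objective through a function that receives the season avoids relying on a loop variable captured from the enclosing scope, which is one possible design choice rather than a requirement.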
Other modifications to the original code include:

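  1. Preprocessing and pipeline wrappers from sklearn, to simplify the preprocessing steps.
  2. The optimal parameters are stored in an output .xlsx file.
  3. The maxiter parameter of PSO has been set to 5 to speed up debugging; you may want to set it to another value (default = 100).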
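To give a feeling for what the iteration budget does, here is a toy swarm written in plain Python. It is not pyswarm's implementation: toy_pso, its parameters other than maxiter, and the test function shifted_sphere are all hypothetical and chosen only for illustration. With the same seed, a longer run repeats the short run's first iterations and then keeps improving (or at least never worsens) the best value found.

```python
import random


def toy_pso(f, lb, ub, swarmsize=20, maxiter=100,
            inertia=0.5, c_personal=0.5, c_global=0.5, seed=0):
    """Minimize f inside the box [lb, ub] with a very small particle swarm."""
    rng = random.Random(seed)
    dim = len(lb)
    pos = [[rng.uniform(lb[d], ub[d]) for d in range(dim)] for _ in range(swarmsize)]
    vel = [[0.0] * dim for _ in range(swarmsize)]
    best_pos = [p[:] for p in pos]          # best position seen by each particle
    best_val = [f(p) for p in pos]
    g = min(range(swarmsize), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # best position seen by the swarm

    for _ in range(maxiter):
        for i in range(swarmsize):
            for d in range(dim):
                rp, rg = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rp * (best_pos[i][d] - pos[i][d])
                             + c_global * rg * (g_pos[d] - pos[i][d]))
                # Move and clip to the bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lb[d]), ub[d])
            val = f(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def shifted_sphere(x):
    # Toy objective (assumption): minimum at 0.2 in every coordinate.
    return sum((xi - 0.2) ** 2 for xi in x)


lb = [0.0, 0.0, 0.0]
ub = [0.3, 0.4, 0.4]

x_short, val_short = toy_pso(shifted_sphere, lb, ub, maxiter=5, seed=1)
x_long, val_long = toy_pso(shifted_sphere, lb, ub, maxiter=100, seed=1)
print("maxiter=5:  ", val_short)
print("maxiter=100:", val_long)
```

The same trade-off applies to the real optimizer: a small maxiter is convenient while debugging, but the reported discounts may be far from converged.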
The full code is therefore:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

# ====================== RF TRAINING ======================
# Preprocessing
def build_sample(season, discount_percentage):
    return pd.DataFrame({
        'season': [season],
        'discount_percentage': [discount_percentage]
    })

columns_to_encode = ["season"]
columns_to_scale = ["discount_percentage"]
encoder = OneHotEncoder()
scaler = StandardScaler()
preproc = ColumnTransformer(
    transformers=[
        ("encoder", Pipeline([("OneHotEncoder", encoder)]), columns_to_encode),
        ("scaler", Pipeline([("StandardScaler", scaler)]), columns_to_scale)
    ]
)

# Model
myRFClassifier = RandomForestRegressor(
    n_estimators = 500,
    random_state = 12,
    bootstrap = True,
    oob_score = True)

pipeline_list = [
    ('preproc', preproc),
    ('clf', myRFClassifier)
]

pipe = Pipeline(pipeline_list)

# Dataset
df_tot = pd.read_excel("so_data.xlsx")
df_dict = {
    product: df_tot[df_tot['product'] == product].drop(columns=['product'])
    for product in pd.unique(df_tot['product'])
}

# Fit
print("Training ...")
pipe_dict = {
    product: clone(pipe) for product in df_dict.keys()
}

for product, df in df_dict.items():
    X = df.drop(columns=["sales_uplift_norm"])
    y = df["sales_uplift_norm"]
    pipe_dict[product].fit(X,y)

# ====================== OPTIMIZATION ======================
from pyswarm import pso

# Parameter of PSO
maxiter = 5

n_product = len(pipe_dict.keys())

# Constraints
budget = 20
units = [1, 1, 1]
mrp = [1, 1, 1]

lb = [0.0, 0.0, 0.0]
ub = [0.3, 0.4, 0.4]

# Must always remain >= 0
def con(x):
    s = 0
    for i in range(n_product):
        s += units[i] * mrp[i] * x[i]
    return budget - s

print("Optimization ...")

# Save optimal discounts for every product and every season
df_opti = pd.DataFrame(data=None, columns=df_tot.columns)
for season in pd.unique(df_tot['season']):

    # Objective function to minimize
    def obj(x):
        s = 0
        for i, product in enumerate(pipe_dict.keys()):
            s += pipe_dict[product].predict(build_sample(season, x[i]))
        return -s

    # PSO
    xopt, fopt = pso(obj, lb, ub, f_ieqcons=con, maxiter=maxiter)
    print("Season: {}\t xopt: {}".format(season, xopt))

    # Store result
    df_opti = pd.concat([
        df_opti,
        pd.DataFrame({
            'product': list(pipe_dict.keys()),
            'season': [season] * n_product,
            'discount_percentage': xopt,
            'sales_uplift_norm': [
                pipe_dict[product].predict(build_sample(season, xopt[i]))[0]
                for i, product in enumerate(pipe_dict.keys())
            ]
        })
    ])

# Save result
df_opti = df_opti.reset_index().drop(columns=['index'])
df_opti.to_excel("so_result.xlsx")
print("Summary")
print(df_opti)

It gives:

Training ...
Optimization ...
Stopping search: maximum iterations reached --> 5
Season: summer   xopt: [0.1941521  0.11233673 0.36548761]
Stopping search: maximum iterations reached --> 5
Season: winter   xopt: [0.18670604 0.37829516 0.21857777]
Stopping search: maximum iterations reached --> 5
Season: monsoon  xopt: [0.14898102 0.39847885 0.18889792]
Summary
  product   season  discount_percentage  sales_uplift_norm
0       A   summer             0.194152           0.175973
1       B   summer             0.112337           0.229735
2       C   summer             0.365488           0.374510
3       A   winter             0.186706          -0.028205
4       B   winter             0.378295           0.266675
5       C   winter             0.218578           0.146012
6       A  monsoon             0.148981           0.199073
7       B  monsoon             0.398479           0.307632
8       C  monsoon             0.188898           0.210134
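As a sanity check on this output, the spend implied by each reported xopt can be recomputed by hand. The sketch below uses only the numbers printed above and the placeholder units and mrp of 1 from the answer; the helper name spend is hypothetical. Under those placeholders the largest spend reachable inside the bounds is about 1.1, far below the budget of 20, so the budget constraint cannot become active in this run. It will only start to matter once real units and mrp values are filled in.

```python
# Discounts reported in the output, one list per season (products A, B, C).
xopt_by_season = {
    "summer":  [0.1941521, 0.11233673, 0.36548761],
    "winter":  [0.18670604, 0.37829516, 0.21857777],
    "monsoon": [0.14898102, 0.39847885, 0.18889792],
}
units = [1, 1, 1]   # placeholder values used in the answer
mrp = [1, 1, 1]     # placeholder values used in the answer
budget = 20
ub = [0.3, 0.4, 0.4]


def spend(x):
    """Total spend for a vector of discounts: sum of units * mrp * discount."""
    return sum(u * m * d for u, m, d in zip(units, mrp, x))


for season, x in xopt_by_season.items():
    s = spend(x)
    print("{}: spend = {:.4f}, slack = {:.4f}".format(season, s, budget - s))

# Largest spend reachable inside the bounds with these placeholders.
max_spend = spend(ub)
print("max possible spend: {:.4f}".format(max_spend))
```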
    