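Goal: Below is a sample dataset containing "ID", "PHASENAME", "C_DAYS", "Multi_Factor", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". For each "ID", "PHASENAME", and "C_DAYS", I want to extend "DAY_COUNTER" from its last day up to 100 days. I also want to repeat "DAILY_LABOR_PERCENT".
In this sample dataset, the person worked 14 days, and a labor percentage was recorded for each day. Instead of labor spread over 14 days, I want labor spread over 100 days (formulas: "DAY_COUNTER" * "Multi_Factor" and "DAILY_LABOR_PERCENT" / "Multi_Factor").
What I did in my code: I focused on the day-count expansion first. I created a function, "day_factor", that performs the first formula above, then tried to iterate over an index range and apply the function.
Error message: TypeError: string indices must be integers.
Also, I am concerned that a fixed range of 100 will not work once I scale up this dataset.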
Code:
import pandas as pd
import numpy as np
data = {
    "ID": ["BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR", "BAR"],
    "PHASENAME": ["C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"],
    "C_DAYS": [14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0],
    "Multi_Factor": [7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857],
    "DAY_COUNTER": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "DAILY_LABOR_PERCENT": [1.0, 5.0, 9.0, 11.0, 10.0, 9.0, 9.0, 9.0, 8.0, 10.0, 8.0, 7.0, 4.0, 0.0],
}
df = pd.DataFrame(data)
df1 = df.copy()

# creating a math function
def day_factor(row):
    return row['DAY_COUNTER'] * row['Multi_Factor']

# creating empty list to store rows for each id, phasename, and cdays
new_rows = []

# iterating through each index range and applying function to get day_counter to 100
for i in range(100):
    new_row = df1.iloc[i % len(df1)].apply(day_factor)
    new_rows.append(new_row)

# creating new df with the new rows
df2 = pd.DataFrame(new_rows, columns=['New_DAY_COUNTER'])
# not creating 100 rows and getting TypeError: string indices must be integers
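On the TypeError itself: `df1.iloc[...]` with a single integer returns one row as a Series, and `Series.apply` passes each element of that Series to the function. The first element is the string "BAR", so `row['DAY_COUNTER']` ends up indexing a string. Below is a minimal sketch of that behavior on a small stand-in frame (the two-row data is made up for illustration and is not the dataset from the question):

```python
import pandas as pd

# Hypothetical two-row frame with the same kinds of columns as the question
df = pd.DataFrame({
    "ID": ["BAR", "BAR"],
    "Multi_Factor": [7.142857, 7.142857],
    "DAY_COUNTER": [1, 2],
})

def day_factor(row):
    return row["DAY_COUNTER"] * row["Multi_Factor"]

row = df.iloc[0]  # a single row comes back as a Series

# Series.apply hands each element to day_factor, so `row` inside it is "BAR"
try:
    row.apply(day_factor)
    failed = False
except TypeError:
    failed = True

# Calling the function on the row Series directly works
direct = day_factor(row)

# Applying it to every row of the frame needs axis=1
row_wise = df.apply(day_factor, axis=1)

# The vectorized form avoids the function call altogether
vectorized = df["DAY_COUNTER"] * df["Multi_Factor"]
```

The vectorized line is usually the one to prefer, since it does the multiplication on whole columns at once.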
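As a general rule, you want to avoid iterating over DataFrame rows, because it is very inefficient.
IIUC, you can repeat the DataFrame with something vectorized like
numpy.tile
until you have the desired number of rows, then adjust the values accordingly: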
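n = 100  # desired number of rows
first_day = df["DAY_COUNTER"].iloc[0]
# tile the rows enough times to cover n, then trim to exactly n rows
rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]
# df.values is an object array for mixed columns, so restore the original dtypes
out = out.astype(df.dtypes.to_dict())
out["DAY_COUNTER"] = range(first_day, n + first_day)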
     ID  PHASENAME  C_DAYS  Multi_Factor  DAY_COUNTER  DAILY_LABOR_PERCENT
0   BAR          C    14.0      7.142857            1                  1.0
1   BAR          C    14.0      7.142857            2                  5.0
2   BAR          C    14.0      7.142857            3                  9.0
3   BAR          C    14.0      7.142857            4                 11.0
4   BAR          C    14.0      7.142857            5                 10.0
..  ...        ...     ...           ...          ...                  ...
95  BAR          C    14.0      7.142857           96                  7.0
96  BAR          C    14.0      7.142857           97                  4.0
97  BAR          C    14.0      7.142857           98                  0.0
98  BAR          C    14.0      7.142857           99                  1.0
99  BAR          C    14.0      7.142857          100                  5.0
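On the scaling concern from the question: if the real dataset holds several "ID" / "PHASENAME" / "C_DAYS" combinations, the same repeat-and-renumber idea can be applied to each group separately. Here is a rough sketch under that assumption. The `stretch` helper and the "FOO" rows are hypothetical, and it swaps `numpy.tile` for cyclic positional indexing so column dtypes are preserved:

```python
import numpy as np
import pandas as pd

def stretch(group, n=100):
    """Cycle the rows of one group until there are n rows, renumbering the days."""
    positions = np.arange(n) % len(group)  # 0, 1, ..., len-1, 0, 1, ...
    out = group.iloc[positions].reset_index(drop=True)
    first_day = group["DAY_COUNTER"].iloc[0]
    out["DAY_COUNTER"] = np.arange(first_day, first_day + n)
    return out

# Hypothetical two-group frame; "FOO" is made up to show a second ID
df = pd.DataFrame({
    "ID": ["BAR"] * 3 + ["FOO"] * 2,
    "PHASENAME": ["C"] * 5,
    "C_DAYS": [3.0] * 3 + [2.0] * 2,
    "DAY_COUNTER": [1, 2, 3, 1, 2],
    "DAILY_LABOR_PERCENT": [20.0, 50.0, 30.0, 60.0, 40.0],
})

keys = ["ID", "PHASENAME", "C_DAYS"]
# Stretch each group on its own, then stack the results
result = pd.concat(
    [stretch(g) for _, g in df.groupby(keys, sort=False)],
    ignore_index=True,
)
```

Each group ends up with exactly `n` rows regardless of how many days it started with, so the hard-coded `range(100)` over the whole frame is no longer needed.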