Python: applying a function in a loop to create new rows

Question · Votes: 0 · Answers: 1

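Goal: Below is a sample dataset with the columns "ID", "PHASENAME", "C_DAYS", "Multi_Factor", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". For each "ID", "PHASENAME", "C_DAYS" combination, I want to continue "DAY_COUNTER" from the last day up to 100 days. I also want to repeat "DAILY_LABOR_PERCENT".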

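In this sample dataset, the person worked for 14 days and their labor percent was recorded for each day. Instead of 14 days of labor, I want 100 days of labor (formulas: "DAY_COUNTER" * "Multi_Factor" and "DAILY_LABOR_PERCENT" / "Multi_Factor").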
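
Both formulas are plain column arithmetic, so they can be computed for every row at once without a loop. A minimal sketch, assuming the sample frame from the question; the output column names SCALED_DAY and SCALED_LABOR_PERCENT are hypothetical. In the sample, Multi_Factor appears to equal 100 / C_DAYS (100 / 14 ≈ 7.142857); that is an inference from the numbers, not something the question states.

```python
import pandas as pd

# Sample data from the question (14 days for ID "BAR", phase "C").
df = pd.DataFrame({
    "ID": ["BAR"] * 14,
    "PHASENAME": ["C"] * 14,
    "C_DAYS": [14.0] * 14,
    "Multi_Factor": [7.142857] * 14,
    "DAY_COUNTER": list(range(1, 15)),
    "DAILY_LABOR_PERCENT": [1.0, 5.0, 9.0, 11.0, 10.0, 9.0, 9.0,
                            9.0, 8.0, 10.0, 8.0, 7.0, 4.0, 0.0],
})

# Formula 1 (hypothetical column name): stretch each day onto a 100-day scale.
df["SCALED_DAY"] = df["DAY_COUNTER"] * df["Multi_Factor"]
# Formula 2 (hypothetical column name): spread each day's percent over the stretched days.
df["SCALED_LABOR_PERCENT"] = df["DAILY_LABOR_PERCENT"] / df["Multi_Factor"]

print(df[["DAY_COUNTER", "SCALED_DAY", "SCALED_LABOR_PERCENT"]].round(3))
```

The last row lands on 99.999998 rather than exactly 100 because Multi_Factor is rounded to six decimals; deriving it as 100 / C_DAYS would avoid that.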

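What I did in the code: I focused on growing the day count first. I created a function "day_factor" that implements the first formula above, and tried to iterate over each index in a range and apply the function.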
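
If you want to keep day_factor, `DataFrame.apply` with `axis=1` passes each row to it as a Series indexed by column name, which is what the function expects. A minimal sketch, assuming a trimmed copy of the sample data; the column name SCALED_DAY is hypothetical. The whole-column expression at the end gives the same numbers and is generally much faster on large frames.

```python
import pandas as pd

# Trimmed copy of the sample data (assumption: three rows are enough to illustrate).
df1 = pd.DataFrame({
    "ID": ["BAR"] * 3,
    "Multi_Factor": [7.142857] * 3,
    "DAY_COUNTER": [1, 2, 3],
})

def day_factor(row):
    # row is one DataFrame row, so column-name lookups work here.
    return row["DAY_COUNTER"] * row["Multi_Factor"]

# axis=1: call day_factor once per row (the default axis=0 would pass columns).
df1["SCALED_DAY"] = df1.apply(day_factor, axis=1)

# Same result without apply, using whole-column arithmetic.
vectorized = df1["DAY_COUNTER"] * df1["Multi_Factor"]
print(df1)
```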

Error message: TypeError: string indices must be integers
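
The error most likely comes from `df1.iloc[i % len(df1)].apply(day_factor)`: `iloc[...]` returns a single row as a Series, and `Series.apply` calls the function on each value in that row, not on the row as a whole. The first value is the string "BAR", so `row['DAY_COUNTER']` ends up indexing a string with a string. A small reproduction, assuming a two-row cut of the sample data:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["BAR", "BAR"],
    "Multi_Factor": [7.142857, 7.142857],
    "DAY_COUNTER": [1, 2],
})

def day_factor(row):
    return row["DAY_COUNTER"] * row["Multi_Factor"]

row = df1.iloc[0]  # one row, as a Series indexed by column name

# Series.apply hands day_factor each *value*; the first one is "BAR",
# and "BAR"["DAY_COUNTER"] raises the TypeError from the question.
caught = None
try:
    row.apply(day_factor)
except TypeError as exc:
    caught = exc
print("TypeError:", caught)

# Passing the row itself works: the lookups now hit column labels.
fixed = day_factor(row)
print(fixed)
```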

I'm also worried that once I scale up this dataset, a range of 100 will not work.

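What I'm looking for

  1. Suggestions for resolving my error message.
  2. Whether fixing it would let me extend the dataset to 100 rows.
  3. Suggestions on how to approach this problem with a larger dataset.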
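
For the third point, one way to avoid a single global `range(100)` is to extend each "ID"/"PHASENAME"/"C_DAYS" group separately, so every group gets its own 100 days no matter how many groups there are. A minimal sketch under these assumptions: a hypothetical second group "FOO" with made-up values is added only to show more than one group, the helper name extend_group is hypothetical, and each group cycles through its own rows (row k takes original row k % group length) while DAY_COUNTER is renumbered 1..100. The loop runs once per group, not once per row.

```python
import numpy as np
import pandas as pd

N_DAYS = 100  # target length for every group

# The question's "BAR" rows plus a made-up 5-day "FOO" group (hypothetical data).
bar = pd.DataFrame({
    "ID": "BAR", "PHASENAME": "C", "C_DAYS": 14.0,
    "DAY_COUNTER": list(range(1, 15)),
    "DAILY_LABOR_PERCENT": [1.0, 5.0, 9.0, 11.0, 10.0, 9.0, 9.0,
                            9.0, 8.0, 10.0, 8.0, 7.0, 4.0, 0.0],
})
foo = pd.DataFrame({
    "ID": "FOO", "PHASENAME": "A", "C_DAYS": 5.0,
    "DAY_COUNTER": list(range(1, 6)),
    "DAILY_LABOR_PERCENT": [20.0, 30.0, 25.0, 15.0, 10.0],
})
df = pd.concat([bar, foo], ignore_index=True)

def extend_group(g: pd.DataFrame, n: int = N_DAYS) -> pd.DataFrame:
    # Cycle through the group's rows until there are n of them,
    # then renumber the days 1..n.
    out = g.iloc[np.arange(n) % len(g)].reset_index(drop=True)
    out["DAY_COUNTER"] = np.arange(1, n + 1)
    return out

keys = ["ID", "PHASENAME", "C_DAYS"]
extended = pd.concat(
    [extend_group(g) for _, g in df.groupby(keys, sort=False)],
    ignore_index=True,
)
print(extended.groupby("ID").size())
```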

Code

import pandas as pd
import numpy as np


data={
    "ID": [ "BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR"],
    "PHASENAME": [ "C","C","C","C","C","C","C","C","C","C","C","C","C","C"],
    "C_DAYS": [ 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0],
    "Multi_Factor": [7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857],
    "DAY_COUNTER": [1,2,3,4,5,6,7,8,9,10,11,12,13,14],
    "DAILY_LABOR_PERCENT": [1.0,5.0,9.0,11.0,10.0,9.0,9.0,9.0,8.0,10.0,8.0,7.0,4.0,0.0],
    }

df=pd.DataFrame(data)
df1=df.copy()

#creating a math function 

def day_factor(row):
    return row['DAY_COUNTER'] * row['Multi_Factor']

#creating empty list to store rows for each id, phasename, and cdays
new_rows= []

#iterating through each index range and applying function to get day_counter to 100

for i in range(100):
    new_row= df1.iloc[i % len(df1)].apply(day_factor)
    new_rows.append(new_row)

#creating new df with the new rows
    
df2=pd.DataFrame(new_rows,columns=['New_DAY_COUNTER'])
    

#not creating 100 rows  and getting TypeError: string indices must be integers
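
A minimal fix that keeps the loop structure from the question: call day_factor on the row directly instead of through `Series.apply`, collect plain numbers, and build the single column from that list. This reproduces what the loop appears to aim for (100 values cycling through the 14 sample rows), but it still iterates in Python, so treat it as a sketch for understanding the error rather than the recommended approach.

```python
import pandas as pd

data = {
    "ID": ["BAR"] * 14,
    "PHASENAME": ["C"] * 14,
    "C_DAYS": [14.0] * 14,
    "Multi_Factor": [7.142857] * 14,
    "DAY_COUNTER": list(range(1, 15)),
    "DAILY_LABOR_PERCENT": [1.0, 5.0, 9.0, 11.0, 10.0, 9.0, 9.0,
                            9.0, 8.0, 10.0, 8.0, 7.0, 4.0, 0.0],
}
df1 = pd.DataFrame(data)

def day_factor(row):
    return row["DAY_COUNTER"] * row["Multi_Factor"]

new_rows = []
for i in range(100):
    row = df1.iloc[i % len(df1)]      # wrap around after the 14th row
    new_rows.append(day_factor(row))  # call on the whole row, not row.apply

# A list of numbers maps cleanly onto one named column.
df2 = pd.DataFrame({"New_DAY_COUNTER": new_rows})
print(len(df2))
```

Note that New_DAY_COUNTER restarts at 7.142857 after every 14 rows, which is probably not the final shape you want; it only shows that the loop itself now runs.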

Tags: python, pandas, loops, iteration
1 answer

0 votes

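As a general rule, you want to avoid iterating over dataframe rows, because it is very inefficient.

IIUC, you can use something vectorized like numpy.tile to repeat the dataframe until you have the required number of rows, and then adjust the values accordingly: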

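n = 100                                  # target number of rows
first_day = df["DAY_COUNTER"].iloc[0]    # day the sequence starts from

# stack copies of the whole frame until there are at least n rows, then keep the first n
rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]

# renumber the days; DAILY_LABOR_PERCENT keeps repeating the original 14-day pattern
out["DAY_COUNTER"] = range(first_day, n + first_day)
print(out)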
     ID PHASENAME C_DAYS Multi_Factor  DAY_COUNTER DAILY_LABOR_PERCENT
0   BAR         C   14.0     7.142857            1                 1.0
1   BAR         C   14.0     7.142857            2                 5.0
2   BAR         C   14.0     7.142857            3                 9.0
3   BAR         C   14.0     7.142857            4                11.0
4   BAR         C   14.0     7.142857            5                10.0
..  ...       ...    ...          ...          ...                 ...
95  BAR         C   14.0     7.142857           96                 7.0
96  BAR         C   14.0     7.142857           97                 4.0
97  BAR         C   14.0     7.142857           98                 0.0
98  BAR         C   14.0     7.142857           99                 1.0
99  BAR         C   14.0     7.142857          100                 5.0
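
For the sample, `n // len(df) + 1` is 100 // 14 + 1 = 8, so `np.tile` stacks 8 copies (112 rows) and `.iloc[:n]` trims them to 100. One side effect: `df.values` on a frame with mixed column types is an object array, so every tiled column comes back with object dtype. A variant that selects rows by position with `iloc` keeps the original dtypes; this is an alternative sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["BAR"] * 14,
    "PHASENAME": ["C"] * 14,
    "C_DAYS": [14.0] * 14,
    "Multi_Factor": [7.142857] * 14,
    "DAY_COUNTER": list(range(1, 15)),
    "DAILY_LABOR_PERCENT": [1.0, 5.0, 9.0, 11.0, 10.0, 9.0, 9.0,
                            9.0, 8.0, 10.0, 8.0, 7.0, 4.0, 0.0],
})
n = 100

# Answer's approach: tiling the raw values makes every column object dtype.
tiled = pd.DataFrame(np.tile(df.values, (n // len(df) + 1, 1)), columns=df.columns)

# Alternative: row k takes original row k % len(df), and dtypes are kept.
out = df.iloc[np.arange(n) % len(df)].reset_index(drop=True)
out["DAY_COUNTER"] = np.arange(1, n + 1)

print(len(tiled), tiled["C_DAYS"].dtype, out["C_DAYS"].dtype)
```

If you stay with the tiled version, calling `out.infer_objects()` afterwards should convert the numeric columns back from object dtype.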
© www.soinside.com 2019 - 2024. All rights reserved.