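I am pulling data from the SEC EDGAR website. The rows cover the fiscal periods Q1, Q2, Q3 and FY. I want to subtract the values of the first three quarters from the FY value.
Here is the code: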
    def BackOutFiscalYears(df):
        if df.empty:
            return df
        Length = len(df)
        if Length < 4:
            return df
        ndx = Length - 1
        while ndx >= 4:
            if df.iloc[ndx]['fp'] == 'FY':
                QtrValue = df.iloc[ndx]['val'] - (df.iloc[ndx-1]['val'] + df.iloc[ndx-2]['val'] + df.iloc[ndx-3]['val'])
                QtrValue = int(QtrValue)
                # print ( f'QtrValue {QtrValue}')
                # the next line is what I want (it looks logical) but it throws a warning and does not replace the cell
                # df.iloc[ndx]['val'] = QtrValue
                # tried these things but they all blow up
                # df.at[ndx,'val'] = QtrValue
                # df.loc[ndx]['val'] = QtrValue
                # df.loc[df.loc[ndx,'val']] = QtrValue
                # this does weird stuff.
                # converts everything to floats. Even though QtrValue is a int.
                # add rows with correct values in 'val' column, all other columns NaN
                df.loc[ndx,'val'] = QtrValue
            ndx -= 1
        return df
Here is the DataFrame before and after calling the function:
start end val accn fy fp form filed frame
156 2020-01-01 2020-03-31 17571000000 0001558370-21-004922 2021 Q1 10-Q 2021-04-27 CY2020Q1
160 2020-04-01 2020-06-30 18123000000 0001558370-21-009351 2021 Q2 10-Q 2021-07-27 CY2020Q2
164 2020-07-01 2020-09-30 17560000000 0001558370-21-014734 2021 Q3 10-Q 2021-11-05 CY2020Q3
167 2020-01-01 2020-12-31 55179000000 0001558370-23-002376 2022 FY 10-K 2023-02-28 CY2020
169 2021-01-01 2021-03-31 13187000000 0001558370-22-005983 2022 Q1 10-Q 2022-04-26 CY2021Q1
173 2021-04-01 2021-06-30 14218000000 0001558370-22-010985 2022 Q2 10-Q 2022-07-25 CY2021Q2
177 2021-07-01 2021-09-30 13251000000 0001558370-22-015322 2022 Q3 10-Q 2022-10-25 CY2021Q3
179 2021-01-01 2021-12-31 57350000000 0001558370-23-002376 2022 FY 10-K 2023-02-28 CY2021
181 2022-01-01 2022-03-31 14197000000 0001558370-23-006656 2023 Q1 10-Q 2023-04-25 CY2022Q1
185 2022-04-01 2022-06-30 15535000000 0000051143-23-000021 2023 Q2 10-Q 2023-07-25 CY2022Q2
189 2022-07-01 2022-09-30 14107000000 0000051143-23-000032 2023 Q3 10-Q 2023-10-31 CY2022Q3
190 2022-01-01 2022-12-31 60530000000 0001558370-23-002376 2022 FY 10-K 2023-02-28 CY2022
191 2023-01-01 2023-03-31 14252000000 0001558370-23-006656 2023 Q1 10-Q 2023-04-25 CY2023Q1
193 2023-04-01 2023-06-30 15475000000 0000051143-23-000021 2023 Q2 10-Q 2023-07-25 CY2023Q2
195 2023-07-01 2023-09-30 14752000000 0000051143-23-000032 2023 Q3 10-Q 2023-10-31 CY2023Q3
start end val accn fy fp form filed frame
181 2022-01-01 2022-03-31 1.419700e+10 0001558370-23-006656 2023.0 Q1 10-Q 2023-04-25 CY2022Q1
185 2022-04-01 2022-06-30 1.553500e+10 0000051143-23-000021 2023.0 Q2 10-Q 2023-07-25 CY2022Q2
189 2022-07-01 2022-09-30 1.410700e+10 0000051143-23-000032 2023.0 Q3 10-Q 2023-10-31 CY2022Q3
190 2022-01-01 2022-12-31 6.053000e+10 0001558370-23-002376 2022.0 FY 10-K 2023-02-28 CY2022
191 2023-01-01 2023-03-31 1.425200e+10 0001558370-23-006656 2023.0 Q1 10-Q 2023-04-25 CY2023Q1
193 2023-04-01 2023-06-30 1.547500e+10 0000051143-23-000021 2023.0 Q2 10-Q 2023-07-25 CY2023Q2
195 2023-07-01 2023-09-30 1.475200e+10 0000051143-23-000032 2023.0 Q3 10-Q 2023-10-31 CY2023Q3
55 NaN NaN 1.669400e+10 NaN NaN NaN NaN NaN NaN
51 NaN NaN 1.925000e+09 NaN NaN NaN NaN NaN NaN
47 NaN NaN 2.343000e+09 NaN NaN NaN NaN NaN NaN
39 NaN NaN 2.254200e+10 NaN NaN NaN NaN NaN NaN
35 NaN NaN 2.177100e+10 NaN NaN NaN NaN NaN NaN
27 NaN NaN 2.411300e+10 NaN NaN NaN NaN NaN NaN
23 NaN NaN 3.341400e+10 NaN NaN NaN NaN NaN NaN
19 NaN NaN 2.767100e+10 NaN NaN NaN NaN NaN NaN
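I know this question has been asked and answered many times, and I have tried several of the suggested solutions without success.
I tried this:
    df.iloc[ndx]['val'] = QtrValue
but the value was not updated and I got a warning.
After many attempts based on StackOverflow and other sources, I tried:
    df.loc[ndx,'val'] = QtrValue
but that added strange rows, which is not right.
Many thanks. Tom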
I think you want to subtract the sum of all the quarterly values from the FY value. If so, you can try the following:
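    import numpy as np

    # Extract the first run of digits (the year) from the frame column, e.g. 'CY2020Q1' -> '2020'
    df['year'] = df['frame'].str.extract(r'(\d+)', expand=False)
    # Sum val over every row of the same year (Q1 + Q2 + Q3 + FY)
    df['value_sum'] = df.groupby('year')['val'].transform('sum')
    # On FY rows: FY val minus the quarterly sum (value_sum - val); NaN on all other rows
    df['year_sub_quarter'] = np.where(df['fp'] == 'FY', df['val'] - (df['value_sum'] - df['val']), np.nan)
    print(df['year_sub_quarter'])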
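On why the original attempts misbehave: `ndx` in the question is a row position (0, 1, 2, ...), while `df.loc` looks rows up by index label (156, 160, ... in the printed frame). A label such as 55 does not exist, so `df.loc[ndx, 'val'] = QtrValue` appends a new row, and the NaNs in that row's other columns turn the integer columns into floats. `df.iloc[ndx]['val'] = QtrValue` is chained indexing, so the assignment lands on a temporary copy, which is what the warning is about. If you would rather keep the loop, a minimal sketch that writes by position could look like the following. The name `back_out_fiscal_years` and the four-row sample frame are only for illustration, and the sketch assumes that each FY row is directly preceded by its Q1, Q2 and Q3 rows.

```python
import pandas as pd


def back_out_fiscal_years(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where each FY 'val' has the three rows above it subtracted.

    Assumption: every FY row is directly preceded by its Q1, Q2 and Q3 rows.
    """
    out = df.copy()
    val_col = df.columns.get_loc("val")  # integer position of the 'val' column
    fp_col = df.columns.get_loc("fp")    # integer position of the 'fp' column
    # Iterate over row positions (not index labels); start at 3 so that
    # three preceding rows always exist.
    for pos in range(3, len(df)):
        if df.iloc[pos, fp_col] == "FY":
            prior = df.iloc[pos - 3:pos, val_col].sum()
            # Positional write: updates the existing cell, never adds a row.
            out.iloc[pos, val_col] = df.iloc[pos, val_col] - prior
    return out


# First four rows of the question's data, keeping the non-contiguous labels.
sample = pd.DataFrame(
    {
        "val": [17571000000, 18123000000, 17560000000, 55179000000],
        "fp": ["Q1", "Q2", "Q3", "FY"],
    },
    index=[156, 160, 164, 167],
)
result = back_out_fiscal_years(sample)
print(result.loc[167, "val"])  # 1925000000
```

An equivalent label-based write would be `out.loc[out.index[pos], 'val'] = ...`, which turns the position into the matching label first. Either way, the row count and the integer dtype of `val` stay unchanged.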