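Is there a way to vectorize the triple-nested loop below, which computes daily means from hourly data? The function loops over years first, then months, and finally days. It also checks the last month and last day so that the loops do not run past the end of the data.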
def hourly2daily(my_var,my_periods):
    import numpy as np
    import pandas as pd
    import sys
    print('######### hourly2daily function ##################')
    Frs_year =my_periods[0].year
    Frs_month =my_periods[0].month
    Frs_day =my_periods[0].day
    Frs_hour =my_periods[0].hour
    Last_year =my_periods[-1].year
    Last_month =my_periods[-1].month
    Last_day =my_periods[-1].day
    Last_hour =my_periods[-1].hour
    print('First year is '+str(Frs_year) +'\n'+\
          'First months is '+str(Frs_month)+'\n'+\
          'First day is '+str(Frs_day)+'\n'+\
          'First hour is '+str(Frs_hour))
    print(' ')
    print('Last year is '+str(Last_year)+'\n'+\
          'Last months is '+str(Last_month)+'\n'+\
          'Last day is '+str(Last_day)+'\n'+\
          'Last hour is '+str(Last_hour))
    #### Trick to be used for pd.date_range function only #######
    # The following if condition was written for the "date_range" function.
    # To understand why, try
    #   pd.date_range('01/2000','12/2000',freq='M')
    # You will find that there is no Dec; the only way to include it is
    # the if-condition trick below: when the last month is 12, date_range
    # is told to look at the next year.
    Last_year_ = Last_year
    Last_month_= Last_month
    if (Last_month == 12):
        Last_year_ = Last_year+1
        Last_month_ = 1
    Frs = str(Frs_year)+'/'+str(Frs_month)+'/'+str(Frs_day)+' '+str(Frs_hour)+":00"
    Lst = str(Last_year_)+'/'+str(Last_month_)+'/'+str(Last_day)+' '+str(Last_hour)+":00"
    my_daily_time=pd.date_range(Frs,Lst,freq='D')
    ## END of the date_range tricks ###########
    nt_days=len(my_daily_time)
    nd=np.ndim(my_var)
    if (nd == 1): # only time series
        var_mean=np.full((nt_days),np.nan)
    if (nd == 2): # e.g., time, lat or lon or lev
        n1=np.shape(my_var)[1]
        var_mean=np.full((nt_days,n1),np.nan)
    if (nd == 3): # e.g., time, lat, lon
        n1=np.shape(my_var)[1]
        n2=np.shape(my_var)[2]
        var_mean=np.full((nt_days,n1,n2),np.nan)
    if (nd == 4): # e.g., time, lat, lon, lev
        n1=np.shape(my_var)[1]
        n2=np.shape(my_var)[2]
        n3=np.shape(my_var)[3]
        var_mean=np.full((nt_days,n1,n2,n3),np.nan)
    end_mm=12
    k=0
    # First loop over Years
    #######################
    for yy in np.arange(Frs_year,Last_year+1):
        print('working on Year '+str(yy))
        # in case the last month is NOT 12
        if (yy == Last_year):
            end_mm=Last_month
            print('The last month is '+str(end_mm))
        # Second loop over Months
        #########################
        for mm in np.arange(1,end_mm+1):
            end_day=pd.Period(str(yy)+'-'+str(mm)).days_in_month
            # in case the last day is not at the end of the month.
            if ((yy == Last_year) and (mm == Last_month)):
                end_day=Last_day
            # Third loop over Days
            #######################
            for dd in np.arange(1,end_day+1):
                print(str(yy)+'-'+str(mm)+'-'+str(dd))
                # indices of all samples that fall on this day
                I=np.where((my_periods.year == yy) &\
                           (my_periods.month == mm) &\
                           (my_periods.day == dd ))[0]
                print(I)
                # if there is a discontinuity in time.
                if len(I) == 0 :
                    print('Warning time shift here >>')
                    print('Check the continuity of your time sequence')
                    sys.exit()
                var_mean[k,...]=np.nanmean(my_var[I,...],0)
                k=k+1
    return var_mean,my_daily_time
I'm still not sure what the input variables to your function look like, but I'll start with a random pd.DataFrame to illustrate the concept:
import numpy as np
import pandas as pd
# Fill a dataframe with random integers
# and set a date range as the index
df = pd.DataFrame(
np.random.randint(0, 100, size=(8785, 2)),
columns=['A', 'B'],
index=pd.date_range(start='2024-01-01', end='2025-01-01', freq='h')
)
df.head()
                      A   B
2024-01-01 00:00:00   5  87
2024-01-01 01:00:00  51  55
2024-01-01 02:00:00  87  40
2024-01-01 03:00:00  85  90
2024-01-01 04:00:00  38  52
Now get the daily means:
result = df.groupby(pd.Grouper(freq='D')).agg('mean')
result.head()
                    A          B
2024-01-01  49.875000  51.750000
2024-01-02  42.708333  54.791667
2024-01-03  46.416667  43.833333
2024-01-04  58.541667  55.791667
2024-01-05  53.625000  44.125000
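
If your data is a NumPy array with time on the first axis rather than a DataFrame, the same idea might carry over by flattening the other axes first. The following is only a sketch: I'm assuming my_var has shape (time, ...) and my_periods is a pandas DatetimeIndex of the same length, and the name hourly_to_daily is made up for illustration.

```python
import numpy as np
import pandas as pd


def hourly_to_daily(my_var, my_periods):
    """Daily NaN-ignoring mean of an array whose first axis is time.

    Hypothetical helper. Assumes my_var has shape (time, ...) and
    my_periods is a DatetimeIndex (or similar) with one entry per time step.
    """
    my_var = np.asarray(my_var, dtype=float)
    # Collapse every non-time axis into columns so pandas can group the rows.
    flat = my_var.reshape(my_var.shape[0], -1)
    df = pd.DataFrame(flat, index=pd.DatetimeIndex(my_periods))
    # One row per calendar day; mean() skips NaN by default, like np.nanmean.
    daily = df.groupby(pd.Grouper(freq='D')).mean()
    # Restore the original trailing dimensions (lat, lon, lev, ...).
    var_mean = daily.to_numpy().reshape((len(daily),) + my_var.shape[1:])
    return var_mean, daily.index


# Two days of hourly data on a small (2, 3) grid
times = pd.date_range('2024-01-01', periods=48, freq='h')
data = np.arange(48 * 2 * 3, dtype=float).reshape(48, 2, 3)
var_mean, days = hourly_to_daily(data, times)
print(var_mean.shape)  # (2, 2, 3)
```

One difference from your loop: a day with no samples does not stop the program here. It shows up as a row of NaN in var_mean, so you can check for gaps afterwards with np.isnan if you need that warning.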