Vectorizing three nested loops that compute daily means from hourly data


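Is there a way to vectorize the three nested loops below, which compute daily means from hourly data? The function loops over years first, then over months, and finally over days. It also checks the last month and the last day so that the loops do not run past the final month or day of the data.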

def hourly2daily(my_var,my_periods):
    
    import numpy as np
    import pandas as pd
    import sys
    
    print('######### hourly2daily function ##################')
    Frs_year   =my_periods[0].year
    Frs_month  =my_periods[0].month
    Frs_day    =my_periods[0].day
    Frs_hour   =my_periods[0].hour
    
    Last_year  =my_periods[-1].year
    Last_month =my_periods[-1].month
    Last_day   =my_periods[-1].day
    Last_hour  =my_periods[-1].hour
    

    print('First year   is '+str(Frs_year) +'\n'+\
          'First months is '+str(Frs_month)+'\n'+\
          'First day    is '+str(Frs_day)+'\n'+\
          'First hour   is '+str(Frs_hour))
    print('        ')
    
    print('Last  year   is '+str(Last_year)+'\n'+\
          'Last  months is '+str(Last_month)+'\n'+\
          'Last  day    is '+str(Last_day)+'\n'+\
          'Last  hour   is '+str(Last_hour))
    
    
    #### Workaround needed only for pd.date_range #######
    # The if-condition below exists because of how pd.date_range handles
    # month-end frequencies. To see why, try
    # pd.date_range('01/2000','12/2000',freq='M')
    # The result contains no December entry, so when the last month is 12
    # the end of the range is pushed into January of the following year.
    Last_year_ = Last_year
    Last_month_= Last_month
    
    if (Last_month ==  12): 
        Last_year_  = Last_year+1
        Last_month_ = 1
    
    Frs = str(Frs_year)+'/'+str(Frs_month)+'/'+str(Frs_day)+' '+str(Frs_hour)+":00"
    
    Lst = str(Last_year_)+'/'+str(Last_month_)+'/'+str(Last_day)+' '+str(Last_hour)+":00"
    
    my_daily_time=pd.date_range(Frs,Lst,freq='D')
    
    ## END of the date_range workaround ###########

    nt_days=len(my_daily_time)
    nd=np.ndim(my_var)

        
    if (nd == 1): # only time series
        var_mean=np.full((nt_days),np.nan)
    
    if (nd == 2): # e.g., time, lat or lon or lev
        n1=np.shape(my_var)[1]
        var_mean=np.full((nt_days,n1),np.nan)
    
    if (nd == 3): #  e.g., time, lat, lon 
        n1=np.shape(my_var)[1]
        n2=np.shape(my_var)[2]
        var_mean=np.full((nt_days,n1,n2),np.nan)
    
    if (nd == 4): # e.g., time, lat , lon, lev 
        n1=np.shape(my_var)[1]
        n2=np.shape(my_var)[2]
        n3=np.shape(my_var)[3]
        var_mean=np.full((nt_days,n1,n2,n3),np.nan)
    
    end_mm=12
    k=0
    # First loop over Years 
    #######################
    for yy in np.arange(Frs_year,Last_year+1):
        print('working on Year '+str(yy))
        # in case the last month is NOT 12
        if (yy == Last_year):
            end_mm=Last_month 
            print('The last month is '+str(end_mm))
        # Second loop over Months
        #########################
        for mm in np.arange(1,end_mm+1):
            end_day=pd.Period(str(yy)+'-'+str(mm)).days_in_month
            # in case the last day is not at the end of the month.
            if ((yy == Last_year) & (mm == Last_month)):
                end_day=Last_day 
            # Third loop over Days
            #######################
            for dd in np.arange(1,end_day+1):
                print(str(yy)+'-'+str(mm)+'-'+str(dd))
                #list all days of the month and year.
                I=np.where((my_periods.year ==  yy) &\
                           (my_periods.month == mm) &\
                           (my_periods.day == dd  ))[0]
                
                print(I)
                # if there is a discontinuity in time.
                if len(I) == 0 :
                    print('Warning time shift here >>')
                    print('Check the continuity of your time sequence')
                    sys.exit()
                
                var_mean[k,...]=np.nanmean(my_var[I,...],0)
                k=k+1
    return var_mean,my_daily_time

python python-3.x pandas vectorization
1 Answer

I'm still not sure what the input variables to your function look like, but I'll start with a random pd.DataFrame to illustrate the concept:

import numpy as np
import pandas as pd

# Fill a dataframe with random integers
# and set a date range as the index
df = pd.DataFrame(
    np.random.randint(0, 100, size=(8785, 2)), 
    columns=['A', 'B'], 
    index=pd.date_range(start='2024-01-01', end='2025-01-01', freq='h')
)

df.head()
                    A   B
2024-01-01 00:00:00 5   87
2024-01-01 01:00:00 51  55
2024-01-01 02:00:00 87  40
2024-01-01 03:00:00 85  90
2024-01-01 04:00:00 38  52

Now get the daily means:

result = df.groupby(pd.Grouper(freq='D')).agg('mean')
result.head()


            A           B
2024-01-01  49.875000   51.750000
2024-01-02  42.708333   54.791667
2024-01-03  46.416667   43.833333
2024-01-04  58.541667   55.791667
2024-01-05  53.625000   44.125000
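
The function in the question takes a NumPy array with one to four dimensions rather than a DataFrame, so here is a rough sketch of how the same grouping idea might be applied to it. It assumes that time is the first axis of my_var and that my_periods is a DatetimeIndex (or something pd.DatetimeIndex can convert). The name hourly_to_daily and the choice to raise ValueError instead of calling sys.exit() are mine, not part of the original code.

```python
import numpy as np
import pandas as pd


def hourly_to_daily(my_var, my_periods):
    """Daily NaN-aware means of an array whose first axis is hourly time.

    Hypothetical helper: assumes my_periods is a DatetimeIndex (or
    convertible to one) with one entry per row of my_var.
    """
    my_var = np.asarray(my_var, dtype=float)
    times = pd.DatetimeIndex(my_periods)
    if my_var.shape[0] != len(times):
        raise ValueError('first axis of my_var must match len(my_periods)')

    # Flatten every non-time axis so the data fits in a 2-D DataFrame.
    flat = my_var.reshape(len(times), -1)
    grouped = pd.DataFrame(flat, index=times).resample('D')

    # mean() skips NaN values by default, much like np.nanmean.
    daily = grouped.mean()

    # A calendar day with zero timestamps means the time axis has a gap.
    empty = (grouped.size() == 0).to_numpy()
    if empty.any():
        first_gap = daily.index[empty][0].date()
        raise ValueError(f'no data for {empty.sum()} day(s), first: {first_gap}')

    # Restore the original trailing dimensions (lat, lon, lev, ...).
    var_mean = daily.to_numpy().reshape((len(daily),) + my_var.shape[1:])
    return var_mean, daily.index


# Example: two days of hourly data on a 2 x 3 grid.
times = pd.date_range('2024-01-01', periods=48, freq='h')
data = np.arange(48 * 2 * 3, dtype=float).reshape(48, 2, 3)
var_mean, my_daily_time = hourly_to_daily(data, times)
print(var_mean.shape)
```

Here resample('D') groups the rows the same way as groupby(pd.Grouper(freq='D')) above. Reshaping to two dimensions and back is what lets one code path cover all of the nd == 1 to nd == 4 cases.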


