Python 3 generator function returns the same value


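I'm trying to build a batch generator that takes a large Pandas DataFrame as input and outputs a given number of rows (batch_size). I've been experimenting on a smaller DataFrame of 10 rows, and I'm having trouble with the generator function. The for loop below works fine on the practice DataFrame and spits out batches of the specified size: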

for i in range(0, len(df), 3):
    lower = i
    upper = i + 3
    print(df.iloc[lower:upper])

However, I'm struggling to build this into a generator function:

def Generator(batch_size, seed = None):
num_items = len(df)
x = df.sample(frac = 1, replace = False, random_state = seed)
for offset in range(0, num_items, batch_size):
    lower_limit = offset
    upper_limit = offset+batch_size
    batch = x.iloc[lower_limit:upper_limit]
    yield batch

Unfortunately, calling

next(Generator(1))  # e.g. batch_size = 1

returns the same row over and over.

I'm fairly new to this and feel I must be missing something, but I can't spot what. I'd appreciate it if someone could point out the problem.

Edit: the DataFrame is predefined, and it is:

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 'Gueniva', 'Know', 'Sara', 'Cat'], 
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 'Jaker', 'Alom', 'Ormon', 'Koozer'], 
    'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24], 
    'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
    'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}

df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df

python-3.x pandas debugging generator yield
1 Answer

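Create one iterator from the result of calling Generator, and call next() on that iterator. Otherwise every call to Generator creates a brand-new generator with fresh state, so you only ever see its first batch. If a seed is provided, each new generator shuffles the rows identically and therefore yields the same first row; without a seed, the first rows may still coincide by chance.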
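To illustrate the point about generator state without involving pandas, here is a minimal sketch. `count_up` is a hypothetical function made up for this example, not part of the question's code.

```python
def count_up(n):
    """Yield 0, 1, ..., n-1, one value per next() call."""
    for k in range(n):
        yield k

# Each call to count_up() builds a new generator that starts from the beginning.
fresh = [next(count_up(3)), next(count_up(3))]
print(fresh)   # [0, 0]

# Keeping a reference to one generator lets next() advance through it.
gen = count_up(3)
shared = [next(gen), next(gen)]
print(shared)  # [0, 1]
```

The same applies to the batch generator: the position inside the loop lives in the generator object, so that object has to be kept and reused.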

Once the indentation is fixed, it should work:

import pandas as pd

# I dislike variable scope bleeding into the function, provide df explicitly
def Generator(df, batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch


raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 
                           'Gueniva', 'Know', 'Sara', 'Cat'], 
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 
                  'Jaker', 'Alom', 'Ormon', 'Koozer'], 
    'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24], 
    'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
    'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}

df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 
                                       'preTestScore', 'postTestScore'])


# capture a "state" for the generator function
i = iter(Generator(df, 2)) 

# get the next states from the iterator and print
print(next(i))
print(next(i))
print(next(i))

Output:

  first_name last_name  age  preTestScore  postTestScore
8       Sara     Ormon   73            26            234
6    Gueniva     Jaker   26            52             52


  first_name last_name  age  preTestScore  postTestScore
5      Sarah    Mornig   53            13             82
9        Cat    Koozer   24            26            254

  first_name last_name  age  preTestScore  postTestScore
1      Molly  Jacobson   52            24             94
2       Tina       Ali   36            31             57

If you do this instead

print(next(Generator(df, 2)))    
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))

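you create three separately shuffled copies of df. They may show the same rows, because you only print the first "iteration" of each one and then discard it.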
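In practice a generator like this is usually consumed with a for loop, which creates it once and advances it until it is exhausted. The sketch below assumes a hypothetical function name `batches` and a small stand-in DataFrame `demo` (neither comes from the question), and it also shows why a fixed seed makes every fresh generator return an identical first batch.

```python
import pandas as pd

def batches(df, batch_size, seed=None):
    # Hypothetical name; same shuffle-then-slice logic as Generator above.
    shuffled = df.sample(frac=1, replace=False, random_state=seed)
    for offset in range(0, len(shuffled), batch_size):
        yield shuffled.iloc[offset:offset + batch_size]

# Stand-in data for illustration only (assumed, not the question's frame).
demo = pd.DataFrame({"value": range(10)})

# Same seed, two fresh generators: the shuffles match, so the first batches match.
first_a = next(batches(demo, 3, seed=0))
first_b = next(batches(demo, 3, seed=0))
print(first_a.equals(first_b))  # True

# One generator, iterated to the end: 10 rows in batches of 3.
sizes = [len(batch) for batch in batches(demo, 3, seed=0)]
print(sizes)  # [3, 3, 3, 1]
```

Note that the last batch is smaller whenever the row count is not a multiple of batch_size.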
