This is a function that uses the yield keyword.
I want to get the actual data out of the function.
How can I do that?
"""
# function to reshape features into (samples, time steps, features)
Only sequences that meet the window length are considered; no padding is used.
This means for testing we need to drop those which are below the window length.
An alternative would be to pad sequences so that we can use shorter ones.
"""
def gen_sequence(samples, seq_length, seq_cols):
    # for one id, put all the rows in a single matrix
    data_matrix = samples[seq_cols].values
    num_elements = data_matrix.shape[0]
    # Iterate over two ranges in parallel.
    # For example, id1 has 192 rows and seq_length is equal to 50,
    # so zip iterates over the two ranges (0, 142) and (50, 192):
    # 0 50 -> from row 0 to row 50
    # 1 51 -> from row 1 to row 51
    # 2 52 -> from row 2 to row 52
    # ...
    # 141 191 -> from row 141 to row 191
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        yield data_matrix[start:stop, :]
This is what I did, but I only get a list of [] back:
# samples, seq_length, seq_cols
# generator for the sequences
seq_gen = []
for serial_number in hdd['serial_number'].unique():
    temp = gen_sequence(hdd[hdd['serial_number']==serial_number], sequence_length, sequence_cols)
    print(type(temp))
    seq_gen.append(list(temp))
# print(seq_gen)
Example of the hdd DataFrame:
date serial_number ... smart_197_raw smart_198_raw
15 2018-01-01 S30075JX ... 0 0
509 2018-01-02 S30075JX ... 0 0
1000 2018-01-03 S30075JX ... 0 0
1488 2018-01-04 S30075JX ... 0 0
1975 2018-01-05 S30075JX ... 0 0
[5 rows x 16 columns]
hdd.columns:
'date','capacity_bytes','serial_number','model','failure','smart_5_raw','smart_197_raw','smart_187_raw',
'smart_7_raw','smart_1_raw','smart_3_raw','smart_9_raw','smart_194_raw','smart_189_raw',
'smart_188_raw','smart_198_raw'
For each serial number,
temp_samples = hdd[hdd['serial_number']==serial_number]
print(temp_samples.shape)
prints the following:
(90, 16)
(90, 16)
(2, 16)
(90, 16)
(90, 16)
(90, 16)
(61, 16)
(89, 16)
(90, 16)
(89, 16)
(89, 16)
(13, 16)
(40, 16)
(36, 16)
(90, 16)
(90, 16)
(32, 16)
(90, 16)
(90, 16)
(68, 16)
(90, 16)
(57, 16)
(7, 16)
(4, 16)
(90, 16)
(90, 16)
(27, 16)
(90, 16)
(90, 16)
(50, 16)
(35, 16)
(90, 16)
(89, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(22, 16)
(49, 16)
(90, 16)
(90, 16)
(90, 16)
(88, 16)
(90, 16)
(90, 16)
(88, 16)
(44, 16)
(90, 16)
(90, 16)
(90, 16)
(89, 16)
(90, 16)
(90, 16)
(16, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(86, 16)
(90, 16)
(24, 16)
(76, 16)
(36, 16)
(90, 16)
(83, 16)
(66, 16)
(50, 16)
(90, 16)
(90, 16)
(90, 16)
(73, 16)
(90, 16)
(52, 16)
(3, 16)
(90, 16)
(6, 16)
(23, 16)
(43, 16)
(42, 16)
(52, 16)
(25, 16)
(20, 16)
(11, 16)
(52, 16)
(83, 16)
(8, 16)
(34, 16)
(90, 16)
(64, 16)
(52, 16)
(90, 16)
(52, 16)
(71, 16)
(90, 16)
(28, 16)
(37, 16)
(15, 16)
(88, 16)
(90, 16)
(90, 16)
(80, 16)
(90, 16)
(26, 16)
(90, 16)
(89, 16)
(90, 16)
(90, 16)
(90, 16)
(3, 16)
(90, 16)
(90, 16)
(82, 16)
(90, 16)
(37, 16)
(90, 16)
(90, 16)
(90, 16)
(68, 16)
(10, 16)
(12, 16)
(90, 16)
(16, 16)
(1, 16)
(43, 16)
(1, 16)
(7, 16)
Value of seq_cols:
['smart_187_raw', 'failure', 'smart_5_raw', 'smart_197_raw', 'smart_194_raw', 'capacity_bytes', 'smart_7_raw', 'smart_3_raw', 'smart_189_raw', 'smart_198_raw', 'smart_9_raw', 'smart_188_raw', 'smart_1_raw']
The value of seq_length is 90.
If you want to get the complete data from a generator (rather than iterating over its values), you can convert it to a list.
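As a small illustration of that conversion, here is a minimal sketch using a toy generator. The name count_up_to is hypothetical and exists only for this example; it is not part of the code in the question.

```python
def count_up_to(n):
    """Hypothetical toy generator, used only for illustration."""
    for i in range(n):
        yield i


gen = count_up_to(3)
print(type(gen))    # <class 'generator'>

values = list(gen)  # consumes the generator and stores every yielded value
print(values)       # [0, 1, 2]

# A generator can only be consumed once, so a second pass yields nothing.
print(list(gen))    # []
```

Calling the function only creates the generator object; nothing inside the function body runs until something iterates over it, which is what list() does here.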
Change this line:
temp = gen_sequence(hdd[hdd['serial_number']==serial_number], sequence_length, sequence_cols)
to this:
temp = list(gen_sequence(hdd[hdd['serial_number']==serial_number], sequence_length, sequence_cols))
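If the lists are still empty after this change, the window length may be worth checking. With the values posted in the question (seq_length of 90 and at most 90 rows per serial number), range(0, num_elements-seq_length) is empty, so the generator has nothing to yield. The sketch below reproduces this with a simplified stand-in, gen_windows, which is a hypothetical helper that uses the same start/stop logic as gen_sequence but works on a plain list, so it runs without pandas.

```python
def gen_windows(rows, seq_length):
    """Simplified stand-in for gen_sequence: same start/stop logic, plain list input."""
    num_elements = len(rows)
    for start, stop in zip(range(0, num_elements - seq_length),
                           range(seq_length, num_elements)):
        yield rows[start:stop]


rows = list(range(90))  # 90 rows, like the largest groups in the question

# Window as long as the data: range(0, 0) is empty, so nothing is yielded.
full_length = list(gen_windows(rows, 90))
print(len(full_length))  # 0

# Shorter window: 90 - 50 = 40 windows of 50 rows each.
windows = list(gen_windows(rows, 50))
print(len(windows), len(windows[0]))  # 40 50
```

Under that logic a group with n rows produces max(0, n - seq_length) windows, so a smaller seq_length (or padding the shorter groups, as the docstring in the question suggests) would be needed to get non-empty results.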