根据数据生成随机样本

问题描述 投票:0回答:1

我在数据集中有506个点。我必须从该数据生成随机样本,例如我必须选择303点而不进行替换,而剩下的203点我需要从这303点中进行选择。

我已经编写了以下代码。

def generating_samples(input_data, target_data):

    selected_rows = np.random.choice(len(input_data), 303)
    replacing_rows = np.random.choice(selected_rows,203)
    selected_columns = np.random.choice(3,13,1)
    sample_data = input_data[selected_rows[:,None],selected_columns]
    target_of_sample_data = target_data[selected_rows]

    #replicating data
    replicated_sample_data = sample_data[replacing_rows]
    target_of_replicated_sample_data = target_data[replacing_rows]

    #concatenating data
    sampled_input_data = np.vstack(sample_data, replicated_sample_data)
    target_of_sample_data = target_of_sample_data.reshape(-1,1)
    target_of_replicated_sample_data = target_of_replicated_sample_data.reshape(-1,1)
    sampled_target_data = np.vstack(target_of_sample_data,target_of_replicated_sample_data)

    return sampled_input_data , sampled_target_data, selected_rows,selected_columns



def grader_samples(a,b,c,d):
        length = (len(a)==506  and len(b)==506)
        sampled = (len(a)-len(set([str(i) for i in a]))==203)
        rows_length = (len(c)==303)
        column_length= (len(d)>=3)
        assert(length and sampled and rows_length and column_length)
        return True

a,b,c,d = generating_samples(x, y)
grader_samples(a,b,c,d)

但是正在发生以下错误。

IndexError                                Traceback (most recent call last)
<ipython-input-14-ca772632e834> in <module>
      7     return True
      8 
----> 9 a,b,c,d = generating_samples(x, y)
     10 grader_samples(a,b,c,d)

<ipython-input-13-bcf904f160e5> in generating_samples(input_data, target_data)
     13 
     14     #replicating data
---> 15     replicated_sample_data = sample_data[replacing_rows]
     16     target_of_replicated_sample_data = target_data[replacing_rows]
     17 

IndexError: index 391 is out of bounds for axis 0 with size 303
python-3.x error-handling sampling
1个回答
0
投票

使用:replicated_sample_data = input_data[replacing_rows],因为复制的样本数据来自原始数据集。并且样本数据已经从原始数据集中进行了采样,因此它是我们原始数据集的一个子集,并导致索引错误

© www.soinside.com 2019 - 2024. All rights reserved.