Assigning values with a random function in pandas


I am trying to randomly assign a fourth value (one of two types of buddy) based on the value in the Category column.

A small df of randomly assigned values with 3 features: Category, Age and Sex

        Unique_ID   Category    Age      Sex        Buddy  
0       0           2           11       male       NaN
1       1           3           7        female     NaN
2       2           1           4        male       NaN
3       3           2           20       male       NaN
4       4           1           19       female     NaN

I have included the code that generates the df, in case it helps.

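I wrote a function that hard-codes the probabilities for np.random.choice, but applying assign_buddy to the df raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().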

import numpy as np
import pandas as pd

n = 20  # number of rows (not defined in the original snippet; the answer uses 20)
columns = ['Unique_ID', 'Category', 'Age', 'Sex', 'Buddy']
df = pd.DataFrame(columns=columns)

Sexes = ['female', 'male']
df.Sex = np.random.choice(a=Sexes, size=n, p=[0.6, 0.4])

list_Category = [1, 2, 3, 4]
df.Category = np.random.choice(a=list_Category, size=n, p=[0.3, 0.4, 0.2, 0.1])

buddy_list = ['buddy_1', 'buddy_2']

def assign_buddy(Category_prob_list):
    """
    takes in a Category value
    return: Buddy
    """
    if df['Category'] == list_Category[0]:
        df['Buddy'] = np.random.choice(a=buddy_list, size=n, p=[0.1, 0.9])
        return df['Buddy']
    elif df['Category'] == list_Category[1]:
        df['Buddy'] = np.random.choice(a=buddy_list, size=n, p=[0.3, 0.7])
        return df['Buddy']
    elif df['Category'] == list_Category[2]:
        df['Buddy'] = np.random.choice(a=buddy_list, size=n, p=[0.7, 0.3])
        return df['Buddy']
    elif df['Category'] == list_Category[3]:
        df['Buddy'] = np.random.choice(a=buddy_list, size=n, p=[0.9, 0.1])
        return df['Buddy']
    else:
        pass

# should apply assign_buddy to each row in df, but raises the ValueError quoted above
df['Category'].apply(assign_buddy)

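I have a dictionary of probabilities for assign_buddy, but I cannot figure out the map and apply logic, despite reading all the documentation.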

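I also tried writing a function that returns the probabilities from d so they can be passed as the p argument of np.random.choice, but it does not work.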

# key is category label and values are probabilities for np.random.choice
d = {1: [0.1, 0.9], 2: [0.3, 0.7], 3: [0.7, 0.3], 4: [0.9, 0.1]}

Any insights appreciated!

python pandas stochastic
1 Answer

Try this:

import numpy as np
import pandas as pd

n = 20
columns = ['Unique_ID', 'Category', 'Age', 'Sex', 'Buddy']
df = pd.DataFrame(columns=columns)

list_category = [1, 2, 3, 4]
buddy_list = ['buddy_1', 'buddy_2']
Sexes = ['female', 'male']
df.Sex = np.random.choice(a=Sexes, size=n, p=[0.6, 0.4])
df.Category = np.random.choice(list_category, size=n, p=[0.3, 0.4, 0.2, 0.1])

# key is category label and values are probabilities for np.random.choice
d = {1: [0.1, 0.9], 2: [0.3, 0.7], 3: [0.7, 0.3], 4: [0.9, 0.1]}

for val in list_category:
    mask = df["Category"] == val  # rows that belong to this category
    sz = mask.sum()  # number of buddies to draw for this category
    # use `loc` to select the places you want to replace
    df.loc[mask, 'Buddy'] = np.random.choice(buddy_list, sz, p=d[val])
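
As a side note on the error in the question: inside assign_buddy, df['Category'] == list_Category[0] compares the whole column, so the result is a Series, and `if` cannot reduce a Series to a single True/False. If you would rather keep the apply approach, here is a minimal sketch, assuming the function only looks at the scalar Category value it receives and reads its probabilities from d. The `rng` generator, its seed and the simplified df are my own assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is repeatable (assumption, not in the question)
rng = np.random.default_rng(0)

buddy_list = ['buddy_1', 'buddy_2']
# key is the category label, value is the probabilities for buddy_list
d = {1: [0.1, 0.9], 2: [0.3, 0.7], 3: [0.7, 0.3], 4: [0.9, 0.1]}

def assign_buddy(category):
    """Draw one buddy for a single scalar Category value."""
    return str(rng.choice(buddy_list, p=d[category]))

n = 20
# Simplified frame with only the column the function needs
df = pd.DataFrame({
    'Category': rng.choice([1, 2, 3, 4], size=n, p=[0.3, 0.4, 0.2, 0.1]),
})

# Series.apply passes each Category value to assign_buddy, one at a time
df['Buddy'] = df['Category'].apply(assign_buddy)
print(df.head())
```

This makes one random draw per row, so it will likely be slower than the loc loop on a large df, but it is closer to what the question was attempting.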