Sample dataset:
   ID  1  2  3  X  Y  Z
0   1  2  1  2  3  3  4
1   2  1  3  1  4  3  4
2   3  2  2  1  2  4  3
3   4  3  2  1  2  3  3
4   5  1  2  2  1  3  2
5   6  2  3  2  4  4  2
cross1 = pd.crosstab(sample["1"], sample["X"])
cross2 = pd.crosstab(sample["2"], sample["X"])
cross3 = pd.crosstab(sample["3"], sample["X"])
cross4 = pd.crosstab(sample["1"], sample["Y"])
cross5 = pd.crosstab(sample["2"], sample["Y"])
cross6 = pd.crosstab(sample["3"], sample["Y"])
cross7 = pd.crosstab(sample["1"], sample["Z"])
etc.
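I want to run this code in a loop, swapping "column 1" and "column X" for new columns (such as "column 2" and "column Y") to produce a new crosstab each time and assign it to a new dataframe. Doing it once by hand is simple enough. The result gives the counts of answers to a survey question by category (in this case, business type).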
1 = Large Business
2 = Small Business
3 = Non-profit
cross1 = pd.crosstab(sample["1"], sample["X"])
print(cross1)
X 1 2 3 4
1
1 1 0 0 1
2 0 1 1 1
3 0 1 0 0
I need to iterate so that I end up with multiple dataframes:
cross1 cross2 cross3 cross4 ... etc.
demo_questions = ['1', '2', '3']
survey_questions = ['X', 'Y', 'Z']
for d, s in [demo_questions, survey_questions]:
    cross[d] = pd.crosstab(sample[d], sample[s])
I tried the above, but got the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[37], line 1
----> 1 for d, s in [demo_questions, survey_questions]:
2 cross[d] = pd.crosstab(sample[d], sample[s])
ValueError: too many values to unpack (expected 2)
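Create a dictionary to store the crosstabs, then iterate over every combination of demographic question and survey question, building the frequency tables in a dictionary comprehension: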
# One crosstab per (demographic, survey) column pair, keyed like '1_X'
cross = {
    f'{d}_{s}': pd.crosstab(sample[d], sample[s])
    for d in demo_questions
    for s in survey_questions
}
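As for why the original loop raised the ValueError: iterating over [demo_questions, survey_questions] yields the two lists themselves, so the first item is a three-element list that cannot be unpacked into d and s. The sketch below reproduces that with plain lists, and contrasts zip (positional pairs only) with itertools.product (every combination, which is what the nested comprehension does). The names unpack_error, zipped_pairs and all_pairs are only illustrative.

```python
from itertools import product

demo_questions = ['1', '2', '3']
survey_questions = ['X', 'Y', 'Z']

# The loop target is a list of two lists, so the first item is
# ['1', '2', '3']; unpacking three values into (d, s) raises ValueError.
try:
    for d, s in [demo_questions, survey_questions]:
        pass
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# zip pairs items by position: ('1', 'X'), ('2', 'Y'), ('3', 'Z')
zipped_pairs = list(zip(demo_questions, survey_questions))

# product yields all 3 x 3 combinations, like the nested comprehension
all_pairs = list(product(demo_questions, survey_questions))

print(unpack_error)
print(zipped_pairs)
print(len(all_pairs))
```

If only the matching pairs (1 with X, 2 with Y, 3 with Z) were wanted, zip would be enough; the question's cross1 to cross7 list suggests every combination is needed, hence the product-style iteration.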
You can now access each result by indexing the dictionary:
print(cross['1_X'])
X 1 2 3 4
1
1 1 0 0 1
2 0 1 1 1
3 0 1 0 0
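For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, creates the dictionary, and loops over the stored tables. The business_types mapping assumes that column "1" holds the business-type codes listed in the question; that is a guess on my part, so adjust it to whichever column actually carries those codes.

```python
import pandas as pd

# Sample data copied from the question
sample = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    '1':  [2, 1, 2, 3, 1, 2],
    '2':  [1, 3, 2, 2, 2, 3],
    '3':  [2, 1, 1, 1, 2, 2],
    'X':  [3, 4, 2, 2, 1, 4],
    'Y':  [3, 3, 4, 3, 3, 4],
    'Z':  [4, 4, 3, 3, 2, 2],
})

demo_questions = ['1', '2', '3']
survey_questions = ['X', 'Y', 'Z']

# One crosstab per (demographic, survey) column pair, keyed like '1_X'
cross = {
    f'{d}_{s}': pd.crosstab(sample[d], sample[s])
    for d in demo_questions
    for s in survey_questions
}

# Loop over the stored tables instead of naming nine separate variables
for key, table in cross.items():
    print(key, table.shape)

# Assumption: column '1' is the business-type question from the post
business_types = {1: 'Large Business', 2: 'Small Business', 3: 'Non-profit'}
labelled = cross['1_X'].rename(index=business_types)
print(labelled)
```

Keeping the tables in one dictionary rather than in variables named cross1, cross2, and so on means the keys document which columns each table came from, and adding a question to either list needs no other code changes.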