Code:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from apyori import apriori
dataset = [['egg','bread'],['milk'],['apple','milk'],['diapers'],['orange','egg','milk']]
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
final_df = pd.DataFrame(te_ary, columns=te.columns_)
print(final_df)
frq_itemsets= apriori(final_df, min_support=0.5, use_colnames=True)
association_results = list(frq_itemsets)
print(association_results)
Output:
apple bread china egg embroidery milk
0 False True False True False False
1 False False False False False True
2 True False False False False True
3 False False False False True False
4 False False True True False True
[RelationRecord(items=frozenset({'a'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'a'}), confidence=0.5, lift=1.0)]), RelationRecord(items=frozenset({'e'}), support=0.6666666666666666, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'e'}), confidence=0.6666666666666666, lift=1.0)]), RelationRecord(items=frozenset({'i'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'i'}), confidence=0.5, lift=1.0)])]
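What am I doing wrong? I've searched everywhere but can't seem to find a question like this.
Thanks in advance. I hope this isn't a stupid question. Can anyone help?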
I believe apriori is being misused: what it expects and returns depends on which package you import it from. Look at the difference below:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
dataset = [['egg','bread'],['milk'],['apple','milk'],
['diapers'],['orange','egg','milk']]
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
final_df = pd.DataFrame(te_ary, columns=te.columns_)
print(final_df)
from mlxtend.frequent_patterns import apriori
# this method returns a dataframe, no need to use a list
df_freq = apriori(final_df, min_support=0.5, use_colnames=True)
print(df_freq)
# support itemsets
# 0 0.6 (milk)
from apyori import apriori
# this method returns a generator hence the use of list to get the result
print(list(apriori(dataset, min_support=0.5, )))
# [RelationRecord(items=frozenset({'milk'}), support=0.6,
# ordered_statistics=[OrderedStatistic(items_base=frozenset(),
# items_add=frozenset({'milk'}),
# confidence=0.6, lift=1.0)])]
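The single-letter items in the question's output follow from how apyori reads its input: it iterates over whatever it receives as the list of transactions (and, as the question shows, it does not complain about the mlxtend-only use_colnames argument). Iterating a DataFrame yields its column labels, and iterating each label (a string) yields its characters, so each column name becomes a "transaction" of letters. The sketch below only mimics that counting with pandas and the standard library; it does not call apyori itself:

```python
from collections import Counter

import pandas as pd

# Same column labels as the question's printed DataFrame; the cell values do not matter here
final_df = pd.DataFrame(columns=['apple', 'bread', 'china', 'egg', 'embroidery', 'milk'])

# Iterating a DataFrame yields its column labels, not its rows
transactions = list(final_df)

# Iterating a string yields its characters, so each label acts like a basket of letters
counts = Counter(letter for label in transactions for letter in set(label))

# Support = share of "transactions" (here: column labels) that contain the letter
supports = {letter: n / len(transactions) for letter, n in counts.items()}
frequent = {letter: s for letter, s in supports.items() if s >= 0.5}
print(sorted(frequent.items()))
# [('a', 0.5), ('e', 0.6666666666666666), ('i', 0.5)]
```

These are exactly the 'a', 'e' and 'i' records (supports 0.5, 0.667 and 0.5) from the question, which is why apyori should be given the list of transactions itself, as in apriori(dataset, min_support=0.5), rather than the one-hot encoded DataFrame.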
I had the same problem! For me, the solution was to one-hot encode the DF. In short, depending on your dataset, this means converting it to a list first:
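from mlxtend.preprocessing import TransactionEncoder
# convert every cell to a string, then turn the DataFrame rows into a list of transactions
df = df.astype(str)
str_list = df.values.tolist()
te = TransactionEncoder()
te_ary = te.fit(str_list).transform(str_list)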
This solved the problem for me!
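One caveat with df.astype(str): if your DataFrame stores one transaction per row and pads shorter baskets with empty cells (an assumption about its layout), then depending on your pandas version those cells can become strings such as 'nan' or 'None', which TransactionEncoder would treat as items. A minimal sketch, using a hypothetical padded DataFrame, that builds the list of transactions while skipping the padding:

```python
import pandas as pd

# Hypothetical input: one transaction per row, padded with None for shorter baskets
df = pd.DataFrame([['egg', 'bread', None],
                   ['milk', None, None],
                   ['orange', 'egg', 'milk']])

# Turn each row into a list and drop the missing cells instead of stringifying them
str_list = [[str(item) for item in row if pd.notna(item)] for row in df.values.tolist()]
print(str_list)
# [['egg', 'bread'], ['milk'], ['orange', 'egg', 'milk']]

# str_list can then be passed on as before: te.fit(str_list).transform(str_list)
```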