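I have a table of customers and transactions. Is there a way to build features that are filtered to the last 3, 6, 9 and 12 months? I would like to generate these features automatically.
I have already tried training_window=["1 month", "3 months"],
but it does not seem to return a separate set of features for each window.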
Example:
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
window_features = ft.dfs(entityset=es,
target_entity="customers",
training_window=["1 hour", "1 day"],
features_only = True)
window_features
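Do I have to run each window separately and then merge the results?
As you mentioned, in Featuretools 0.2.1 you have to build the feature matrix for each training window separately and then merge the results. Using your example, you would do it as follows: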
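import pandas as pd
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

# One cutoff time per customer; features are computed as of these times
cutoff_times = pd.DataFrame({"customer_id": [1, 2, 3, 4, 5],
                             "time": pd.date_range('2014-01-01 01:41:50', periods=5, freq='25min')})

# Build the feature definitions once; they are reused for every window
features = ft.dfs(entityset=es,
                  target_entity="customers",
                  agg_primitives=['count'],
                  trans_primitives=[],
                  features_only=True)

# Feature matrix using only data from the hour before each cutoff time
fm_1 = ft.calculate_feature_matrix(features,
                                   entityset=es,
                                   cutoff_time=cutoff_times,
                                   training_window='1h',
                                   verbose=True)

# Same features, using only data from the day before each cutoff time
fm_2 = ft.calculate_feature_matrix(features,
                                   entityset=es,
                                   cutoff_time=cutoff_times,
                                   training_window='1d',
                                   verbose=True)

# Join the two matrices on customer_id; the suffixes mark which window each column came from
new_df = fm_1.reset_index()
new_df = new_df.merge(fm_2.reset_index(), on="customer_id", suffixes=("_1h", "_1d"))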
The new DataFrame then looks like this:
customer_id  COUNT(sessions)_1h  COUNT(transactions)_1h  COUNT(sessions)_1d  COUNT(transactions)_1d
          1                   1                      17                   3                      43
          2                   3                      36                   3                      36
          3                   0                       0                   1                      25
          4                   0                       0                   0                       0
          5                   1                      15                   2                      29
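If you need more than two windows (for example the 3/6/9/12-month windows from the question), the same compute-then-merge pattern can be wrapped in a loop. This is a minimal sketch, not part of the Featuretools API: combine_windows is a hypothetical helper, and compute_fn is assumed to return a feature matrix indexed by customer_id for a given training-window string. Instead of reset_index plus merge, it suffixes each matrix's columns and concatenates the matrices on their shared index.

```python
from typing import Callable, Mapping

import pandas as pd


def combine_windows(
    compute_fn: Callable[[str], pd.DataFrame],
    windows: Mapping[str, str],
) -> pd.DataFrame:
    """Hypothetical helper: compute one feature matrix per training window
    and join them side by side.

    compute_fn: assumed callable returning a feature matrix (indexed by the
        instance id, e.g. customer_id) for one training-window string.
    windows: maps a short column-suffix label to a training-window string.
    """
    parts = []
    for label, window in windows.items():
        fm = compute_fn(window)
        # Tag every feature column with its window label so names stay unique;
        # the index (customer_id) is left untouched.
        parts.append(fm.add_suffix(f"_{label}"))
    # Align on the shared index and keep only ids present in every matrix.
    return pd.concat(parts, axis=1, join="inner")


if __name__ == "__main__":
    # Toy stand-ins for per-window feature matrices (not real Featuretools output).
    idx = pd.Index([1, 2], name="customer_id")
    fake = {
        "1 hour": pd.DataFrame({"COUNT(sessions)": [1, 3]}, index=idx),
        "1 day": pd.DataFrame({"COUNT(sessions)": [3, 3]}, index=idx),
    }
    print(combine_windows(lambda w: fake[w], {"1h": "1 hour", "1d": "1 day"}))
```

With Featuretools, compute_fn could be lambda w: ft.calculate_feature_matrix(features, entityset=es, cutoff_time=cutoff_times, training_window=w), and windows could be {"3m": "90 days", "6m": "180 days", "9m": "270 days", "12m": "365 days"}. The day counts only approximate calendar months; whether month-based strings such as "3 months" are accepted depends on your Featuretools version, so check that before relying on them. join="inner" keeps only customers present in every matrix, matching the default inner merge used in the answer.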