

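I referred to these threads, "Create table with all pairs of values from one column in R, counting unique values" and "Table of Interactions - Case with pets and houses", to understand how to create a two-way interaction table. How can I do this for all possible combinations? In addition, I want to find the frequency of occurrence and the revenue in each of these bins (combinations).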


   Customer      Product Revenue
1         A         Rice      10
2         A Sweet Potato       2
3         A       Walnut       4
4         B         Rice       3
5         B       Walnut       2
6         C       Walnut       3
7         C Sweet Potato       4
8         D         Rice       3
9         E Sweet Potato       4
10        F       Walnut       7
11        G         Rice       2
12        G Sweet Potato       3
13        H Sweet Potato       4
14        H       Walnut       6
15        I         Rice       2

DFI <- structure(list(Customer = c("A", "A", "A", "B", "B", "C", "C", 
"D", "E", "F", "G", "G", "H", "H", "I"), Product = c("Rice", 
"Sweet Potato", "Walnut", "Rice", "Walnut", "Walnut", "Sweet Potato", 
"Rice", "Sweet Potato", "Walnut", "Rice", "Sweet Potato", "Sweet Potato", 
"Walnut", "Rice"), Revenue = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7, 
2, 3, 4, 6, 2)), .Names = c("Customer", "Product", "Revenue"), row.names = c(NA, 
15L), class = "data.frame")

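Here is the code that generates all combinations of the products Rice, Sweet Potato and Walnut:

do.call(c, lapply(seq_along(unique(DFI$Product)),
  combn, x = unique(DFI$Product), simplify = FALSE))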

[1] "Rice"

[1] "Sweet Potato"

[1] "Walnut"

[1] "Rice"         "Sweet Potato"

[1] "Rice"   "Walnut"

[1] "Sweet Potato" "Walnut"      

[1] "Rice"         "Sweet Potato" "Walnut"      


Desired output, frequency of occurrence per combination:

  Combination Frequency
1           R         2
2           S         1
3           W         1
4         R,S         1
5         S,W         2
6         R,W         1
7       R,S,W         1

DFOUTa <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W", 
"R,S,W"), Frequency = c(2, 1, 1, 1, 2, 1, 1)), .Names = c("Combination", 
"Frequency"), row.names = c(NA, 7L), class = "data.frame")


Desired output, revenue per combination:

  Combination Revenue
1           R       5
2           S       4
3           W       7
4         R,S       5
5         S,W      17
6         R,W       5
7       R,S,W      16

DFOUTb <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W", 
"R,S,W"), Revenue = c(5, 4, 7, 5, 17, 5, 16)), .Names = c("Combination", 
"Revenue"), row.names = c(NA, 7L), class = "data.frame")



PS: For brevity, I have shortened the product names Rice, Sweet Potato and Walnut to R, S and W respectively in the output.

r dplyr data.table

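This should get you the frequency and the revenue. I am assuming you want each customer's orders combined into a single combination:

require(data.table); setDT(DFI)

# sort by product first so a customer's combination label does not depend on row order
DFI[order(Product)
  ][, .(Combination= paste(Product, collapse=", "), Revenue = sum(Revenue)) , by=.(Customer)
  ][, .(.N, Revenue= sum(Revenue)), by=.(Combination)]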

                  Combination N Revenue
1: Rice, Sweet Potato, Walnut 1      16
2:               Rice, Walnut 1       5
3:                       Rice 2       5
4:         Rice, Sweet Potato 1       5
5:       Sweet Potato, Walnut 2      17
6:               Sweet Potato 1       4
7:                     Walnut 1       7
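If some combination were never bought, the aggregation above would simply omit it. As a minimal sketch (not part of the original answer), the observed result can be joined onto the full list of combinations built with `combn`, filling unobserved ones with zero. The names `observed`, `all_combos` and `full` are illustrative assumptions. With the sample data every combination occurs, so the join changes nothing here.

```r
library(data.table)

# Input data from the question
DFI <- data.table(
  Customer = c("A","A","A","B","B","C","C","D","E","F","G","G","H","H","I"),
  Product  = c("Rice","Sweet Potato","Walnut","Rice","Walnut","Walnut",
               "Sweet Potato","Rice","Sweet Potato","Walnut","Rice",
               "Sweet Potato","Sweet Potato","Walnut","Rice"),
  Revenue  = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7, 2, 3, 4, 6, 2)
)

# Observed bundles: one label per customer, products sorted so labels are canonical
observed <- DFI[order(Product)
  ][, .(Combination = paste(Product, collapse = ", "), Revenue = sum(Revenue)), by = .(Customer)
  ][, .(N = .N, Revenue = sum(Revenue)), by = .(Combination)]

# Every possible non-empty combination, labelled the same way
prods <- sort(unique(DFI$Product))
all_combos <- data.table(
  Combination = unlist(lapply(seq_along(prods), function(m)
    combn(prods, m, FUN = paste, collapse = ", ")))
)

# Join: keep every possible combination, then zero-fill the unobserved ones
full <- observed[all_combos, on = .(Combination)]
full[is.na(N), `:=`(N = 0L, Revenue = 0)]
full
```

The labels must be built identically on both sides (same sort order, same separator), otherwise the join silently produces zero rows for combinations that do exist.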




library(data.table); setDT(DFI)  # DFI must be a data.table for the syntax below

# spin off product table, assign abbreviations
prodDF = DFI[, .(Product = unique(Product))][, prod := substr(Product, 1, 1)]
DFI[prodDF, on=.(Product), prod := i.prod]

# spin off customer table, assign their bundles and revenues
custDF = DFI[order(prod), .(Bundle = toString(prod)), keyby=Customer]    
custDF[DFI[, sum(Revenue), by=.(Customer)], rev := i.V1]

# aggregate from customers to bundles
res = custDF[, .(.N, rev = sum(rev)), keyby=Bundle]

# clean up extra columns
DFI[, prod := NULL]


    Bundle N rev
1:       R 2   5
2:    R, S 1   5
3: R, S, W 1  16
4:    R, W 1   5
5:       S 1   4
6:    S, W 2  17
7:       W 1   7

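This is very similar to @Mako's answer, but...

  1. Both of my aggregations use ?GForce when summing revenue, whereas Mako's customer-level revenue sum does not.
  2. This leaves a customer table behind, which you can inspect or merge with other customer attributes (if you have any); ditto for the product table.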
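To illustrate the second point, here is a rough sketch of merging extra customer attributes onto the customer table and aggregating by them. The `custAttr` table and its `Region` column are hypothetical and invented purely for illustration; nothing like them appears in the question.

```r
library(data.table)

# Input data from the question
DFI <- data.table(
  Customer = c("A","A","A","B","B","C","C","D","E","F","G","G","H","H","I"),
  Product  = c("Rice","Sweet Potato","Walnut","Rice","Walnut","Walnut",
               "Sweet Potato","Rice","Sweet Potato","Walnut","Rice",
               "Sweet Potato","Sweet Potato","Walnut","Rice"),
  Revenue  = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7, 2, 3, 4, 6, 2)
)

# Customer table, built along the lines of the answer above: one row per customer
DFI[, prod := substr(Product, 1, 1)]
custDF <- DFI[order(prod), .(Bundle = toString(prod)), keyby = Customer]
custDF[DFI[, .(rev = sum(Revenue)), by = Customer], on = .(Customer), rev := i.rev]

# HYPOTHETICAL customer attributes, made up for this example
custAttr <- data.table(
  Customer = c("A","B","C","D","E","F","G","H","I"),
  Region   = c("North","North","South","South","North","South","North","South","North")
)

# Update join: add Region to the customer table by reference
custDF[custAttr, on = .(Customer), Region := i.Region]

# Aggregate from customers to bundles within each region
resByRegion <- custDF[, .(N = .N, rev = sum(rev)), keyby = .(Region, Bundle)]
resByRegion
```

On the first point, passing `verbose = TRUE` inside a `[.data.table` call should print whether the `j` expression was GForce-optimized, which is one way to compare the two approaches.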

