Error: when using group_by() and summarise() in R, why does dplyr not read a factor level in my data frame?

Question

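I have a large data frame (388 x 729), and I am trying to calculate the mean of a numeric column called "Daffodil_Bulbs" for each month (over 14 years), where the month is stored as a factor.

I created a vector so that the months are output in the correct order, but when I run my R code with the dplyr package, it does not read the month "July" and replaces it with "NA" (see the R code output below).

I have checked my data frame, and it contains no NA or missing values.

Does anyone know how to solve this?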

R code:

library(dplyr)

# Create a vector so the months are in the right order
month_levels <- c('January', 'February', 'March', 'April', 'May', 'June', 'July',
                  'August', 'September', 'October', 'November', 'December')

# Use dplyr to find the average number of bulbs per month
Df_Average_Month <- MyDf %>%
  mutate(Month = ordered(Month, levels = month_levels)) %>%
  group_by(Month) %>%
  summarise(Average_Daffodils = mean(Daffodil_Bulbs, na.rm = TRUE))

Output of the month vector

> month_levels = c('January', 'February', 'March', 'April', 'May', 'June', 'July',
+                  'August', 'September', 'October', 'November', 'December')

Data frame structure

$ Month                              : Factor w/ 18 levels "April","April ",..: 9 8 8 8 8 8 8 8 8 1 ...
$ Daffodil Bulbs                     : num  0 3 0 3 2 1 0 0 0 0 ...

R code output

# A tibble: 12 × 2
   Month     Average_Daffodils
   <ord>                  <dbl>
 1 January                11.4 
 2 February               11.3 
 3 March                  12.4 
 4 April                   8.67
 5 May                    12.6 
 6 June                   12.5 
 7 August                  9.67
 8 September              12.7 
 9 October                 9.92
10 November                9.19
11 December               10.8 
12 NA                     16.3

Answer 1

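It looks like dplyr may skip factor levels that have no corresponding data in your groups. Make sure every level you expect is actually present in the dataset. Note that the structure output shows 18 levels for Month, including "April " with a trailing space: any value that does not exactly match an entry in month_levels becomes NA when it is passed to ordered(), which would explain an NA group even though the original column has no missing values. Consider using droplevels() to clean up any unused factor levels, and check for NA values that might affect your grouping.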
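
As a rough illustration of the level check described above, here is a minimal sketch. It assumes the problem comes from stray whitespace in the month labels, which the "April " level in the structure output hints at but does not prove for July. The data frame toy is a hypothetical stand-in for MyDf, and trimming with trimws() is one possible fix, not necessarily the right one for the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for MyDf: some labels carry a trailing space
toy <- data.frame(
  Month = factor(c("June", "July ", "July ", "August", "April ", "April")),
  Daffodil_Bulbs = c(2, 4, 6, 1, 3, 5)
)

# month.name is base R's built-in vector "January" ... "December"
month_levels <- month.name

# Labels present in the data that match none of the target levels;
# ordered(..., levels = month_levels) would silently turn these into NA
unmatched <- setdiff(levels(toy$Month), month_levels)
print(unmatched)

# Trim surrounding whitespace before building the ordered factor
fixed <- toy %>%
  mutate(Month = ordered(trimws(as.character(Month)), levels = month_levels)) %>%
  group_by(Month) %>%
  summarise(Average_Daffodils = mean(Daffodil_Bulbs, na.rm = TRUE))
print(fixed)
```

Running setdiff(levels(MyDf$Month), month_levels) on the real data should list whichever labels are being converted to NA. If they turn out to be misspellings rather than whitespace variants, they would need to be recoded explicitly instead of trimmed.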
