如何从 r 中使用 lapply() 的分割数据框中删除多个部分？

Question

我创建了一个函数，在 lapply() 中使用一个已经分割的数据框。问题是，如果分割数据帧的某些条件缺失，则 lapply() 不会完全运行该函数。或者有没有办法通过 lapply() 删除包含 NA 的部分？另一个问题是，在使用 lapply() 代码之前我无法删除带有 NA 的部分，因为 lapply() 希望 split() 函数中的所有条件都存在，否则它会给出与存在 NA 部分相同的错误.

我尝试使用以下解决方案删除一个拆分部分，并且它有效：

数据结构：

Participant   visit  trial  tl    tr    br    bl    time  CoPx  CoPy  
1   006        FU1   lc     81    7     6     1      1    -0.8   -13      
2   006        FU1   lo     79    7     8     1      2    -0.7   -15      
3   006        FU1   rc     7     9     9     0      3    -0.7   -51  
4   006        FU1   ro     9     8     3     1      4    -0.4   -15  
5   006        FU2   lc     9     4     8     1      5    -0.5   -17  
6   006        FU2   lo     79    7     2     10     6    -0.5   -17  
7   006        FU2   rc     7     4     9     1      7    -0.3   -86  
8   006        FU2   ro     7     4     7     13     8    -0.8   -200  
9   009        FU1   lc     81    7     6     1      1    -0.8   -13  
10  009        FU1   lo     79    7     8     1      2    -0.7   -15  
11  009        FU1   rc     7     9     9     0      3    -0.7   -51  
12  009        FU1   ro     9     8     3     1      4    -0.4   -15  
13  009        FU2   lc     9     4     8     1      5    -0.5   -17  
14  009        FU2   lo     79    7     2     10     6    -0.5   -17  
15  009        FU2   rc     NA    NA    NA    NA     NA    NA     NA  
16  009        FU2   ro     7     4     7     13     8    -0.8   -200

代码：

`minifunc.area <- function(df){
# This mini function is designed to extract the center of the ellipse and the area of the 95%         confidence interval
# The code in this function comes from https://stackoverflow.com/questions/38782051/how-to-    calculate-the-area-of-ellipse-drawn-by-ggplot2

# remove NA's before running filter
#df <- drop_na(df)

# Plot object
p <- ggplot(df, aes(x = CoPx, y = CoPy))+
  geom_point()+
  stat_ellipse(level = 0.95) # 95% confidence interval

# Get ellipse coordinates from plot
pb <- ggplot_build(p)
pb <- as.data.frame(pb$data[[2]])
el <- pb[,1:2]

# Center of ellipse
ctr <- MASS::cov.trob(el)$center  


  # Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
# These are, respectively, the largest and smallest values of dist2center. 
answer <- pi * min(dist2center) * max(dist2center)

} # end of mini function
  

splt.df <- split(df, list(df$trial, df$Participant))

`

输出示例：

$lo.FU2.006
# A tibble: 6,041 × 12
# Groups:   Participant, visit, trial [1]
  Participant visit trial    tl    tr    br    bl  time   CoPx   CoPy
  <chr>       <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int>  <dbl>  <dbl>
1 006         FU2   lo     99.4  75.6  67.1  88.3     1 -0.678  -1.198
2 006         FU2   lo     67.5  98.3  44.8  85.6     2 -0.375  -0.375


$rc.FU2.009
# A tibble: 6,041 × 12
# Groups:   Participant, visit, trial [1]
  Participant visit trial    tl    tr    br    bl  time   CoPx   CoPy
  <chr>       <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int>  <dbl>  <dbl>
1 009         FU2   rc       NA    NA    NA    NA    NA    NA    NA

# Figure out which df has missing data and define which row is missing
# removing duplicated trials per participant
temp.df <- df %>%
  distinct(Participant, trial)
# creating a new column with assenting numbers to count row number
temp.df$row.nr <- c(1:nrow(temp.df))

# identify which row has the missing value
save.na <- temp.df %>%
  dplyr::filter(Participant == p.nr & trial == trial.name)

na.row.nr <- unlist(save.na1[3]) # chr with the section number that needs to be removed from the split section (example: '15')

# apply the mini function to the splt data
splt <- lapply(splt.df[-na.row.nr], function(splt.df) minifunc.area(splt.df))

上面的方法有效，但是，现在我想做同样的事情，但缺少两个部分。

我已经尝试过以下尝试：

# Attempt 1)  
 na.row.nr <- unlist(save.na1[3])
 # where na.row.nr[1] is the first section to be removed and na.row.nr[2] is the second

# apply the mini function to the splt data
splt <- lapply(splt.df[-c(na.row.nr[1], na.row.nr[2])], function(splt.df) minifunc.area(splt.df))

 # Attempt 2) 
 na.row.nr <- unlist(save.na1[3])
 # where na.row.nr[1] is the first section to be removed and na.row.nr[2] is the second

# apply the mini function to the splt data
splt <- lapply(splt.df[-na.row.nr[1], -na.row.nr[2]], function(splt.df) minifunc.area(splt.df))

# Attempt 3) 
 na.row.nr <- unlist(save.na1[3])
 # where na.row.nr[1] is the first section to be removed and na.row.nr[2] is the second

 missing.rows <- c(na.row.nr[1], na.row.nr[2])

# apply the mini function to the splt data
splt <- lapply(splt.df[-na.row.nr[1] & -na.row.nr[2]], function(splt.df) minifunc.area(splt.df))

输出：

Error in `[.data.frame`(pb, , 1:2) : undefined columns selected
In addition: Warning message:
Removed 1 rows containing non-finite values (`stat_ellipse()`).

根据我的理解，这意味着 lapply() 无法遍历所有部分，并且一旦到达包含 NA 或不存在的部分就会停止

Answer 1

您可以使用

if()

在错误发生之前捕获错误，或者如评论中所述，使用

purrr

中的 safely()。

我认为椭圆至少需要 3 个点，所以你可以用

if()

来捕捉这种情况：

minifunc.area <- function(df){

  if(nrow(df)<3) {
    cat("Not enough rows in data to calculate an ellipse\n")
    return(NA_real_)
  }

如果没有足够的独特点，

stat_ellipse()

功能也可能会失败，因此您可以使用

if()

再次捕获：

  # Get ellipse coordinates from plot
  pb <- ggplot_build(p)
  pb <- as.data.frame(pb$data[[2]])

  if(nrow(pb)<3 | ncol(pb)<2) return(NA_real_)

这应该足以处理大多数情况。让我们尝试使用整个数据集。

minifunc.area(df)
[1] 1031.753
Warning message:
Removed 1 row containing non-finite outside the scale range (`stat_ellipse()`).

现在让我们按一些变量进行划分。我将在这里使用

sapply

，以便输出更适合屏幕。

按参与者划分：

splt.df <- split(df, df$Participant)
sapply(splt.df, minifunc.area)
       6        9 
291.9459 175.9905

按试验分割：

splt.df <- split(df, df$trial)
sapply(splt.df, minifunc.area)
      lc       lo       rc       ro 
12.24941       NA       NA       NA

按参与者和试验划分：

splt.df <- split(df, list(df$trial, df$Participant))
lapply(splt.df, minifunc.area)
lc.6 lo.6 rc.6 ro.6 lc.9 lo.9 rc.9 ro.9 
  NA   NA   NA   NA   NA   NA   NA   NA

如何从 r 中使用 lapply() 的分割数据框中删除多个部分？

问题描述投票：0回答：1

1个回答

最新问题

如何从 r 中使用 lapply() 的分割数据框中删除多个部分？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1