如何从 r 中使用 lapply() 的分割数据框中删除多个部分?

问题描述 投票:0回答:1

我创建了一个函数,在 lapply() 中使用一个已经分割的数据框。问题是,如果分割数据帧的某些条件缺失,则 lapply() 不会完全运行该函数。或者有没有办法通过 lapply() 删除包含 NA 的部分? 另一个问题是,在使用 lapply() 代码之前我无法删除带有 NA 的部分,因为 lapply() 希望 split() 函数中的所有条件都存在,否则它会给出与存在 NA 部分相同的错误.

我尝试使用以下解决方案删除一个拆分部分,并且它有效:

数据结构:

Participant   visit  trial  tl    tr    br    bl    time  CoPx  CoPy  
1   006        FU1   lc     81    7     6     1      1    -0.8   -13      
2   006        FU1   lo     79    7     8     1      2    -0.7   -15      
3   006        FU1   rc     7     9     9     0      3    -0.7   -51  
4   006        FU1   ro     9     8     3     1      4    -0.4   -15  
5   006        FU2   lc     9     4     8     1      5    -0.5   -17  
6   006        FU2   lo     79    7     2     10     6    -0.5   -17  
7   006        FU2   rc     7     4     9     1      7    -0.3   -86  
8   006        FU2   ro     7     4     7     13     8    -0.8   -200  
9   009        FU1   lc     81    7     6     1      1    -0.8   -13  
10  009        FU1   lo     79    7     8     1      2    -0.7   -15  
11  009        FU1   rc     7     9     9     0      3    -0.7   -51  
12  009        FU1   ro     9     8     3     1      4    -0.4   -15  
13  009        FU2   lc     9     4     8     1      5    -0.5   -17  
14  009        FU2   lo     79    7     2     10     6    -0.5   -17  
15  009        FU2   rc     NA    NA    NA    NA     NA    NA     NA  
16  009        FU2   ro     7     4     7     13     8    -0.8   -200

代码:

`minifunc.area <- function(df){
# This mini function is designed to extract the center of the ellipse and the area of the 95%         confidence interval
# The code in this function comes from https://stackoverflow.com/questions/38782051/how-to-    calculate-the-area-of-ellipse-drawn-by-ggplot2

# remove NA's before running filter
#df <- drop_na(df)

# Plot object
p <- ggplot(df, aes(x = CoPx, y = CoPy))+
  geom_point()+
  stat_ellipse(level = 0.95) # 95% confidence interval

# Get ellipse coordinates from plot
pb <- ggplot_build(p)
pb <- as.data.frame(pb$data[[2]])
el <- pb[,1:2]

# Center of ellipse
ctr <- MASS::cov.trob(el)$center  


  # Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
# These are, respectively, the largest and smallest values of dist2center. 
answer <- pi * min(dist2center) * max(dist2center)

} # end of mini function
  

splt.df <- split(df, list(df$trial, df$Participant))

`

输出示例:

$lo.FU2.006
# A tibble: 6,041 × 12
# Groups:   Participant, visit, trial [1]
  Participant visit trial    tl    tr    br    bl  time   CoPx   CoPy
  <chr>       <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int>  <dbl>  <dbl>
1 006         FU2   lo     99.4  75.6  67.1  88.3     1 -0.678  -1.198
2 006         FU2   lo     67.5  98.3  44.8  85.6     2 -0.375  -0.375


$rc.FU2.009
# A tibble: 6,041 × 12
# Groups:   Participant, visit, trial [1]
  Participant visit trial    tl    tr    br    bl  time   CoPx   CoPy
  <chr>       <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int>  <dbl>  <dbl>
1 009         FU2   rc       NA    NA    NA    NA    NA    NA    NA
# Figure out which df has missing data and define which row is missing
# removing duplicated trials per participant
temp.df <- df %>%
  distinct(Participant, trial)
# creating a new column with assenting numbers to count row number
temp.df$row.nr <- c(1:nrow(temp.df))

# identify which row has the missing value
save.na <- temp.df %>%
  dplyr::filter(Participant == p.nr & trial == trial.name)

na.row.nr <- unlist(save.na1[3]) # chr with the section number that needs to be removed from the split section (example: '15')

# apply the mini function to the splt data
splt <- lapply(splt.df[-na.row.nr], function(splt.df) minifunc.area(splt.df))

上面的方法有效,但是,现在我想做同样的事情,但缺少两个部分。

我已经尝试过以下尝试:

# Attempt 1)  
 na.row.nr <- unlist(save.na1[3])
 # where na.row.nr[1] is the first section to be removed and na.row.nr[2] is the second

# apply the mini function to the splt data
splt <- lapply(splt.df[-c(na.row.nr[1], na.row.nr[2])], function(splt.df) minifunc.area(splt.df))

 # Attempt 2) 
 na.row.nr <- unlist(save.na1[3])
 # where na.row.nr[1] is the first section to be removed and na.row.nr[2] is the second

# apply the mini function to the splt data
splt <- lapply(splt.df[-na.row.nr[1], -na.row.nr[2]], function(splt.df) minifunc.area(splt.df))

# Attempt 3) 
 na.row.nr <- unlist(save.na1[3])
 # where na.row.nr[1] is the first section to be removed and na.row.nr[2] is the second

 missing.rows <- c(na.row.nr[1], na.row.nr[2])

# apply the mini function to the splt data
splt <- lapply(splt.df[-na.row.nr[1] & -na.row.nr[2]], function(splt.df) minifunc.area(splt.df))

输出:

Error in `[.data.frame`(pb, , 1:2) : undefined columns selected
In addition: Warning message:
Removed 1 rows containing non-finite values (`stat_ellipse()`). 

根据我的理解,这意味着 lapply() 无法遍历所有部分,并且一旦到达包含 NA 或不存在的部分就会停止

r function split lapply
1个回答
0
投票

您可以使用

if()
在错误发生之前捕获错误,或者如评论中所述,使用
purrr
中的 safely()

我认为椭圆至少需要 3 个点,所以你可以用

if()
来捕捉这种情况:

minifunc.area <- function(df){

  if(nrow(df)<3) {
    cat("Not enough rows in data to calculate an ellipse\n")
    return(NA_real_)
  }

如果没有足够的独特点,

stat_ellipse()
功能也可能会失败,因此您可以使用
if()
再次捕获:

  # Get ellipse coordinates from plot
  pb <- ggplot_build(p)
  pb <- as.data.frame(pb$data[[2]])

  if(nrow(pb)<3 | ncol(pb)<2) return(NA_real_)

这应该足以处理大多数情况。让我们尝试使用整个数据集。

minifunc.area(df)
[1] 1031.753
Warning message:
Removed 1 row containing non-finite outside the scale range (`stat_ellipse()`).

现在让我们按一些变量进行划分。我将在这里使用

sapply
,以便输出更适合屏幕。

按参与者划分:

splt.df <- split(df, df$Participant)
sapply(splt.df, minifunc.area)
       6        9 
291.9459 175.9905

按试验分割:

splt.df <- split(df, df$trial)
sapply(splt.df, minifunc.area)
      lc       lo       rc       ro 
12.24941       NA       NA       NA

按参与者和试验划分:

splt.df <- split(df, list(df$trial, df$Participant))
lapply(splt.df, minifunc.area)
lc.6 lo.6 rc.6 ro.6 lc.9 lo.9 rc.9 ro.9 
  NA   NA   NA   NA   NA   NA   NA   NA 
© www.soinside.com 2019 - 2024. All rights reserved.