The mean computed by my custom function is wrong, and I can't find the problem in my code

Question

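I recently started learning R. I wrote a function called `pollutantmean` that computes the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. `pollutantmean` takes three arguments: `directory`, `pollutant`, and `id`. Given a vector of monitor ID numbers, `pollutantmean` reads those monitors' particulate matter data from the directory given in the `directory` argument and returns the mean of the pollutant across all of those monitors, ignoring any missing values coded as `NA`.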

I wrote the function below, but the mean it computes is wrong, and I can't pinpoint the problem.

```r
Pollutantmean <- function(directory, pollutant, id=1:332) {
  directory <- setwd("C:/Users/raza zaidi/Desktop/Data_Science/New_Rproject/specdata")
  
  # Created an empty data.frame
  databox <- data.frame()
  
  # Used loop function to read through all csv files
  for (data_files in list.files(pattern= "*.csv")) {
    allcsv <- read.csv(data_files, header = TRUE)
  }
  
  # I combined all csv files into single object
  combined_data <- rbind(databox, allcsv)
  pollutant <- combined_data[, c("sulfate", "nitrate")]
  subset_df <- subset(combined_data, select = c("ID"))
  
  # Filtered the rows of the data frame to include only the specified id values
  subset_df <- subset(subset_df, id %in% ID)
  total <- sum(pollutant, na.rm = TRUE)
  
  # Computed the mean of the pollutant values
  mean_value <- total / length(id)
  
  # Return the mean value
  return(mean_value)
}
```

If I run the following:

```r
pollutantmean("specdata", "sulfate", 1:10)
```

the answer should be:

```
[1] 4.064128
```

but I'm getting:

```
[1] 2.3275
```

I can't figure out where my calculation goes wrong. Can anyone help me find the problem?

Answer 1

Seeing the data, or a minimal reproducible example, would help check your "should be" answer.

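However, judging from the code, the problem is most likely caused by `NA` values. You ignore them when computing the `sum`, but not when computing the `length`: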

```r
> x <- rnorm(10)
> x[c(1, 5)] <- NA_real_
> sum(x, na.rm = TRUE) / length(x)
[1] 0.8789583
> mean(x, na.rm = TRUE)
[1] 1.098698
> sum(x, na.rm = TRUE) / length(x[!is.na(x)])
[1] 1.098698
```

There are several other problems in your code:

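- `directory` is passed in as a function argument, but it is also overwritten on the first line.
- `databox` is redundant.
- `allcsv` is overwritten on every iteration of the loop, so the function does not use all the files, only the last one. (This also affects your result.)
- `sum(pollutant, na.rm = TRUE)` sums over a data frame with two columns (`sulfate` and `nitrate`), which is probably not the intended behavior. (This also affects your result.)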

```r
> df = data.frame(a = 1:3, b = 1:3)
> sum(df)
[1] 12
> colSums(df)
a b 
6 6
```
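Putting these points together, here is a minimal sketch of one possible corrected version. It assumes, as in the question, that every `.csv` file in `directory` has an `ID` column plus `sulfate` and `nitrate` columns, and it filters rows by the `ID` column rather than by file name. The toy directory, file names, and values in the usage part are hypothetical, made up for illustration; this has not been checked against the real `specdata` files or the expected `4.064128`.

```r
# Sketch of a corrected pollutantmean(); assumes each CSV has ID, sulfate and nitrate columns
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # List the CSV files with their full paths, so setwd() is not needed
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # Read every file and stack them once, instead of overwriting one variable per iteration
  combined <- do.call(rbind, lapply(files, read.csv, header = TRUE))
  # Keep only the rows whose ID is in the requested set of monitors
  selected <- combined[combined$ID %in% id, , drop = FALSE]
  # Take the single requested column by name ("sulfate" or "nitrate")
  values <- selected[[pollutant]]
  # mean() with na.rm = TRUE divides by the number of non-missing values only
  mean(values, na.rm = TRUE)
}

# Usage with a hypothetical toy directory (not the real specdata files)
toy_dir <- file.path(tempdir(), "toy_specdata")
dir.create(toy_dir, showWarnings = FALSE)
write.csv(data.frame(ID = 1, sulfate = c(1, NA), nitrate = c(2, 2)),
          file.path(toy_dir, "monitor_a.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, sulfate = c(3, 5), nitrate = c(NA, 4)),
          file.path(toy_dir, "monitor_b.csv"), row.names = FALSE)

pollutantmean(toy_dir, "sulfate", 1:2)  # (1 + 3 + 5) / 3 = 3
pollutantmean(toy_dir, "sulfate", 1)    # 1
```

Selecting rows by the `ID` column keeps the sketch independent of how the files are named; if the files turn out to be named after the monitor IDs, reading only the requested files would avoid loading everything.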
