A directory contains a number of files:
cpu_server01.csv
cpu_server02.csv
cpu_server03.csv
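and so on.
I can read the contents of the files and append them to dflist as shown below. However, I also need to create another column in each data frame of dflist and put the file name there.
path <- "C:/Server/web/"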
#cpu
filenames <- list.files(path, pattern="cpu_*", full.names=TRUE)
dflist <- lapply(filenames, function(i) {
read.csv(i, header=T)
})
How can I add each file's name to its corresponding data frame, so that the columns look like this?
Date Cpu filename
This should work:
# seq_along() is safe even if dflist is empty, unlike 1:length(dflist)
for (i in seq_along(dflist))
  dflist[[i]]$file_name <- filenames[i]
Example:
filenames=c("a","b")
dflist = list(head(mtcars,3),head(mtcars,3))
for (i in seq_along(dflist))
  dflist[[i]]$file_name <- filenames[i]
Output:
[[1]]
               mpg cyl disp  hp drat    wt  qsec vs am gear carb file_name
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         a
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4         a
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1         a

[[2]]
               mpg cyl disp  hp drat    wt  qsec vs am gear carb file_name
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         b
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4         b
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1         b
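The same result can also be had without an explicit loop. This is only a minimal sketch on the toy data from the example; add_file_name is a hypothetical helper name introduced here for illustration, not something from the question.

```r
# Toy data standing in for the data frames read from disk
filenames <- c("a", "b")
dflist <- list(head(mtcars, 3), head(mtcars, 3))

# Hypothetical helper: return a copy of df with the file name in a new column
add_file_name <- function(df, file_name) {
  df$file_name <- file_name
  df
}

# Map() walks over both inputs in parallel and returns a list
dflist <- Map(add_file_name, dflist, filenames)
```

Whether this reads better than the for loop is a matter of taste; both leave dflist as a list of data frames with one extra column each.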
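In addition to Florian's answer, there are two alternative ways to handle this common situation.
IMHO, copying the file name into a column of each individual data.frame only makes sense if you plan to rbind() them into one large data object (see the example below).
If you want to keep each data.frame separate in the list, you can name the list elements appropriately, e.g.,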
path <- "."
# get vector of filenames, note that pattern includes the csv extension
filenames <- list.files(path, pattern = "cpu_.*csv$", full.names = TRUE)
# read files as a list of data.frames
dflist <- lapply(filenames, read.csv, header = TRUE)
# rename list element using file names without path
names(dflist) <- basename(filenames)
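Note that there is no need to define an anonymous function in the call to lapply(), because lapply() passes any arguments it does not recognize on to the called function. So, we can write concisely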
lapply(filenames, read.csv, header = TRUE)
instead of
lapply(filenames, function(i) read.csv(i, header = TRUE))
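To see the argument forwarding in isolation, here is a small sketch that needs no files. scale_by is a made-up toy function used only for this illustration.

```r
# Hypothetical toy function with one extra argument
scale_by <- function(x, factor) x * factor

# Extra named arguments after the function are forwarded through `...`
short_form <- lapply(1:3, scale_by, factor = 10)

# Equivalent call with an explicit anonymous function
long_form <- lapply(1:3, function(i) scale_by(i, factor = 10))

# Both forms should give the same list
identical(short_form, long_form)
```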
Now, dflist is appropriately named:
$cpu_server01.csv
  V1   V2
1  A 1001
2  B 1002
3  C 1003

$cpu_server02.csv
  V1   V2
1  A 2001
2  B 2002
3  C 2003

$cpu_server03.csv
  V1   V2
1  A 3001
2  B 3002
3  C 3003
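Once the list is named, individual data.frames can be picked by file name. The sketch below uses a small in-memory stand-in for the list instead of reading files. Dropping the extension with tools::file_path_sans_ext() is an optional extra not mentioned above.

```r
# Toy stand-in for a list named after its source files
dflist <- list(
  data.frame(V1 = c("A", "B"), V2 = c(1001L, 1002L)),
  data.frame(V1 = c("A", "B"), V2 = c(2001L, 2002L))
)
names(dflist) <- c("cpu_server01.csv", "cpu_server02.csv")

# Pick a single data.frame by the name of its source file
server02 <- dflist[["cpu_server02.csv"]]

# Optionally drop the extension so that `$` works without backticks
names(dflist) <- tools::file_path_sans_ext(names(dflist))
server01 <- dflist$cpu_server01
```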
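If the goal is to combine all the data chunks into one large data object, the source file of each row needs to be identifiable.
This can be achieved with Florian's approach followed by rbind-ing. Alternatively, we can use the rbindlist() function from data.table.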
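For comparison, the first route (a loop that adds the column, followed by rbind()) could look roughly like this in base R. It is a sketch on in-memory toy data, so the file names and values are placeholders rather than files actually read from disk.

```r
# Toy stand-ins for the files read from disk
filenames <- c("cpu_server01.csv", "cpu_server02.csv")
dflist <- list(
  data.frame(V1 = c("A", "B"), V2 = c(1001L, 1002L)),
  data.frame(V1 = c("A", "B"), V2 = c(2001L, 2002L))
)

# Add the source file to every data.frame ...
for (i in seq_along(dflist)) {
  dflist[[i]]$file_name <- filenames[i]
}

# ... then stack them into a single data.frame
combined <- do.call(rbind, dflist)
```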
If the list elements have already been named as above, we can simply add:
combi <- data.table::rbindlist(dflist, idcol = "file.name")
combi
          file.name V1   V2
1: cpu_server01.csv  A 1001
2: cpu_server01.csv  B 1002
3: cpu_server01.csv  C 1003
4: cpu_server02.csv  A 2001
5: cpu_server02.csv  B 2002
6: cpu_server02.csv  C 2003
7: cpu_server03.csv  A 3001
8: cpu_server03.csv  B 3002
9: cpu_server03.csv  C 3003
rbindlist() has created the id column "file.name" and filled it with the names of the list elements.
Alternatively, we can call rbindlist() first and then add the file names as a factor:
library(data.table)
path <- "."
# get vector of filenames, note that pattern includes the csv extension
filenames <- list.files(path, pattern = "cpu_.*csv$", full.names = TRUE)
# read files as a list of data.frames and combine immediately
combi <- rbindlist(lapply(filenames, read.csv, header = TRUE), idcol = "file.name")
# change file number to appropriately labeled factor
combi[, file.name := factor(file.name, labels = basename(filenames))][]
          file.name V1   V2
1: cpu_server01.csv  A 1001
2: cpu_server01.csv  B 1002
3: cpu_server01.csv  C 1003
4: cpu_server02.csv  A 2001
5: cpu_server02.csv  B 2002
6: cpu_server02.csv  C 2003
7: cpu_server03.csv  A 3001
8: cpu_server03.csv  B 3002
9: cpu_server03.csv  C 3003
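The relabelling works because, for an unnamed list, rbindlist() fills the id column with the position of each list element. A possible variation is to look the name up by that position instead of relabelling a factor. The sketch below assumes the data.table package is installed and uses in-memory toy data; the column name file.id and the vector file_names are names chosen here for illustration. I would expect the lookup to be a bit more forgiving if one of the files happened to contain no rows, since its position would then be missing from the id column and the number of factor labels would no longer match the number of levels.

```r
library(data.table)

# Toy stand-ins; the list is deliberately left unnamed
file_names <- c("cpu_server01.csv", "cpu_server02.csv")
dflist <- list(
  data.frame(V1 = c("A", "B"), V2 = c(1001L, 1002L)),
  data.frame(V1 = c("A", "B"), V2 = c(2001L, 2002L))
)

# With an unnamed list, the id column holds the position of each element
combi <- rbindlist(dflist, idcol = "file.id")

# Look up the file name by position instead of relabelling a factor
combi[, file.name := file_names[file.id]]
```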
For reproducibility, the dummy files were created by
idx_vec <- 1:3
invisible(sapply(1:3, function(i) {
x <- data.frame(V1 = LETTERS[idx_vec], V2 = 1000L * i + idx_vec)
write.csv(x, sprintf("cpu_server%02i.csv", i), row.names = FALSE)
}))
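To try the whole round trip without cluttering the working directory, the dummy files can be written to a temporary folder, read back, and removed again. This is a base R sketch; the folder name cpu_demo is arbitrary, and the slightly stricter pattern "^cpu_.*\\.csv$" is my own choice rather than the one used above.

```r
# Work in a throwaway directory so the working directory stays clean
tmp_dir <- file.path(tempdir(), "cpu_demo")
dir.create(tmp_dir, showWarnings = FALSE)

# Write three small dummy files
idx_vec <- 1:3
for (i in idx_vec) {
  x <- data.frame(V1 = LETTERS[idx_vec], V2 = 1000L * i + idx_vec)
  write.csv(x, file.path(tmp_dir, sprintf("cpu_server%02i.csv", i)),
            row.names = FALSE)
}

# Read the files back and name the list elements after their files
filenames <- list.files(tmp_dir, pattern = "^cpu_.*\\.csv$", full.names = TRUE)
dflist <- lapply(filenames, read.csv, header = TRUE)
names(dflist) <- basename(filenames)

# Remove the dummy files again
unlink(tmp_dir, recursive = TRUE)
```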