Check that a table's headers match expectations when using read.csv in R

Question

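I'm trying to add a check step to my R script that verifies whether the structure of the CSV table I'm reading matches what I expect. Details: table.csv has the following names: [1] "A" "B" "C" "D"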

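This file is generated by someone else, so at the start of my script I want to make sure that the column names and the number/order of the columns have not changed.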

I tried the following:

    #dataframes to import
    df_table <- read.csv('table.csv')

    #define correct structure of file
    Correct_Columns <- c('A','B','C','D')
    #read current structure of table
    Current_Columns <- colnames(df_table)

    #Check whether CSV was correctly imported from Source
    if(Current_Columns != Correct_Columns)

    {
    # if structure has changed, stop the script. 
    stop('Imported CSV has a different structure, please review export from Source.')
    } 
    #if not, continue with the rest of the script...

Thanks in advance for your help!

Answer 1

Using base R, I suggest you look at all.equal(), identical() or any().

See the following example:

    a <- c(1,2)
    b <- c(1,2)
    c <- c(1,2)
    d <- c(1,2)
    df <- data.frame(a,b,c,d)

    names.df <- colnames(df)
    names.check <- c("a","b","c","d")

    # all.equal() returns TRUE or a character description of the differences,
    # so wrap it in isTRUE() before negating
    !isTRUE(all.equal(names.df,names.check))
    # [1] FALSE

    !identical(names.df,names.check)
    # [1] FALSE

    any(names.df!=names.check)
    # [1] FALSE
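Why isTRUE() is needed around all.equal(): on a mismatch it does not return FALSE but a character string describing the difference, and negating a string with ! is an error. The sketch below uses made-up header vectors (expected, actual) purely for illustration.

```r
# Illustrative headers that differ in the last position
expected <- c("A", "B", "C", "D")
actual   <- c("A", "B", "C", "E")

# On a mismatch all.equal() returns a character description, not FALSE
res <- all.equal(actual, expected)
print(res)

# Negating that string directly fails with an "invalid argument type" error
negation_failed <- tryCatch({ !res; FALSE }, error = function(e) TRUE)
print(negation_failed)

# isTRUE() turns anything other than a single TRUE into FALSE, so this is safe
headers_differ <- !isTRUE(all.equal(actual, expected))
print(headers_differ)
```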

Applied to your code, it can be modified as follows:

    if(!isTRUE(all.equal(Current_Columns,Correct_Columns)))
    {
    # call your stop statement here
    }

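Your code probably raises a warning because Current_Columns != Correct_Columns compares the vectors element by element (i.e., running Current_Columns != Correct_Columns on its own in the console returns a vector of TRUE/FALSE values), while if() expects a single TRUE or FALSE. Since R 4.2.0, a condition of length greater than one is an error rather than a warning.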
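A small sketch of the element-wise behaviour described above, using an illustrative header in which the third column was renamed. Depending on the R version, the if() call ends in a warning (older R) or an error (R 4.2.0 and later), so the sketch catches either. The last lines show a single-value test that also guards against differing lengths, because != silently recycles the shorter vector.

```r
Correct_Columns <- c("A", "B", "C", "D")
Current_Columns <- c("A", "B", "X", "D")   # illustrative: third column renamed

# != compares position by position and returns one logical per column
cmp <- Current_Columns != Correct_Columns
print(cmp)

# if() needs a single TRUE/FALSE; capture the warning or error it raises
outcome <- tryCatch(
  if (cmp) "mismatch" else "match",
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e))
)
print(outcome)

# A single-value test: check the length first, then collapse with all()
same_header <- length(Current_Columns) == length(Correct_Columns) &&
  all(Current_Columns == Correct_Columns)
print(same_header)
```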

In contrast, all.equal() and identical() compare the whole vectors as objects and give a single result.
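If you also want the error to say what changed, a whole-vector identical() check can be combined with setdiff() to report missing or unexpected columns. The function name check_columns() is hypothetical (not part of base R), and read.csv(text = ...) stands in for reading table.csv; this is a sketch, not the answerer's code.

```r
# check_columns() is a hypothetical helper: it returns TRUE invisibly when the
# header matches exactly (names, count and order) and otherwise stops with details
check_columns <- function(df, expected) {
  actual <- colnames(df)
  if (identical(actual, expected)) {
    return(invisible(TRUE))
  }
  missing <- setdiff(expected, actual)  # expected but absent
  extra   <- setdiff(actual, expected)  # present but not expected
  msg <- "Imported CSV has a different structure, please review export from Source."
  if (length(missing) > 0) {
    msg <- paste0(msg, "\nMissing: ", paste(missing, collapse = ", "))
  }
  if (length(extra) > 0) {
    msg <- paste0(msg, "\nUnexpected: ", paste(extra, collapse = ", "))
  }
  if (length(missing) == 0 && length(extra) == 0) {
    msg <- paste0(msg, "\nSame names, different order: ", paste(actual, collapse = ", "))
  }
  stop(msg, call. = FALSE)
}

# In-memory examples standing in for read.csv('table.csv')
ok_df        <- read.csv(text = "A,B,C,D\n1,2,3,4")
reordered_df <- read.csv(text = "B,A,C,D\n1,2,3,4")

check_columns(ok_df, c("A", "B", "C", "D"))
err <- tryCatch(check_columns(reordered_df, c("A", "B", "C", "D")),
                error = function(e) conditionMessage(e))
cat(err, "\n")
```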

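For completeness, note the subtle difference between all.equal() and identical(). In your case it doesn't matter which one you use, but it becomes important when dealing with numeric vectors. For more information, see here.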
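The numeric difference mentioned above comes from tolerance: identical() demands exact equality, while all.equal() accepts a small relative difference (by default about 1.5e-8). A minimal illustration with floating-point values:

```r
x <- 0.1 + 0.2
y <- 0.3

# identical() requires exact equality, so floating-point rounding makes it FALSE
exact <- identical(x, y)

# all.equal() allows a small tolerance, so the two values count as equal
near <- isTRUE(all.equal(x, y))

print(c(exact = exact, near = near))

# Column names are character vectors, so no tolerance is involved and both agree
headers <- c("A", "B", "C", "D")
print(c(identical = identical(headers, headers),
        all.equal = isTRUE(all.equal(headers, headers))))
```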

Answer 2

A quick approach with data.table:

    library(data.table)
    DT <- fread("table.csv")
    Correct_Columns <- c('A','B','C','D')
    Current_Columns <- colnames(DT)

Check whether any of the pairwise matches failed:

    # parentheses are required: %in% binds more tightly than ==
    if(FALSE %in% (Current_Columns == Correct_Columns)){
      stop('Imported CSV has a different structure, please review export from Source.')
    }
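Operator precedence matters in this check: %in% binds more tightly than ==, so without parentheses R evaluates (FALSE %in% Current_Columns) == Correct_Columns, which yields a vector rather than the intended single answer. The headers below are illustrative.

```r
Correct_Columns <- c("A", "B", "C", "D")
Current_Columns <- c("A", "B", "X", "D")   # illustrative mismatch

# Without parentheses: %in% runs first, then its FALSE is compared with each name
unparenthesised <- FALSE %in% Current_Columns == Correct_Columns
print(unparenthesised)

# With parentheses: compare pairwise first, then ask whether any position failed
has_mismatch <- FALSE %in% (Current_Columns == Correct_Columns)
print(has_mismatch)
```

Note that == still recycles when the two vectors have different lengths, so a length check (or identical()) is the safer guard if columns may be added or dropped.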
