Automatically reading complex multi-row headers in R


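I would like a script that detects and merges (see below) the header rows of a table in R when the header spans several rows, as in the example. A general answer should: 1. determine the number of header rows (two or more); 2. fill the gaps in the header (see the NA values in the example); 3. merge all header rows into a single one.

So far I can only do this manually, see below. It should be possible to do the same for a header with an arbitrary number of rows.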

text1<-"NA      h_row1a NA      NA      NA      h_row1b NA      NA      NA
        NA      h_row2a NA      h_row2b NA      h_row2c NA      h_row2d NA
        NA      h_row3a h_row3b h_row3c h_row3d h_row3e h_row3f h_row3g h_row3h
element1        2       24%     25      40      23      44%     76      34
element2        3       26%     40      86      233     12%     55      12"
library(zoo) # for na.locf()
table1 <- read.table(text = text1, skip = 3, header = FALSE) # data rows only
cat(text1, file = "ex.data")
# read each header row separately
header <- scan("ex.data", nlines = 1, what = character(), sep = "", na.strings = "NA")
header <- na.locf(header, na.rm = FALSE) # this fills the header gaps
header2 <- scan("ex.data", skip = 1, nlines = 1, what = character(), sep = "", na.strings = "NA")
header2 <- na.locf(header2, na.rm = FALSE)
# the third header row has no gaps after its first field, so it is not filled
header3 <- scan("ex.data", skip = 2, nlines = 1, what = character(), sep = "", na.strings = "NA")
names(table1) <- paste0(header, header2, header3)
table1
#    NANANA h_row1ah_row2ah_row3a h_row1ah_row2ah_row3b h_row1ah_row2bh_row3c h_row1ah_row2bh_row3d h_row1bh_row2ch_row3e h_row1bh_row2ch_row3f, etc.
#1 element1                     2                   24%                    25                    40                    23                   44%, etc.
#2 element2                     3                   26%   , etc.
r algorithm
1 Answer

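You could do it like this. It uses rle to count how many leading rows contain no value that can be coerced to numeric, and assumes those are the header rows. I have also set the first column as the row names; I am not sure whether you want that. You may also want to convert the remaining values to numeric once this is done, since at this point they are still character.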
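Before the full code, here is the header-detection step in isolation, as a minimal sketch. The `rows` list and the `is_num` helper are made up for this illustration and are not part of the answer's code. The extra check on `runs$values[1]` is an added assumption for tables that do not start with a header row.

```r
# Hypothetical toy rows: three header-like rows followed by two data rows
rows <- list(
  c(NA, "h_row1a", NA),
  c(NA, "h_row2a", "h_row2b"),
  c(NA, "h_row3a", "h_row3b"),
  c("element1", "2", "24%"),
  c("element2", "3", "26%")
)

# Hypothetical helper: TRUE where a cell can be coerced to a number
# (suppressWarnings hides the "NAs introduced by coercion" warning)
is_num <- function(x) !is.na(suppressWarnings(as.numeric(x)))

# One flag per row: does the row contain at least one numeric-looking cell?
has_numeric <- vapply(rows, function(r) any(is_num(r)), logical(1))

# rle() collapses the flags into runs: FALSE x 3, then TRUE x 2
runs <- rle(has_numeric)

# The header row count is the length of the first run,
# provided that run is FALSE (the table starts with header rows)
n_header <- if (!runs$values[1]) runs$lengths[1] else 0L
n_header
```

The code that follows applies the same idea to every row of the table read from `text1`.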

library(zoo) # for na.locf()
tab <- read.table(text = text1, header = FALSE, stringsAsFactors = FALSE)
# estimate the number of header rows: the leading run of rows with no numeric-looking cell
headrows <- rle(apply(tab, 1, function(x) any(!is.na(suppressWarnings(as.numeric(x))))))$lengths[1]
# fill in the blanks in the header rows (carry the last label forward, left to right)
tab[1:headrows, ] <- t(apply(tab[1:headrows, ], 1, na.locf, na.rm = FALSE))
# merge the header rows into one name per column
names(tab) <- apply(tab[1:headrows, ], 2, paste0, collapse = "_")
tab <- tab[-c(1:headrows), ] # remove header rows (now set as column names)
rownames(tab) <- tab[, 1]
tab <- tab[, -1] # remove first column (now set as rownames)

tab
         h_row1a_h_row2a_h_row3a h_row1a_h_row2a_h_row3b h_row1a_h_row2b_h_row3c h_row1a_h_row2b_h_row3d
element1                       2                     24%                      25                      40
element2                       3                     26%                      40                      86
         h_row1b_h_row2c_h_row3e h_row1b_h_row2c_h_row3f h_row1b_h_row2d_h_row3g h_row1b_h_row2d_h_row3h
element1                      23                     44%                      76                      34
element2                     233                     12%                      55                      12
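
The values printed above are still character. One possible way to do the follow-up conversion to numeric is sketched below. The `tab_chr` data frame is a shortened, hypothetical stand-in for `tab`, and `to_number` is a made-up helper. Treating "24%" as the plain number 24 rather than the fraction 0.24 is an assumption; adjust it if fractions are wanted.

```r
# Hypothetical stand-in for `tab`: every column is character
tab_chr <- data.frame(
  a = c("2", "3"),
  b = c("24%", "26%"),
  c = c("25", "40"),
  row.names = c("element1", "element2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop a trailing "%" and coerce to numeric
# (assumption: 24% becomes 24, not 0.24)
to_number <- function(x) as.numeric(sub("%$", "", x))

# Assigning into tab_num[] keeps the data frame shape, column names and row names
tab_num <- tab_chr
tab_num[] <- lapply(tab_chr, to_number)

str(tab_num)
```

Note that the information about which columns held percentages is lost after this step, so it may be worth recording those column names first if they matter later.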