Reading a data file with staggered column names into R

```
* +---------------- Station Code
* | +----------- Schedule Arrival Day
* | | +-------- Schedule Arrival Time
* | | | +----- Schedule Departure Day
* | | | | +-- Schedule Departure Time
* | | | | | +------------- Actual Arrival Time
* | | | | | | +------- Actual Departure Time
* | | | | | | | +- Comments
* V V V V V V V V
* NOL * * 1 900A * 900A Departed: On time.
* SCH * * 1 1030A * 1039A Departed: 9 minutes late.
* NIB * * 1 1156A * 1159A Departed: 3 minutes late.
* LFT * * 1 1224P * 1228P Departed: 4 minutes late.
* LCH * * 1 155P * 155P Departed: On time.
```

How can I read this file into R when the column names are spread over several different lines?

This isn't an entirely straightforward task, but we can do it along the lines @Tim G suggested:
```r
# 1. Prep work
## read the file as lines of text
data_lines <- readLines("data.txt")
## find the line that starts with "* V" (the column-marker line)
v_line_index <- grep("^\\* V", data_lines)[1]

# 2. Build the vector of column names
## every line above the "* V" line holds one column name
data_col_name_lines <- data_lines[seq_len(v_line_index - 1)]
## strip the diagram characters (*, +, |, -) and spaces, keeping only the names
col_names <- gsub("[-*+| ]", "", data_col_name_lines)
## prepend a name for the leading "*" column
col_names <- c("star", col_names)

# 3. Determine the column widths
## split the "* V" line into individual characters
v_line_index_chars <- strsplit(data_lines[v_line_index], "")[[1]]
## the "*" and each "V" mark where a column starts
v_line_v_indexes <- which(v_line_index_chars %in% c("*", "V"))
## widths are the gaps between start positions, plus a generous last width for Comments
widths <- c(diff(v_line_v_indexes), 100)

# 4. Finally, read the data with read.fwf()
read.fwf(
  "data.txt",
  widths = widths,
  header = FALSE,
  skip = v_line_index,   # skip the name lines and the "* V" line
  col.names = col_names,
  strip.white = TRUE     # drop the padding spaces inside each field
)
```
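To try these steps end to end without the original `data.txt`, here is a minimal sketch that writes a small sample file and parses it. The exact spacing of the original file isn't recoverable from the question as shown, so the column positions in the sample are an assumption, and `read_staggered()` is a hypothetical helper name. It also passes `na.strings = "*"` so the `*` placeholders become `NA`, and drops the constant `star` column.

```r
# Sample file with an ASSUMED alignment: each "V" sits above the first
# character of its column in every data row.
sample_lines <- c(
  "* +-- Station Code",
  "* |   +-- Schedule Arrival Day",
  "* |   | +-- Schedule Arrival Time",
  "* |   | |     +-- Schedule Departure Day",
  "* |   | |     | +-- Schedule Departure Time",
  "* |   | |     | |     +-- Actual Arrival Time",
  "* |   | |     | |     |     +-- Actual Departure Time",
  "* |   | |     | |     |     |     +-- Comments",
  "* V   V V     V V     V     V     V",
  "* NOL * *     1 900A  *     900A  Departed: On time.",
  "* SCH * *     1 1030A *     1039A Departed: 9 minutes late.",
  "* NIB * *     1 1156A *     1159A Departed: 3 minutes late."
)
data_file <- tempfile(fileext = ".txt")
writeLines(sample_lines, data_file)

# Hypothetical helper wrapping the same four steps as the answer
read_staggered <- function(path, last_width = 100) {
  data_lines <- readLines(path)
  # the marker line separates the staggered names from the data
  v_line_index <- grep("^\\* V", data_lines)[1]
  # one name per line above the marker, diagram characters removed
  col_names <- c("star", gsub("[-*+| ]", "", data_lines[seq_len(v_line_index - 1)]))
  # column start positions come from the "*" and "V" characters
  starts <- which(strsplit(data_lines[v_line_index], "")[[1]] %in% c("*", "V"))
  widths <- c(diff(starts), last_width)
  df <- read.fwf(path, widths = widths, skip = v_line_index,
                 col.names = col_names, strip.white = TRUE,
                 na.strings = "*", stringsAsFactors = FALSE)
  df$star <- NULL  # the leading "*" column carries no information
  df
}

trains <- read_staggered(data_file)
str(trains)
```

The approach only works if the `V` marks line up with the data columns; for a file aligned differently, only the sample lines change, not the function.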


© www.soinside.com 2019 - 2025. All rights reserved.