* +---------------- Station Code
* | +----------- Schedule Arrival Day
* | | +-------- Schedule Arrival Time
* | | | +----- Schedule Departure Day
* | | | | +-- Schedule Departure Time
* | | | | | +------------- Actual Arrival Time
* | | | | | | +------- Actual Departure Time
* | | | | | | | +- Comments
* V V V V V V V V
* NOL * * 1 900A * 900A Departed: On time.
* SCH * * 1 1030A * 1039A Departed: 9 minutes late.
* NIB * * 1 1156A * 1159A Departed: 3 minutes late.
* LFT * * 1 1224P * 1228P Departed: 4 minutes late.
* LCH * * 1 155P * 155P Departed: On time.
How can I read this into R when the column names are spread across several different lines?
This isn't entirely straightforward, but we can do something along the lines of what @Tim G suggested:
# 1. prep work
## read data as lines of text
data_lines <- readLines("data.txt")
## identify which line starts with "* V"
v_line_index <- which(grepl("^\\* V", data_lines))
# 2. generate vector of column names
## from line 1 until the line before the v_line_index
data_col_name_lines <- data_lines[1:(v_line_index - 1)]
## remove any special characters to retain column names
col_names <- gsub("[\\* \\+\\|-]", "", data_col_name_lines)
## and prepend an initial column name
col_names <- c("star", col_names)
# 3. determine column widths:
## split the v_line_index line into individual characters
v_line_index_chars <- strsplit(data_lines[v_line_index], "")[[1]]
## determine which columns have either a "*" or "V"
v_line_v_indexes <- which(v_line_index_chars %in% c("*", "V"))
## calculate column widths as the gaps between consecutive markers;
## the last column (Comments) has no marker after it, so give it a generous width
widths <- c(diff(v_line_v_indexes), 100)
# 4. finally read the data with `read.fwf()`
read.fwf(
  "data.txt",
  widths = widths,
  header = FALSE,
  ## skip the header block, up to and including the "* V" line
  skip = v_line_index,
  col.names = col_names
)
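
To check the approach without the original file, here is a minimal self-contained sketch that runs the same steps on a two-column toy file. The file contents, their spacing, and the names `toy_lines`, `toy_file` and `toy_df` are made up for illustration. `strip.white = TRUE` is an optional addition, passed through to `read.table()`, that trims the padding from each field.

```r
# Hypothetical miniature file in the same layout (contents and spacing are assumed)
toy_lines <- c(
  "* +------ Code",
  "* |   +-- Time",
  "* V   V",
  "* NOL 900A",
  "* SCH 1039A"
)
toy_file <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_file)

# locate the marker line and build the column names from the lines above it
data_lines <- readLines(toy_file)
v_line_index <- which(grepl("^\\* V", data_lines))
col_names <- gsub("[\\* \\+\\|-]", "", data_lines[1:(v_line_index - 1)])
col_names <- c("star", col_names)

# marker positions -> column widths (the last column gets a generous width)
marker_chars <- strsplit(data_lines[v_line_index], "")[[1]]
marker_pos <- which(marker_chars %in% c("*", "V"))
widths <- c(diff(marker_pos), 100)

# read the data rows, skipping the whole header block
toy_df <- read.fwf(
  toy_file,
  widths = widths,
  header = FALSE,
  skip = v_line_index,
  col.names = col_names,
  strip.white = TRUE  # optional: trim padding inside each fixed-width field
)
print(toy_df)
```

Because the widths are derived from the marker line, this only works if the data rows are aligned with the `V` markers in the actual file.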