Reading a data file with staggered column names into R

Question

* +---------------- Station Code
* |    +----------- Schedule Arrival Day
* |    |  +-------- Schedule Arrival Time
* |    |  |     +----- Schedule Departure Day
* |    |  |     |  +-- Schedule Departure Time
* |    |  |     |  |     +------------- Actual Arrival Time
* |    |  |     |  |     |     +------- Actual Departure Time
* |    |  |     |  |     |     |     +- Comments
* V    V  V     V  V     V     V     V
* NOL  *  *     1  900A  *     900A  Departed:  On time.
* SCH  *  *     1  1030A *     1039A Departed:  9 minutes late.
* NIB  *  *     1  1156A *     1159A Departed:  3 minutes late.
* LFT  *  *     1  1224P *     1228P Departed:  4 minutes late.
* LCH  *  *     1  155P  *     155P  Departed:  On time.

How can I read this into R when the column names are spread across several different lines?

Answer 1

This is not an entirely straightforward task, but we can do something along the lines of what @Tim G suggested:

# 1. prep work
## read data as lines of text
data_lines <- readLines("data.txt")
## identify which line starts with "* V"
v_line_index <- which(grepl("^\\* V", data_lines))

# 2. generate vector of column names
## from line 1 until the line before the v_line_index
data_col_name_lines <- data_lines[1:(v_line_index - 1)]
## remove any special characters to retain column names
col_names <- gsub("[\\* \\+\\|-]", "", data_col_name_lines)
## and prepend an initial column name
col_names <- c("star", col_names)

# 3. determine column widths:
## split the v_line_index line into individual characters
v_line_index_chars <- strsplit(data_lines[v_line_index], "")[[1]]
## determine which columns have either a "*" or "V"
v_line_v_indexes <- which(v_line_index_chars %in% c("*", "V"))
## calculate column widths by taking differences and appending a long last width
widths = c(diff(v_line_v_indexes), 100)

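# 4. finally read the data with `read.fwf()`
read.fwf(
  "data.txt", 
  widths = widths,
  header = FALSE, 
  skip = v_line_index,   # skip the header lines and the "* V" marker line
  col.names = col_names,
  strip.white = TRUE,    # trim the padding around each fixed-width field
  na.strings = "*"       # a lone "*" means "no value", so read it as NA
)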
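The same four steps can be wrapped in a function so that nothing about the header size is hard-coded. This is a minimal sketch, not the answerer's code: `read_staggered()` and its `last_width` argument are hypothetical names, and it assumes (as the answer does) that every line above the `* V` marker holds exactly one column name and that each data row begins with a `*` field. `strip.white`, `na.strings` and `stringsAsFactors` are ordinary `read.table()` arguments that `read.fwf()` passes through its `...`. The usage example runs it on a small made-up two-column file written to a temporary path.

```r
# Hypothetical helper (not from the original answer): reads a fixed-width
# file whose column names are drawn as a staggered "+--- Name" header
# above a "* V  V ..." marker line.
read_staggered <- function(path, last_width = 100) {
  data_lines <- readLines(path)

  # 1. Find the marker line that starts with "* V" (first match only).
  v_line_index <- grep("^\\* V", data_lines)[1]
  if (is.na(v_line_index)) stop("no line starting with '* V' found")

  # 2. Each line above the marker holds one column name; strip the drawing
  #    characters (*, space, +, |, -) and call the leading "*" column "star".
  header_lines <- data_lines[seq_len(v_line_index - 1)]
  col_names <- c("star", gsub("[* +|-]", "", header_lines))

  # 3. Fields start wherever the marker line has a "*" or a "V"; the last
  #    field gets a generous width so long comments are not cut off.
  marker_chars <- strsplit(data_lines[v_line_index], "")[[1]]
  starts <- which(marker_chars %in% c("*", "V"))
  widths <- c(diff(starts), last_width)

  # 4. Read only the rows below the marker; a lone "*" becomes NA.
  read.fwf(
    path,
    widths = widths,
    skip = v_line_index,
    col.names = col_names,
    strip.white = TRUE,
    na.strings = "*",
    stringsAsFactors = FALSE
  )
}

# Usage on a small made-up file with the same layout.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "* +---- Code",
  "* |  +- Value",
  "* V  V",
  "* AB 12",
  "* CD 34"
), path)

df <- read_staggered(path)
print(df)
```

Because `skip` is taken from the position of the marker line, the function keeps working if the header gains or loses a column; only the "one name per header line" assumption has to hold.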
