I received a txt data file that looks like this:
# test A response c
1 x1 1
2 x2 0
.. ..
324 x324 5
# test P response 8
1 x1 2
2 x2 1
.. ..
501 x501 4
# test 7 response t
1 x1 2
2 x2 1
.. ..
936 x936 4
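It has two long columns, split into blocks by sub-headings (e.g. "test A response c"). Note that the number of rows under each sub-heading varies. There are about 10,000 rows in total. I would like to tidy it up and remove all the sub-headings, moving them into extra columns, like this: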
x y Test Response
x1 1 test A response c
x2 0 test A response c
.. ..
x324 5 test A response c
x1 2 test P response 8
x2 1 test P response 8
.. ..
x501 4 test P response 8
x1 2 test 7 response t
x2 1 test 7 response t
.. ..
x936 4 test 7 response t
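What is the best way to do this?
Here is one way to solve your problem, assuming the file has been read into a data frame df with columns x and y: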
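library(tidyr)
library(dplyr)

df |>
  # copy each sub-heading into new columns; ordinary data rows get NA
  mutate(Test = replace(x, !grepl("test", x), NA),
         Response = replace(y, !grepl("response", y), NA)) |>
  # carry each sub-heading down over the rows below it
  fill(Test, Response) |>
  # drop the original sub-heading rows
  filter(!grepl("test", x))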
     x y   Test   Response
1   x1 1 test A response c
2   x2 0 test A response c
3 x324 5 test A response c
4   x1 2 test P response 8
5   x2 1 test P response 8
6 x501 4 test P response 8
7   x1 2 test 7 response t
8   x2 1 test 7 response t
9 x936 4 test 7 response t
# read the tab-delimited file (it has no header row)
df = read.delim("file.txt", header=FALSE, col.names=c("x", "y"))
# toy example (the two columns are tab-separated, written here as \t):
df = read.delim(text="test A\tresponse c
x1\t1
x2\t0
x324\t5
test P\tresponse 8
x1\t2
x2\t1
x501\t4
test 7\tresponse t
x1\t2
x2\t1
x936\t4", header=FALSE, col.names=c("x", "y"))
df
        x          y
1  test A response c
2      x1          1
3      x2          0
4    x324          5
5  test P response 8
6      x1          2
7      x2          1
8    x501          4
9  test 7 response t
10     x1          2
11     x2          1
12   x936          4
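For comparison, the same reshaping can be sketched in base R, without tidyr or dplyr. This is only a minimal sketch: it assumes every sub-heading row starts with "test" and that the first row of the data is a sub-heading. The names is_header, group, headers and tidy are illustrative, and the toy data repeats just the first two blocks of the example.

```r
# Toy data in the same shape as df above (assumption: sub-headings start with "test")
df <- data.frame(
  x = c("test A", "x1", "x2", "x324", "test P", "x1", "x2", "x501"),
  y = c("response c", "1", "0", "5", "response 8", "2", "1", "4")
)

is_header <- grepl("^test", df$x)  # TRUE on sub-heading rows
group     <- cumsum(is_header)     # index of the sub-heading each row belongs to
headers   <- df[is_header, ]       # one row per sub-heading

# Keep only the data rows and look up their sub-heading by block index
tidy <- data.frame(
  x        = df$x[!is_header],
  y        = as.integer(df$y[!is_header]),
  Test     = headers$x[group[!is_header]],
  Response = headers$y[group[!is_header]]
)
tidy
```

Here cumsum() over the header flags numbers each block, which plays the role that fill() plays in the pipeline above. Converting y with as.integer() is optional and only safe once the sub-heading rows have been removed.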