Tidying a dataset - converting subheadings into columns

Question

I received a txt data file that looks like this:

# test A response c
1   x1     1
2   x2     0
..       ..
324 x324   5
# test P response 8
1   x1     2
2   x2     1
..       ..
501 x501   4
# test 7 response t
1   x1     2
2   x2     1
..       ..
936 x936   4

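It has two long columns split up by multiple subheadings (e.g. "test A response c"). Note that the number of rows under each subheading varies. There are about 10,000 rows in total. I would like to tidy it and remove all the subheadings, like this: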

x      y Test   Response  
x1     1 test A response c
x2     0 test A response c
..       ..
x324   5 test A response c
x1     2 test P response 8
x2     1 test P response 8
..       ..
x501   4 test P response 8
x1     2 test 7 response t
x2     1 test 7 response t
..       ..
x936   4 test 7 response t

What is the best way to do this?

Answer 1

Here is one way to solve your problem:

library(tidyr)
library(dplyr)

df |> 
  mutate(Test = replace(x, !grepl("test", x), NA),
         Response = replace(y, !grepl("response", y), NA)) |> 
  fill(Test, Response) |> 
  filter(!grepl("test", x))

     x y   Test   Response
1   x1 1 test A response c
2   x2 0 test A response c
3 x324 5 test A response c
4   x1 2 test P response 8
5   x2 1 test P response 8
6 x501 4 test P response 8
7   x1 2 test 7 response t
8   x2 1 test 7 response t
9 x936 4 test 7 response t

Data

# read file
df = read.delim("file.txt", header=FALSE, col.names=c("x", "y"))
# toy example (columns are tab-separated, written here as \t):
df = read.delim(text="test A\tresponse c
x1\t1
x2\t0
x324\t5
test P\tresponse 8
x1\t2
x2\t1
x501\t4
test 7\tresponse t
x1\t2
x2\t1
x936\t4", header=FALSE, col.names=c("x", "y"))

df
        x          y
1  test A response c
2      x1          1
3      x2          0
4    x324          5
5  test P response 8
6      x1          2
7      x2          1
8    x501          4
9  test 7 response t
10     x1          2
11     x2          1
12   x936          4
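
The toy data above is already split into two tab-separated columns, whereas the file in the question has subheadings that start with "#" and a leading row-number column. A minimal base R sketch for that raw layout might look like the following. It assumes the file starts with a subheading, that every subheading contains the word "response", and that data lines are whitespace-separated; the sample `lines` vector and the file name in the comment are hypothetical.

```r
# Hypothetical raw lines in the layout shown in the question:
# "# test A response c" subheadings followed by "<row> <x> <y>" data lines.
lines <- c(
  "# test A response c",
  "1   x1     1",
  "2   x2     0",
  "# test P response 8",
  "1   x1     2",
  "2   x2     1"
)
# For a real file (file name is an assumption): lines <- readLines("file.txt")

# Mark the subheading lines
is_header <- grepl("^#", lines)

# For every line, the index of the most recent subheading above it
group <- cumsum(is_header)

# Split each subheading into its "test ..." and "response ..." parts
headers  <- sub("^#\\s*", "", lines[is_header])
test     <- trimws(sub("\\s*response.*$", "", headers))
response <- trimws(sub("^.*?(response.*)$", "\\1", headers, perl = TRUE))

# Parse the data lines: row number, x, y separated by whitespace
body <- read.table(text = lines[!is_header], header = FALSE,
                   col.names = c("row", "x", "y"),
                   stringsAsFactors = FALSE)

# Attach the matching subheading parts to each data row
tidy <- data.frame(
  x = body$x,
  y = body$y,
  Test = test[group[!is_header]],
  Response = response[group[!is_header]],
  stringsAsFactors = FALSE
)
tidy
```

The `cumsum(is_header)` step plays the same role as `fill()` in the answer: each data row picks up the subheading most recently seen above it, so the varying number of rows per subheading does not matter.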
