Tidying a dataset - converting subheadings into columns


I received a txt data file that looks like this:

# test A response c
1   x1     1
2   x2     0
..       ..
324 x324   5
# test P response 8
1   x1     2
2   x2     1
..       ..
501 x501   4
# test 7 response t
1   x1     2
2   x2     1
..       ..
936 x936   4

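It has two long columns separated by multiple subheadings (e.g. "test A response c"). Note that the number of rows under each subheading varies. There are about 10,000 rows in total. I want to tidy it up and remove all the subheadings, like this: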

x      y Test   Response  
x1     1 test A response c
x2     0 test A response c
..       ..
x324   5 test A response c
x1     2 test P response 8
x2     1 test P response 8
..       ..
x501   4 test P response 8
x1     2 test 7 response t
x2     1 test 7 response t
..       ..
x936   4 test 7 response t

What is the best way to do this?

r tidyr
1 Answer

One way to solve your problem:

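library(tidyr)
library(dplyr)

df |> 
  # copy the subheading text into new columns; every other row becomes NA
  mutate(Test = replace(x, !grepl("test", x), NA),
         Response = replace(y, !grepl("response", y), NA)) |> 
  # carry each subheading down over the rows that follow it
  fill(Test, Response) |> 
  # drop the subheading rows themselves
  filter(!grepl("test", x))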

     x y   Test   Response
1   x1 1 test A response c
2   x2 0 test A response c
3 x324 5 test A response c
4   x1 2 test P response 8
5   x2 1 test P response 8
6 x501 4 test P response 8
7   x1 2 test 7 response t
8   x2 1 test 7 response t
9 x936 4 test 7 response t
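
One thing to watch in the result above: y is still a character column, because it shared a column with the "response ..." text when the file was read. A minimal sketch of converting it after the filter step, assuming every remaining y value is a whole number (the small data frame and the name tidy are made up for illustration):

```r
library(dplyr)
library(tidyr)

# Made-up toy input in the same two-column shape as df
df <- data.frame(
  x = c("test A", "x1", "x2", "test P", "x1"),
  y = c("response c", "1", "0", "response 8", "2"),
  stringsAsFactors = FALSE
)

tidy <- df |>
  mutate(Test = replace(x, !grepl("test", x), NA),
         Response = replace(y, !grepl("response", y), NA)) |>
  fill(Test, Response) |>
  filter(!grepl("test", x)) |>
  # only numeric strings are left in y at this point, so the conversion is safe
  mutate(y = as.integer(y))

str(tidy)
```

If some y values might not be integers, as.numeric(y) would probably be the safer choice.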
Data
# read file
df = read.delim("file.txt", header=FALSE, col.names=c("x", "y"))
# toy example (\t marks the tab separator that read.delim expects):
df = read.delim(text="test A\tresponse c
x1\t1
x2\t0
x324\t5
test P\tresponse 8
x1\t2
x2\t1
x501\t4
test 7\tresponse t
x1\t2
x2\t1
x936\t4", header=FALSE, col.names=c("x", "y"))

df
        x          y
1  test A response c
2      x1          1
3      x2          0
4    x324          5
5  test P response 8
6      x1          2
7      x2          1
8    x501          4
9  test 7 response t
10     x1          2
11     x2          1
12   x936          4
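
The toy data above is tab-separated and has no "#" prefix or row-number column, whereas the file shown in the question has both. If the real file looks like the question's listing, something along these lines might work in base R. It is only a sketch and assumes that every subheading line starts with "#", that the first line is a subheading, and that every data row has exactly three whitespace-separated fields; raw, is_header, group, fields and tidy are names made up for the example.

```r
# Shortened stand-in for the file; for a real file one could use
# raw <- readLines("file.txt")   # "file.txt" is a placeholder path
raw <- c("# test A response c",
         "1   x1     1",
         "2   x2     0",
         "# test P response 8",
         "1   x1     2")

is_header <- grepl("^#", raw)    # TRUE on subheading lines
group <- cumsum(is_header)       # index of the subheading each line falls under
headers <- raw[is_header]

# Split data rows on runs of whitespace: row number, x, y
fields <- do.call(rbind, strsplit(trimws(raw[!is_header]), "\\s+"))

# Subheading index for each data row
idx <- group[!is_header]

tidy <- data.frame(
  x = fields[, 2],
  y = as.integer(fields[, 3]),
  Test = sub("^#\\s*(test \\S+)\\s+response.*$", "\\1", headers)[idx],
  Response = sub("^.*(response \\S+)\\s*$", "\\1", headers)[idx],
  stringsAsFactors = FALSE
)

tidy
```

The cumsum() call does the same job as fill() in the answer: each data row picks up the most recent subheading above it.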