How to convert a football match data frame in R to long format with separate rows for the home and away teams

Question   Votes: 0   Answers: 1

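I have a data frame in R with the following columns:

  • season: the season of the match (e.g. "2015/2016")
  • stage: the stage or round of the match (e.g. 1 for round 1)
  • home_team_api_id: the home team's ID
  • away_team_api_id: the away team's ID
  • home_team_goal: goals scored by the home team
  • away_team_goal: goals scored by the away team
  • match_api_id: the match ID (a unique identifier for each match)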

Here is a sample of my data:

  df <- data.frame(
  season = c("2015/2016", "2015/2016"),
  stage = c(1, 1),
  home_team_api_id = c(1, 2),
  away_team_api_id = c(2, 1),
  home_team_goal = c(3, 2),
  away_team_goal = c(1, 3),
  match_api_id = c(101, 102)
)

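I want to convert this data frame to long format so that each match has two rows: one for the home team and one for the away team. The converted data frame should contain the following columns:

  • match_api_id: the match ID (the same for the home and away rows)
  • team_api_id: the team's ID (home or away team)
  • opponent_team_api_id: the opponent's ID (the away team or the home team, depending on the row)
  • goals: goals scored by the team
  • goals_conceded: goals conceded by the team
  • is_home: a logical column indicating whether the team played at home (TRUE) or away (FALSE)

Desired output: for the sample input above, the result should look like this: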

     season stage match_api_id team_api_id opponent_team_api_id goals goals_conceded is_home
1 2015/2016     1          101           1                    2     3              1    TRUE
2 2015/2016     1          101           2                    1     1              3   FALSE
3 2015/2016     1          102           2                    1     2              3    TRUE
4 2015/2016     1          102           1                    2     3              2   FALSE

Here is what I have tried so far:

library(dplyr)
library(tidyr)

df_long <- df %>%
  pivot_longer(cols = c(home_team_api_id, away_team_api_id), 
               names_to = "team_type", 
               values_to = "team_api_id") %>%
  mutate(
    is_home = ifelse(team_type == "home_team_api_id", TRUE, FALSE),
    goals = ifelse(is_home, home_team_goal, away_team_goal),
    goals_conceded = ifelse(is_home, away_team_goal, home_team_goal)
  ) %>%
  select(match_api_id, season, stage, team_api_id, goals, goals_conceded, is_home)

# Append opponent_team_api_id based on match_api_id
df_long <- df_long %>%
  left_join(df %>%
              select(match_api_id, home_team_api_id, away_team_api_id),
            by = "match_api_id") %>%
  mutate(
    opponent_team_api_id = ifelse(is_home, away_team_api_id, home_team_api_id)
  ) %>%
  select(-home_team_api_id, -away_team_api_id)

This is my result:

match_api_id    season  stage   team_api_id goals   goals_conceded  is_home
1   2015/2016   1   1   1   3   TRUE
2   2015/2016   1   2   1   3   FALSE
3   2015/2016   1   2   2   3   TRUE
4   2015/2016   1   1   3   2   FALSE

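What makes this difficult, and sets it apart from this question, is that I want to apply pivot_longer to two sets of columns at once: both the goals and the team IDs should end up in long form.

How can I achieve this transformation in R? Any help would be greatly appreciated!

Thanks!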

r dplyr tidyr data-transform
1 Answer
0
votes

I would use pivot_longer to expand each match into 4 rows, then pivot_wider to bring it back to 2 rows.

library(tidyr)

# Stack the four team columns, split each name at "_team_", then spread api_id and goal back out
main <- df |>
  pivot_longer(cols = c(home_team_api_id, away_team_api_id, home_team_goal, away_team_goal)) |>
  separate_wider_delim(name, delim = "_team_", names = c("is_home", "var")) |>
  pivot_wider(names_from = var, values_from = value)

The result is:

  season    stage match_api_id is_home  api_id  goal
  <chr>     <dbl>        <dbl>   <chr>  <dbl> <dbl>
1 2015/2016     1          101    home       1     3
2 2015/2016     1          101    away       2     1
3 2015/2016     1          102    home       2     2
4 2015/2016     1          102    away       1     3
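The same table can probably be reached in a single pivot_longer call. The sketch below is an alternative to the three-step pipeline above, not part of the original answer: it relies on the ".value" sentinel in names_to, which tells pivot_longer that the matching part of each column name should become an output column. The column name side and the object name main_one_step are made up for illustration.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(
  season = c("2015/2016", "2015/2016"),
  stage = c(1, 1),
  home_team_api_id = c(1, 2),
  away_team_api_id = c(2, 1),
  home_team_goal = c(3, 2),
  away_team_goal = c(1, 3),
  match_api_id = c(101, 102)
)

# "home_team_api_id" -> side = "home", value goes to column "api_id"
# "away_team_goal"   -> side = "away", value goes to column "goal"
main_one_step <- df |>
  pivot_longer(
    cols = matches("^(home|away)_team_"),
    names_to = c("side", ".value"),
    names_pattern = "^(home|away)_team_(.*)$"
  )

main_one_step
```

Here side plays the role of is_home in the table above, and api_id and goal are created directly, so no separate pivot_wider step is needed.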

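If you really need to keep the opponent's information (which seems redundant), make a copy of the data set with home and away swapped and join it back.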

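library(dplyr)

# Copy of main with home/away swapped, so each row lines up with its opponent
sub <- main |>
  mutate(is_home = ifelse(is_home == "home", "away", "home")) |>
  rename_with(~ paste0("opponent_", .x), api_id:goal)

# Join each team to its opponent, then turn is_home into a logical
complete <- left_join(main, sub, by = c("season", "stage", "match_api_id", "is_home")) |>
  mutate(is_home = is_home == "home")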

The result is:

  season    stage match_api_id is_home api_id  goal opponent_api_id opponent_goal
  <chr>     <dbl>        <dbl> <lgl>    <dbl> <dbl>           <dbl>         <dbl>
1 2015/2016     1          101 TRUE         1     3               2             1
2 2015/2016     1          101 FALSE        2     1               1             3
3 2015/2016     1          102 TRUE         2     2               1             3
4 2015/2016     1          102 FALSE        1     3               2             2
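If the exact column names from the question are wanted, one option that avoids pivoting altogether is to build a home view and an away view of the data and stack them. This is only a sketch under that assumption, not the approach used in the answer above; the object names home, away and result are illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  season = c("2015/2016", "2015/2016"),
  stage = c(1, 1),
  home_team_api_id = c(1, 2),
  away_team_api_id = c(2, 1),
  home_team_goal = c(3, 2),
  away_team_goal = c(1, 3),
  match_api_id = c(101, 102)
)

# One row per match, seen from the home team's side
home <- df |>
  transmute(
    season, stage, match_api_id,
    team_api_id = home_team_api_id,
    opponent_team_api_id = away_team_api_id,
    goals = home_team_goal,
    goals_conceded = away_team_goal,
    is_home = TRUE
  )

# The same matches, seen from the away team's side
away <- df |>
  transmute(
    season, stage, match_api_id,
    team_api_id = away_team_api_id,
    opponent_team_api_id = home_team_api_id,
    goals = away_team_goal,
    goals_conceded = home_team_goal,
    is_home = FALSE
  )

# Stack both views and put the home row first within each match
result <- bind_rows(home, away) |>
  arrange(match_api_id, desc(is_home))

result
```

Because the opponent and goals_conceded columns are filled in while each view is built, no join is needed afterwards.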