Problem mapping a list of variables to a function with purrr

Question

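I'm using the `unvotes` package, and the task is to "use `purrr` functions to find out which three countries, on average, agreed most with the US from 2000 onwards". I first wrote the function below to calculate the agreement rate between any two countries from year x onwards: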

```r
votes_agreement_calculator <- function(country_code1, country_code2, year_min) {
  library(unvotes)
  data("un_votes")
  data("un_roll_calls")
  un_votes2 <- un_votes %>%
    inner_join(un_roll_calls, by = "rcid")
  un_votes2$year <- format(un_votes2$date, "%Y")
  subset <- un_votes2 %>%
    dplyr::select(year, unres, country_code, vote) %>%
    filter(country_code == country_code1 | country_code == country_code2,
           year >= year_min) %>%
    group_by(unres) %>%
    pivot_wider(names_from = country_code,
                values_from = vote,
                values_fn = first) %>%
    # !!sym() so that R recognizes the inputs as columns
    filter(!is.na(!!sym(country_code1)) & !is.na(!!sym(country_code2))) %>%
    group_by(year) %>%
    summarise(total = n(),
              agree = sum(!!sym(country_code1) == !!sym(country_code2), na.rm = TRUE),
              share = agree / total)
  return(mean(subset$share, na.rm = TRUE))
}
```

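At first I ran into problems because `filter()` and `sum()` would not recognize my inputs as columns, so ChatGPT suggested wrapping them in `!!sym()` so that they are recognized as columns. I tested the function above and it works well. I then created a new function that only needs one country as an argument: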

```r
new_function <- function(country_code) {
  votes_agreement_calculator(country_code, "US", 2000)
}

# checking if the results are the same as before
new_function("RU") # yes they are
```

This function also works fine. Then I used dplyr to group the data by country code and used purrr to pass the country code column into the function:

```r
# passing list of country codes as a vector into the new function
votes_US_agreement <- un_votes2 %>%
  group_by(country_code) %>%
  mutate(agreement_rate = purrr::map_vec(country_code, new_function)) %>%
  arrange(desc(agreement_rate)) %>% # arrange them in descending order
  slice_head(n = 3) # keep the three rows with the highest agreement rates
```

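But R gives me no output; it just keeps running the code above for hours, with no error message or anything like that.

I'd be very grateful if someone could point out what's wrong with my code :)

I expected that mapping a list of variables over the function would work, since calling the function with individual arguments works fine, but it doesn't... instead it just keeps running and running...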

Answer 1

I found three issues in the code you posted.

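You are loading the `unvotes` package and its data at the top of the function. This should happen outside the function so that this code runs only once. In addition, some of the data transformations inside your function give the same result no matter which values you pass as arguments; those steps should also be moved outside the function.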
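ChatGPT gave you a working but outdated solution for using variable names stored as character strings inside dplyr verbs that use data masking. Instead of `!!sym(x)`, the recommended approach is `.data[[x]]`. Read *Programming with dplyr* to learn more. Note that this behaviour is specific to `dplyr` and the `tidyverse`, not a general/base R thing.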
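A minimal sketch of the `.data[[x]]` pattern, assuming a small hypothetical wide table (`wide_votes`, one column per country code) and a hypothetical helper `agreement_share()`; neither is part of `unvotes`. It also checks that the older `!!rlang::sym()` form gives the same result:

```r
library(dplyr)

# Hypothetical toy data: one row per resolution, one column of votes per country code
wide_votes <- tibble(
  year = c(2000, 2000, 2001, 2001),
  US   = c("yes", "no", "yes", NA),
  RU   = c("yes", "yes", "yes", "no")
)

# Hypothetical helper: the country codes arrive as character strings, so
# `.data[[code]]` looks each one up as a column inside the data-masking verbs
agreement_share <- function(df, code1, code2) {
  df %>%
    filter(!is.na(.data[[code1]]), !is.na(.data[[code2]])) %>%
    summarise(share = mean(.data[[code1]] == .data[[code2]]))
}

res_data <- agreement_share(wide_votes, "US", "RU")  # share = 2/3

# The older `!!sym()` form from the question computes the same share
res_sym <- wide_votes %>%
  filter(!is.na(!!rlang::sym("US")), !is.na(!!rlang::sym("RU"))) %>%
  summarise(share = mean(!!rlang::sym("US") == !!rlang::sym("RU")))
```

Because `.data[[code1]]` only looks the name up among the columns of the data frame, it cannot silently pick up an environment variable with the same name, which is part of why it is preferred.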
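You iterate over the `country_code` column of the `un_votes2` data frame, so the same calculation is repeated for a country every time that country appears in `un_votes2`, i.e. once per row (once per recorded vote) rather than once per country. Each of those calls reloads the data and redoes the join and the pivot, which is why the code seems to run forever. Instead, you only want to run it once per country. Also, the transformation steps you chose inside the function are not ideal for performance: in my experience, `dplyr::*_join()` operations perform much better than `tidyr::pivot_wider()` for this kind of comparison task. If we rethink the function around a join, we no longer need to run it for every pair of countries; we can run it once for one country (e.g. "US") and have it compare that country against all the others.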
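To see why the original `mutate(... map_vec(country_code, new_function))` call takes so long, here is a small sketch with a hypothetical `slow_agreement()` stand-in that only counts how often it is called. Mapping over a column with repeated codes calls it once per row, while mapping over the `unique()` codes calls it once per country:

```r
library(purrr)

# Hypothetical stand-in for an expensive per-country calculation;
# it only records how many times it is called
calls <- 0
slow_agreement <- function(code) {
  calls <<- calls + 1
  nchar(code) / 10  # placeholder result
}

# A column with repeated codes, like un_votes2$country_code (one entry per vote)
country_col <- c("US", "RU", "US", "FR", "RU", "US")

calls <- 0
per_row <- map_dbl(country_col, slow_agreement)
calls_per_row <- calls      # 6: one call per row

calls <- 0
per_country <- map_dbl(set_names(unique(country_col)), slow_agreement)
calls_per_country <- calls  # 3: one call per distinct country
```

With the real data the gap is far larger, since every row of `un_votes2` is one vote.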

Below is an example that fixes the issues I found. Be sure to double-check the calculations and make sure they do what you expect.

```r
# Packages and data are loaded outside of the function.
library(unvotes)
library(tidyverse)  # tidyverse >= 2.0.0 also attaches lubridate, which provides year()
library(lubridate)  # loaded explicitly in case an older tidyverse is installed
data("un_votes")
data("un_roll_calls")

# Data transformations that are necessary for all further steps are done outside of the function.
un_votes2 <-
  un_votes %>%
  inner_join(un_roll_calls, by = "rcid") %>%
  mutate(year = year(date)) %>%
  dplyr::select(year, unres, country_code, vote)


# The function takes only one country code as input and then returns the agreement share with all other countries in the specified time frame.
votes_agreement_calculator <- function(country_code1, year_min) {
  target_subset <-
    un_votes2 %>%
    filter(
      country_code == country_code1,
      year >= year_min,
      !is.na(country_code)
    )

  non_target_subset <-
    un_votes2 %>%
    filter(
      country_code != country_code1,
      year >= year_min,
      !is.na(country_code)
    )

  left_join(
    non_target_subset,
    target_subset,
    by = c("year", "unres"),
    relationship = "many-to-many"  # the `relationship` argument requires dplyr >= 1.1.0
  ) %>%
    distinct() %>%
    mutate(agree = vote.x == vote.y) %>%
    group_by(country_code.x) %>%
    summarise(agree_share = sum(agree, na.rm = TRUE) / n())
}
```

We can now run the function for one country at a time. To answer your research question, we use "US" and sort by agreement share in descending order.

```r
votes_agreement_calculator(
  country_code1 = "US",
  year_min = 2000
) %>%
  arrange(desc(agree_share))
#> # A tibble: 192 × 2
#>    country_code.x agree_share
#>    <chr>                <dbl>
#>  1 IL                   0.772
#>  2 FM                   0.688
#>  3 PW                   0.641
#>  4 MH                   0.617
#>  5 CA                   0.521
#>  6 GB                   0.492
#>  7 AU                   0.452
#>  8 FR                   0.448
#>  9 MC                   0.401
#> 10 CZ                   0.401
#> # ℹ 182 more rows
```
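The question asks for the top three countries rather than a full ranking. A sketch of that last step, assuming a tiny hypothetical `toy_votes` table in the same shape as `un_votes2` and a hypothetical `agreement_with()` that repeats the join logic of `votes_agreement_calculator()` (simplified, with the data passed in as an argument). The numbers are small enough to verify by hand, which is one way to check the calculations:

```r
library(dplyr)

# Hypothetical toy data shaped like un_votes2: four resolutions, five countries
toy_votes <- tibble(
  year         = 2000,
  unres        = rep(paste0("R/", 1:4), each = 5),
  country_code = rep(c("US", "IL", "GB", "CA", "FR"), times = 4),
  vote         = c("yes", "yes", "yes", "yes", "yes",
                   "no",  "no",  "no",  "yes", "no",
                   "yes", "yes", "yes", "no",  "no",
                   "no",  "no",  "yes", "yes", "yes")
)

# Hypothetical, simplified version of the answer's join logic
agreement_with <- function(data, target, year_min) {
  target_votes <- filter(data, country_code == target, year >= year_min)
  other_votes  <- filter(data, country_code != target, year >= year_min)

  # each row of another country is paired with the target's vote on the same resolution
  left_join(other_votes, target_votes, by = c("year", "unres")) %>%
    mutate(agree = vote.x == vote.y) %>%
    group_by(country_code.x) %>%
    summarise(agree_share = sum(agree, na.rm = TRUE) / n())
}

# Keep only the three countries that agree most with the target
top3 <- agreement_with(toy_votes, "US", 2000) %>%
  slice_max(agree_share, n = 3)
top3  # IL 1.00, GB 0.75, FR 0.50 (in that order)
```

Note that `slice_max()` keeps ties by default, so with real data it can return more than three rows; pass `with_ties = FALSE` if exactly three are needed.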

Or for all countries:

```r
un_votes %>%
  pull(country_code) %>%
  # we are using `unique()` to make sure that each country code is used only once
  unique() %>%
  # `set_names()` applies the country code values as names to itself; this makes `map()`
  # produce named results, so that `list_rbind()` can identify the target country.
  set_names() %>%
  map(votes_agreement_calculator, year_min = 2000) %>%
  list_rbind(names_to = "target_country")
#> # A tibble: 37,828 × 3
#>    target_country country_code.x agree_share
#>    <chr>          <chr>                <dbl>
#>  1 US             AD                   0.354
#>  2 US             AE                   0.148
#>  3 US             AF                   0.169
#>  4 US             AG                   0.168
#>  5 US             AL                   0.394
#>  6 US             AM                   0.197
#>  7 US             AO                   0.148
#>  8 US             AR                   0.224
#>  9 US             AT                   0.346
#> 10 US             AU                   0.452
#> # ℹ 37,818 more rows
```
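The `set_names()` / `map()` / `list_rbind(names_to = ...)` pattern can be tried without the real data. A minimal sketch, assuming a hypothetical `fake_calculator()` that returns one row per other country, in the same shape as the output of `votes_agreement_calculator()`:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for votes_agreement_calculator(): one row per "other" country
fake_calculator <- function(code, year_min) {
  tibble(
    country_code.x = setdiff(c("US", "RU", "FR"), code),
    from_year      = year_min
  )
}

result <- c("US", "RU", "US", "FR") %>%
  unique() %>%      # each country code is used only once
  set_names() %>%   # c(US = "US", RU = "RU", FR = "FR")
  map(fake_calculator, year_min = 2000) %>%
  list_rbind(names_to = "target_country")  # list names become the first column
```

`list_rbind()` requires purrr 1.0.0 or later, where it replaces the superseded `map_dfr()`.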
