Coming from R, I am redoing some exercises that helped me a lot, so I am trying to recreate this R code:
wide_data <- read_csv('https://raw.githubusercontent.com/rafalab/dslabs/master/inst/extdata/life-expectancy-and-fertility-two-countries-example.csv')
new_tidy_data <- pivot_longer(wide_data, `1960`:`2015`, names_to = "year", values_to = "fertility")
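The data looks like this (I don't know how to paste the output), but with 113 columns: first country, then 1960_fertility, 1960_life_expectancy, 1961_fertility, 1961_life_expectancy, ..., 2015_fertility, 2015_life_expectancy.
There are also 2 rows: Germany and South Korea.
Expected result: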
head(new_tidy_data)
#> # A tibble: 6 × 3
#> country year fertility
#> <chr> <chr> <dbl>
#> 1 Germany 1960 2.41
#> 2 Germany 1961 2.44
#> 3 Germany 1962 2.47
#> 4 Germany 1963 2.49
#> 5 Germany 1964 2.49
#> # ℹ 1 more row
So far, my code looks like this:
import polars as pl
import polars.selectors as cs
df = pl.read_csv('https://raw.githubusercontent.com/rafalab/dslabs/master/inst/extdata/life-expectancy-and-fertility-two-countries-example.csv')
df.pivot()  # This is where not even ChatGPT could help me
Thanks!!
Using only the pandas library, I came up with the following solution:
import pandas as pd
wide_data = pd.read_csv('https://raw.githubusercontent.com/rafalab/dslabs/master/inst/extdata/life-expectancy-and-fertility-two-countries-example.csv')
new_tidy_data = wide_data.melt(id_vars='country', var_name='year', value_name='fertility')
# Split e.g. "1960_life_expectancy" into year and metric, at the first underscore only
new_tidy_data[['year', 'metric']] = new_tidy_data['year'].str.split('_', n=1, expand=True)
# Keep only the fertility rows, then drop the helper column
new_tidy_data = new_tidy_data[new_tidy_data['metric'] == 'fertility'].drop(columns='metric')
# Print the data
print(new_tidy_data)