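Questions tagged dataframe

A data frame is a tabular data structure. Typically it holds data in which rows are observations and columns are variables of various types. While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python, and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.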
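As a minimal sketch of the definition above, the pandas snippet below builds a small data frame in which each row is one observation and each column is a variable with its own type. The column names and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical sample data: one row per observation (a person),
# one column per variable, each column with its own type.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],            # text variable
        "height_cm": [170.5, 162.0, 181.2],  # floating-point variable
        "is_member": [True, False, True],    # boolean variable
    }
)

n_rows, n_cols = df.shape

# dtype.kind is a one-letter code: "f" for float, "b" for boolean.
column_kinds = {name: dtype.kind for name, dtype in df.dtypes.items()}

print(df)
print(column_kinds)
```

Here `df.shape` reports the number of observations and variables, and `df.dtypes` shows that the columns do not have to share one type.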

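How do I map a function with multiple arguments (and changing data)?

I am creating a function that builds gt tables for individual states from a state-city level data frame. I will often change the data as well as the columns selected in gt, so I added...

Answers: 1  Votes: 0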

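How to pick a suffixed column name based on another column's value

# Column X contains the suffix of one of the V* columns. The value in V(X) needs to go into column Y. import pandas as pd import numpy as np # sample dataframe df = pd.DataFrame({ 'EMPLID':[12,...

Answers: 1  Votes: 0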
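One possible way to approach the suffix question above is sketched here. The original sample data is truncated, so every value below (and the choice of three V columns) is invented; only the column names EMPLID, X, Y and the V prefix come from the question. The sketch assumes all columns are numeric and that every suffix in X matches an existing column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's own example is cut off, so these values are made up.
df = pd.DataFrame(
    {
        "EMPLID": [12, 13, 14],
        "V1": [10, 20, 30],
        "V2": [11, 21, 31],
        "V3": [12, 22, 32],
        "X": [2, 1, 3],
    }
)

# Build the target column name for each row, e.g. X == 2 becomes "V2".
target_cols = "V" + df["X"].astype(str)

# Translate the column names into integer positions.
# get_indexer returns -1 for names that do not exist, so check for that first.
col_positions = df.columns.get_indexer(target_cols)
if (col_positions == -1).any():
    raise KeyError("some suffix in X does not match a V* column")

# Pick exactly one value per row using NumPy integer indexing.
df["Y"] = df.to_numpy()[np.arange(len(df)), col_positions]

print(df)
```

The explicit check matters because an unmatched name would otherwise index position -1 and silently read the last column.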

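Expand DataFrame rows into multiple rows based on a string condition

I have some raw data similar to the DataFrame below: df = pd.DataFrame([{'var1': '220-224 (even) Road name 1', 'var2': 'Location 1', 'var3': 'Area 1'}, {'var1': 'Sites 5 to 9 (o...

Answers: 1  Votes: 0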

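Add the values of 2 cells from the previous row to the current row in a DataFrame

I have a DataFrame like the one below:
Name  Value
===================
a     2400
b     -400
c     400
d     600
I need df in the following format: Name Lower_V...

Answers: 2  Votes: 0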

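Assigning orders to resources by joining two Python DataFrames

I have a DataFrame problem. I have two DataFrames: the first contains order details and the second contains coil details. I need to assign orders to coils so that the orders' demand is met...

Answers: 1  Votes: 0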

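Why was it decided to make data.table and tibble incompatible with data.frame in R? [closed]

I have used R for many years and occasionally run into problems caused by data.table and tibble being incompatible with data.frame. This usually requires converting between them...

Answers: 1  Votes: 0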

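Databricks DLT DataFrame - How to use a schema with comments

Databricks DLT DataFrame - How to use a schema. I am new to Databricks Delta Live Tables and DataFrames, and I am confused about how to use a schema when reading from a stream. I am doing table by table

Answers: 1  Votes: 0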

Select data from a column of a data table

Say my data is: Year Doors Cylinders Transmission 2003 Nissan 4 4 MT 2006 Nissan 4 4 MT 2003 Honda 2 6 AT I want Shiny to return f...

Answers: 1  Votes: 0

Merging bars in a bar chart using ggplot2 in R

I have a data frame named df with 3 Likert-scale columns and one filter column:

df
# A tibble: 50 × 4
   val1                    val2                    val3                  var
   <chr>                   <chr>                   <chr>                 <chr>
 1 "Very \n Dissatisfied"  "Neutral"               "Very \n Dissatisf…   Yes
 2 "Neutral"               "Neutral"               "Neutral"             No
 3 "Dissatisfied"          "Satisfied"             "Neutral"             Yes
 4 "Very \n Satisfied"     "Satisfied"             "Very \n Satisfied"   Yes
 5 "Very \n Dissatisfied"  "Very \n Dissatisfied"  "Neutral"             Yes
 6 "Very \n Satisfied"     "Very \n Satisfied"     "Very \n Satisfied"   Yes
 7 "Dissatisfied"          "Neutral"               "Dissatisfied"        Yes
 8 "Neutral"               "Satisfied"             "Neutral"             Yes
 9 "Satisfied"             "Very \n Satisfied"     "Satisfied"           No
10 "Neutral"               "Satisfied"             "Neutral"             Yes

The function that came out of my previous question gives me bar plots that all show the same value. That is correct! All I want is for the count (20, 30 and 50) not to be repeated 3 times: it should appear once in the bar of the right-hand plot, not 3 times. Is this possible?

plot_fun <- function(x, y) {
  .data <- df |>
    filter(var %in% x)

  p1 <- .data |>
    ggstats::gglikert(include = -var) +
    aes(y = reorder(.question,
      ifelse(
        .answer %in% c("Very \n Dissatisfied", "Dissatisfied"),
        1, 0
      ),
      FUN = sum
    ), decreasing = TRUE) +
    facet_wrap(~paste0("var to ", y))+
    scale_fill_manual(values = custom_colors) +
    theme(
      strip.text = element_text(size = 14,color = "black"),  # Increase facet label size
      axis.title = element_text(size = 14),   # Increase axis title size
      axis.text = element_text(size = 10))+    # Increase axis text size
    theme(strip.background = element_rect(color="black", fill="red", size=1.5, linetype="solid"))

  p2 <- .data %>%
    tidyr::pivot_longer(-var) |>
    filter(!is.na(value)) |>
    mutate(
      name = reorder(name,
        ifelse(
          value %in% c("Very \n Dissatisfied", "Dissatisfied"),
          1, 0
        ),
        FUN = sum
      )
    ) |>
    ggplot(aes(y = name)) +
    geom_bar(fill = "lightgrey")+
    theme_light()+
    geom_text(aes(label = ..count..), stat = "count", position=position_stack(vjust = 0.5))+
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank())

  list(p1, p2)
}

.include <- list(No = "No", Yes = "Yes", All = c("Yes", "No"))

purrr::imap(.include, plot_fun) |>
  purrr::reduce(c) |>
  wrap_plots(ncol = 2) +
  plot_layout(axes = "collect", guides = "collect", widths = c(.7, .3)) &
  labs(x = NULL, y = NULL) &
  theme(legend.position = "bottom")

Data

dput(df)
structure(list(val1 = c("Very \n Dissatisfied", "Neutral", "Dissatisfied",
"Very \n Satisfied", "Very \n Dissatisfied", "Very \n Satisfied",
"Dissatisfied", "Neutral", "Satisfied", "Neutral", "Very \n Dissatisfied",
"Very \n Satisfied", "Very \n Dissatisfied", "Satisfied", "Neutral",
"Very \n Dissatisfied", "Neutral", "Neutral", "Satisfied", "Neutral",
"Very \n Satisfied", "Dissatisfied", "Dissatisfied", "Satisfied",
"Neutral", "Dissatisfied", "Satisfied", "Very \n Dissatisfied",
"Dissatisfied", "Very \n Dissatisfied", "Very \n Dissatisfied",
"Dissatisfied", "Dissatisfied", "Dissatisfied", "Neutral", "Dissatisfied",
"Dissatisfied", "Very \n Dissatisfied", "Satisfied", "Satisfied",
"Neutral", "Very \n Dissatisfied", "Very \n Satisfied", "Very \n Dissatisfied",
"Satisfied", "Very \n Dissatisfied", "Very \n Dissatisfied",
"Satisfied", "Dissatisfied", "Dissatisfied"), val2 = c("Neutral",
"Neutral", "Satisfied", "Satisfied", "Very \n Dissatisfied",
"Very \n Satisfied", "Neutral", "Satisfied", "Very \n Satisfied",
"Satisfied", "Very \n Dissatisfied", "Very \n Satisfied", "Satisfied",
"Very \n Satisfied", "Satisfied", "Neutral", "Dissatisfied",
"Satisfied", "Neutral", "Satisfied", "Satisfied", "Neutral",
"Very \n Satisfied", "Very \n Satisfied", "Satisfied", "Satisfied",
"Very \n Satisfied", "Satisfied", "Neutral", "Neutral", "Neutral",
"Neutral", "Neutral", "Satisfied", "Satisfied", "Dissatisfied",
"Neutral", "Satisfied", "Very \n Satisfied", "Satisfied", "Satisfied",
"Very \n Dissatisfied", "Satisfied", "Neutral", "Satisfied",
"Very \n Dissatisfied", "Neutral", "Satisfied", "Neutral", "Satisfied"
), val3 = c("Very \n Dissatisfied", "Neutral", "Neutral", "Very \n Satisfied",
"Neutral", "Very \n Satisfied", "Dissatisfied", "Neutral", "Satisfied",
"Neutral", "Very \n Dissatisfied", "Very \n Satisfied", "Very \n Dissatisfied",
"Satisfied", "Neutral", "Very \n Dissatisfied", "Satisfied",
"Neutral", "Satisfied", "Neutral", "Very \n Satisfied", "Neutral",
"Satisfied", "Satisfied", "Neutral", "Dissatisfied", "Satisfied",
"Very \n Satisfied", "Neutral", "Very \n Dissatisfied", "Very \n Dissatisfied",
"Dissatisfied", "Satisfied", "Dissatisfied", "Dissatisfied",
"Very \n Dissatisfied", "Dissatisfied", "Very \n Dissatisfied",
"Satisfied", "Satisfied", "Neutral", "Very \n Dissatisfied",
"Very \n Satisfied", "Very \n Dissatisfied", "Satisfied", "Very \n Dissatisfied",
"Dissatisfied", "Satisfied", "Neutral", "Dissatisfied"), var = c("Yes",
"No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes",
"No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No",
"No", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No",
"No", "No", "No", "No", "Yes", "No", "No", "No", "Yes", "No",
"No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes"
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))

likert_levels <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)

Answer: just drop the pivot step:

library(tidyverse)
library(patchwork)

likert_levels <- c(
  "Very \n Dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very \n Satisfied"
)

plot_fun <- function(x, y) {
  .data <- df |>
    filter(var %in% x) |>
    mutate(
      across(-var, ~ factor(.x, likert_levels))
    )

  p1 <- .data |>
    ggstats::gglikert(include = -var) +
    aes(y = reorder(.question,
      ifelse(
        .answer %in% c("Very \n Dissatisfied", "Dissatisfied"),
        1, 0
      ),
      FUN = sum
    ), decreasing = TRUE) +
    facet_wrap(~ paste0("var to ", y)) +
    # scale_fill_manual(values = custom_colors) +
    theme(
      strip.text = element_text(size = 14, color = "black"), # Increase facet label size
      axis.title = element_text(size = 14), # Increase axis title size
      axis.text = element_text(size = 10)
    ) + # Increase axis text size
    theme(strip.background = element_rect(color = "black", fill = "red", size = 1.5, linetype = "solid"))

  p2 <- .data %>%
    count() |>
    ggplot(aes(y = factor(1), x = n)) +
    geom_col(fill = "lightgrey") +
    theme_light() +
    geom_text(aes(label = n),
      position = position_stack(vjust = 0.5)
    ) +
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank()
    )

  list(p1, p2)
}

.include <- list(No = "No", Yes = "Yes", All = c("Yes", "No"))

purrr::imap(.include, plot_fun) |>
  purrr::reduce(c) |>
  wrap_plots(ncol = 2) +
  plot_layout(guides = "collect", widths = c(.7, .3)) &
  labs(x = NULL, y = NULL) &
  theme(legend.position = "bottom")

Answers: 1  Votes: 0

How do I create a DataFrame based on the minimum values of 2 other DataFrames in pandas (Python)?

Suppose I have DataFrames df1 and df2: >>> df1 = pd.DataFrame({'A': [0, 2, 4], 'B': [2, 17, 7], 'C': [4, 9, 11]}) >>> df1 A B C 0 0 2 4 1 2 17 9 2 4 7 11 >...

Answers: 1  Votes: 0
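For the minimum-of-two-DataFrames question above, one option is NumPy's element-wise `np.minimum`, sketched below. `df1` is taken from the question; `df2` is cut off in the excerpt, so the `df2` used here is invented purely for illustration and assumes the same index and column labels as `df1`.

```python
import numpy as np
import pandas as pd

# df1 as given in the question.
df1 = pd.DataFrame({"A": [0, 2, 4], "B": [2, 17, 7], "C": [4, 9, 11]})

# Hypothetical df2: the question's df2 is truncated, so these values are made up.
df2 = pd.DataFrame({"A": [1, 1, 5], "B": [3, 10, 7], "C": [0, 12, 11]})

# np.minimum compares the two frames cell by cell and keeps the smaller value;
# with matching labels the result is again a DataFrame with the same shape.
df_min = np.minimum(df1, df2)

print(df_min)
```

If the two frames do not share the same labels, they would need to be aligned (for example with `reindex`) before comparing them.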

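Plot the time spent at each level based on date durations (pandas, Python)

I have this dataset containing logs of issue occurrences over a given period. I want to label each status to show which level it reached during that time. I did it in Python...

Answers: 1  Votes: 0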

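Multiple DataFrames and trying to use PCA on them

I have three DataFrames (different variables) on which I am trying to run PCA in Python. Their dimensions are: df1 = 17 rows × 60212 columns (17 is the model names, 60212 is the data) df2 =...

Answers: 1  Votes: 0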

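Max of two columns in PySpark

This should be simple, but I still haven't found a way. I have to compute a new column whose value is the maximum of columns col1 and col2. So if col1 is 2 and col2 is 4, new_col should be 4....

Answers: 1  Votes: 0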

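Adding two DataFrame columns with + produces NaN, while using .add(axis=1) works as expected?

I have a DataFrame (output here: https://pastebin.com/7RCPsHet; it can be read with pd.DataFrame.from_dict(orient='tight')) that contains two columns I want to total. They look like: Hierarchical...

Answers: 1  Votes: 0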

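Transition table from long-format longitudinal data using R

This question is about how to generate a frequency transition table from longitudinal data in long format, using base R functions or common packages such as dplyr. Consider longitu...

Answers: 1  Votes: 0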

Pandas resampling of 5-minute stock data is misaligned

I have some 5-minute stock data that looks like this: Date Open High Low Close Volume 0 2024-11-19 09:35:00 11.75 11.79 11.55 11.78 32673600 1 2024-11-19 09:40:00 11.78 11.81 ...

Answers: 1  Votes: 0

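How do I create frequency variables from multiple-response modalities in R?

I am working on a database in R that holds the results of a questionnaire. I have a problem with some of the variables. Some questions are fine and are assigned to a single variable, for example

Answers: 1  Votes: 0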

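Display a vector of values wrapped to the screen width

I have a vector of values, each associated with a name; the vector's length varies with user input. Although I have used table-related commands, I would like to know other ways to display it...

Answers: 1  Votes: 0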

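How do I filter and extract specific POS tags from a DataFrame column containing lists of tuples in Python?

I am working with a DataFrame in Python that has a column named "POS_TAGS". Each entry in this column is a list of tuples, where each tuple contains a word and its part-of-speech (POS) tag. Here is a

Answers: 1  Votes: 0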
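The POS-tag question above could be approached along the following lines. This is only a sketch: the question's sample data is cut off, so the sentences, the tag set `wanted_tags`, the helper `words_with_tags`, and the output column `NOUNS` are all hypothetical. Only the `POS_TAGS` column name and its list-of-(word, tag) layout come from the question.

```python
import pandas as pd

# Hypothetical data in the layout the question describes:
# each cell holds a list of (word, POS tag) tuples.
df = pd.DataFrame(
    {
        "POS_TAGS": [
            [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
            [("Dogs", "NNS"), ("bark", "VBP"), ("loudly", "RB")],
        ]
    }
)

# Tags to keep; chosen here only as an example.
wanted_tags = {"NN", "NNS"}


def words_with_tags(tagged, tags):
    """Return the words whose POS tag is in `tags`, preserving order."""
    return [word for word, tag in tagged if tag in tags]


# Apply the helper to every cell; extra keyword arguments are forwarded by Series.apply.
df["NOUNS"] = df["POS_TAGS"].apply(words_with_tags, tags=wanted_tags)

print(df["NOUNS"].tolist())
```

Keeping the filter in a small named function makes it easy to reuse with a different tag set, for example verbs only.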

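Filter rows based on combined sets of values in a string

In R, I have the following data frame, where the "overlap" column lists the rows that have overlapping values in other columns. df <- data.frame(overlap = c("1,2,3", "1,2,3&

Answers: 1  Votes: 0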
