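A data frame is a tabular data structure. Usually, it contains data where rows are observations and columns are variables of various types. While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.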
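I am writing a function that creates gt tables for individual states from a state-city level data frame. I will frequently change the data and the columns selected in gt, so I added...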
# The X column contains the suffix of one of the V* columns. The value in V(X) needs to be put into column Y. import pandas as pd import numpy as np # sample dataframe df = pd.DataFrame({ 'EMPLID':[12,...
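The lookup described in this excerpt can be sketched as follows. The original sample frame is truncated, so the data below is hypothetical (only the `EMPLID` column name and its first value come from the excerpt), and the sketch assumes every suffix in `X` matches an existing `V*` column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the frame in the excerpt is truncated.
df = pd.DataFrame({
    "EMPLID": [12, 13, 14],
    "V1": [10, 20, 30],
    "V2": [11, 21, 31],
    "V3": [12, 22, 32],
    "X": [2, 1, 3],  # suffix of the V* column to read for each row
})

value_cols = ["V1", "V2", "V3"]

# Position of the column "V<X>" within value_cols, one entry per row.
col_pos = pd.Index(value_cols).get_indexer("V" + df["X"].astype(str))

# Pick exactly one value per row with NumPy integer indexing.
df["Y"] = df[value_cols].to_numpy()[np.arange(len(df)), col_pos]

print(df)
```

Note that `get_indexer` returns -1 for a suffix that has no matching column, which would silently select the last column here, so real data may need a check on `col_pos` first.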
I have some raw data similar to the data frame below: df = pd.DataFrame([{'var1': '220-224 (even) Road Name 1', 'var2': 'Location 1', 'var3': 'Area 1'}, {'var1': 'Sites 5 to 9 (o...
I have a data frame as shown below

    Name  Value
    ===================
    A     2400
    B     -400
    C     400
    D     600

I need the df in the following format Name Lower_V...
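I have a data frame problem. I have two data frames: the first contains order details and the second contains coil details. I need to assign orders to coils so that the demand of each order is met...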
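Why was it decided to make data.table and tibble incompatible with data.frame in R? [closed]
I have been using R for many years and sometimes run into problems caused by data.table and tibble being incompatible with data.frame. This usually requires converting between them...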
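Databricks DLT DataFrame - How to use a schema with comments
Databricks DLT DataFrame - How to use schemas. I am new to Databricks Delta Live Tables and DataFrames, and I am confused about how to use a schema when reading from a stream. I am doing it table by table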
Say my data is: Year Doors Cylinders Transmission 2003 Nissan 4 4 T 2006 Nissan 4 4 T 2003 Honda 2 6 AT I want Shiny to return f...
I have a data frame called df that has 3 Likert-scale columns and one filter column:

    df
    # A tibble: 50 × 4
       val1                   val2                   val3                 var
       <chr>                  <chr>                  <chr>                <chr>
     1 "Very \n Dissatisfied" "Neutral"              "Very \n Dissatisf…  Yes
     2 "Neutral"              "Neutral"              "Neutral"            No
     3 "Dissatisfied"         "Satisfied"            "Neutral"            Yes
     4 "Very \n Satisfied"    "Satisfied"            "Very \n Satisfied"  Yes
     5 "Very \n Dissatisfied" "Very \n Dissatisfied" "Neutral"            Yes
     6 "Very \n Satisfied"    "Very \n Satisfied"    "Very \n Satisfied"  Yes
     7 "Dissatisfied"         "Neutral"              "Dissatisfied"       Yes
     8 "Neutral"              "Satisfied"            "Neutral"            Yes
     9 "Satisfied"            "Very \n Satisfied"    "Satisfied"          No
    10 "Neutral"              "Satisfied"            "Neutral"            Yes

The function that resulted from a previous question (here) gives me bar plots that all show the same value. This is correct! All I want is for the count not to be repeated 3 times (20, 30 and 50): I want it to appear once in the bar of the right-hand plot, not 3 times. Is this possible?

    plot_fun <- function(x, y) {
      .data <- df |>
        filter(var %in% x)

      p1 <- .data |>
        ggstats::gglikert(include = -var) +
        aes(y = reorder(.question,
          ifelse(
            .answer %in% c("Very \n Dissatisfied", "Dissatisfied"),
            1, 0
          ),
          FUN = sum
        ), decreasing = TRUE) +
        facet_wrap(~paste0("var to ", y)) +
        scale_fill_manual(values = custom_colors) +
        theme(
          strip.text = element_text(size = 14, color = "black"), # Increase facet label size
          axis.title = element_text(size = 14), # Increase axis title size
          axis.text = element_text(size = 10)) + # Increase axis text size
        theme(strip.background = element_rect(color = "black", fill = "red", size = 1.5, linetype = "solid"))

      p2 <- .data %>%
        tidyr::pivot_longer(-var) |>
        filter(!is.na(value)) |>
        mutate(
          name = reorder(name,
            ifelse(
              value %in% c("Very \n Dissatisfied", "Dissatisfied"),
              1, 0
            ),
            FUN = sum
          )
        ) |>
        ggplot(aes(y = name)) +
        geom_bar(fill = "lightgrey") +
        theme_light() +
        geom_text(aes(label = ..count..), stat = "count", position = position_stack(vjust = 0.5)) +
        theme(
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank())

      list(p1, p2)
    }

    .include <- list(No = "No", Yes = "Yes", All = c("Yes", "No"))

    purrr::imap(.include, plot_fun) |>
      purrr::reduce(c) |>
      wrap_plots(ncol = 2) +
      plot_layout(axes = "collect", guides = "collect", widths = c(.7, .3)) &
      labs(x = NULL, y = NULL) &
      theme(legend.position = "bottom")

Data

    dput(df)
    structure(list(val1 = c("Very \n Dissatisfied", "Neutral", "Dissatisfied",
    "Very \n Satisfied", "Very \n Dissatisfied", "Very \n Satisfied",
    "Dissatisfied", "Neutral", "Satisfied", "Neutral", "Very \n Dissatisfied",
    "Very \n Satisfied", "Very \n Dissatisfied", "Satisfied", "Neutral",
    "Very \n Dissatisfied", "Neutral", "Neutral", "Satisfied", "Neutral",
    "Very \n Satisfied", "Dissatisfied", "Dissatisfied", "Satisfied",
    "Neutral", "Dissatisfied", "Satisfied", "Very \n Dissatisfied",
    "Dissatisfied", "Very \n Dissatisfied", "Very \n Dissatisfied",
    "Dissatisfied", "Dissatisfied", "Dissatisfied", "Neutral", "Dissatisfied",
    "Dissatisfied", "Very \n Dissatisfied", "Satisfied", "Satisfied",
    "Neutral", "Very \n Dissatisfied", "Very \n Satisfied", "Very \n Dissatisfied",
    "Satisfied", "Very \n Dissatisfied", "Very \n Dissatisfied",
    "Satisfied", "Dissatisfied", "Dissatisfied"), val2 = c("Neutral",
    "Neutral", "Satisfied", "Satisfied", "Very \n Dissatisfied",
    "Very \n Satisfied", "Neutral", "Satisfied", "Very \n Satisfied",
    "Satisfied", "Very \n Dissatisfied", "Very \n Satisfied", "Satisfied",
    "Very \n Satisfied", "Satisfied", "Neutral", "Dissatisfied",
    "Satisfied", "Neutral", "Satisfied", "Satisfied", "Neutral",
    "Very \n Satisfied", "Very \n Satisfied", "Satisfied", "Satisfied",
    "Very \n Satisfied", "Satisfied", "Neutral", "Neutral", "Neutral",
    "Neutral", "Neutral", "Satisfied", "Satisfied", "Dissatisfied",
    "Neutral", "Satisfied", "Very \n Satisfied", "Satisfied", "Satisfied",
    "Very \n Dissatisfied", "Satisfied", "Neutral", "Satisfied",
    "Very \n Dissatisfied", "Neutral", "Satisfied", "Neutral", "Satisfied"
    ), val3 = c("Very \n Dissatisfied", "Neutral", "Neutral", "Very \n Satisfied",
    "Neutral", "Very \n Satisfied", "Dissatisfied", "Neutral", "Satisfied",
    "Neutral", "Very \n Dissatisfied", "Very \n Satisfied", "Very \n Dissatisfied",
    "Satisfied", "Neutral", "Very \n Dissatisfied", "Satisfied",
    "Neutral", "Satisfied", "Neutral", "Very \n Satisfied", "Neutral",
    "Satisfied", "Satisfied", "Neutral", "Dissatisfied", "Satisfied",
    "Very \n Satisfied", "Neutral", "Very \n Dissatisfied", "Very \n Dissatisfied",
    "Dissatisfied", "Satisfied", "Dissatisfied", "Dissatisfied",
    "Very \n Dissatisfied", "Dissatisfied", "Very \n Dissatisfied",
    "Satisfied", "Satisfied", "Neutral", "Very \n Dissatisfied",
    "Very \n Satisfied", "Very \n Dissatisfied", "Satisfied", "Very \n Dissatisfied",
    "Dissatisfied", "Satisfied", "Neutral", "Dissatisfied"), var = c("Yes",
    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes",
    "No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No",
    "No", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No",
    "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes", "No",
    "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes"
    )), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
    ))

    likert_levels <- c(
      "Strongly disagree",
      "Disagree",
      "Neither agree nor disagree",
      "Agree",
      "Strongly agree"
    )

Just remove the pivot part:

    library(tidyverse)
    library(patchwork)

    likert_levels <- c(
      "Very \n Dissatisfied",
      "Dissatisfied",
      "Neutral",
      "Satisfied",
      "Very \n Satisfied"
    )

    plot_fun <- function(x, y) {
      .data <- df |>
        filter(var %in% x) |>
        mutate(
          across(-var, ~ factor(.x, likert_levels))
        )

      p1 <- .data |>
        ggstats::gglikert(include = -var) +
        aes(y = reorder(.question,
          ifelse(
            .answer %in% c("Very \n Dissatisfied", "Dissatisfied"),
            1, 0
          ),
          FUN = sum
        ), decreasing = TRUE) +
        facet_wrap(~ paste0("var to ", y)) +
        # scale_fill_manual(values = custom_colors) +
        theme(
          strip.text = element_text(size = 14, color = "black"), # Increase facet label size
          axis.title = element_text(size = 14), # Increase axis title size
          axis.text = element_text(size = 10)
        ) + # Increase axis text size
        theme(strip.background = element_rect(color = "black", fill = "red", size = 1.5, linetype = "solid"))

      p2 <- .data %>%
        count() |>
        ggplot(aes(y = factor(1), x = n)) +
        geom_col(fill = "lightgrey") +
        theme_light() +
        geom_text(aes(label = n),
          position = position_stack(vjust = 0.5)
        ) +
        theme(
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank()
        )

      list(p1, p2)
    }

    .include <- list(No = "No", Yes = "Yes", All = c("Yes", "No"))

    purrr::imap(.include, plot_fun) |>
      purrr::reduce(c) |>
      wrap_plots(ncol = 2) +
      plot_layout(guides = "collect", widths = c(.7, .3)) &
      labs(x = NULL, y = NULL) &
      theme(legend.position = "bottom")
How to create a DataFrame based on the minimum values of 2 other DataFrames in pandas (Python)?
Suppose I have DataFrames df1 and df2:

    >>> df1 = pd.DataFrame({'A': [0, 2, 4], 'B': [2, 17, 7], 'C': [4, 9, 11]})
    >>> df1
       A   B   C
    0  0   2   4
    1  2  17   9
    2  4   7  11
    >...
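One way to take the element-wise minimum of two equally shaped frames is sketched below. `df1` is the frame from the excerpt; `df2` is cut off in the excerpt, so the one used here is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [0, 2, 4], "B": [2, 17, 7], "C": [4, 9, 11]})
# Hypothetical second frame: the original df2 is truncated in the excerpt.
df2 = pd.DataFrame({"A": [1, 1, 5], "B": [3, 10, 7], "C": [0, 12, 8]})

# np.minimum is a NumPy ufunc; applied to two DataFrames it compares
# them cell by cell and returns a DataFrame with the same labels.
result = np.minimum(df1, df2)

print(result)
```

The two frames are assumed to share the same index and column labels; if they do not, the cells without a counterpart come back as NaN.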
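Plot the time spent at each level based on date durations pandas python
I have this dataset containing a log of issue occurrences over a given period of time. I want to flag each status to indicate which level it reached during that period. I am doing this in Python...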
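I have three data frames (different variables) on which I am trying to run PCA in Python. Their dimensions are: df1 = 17 rows × 60212 columns (17 is the model names, 60212 is the data) df2 =...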
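This should be simple, but I still have not found a way. I have to compute a new column whose value is the maximum of columns col1 and col2. So if col1 is 2 and col2 is 4, new_col should have 4....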
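A minimal sketch of the row-wise maximum, assuming the data lives in a pandas DataFrame (the excerpt does not say which library is in use). The sample values are hypothetical apart from the 2 and 4 mentioned in the question.

```python
import pandas as pd

# Hypothetical data; the first row mirrors the example in the question.
df = pd.DataFrame({"col1": [2, 7, 5], "col2": [4, 3, 5]})

# Select both columns and take the maximum across them (axis=1 = per row).
df["new_col"] = df[["col1", "col2"]].max(axis=1)

print(df)
```

`max(axis=1)` skips NaN by default, so a row with one missing value gets the other column's value.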
Adding two DataFrame columns with + produces NaN, while using .add(axis=1) works as expected?
I have a data frame (output here: https://pastebin.com/7RCPsHet; it can be read with pd.DataFrame.from_dict(orient='tight')) containing two columns that I want to total. They look like: layer...
I have some 5-minute stock data, as follows:

                      Date   Open   High    Low  Close    Volume
    0  2024-11-19 09:35:00  11.75  11.79  11.55  11.78  32673600
    1  2024-11-19 09:40:00  11.78  11.81  ...
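I have a vector of values, each associated with a name; the length of the vector varies with user input. I have been using table-related commands, but I would like to know other ways to display it......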
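How to filter and extract specific POS tags from a DataFrame column containing lists of tuples in Python?
I am working with a DataFrame in Python that has a column named "POS_TAGS". Each entry in this column is a list of tuples, where each tuple contains a word and its part-of-speech (POS) tag. Here is a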
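The filtering described here could look roughly like the sketch below. The question's own sample is cut off, so the rows, the Penn Treebank style tags, the helper `words_with_tags` and the output column `NOUNS` are all hypothetical; only the column name `POS_TAGS` comes from the excerpt.

```python
import pandas as pd

# Hypothetical rows: each cell is a list of (word, POS tag) tuples.
df = pd.DataFrame({
    "POS_TAGS": [
        [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
        [("Dogs", "NNS"), ("bark", "VBP"), ("loudly", "RB")],
    ]
})

wanted_tags = {"NN", "NNS"}  # tags to keep (assumed tag set)


def words_with_tags(tagged, tags):
    """Return the words whose POS tag is in `tags`, preserving order."""
    return [word for word, tag in tagged if tag in tags]


# Series.apply forwards extra keyword arguments to the function.
df["NOUNS"] = df["POS_TAGS"].apply(words_with_tags, tags=wanted_tags)

print(df["NOUNS"].tolist())
```

A plain Python list comprehension per cell is used because the cells hold Python objects, which pandas cannot vectorize.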
In R, I have the following data frame, where the "overlap" column lists the rows that have overlapping values in the other columns. df <- data.frame(overlap = c("1,2,3", "1,2,3&