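I am working on a project where I need to visualize the flow of healthcare providers across the phases of a care pathway using a Sankey diagram in R. Each phase represents a step in the care process, and each phase involves different providers.
I want each provider to have a consistent color throughout the diagram. That is, if a provider appears in multiple phases (as both a source and a target), it should have the same color in every instance.
I have managed to create the Sankey diagram with networkD3::sankeyNetwork, and I can assign colors, but I have not figured out how to make sure the same provider gets the same color in every phase of the care process.
Here is a simplified version of the code I am using: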
library(dplyr)
library(readxl)
library(networkD3)  # provides sankeyNetwork()

# Load data
data <- read_excel("healthcare_data.xlsx", sheet = "Sheet1")

# Create nodes and links
providers <- unique(c(data$Phase_Provider, data$Next_Phase_Provider))
nodes <- data.frame(name = providers)

# networkD3 expects zero-based node indices, hence the "- 1"
links <- data %>%
  mutate(source = match(Phase_Provider, nodes$name) - 1,
         target = match(Next_Phase_Provider, nodes$name) - 1) %>%
  group_by(source, target) %>%
  summarise(value = n(), .groups = "drop")

# Assigning the same color to all nodes for now
nodes$color <- "#1f77b4"

# Create the Sankey diagram
sankeyNetwork(Links = links, Nodes = nodes, Source = "source", Target = "target",
              Value = "value", NodeID = "name", fontSize = 12, nodeWidth = 30)
What I want to achieve:
The problem:
Any help or guidance on how to achieve this would be greatly appreciated!
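You can use the NodeGroup argument to name a column in the nodes data frame that defines which group each node belongs to; the group determines each node's color. To choose the color for each group explicitly, pass a suitable d3.scaleOrdinal() expression as the value of the colourScale argument.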
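# Links use zero-based indices into the rows of `nodes`
links <-
  tibble::tribble(
    ~source, ~target, ~value,
    0, 3, 3,
    0, 5, 3,
    0, 5, 3,
    1, 3, 3,
    1, 4, 3,
    1, 5, 3,
    2, 3, 3,
    2, 4, 3,
    2, 5, 3
  )

# One node per provider and phase; node_group holds the provider,
# so the same provider shares a group in every phase
nodes <-
  tibble::tribble(
    ~name, ~node_group,
    "ABC_phase1", "ABC",
    "DEF_phase1", "DEF",
    "GHI_phase1", "GHI",
    "ABC_phase2", "ABC",
    "DEF_phase3", "DEF",
    "GHI_phase4", "GHI"
  )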

# Default palette: nodes in the same group get the same color
networkD3::sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  NodeGroup = "node_group",
  fontSize = 12,
  nodeWidth = 30
)
#> Links is a tbl_df. Converting to a plain data frame.
#> Nodes is a tbl_df. Converting to a plain data frame.

# Explicit palette: map each group to a chosen color via colourScale
networkD3::sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  NodeGroup = "node_group",
  colourScale = 'd3.scaleOrdinal().domain(["ABC", "DEF", "GHI"]).range(["#7d3945", "#e0677b", "#244457"])',
  fontSize = 12,
  nodeWidth = 30
)
#> Links is a tbl_df. Converting to a plain data frame.
#> Nodes is a tbl_df. Converting to a plain data frame.
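
With more than a handful of providers, typing the group column and the d3.scaleOrdinal() string by hand gets error-prone. The sketch below shows one way both could be generated instead. It assumes node names follow the "<provider>_phase<N>" pattern from the example above; build_colour_scale() and palette are hypothetical names introduced here for illustration, not part of networkD3.

```r
# Hypothetical helper (not part of networkD3): turn a named character vector
# (names = provider groups, values = hex colors) into a d3.scaleOrdinal() string.
build_colour_scale <- function(palette) {
  quote_js <- function(x) paste0('"', x, '"', collapse = ", ")
  sprintf(
    "d3.scaleOrdinal().domain([%s]).range([%s])",
    quote_js(names(palette)),
    quote_js(unname(palette))
  )
}

# Assumption: node names look like "<provider>_phase<N>"
nodes <- data.frame(
  name = c("ABC_phase1", "DEF_phase1", "GHI_phase1",
           "ABC_phase2", "DEF_phase3", "GHI_phase4"),
  stringsAsFactors = FALSE
)

# Derive the provider group by stripping the "_phase<N>" suffix
nodes$node_group <- sub("_phase[0-9]+$", "", nodes$name)

# One color per provider; every group in the data must have an entry
palette <- c(ABC = "#7d3945", DEF = "#e0677b", GHI = "#244457")
stopifnot(all(nodes$node_group %in% names(palette)))

colour_scale <- build_colour_scale(palette)

# A few illustrative links (zero-based indices into `nodes`)
links <- data.frame(
  source = c(0, 1, 2),
  target = c(3, 4, 5),
  value  = c(3, 3, 3)
)

# Build the diagram only if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target", Value = "value",
    NodeID = "name", NodeGroup = "node_group",
    colourScale = colour_scale,
    fontSize = 12, nodeWidth = 30
  )
  # print(p) or evaluate p at the console to render the widget
}
```

For the palette shown here, colour_scale is the same string that was written by hand in the call above. If your node names follow a different convention, only the sub() pattern should need to change.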