Alluvial plot in R, error: data is not in the correct format


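I want to compare two vegetation maps (.shp) using an alluvial plot. One vegetation map is from 2010 and one is from 2023. The mapping units are the same in 2010 and 2023. However, the mapped areas are not identical, so I intersected the two maps.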

After thoroughly restructuring the data frame, I now have the data in the following format (a random subset of 15 rows; the full dataset has 9220 rows):

      ID value     total  Jaar
   <int> <chr>     <dbl> <dbl>
 1  1927 H0000  2.33e- 8  2023
 2  1030 H4030  7.64e+ 5  2010
 3  2447 H7120  3.65e- 5  2023
 4   301 H0000  2.47e- 8  2023
 5   611 H0000  2.73e-17  2023
 6  4021 H0000  1.17e+ 5  2010
 7  1531 H0000  3.11e+ 4  2023
 8   759 H0000  2.84e- 4  2010
 9  1339 H6230  6.51e- 7  2010
10  2848 H9999  2.23e- 5  2010
11  1740 H4010A 3.17e- 7  2023
12   335 H4030  5.90e- 5  2023
13  4182 H7120  1.47e- 3  2023
14  2676 H0000  3.81e+ 4  2023
15  2828 H9999  2.89e+ 5  2010

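ID = unique ID that links the vegetation in a polygon of the 2010 map to the vegetation in the same polygon of the 2023 map. Each polygon can contain up to three different vegetation types, and a polygon may contain only one type in 2010 but three types in 2023. In that case there is one row with ID 611 for 2010 and three rows with ID 611 for 2023.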

value = vegetation type (habitat type)

total = total surface area (m²)

Jaar = year in which the map was made
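A minimal base-R sketch of this one-to-many structure, using hypothetical toy data (the numbers are invented, not taken from the real dataset): polygon 611 has one habitat type in 2010 and three in 2023. Cross-tabulating ID against Jaar shows which ID-year combinations occur more than once.

```r
# Hypothetical toy data with the same columns as data_alluv_long:
# polygon 611 has one habitat type in 2010 but three in 2023.
toy <- data.frame(
  ID    = c(611L, 611L, 611L, 611L, 1030L, 1030L),
  value = c("H4030", "H4030", "H7120", "H0000", "H4030", "H4030"),
  total = c(900, 500, 300, 100, 250, 250),
  Jaar  = c(2010, 2023, 2023, 2023, 2010, 2023)
)

# Number of rows for every ID-year combination
counts <- table(ID = toy$ID, Jaar = toy$Jaar)
print(counts)

# IDs that occur more than once in at least one year
dup_ids <- rownames(counts)[apply(counts > 1, 1, any)]
print(dup_ids)
```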

I use the following code to make the alluvial plot:

 library(dplyr)
 library(ggplot2)
 library(ggalluvial)

 data_alluv_long %>%
   mutate(Jaar = factor(Jaar, levels = c("2010", "2023")),
          value = factor(value, levels = c("H9999", "H0000", "H91D0", "H7110B",
                                           "H4030", "H7120", "H4010A", "H3160",
                                           "H6230", "H7150", "H7110A", "H2320",
                                           "H0410A", "H0401A"))) %>%
   ggplot(aes(x = Jaar,
              stratum = value,
              alluvium = ID,
              y = total)) +
   geom_alluvium(aes(fill = value), alpha = 0.7) +
   geom_stratum() +
   theme_minimal() +
   labs(
     title = "Veranderingen in Habitattypen",
     x = "Jaar",
     y = "Oppervlakte"
   )

I keep getting this error:

Error in `geom_alluvium()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_data()`:
! Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).

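As far as I can tell from the help page, my data frame is in the correct format. Some values are very small (artifacts of the intersection), so I tried to leave them out by adding

filter(total > 0.001)

but I got the same error.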
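Filtering on area cannot remove this error by itself, because the check is about how many rows each ID has per year, not how large they are. A minimal base-R sketch with hypothetical toy data (invented values): after the sliver is dropped, polygon 611 still has two rows in 2023.

```r
# Hypothetical toy data: polygon 611 has two real habitat patches in 2023
# plus one tiny sliver left over from the intersection.
toy <- data.frame(
  ID    = c(611L, 611L, 611L, 611L),
  value = c("H4030", "H4030", "H7120", "H0000"),
  total = c(900, 500, 300, 2.73e-17),
  Jaar  = c(2010, 2023, 2023, 2023)
)

# Same filter as in the question
filtered <- toy[toy$total > 0.001, ]

# TRUE if any ID-year combination still occurs more than once
still_duplicated <- anyDuplicated(filtered[, c("ID", "Jaar")]) > 0
print(nrow(filtered))
print(still_duplicated)
```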

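What I want from the alluvial plot: two bars, one for 2010 and one for 2023. The bars must be filled by vegetation type (habitat type). The flows must show how some types stayed the same over the 13 years and how others changed.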
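For reference, a minimal sketch of this kind of plot that ggalluvial accepts, built from hypothetical toy data (invented IDs, codes and areas) in which every ID occurs exactly once per year. With that structure, geom_flow() and geom_stratum() draw one bar per year and flows between them; it assumes ggplot2 and ggalluvial are installed.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data in valid lodes form: every ID occurs exactly once per year
toy <- data.frame(
  ID    = rep(1:4, times = 2),
  Jaar  = factor(rep(c(2010, 2023), each = 4)),
  value = c("H4030", "H4030", "H7120", "H0000",   # 2010
            "H4030", "H7120", "H7120", "H0000"),  # 2023
  total = c(500, 300, 200, 100,
            450, 350, 200, 100)
)

# One bar (stratum stack) per year, flows connecting each ID across years
p <- ggplot(toy, aes(x = Jaar, stratum = value, alluvium = ID,
                     y = total, fill = value)) +
  geom_flow(alpha = 0.7) +
  geom_stratum(alpha = 0.8) +
  theme_minimal()

print(p)
```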

My question: where does the error come from? Why is my data structure incorrect?

Link to the full dataset: https://www.dropbox.com/scl/fo/ov0tmhoujvbg4vohsj8td/AGcnP2Cd4Wt3jC4wuIeqYYA?rlkey=aq9lkm6rb0juod4jepkq03vqr&dl=0

[Image: artist's impression of the desired alluvial plot]

r ggplot2 ggalluvial
1 Answer

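An is_lodes_form() check shows that you have duplicated ID-axis pairings, i.e. the same ID occurs more than once in the same year. Given your description that some polygons contain a different number of vegetation types in the two years, I suspected this might be the case.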
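The same check can be reproduced on hypothetical toy data (invented values, same columns as data_alluv_long). The sketch assumes dplyr and ggalluvial are installed; besides calling is_lodes_form(), it lists the ID-year combinations that occur more than once, which is a quick way to locate the offending polygons in the full dataset.

```r
library(dplyr)
library(ggalluvial)

# Hypothetical toy data: polygon 611 has one type in 2010, three in 2023
toy <- data.frame(
  ID    = c(611L, 611L, 611L, 611L, 1030L, 1030L),
  value = c("H4030", "H4030", "H7120", "H0000", "H4030", "H4030"),
  total = c(900, 500, 300, 100, 250, 250),
  Jaar  = c(2010, 2023, 2023, 2023, 2010, 2023)
)

# ggalluvial's own check (arguments: data, key, value, id);
# warns about duplicated id-axis pairings and returns FALSE here
ok <- is_lodes_form(toy, Jaar, value, ID)

# List the ID-year combinations that occur more than once
offenders <- toy %>%
  count(ID, Jaar) %>%
  filter(n > 1)
print(offenders)
```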

Solution:

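Instead of requiring each ID to appear only once per year, we create a unique flow_id that combines the original ID with the index of each vegetation type within that ID and year.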
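A minimal sketch of the flow_id idea on hypothetical toy data (invented values; assumes dplyr and ggalluvial are installed). Note that the index only means "first", "second", "third" row within an ID and year, so which 2010 type gets linked to which 2023 type depends on the row order in the file, not on where the patches actually overlap.

```r
library(dplyr)
library(ggalluvial)

# Hypothetical toy data (same structure as data_alluv_long)
toy <- data.frame(
  ID    = c(611L, 611L, 611L, 611L, 1030L, 1030L),
  value = c("H4030", "H4030", "H7120", "H0000", "H4030", "H4030"),
  total = c(900, 500, 300, 100, 250, 250),
  Jaar  = c(2010, 2023, 2023, 2023, 2010, 2023)
)

# Number the rows within each ID-year group and append that index to the ID
toy_flows <- toy %>%
  group_by(ID, Jaar) %>%
  mutate(flow_id = paste(ID, row_number(), sep = "_")) %>%
  ungroup()

print(toy_flows)

# No flow_id occurs twice in the same year, so the duplicate check passes
ok <- is_lodes_form(toy_flows, Jaar, value, flow_id)
```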

Code

# First, let's load required libraries
library(tidyverse)
library(ggalluvial)
# Set the working directory to this script's folder (only works inside RStudio)
setwd(dirname(rstudioapi::getSourceEditorContext()$path))

# Read in data from csv
data_alluv_long <- read.csv("alluv_long.csv")

# Fix 1: Filter out very small values that might cause issues
data_filtered <- data_alluv_long %>%
  filter(total > 0.001)

# Fix 2: Ensure each ID appears in both years
data_complete <- data_filtered %>%
  group_by(ID) %>%
  filter(n_distinct(Jaar) == 2) %>%
  ungroup()

is_lodes_form(
  data_complete,
  Jaar,
  value,
  ID
)
# Returns FALSE with the warning "Duplicated id-axis pairings."
# This is the cause of your error: some IDs occur more than once in the same year


# Check if IDs appear in both years
id_counts <- data_complete %>%
  group_by(ID) %>%
  summarise(n_years = n_distinct(Jaar))
table(id_counts$n_years)

nrow(data_complete) - 2 * nrow(id_counts) # > 0 means some IDs have more than one row in a year


# Fix it

# Function to prepare the data
prepare_alluvial_data <- function(data) {
  # Step 1: Create a unique identifier for each ID-vegetation combination
  data_prepared <- data %>%
    group_by(ID, Jaar) %>%
    # Create a unique combination identifier
    mutate(veg_index = row_number(),
           # Create a unique flow identifier
           flow_id = paste(ID, veg_index, sep = "_")) %>%
    ungroup()
  
  # Step 2: Ensure the data is properly structured for the alluvial diagram
  data_prepared <- data_prepared %>%
    # Convert Jaar to factor
    mutate(Jaar = factor(Jaar),
           # Ensure value is a factor with specified levels if needed
           value = factor(value))
  
  return(data_prepared)
}

# Using your data:
data_ready <- prepare_alluvial_data(data_alluv_long)

data_ready <- data_ready %>%
  filter(total > 0.001)

# Create the plot with the modified data
ggplot(data_ready,
       aes(x = Jaar, 
           stratum = value, 
           alluvium = flow_id,  # Using the new flow_id instead of ID
           y = total,
           fill = value)) +
  geom_flow(alpha = 0.7) +
  geom_stratum(alpha = 0.8) +
  scale_x_discrete(expand = c(0.1, 0.1)) +
  theme_minimal() +
  labs(
    title = "Veranderingen in Habitattypen",
    x = "Jaar",
    y = "Oppervlakte (m²)"
  ) +
  scale_fill_discrete(name = "Habitattype") +
  theme(legend.position = "right")
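One side effect of prepare_alluvial_data() is that factor(value) sorts the habitat codes alphabetically, so the custom stratum order from the question is lost. A small base-R sketch of re-applying that order, using the level vector copied from the question and a few hypothetical codes; any code missing from the vector would silently become NA, which is worth checking before plotting.

```r
# Stratum order copied from the question's code
habitat_levels <- c("H9999", "H0000", "H91D0", "H7110B", "H4030", "H7120",
                    "H4010A", "H3160", "H6230", "H7150", "H7110A", "H2320",
                    "H0410A", "H0401A")

# Hypothetical habitat codes as they might appear in data_ready$value
codes <- c("H4030", "H0000", "H7120", "H9999", "H4010A")

alphabetical <- factor(codes)                           # what factor(value) gives
custom       <- factor(codes, levels = habitat_levels)  # the question's order

print(levels(alphabetical))
print(levels(custom)[1:5])

# Codes that are not in habitat_levels would turn into NA
unmatched <- unique(codes[is.na(custom)])
print(unmatched)
```

Applied to the real data this would be data_ready$value <- factor(data_ready$value, levels = habitat_levels), run before the ggplot() call.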
© www.soinside.com 2019 - 2025. All rights reserved.