R, data.table: iteration and the correct use of nested loops

Question

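Basically, I am trying to present data as small aggregated tables by filtering on certain items.

The code I have written works fine, but I need help modifying it further.

A. If there is a better way to do this, please suggest it. I would appreciate it.

B. I generally prefer data.table: this is only sample data, and my real data is very large.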

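Two modifications are now needed in the output:

  1. A total table at the start of each sheet, which is the sum of all 5 tables to its right, i.e. a table with no filter on taxa.

  2. A "Total" tab that is the sum of all the sheets: its first table has no filter on taxa or plot_type, and the other 5 tables to its right have no filter on plot_type.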
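To make the two requirements concrete, here is a minimal sketch on toy data. The `toy` table and its values are made up for illustration and are not taken from the ratdat dataset. Requirement 1 aggregates within one plot_type without the `taxa == ...` condition; requirement 2 also drops the `plot_type == ...` condition.

```r
library(data.table)

# Hypothetical toy data standing in for DATA.complete (values are invented)
toy <- data.table(
  plot_type = c("Control", "Control", "Control", "Exclosure", "Exclosure"),
  taxa      = c("Rodent", "Rodent", "Bird", "Rodent", "Bird"),
  genus     = c("Dipodomys", "Dipodomys", "Amphispiza", "Dipodomys", "Amphispiza"),
  species   = c("merriami", "merriami", "bilineata", "merriami", "bilineata"),
  day       = c(1L, 2L, 3L, 4L, 5L)
)

# Requirement 1: first table on a plot_type sheet, with no filter on taxa
sheet_total <- toy[plot_type == "Control",
                   .(Total.days = sum(day)),
                   by = .(taxa, genus, species)]

# Requirement 2: first table on the Total tab, with no filter on taxa or plot_type
grand_total <- toy[, .(Total.days = sum(day)), by = .(taxa, genus, species)]
```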

Here is my code:

library(data.table)
library(openxlsx)

# Read the data into a data.table
DATA.complete <- fread("https://vincentarelbundock.github.io/Rdatasets/csv/ratdat/complete.csv")

# Replace empty values in 'taxa', 'genus', and 'species' columns with "Unknown"
DATA.complete[taxa == "", taxa := "Unknown"]
DATA.complete[genus == "", genus := "Unknown"]
DATA.complete[species == "", species := "Unknown"]

# Get unique values for plot_type and taxa
plot_type_list <- unique(DATA.complete$plot_type)
taxa_list <- unique(DATA.complete$taxa)

# Create a new workbook
Export.DATA <- createWorkbook()

# Iterate through plot_type_list and add worksheets to the workbook
invisible(sapply(plot_type_list, \(x) { addWorksheet(Export.DATA, x) }))

# Iterate through plot_type_list and taxa_list to create and write data to worksheets
sapply(1:length(plot_type_list), \(z) {
  Pivot <- DATA.complete[plot_type == plot_type_list[z], ]
  sapply(1:length(taxa_list), \(t) {
    Pivot_subset <- Pivot[taxa == taxa_list[t], ]
    tbl1 <- Pivot_subset[, .(Total.days = sum(day)), by = .(taxa,plot_type, genus, species)]
    c <- (t - 1) * 6
    writeData(Export.DATA, plot_type_list[z], tbl1, startRow = 1, startCol = 2 + c, colNames = TRUE)
  })
})

# Save the workbook to a file
saveWorkbook(Export.DATA, "Export.Data.xlsx", overwrite = TRUE)

Tags: r, data.table, nested-loops, sapply, openxlsx

1 Answer

library(data.table)
library(openxlsx)

# Read the data into a data.table
DATA.complete <- fread("https://vincentarelbundock.github.io/Rdatasets/csv/ratdat/complete.csv")

# Replace empty values with "Unknown"
DATA.complete[taxa == "", taxa := "Unknown"]
DATA.complete[genus == "", genus := "Unknown"]
DATA.complete[species == "", species := "Unknown"]

# Get unique values for plot_type and taxa
plot_type_list <- unique(DATA.complete$plot_type)
taxa_list <- c("Total", unique(DATA.complete$taxa))

# Create a new workbook
Export.DATA <- createWorkbook()

# Add Total tab
addWorksheet(Export.DATA, "Total")

# Add worksheets for plot_type_list
invisible(sapply(plot_type_list, \(x) { addWorksheet(Export.DATA, x) }))

# Create and write data for Total tab
total_data <- DATA.complete[, .(Total.days = sum(day)), by = .(taxa, genus, species)]
writeData(Export.DATA, "Total", total_data, startRow = 1, startCol = 2, colNames = TRUE)

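# Per-taxa tables on the Total tab, placed to the right of the overall table;
# each table starts 6 columns after the previous one (5 data columns + 1 blank)
invisible(sapply(taxa_list[-1], \(t) {
  total_subset <- DATA.complete[taxa == t, .(Total.days = sum(day)), by = .(taxa, plot_type, genus, species)]
  writeData(Export.DATA, "Total", total_subset, startRow = 1, startCol = 8 + (which(taxa_list == t) - 2) * 6, colNames = TRUE)
}))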

# Create and write data for each plot_type
sapply(1:length(plot_type_list), \(z) {
  Pivot <- DATA.complete[plot_type == plot_type_list[z], ]
  
  # Total table for this plot_type
  total_table <- Pivot[, .(Total.days = sum(day)), by = .(taxa, genus, species)]
  writeData(Export.DATA, plot_type_list[z], total_table, startRow = 1, startCol = 2, colNames = TRUE)
  
  sapply(2:length(taxa_list), \(t) {
    Pivot_subset <- Pivot[taxa == taxa_list[t], ]
    tbl1 <- Pivot_subset[, .(Total.days = sum(day)), by = .(taxa, plot_type, genus, species)]
    writeData(Export.DATA, plot_type_list[z], tbl1, startRow = 1, startCol = 2 + (t - 1) * 6, colNames = TRUE)
  })
})

# Save the workbook
saveWorkbook(Export.DATA, "Export.Data.xlsx", overwrite = TRUE)
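
  • Adds a "Total" tab containing the overall total table and a separate table for each taxon.

  • Adds a total table at the start of each plot_type tab: the sum for that plot_type, with no filter on taxa.

  • Keeps data.table for efficiency, which helps with large datasets.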
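
On the question of whether there is a better way: one possible variation, sketched here on made-up toy data rather than the real dataset, is to aggregate the raw rows once and build every table from that small aggregate instead of re-filtering the full data inside the loops. The `toy` table and the `start_col()` helper are hypothetical names introduced for illustration. The helper only restates the `2 + (t - 1) * 6` offset used above, assuming each table is 5 columns wide with 1 blank column between tables.

```r
library(data.table)

# Hypothetical toy data standing in for DATA.complete (values are invented)
toy <- data.table(
  plot_type = c("Control", "Control", "Control", "Exclosure", "Exclosure"),
  taxa      = c("Rodent", "Rodent", "Bird", "Rodent", "Bird"),
  genus     = c("Dipodomys", "Dipodomys", "Amphispiza", "Dipodomys", "Amphispiza"),
  species   = c("merriami", "merriami", "bilineata", "merriami", "bilineata"),
  day       = c(1L, 2L, 3L, 4L, 5L)
)

# Aggregate the raw rows once, at the finest grouping any table needs
agg <- toy[, .(Total.days = sum(day)), by = .(taxa, plot_type, genus, species)]

# Grand total for the "Total" tab: collapse plot_type from the small aggregate
grand_total <- agg[, .(Total.days = sum(Total.days)), by = .(taxa, genus, species)]

# Hypothetical helper: column where the k-th table on a sheet starts,
# assuming 5-column tables separated by 1 blank column
start_col <- function(k, first = 2L, width = 5L, gap = 1L) {
  first + (k - 1L) * (width + gap)
}

# Per-taxa tables for one sheet are plain subsets of the aggregate
taxa_values <- unique(agg$taxa)
control_tables <- lapply(taxa_values, function(tx) {
  agg[plot_type == "Control" & taxa == tx]
})
names(control_tables) <- taxa_values
```

Each element of `control_tables` could then be passed to `writeData()` with `startCol = start_col(k)`, where `k` is the table's position on the sheet. Whether this is faster in practice depends on the data, so it is worth timing on the real dataset.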
