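I'm trying to pull the results of a complex query into R for manipulation/ETL. I found SQLove, which sounds like exactly the package to make my life easier. I tried testing it on a simple script: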
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  #add list of libraries here
  SQLove,
  odbc,
  RJDBC
)
con <- dbConnect(
  odbc::odbc(),
  driver = "SQL Server",
  server = "ClarDbPrd_Alias",
  database = "Clarity"
)
d1 <- dbGetMultiQuery(
  connection = con,
  sql_file_path = "S:/QITeam/DataAnalytics/Projects/Frailty/Scripts/test.sql"
)
The SQL file is a simple SELECT * to test whether the package works:
SELECT * FROM CLARITY_DEP;
When I run it, I get the following error:
Error: unable to find an inherited method for function ‘dbSendUpdate’ for signature ‘conn = "Microsoft SQL Server", statement = "SQL"’
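Researching this, it looks like dbSendUpdate from the RJDBC package only works with JDBC connections, not ODBC. Is there a workaround, or a different package that makes it easy to run the statements in a .sql file over ODBC? I like that it cleans out comments and the like (both -- and multi-line /* */).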
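For context: with an odbc connection, the DBI generic `DBI::dbExecute()` is the usual way to run statements that return no rows (the role `dbSendUpdate()` plays for RJDBC), and `DBI::dbGetQuery()` returns a query result as a data frame. The comment cleanup itself needs no database driver at all. Below is a minimal base-R sketch, assuming the SQL text is already in a character string; `strip_sql_comments` is a hypothetical helper, not part of SQLove or DBI.

```r
# Hypothetical helper (not part of SQLove or DBI): strip SQL comments, base R only.
strip_sql_comments <- function(sql) {
  # /* ... */ block comments, possibly spanning lines: (?s) lets "." match
  # newlines, and the lazy ".*?" stops at the first closing "*/"
  sql <- gsub("(?s)/\\*.*?\\*/", "", sql, perl = TRUE)
  # -- line comments, up to (not including) the line break, so this works
  # for \n and \r\n endings and for a comment on the last line
  sql <- gsub("--[^\r\n]*", "", sql, perl = TRUE)
  # Collapse line breaks, tabs and repeated spaces, then trim the ends
  trimws(gsub("\\s+", " ", sql, perl = TRUE))
}

sql <- "/* header\n   spanning lines */\nSELECT * -- trailing note\r\nFROM CLARITY_DEP;"
print(strip_sql_comments(sql))
# [1] "SELECT * FROM CLARITY_DEP;"
```

This is purely textual: it does not understand string literals, so a `--` or `/*` inside quotes (e.g. `'--'`) would be stripped as well.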
After looking into the function on GitHub, I think it is out of date:
https://github.com/samkerns/SQLove/blob/main/R/dbGetMultiQuery.R
So I took the code and adapted it a bit for my purposes, and on the simpler SQL scripts I use for work it returned results without any problems:
# Check to see if pacman is installed
## Install it if it is not
if (!require("pacman")) install.packages("pacman")
# Get needed libraries (readr is needed for readr::read_file below)
pacman::p_load(
  odbc,
  DBI,
  readr,
  stringr
)
# Set up connection
con <- dbConnect(
  odbc::odbc(),
  driver = "SQL Server",
  server = "ClarDbPrd_Alias",
  database = "Clarity"
)
# Function to read in a SQL file, execute all statements and return the result of the final query
dbGetMultiQuery <- function(
    connection,    # connection set up with dbConnect
    sql_file_path  # path to the SQL file
) {
  # Read in the SQL file
  sql_file <- readr::read_file(sql_file_path)

  # Remove all comments (/* */ and --)
  ## /* */ comments, which may span several lines
  sql_file <- base::gsub("(?s)/\\*.*?\\*/", "", sql_file, perl = TRUE)
  ## -- comments, up to the end of the line (works for both \n and \r\n line endings)
  sql_file <- base::gsub("--[^\r\n]*", "", sql_file, perl = TRUE)
  ## special characters (line breaks and tabs)
  sql_file <- str_replace_all(sql_file, "[\r\n\t]", " ")
  ## trim white space before and after to make debugging easier
  sql_file <- trimws(sql_file)

  # Split the SQL file into individual statements
  ## ignore '; ' because I use that in STRING_AGG
  sql_list <- base::strsplit(sql_file, "(?<!')\\;(?! ')", perl = TRUE)[[1]]
  ## drop empty fragments (e.g. from ";;")
  sql_list <- sql_list[nzchar(trimws(sql_list))]
  # Get the number of statements
  query_length <- length(sql_list)
  if (query_length == 0) stop("No SQL statements found in ", sql_file_path)

  # If only 1 statement is available, it's a SELECT statement: use DBI::dbGetQuery
  if (query_length == 1) {
    DBI::dbGetQuery(connection, sql_list[[1]])
  # If more than 1 statement is available, dbExecute all but the final statement
  } else {
    for (i in seq_len(query_length - 1)) {
      DBI::dbExecute(connection, DBI::SQL(sql_list[[i]]))
      print(paste("Statement", i, "of", query_length, "complete"))
    }
    # Create a data frame from the final query statement
    DBI::dbGetQuery(connection, sql_list[[query_length]])
  }
}
# Test
df1 <- dbGetMultiQuery(
  connection = con,
  sql_file_path = "S:/QITeam/DataAnalytics/Projects/Frailty/Scripts/CasesGeriatric_1c.sql"
)
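The statement-splitting step of `dbGetMultiQuery` can be checked on its own, without a database connection. A minimal sketch using the same `strsplit` pattern; `split_sql_statements` is a hypothetical helper, and the temp table `#deps` and the columns `DEPARTMENT_ID`/`DEPARTMENT_NAME` are illustrative assumptions, not taken from a real schema.

```r
# Hypothetical helper using the same split pattern as dbGetMultiQuery.
split_sql_statements <- function(sql) {
  # Split on ";" unless it is preceded by a quote or followed by " '",
  # so a separator literal such as '; ' in STRING_AGG(x, '; ') is left alone
  parts <- strsplit(sql, "(?<!')\\;(?! ')", perl = TRUE)[[1]]
  parts <- trimws(parts)
  parts[nzchar(parts)]  # drop empty fragments, e.g. from ";;"
}

sql <- paste(
  "SELECT DEPARTMENT_ID INTO #deps FROM CLARITY_DEP;",
  "SELECT STRING_AGG(DEPARTMENT_NAME, '; ') AS names FROM CLARITY_DEP;"
)
stmts <- split_sql_statements(sql)
print(length(stmts))
# [1] 2
```

The lookarounds only protect that one separator style: a semicolon elsewhere inside a string literal (e.g. `'a;b'`) would still be treated as a statement boundary.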
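I'm posting this in the hope that people who want to run multiple queries in R come across it, because I wish I had found an answer like this when I first started searching for how to do this in R :)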