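I have XML strings stored on a server, and I want to parse them with R. So far, my code runs a SQL query limited to 20 rows. When I download the results as a data frame, the XML string is too long and the text gets truncated, which causes read_xml to throw an error. Any suggestions on how to solve this? Do I need to download the results in order to parse them?
Here is my code: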
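library(odbc)
library(implyr)  # provides src_impala()
library(xml2)    # provides read_xml()

# Connect to Impala through the Cloudera ODBC driver
drv <- odbc::odbc()
impala <- src_impala(
  drv = drv,
  driver = "Cloudera ODBC Driver for Impala",
  host = "host",
  dbname = "default",
  port = 21050)

# sqlQuery holds the SQL text (limited to 20 rows)
sqlResults <- dbGetQuery(impala, sqlQuery)
# Take the XML string from row 2, column 10 and parse it
xmlText <- as.character(sqlResults[2, 10])
read_xml(xmlText)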
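One way to check that truncation is really what makes read_xml fail is to try parsing a complete string and a cut-off copy of it. This is only a minimal sketch: the sample XML and the helper is_well_formed() are made up for illustration and are not part of the original code.

```r
library(xml2)

# Made-up sample document standing in for one XML value from the table
full <- "<root><item>1</item><item>2</item></root>"

# Simulate what a length limit does to the value: keep the first 20 characters,
# which drops the closing </root> tag
truncated <- substr(full, 1, 20)

# Hypothetical helper: TRUE if read_xml() can parse the text, FALSE if it errors
is_well_formed <- function(txt) {
  tryCatch({
    read_xml(txt)
    TRUE
  }, error = function(e) FALSE)
}

is_well_formed(full)
is_well_formed(truncated)
```

If the downloaded value fails this kind of check and nchar(xmlText) sits suspiciously at a round limit, the string was most likely cut off in transit rather than malformed at the source.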
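Update: I was able to use substring in my query to create multiple columns, each containing a piece of the XML string. I then ran the query with dplyr and downloaded the results. Next, I used the unite function from tidyr to concatenate the pieces back into one complete XML string. Finally, I was able to parse the XML with read_xml in R. My query looks like this: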
SELECT
  substr(xmlText, 1, 30000)      AS 'xmlText1',
  substr(xmlText, 30001, 30000)  AS 'xmlText2',
  substr(xmlText, 60001, 30000)  AS 'xmlText3',
  substr(xmlText, 90001, 30000)  AS 'xmlText4',
  substr(xmlText, 120001, 30000) AS 'xmlText5'
FROM
  myXMLtables
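The reassembly step on the R side might look roughly like the following. This is a minimal sketch under stated assumptions: a small made-up XML string and a chunk size of 15 stand in for the real query result and its 30000-character pieces, and the names sqlResults, xml_full and chunk_size are chosen here for illustration only.

```r
library(tidyr)
library(xml2)

# Made-up document standing in for one full XML value on the server
full_xml <- "<root><item>1</item><item>2</item></root>"

# Hypothetical chunk size; the query in the post uses 30000
chunk_size <- 15
starts <- seq(1, nchar(full_xml), by = chunk_size)
chunks <- substring(full_xml, starts, starts + chunk_size - 1)

# Stand-in for the downloaded result: one row, the XML split across columns.
# The last column is empty, as happens when the XML is shorter than the
# range covered by the substr() columns.
sqlResults <- data.frame(
  xmlText1 = chunks[1],
  xmlText2 = chunks[2],
  xmlText3 = chunks[3],
  xmlText4 = "",
  stringsAsFactors = FALSE
)

# Paste the pieces back together in column order with no separator.
# na.rm = TRUE guards against NA pieces being pasted in as the text "NA".
combined <- unite(sqlResults, "xml_full", xmlText1:xmlText4,
                  sep = "", na.rm = TRUE)

# Parse the reassembled string and read the item values
doc <- read_xml(combined$xml_full[1])
xml_text(xml_find_all(doc, "//item"))
```

Note that this approach only recovers XML up to the total length covered by the substr columns (150000 characters in the query above), so longer values would need additional columns.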