Storing the output of a Hive query run through beeline in a string. Tried running it with Popen, but it did not work

Question Votes: 1 Answers: 1

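I am running a Hive query from a Python script. With subprocess.getstatusoutput the query runs without any problems, but I cannot get just the result into a variable. So I tried Popen instead, and I get an error (the message is quoted further down).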

import subprocess

dd1 = '10-Sep-12'
table = 'testing_table'
query = "select distinct(input__file__name) from <db_name>." + table + " where as_of_date =" + "'" + dd1 + "'" + " limit 2"

cmd = 'beeline -u "jdbc:hive2:<connection string>" -e "' + query + ';"'

stat, query_output = subprocess.getstatusoutput(cmd)

This works, but when I print query_output it contains all of the output (for example the 'INFO' lines about every stage as well as the actual query output).

When I use subprocess.Popen or subprocess.check_output instead of getstatusoutput, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'beeline -u "<connection string>" -e "select distinct(input__file__name) from <db_name>.<table_name> where as_of_date =\'10-Sep-12\' limit 2;"'
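
A note on the error: on POSIX systems, subprocess.Popen and subprocess.check_output treat a single string as the name of the program to execute unless shell=True is passed, which is consistent with the FileNotFoundError above. Below is a minimal sketch of one way around it, splitting the string into an argument list with shlex.split. The helper name run_and_capture is hypothetical, and the stand-in command built from sys.executable is an assumption used only so the sketch runs on a machine without beeline.

```python
import shlex
import subprocess
import sys

# Stand-in for the beeline command line (assumption: any quoted command string
# behaves the same way); sys.executable is used so no beeline install is needed.
cmd = f'"{sys.executable}" -c "print(42)"'


def run_and_capture(command: str) -> str:
    """Split a command string into an argument list and return its stdout as text."""
    argv = shlex.split(command)  # e.g. ['/usr/bin/python3', '-c', 'print(42)']
    return subprocess.check_output(argv, text=True)


# Passing the whole string without shell=True: on POSIX the entire string is
# looked up as a program name, which raises FileNotFoundError.
try:
    subprocess.check_output(cmd)
    string_form_failed = False
except FileNotFoundError:
    string_form_failed = True

# Passing an argument list works without involving a shell.
output = run_and_capture(cmd)
```

Passing shell=True with the original string is the other option; the list form avoids a second layer of shell quoting around the query text.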
python hive subprocess hiveql beeline
1 Answer
0 votes

Attached is a Python snippet that reads a list of tables from a file, runs a Hive query for each table in the list, and appends the results to a file using subprocess.

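The cmd variable holds the command to be executed by the subprocess functions; the output is stored in a variable that is later written to a file. The next set of steps reads the file created in the first step, runs another query for each table, and writes the results to a second file.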

import subprocess

# Step 1: list every table in the database and save the list to a file.
# text=True (Python 3.7+) makes check_output return str instead of bytes.
cmd = 'hive -e "use database; show tables;"'
val = subprocess.check_output(cmd, shell=True, text=True)
with open('/home/ouput_all_table_list.txt', 'w') as fl:
    fl.write(val)

# Step 2: read the table list back and run a count query for each table.
with open('/home/ouput_all_table_list.txt', 'r') as fl:
    content = fl.read().splitlines()
for var in content:
    tbl_nm = "'" + var + "'"
    cmd_ay = 'hive -e "use database; select collect_list(cast(file_dt as string)) as dt, collect_list(cast(cnt as string)) as cnt, ' + tbl_nm + ' from (select count(1) cnt,file_dt from database.' + var + ' group by file_dt having count(1) > 0  order by file_dt desc) a;"'
    print(cmd_ay)
    cmd_out = subprocess.check_output(cmd_ay, shell=True, text=True)
    print(cmd_out)
    # Append this table's result to the output file.
    with open('/home/ouput_all_hive_count_data.txt', 'a') as fh:
        fh.write(cmd_out)
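
To keep only the query result in a string, which is what the question asks for, one option is to capture stdout and stderr separately. The sketch below assumes beeline writes its INFO log lines to stderr and the result rows to stdout. The helper names capture_streams and beeline_argv are hypothetical, and the beeline options shown (--silent, --showHeader, --outputformat) should be checked against the installed version. The demo child process is a stand-in so the sketch runs without beeline.

```python
import subprocess
import sys


def capture_streams(argv):
    """Run argv and return (stdout_text, stderr_text), captured separately."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,      # decode bytes to str
        check=True,     # raise CalledProcessError on a non-zero exit status
    )
    return proc.stdout, proc.stderr


def beeline_argv(jdbc_url, query):
    """Build an argument list for beeline (hypothetical helper; no shell needed)."""
    return [
        "beeline", "-u", jdbc_url,
        "--silent=true",         # reduce informational messages
        "--showHeader=false",    # omit the column header row
        "--outputformat=tsv2",   # plain tab-separated rows instead of an ASCII table
        "-e", query + ";",
    ]


# Stand-in child process: writes a log line to stderr and a made-up row to stdout.
demo = [
    sys.executable, "-c",
    "import sys; print('INFO : stage log', file=sys.stderr); print('row_1')",
]
result, log = capture_streams(demo)
# result holds only what the child wrote to stdout; the log line is in `log`.
```

With a real connection, the call would look like capture_streams(beeline_argv(url, query)), where url and query are the values from the question.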