Below is the Hive query I am using, which also calls a ranking function. I am running this query on my local machine.
SELECT numeric_id, location, Rank(location), followers_count
FROM (
SELECT numeric_id, location, followers_count
FROM twitter_data
DISTRIBUTE BY numeric_id, location
SORT BY numeric_id, location, followers_count desc
) a
WHERE Rank(location)<10;
My Rank function is as follows:
package org.apache.hadoop.hive.contrib.udaf.ex;

import org.apache.hadoop.hive.ql.exec.UDF;

public final class Rank extends UDF {
    private int counter;
    private String last_key;

    public int evaluate(final String key) {
        if (!key.equalsIgnoreCase(this.last_key)) {
            this.counter = 0;
            this.last_key = key;
        }
        return this.counter++;
    }
}
I built a jar from the file above and ran the following steps before running the Hive query. I tried packaging it both as a runnable jar and as a plain jar.
ADD JAR /home/adminpc/Downloads/Project_input/Rank.jar;
CREATE TEMPORARY FUNCTION Rank AS 'org.apache.hadoop.hive.contrib.udaf.ex.Rank';
This is the result I get after running the Hive query:
hive> SELECT numeric_id, location, Rank(location), followers_count
> FROM (
> SELECT numeric_id, location, followers_count
> FROM twitter_data
> DISTRIBUTE BY numeric_id, location
> SORT BY numeric_id, location, followers_count desc
> ) a
> WHERE Rank(location)<1;
FAILED: NullPointerException null
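Your UDF does not seem to guard against null values in the input table. In particular, check what happens when location is null: key.equalsIgnoreCase(this.last_key) is called on a null key, which throws a NullPointerException.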
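A minimal sketch of a null-safe version of the same logic follows. It is written as a plain Java class (the name NullSafeRank and the main method are hypothetical, for illustration only) so it can be tried without Hive on the classpath. In the real UDF you would keep the original package, the class name Rank and `extends UDF`; only the body of evaluate changes. Treating consecutive null locations as their own group is an assumption; returning null for a null key, or filtering nulls out, would be equally valid choices.

```java
// Hypothetical standalone sketch: mirrors the body of Rank.evaluate so the
// null handling can be tested without Hive. In the actual UDF, keep the
// package declaration and "extends UDF" from the question.
public final class NullSafeRank {
    private int counter;
    private String lastKey;

    // Returns the 0-based position of the current row within its run of
    // equal keys (compared case-insensitively, as in the original).
    // A null key no longer causes a NullPointerException: consecutive
    // nulls are treated as one group (an assumption of this sketch).
    public int evaluate(final String key) {
        boolean sameKey = (key == null)
                ? lastKey == null
                : key.equalsIgnoreCase(lastKey);
        if (!sameKey) {
            counter = 0;
            lastKey = key;
        }
        return counter++;
    }

    public static void main(String[] args) {
        NullSafeRank rank = new NullSafeRank();
        String[] locations = {"Paris", "Paris", null, null, "london", "London"};
        for (String loc : locations) {
            // Prints each location with its rank inside its group.
            System.out.println(loc + " -> " + rank.evaluate(loc));
        }
    }
}
```

Running main prints ranks 0, 1, 0, 1, 0, 1: because the counter is post-incremented, the first row of each group gets 0. If rows with a null location are not wanted at all, another option is to drop them before ranking by adding WHERE location IS NOT NULL after FROM twitter_data in the inner query.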