Comparing tuple values inside a bag with a hard-coded String value


I have a dataset with these columns:

FMID,County,WIC,WICcash

Here is a sample of the data:

1002267,Douglas,Y,N
21005876,Douglas,Y,N
1001666,Douglas,N,Y

I grouped the data by county and then kept only the group for County = 'Douglas'. This is the output:

(Douglas,{(1002267,Douglas,Y,N),(21005876,Douglas,Y,N),(1001666,Douglas,N,Y)})

Now I want the combined count of Y values across the WIC and WICcash columns.

Here, the WIC and WICcash columns together contain 3 Y values, so my output would be:

Douglas 3

How can I achieve this?

Below is the code I have written so far:

load_data = LOAD 'PigPrograms/Markets/DATA_GOV_US_Farmers_Market_DataSet.csv' USING PigStorage(',') as (FMID:long,County:chararray, WIC:chararray, WICcash:chararray);

group_markets_by_county = GROUP load_data BY County;

filter_county = FILTER group_markets_by_county BY group == 'Douglas';

DUMP filter_county;
apache-pig
1 Answer

To look inside the bag, you can use a nested FOREACH.

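-- Without a USING clause, LOAD expects tab-delimited fields; for a comma-separated
-- file like the one in the question, add USING PigStorage(',').
A = LOAD 'input3.txt' AS (FMID:long, County:chararray, WIC:chararray, WICcash:chararray);
B = GROUP A BY County;
DESCRIBE B; /* B: {group: chararray,A: {(FMID: long,County: chararray,WIC: chararray,WICcash: chararray)}} */
C = FOREACH B {
        -- Inside the nested block, A refers to the bag of rows for the current county.
        FILTER_WIC_Y = FILTER A BY WIC == 'Y';
        COUNT_WIC_Y = COUNT(FILTER_WIC_Y);
        FILTER_WICcash_Y = FILTER A BY WICcash == 'Y';
        COUNT_WICcash_Y = COUNT(FILTER_WICcash_Y);
        -- Add the two per-column counts to get the combined number of Y values.
        GENERATE group, COUNT_WIC_Y + COUNT_WICcash_Y AS count;
}
DUMP C;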
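
Note that C holds one row per county, whereas the question only asks about Douglas. One way to narrow the result is to filter C on its first field. This is a minimal sketch, assuming the relation C defined above; the alias douglas_only is a hypothetical name introduced for illustration.

```pig
-- C has two fields: the county name ($0) and the combined Y count ($1).
-- Positional references are used so the sketch does not depend on field aliases.
douglas_only = FILTER C BY $0 == 'Douglas';
DUMP douglas_only;
```

Filtering after the aggregation is the simplest change to the script, although filtering the raw rows before grouping would give Pig less data to group.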

Alternatively, you can map 'Y' and 'N' to 1 and 0 and add them up.

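-- Without a USING clause, LOAD expects tab-delimited fields; for a comma-separated
-- file like the one in the question, add USING PigStorage(',').
A = LOAD 'input3.txt' AS (FMID:long, County:chararray, WIC:chararray, WICcash:chararray);
-- Turn each flag into 1 for 'Y' and 0 for anything else.
B = FOREACH A GENERATE FMID, County, (WIC == 'Y' ? 1 : 0) AS wic, (WICcash == 'Y' ? 1 : 0) AS wiccash;
C = GROUP B BY County;
-- Summing the 0/1 flags counts the Y values in each column.
D = FOREACH C GENERATE group, SUM(B.wic) + SUM(B.wiccash) AS count;
DUMP D;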
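
Putting this together for the question's own file, the following is a hedged sketch rather than a tested solution. It reuses the path, delimiter, and schema from the question's LOAD statement and filters on Douglas before grouping. The aliases douglas, flags, grouped, result, and y_total are hypothetical names chosen for illustration.

```pig
-- Load the comma-separated file with the schema given in the question.
load_data = LOAD 'PigPrograms/Markets/DATA_GOV_US_Farmers_Market_DataSet.csv'
            USING PigStorage(',')
            AS (FMID:long, County:chararray, WIC:chararray, WICcash:chararray);

-- Keep only the Douglas rows before grouping.
douglas = FILTER load_data BY County == 'Douglas';

-- Map each flag to 1 for 'Y' and 0 otherwise.
flags = FOREACH douglas GENERATE County,
            (WIC == 'Y' ? 1 : 0) AS wic,
            (WICcash == 'Y' ? 1 : 0) AS wiccash;

-- Group by county and add up the flags from both columns.
grouped = GROUP flags BY County;
result = FOREACH grouped GENERATE group AS County,
            SUM(flags.wic) + SUM(flags.wiccash) AS y_total;

DUMP result;
```

If the input held only the three sample rows shown in the question, WIC would contribute 2 and WICcash 1, so the dump should show (Douglas,3).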