SAS: transposing and summarizing data

I have a dataset that looks like this:

   Account Number  6m      7m      8m      9m      10m     11m     6m_Metric    7m_metric   8m_metric   9m_metric   10m_metric      11m_metric
    1               Better  X < 10  X < 10  Better  X < 30  X < 30    0.6       0.6         0.9         1.2         0.1             5.0
    2               X < 10  X < 20  X < 30  X < 20  X < 20  X < 20    0.4       0.4         3.4         3.7         4.4             0.3
    3               Better  Better  Better  Better  X < 10  X < 20    1.5       1.5         1.5         0.3         1.5             1.8
    4               X < 10  Better  Same    Same    Same    Same      3.4       3.4         1.8         5.0         5.2             6.8
    5               Same    Better  Same    Same    Same    Same      0.1       0.1         5.0         5.3         5.0             1.8
    6               Same    Same    Same    Better  Better  Better    4.4       4.4         0.3         0.3         5.2             7.4
    7               Same    X < 10  X < 10  X < 10  X < 10  Better    5.0       5.0         1.3         2.1         2.2             0.3
    8               Better  Better  Better  Better  Better  Better    7.8       7.8         5.0         1.5         1.9             7.4
    9               X < 10  X < 10  X < 10  X < 20  X < 30  Better    9.1       9.1         9.4         5.5         5.6             4.6
    10              X < 20  X < 30  X < 30  X < 30  X < 30  X < 30    0.3       0.3         1.5         1.8         2.2             1.5

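Each cell tells me what happened to each account 6 to 11 months on, together with the metric value for each account in each month. I want to be able to show any trends here, so I would like the number of accounts that were "Better" etc. in each month, as well as the average metric value for that month. So I think it should look like: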

Result  6m  7m  8m  9m  10m 11m Avg_met_6m Avg_met_7m       Avg_met_8m  Avg_met_9m  Avg_met_10m  Avg_met_11m
X < 10  3   3   3   2   3   0       4.3         4.3         3.9             2.9         3.9         2.2
X < 20  1   1   0   1   1   2       0.3         0.3         3.4             0           4.4         0.3
X < 30  0   1   2   1   2   1       0           0           1.5             2.8         2.2         3.3
Same    3   1   3   2   2   2       3.2         3.2         0.3             3.5         5.1         4.3
Better  1   4   2   4   2   4       3.3         3.3         3.3             0.9         2.2         7.4

This is only meant as an example of what I am trying to do, so apologies if there are any typos.

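* Name literals such as "6m"n and "Account Number"n require VALIDVARNAME=ANY;
options validvarname=any;
data have;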
    infile datalines dlm='|';
    input "Account Number"n "6m"n$ "7m"n$ "8m"n$ "9m"n$ "10m"n$ "11m"n$ "6m_Metric"n "7m_Metric"n "8m_Metric"n "9m_Metric"n "10m_Metric"n "11m_Metric"n;
    datalines;
1|Better|X < 10|X < 10|Better|X < 30|X < 30|0.6|0.6|0.9|1.2|0.1|5.0
2|X < 10|X < 20|X < 30|X < 20|X < 20|X < 20|0.4|0.4|3.4|3.7|4.4|0.3
3|Better|Better|Better|Better|X < 10|X < 20|1.5|1.5|1.5|0.3|1.5|1.8
4|X < 10|Better|Same|Same|Same|Same|3.4|3.4|1.8|5.0|5.2|6.8
5|Same|Better|Same|Same|Same|Same|0.1|0.1|5.0|5.3|5.0|1.8
6|Same|Same|Same|Better|Better|Better|4.4|4.4|0.3|0.3|5.2|7.4
7|Same|X < 10|X < 10|X < 10|X < 10|Better|5.0|5.0|1.3|2.1|2.2|0.3
8|Better|Better|Better|Better|Better|Better|7.8|7.8|5.0|1.5|1.9|7.4
9|X < 10|X < 10|X < 10|X < 20|X < 30|Better|9.1|9.1|9.4|5.5|5.6|4.6
10|X < 20|X < 30|X < 30|X < 30|X < 30|X < 30|0.3|0.3|1.5|1.8|2.2|1.5
;
run;

1 Answer

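I took the following approach:

  • A few transposes to get the data into a workable format
  • PROC TABULATE to compute the results and output them to a data set
  • A few more transposes to get those results into the desired layout

(I would suggest keeping the data in the intermediate long format, as it is likely to be easier to work with.)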
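
The first step (getting to one row per account and month) can also be done in a single data step with arrays, which is a different technique from the PROC TRANSPOSE steps used in this answer. A minimal sketch, assuming the variable names from the question's `data have` step; the names `have_long`, `res`, `met` and `i` are made up for illustration:

```sas
options validvarname=any;  /* needed for name literals such as '6m'n */

data have_long (keep = 'Account Number'n time Result Metric);
    set have;
    /* Parallel arrays: outcome label and metric value for each month */
    array res {6} '6m'n '7m'n '8m'n '9m'n '10m'n '11m'n;
    array met {6} '6m_Metric'n '7m_Metric'n '8m_Metric'n
                  '9m_Metric'n '10m_Metric'n '11m_Metric'n;
    length time $ 3 Result $ 8;
    do i = 1 to 6;
        time   = vname(res{i});  /* e.g. "6m" */
        Result = res{i};
        Metric = met{i};
        output;                  /* one row per account and month */
    end;
run;
```

This should give the same shape as the merged data set built by the transposes, with the variable list written out by hand instead of picked up through `_character_` and `_numeric_`.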

* Transpose the character variables;
proc transpose data=have out=char_t (rename = (col1 = Result)) name = time;
    by 'account number'n;
    var _character_;
run;

* Transpose the numeric variables;
proc transpose data=have out=num_t (where = (time ne 'Account Number') rename = (col1 = Metric)) name = time ;
    by 'account number'n;
    var _numeric_;
run;

* Recode the time variable to match char_t;
data num_t (rename = (t = time) drop = time);
    length t $ 3;
    set num_t;
    t = prxchange('s/\_Metric\s*$//', -1, time);
run;

* Merge them back together;
proc sort data = num_t; by 'Account number'n time; run;
proc sort data = char_t; by 'Account number'n time; run;

data have_t;
    merge char_t num_t;
    by 'Account Number'n time;
run;
* NOTE: I would leave the data in this format, and use PROCs to 
    do any further analysis;

* Tabulate to get the required results (also outputting to a data set);
proc tabulate data = have_t out=tab;
    class Result time / order = fmt;
    var Metric;
    table Result, metric * (n mean) * time / misstext="0";  
run;

* Need 2 transposes to get the correct column layout in the final data;
* First get the values from Metric_N and Metric_Mean in to 1 column;
proc transpose data = tab out = tab_t;
    by Result time;
    var metric_n metric_mean;
run;

* Then transpose them into the desired wide format;
proc transpose data = tab_t out = want (drop = _name_);
    by Result;
    id time _name_;
    var col1;
run;

* Finally re-order the columns;
data want;
    retain Result '6mMetric_N'n '7mMetric_N'n '8mMetric_N'n '9mMetric_N'n '10mMetric_N'n '11mMetric_N'n 
         '6mMetric_Mean'n '7mMetric_Mean'n '8mMetric_Mean'n '9mMetric_Mean'n '10mMetric_Mean'n '11mMetric_Mean'n; 
    set want;
run;
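
As an illustration of the note about leaving the data in the long format, the counts and averages can be read straight off `have_t` without the final transposes. A minimal sketch, assuming `have_t` has been built as above; `summary_long`, `n_accounts` and `avg_metric` are names chosen here for illustration:

```sas
* One row per month and result, with the account count and mean metric;
proc means data = have_t nway noprint;
    class time Result;
    var Metric;
    output out = summary_long (drop = _type_ _freq_)
           n = n_accounts
           mean = avg_metric;
run;
```

Combinations of month and result that never occur do not appear in this output at all, whereas the PROC TABULATE report prints them as 0 through `misstext`.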

If you need the column names in want to match the requested layout exactly, you can use a rename statement in another data step.
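
A sketch of that renaming step, assuming the columns of `want` carry the concatenated names used in the retain statement above (for example '6mMetric_N'n) and that the target names are those from the question's desired layout; `want_named` is a hypothetical output name:

```sas
options validvarname=any;  /* needed for names such as '6m'n */

data want_named;
    set want;
    rename '6mMetric_N'n     = '6m'n
           '7mMetric_N'n     = '7m'n
           '8mMetric_N'n     = '8m'n
           '9mMetric_N'n     = '9m'n
           '10mMetric_N'n    = '10m'n
           '11mMetric_N'n    = '11m'n
           '6mMetric_Mean'n  = Avg_met_6m
           '7mMetric_Mean'n  = Avg_met_7m
           '8mMetric_Mean'n  = Avg_met_8m
           '9mMetric_Mean'n  = Avg_met_9m
           '10mMetric_Mean'n = Avg_met_10m
           '11mMetric_Mean'n = Avg_met_11m;
run;
```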
