SAS PROC TRANSPOSE: transposing wide variables to long

Problem description

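I have data with long text strings that contain dates. I parsed each individual text segment and its corresponding date into separate columns. Every character variable is repeated for each ID. I want to transpose these variables from wide to long and match them to the correct ID, so that each observation has one character variable and one date. I know the description doesn't do it justice, so I'm posting some sample data below.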

data test;
length status_01 $175 status_02 $175 status_03 $175;
infile datalines dsd dlm="|" truncover;
input ID Status_01$ date_01 :mmddyy10. Status_02$ date_02 :mmddyy10. Status_03$ date_03 :mmddyy10.;
format date_01 date_02 date_03 mmddyy10.;
datalines;
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
;
run;

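I know the data is a bit confusing; this is the simplest example I could build from the original data set. I would like the final product to look like this: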

data test_01;
length status_01 $175;
infile datalines dsd dlm="|" truncover;
input ID Status_01$ date_01 :mmddyy10.;
format date_01 mmddyy10.;
datalines;
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23
1 |example status on 02/06/24 with even more text| 02-06-24
1 |example status on 03/11/24 with yet again more text| 03-11-24
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23
2 |example status on 12/16/23 with even more text| 12-16-23
2 |example status on 12/24/23 with yet again more text| 12-24-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23
3 |example status on 05/29/23 with even more text| 05-29-23
3 |example status on 07/10/23 with yet again more text| 07-10-23
;
run;

Thanks so much in advance!

date sas character transpose
1 Answer

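You can do this in two steps: one big transpose, then a data step that uses the transposed result to reshape the data into exactly the layout you want. It looks like this: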

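/* Step 1: turn each status_XX column into its own row, carrying the dates along */
proc transpose data=test out=test_tpose(keep=id _NAME_ date: col1);
    by id date:;
    var status:;
run;

/* Step 2: pick the date whose suffix matches the suffix of the status variable */
data want;
    set test_tpose;
    by id;

    /* vvaluex returns the formatted text of date_XX; input converts it back to a SAS date */
    date_01 = input(vvaluex(cats('date_', scan(_NAME_, 2, '_'))), mmddyy10.);

    rename col1 = status_01;
    drop _NAME_ date_02 date_03;
run;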

What's happening

After the first transpose, you get a table that looks like this:

ID  date_01     date_02     date_03     _NAME_      COL1
1   1/29/2023   2/6/2024    3/11/2024   status_01   example status on 01/29/23 with ...
1   1/29/2023   2/6/2024    3/11/2024   status_02   example status on 02/06/24 with ...
1   1/29/2023   2/6/2024    3/11/2024   status_03   example status on 03/11/24 with ...
2   7/17/2023   12/16/2023  12/24/2023  status_01   example status on 07/17/23 with ...
2   7/17/2023   12/16/2023  12/24/2023  status_02   example status on 12/16/23 with ...
2   7/17/2023   12/16/2023  12/24/2023  status_03   example status on 12/24/23 with ...
3   4/26/2023   5/29/2023   7/10/2023   status_01   example status on 04/26/23 with ...
3   4/26/2023   5/29/2023   7/10/2023   status_02   example status on 05/29/23 with ...
3   4/26/2023   5/29/2023   7/10/2023   status_03   example status on 07/10/23 with ...
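
Note that the sample input repeats every row three times per ID. PROC TRANSPOSE creates one COLn variable per observation in a BY group, so the duplicates land in COL2 and COL3, and the keep= option simply discards them. If you would rather remove the duplicates up front, a minimal sketch could look like the following. The data set name test_dedup is made up for illustration, and the sketch assumes the repeated rows really are identical for a given ID and set of dates.

```sas
/* Assumption: rows sharing the same ID and dates are exact duplicates. */
/* NODUPKEY keeps the first row for each combination of BY values.      */
proc sort data=test out=test_dedup nodupkey;
    by id date_01 date_02 date_03;
run;

/* With one row per BY group, only COL1 is created. */
proc transpose data=test_dedup out=test_tpose(keep=id _NAME_ date: col1);
    by id date:;
    var status:;
run;
```

Sorting first also guarantees the input is ordered by the BY variables, which PROC TRANSPOSE expects and which real data may not satisfy.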

All we need to do is map the final value of date_01 as follows:

  • status_01 = date_01
  • status_02 = date_02
  • status_03 = date_03

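We can use the vvaluex function to dynamically select the date_XX column based on the number found in the status_XX name stored in _NAME_. vvaluex always returns the formatted value of the variable it reads, as a character string. We take the numeric part of status_XX, concatenate it with the text "date_", pass the result to vvaluex, and use the input function to convert the returned value into a SAS date.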

date_01 = input(vvaluex(cats('date_', scan(_NAME_, 2, '_'))), mmddyy10.);
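
To see each piece of that expression on its own, here is a small standalone sketch that writes the intermediate values to the log. The variable names suffix, varname, as_text and as_date are made up for illustration, and the single hard-coded row stands in for one observation of the transposed table.

```sas
data _null_;
    /* One hypothetical row: a formatted date and the name of the status column */
    date_02 = '06FEB2024'd;
    format date_02 mmddyy10.;
    _NAME_ = 'status_02';

    suffix  = scan(_NAME_, 2, '_');        /* second '_'-delimited piece: '02' */
    varname = cats('date_', suffix);       /* builds the name 'date_02'        */
    as_text = vvaluex(varname);            /* formatted value, as character    */
    as_date = input(as_text, mmddyy10.);   /* back to a numeric SAS date       */

    put suffix= varname= as_text= as_date= date9.;
run;
```

Because vvaluex hands back the formatted text rather than the underlying number, the informat passed to input has to match the format attached to the date_XX columns (mmddyy10. here).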

Finally, we rename the variable and drop the ones we no longer need.

ID  date_01     status_01
1   01/29/2023  example status on 01/29/23 with more text to emphasize this is a text string
1   02/06/2024  example status on 02/06/24 with even more text
1   03/11/2024  example status on 03/11/24 with yet again more text
2   07/17/2023  example status on 07/17/23 with more text to emphasize this is a text string
2   12/16/2023  example status on 12/16/23 with even more text
2   12/24/2023  example status on 12/24/23 with yet again more text
3   04/26/2023  example status on 04/26/23 with more text to emphasize this is a text string
3   05/29/2023  example status on 05/29/23 with even more text
3   07/10/2023  example status on 07/10/23 with yet again more text
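
As a point of comparison, the same reshaping can also be sketched in a single data step with arrays. This is a different technique from the transpose shown above, not part of it. The names want_array, status and status_date are made up for illustration, and the sketch assumes the repeated rows for each ID are identical so that keeping only the first one loses nothing.

```sas
data want_array(keep=id status status_date);
    set test;
    by id;
    if first.id;   /* assumption: repeated rows for an ID are identical */

    /* Pair the status and date columns by position */
    array s[*] status_01-status_03;
    array d[*] date_01-date_03;

    length status $175;
    format status_date mmddyy10.;

    /* Write one output row per status/date pair */
    do i = 1 to dim(s);
        status      = s[i];
        status_date = d[i];
        output;
    end;
run;
```

The arrays line up status_XX and date_XX by position, so no name parsing is needed, at the cost of listing the variable ranges explicitly.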