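I have some data with long text strings that contain dates. I parsed each individual piece of text and its corresponding date into separate columns. Every character variable is repeated for each ID. I would like to transpose these variables from wide to long and match them to the correct ID, with one character variable and one date per observation. I know that description doesn't do it justice, so I'll post some sample data below.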
data test;
length status_01 $175 status_02 $175 status_03 $175;
infile datalines dsd dlm="|" truncover;
input ID Status_01$ date_01 :mmddyy10. Status_02$ date_02 :mmddyy10. Status_03$ date_03 :mmddyy10.;
format date_01 date_02 date_03 mmddyy10.;
datalines;
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
;
run;
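I know this data is a bit confusing; it is the simplest version I could build from the original dataset. I would like the final product to look like this: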
data test_01;
length status_01 $175;
infile datalines dsd dlm="|" truncover;
input ID Status_01$ date_01 :mmddyy10.;
format date_01 mmddyy10.;
datalines;
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23
1 |example status on 02/06/24 with even more text| 02-06-24
1 |example status on 03/11/24 with yet again more text| 03-11-24
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23
2 |example status on 12/16/23 with even more text| 12-16-23
2 |example status on 12/24/23 with yet again more text| 12-24-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23
3 |example status on 05/29/23 with even more text| 05-29-23
3 |example status on 07/10/23 with yet again more text| 07-10-23
;
run;
Thanks so much in advance!
You can do this in two steps: one big transpose, then a data step that reshapes the transposed result into exactly what you want. It looks like this:
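/* Step 1: turn each status_XX column into its own row, carrying all dates along. */
proc transpose data=test out=test_tpose(keep=id _NAME_ date: col1);
by id date:;
var status:;
run;
/* Step 2: pick the date that matches each status row. */
data want;
set test_tpose;
by id;
/* _NAME_ holds status_XX; read the formatted value of date_XX and convert it back to a SAS date. */
date_01 = input(vvaluex(cats('date_', scan(_NAME_, 2, '_'))), mmddyy10.);
rename col1 = status_01;
/* date_02 and date_03 are dropped too, so that only ID, date_01 and status_01 remain. */
drop _NAME_ date_02 date_03;
run;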
What's happening
After the first transpose, you get a table that looks like this:
ID date_01 date_02 date_03 _NAME_ COL1
1 1/29/2023 2/6/2024 3/11/2024 status_01 example status on 01/29/23 with ...
1 1/29/2023 2/6/2024 3/11/2024 status_02 example status on 02/06/24 with ...
1 1/29/2023 2/6/2024 3/11/2024 status_03 example status on 03/11/24 with ...
2 7/17/2023 12/16/2023 12/24/2023 status_01 example status on 07/17/23 with ...
2 7/17/2023 12/16/2023 12/24/2023 status_02 example status on 12/16/23 with ...
2 7/17/2023 12/16/2023 12/24/2023 status_03 example status on 12/24/23 with ...
3 4/26/2023 5/29/2023 7/10/2023 status_01 example status on 04/26/23 with ...
3 4/26/2023 5/29/2023 7/10/2023 status_02 example status on 05/29/23 with ...
3 4/26/2023 5/29/2023 7/10/2023 status_03 example status on 07/10/23 with ...
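One detail worth noting: the sample test data has three identical rows for each ID, so each BY group contains three observations and PROC TRANSPOSE generates COL1 through COL3. The keep= option on the output data set is what leaves only COL1. If you would rather make that explicit, a minimal sketch is to remove the exact duplicates first. The name test_nodup is hypothetical, and the sketch assumes the repeated rows really are identical, as they are in the sample data:

```sas
/* Listing every variable in the BY statement makes NODUPKEY remove
   only rows that are identical across all columns. Putting id and
   the dates first keeps the output sorted for the transpose below. */
proc sort data=test out=test_nodup nodupkey;
    by id date: status:;
run;

/* With one row per BY group, the transpose yields a single COL1 column. */
proc transpose data=test_nodup out=test_tpose(keep=id _NAME_ date: col1);
    by id date:;
    var status:;
run;
```

The resulting test_tpose should have the same shape as the table above, so the second data step can be used unchanged.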
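All we need to do is set the final value of date_01 according to this mapping:
status_01 = date_01
status_02 = date_02
status_03 = date_03
We can use the vvaluex function to dynamically select the date_XX column based on the number found in status_XX. vvaluex always returns the formatted value of the variable it reads, as a character string. We take the numeric part of status_XX, concatenate it with the prefix "date_", pass the result to vvaluex, and use the input function to convert the returned value to a SAS date.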
date_01 = input(vvaluex(cats('date_', scan(_NAME_, 2, '_'))), mmddyy10.);
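To see what that expression does piece by piece, here is a small standalone sketch that writes each intermediate value to the log. The variables varname, as_text and as_date are hypothetical names introduced only for illustration, and the single hard-coded row stands in for one row of test_tpose:

```sas
data _null_;
    length _NAME_ $32;
    format date_02 mmddyy10.;

    /* One made-up row: the status name and its matching date column. */
    _NAME_  = 'status_02';
    date_02 = '06FEB2024'd;

    /* scan() pulls the second '_'-delimited piece; cats() builds the column name. */
    varname = cats('date_', scan(_NAME_, 2, '_'));

    /* vvaluex() returns the value of that variable as formatted text. */
    as_text = vvaluex(varname);

    /* input() turns the text back into a numeric SAS date. */
    as_date = input(as_text, mmddyy10.);

    put varname= as_text= as_date= date9.;
run;
```

Because vvaluex goes through the variable's format, the informat passed to input has to match that format, which is why mmddyy10. appears in both places.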
Finally, we rename the variables and drop the ones we no longer need.
ID date_01 status_01
1 01/29/2023 example status on 01/29/23 with more text to emphasize this is a text string
1 02/06/2024 example status on 02/06/24 with even more text
1 03/11/2024 example status on 03/11/24 with yet again more text
2 07/17/2023 example status on 07/17/23 with more text to emphasize this is a text string
2 12/16/2023 example status on 12/16/23 with even more text
2 12/24/2023 example status on 12/24/23 with yet again more text
3 04/26/2023 example status on 04/26/23 with more text to emphasize this is a text string
3 05/29/2023 example status on 05/29/23 with even more text
3 07/10/2023 example status on 07/10/23 with yet again more text
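As an alternative to the transpose, and a different technique from the one described above, the same reshaping can be sketched in a single data step with two parallel arrays. This is only a rough sketch: want_array, status, date and i are hypothetical names, and it assumes the repeated rows per ID are exact duplicates, so only the first row of each ID is expanded:

```sas
data want_array(keep=id status date
                rename=(status=status_01 date=date_01));
    set test;
    by id;

    /* Pair the wide columns by position: s[i] belongs with d[i]. */
    array s[*] status_01-status_03;
    array d[*] date_01-date_03;

    length status $175;
    format date mmddyy10.;

    /* Assumption: later rows of the same ID repeat the first one, so skip them. */
    if first.id then do i = 1 to dim(s);
        status = s[i];
        date   = d[i];
        output;   /* one output row per status/date pair */
    end;
run;
```

This avoids converting dates to text and back, at the cost of listing the array members explicitly. It relies on test being sorted by ID, as it is in the sample data.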