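I'm looking to build a system that uses a merge to combine data sets on common values.
However, I know that sometimes I will have to merge 2 data sets, and sometimes 20.
So the generic merge step looks like this: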
DATA [data_set];
merge
[loop over data to be merged]
by [factors which I am merging by]
if [a and b]
format [sort order]
run;
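The problem is that the if [a and b] condition obviously has to be a string whose length depends on the number of tables being merged. If I merge 2 tables, [a and b] is fine, but if I merge 3 tables it has to be [a and b and c].
Is there a way to generate the string [a and b and ... N] based on the length of a global variable?
I hope my question is clear. I can't provide the actual code I'm using because it contains sensitive information. If I've missed anything, I'll do my best to provide more information or answer questions.
Using @Richard's makedata macro: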
%let MergeData = X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
DATA X_COMB;
merge
%internalmacro1(
%nrstr(
#L1#;
)
,L1 = &MergeData.
);
by id;
if ; /* create a string based on %&MergeData length, of # and # and # ... where # = a,b,c,d */
run;
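What I want, if it's possible, is the number of items in the &MergeData list. I think I might need to create an array and convert to hexadecimal values? 'a b c' and so on must have some equivalent hex values.
The issue is that people will be adding and removing items from the set of data sets and from MergeData, so the merge I'm trying to create needs to scale with the number of data sets being fed in. Sorry I can't provide more!
Rosie's method: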
proc import
datafile = "FilePath\Alphabet.csv"
DBMS = csv
OUT = AlphabetConversion;
;
data desiredstatements;
set AlphabetConversion (obs=26); /*This limits the observations used*/
run;
proc sql;
select AlphabetConversion into :dynamiccode from desiredstatements separated by " and ";
quit;
%put &dynamiccode.; /*Check the log to see what you got and make sure it's the code you want */
Quentin's method:
DATA Merged_Data;
merge
mydata1 (in=_mydata1) mydata2(in=_mydata2) mydata3 (in=_mydata3);
by ID;
if _mydata1 and _mydata2 and _mydata3;
run;
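This structure works for a merge where you can spell out all the inputs. My problem is that I'm trying to write a macro that will sometimes take mydata1 and mydata2, and sometimes mydata1-mydata20. I don't know how to produce if _mydata1 and _mydata2 ... and _mydata20 when there are 20 data sets to merge, and only _mydata1 and _mydata2 when there are two.
Perhaps you can come up with some sample code based on this data: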
%macro makedata;
%local i;
data %do i = 1 %to 10; x&i(keep=id x&i) %end;;
do id = 1 to 42;
array x(10);
do _n_ = 1 to dim(x);
x(_n_) = id * 100 + _n_;
end;
output;
end;
run;
%mend;
%makedata;
data want;
merge ... fill in the rest ...;
... fill in the rest ...
run;
Given a bunch of data sets like:
data mydata1 ;
input id x1 ;
cards ;
1 10
2 20
3 30
;
data mydata2 ;
input id x2 ;
cards ;
1 100
3 300
;
data mydata3 ;
input id x3 ;
cards ;
1 1000
2 2000
3 3000
;
You can merge them together and keep only the records that match across all three data sets, like this:
data all ;
merge
mydata1 (in=_mydata1)
mydata2 (in=_mydata2)
mydata3 (in=_mydata3)
;
by id ;
if _mydata1 and _mydata2 and _mydata3 ;
run ;
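If you look at the step above, it's clear there are two lists: the list of data sets on the MERGE statement, and the list of variables on the IF statement. You can use the macro language to generate that step. When you call the macro, you pass it the list of data sets to merge. The macro then generates the DATA step that does the merge.
Here is a macro that uses a macro loop to generate both lists: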
%macro innermerge
(data= /*space-delimited list of data sets to be merged*/
,by= /*space-delimited list of BY variables for merge*/
,out= /*output data set*/
)
;
%local i data_i ;
data &out ;
merge
%do i=1 %to %sysfunc(countw(&data,%str( ))) ;
%let data_i=%scan(&data,&i,%str( )) ;
&data_i (in=_&data_i)
%end ;
;
by &by ;
if
%do i=1 %to %sysfunc(countw(&data,%str( ))) ;
%let data_i=%scan(&data,&i,%str( )) ;
%if &i>1 %then %do ;
and
%end ;
_&data_i
%end ;
;
run ;
%mend;
Use it like this:
%innermerge
(data=mydata1 mydata2
,by=id
,out=want
)
MPRINT(INNERMERGE): data want ;
MPRINT(INNERMERGE): merge mydata1 (in=_mydata1) mydata2 (in=_mydata2) ;
MPRINT(INNERMERGE): by id ;
MPRINT(INNERMERGE): if _mydata1 and _mydata2 ;
MPRINT(INNERMERGE): run ;
NOTE: There were 3 observations read from the data set WORK.MYDATA1.
NOTE: There were 2 observations read from the data set WORK.MYDATA2.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
%innermerge
(data=mydata1 mydata2 mydata3
,by=id
,out=want
)
MPRINT(INNERMERGE): data want ;
MPRINT(INNERMERGE): merge mydata1 (in=_mydata1) mydata2 (in=_mydata2) mydata3 (in=_mydata3) ;
MPRINT(INNERMERGE): by id ;
MPRINT(INNERMERGE): if _mydata1 and _mydata2 and _mydata3 ;
MPRINT(INNERMERGE): run ;
NOTE: There were 3 observations read from the data set WORK.MYDATA1.
NOTE: There were 2 observations read from the data set WORK.MYDATA2.
NOTE: There were 3 observations read from the data set WORK.MYDATA3.
NOTE: The data set WORK.WANT has 2 observations and 4 variables.
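The MPRINT lines in the logs above only show up when the MPRINT system option is turned on. A minimal sketch of switching it on around a call, assuming the %innermerge macro and the mydata1/mydata2 data sets defined earlier already exist in the session:

```sas
/* Show the SAS statements generated by macro execution in the log */
options mprint;

%innermerge
  (data=mydata1 mydata2
  ,by=id
  ,out=want
  )

/* Switch the option back off once the generated code has been checked */
options nomprint;
```

Leaving MPRINT on while developing the macro makes it easy to confirm that the MERGE and IF lists grow together as data sets are added.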
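Your attempt at the code I gave you earlier was close, but one key thing (apart from me giving you the wrong SQL statement order - sorry, fixed below) is that you are reading the CSV and assuming it has headers, whereas I would rather define the variables in the code. You then used the data set name where the variable name should go in the PROC SQL - try this:
/*File must have one letter per row*/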
%let file=\\hefce-sas\nuser\user\thomaro\SAS\Temp\Alphabet.csv;
data AlphabetConversion;
infile "&file."
delimiter=',' missover dsd;
format letter $1.;
input letter $;
run;
/*I've explicitly defined this so it runs but this needs to be dynamic - see below*/
%let numstatements = 5;
data desiredstatements;
set AlphabetConversion (obs=&numstatements.); /*This limits the observations used*/
run;
proc sql;
select letter into :dynamiccode separated by ' and ' from desiredstatements;
quit;
/*Check the log to see what you got and make sure it's the code you want */
%put &dynamiccode.;
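This construct is generally very useful for creating code dynamically.
Now there is the problem of defining numstatements dynamically - it should be based on the line at the top of your code: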
%let MergeData = X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
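So I suggest you want a macro function that counts words. There isn't one, but you can wrap the ordinary function countw in %sysfunc() to use it as a macro function, so you want to replace the %let above with the following: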
%let numstatements = %sysfunc(countw(&mergedata.));
%put The number of datasets to be merged is &numstatements.;
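Now (if I've understood your question correctly), when you want to run the code with a different number of data sets, all you need to do is change your %let MergeData= at the top of the code, and you're set.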
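To tie the countw count back to the DATA step in the question, one option is a pair of small function-style macros that loop from 1 to the word count and emit the text for the MERGE and IF statements. This is only a sketch: the macro names in_list and and_list and the underscore prefix on the IN= variables are made up for illustration, and it assumes every name in MergeData is a one-level data set name that is already sorted by id.

```sas
%let MergeData = X1 X2 X3;

/* Hypothetical helper: emits  X1 (in=_X1) X2 (in=_X2) ...  */
%macro in_list(list);
  %local i ds;
  %do i = 1 %to %sysfunc(countw(&list,%str( )));
    %let ds = %scan(&list,&i,%str( ));
    &ds (in=_&ds)
  %end;
%mend in_list;

/* Hypothetical helper: emits  _X1 and _X2 and ...  */
%macro and_list(list);
  %local i ds;
  %do i = 1 %to %sysfunc(countw(&list,%str( )));
    %let ds = %scan(&list,&i,%str( ));
    %if &i > 1 %then and;
    _&ds
  %end;
%mend and_list;

data X_COMB;
  merge %in_list(&MergeData);
  by id;
  /* keep only the ids that are present in every input data set */
  if %and_list(&MergeData);
run;
```

With this layout, adding or removing a data set only means editing the %let MergeData= line; both generated lists follow it.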
If I've understood your question, does the following do what you want?
%let ds_list = a b c d;
%let ds_and_ds = %sysfunc(tranwrd(&ds_list,%str( ),%str( and )));
%put ds_list = &ds_list;
%put ds_and_ds = &ds_and_ds;
If not, please give an example of the list of data sets that should have "and" inserted between each one.
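If the result is meant to be used as the subsetting IF condition, the names in it have to be IN= variables rather than data set names. The same tranwrd idea can add a prefix to every name. A minimal sketch, assuming the underscore prefix is just an illustrative naming choice and using compbl to make sure the names are separated by exactly one space:

```sas
%let ds_list = mydata1 mydata2 mydata3;

/* Collapse repeated blanks so that each gap is a single space */
%let ds_clean = %sysfunc(compbl(&ds_list));

/* Turn each space into " and _" and put "_" in front of the first name */
%let in_cond = _%sysfunc(tranwrd(&ds_clean,%str( ),%str( and _)));

%put in_cond = &in_cond;
```

For the list above, the log line should read in_cond = _mydata1 and _mydata2 and _mydata3. The MERGE statement would still need matching (in=_mydata1) style options on each data set for the condition to be valid.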