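Hello, and sorry for the long title! I am working with data that contains long text strings (some observations have up to 2000 characters). These strings may contain the term AB/CD, which can appear anywhere in the string. I am trying to detect AB/CD in the text and create a binary variable (ABCD_present) that flags the observations where it appears.
Here is some example data: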
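data test;
  /* Declare the length first so Status is not cut off at the 8-character default */
  length status $175;
  /* Pipe-delimited in-stream data; truncover stops short lines from reading into the next one */
  infile datalines dsd dlm="|" truncover;
  input ID Status $;
  datalines;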
1|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data AB/CD
2|This is example AB/CD text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
3|This is example text I am using instead of real data. I AB/CD am making the length of this text longer to mimic the long text strings of my data
4|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
5|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
6|This is example text I am using instead of real data. I am making the length of this text longer to AB/CD mimic the long text strings of my data
;
run;
Any guidance on this would be great! I don't have much experience working with long text strings.
Thank you in advance.
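You can use the find function. It returns the position of the first occurrence of the substring, or 0 if the substring is not found, so comparing the result with 0 gives a 1/0 flag.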
data want;
  set test;
  /* find() returns the position of the first match, or 0 if there is none */
  flag_abcd = (find(status, 'AB/CD') > 0);
run;
Status ID flag_abcd
... 1 1
... 2 1
... 3 1
... 4 0
... 5 0
... 6 1
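
If the match needs to be looser or stricter than a plain substring search, the same pattern can be adjusted. The sketch below is only an illustration, not part of the original answer: the data set name want2 and the variables abcd_present_i and abcd_present_w are hypothetical, and it assumes that lowercase spellings such as ab/cd should count in one case and that AB/CD should stand alone as a blank-delimited word in the other.

```sas
/* Sketch under stated assumptions; want2 and the two extra flag names are hypothetical */
data want2;
  set test;
  /* Same substring test as flag_abcd, using the variable name from the question */
  ABCD_present = (find(status, 'AB/CD') > 0);
  /* 'i' modifier: ignore case, so ab/cd or Ab/Cd would also be flagged */
  abcd_present_i = (find(status, 'ab/cd', 'i') > 0);
  /* findw() with a blank as the only delimiter: flag AB/CD only as a standalone word */
  abcd_present_w = (findw(status, 'AB/CD', ' ') > 0);
run;
```

On the example data all three flags should agree with flag_abcd, since every occurrence there is uppercase and surrounded by blanks. The functions work the same way on longer character variables, so for the real data the only change would be declaring a length large enough for the longest string (for example $2000).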