I have a trim function that I sometimes use in awk, but it is a bit slow on large inputs:
#!/bin/bash
time {
yes $'\t Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t \r' |
head -n 1000000 |
awk '
    { trim($0) }
    function trim(string) {
        # Strip leading and trailing spaces, tabs, and carriage returns
        gsub(/^[ \t\r]+|[ \t\r]+$/, "", string)
        return string
    }
'
}
real 0m9.074s
user 0m9.179s
sys 0m0.381s
How can I speed it up?
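As a rule of thumb, regular expressions are usually slower than looping over the characters of a string, but depending on what you do afterwards, that may not hold in awk. For this particular trim function, you can avoid some of the processing by looping only over the characters that have to be stripped at the start and end of the string: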
#!/bin/bash
time {
yes $'\t Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t \r' |
head -n 1000000 |
awk '
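    BEGIN {
        # Set of characters to strip; only the array keys matter
        to_trim[" "]
        to_trim["\t"]
        to_trim["\r"]
    }
    { trim($0, to_trim) }
    # i and j are local variables (extra parameters, by awk convention)
    function trim(string, chars,    i, j) {
        # Skip leading characters that are in the set
        i = 1
        while (substr(string, i, 1) in chars) { ++i }
        string = substr(string, i)
        # Walk back from the end over trailing characters in the set
        j = length(string)
        while (substr(string, j, 1) in chars) { --j }
        return substr(string, 1, j)
    }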
'
}
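Note: the function signature has changed; trim now takes the set of characters to strip as its second argument.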
real 0m1.954s
user 0m2.063s
sys 0m0.281s
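The benchmark above discards the return value of trim, so it says nothing about whether the result is right. Here is a minimal sketch for checking the loop-based version on small inputs. The wrapper name trim_loop and the square brackets printed around each result are assumptions made for illustration only; the awk function itself is the one shown above.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of the original answer): runs the loop-based
# trim on every input line and prints the result between square brackets so
# that any leftover whitespace would be visible.
trim_loop() {
    awk '
        BEGIN {
            to_trim[" "]
            to_trim["\t"]
            to_trim["\r"]
        }
        { print "[" trim($0, to_trim) "]" }
        function trim(string, chars,    i, j) {
            i = 1
            while (substr(string, i, 1) in chars) { ++i }
            string = substr(string, i)
            j = length(string)
            while (substr(string, j, 1) in chars) { --j }
            return substr(string, 1, j)
        }
    '
}

# One line with whitespace on both sides, and one line of whitespace only
printf '\t  hello world \t\r\n' | trim_loop
printf ' \t \n' | trim_loop
```

The first command should print [hello world] and the second should print []. Interior spaces are kept, because the loops only walk inward from each end of the string.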