I have a text file with several hundred thousand records. One of the fields is a date field. Is there a way to sort the file by that date field?
09-APR-12 04.08.43.632279000 AM
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
19-MAR-12 03.54.32.595348000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
27-MAR-12 07.28.02.828746000 PM
The output should be:
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM
I tried the sort command on the date field (treating it as a string), but it does not give the correct output.
Chronicle's solution is close, but it ignores the AM/PM distinction and sorts
27-MAR-12 07.28.02.828746000 PM
before
27-MAR-12 10.28.14.797580000 AM
It can be modified to:
sort -t- -k 3.1,3.2 -k 2M -k 1n -k 3.23,3.24
But this is still very fragile. It would be better to convert the dates to epoch time and compare them numerically.
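To see why this is fragile, here is a small demonstration. The fragile_sort wrapper name is made up for illustration, and LC_ALL=C is assumed so that month names and byte order are predictable. None of the keys compares the time of day, so lines that tie on date and AM/PM fall back to sort's last-resort comparison of the whole line, which puts 01.xx ahead of 12.xx even though 12.xx AM is just after midnight:

```shell
# Hypothetical wrapper around the modified sort command above.
fragile_sort() {
    LC_ALL=C sort -t- -k 3.1,3.2 -k 2M -k 1n -k 3.23,3.24 "$@"
}

# Two timestamps on the same morning; 12.05 AM (five past midnight) is the earlier one.
printf '%s\n' \
    '27-MAR-12 12.05.00.000000000 AM' \
    '27-MAR-12 01.05.00.000000000 AM' | fragile_sort
# Prints the 01.05 AM line first, which is not chronological.
```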
Try this:
Input.txt
09-APR-12 04.08.43.632279000 AM
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
19-MAR-12 03.54.32.595348000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
27-MAR-12 07.28.02.828746000 PM
Code
sort -t "-" -k 3 -k 2M -nk 1 Input.txt
Output
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 07.28.02.828746000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM
This script sorts by epoch time at nanosecond resolution (it needs gawk for gensub and GNU date for %N):
awk '{
t = gensub(/\.([0-9]{2})\./, ":\\1:", 1, $0);
command = "date +%s%N -d \"" t "\"";
command | getline t;
close(command);
print t, $0;
}' unsorted.txt | sort -n -k 1 | cut -d ' ' -f 2- > sorted.txt
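With hundreds of thousands of records, starting one date process per line can be slow. Below is a rough sketch of the same idea that computes the epoch seconds inside gawk with mktime instead. It assumes gawk is available, that two-digit years mean 2000-2099, and that the timestamps can be treated as UTC; the function name sort_ts_epoch is made up for illustration:

```shell
# Hypothetical helper: sort_ts_epoch FILE
# Assumes gawk, and lines of the form "DD-MON-YY HH.MM.SS.fffffffff AM|PM".
sort_ts_epoch() {
    TZ=UTC gawk '{
        # t[1]=day t[2]=month t[3]=year t[4]=hour t[5]=min t[6]=sec t[7]=fraction t[8]=AM/PM
        split($0, t, /[- .]/)
        mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", t[2]) + 2) / 3
        hour = t[4] % 12                  # 12 AM -> 0, 12 PM -> 0 (shifted below)
        if (t[8] == "PM") hour += 12      # 1 PM -> 13, 12 PM -> 12
        # Assumption: two-digit years belong to 2000-2099
        epoch = mktime(sprintf("%d %d %d %d %d %d", 2000 + t[3], mthNr, t[1], hour, t[5], t[6]))
        # Decorate with seconds.fraction so sort -n can compare it as one decimal number
        printf "%d.%s\t%s\n", epoch, t[7], $0
    }' "$1" | LC_ALL=C sort -n -k1,1 | cut -f2-
}
```

Usage would be something like sort_ts_epoch unsorted.txt > sorted.txt. The fractional part is appended as text rather than added numerically, so no precision is lost to floating point.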
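You can use the date command, which is often a good idea, especially if you don't need to worry about the fractional seconds. If you do, you can cut them off and sort on them as a secondary sort field.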
while read a; do
grep "^${a}" input.txt;
done < <(sed 's/\./:/;s/\./:/' input.txt | xargs -n3 -I{} date -d"{}" +%s | sort | xargs -n1 -I{} date -d @'{}' +'%d-%^h-%y %I.%M.%S')
Using any awk, any sort, and any cut, apply the decorate-sort-undecorate idiom:
$ awk -F',' -v OFS='\t' '{
split($NF,t,/[- ]/)
mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",t[2])+2)/3
printf "%02d%02d%02d%s%s\t%s\n", t[3], mthNr, t[1], t[5], t[4], $0
}' file | sort -k1,1 | cut -f2-
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM
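If you're not sure how it works, look at the output of the awk command alone. It prepends a sortable key to each input line (decorate) so that sort can operate on it before cut removes it again (undecorate):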
$ awk -F',' -v OFS='\t' '{
split($NF,t,/[- ]/)
mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",t[2])+2)/3
printf "%02d%02d%02d%s%s\t%s\n", t[3], mthNr, t[1], t[5], t[4], $0
}' file
120409AM04.08.43.632279000 09-APR-12 04.08.43.632279000 AM
120319PM03.53.38.189606000 19-MAR-12 03.53.38.189606000 PM
120319PM03.56.27.933365000 19-MAR-12 03.56.27.933365000 PM
120319PM04.00.13.387316000 19-MAR-12 04.00.13.387316000 PM
120319PM04.04.45.168361000 19-MAR-12 04.04.45.168361000 PM
120319PM03.54.32.595348000 19-MAR-12 03.54.32.595348000 PM
120327AM10.28.14.797580000 27-MAR-12 10.28.14.797580000 AM
120328AM12.28.02.652969000 28-MAR-12 12.28.02.652969000 AM
120327PM07.28.02.828746000 27-MAR-12 07.28.02.828746000 PM
Note that the keys sort alphabetically into the desired order.
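One caveat: the key keeps the 12-hour clock, so a 12.xx AM or 12.xx PM time sorts after 01.xx of the same half-day. If your data can contain such times, a variant with a 24-hour key might look like the sketch below. It assumes each line consists of exactly the timestamp (for multi-field records, split the date field instead of $0), and the function name sort_ts is made up for illustration:

```shell
# Hypothetical helper: sort_ts FILE
# Assumes every line is exactly "DD-MON-YY HH.MM.SS.fffffffff AM|PM".
sort_ts() {
    awk '{
        split($0, t, /[- ]/)            # t[1]=day t[2]=month t[3]=year t[4]=time t[5]=AM/PM
        mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", t[2]) + 2) / 3
        hour = substr(t[4], 1, 2) % 12  # 12 AM -> 0, 12 PM -> 0 (shifted below)
        if (t[5] == "PM") hour += 12    # 1 PM -> 13, 12 PM -> 12
        # Key: YYMMDD, 24-hour hour, rest of the time string; then a tab and the original line
        printf "%02d%02d%02d%02d%s\t%s\n", t[3], mthNr, t[1], hour, substr(t[4], 3), $0
    }' "$1" | LC_ALL=C sort -k1,1 | cut -f2-
}
```

Running sort_ts file should give the same order as above for the sample data, and additionally place 12.xx AM before 01.xx AM on the same day.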