Below is my source data. I need to update the ASOF_DATE field based on some conditions.
Source data:
ASOF_DATE,CUSIP,Current Face,Market Value
'04/11/2024',BENRDUZU0,-400000000
'04/11/2024',BENRDUZR7,-300000000
'04/11/2024',BENRE4H37,-225000000
'04/11/2024',BENRDUYW7,-250000000
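Requirements:
- Check whether ASOF_DATE is the last day of the month; if it is, do not update the ASOF_DATE column.
- Take the last_day (04/11/2024) value and subtract 1; if the resulting date is a Friday, update the ASOF_DATE (04/11/2024) field with the last_day value.
- Take the last_day (04/11/2024) value and subtract 2; if the resulting date is a Friday, update the ASOF_DATE (04/11/2024) field with the last_day value.
- Otherwise, do not update the ASOF_DATE column.
Note: ASOF_DATE should be in MM/DD/yyyy format.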
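To check the conditions above by hand for a given month, a small helper along these lines may help. It is only a sketch: the function name `month_end_info` is made up for illustration, it assumes GNU `date` (for `-d`), and it takes the year and month as separate arguments so that the MM/DD versus DD/MM question does not arise.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): print the last day
# of a month plus the dates and weekdays one and two days before it.
# Assumes GNU date; TZ=UTC and LC_ALL=C keep the output stable.
month_end_info() {
    local year=$1 month=$2 last d1 d2
    # First of the month, plus one month, minus one day = last day of the month.
    last=$(TZ=UTC LC_ALL=C date -d "${year}-${month}-01 +1 month -1 day" +%F)
    d1=$(TZ=UTC LC_ALL=C date -d "${last} -1 day" +'%F %a')
    d2=$(TZ=UTC LC_ALL=C date -d "${last} -2 days" +'%F %a')
    printf 'last_day=%s minus1=%s minus2=%s\n' "$last" "$d1" "$d2"
}

month_end_info 2024 04   # last_day=2024-04-30 minus1=2024-04-29 Mon minus2=2024-04-28 Sun
month_end_info 2024 06   # last_day=2024-06-30 minus1=2024-06-29 Sat minus2=2024-06-28 Fri
month_end_info 2024 08   # last_day=2024-08-31 minus1=2024-08-30 Fri minus2=2024-08-29 Thu
```

For April 2024 neither earlier day is a Friday, so a date in that month would stay as it is; for June and August 2024 one of them is a Friday, so the rule would move the date to the month end.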
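The problem I'm facing: the script below does not update the ASOF_DATE column according to the conditions above. The CSV file is comma-separated, but after the shell script runs, the commas from the source file are gone.
Expected:
For the scenario above, I need the script to check the ASOF_DATE column and update the field according to the specified conditions.
Script I tried: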
#!/bin/bash
# Input CSV file
csv_file="your_file.csv"
# Process CSV file and update ASOF_DATE column
awk -F',' '{
    if (NR == 1) {
        print $0 # Print header
    } else {
        asof_date=$1 # Get ASOF_DATE from first column
        last_day=$(date -d "$asof_date +1 month -1 day" +"%m/%d/%Y") # Calculate last day of month of ASOF_DATE
        if (asof_date != last_day && date -d "$last_day -1 day" +"%A" == "Friday") {
            $1=last_day # Update ASOF_DATE column with last day of month
        } else {
            $1=$(date +"%m/%d/%Y") # Update ASOF_DATE column with current date
        }
        print $0 # Print updated row
    }
}' "$csv_file" > temp.csv && mv temp.csv "$csv_file"
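The output of the above script is shown below. The data gets duplicated, the row count changes after the update, and the commas between values are missing.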
ASOF_DATE,CUSIP,Current Face
'04/11/2024' BENRDUZU0 -400000000
'04/11/2024' BENRDUZR7 -300000000
'04/11/2024' BENRE4H37 -225000000
'04/11/2024' BENRDUYW7 -250000000
'04/11/2024' BENRDUZU0 -400000000
'04/11/2024' BENRDUZR7 -300000000
'04/11/2024' BENRE4H37 -225000000
'04/11/2024' BENRDUYW7 -250000000
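The missing commas come from how awk rebuilds a record: assigning to any field makes awk rejoin all fields with OFS, which defaults to a single space. Separately, the `$(date ...)` calls sit inside the single-quoted awk program, so the shell never runs them and awk reads `$( ... )` as a field reference. The snippet below is a minimal illustration of the OFS part only, using a made-up replacement date.

```shell
#!/usr/bin/env bash
# Assigning to a field makes awk rebuild the record, joining fields with OFS.
row='04/11/2024,BENRDUZU0,-400000000'

# OFS left at its default (a single space): the commas disappear.
default_ofs=$(printf '%s\n' "$row" | awk -F',' '{ $1 = "04/30/2024"; print }')

# FS and OFS both set to a comma: the rebuilt record keeps its commas.
comma_ofs=$(printf '%s\n' "$row" | awk 'BEGIN { FS = OFS = "," } { $1 = "04/30/2024"; print }')

printf '%s\n' "$default_ofs"   # 04/30/2024 BENRDUZU0 -400000000
printf '%s\n' "$comma_ofs"     # 04/30/2024,BENRDUZU0,-400000000
```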
Different sample inputs and their expected outputs:
ASOF_DATE,CUSIP,Current Face,Market Value
'04/11/2024',BENRDUZU0,-400000000
'04/11/2024',BENRDUZR7,-300000000
'04/11/2024',BENRE4H37,-225000000
'04/11/2024',BENRDUYW7,-250000000
Sample output for the above input:
ASOF_DATE,CUSIP,Current Face,Market Value
04/11/2024,BENRDUZU0,-400000000
04/11/2024,BENRDUZR7,-300000000
04/11/2024,BENRE4H37,-225000000
04/11/2024,BENRDUYW7,-250000000
ASOF_DATE,CUSIP,Current Face,Market Value
'28/06/2024',BENRDUZU0,-400000000
'28/06/2024',BENRDUZR7,-300000000
'28/06/2024',BENRE4H37,-225000000
'28/06/2024',BENRDUYW7,-250000000
Expected output for the above input:
ASOF_DATE,CUSIP,Current Face,Market Value
30/06/2024,BENRDUZU0,-400000000
30/06/2024,BENRDUZR7,-300000000
30/06/2024,BENRE4H37,-225000000
30/06/2024,BENRDUYW7,-250000000
ASOF_DATE,CUSIP,Current Face,Market Value
30/08/2024,BENRDUZU0,-400000000
30/08/2024,BENRDUZR7,-300000000
30/08/2024,BENRE4H37,-225000000
30/08/2024,BENRDUYW7,-250000000
Expected output:
ASOF_DATE,CUSIP,Current Face,Market Value
31/08/2024,BENRDUZU0,-400000000
31/08/2024,BENRDUZR7,-300000000
31/08/2024,BENRE4H37,-225000000
31/08/2024,BENRDUYW7,-250000000
Here's a start, using GNU awk for its time functions:
$ cat tst.sh
#!/usr/bin/env bash
awk '
    BEGIN { FS=OFS="," }
    NR > 1 {
        old_date = gensub(/\047/,"","g",$1)     # date without the single quotes
        new_date = get_ltm_date(old_date)
        fmt = gensub(/[^\047]+/,"%s","g",$1)    # keep any quotes around the value
        $1 = sprintf(fmt,new_date)
    }
    { print }
    function dmy2iso(dmy_date,    iso_date, d) {
        split(dmy_date,d,/[^0-9]/)
        iso_date = sprintf("%04d-%02d-%02d", d[3], d[2], d[1])
        return iso_date
    }
    function get_ltm_date(dmy_old_date,    iso_old_date,d,fnm_date,fnm_secs,ltm_date,ltm_secs) {
        # FNM = First of Next Month
        # LTM = Last of This Month
        iso_old_date = dmy2iso(dmy_old_date)
        split(iso_old_date,d,"-")               # d[1]=year, d[2]=month, d[3]=day
        if ( d[2] == 12 ) { fnm_date = (d[1]+1) " " 1 " 1" }
        else              { fnm_date = d[1] " " (d[2]+1) " 1" }
        fnm_secs = mktime(fnm_date" 12 0 0")    # noon avoids DST edge cases
        ltm_secs = fnm_secs - (24 * 60 * 60)
        ltm_date = strftime("%F", ltm_secs)
        return ltm_date
    }
' "${@:--}"
$ ./tst.sh input.csv
ASOF_DATE,CUSIP,Current Face,Market Value
'2024-11-30',BENRDUZU0,-400000000
'2024-11-30',BENRDUZR7,-300000000
'2024-11-30',BENRE4H37,-225000000
'2024-11-30',BENRDUYW7,-250000000
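It outputs ISO 8601 dates because that's what you really should be using, but it's trivial to change that to whatever other format you like by following the examples in the code.
It doesn't try to handle the "if the date is a Friday" etc. conditions because the sample input/output in the question is too scattered to simply copy/paste to test with, but it would be trivial for you to enhance it using strftime("%a",ltm_secs).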
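One possible way to add the Friday conditions with `strftime("%a", ...)` is sketched below. It rests on several assumptions that are not settled by the question: dates in column 1 are zero-padded DD/MM/YYYY (as the answer's script treats them), the output should use the same format with the single quotes removed (as in the question's expected output), and the rule fires whenever the day one or two days before month end is a Friday, without checking whether ASOF_DATE itself is that Friday. The function name `adjust_asof` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: apply the Friday rule to column 1 of a comma-separated file.
# Assumes zero-padded DD/MM/YYYY dates and an awk with mktime()/strftime()
# (GNU awk has both). LC_ALL=C makes %a produce English day names.
adjust_asof() {
    LC_ALL=C awk '
    BEGIN { FS = OFS = "," }
    NR > 1 {
        date = $1
        gsub(/\047/, "", date)      # strip single quotes, if any
        $1 = adjust(date)
    }
    { print }

    # Return the month-end date when the rule fires, else the date unchanged.
    function adjust(dmy,    d, y, m, fnm_secs, ltm_secs, ltm) {
        split(dmy, d, "/")
        y = d[3] + 0
        m = d[2] + 0
        # Noon on the first of next month, minus one day = last of this month.
        if (m == 12) { fnm_secs = mktime((y + 1) " 1 1 12 0 0") }
        else         { fnm_secs = mktime(y " " (m + 1) " 1 12 0 0") }
        ltm_secs = fnm_secs - 86400
        ltm = strftime("%d/%m/%Y", ltm_secs)
        if (dmy == ltm) { return dmy }      # already the last day of the month
        if (strftime("%a", ltm_secs - 86400) == "Fri" ||
            strftime("%a", ltm_secs - 2 * 86400) == "Fri") {
            return ltm
        }
        return dmy
    }
    ' "$@"
}
```

Called as `adjust_asof input.csv`, or with the data on standard input, this would turn 28/06/2024 into 30/06/2024 and 30/08/2024 into 31/08/2024, and leave a date such as 15/05/2024 alone. For MM/DD/YYYY data, the `d[2]` month index and the `strftime` format string would both need to change.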