My dataset (`dataset.csv`) is structured as follows:
NFDI4Culture, NFDI4Memory, NFDI4Objects, Text+;Alliance
BERD@NFDI, Text+;Event: conference organisation
NFDI4Cat, NFDI4Chem, FAIRmat, DAPHNE4NFDI;Event: conference participation
PUNCH4NFDI, FAIRmat, DAPHNE4NFDI;Event: conference participation
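Each line contains at least two values separated by `,`, and at the end of the line (separated by `;`) is the category of the preceding items.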
The desired result should look like this (columns are tab-separated):
NFDI4Culture NFDI4Memory Alliance
NFDI4Culture NFDI4Objects Alliance
NFDI4Culture Text+ Alliance
NFDI4Memory NFDI4Objects Alliance
NFDI4Memory Text+ Alliance
NFDI4Objects Text+ Alliance
BERD@NFDI Text+ Event: conference organisation
NFDI4Cat NFDI4Chem Event: conference participation
NFDI4Cat FAIRmat Event: conference participation
NFDI4Cat DAPHNE4NFDI Event: conference participation
NFDI4Chem FAIRmat Event: conference participation
NFDI4Chem DAPHNE4NFDI Event: conference participation
FAIRmat DAPHNE4NFDI Event: conference participation
PUNCH4NFDI FAIRmat Event: conference participation
PUNCH4NFDI DAPHNE4NFDI Event: conference participation
FAIRmat DAPHNE4NFDI Event: conference participation
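The items separated by `,` on each line need to be permuted, so that every pair of items from the same line is printed once, and the category should additionally be printed at the end of each output line.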
By recycling some old code, I almost get the desired output:
while IFS=\, read -r -a names; do
    for ((i = 0; i < ${#names[@]} - 1; ++i)); do
        for ((j = i + 1; j < ${#names[@]}; ++j)); do
            echo -e "${names[i]}\t${names[j]}"
        done
    done
done < $dataset
The result looks like this:
NFDI4Culture NFDI4Memory
NFDI4Culture NFDI4Objects
NFDI4Culture Text+;Alliance
NFDI4Memory NFDI4Objects
NFDI4Memory Text+;Alliance
NFDI4Objects Text+;Alliance
BERD@NFDI Text+;Event:conferenceorganisation
NFDI4Cat NFDI4Chem
NFDI4Cat FAIRmat
NFDI4Cat DAPHNE4NFDI;Event:conferenceparticipation
NFDI4Chem FAIRmat
NFDI4Chem DAPHNE4NFDI;Event:conferenceparticipation
FAIRmat DAPHNE4NFDI;Event:conferenceparticipation
PUNCH4NFDI FAIRmat
PUNCH4NFDI DAPHNE4NFDI;Event:conferenceparticipation
FAIRmat DAPHNE4NFDI;Event:conferenceparticipation
The problem is the category, which is obviously not printed on every line.
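A quick way to see why, as a minimal sketch: splitting one sample line on `,` alone leaves the `;` and the category attached to the last array element, so the category only shows up in pairs that include that last item. The variable `line` is introduced here purely for illustration.

```shell
# One sample line from dataset.csv, hard-coded for illustration.
line='BERD@NFDI, Text+;Event: conference organisation'

# Split on commas only, as the loop above does.
IFS=, read -r -a names <<< "$line"

# Print each element in brackets to make the boundaries visible.
printf '[%s]\n' "${names[@]}"
# [BERD@NFDI]
# [ Text+;Event: conference organisation]
```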
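General approach:
- split each line on `;` and save the two parts in the variables `vals` and `category`
- split `vals` on `,` and save the pieces in the `arr[]` array
- remove leading/trailing spaces from the `arr[]` entries
- use nested `for` loops to generate the permutations
One `bash` idea: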
while IFS=';' read -r vals category                # split on ";"
do
    IFS=, read -ra arr <<< "${vals}"               # split on ","
    count="${#arr[@]}"
    for ((i=0; i<count; i++))
    do
        arr[$i]="${arr[$i]## }"                    # remove one leading space
        arr[$i]="${arr[$i]%% }"                    # remove one trailing space
    done
    for ((i=0; i<(count-1); i++))                  # first item of the pair
    do
        for ((j=i+1; j<count; j++))                # second item, always after the first
        do
            printf "%s\t%s\t%s\n" "${arr[$i]}" "${arr[$j]}" "${category}"
        done
    done
done < dataset.csv
This generates:
NFDI4Culture NFDI4Memory Alliance
NFDI4Culture NFDI4Objects Alliance
NFDI4Culture Text+ Alliance
NFDI4Memory NFDI4Objects Alliance
NFDI4Memory Text+ Alliance
NFDI4Objects Text+ Alliance
BERD@NFDI Text+ Event: conference organisation
NFDI4Cat NFDI4Chem Event: conference participation
NFDI4Cat FAIRmat Event: conference participation
NFDI4Cat DAPHNE4NFDI Event: conference participation
NFDI4Chem FAIRmat Event: conference participation
NFDI4Chem DAPHNE4NFDI Event: conference participation
FAIRmat DAPHNE4NFDI Event: conference participation
PUNCH4NFDI FAIRmat Event: conference participation
PUNCH4NFDI DAPHNE4NFDI Event: conference participation
FAIRmat DAPHNE4NFDI Event: conference participation
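The `## ` and `%% ` expansions strip only a single space from each side, which is enough for the sample data. If the fields might be padded with several spaces or tabs, one possible variant is sketched below. The function names `trim` and `pairs` are hypothetical helpers, not part of the original code, and the sketch assumes the same `items;category` line layout.

```shell
#!/usr/bin/env bash

# Hypothetical helper: strip all leading and trailing whitespace from $1.
trim() {
    local s=$1
    s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
    s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
    printf '%s' "$s"
}

# Hypothetical wrapper: read "a, b, c;category" lines from stdin and
# print every pair of items with the category, tab-separated.
pairs() {
    local vals category i j count
    local -a arr
    # "|| [[ -n $vals ]]" also handles a last line without a trailing newline.
    while IFS=';' read -r vals category || [[ -n $vals ]]; do
        IFS=, read -ra arr <<< "$vals"             # split on ","
        count=${#arr[@]}
        for ((i = 0; i < count; i++)); do
            arr[i]=$(trim "${arr[i]}")
        done
        category=$(trim "$category")
        for ((i = 0; i < count - 1; i++)); do
            for ((j = i + 1; j < count; j++)); do
                printf '%s\t%s\t%s\n' "${arr[i]}" "${arr[j]}" "$category"
            done
        done
    done
}

# Usage, assuming dataset.csv is in the current directory:
#   pairs < dataset.csv
```

Reading from stdin rather than a hard-coded file name makes the loop easy to try on a single line, for example `printf 'a,  b;X\n' | pairs`.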