How do I permute the items in a line that uses two different delimiters?


My dataset (dataset.csv) has the following structure:

NFDI4Culture, NFDI4Memory, NFDI4Objects, Text+;Alliance
BERD@NFDI, Text+;Event: conference organisation
NFDI4Cat, NFDI4Chem, FAIRmat, DAPHNE4NFDI;Event: conference participation
PUNCH4NFDI, FAIRmat, DAPHNE4NFDI;Event: conference participation

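Each line contains at least two values separated by ",", followed at the end of the line (separated by ";") by the category of the preceding items.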

The desired result should look like this (columns separated by tabs):

NFDI4Culture  NFDI4Memory  Alliance
NFDI4Culture  NFDI4Objects  Alliance
NFDI4Culture  Text+  Alliance
NFDI4Memory  NFDI4Objects  Alliance
NFDI4Memory  Text+  Alliance
NFDI4Objects  Text+  Alliance
BERD@NFDI  Text+  Event: conference organisation
NFDI4Cat  NFDI4Chem  Event: conference participation
NFDI4Cat  FAIRmat  Event: conference participation
NFDI4Cat  DAPHNE4NFDI  Event: conference participation
NFDI4Chem  FAIRmat  Event: conference participation
NFDI4Chem  DAPHNE4NFDI  Event: conference participation
FAIRmat  DAPHNE4NFDI  Event: conference participation
PUNCH4NFDI  FAIRmat  Event: conference participation
PUNCH4NFDI  DAPHNE4NFDI  Event: conference participation
FAIRmat  DAPHNE4NFDI  Event: conference participation

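The items separated by "," on each line need to be permuted (each item paired with every later item on the same line), and the category should additionally be printed at the end of every output line.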

By recycling some old code, I almost got the desired output:

while IFS=\,  read -r -a names; do
        for ((i = 0; i < ${#names[@]} - 1; ++i)); do
            for ((j = i + 1; j < ${#names[@]}; ++j)); do
                echo -e "${names[i]}\t${names[j]}"
            done
        done
done < $dataset

The result looks like this:

NFDI4Culture    NFDI4Memory
NFDI4Culture    NFDI4Objects
NFDI4Culture    Text+;Alliance
NFDI4Memory NFDI4Objects
NFDI4Memory Text+;Alliance
NFDI4Objects    Text+;Alliance
BERD@NFDI   Text+;Event:conferenceorganisation
NFDI4Cat    NFDI4Chem
NFDI4Cat    FAIRmat
NFDI4Cat    DAPHNE4NFDI;Event:conferenceparticipation
NFDI4Chem   FAIRmat
NFDI4Chem   DAPHNE4NFDI;Event:conferenceparticipation
FAIRmat DAPHNE4NFDI;Event:conferenceparticipation
PUNCH4NFDI  FAIRmat
PUNCH4NFDI  DAPHNE4NFDI;Event:conferenceparticipation
FAIRmat DAPHNE4NFDI;Event:conferenceparticipation

The problem is the category, which is obviously not printed on every line.

Tags: bash, permutation

1 answer

General approach:

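  • split each line on ";" and save the two parts in the variables vals and category
  • split vals on "," and save the result in the arr[] array
  • remove leading/trailing spaces from the arr[] entries
  • use 2 for loops to generate the permutations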
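
The first three steps can be tried on a single line before wiring up the loops. This is only a minimal sketch: the variable `line` is introduced here for illustration and holds one sample line copied from the question, and the trimming uses a nested parameter expansion that removes any run of whitespace, which is an assumption about what the input may contain.

```shell
#!/usr/bin/env bash
# Hypothetical sample line, copied from the dataset in the question.
line='BERD@NFDI, Text+;Event: conference organisation'

# Step 1: split on ";" into the item list and the category.
IFS=';' read -r vals category <<< "$line"

# Step 2: split the item list on "," into an array.
IFS=, read -ra arr <<< "$vals"

# Step 3: strip leading/trailing whitespace from every entry.
for i in "${!arr[@]}"; do
    arr[i]="${arr[i]#"${arr[i]%%[![:space:]]*}"}"
    arr[i]="${arr[i]%"${arr[i]##*[![:space:]]}"}"
done

# Show each piece in brackets so stray spaces would be visible.
printf '[%s]\n' "${arr[@]}" "$category"
```

The brackets in the last line make it easy to spot any whitespace that survived the trimming.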

One bash idea:

while IFS=';' read -r vals category                 # split on ";"
do
    IFS=, read -ra arr <<< "${vals}"                # split on ","
    count="${#arr[@]}"

    for ((i=0; i<count; i++))
    do
        arr[$i]="${arr[$i]#"${arr[$i]%%[![:space:]]*}"}"    # remove all leading whitespace
        arr[$i]="${arr[$i]%"${arr[$i]##*[![:space:]]}"}"    # remove all trailing whitespace
    done

    for ((i=0; i<(count-1); i++))
    do
        for ((j=i+1; j<count; j++))
        do
            printf "%s\t%s\t%s\n" "${arr[$i]}" "${arr[$j]}" "${category}"
        done
    done
done < dataset.csv

This generates:

NFDI4Culture    NFDI4Memory     Alliance
NFDI4Culture    NFDI4Objects    Alliance
NFDI4Culture    Text+   Alliance
NFDI4Memory     NFDI4Objects    Alliance
NFDI4Memory     Text+   Alliance
NFDI4Objects    Text+   Alliance
BERD@NFDI       Text+   Event: conference organisation
NFDI4Cat        NFDI4Chem       Event: conference participation
NFDI4Cat        FAIRmat Event: conference participation
NFDI4Cat        DAPHNE4NFDI     Event: conference participation
NFDI4Chem       FAIRmat Event: conference participation
NFDI4Chem       DAPHNE4NFDI     Event: conference participation
FAIRmat DAPHNE4NFDI     Event: conference participation
PUNCH4NFDI      FAIRmat Event: conference participation
PUNCH4NFDI      DAPHNE4NFDI     Event: conference participation
FAIRmat DAPHNE4NFDI     Event: conference participation
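
As for why the attempt in the question kept the category glued to the last item: with IFS set to "," alone, the ";" is ordinary data and is never treated as a delimiter. The following is a small sketch of that difference, using one sample line from the question; the names `line` and `last_index` are introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sample line, copied from the dataset in the question.
line='BERD@NFDI, Text+;Event: conference organisation'

# Splitting on "," only: the ";" is not a delimiter, so the category
# stays attached to the last field.
IFS=, read -ra names <<< "$line"
last_index=$(( ${#names[@]} - 1 ))
printf 'last field: [%s]\n' "${names[last_index]}"

# Splitting on ";" first separates the category from the item list.
IFS=';' read -r vals category <<< "$line"
printf 'items: [%s]  category: [%s]\n' "$vals" "$category"
```

Splitting on ";" before splitting on "," is therefore what makes the category available as its own variable for every generated pair.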