Nextflow: filter an entire tuple based on one value


I need some extra guidance on filtering an entire tuple based on one value.

I have a channel of tuples (reads_ch):

[[id:S1, single_end:false], [/home/ubuntu/S1.R1.fastq.gz, /home/ubuntu/S1.R2.fastq.gz], /home/ubuntu/S1.txt, [PASSED: File S1_R1 is not corrupt., PASSED: File S1_R2 is not corrupt.]]
[[id:S2, single_end:false], [/home/ubuntu/S2.fastq.gz, /home/ubuntu/S2.R2.fastq.gz], /home/ubuntu/S2.txt, [PASSED: File S2_R1 is not corrupt., PASSED: File S2_R2 is not corrupt.]]
[[id:S3, single_end:false], [/home/ubuntu/S3.R1.fastq.gz, /home/ubuntu/S3.R2.fastq.gz], /home/ubuntu/S3.txt, [FAILED, FAILED]]

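I am trying to filter out any sample that has "FAILED". I have managed to clear it[3], but I haven't figured out how to remove the sample itself while keeping the structure of the remaining samples (S1, S2).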

Goal:

[[id:S1, single_end:false], [/home/ubuntu/S1.R1.fastq.gz, /home/ubuntu/S1.R2.fastq.gz], /home/ubuntu/S1.txt, [PASSED: File S1_R1 is not corrupt., PASSED: File S1_R2 is not corrupt.]]
[[id:S2, single_end:false], [/home/ubuntu/S2.fastq.gz, /home/ubuntu/S2.R2.fastq.gz], /home/ubuntu/S2.txt, [PASSED: File S2_R1 is not corrupt., PASSED: File S2_R2 is not corrupt.]]

Attempt:

passing = reads_ch.map { meta, reads, outcome_file, outcome_status ->
    tuple(meta, reads, outcome_file, outcome_status.findAll { !it.contains('FAILED') })
}
passing.view()

[[id:S1, single_end:false], [/home/ubuntu/S1.R1.fastq.gz, /home/ubuntu/S1.R2.fastq.gz], /home/ubuntu/S1.txt, [PASSED: File S1_R1 is not corrupt., PASSED: File S1_R2 is not corrupt.]]
[[id:S2, single_end:false], [/home/ubuntu/S2.fastq.gz, /home/ubuntu/S2.R2.fastq.gz], /home/ubuntu/S2.txt, [PASSED: File S2_R1 is not corrupt., PASSED: File S2_R2 is not corrupt.]]
[[id:S3, single_end:false], [/home/ubuntu/S3.R1.fastq.gz, /home/ubuntu/S3.R2.fastq.gz], /home/ubuntu/S3.txt, []]

1 answer

Should the whole channel element be removed even if only one of its reads fails? If that's what you want, you will find a solution below:

Preparing a dummy channel similar to yours:

Channel
  .of([
        [id:'S1', single_end:false],
        [file('/home/ubuntu/S1.R1.fastq.gz'), file('/home/ubuntu/S1.R2.fastq.gz')],
        file('/home/ubuntu/S1.txt'),
        ['PASSED: File S1_R1 is not corrupt.', 'PASSED: File S1_R2 is not corrupt.']
      ],
      [
        [id:'S2', single_end:false],
        [file('/home/ubuntu/S2.fastq.gz'), file('/home/ubuntu/S2.R2.fastq.gz')],
        file('/home/ubuntu/S2.txt'),
        ['PASSED: File S2_R1 is not corrupt.', 'PASSED: File S2_R2 is not corrupt.']
      ],
      [
        [id:'S3', single_end:false],
        [file('/home/ubuntu/S3.R1.fastq.gz'), file('/home/ubuntu/S3.R2.fastq.gz')],
        file('/home/ubuntu/S3.txt'),
        ['FAILED', 'FAILED']
      ])
  .set { my_ch }

Filtering:

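// findAll returns the statuses that do not contain 'FAILED'; an empty list is
// falsy in Groovy, so a tuple is dropped only when every one of its statuses failed
my_ch
  .filter { it[3].findAll { !it.contains('FAILED') } }
  .view()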

This leaves only the S1 and S2 tuples in the channel; S3 is dropped.

You can also do it like this:

// keep a tuple only if neither of its two statuses is exactly 'FAILED'
my_ch
  .filter { it[3][0] != 'FAILED' && it[3][1] != 'FAILED' }
  .view()

If you want to filter on either one of them failing rather than both, this can easily be adapted.
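
As a rough sketch of that adaptation, the two policies can be written as plain Groovy predicates over the status list (element [3] of each tuple). The names `strict` and `lenient`, and the mixed `oneFailed` list, are made up for illustration and do not come from the question's data:

```groovy
// Status lists shaped like element [3] of the tuples above.
def allPassed = ['PASSED: File S1_R1 is not corrupt.', 'PASSED: File S1_R2 is not corrupt.']
def oneFailed = ['PASSED: File SX_R1 is not corrupt.', 'FAILED']   // hypothetical mixed sample
def allFailed = ['FAILED', 'FAILED']

// Strict policy: keep a sample only when no status mentions FAILED.
def strict = { statuses -> statuses.every { !it.contains('FAILED') } }

// Lenient policy: keep a sample when at least one status does not mention FAILED.
def lenient = { statuses -> statuses.any { !it.contains('FAILED') } }

assert strict(allPassed) && lenient(allPassed)
assert !strict(oneFailed) && lenient(oneFailed)
assert !strict(allFailed) && !lenient(allFailed)
```

Assuming the channel layout shown earlier, either predicate could then be passed to the filter, for example `my_ch.filter { strict(it[3]) }`. The two policies only differ for samples where one read passed and the other failed.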
