nextflow: Filter outputs of a process

Question

How can I filter the outputs of a process in the input of the next process? Filtering works fine on a channel, but when I try to filter process outputs I get a compilation error. I tried the following:
Channel
    .fromFilePairs("${params.dir}*.{bed,bim,fam}", size: 3)
    .set { data }

process step1a {
    input:
    tuple val(sample_name), path(bfiles) from data
    output:
    tuple val("${prefix}"), path("${prefix}.{bed,bim,fam}") into (step1a_out,step1a_bim)
    script:
    """
    plink --bfile "${sample_name}" --chr 1-22 --out "${prefix}.step1b" --make-bed
    """
}

process step1b {
    input:
    tuple val(bim), path(bims) from step1a_bim.filter{bim, files -> name =~/*bim/}
    tuple val(sample_name), path(bfiles) from step1a_out
    output:
    path("${prefix}.step1b.snplist.txt")
    tuple val("${sample_name}"), path("${sample_name}.step1c.{bed,bim,fam}") into step1b_results
    script:
    """
    awk '{ if (($5=="T" && $6=="A")||($5=="A" && $6=="T")||($5=="C" && $6=="G")||($5=="G" && $6=="C")) print $2, "ambig" ; else print $2 ;}' "${prefix}.step1b.bim" | grep ambig > "${prefix}.step1b.snplist.txt"
    plink --bfile "${prefix}.step1b" --exclude "${prefix}.step1b.snplist.txt" --make-bed --out "${prefix}.step1c"
    """

It gives me
Compilation error
-cause: Unexpected input: '{'
process step1b {

step1a gives me chr1.step1b.bed|bim|fam, chr2.step1b.bed|bim|fam, ..., chr22.step1b.bed|bim|fam (22 plink filesets).
I need to access the prefix of those outputs to run plink and other tools, which should generate chr1.step1c.bed|bim|fam ... chr22.step1c.bed|bim|fam (another 22 plink filesets), and to run awk to generate chr1.step1b.snplist.txt ... chr22.step1b.snplist.txt (22 text files).
How can I filter the inputs so that only the *bim files from step1a_bim are used? And how do I get the prefix of the files coming from step1a_out in the input of a process, so that I can run plink? Does that make sense?
Best regards,
Zillur

Pallie · Answer

Instead of emitting all the files and then filtering for the bim file, declare a separate output that contains only the bim file. Replace this:
output:
tuple val("${prefix}"), path("${prefix}.{bed,bim,fam}") into (step1a_out,step1a_bim)

with this:
output:
tuple val("${prefix}"), path("${prefix}.{bed,bim,fam}") into step1a_out
tuple val("${prefix}"), path("${prefix}.bim") into step1a_bim

That way you can skip the complicated filter entirely:
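process step1b {
    input:
    // the first tuple element is the prefix emitted by step1a, so bind it under that name
    tuple val(prefix), path(bimfile) from step1a_bim
    output:
    path("${prefix}.step1b.snplist.txt")
    tuple val("${prefix}"), path("${prefix}.step1c.{bed,bim,fam}") into step1b_results
    script:
    // awk fields are written as \$5, \$6, \$2 so that Nextflow does not try to interpolate them
    // note: plink --bfile also needs the matching .bed and .fam files, which this input does not stage
    """
    awk '{ if ((\$5=="T" && \$6=="A")||(\$5=="A" && \$6=="T")||(\$5=="C" && \$6=="G")||(\$5=="G" && \$6=="C")) print \$2, "ambig" ; else print \$2 ;}' "$bimfile" | grep ambig > "${prefix}.step1b.snplist.txt"
    plink --bfile "${prefix}.step1b" --exclude "${prefix}.step1b.snplist.txt" --make-bed --out "${prefix}.step1c"
    """
}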
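
If you want to check what the awk step writes before wiring it into the pipeline, you can run it by hand on a small file. The sketch below is plain shell, so the awk fields are written as $5 rather than \$5 (the backslashes are only needed inside a Nextflow script string). The four-variant bim content, the chr1 prefix and the temporary directory are made up for illustration.

```shell
# Hypothetical toy .bim file; columns: chromosome, variant ID, cM, position, allele 1, allele 2
workdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tT\n1\trs2\t0\t2000\tA\tG\n1\trs3\t0\t3000\tC\tG\n1\trs4\t0\t4000\tT\tC\n' \
    > "$workdir/chr1.step1b.bim"

# Same command as in the process script: tag A/T and C/G variants with "ambig",
# then keep only the tagged lines
awk '{ if (($5=="T" && $6=="A")||($5=="A" && $6=="T")||($5=="C" && $6=="G")||($5=="G" && $6=="C")) print $2, "ambig" ; else print $2 ;}' \
    "$workdir/chr1.step1b.bim" | grep ambig > "$workdir/chr1.step1b.snplist.txt"

cat "$workdir/chr1.step1b.snplist.txt"
```

With this toy input the snplist file should contain two lines, "rs1 ambig" and "rs3 ambig", since only rs1 (A/T) and rs3 (C/G) are strand-ambiguous.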
