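`.marked.bam`) and places them into a directory structure.
Code 2 handles the files generated downstream (`annotated.hg38_multianno.txt`). I need to populate the `Tumor_Sample_Barcode` column in these annotated files with the corresponding sample names.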
Currently, `Tumor_Sample_Barcode` holds a placeholder (`annovar_output`) instead of the actual sample name extracted from the `.marked.bam` file name.
Goal: populate `Tumor_Sample_Barcode` in `annotated.hg38_multianno.txt` with the correct sample names taken from the `.marked.bam` files of Code 1.
Example input data frame:
Sample1,Polyp
Sample2,Normal
Sample3,Polyp
BAM files:
Sample1.marked.bam
Sample2.marked.bam
Sample3.marked.bam
The sample name is derived from the `*` in `*.marked.bam`. For example, `Sample1.marked.bam` has the sample name `Sample1`.
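As a small illustration of that naming rule, here is a minimal sketch that uses `basename` with a suffix argument instead of `sed`. The function name `sample_name_from_bam` and the example path are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original scripts):
# strip the directory part and the .marked.bam suffix from a BAM path.
sample_name_from_bam() {
    basename "$1" .marked.bam
}

sample_name_from_bam "/path/to/Sample1.marked.bam"   # prints: Sample1
```

This should behave the same as the `sed` expression in Code 1 for file names that end in `.marked.bam`.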
Expected output:
For each annotated file, the `Tumor_Sample_Barcode` column contains the sample name taken from the corresponding `.marked.bam` file.
Code 1:
# Extract sample names from .marked.bam files
for bam_file in "$OUTPUT_DIR"/*.marked.bam; do
    sample_name=$(basename "$bam_file" | sed 's/\.marked\.bam$//')
    echo "Processing BAM file for sample: $sample_name"
    # Processing and generating annotated files for each sample...
done
Code 2: processing annotated files
# For each annotated file, extract the sample name from its parent directory
for ANNOTATED_FILE in $(find "$ANNOVAR_OUTPUT_DIR" -name "annotated.hg38_multianno.txt"); do
    SAMPLE_NAME=$(basename "$(dirname "$ANNOTATED_FILE")")
    # Populate "Tumor_Sample_Barcode" column
    awk -v OFS='\t' -v sample_name="$SAMPLE_NAME" '
        NR == 1 {
            # Add column if not present
            col = -1
            for (i = 1; i <= NF; i++) if ($i == "Tumor_Sample_Barcode") col = i
            if (col == -1) {
                print $0, "Tumor_Sample_Barcode"
                col = NF + 1
            } else {
                print $0
            }
        }
        NR > 1 {
            $col = sample_name
            print $0
        }
    ' "$ANNOTATED_FILE" > "${ANNOTATED_FILE}.temp"
    mv "${ANNOTATED_FILE}.temp" "$ANNOTATED_FILE"`
done
Probable cause: the second-to-last line of your awk script is:
mv "${ANNOTATED_FILE}.temp" "$ANNOTATED_FILE"`
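The first thing you can do is take a hard look at the error handling of your toolchain, which let the `.sh` script make you think that the (untouched, still placeholder-containing) `annotated.hg38_multianno.txt` was the result of your awk run.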
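One possible way to tighten that error handling, as a minimal sketch rather than a prescription: parse the script with `bash -n` before running it, and use `set -euo pipefail` so that a failing step stops the run. The wrapper name `run_checked` is hypothetical and not part of the original toolchain.

```shell
#!/usr/bin/env bash
set -euo pipefail   # stop on the first failing command or unset variable

# Hypothetical wrapper (an assumption, not from the original post):
# parse the script first and run it only if it is syntactically valid.
run_checked() {
    local script=$1
    # bash -n reads the script and reports syntax errors without executing it;
    # an unmatched backtick is reported at this stage.
    if ! bash -n "$script"; then
        echo "syntax error in $script; refusing to run it" >&2
        return 1
    fi
    bash "$script"
}
```

Calling `run_checked your_script.sh` (the file name is a placeholder) on a script with an unmatched backtick should then fail loudly instead of leaving an unchanged output file behind.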
… once that is done, of course, you only need to delete that stray backtick to get your chain working again.
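For reference, a minimal sketch of Code 2 with the backtick removed. It rests on several assumptions: the annotated files are tab-separated with a header row, each one sits in a directory named after its sample, and the helper names `fill_barcode` and `annotate_all` are hypothetical. It also swaps the `for ... in $(find ...)` loop for `find -print0` piped into `read -d ''`, so paths containing spaces are not split.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: set (or append) the Tumor_Sample_Barcode column
# of one tab-separated annotated file to the given sample name.
fill_barcode() {
    local annotated_file=$1 sample_name=$2
    awk -F'\t' -v OFS='\t' -v sample_name="$sample_name" '
        NR == 1 {
            # Locate the column, or append it to the header if missing
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "Tumor_Sample_Barcode") col = i
            if (col == 0) { col = NF + 1; $col = "Tumor_Sample_Barcode" }
            print
            next
        }
        { $col = sample_name; print }
    ' "$annotated_file" > "${annotated_file}.temp"
    mv "${annotated_file}.temp" "$annotated_file"   # no trailing backtick
}

# Assumed layout: <root>/<sample>/annotated.hg38_multianno.txt
annotate_all() {
    local root=$1 annotated_file
    find "$root" -type f -name 'annotated.hg38_multianno.txt' -print0 |
    while IFS= read -r -d '' annotated_file; do
        fill_barcode "$annotated_file" "$(basename "$(dirname "$annotated_file")")"
    done
}
```

With these definitions, `annotate_all "$ANNOVAR_OUTPUT_DIR"` would process every annotated file under that directory. If the parent directory is not the sample name in your layout, the second argument to `fill_barcode` is the place to pass the name derived from the `.marked.bam` file instead.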