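BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool that assesses the completeness of genome assemblies and annotations by searching for near-universal single-copy orthologs.
Its workflow is:
Genome assembly | tBLASTn --> Augustus --> HMMER3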
Transcriptome | Find ORF --> HMMER3
Gene set | HMMER3
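Download and installation
# Create a conda Python 3 environment
conda create --name busco-py3.7 python=3.7
# Then activate it
conda activate busco-py3.7
# Install BUSCO (assumes the bioconda and conda-forge channels are already configured)
conda install busco
Usage
The help text is as follows: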
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
-i FASTA FILE, --in FASTA FILE    # Sequence file (FASTA format): an assembled genome, transcriptome, or proteome
-c N, --cpu N    # Number of threads
-o OUTPUT, --out OUTPUT    # Name of the output, without a path
--out_path OUTPUT_PATH    # Path for the output (default: current directory)
-e N, --evalue N    # E-value cutoff for BLAST (format: 0.001 or 1e-03; default 1e-03)
-m MODE, --mode MODE    # geno/genome; tran/transcriptome; prot/proteins
-l LINEAGE, --lineage_dataset LINEAGE    # BUSCO lineage to use (dataset folder)
-f, --force    # Force overwriting of existing files; use when the output name already exists
-r, --restart    # Continue a partially completed run
--limit REGION_LIMIT    # Number of candidate regions (contig or transcript) considered for each BUSCO (default 3)
--augustus_species AUGUSTUS_SPECIES    # Species to use for Augustus training
--auto-lineage    # Run auto-lineage to find a suitable lineage path
--offline    # Indicate that BUSCO cannot attempt to download files
--config CONFIG_FILE    # Provide a config file
-v, --version    # Show the version
-h, --help    # Show the help message
--list-datasets    # Print the available BUSCO datasets
Syntax:
busco -i test.fa -c 8 -o test -m genome -l eudicots_odb10 > output.txt
The result looks like:
C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
2280 Complete BUSCOs (C)
2211 Complete and single-copy BUSCOs (S)
69 Complete and duplicated BUSCOs (D)
14 Fragmented BUSCOs (F)
32 Missing BUSCOs (M)
2326 Total BUSCO groups searched
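When comparing many runs it can help to read the one-line summary programmatically. The following is a minimal Python sketch, assuming the summary line has exactly the shape shown above; the function name `parse_busco_summary` is hypothetical and not part of BUSCO.

```python
import re

# Pattern for the one-line BUSCO summary, e.g.
# C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (floats) and the total count n (int).

    Hypothetical helper for illustration; not a BUSCO API.
    """
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    groups = match.groupdict()
    result = {key: float(groups[key]) for key in ("C", "S", "D", "F", "M")}
    result["n"] = int(groups["n"])
    return result


if __name__ == "__main__":
    print(parse_busco_summary("C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326"))
```

Here C is the sum of S and D, and C, F, and M together account for all n BUSCO groups searched.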
Plotting
You can draw a plot with generate_plot.py (most useful when comparing multiple species).
Help text:
usage: python3 generate_plot.py -wd [WORKING_DIRECTORY] [OTHER OPTIONS]
BUSCO plot generation tool.
Place all BUSCO short summary files (short_summary.[generic|specific].dataset.label.txt) in a single folder. It will be your working directory, in which the generated plot files will be written
See also the user guide for additional information
required arguments:
  -wd PATH, --working_directory PATH    Define the location of your working directory
optional arguments:
  -rt RUN_TYPE, --run_type RUN_TYPE    type of summary to use, `generic` or `specific`
  --no_r    To avoid to run R. It will just create the R script file in the working directory
  -q, --quiet    Disable the info logs, displays only errors
  -h, --help    Show this help message and exit
All BUSCO short summary files need to be collected into a single folder:
mkdir my_summaries
cp run_SPEC1/short_summary_SPEC1.txt my_summaries/.
cp run_SPEC2/short_summary_SPEC2.txt my_summaries/.
cp run_SPEC3/short_summary_SPEC3.txt my_summaries/.
cp run_SPEC4/short_summary_SPEC4.txt my_summaries/.
cp run_SPEC5/short_summary_SPEC5.txt my_summaries/.
python scripts/generate_plot.py -wd my_summaries
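With many species, the copy step can be scripted instead of typed by hand. The following is a minimal Python sketch, assuming each run directory contains files whose names start with `short_summary` and end with `.txt`, as in the commands above; the function name `collect_summaries` is hypothetical and not part of BUSCO.

```python
import shutil
from pathlib import Path


def collect_summaries(run_dirs, dest):
    """Copy every short_summary*.txt from each run directory into dest.

    Hypothetical helper for illustration; the glob pattern is an
    assumption based on the file names used in the cp commands.
    Returns the list of copied file names.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for run_dir in run_dirs:
        for summary in sorted(Path(run_dir).glob("short_summary*.txt")):
            shutil.copy(summary, dest / summary.name)
            copied.append(summary.name)
    return copied


if __name__ == "__main__":
    # Collect summaries from every run_* directory in the current folder.
    runs = sorted(Path(".").glob("run_*"))
    print(collect_summaries(runs, "my_summaries"))
```

After this, generate_plot.py can be pointed at my_summaries with -wd as before.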