快速、准确地检测和分类病毒序列分析工具 ViralCC的介绍和详细使用方法, 附带应用脚本
介绍
viralcc是一个基因组病毒分析工具,可以用于快速、准确地检测和分类病毒序列。
github:dyxstat/ViralCC: ViralCC: leveraging metagenomic proximity-ligation to retrieve complete viral genomes (github.com)
Instruction of reproducing results in ViralCC paper:dyxstat/Reproduce_ViralCC: Instruction of reproducing results in ViralCC paper (github.com)
安装viralcc:
首先,确保你已经安装了Python 3.6或更高版本。
从GitHub上下载viralcc的代码。在终端中输入以下命令:
git clone https://github.com/dyxstat/ViralCC.git
进入viralcc文件夹:
cd viralcc
建议使用mamba 或 conda 直接安装吧:
#安装前先修改配置文件viralcc_linux_env.yaml,将环境名称修改为自己想要的
#其他的东西不要动name: viralcc //修改这个就行了,原来为ViralCC_ENV
channels:- bioconda- conda-forge- defaults- r
dependencies:- _libgcc_mutex=0.1- _openmp_mutex=4.5- _r-mutex=1.0.1- binutils_impl_linux-64=2.35.1- binutils_linux-64=2.35- biopython=1.78- bwidget=1.9.14- bzip2=1.0.8
mamba安装:
mamba env create -f viralcc_linux_env.yaml
使用viralcc:
在终端中输入以下命令,可以查看viralcc的可用命令和选项:
mamba activate viralccpython ./viralcc.py -h
usage: viralcc.py [-h] {pipeline} ...ViralCC: a metagenomic proximity-based tool to retrieve complete viral genomesoptional arguments:-h, --help show this help message and exitcommands:Valid commands
准备输入文件。viralcc支持FASTA和FASTQ格式的输入文件,你可以将你的病毒序列文件准备好。
运行viralcc进行病毒分析测试。在终端中输入以下命令:
python ./viralcc.py pipeline -v Test/final.contigs.fa Test/MAP_SORTED.bam Test/viral_contigs.txt Test/out_test
使用分析流程:
指令:处理原始数据 按照本节的指示,对原始shotgun和Hi-C数据进行处理,并生成ViralCC的输入:
-
清理原始shotgun和Hi-C读段 使用BBTools套件中的bbduk工具去除接头序列,参数为ktrim=r k=23 mink=11 hdist=1 minlen=50 tpe tbo;同时使用bbduk进行质量修剪,参数为trimq=10 qtrim=r ftm=5 minlen=50。另外,通过设置bbduk参数ftl=10来剪切Hi-C读段的前10个核苷酸。使用BBTools套件中的clumpify.sh脚本来移除Hi-C读段中的相同PCR光学重复和Tile边缘重复。
-
组装shotgun读段 对shotgun文库,采用如MEGAHIT之类的de novo组装软件进行元基因组组装。
megahit -1 SG1.fastq.gz -2 SG2.fastq.gz -o ASSEMBLY --min-contig-len 1000 --k-min 21 --k-max 141 --k-step 12 --merge-level 20,0.95
-
将Hi-C双端读段比对到组装得到的contigs上 使用如BWA MEM这样的DNA比对软件将Hi-C双端读段比对至已组装的contigs。然后应用samtools(参数为‘view -F 0x904’)移除未比对、补充比对以及二级比对的读段。需要使用'samtools sort'按名称对BAM文件进行排序。
bwa index final.contigs.fa bwa mem -5SP final.contigs.fa hic_read1.fastq.gz hic_read2.fastq.gz > MAP.sam samtools view -F 0x904 -bS MAP.sam > MAP_UNSORTED.bam samtools sort -n MAP_UNSORTED.bam -o MAP_SORTED.bam
-
从组装的contigs中识别病毒contigs 利用如VirSorter这样的病毒序列检测软件对组装后的contigs进行筛选以识别病毒contigs。
wrapper_phage_contigs_sorter_iPlant.pl -f final.contigs.fa --db 1 --wdir virsorter_output --data-dir virsorter-data
指令:运行ViralCC
python ./viralcc.py pipeline [参数] FASTA文件 BAM文件 VIRAL文件 输出目录
参数说明: --min-len: 可接受的最小contig长度(默认值为1000) --min-mapq: 最小可接受的比对质量(默认值为30) --min-match: 接受的比对至少要有N个匹配(默认值为30) --min-k: 确定宿主邻近图的k值下限(默认值为4) --random-seed: Leiden聚类算法的随机种子(默认值为42) --cover (可选): 覆盖现有文件。如果不指定此选项,若检测到输出文件已存在,则会返回错误。 -v (可选): 显示有关ViralCC过程更多详细信息的详尽输出。
输入文件: FASTA_file: 已组装contig的fasta文件(例如:Test/final.contigs.fa) BAM_file: Hi-C比对结果的bam文件(例如:Test/MAP_SORTED.bam) VIRAL_file: 包含识别出的病毒contigs名称的txt文件,每行一个名称且无表头(例如:Test/viral_contigs.txt)
输出文件: VIRAL_BIN: 包含草稿病毒bin的fasta文件夹 cluster_viral_contig.txt: 聚类结果,包含两列,第一列是病毒contig名称,第二列是组号 viral_contig_info.csv: 病毒contig信息,包含三列(contig名称、contig长度和GC含量) prokaryotic_contig_info.csv: 非病毒contig信息,包含三列(contig名称、contig长度和GC含量) viralcc.log: ViralCC日志文件
示例:
python ./viralcc.py pipeline -v final.contigs.fa MAP_SORTED.bam viral_contigs.txt out_directory
实用脚本位置:Reproduce_ViralCC/Scripts at main · dyxstat/Reproduce_ViralCC (github.com)
concatenation.py
import os
import io
import sys
import argparse
import Bio.SeqIO as SeqIO
import gzip
import numpy as np
import pandas as pddef get_no_hidden_folder_list(wd):folder_list = []for each_folder in os.listdir(wd):if not each_folder.startswith('.'):folder_list.append(each_folder)folder_list_sorte = sorted(folder_list)return folder_list_sortedef main(path , output_file):file_list = get_no_hidden_folder_list(path)bin_num = len(file_list) for k in range(bin_num):seq_file = '%s/%s' % (path , file_list[k])if k==0:op1 = 'echo ' + '\">BIN_' + str(k) + '\" ' + '> ' + output_fileelse:op1 = 'echo ' + '\">BIN_' + str(k) + '\" ' + '>> ' + output_fileos.system(op1)op2 = 'grep ' + '-v ' + '\'>\' ' + seq_file + ' >> ' + output_fileos.system(op2) if __name__ == "__main__":parser = argparse.ArgumentParser()parser.add_argument("-p",help="path")parser.add_argument("-o",help="output_file")args=parser.parse_args()main(args.p,args.o)
find_viral_contig.R
virsorterfile = 'VIRSorter_global-phage-signal.csv'
vs.pred <- read.csv(virsorterfile,quote="",head=F)
vs.head <- read.table(virsorterfile,sep=",",quote="",head=T,comment="",skip=1,nrows=1)
colnames(vs.pred) <- colnames(vs.head)
colnames(vs.pred)[1] <- "vs.id"
vs.cats <- do.call(rbind,strsplit(x=as.character(vs.pred$vs.id[grep("category",vs.pred$vs.id)]),split=" - ",fixed=T))[,2]
vs.num <- grep("category",vs.pred$vs.id)
vs.pred$Category <- paste(c("",rep.int(vs.cats, c(vs.num[-1],nrow(vs.pred)) - vs.num)), vs.pred$Category)
vs.pred <- vs.pred[-grep("#",vs.pred$vs.id),]vs.pred$node <- gsub(pattern="VIRSorter_",replacement="",x=vs.pred$vs.id)
vs.pred$node <- gsub(pattern="-circular",replacement="",x=vs.pred$node)
vs.pred$node <- gsub(pattern="cov_(\\d+)_",replacement="cov_\\1.",x=vs.pred$node,perl=F)rownames(vs.pred) = seq(1 , 1393)vs_phage = vs.pred[1:1338 , ]phage_name = vs_phage$nodefor(i in 1:1338)
{temp = paste0(strsplit(phage_name[i],split='_')[[1]][1] , '_' , strsplit(phage_name[i],split='_')[[1]][2])phage_name[i] = temp
}group_name = rep('group0' , 1338)
phage = cbind(phage_name , group_name)write.table(phage , file = 'viral.txt' , sep='\t', row.names = F , col.names = F , quote =FALSE)
plot_graph.R
####################write ggplot figure###############
library(ggplot2)
library(ggpubr)
library(ggforce)theme_set(theme_bw()+theme(panel.spacing=grid::unit(0,"lines")))##########柱状图对于不同方法和分类###########
Rank = rep(c('F-score' , 'ARI' , 'NMI' , 'Homogeneity') , each = 5)
Pipeline = rep(c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'),times = 4)
Number = c(0.198,0.485,0.366,0.404,0.795,0.111,0.471,0.302,0.274,0.787,0.724,0.742,0.782,0.817,0.929,0.570,0.723,0.687,0.691,0.921)col = c('#8FBC94' , '#4FB0C6', "#4F86C6", "#527F76", '#CC9966')df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'))
df$Rank = factor(df$Rank , levels = c('F-score' , 'ARI' , 'NMI', 'Homogeneity'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'dodge')+scale_fill_manual(values = col,limits= c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'))+coord_cartesian(ylim = c(0.05,0.975))+labs(x = "Clustering metrics", y = "Scores", title = "The mock human gut dataset")+theme(legend.position="bottom",legend.title=element_blank(),legend.text = element_text(size = 12),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("fig2a.eps", width = 7 , height = 6 , device = cairo_ps)Rank = rep(c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'),each = 3)
Pipeline = rep(c('Moderately complete' , 'Substantially complete' , 'Near-complete'),times = 5)
Number = c(2,4,1,1,5,5, 6,1,0,1,0,5, 4,2,26)col = c("#8FBC94","#77AAAD","#6E7783")df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('Moderately complete' , 'Substantially complete' , 'Near-complete'))
df$Rank = factor(df$Rank , levels = c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'stack')+scale_fill_manual(values = col,limits= c('Moderately complete' , 'Substantially complete' , 'Near-complete'))+labs(x = "Binning method", y = "Number of viral bins", title = "The mock human gut dataset")+theme(legend.position="bottom",legend.title=element_blank(),legend.text = element_text(size = 12),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("fig2b.eps", width = 7, height = 6, device = cairo_ps)viral_num = data.frame('number' = c(1, 4 , 1 , 1 , 13),'method' = c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'))viral_num$method = factor(viral_num$method , levels=c('VAMB' , 'CoCoNet' , 'vRhyme' , 'bin3C' , 'ViralCC'))ggplot(data = viral_num, aes(x = method , y = number )) + geom_bar(stat = "identity", position='dodge' , width = 0.9,fill = 'steelblue') + labs(x = 'Binning method', y = 'Number of high-quality vMAGs within the co-host systems', title = "The mock human gut dataset") +theme(panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("fig2c.eps", width = 7, height = 6, device = cairo_ps)##############human gut 2a############
Rank = rep(c('ViralCC' ,'bin3C' , 'vRhyme' , 'CoCoNet' , 'VAMB'),each = 5)
Completeness = rep(c( "≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"),times = 5)
###Number needs to be 4*5 matrix##
Number = c(11 , 12 , 17 , 7 , 78,1 , 0 , 1 , 4 , 33,10, 11, 10, 6, 60,2, 1 , 3 , 2 , 25,10, 11, 14, 15, 69)col = c("#023FA5" ,"#5465AB" ,"#7D87B9" ,"#A1A6C8" ,"#BEC1D4")[5:1]
df <- data.frame(Rank = Rank, Completeness = Completeness, Number = Number)
df$Completeness = factor(df$Completeness , levels=c("≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"))
df$Rank = factor(df$Rank , levels = c('ViralCC' ,'bin3C' , 'vRhyme' , 'CoCoNet' , 'VAMB'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Completeness)) + geom_bar(stat = 'identity', position = 'stack')+scale_fill_manual(values = col , limits= c("≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"))+labs(x = "Binning method", y = "Number of bins", title = "CheckV results on the real human gut dataset")+coord_flip()+theme(legend.position="bottom",legend.title=element_text(size = 11),legend.text = element_text(size = 11),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 13,face = "bold"),axis.title.y = element_text(size = 13,face = "bold"),title = element_text(size = 14,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("fig3a.eps", width = 6.3, height = 5, device = cairo_ps)##############cow fecal 2b############
Rank = rep(c('ViralCC' ,'bin3C' , 'vRhyme' , 'CoCoNet' , 'VAMB'),each = 5)
Completeness = rep(c( "≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"),times = 5)
###Number needs to be 4*5 matrix##
Number = c(21 , 14 , 21 , 9 , 60,14 , 17 , 12 , 8 , 31,18, 14 , 16 , 14 , 36,3, 3 , 2 , 2 , 25,19,17,10,8,23)col = c("#023FA5" ,"#5465AB" ,"#7D87B9" ,"#A1A6C8" ,"#BEC1D4")[5:1]
df <- data.frame(Rank = Rank, Completeness = Completeness, Number = Number)
df$Completeness = factor(df$Completeness , levels=c("≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"))
df$Rank = factor(df$Rank , levels = c('ViralCC' ,'bin3C' , 'vRhyme' , 'CoCoNet' , 'VAMB'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Completeness)) + geom_bar(stat = 'identity', position = 'stack')+scale_fill_manual(values = col , limits= c("≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"))+labs(x = "Binning method", y = "Number of bins", title = "CheckV results on the real cow fecal dataset")+coord_flip()+theme(legend.position="bottom",legend.title=element_text(size = 11),legend.text = element_text(size = 11),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 13,face = "bold"),axis.title.y = element_text(size = 13,face = "bold"),title = element_text(size = 14,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("fig3b.eps", width = 6.3, height = 5, device = cairo_ps)##############wastewater 2c############
Rank = rep(c('ViralCC' ,'bin3C' , 'vRhyme' , 'CoCoNet' , 'VAMB'),each = 5)
Completeness = rep(c( "≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"),times = 5)
###Number needs to be 3*5 matrix##
Number = c(30 , 27 , 21 , 17 , 77,19, 20 , 11 , 11 , 28,14,16,14,15,32,2, 8 , 8 , 6 , 38,20,34,14,13,58)col = c("#023FA5" ,"#5465AB" ,"#7D87B9" ,"#A1A6C8" ,"#BEC1D4")[5:1]
df <- data.frame(Rank = Rank, Completeness = Completeness, Number = Number)
df$Completeness = factor(df$Completeness , levels=c("≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"))
df$Rank = factor(df$Rank , levels = c('ViralCC' ,'bin3C' , 'vRhyme' , 'CoCoNet' , 'VAMB'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Completeness)) + geom_bar(stat = 'identity', position = 'stack')+scale_fill_manual(values = col , limits= c("≥ 50%", "≥ 60%", "≥ 70%", "≥ 80%" , "≥ 90%"))+labs(x = "Binning method", y = "Number of bins", title = "CheckV results on the real wastewater dataset")+coord_flip()+theme(legend.position="bottom",legend.title=element_text(size = 11),legend.text = element_text(size = 11),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 13,face = "bold"),axis.title.y = element_text(size = 13,face = "bold"),title = element_text(size = 14,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("fig3c.eps", width = 6.35, height = 5, device = cairo_ps)########Fraction of host by different number of viruses#########df<-data.frame(group=c('infected by one virus' , 'infected by two viruses', 'infected by three viruses'),value=c(25,35,45))
df$group = as.vector(df$group)ggplot(df,aes(x="",y=value,fill=group))+geom_bar(stat="identity")+coord_polar("y",start=1) + geom_text(aes(y=c(0,cumsum(value)[-length(value)]),label=percent(value/100)),size=5)+theme_minimal()+theme(axis.title=element_blank(),axis.ticks=element_blank(),axis.text = element_blank(),legend.title = element_blank())+scale_fill_manual(values=c("darkgreen","orange","deepskyblue"))##########Supplementary material###########
########Mock cow fecal dataset#######
Rank = rep(c('F-score' , 'ARI' , 'NMI' , 'Homogeneity') , each = 4)
Pipeline = rep(c( 'CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'),times = 4)
Number = c(0.564, 0.763 , 0.936 , 0.936,0.455 ,0.719, 0.926 , 0.926,0.796 , 0.885 , 0.969 , 0.963,0.661 ,0.806, 0.940 , 1)col = c('#4FB0C6', "#4F86C6", "#527F76", '#CC9966')df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'))
df$Rank = factor(df$Rank , levels = c('F-score' , 'ARI' , 'NMI', 'Homogeneity'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'dodge')+scale_fill_manual(values = col,limits= c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'))+coord_cartesian(ylim = c(0.3,1))+labs(x = "Clustering metrics", y = "Scores", title = "The mock cow fecal dataset")+theme(legend.position="bottom",legend.title=element_blank(),legend.text = element_text(size = 12),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("sp1a.eps", width = 6, height = 5, device = cairo_ps)Rank = rep(c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'),each = 3)
Pipeline = rep(c('Moderately complete' , 'Substantially complete' , 'Near-complete'),times = 4)
Number = c(1 , 1 , 3 , 3,2,2,1, 3 , 5 , 0 ,0 , 8 )col = c("#8FBC94","#77AAAD","#6E7783")df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('Moderately complete' , 'Substantially complete' , 'Near-complete'))
df$Rank = factor(df$Rank , levels = c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'stack')+coord_cartesian(ylim = c(0 , 9))+scale_y_discrete(limits = c(0 , 3 , 6 , 9))+scale_fill_manual(values = col,limits= c('Moderately complete' , 'Substantially complete' , 'Near-complete'))+labs(x = "Binning method", y = "Number of viral bins", title = "The mock cow fecal dataset")+theme(legend.position="bottom",legend.title=element_blank(),legend.text = element_text(size = 12),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("sp1b.eps", width = 6, height = 5, device = cairo_ps)##########Supplementary material###########
########Mock wastewater fecal#######
Rank = rep(c('F-score' , 'ARI' , 'NMI' , 'Homogeneity') , each = 4)
Pipeline = rep(c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'),times = 4)
Number = c(0.667,0.657,0.858,0.903,0.602 ,0.596,0.828,0.891,0.806 ,0.843, 0.898,0.937,0.687 ,0.746, 0.816,0.881)col = c('#4FB0C6', "#4F86C6", "#527F76", '#CC9966')df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'))
df$Rank = factor(df$Rank , levels = c('F-score' , 'ARI' , 'NMI', 'Homogeneity'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'dodge')+scale_fill_manual(values = col,limits= c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'))+coord_cartesian(ylim = c(0.1,0.97))+labs(x = "Clustering metrics", y = "Scores", title = "The mock wastewater dataset")+theme(legend.position="bottom",legend.title=element_blank(),legend.text = element_text(size = 12),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("sp1c.eps", width = 6, height = 5, device = cairo_ps)Rank = rep(c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'),each = 3)
Pipeline = rep(c('Moderately complete' , 'Substantially complete' , 'Near-complete'),times = 4)
Number = c( 5 , 3 , 1 , 1,2,2,1, 3 , 1 , 1 ,3 , 12 )col = c("#8FBC94","#77AAAD","#6E7783")df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('Moderately complete' , 'Substantially complete' , 'Near-complete'))
df$Rank = factor(df$Rank , levels = c('CoCoNet' , 'vRhyme', 'bin3C' , 'ViralCC'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'stack')+scale_fill_manual(values = col,limits= c('Moderately complete' , 'Substantially complete' , 'Near-complete'))+labs(x = "Binning method", y = "Number of viral bins", title = "The mock wastewater dataset")+theme(legend.position="bottom",legend.title=element_blank(),legend.text = element_text(size = 12),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 12),axis.text.y = element_text(size = 12),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 16,face = "bold"),plot.title = element_text(hjust = 0.5))ggsave("sp1d.eps", width = 6, height = 5, device = cairo_ps)##########CheckM results#############
Rank = rep(c('MetaBAT2' , 'CoCoNet' , 'bin3C' , 'ViralCC'),each = 3)
Pipeline = rep(c('Moderately complete' , 'Substantially complete' , 'Near-complete'),times = 4)
Number = c(3 , 4 , 4 , 5 , 3 , 1 , 1, 3 , 1 , 2 ,2 , 12 )col = c("#8FBC94","#77AAAD","#6E7783")df <- data.frame(Rank = Rank, Pipeline = Pipeline, Number = Number)
df$Pipeline = factor(df$Pipeline , levels=c('Moderately complete' , 'Substantially complete' , 'Near-complete'))
df$Rank = factor(df$Rank , levels = c('MetaBAT2' , 'CoCoNet' , 'bin3C' , 'ViralCC'))ggplot(data = df, mapping = aes(x = Rank, y = Number, fill = Pipeline)) + geom_bar(stat = 'identity', position = 'stack')+scale_fill_manual(values = col,limits= c('Moderately complete' , 'Substantially complete' , 'Near-complete'))+labs(x = "Binning method", y = "Number of bins", title = "Mock wastewater dataset")+theme(legend.position="top",legend.title=element_blank(),legend.text = element_text(size = 11),panel.grid.major = element_blank(), #不显示网格线panel.grid.minor = element_blank(),axis.text.x = element_text(size = 11),axis.text.y = element_text(size = 11),axis.title.x = element_text(size = 14,face = "bold"),axis.title.y = element_text(size = 14,face = "bold"),title = element_text(size = 14,face = "bold"),plot.title = element_text(hjust = 0.5))#######Compute the length of viral contigs########
contig_info = read.csv('contig_viral_info_ww.csv' , sep = ',' , header = F)
min(contig_info[,3])
max(contig_info[,3])#######Chi-square testing############
tableR = matrix(c(72,96,264,36,38,90,21,24,49,38,42,80),nrow=3)
chisq.test(tableR,correct = F)
removesmalls.pl
## removesmalls.pl
##!/usr/bin/perl
## perl removesmalls.pl 200 contigs.fasta > contigs-l200.fasta
use strict;
use warnings;my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{local $/=">";while(<>) {chomp;next unless /\w/;s/>$//gs;my @chunk = split /\n/;my $header = shift @chunk;my $seqlen = length join "", @chunk;print ">$_" if($seqlen >= $minlen);}local $/="\n";
}
相关文章:
快速、准确地检测和分类病毒序列分析工具 ViralCC的介绍和详细使用方法, 附带应用脚本
介绍 viralcc是一个基因组病毒分析工具,可以用于快速、准确地检测和分类病毒序列。 github:dyxstat/ViralCC: ViralCC: leveraging metagenomic proximity-ligation to retrieve complete viral genomes (github.com) Instruction of reproducing resul…...
DNs服务学习笔记
DNS:域名系统(英文:Domain Name System)是一个域名系统,是万维网上作为域名和IP地址相互映射的一个分布式数据库,能够使用户更方便的访问互联网,而不用去记住能够被机器直接读取的IP数串。类似于生活中的11…...
获取线程池中任务执行数量
获取线程池中任务执行数量 通过线程池进行任务处理,有时我们需要知道线程池中任务的执行状态。通过ThreadPoolExecutor的相关API实时获取线程数量,排队任务数量,执行完成线程数量等信息。 实例 private static ExecutorService es new Thr…...
RK3566 Android 11平台上适配YT8512C 100M PHY
RK3566代码之前适配的1000M IC RTL8211F , 现在需要在之前的基础上修改PHY IC 为裕泰的YT8512C ----------------------------------------------------------------------//将1000M 的配置关掉,改为100M 配置,查看RK3566 资料关于以太网的配置即可知道如何修改 #if…...
docker 部署haproxy cpu占用特别高
在部署mysql 主主高可用时,使用haproxy进行负载,在服务部使用的情况下发现服务器cpu占比高,负载也高,因此急需解决这个问题。 1.解决前现状 1.1 部署配置文件 cat > haproxy.cfg << EOF globalmaxconn 4000nbthrea…...
Oracle导出CSV文件
利用spool spool基本格式: spool 路径文件名 select col1||,||col2||,||col3||,||col4 from tablename; spool off spool常用的设置: set colsep ; //域输出分隔符 set echo off; //显示start启动的脚本中的每个sql命令,缺…...
图像分割实战-系列教程12:deeplab系列算法概述
🍁🍁🍁图像分割实战-系列教程 总目录 有任何问题欢迎在下面留言 本篇文章的代码运行界面均在Pycharm中进行 本篇文章配套的代码资源已经上传 1、deeplab概述 图像分割中的传统做法:为了增大感受野,通常都会选择pooling…...
数据库02-07 存储
计算机存储系统: 02.磁道存储...
WPF 入门教程DispatcherTimer计时器
https://www.zhihu.com/tardis/bd/art/430630047?source_id1001 在 WinForms 中,有一个名为 Timer 的控件,它可以在给定的时间间隔内重复执行一个操作。WPF 也有这种可能性,但我们有DispatcherTimer控件,而不是不可见的控件。它几…...
【教学类-43-04】20231229 N宫格数独4.0(n=2,4,6,8) (ChatGPT AI对话大师生成 回溯算法)
作品展示: 背景需求: 幼儿表示自己适合做5宫格 第一次AI生成九宫格数独python代码 【教学类-43-03】20231229 N宫格数独3.0(n1、2、3、4、6、8、9) (ChatGPT AI对话大师生成)-CSDN博客文章浏览阅读162次&…...
WPF美化ItemsControl1:不同颜色间隔
首先我们有的是一个绑定好数据的ItemsControl <ItemsControl ItemsSource"{Binding Starts}"> </ItemsControl> 运行后呢是朴素的将数据竖着排列 如果想要数据之间有间距,可以使用数据模板,将数据放到TextBlock中显示࿰…...
查看进程对应的路径查看端口号对应的进程ubuntu 安装ssh共享WiFi设置MyBatis 使用map类型作为参数,复杂查询(导出数据)
Linux 查询当前进程所在的路径 top 命令查询相应的进程号pid ps -ef |grep 进程名 lsof -I:端口号 netstat -anp|grep 端口号 cd /proc/进程id cwd 进程运行目录 exe 执行程序的绝对路径 cmdline 程序运行时输入的命令行命令 environ 记录了进程运行时的环境变量 fd 目录下是进…...
医院信息系统集成平台—安全保障体系
隐私保护措施 隐私保护及信息安全是医院信息平台所要重点解决的问题,应从患者同意,匿名化服务,依据病种、角色等多维度授权,关键信息(字段级、记录级、文件级)加密存储等方面展开。电子病历等医疗数据进行调阅时,包括强身份认证需求、角色授权需求、责任认…...
【信息论与编码】习题-填空题
目录 填空题1.克劳夫特不等式是判断( )的充要条件。2.无失真信源编码的中心任务是编码后的信息率压缩接近到()限失真压缩中心任务是在给定的失真度条件下,信息率压缩接近到( )。3.常用的检纠错方…...
二叉树的层序遍历经典问题(算法村第六关白银挑战)
基本的层序遍历与变换 二叉树的层序遍历 102. 二叉树的层序遍历 - 力扣(LeetCode) 给你二叉树的根节点 root ,返回其节点值的 层序遍历 。 (即逐层地,从左到右访问所有节点)。 示例 1: 输入…...
信息学奥赛一本通:装箱问题
题目链接:http://ybt.ssoier.cn:8088/problem_show.php?pid1917 题目 1917:【01NOIP普及组】装箱问题 时间限制: 1000 ms 内存限制: 65536 KB 提交数: 4117 通过数: 2443 【题目描述】 有一个箱子容量为V�(正整数,…...
ReactNative 常见问题及处理办法(加固混淆)
ReactNative 常见问题及处理办法(加固混淆) 文章目录 摘要引言正文ScrollView内无法滑动RN热更新中的文件引用问题RN中获取高度的技巧RN强制横屏UI适配问题低版本RN(0.63以下)适配iOS14图片无法显示问题RN清理缓存RN navigation参…...
算法基础之合并果子
合并果子 核心思想: 贪心 Huffman树(算法): 每次将两个最小的堆合并 然后不断向上合并 #include<iostream>#include<algorithm>#include<queue> //用小根堆实现找最小堆using namespace std;int main(){int n;cin>>n;priority_queue&l…...
CSS 使用技巧
CSS 使用技巧 引入苹方字体 苹方提供了六个字重,font-family 定义如下:苹方-简 常规体font-family: PingFangSC-Regular, sans-serif;苹方-简 极细体font-family: PingFangSC-Ultralight, sans-serif;苹方-简 细体font-family: PingFangSC-Light, sans…...
typescript,eslint,prettier的引入
typescript 首先用npm安装typescript,cnpm i typescript 然后再tsc --init生成tsconfig.json配置文件,这个文件在package.json同级目录下 最后在tsconfig.json添加includes配置项,在该配置项中的目录下,所有的d.ts中的类型可以在…...
web前端javaScript笔记——(7)Math和Date方法
Math -Math和其他的对象不同,它不是一个构造函数, 它属于一个工具类不用创建对象,它里边封装了数学运算相关的属性和方法 比如 Math.PI 表示的圆周率 使用方法Math.方法(); Math.abs()可以用来计算一个数的绝对值 Math.ceil()可以对一…...
深入理解Java中资源加载的方法及Spring的ResourceLoader应用
在Java开发中,资源加载是一个基础而重要的操作。本文将深入探讨Java中两种常见的资源加载方式:ClassLoader的getResource方法和Class的getResource方法,并介绍Spring框架中的ResourceLoader的应用。 1. 资源加载的两种方式 1.1 ClassLoader…...
实时记录和查看Apache 日志
Apache 是一个开源的、广泛使用的、跨平台的 Web 服务器,保护 Apache Web 服务器平台在很大程度上取决于监控其上发生的活动和事件,监视 Apache Web 服务器的最佳方法之一是收集和分析其访问日志文件。 Apache 访问日志提供了有关用户如何与您的网站交互…...
Java实战项目五:文本冒险游戏
文章目录 一、实战概述二、知识点概览(一)条件分支与循环结构(二)面向对象设计(三)用户交互与事件处理 三、思路分析(一)系统架构设计(二)功能模块划分详解 四…...
docker_ROS的usb_cam使用与标定
目录 准备 准备标定板 新建容器 新建usb_cam话题的ROS功能包 编写代码 编译 运行功能包 标定 安装camera_calibration标定功能包 启动发布usb_cam话题的功能包 启动camera_calibration标定功能包 准备 usb相机 标定板 一个带有ROS的docker镜像。 准备标定板 图…...
记一次RabbitMQ服务器异常断电之后,服务重启异常的处理过程
转载说明:如果您喜欢这篇文章并打算转载它,请私信作者取得授权。感谢您喜爱本文,请文明转载,谢谢。 问题描述: 机房突然停电,rabbitmq的主机异常断电,集群服务全部需要重启。但是在执行service…...
rime中州韵小狼毫 help lua Translator 帮助消息翻译器
lua 是 Rime中州韵/小狼毫输入法强大的武器,掌握如何在Rime中州韵/小狼毫中使用lua,你将体验到什么叫 随心所欲。 先看效果 在 rime中州韵 输入效果一览 中的 👇 help效果 一节中, 我们看到了在Rime中州韵/小狼毫输入法中输入 h…...
C++完成使用map Update数据 二进制数据
1、在LXMysql.h和LXMysql.cpp分别定义和编写关于pin语句的代码 //获取更新数据的sql语句 where语句中用户要包含where 更新std::string GetUpdatesql(XDATA kv, std::string table, std::string where); std::string LXMysql::GetUpdatesql(XDATA kv, std::string table, std…...
ARCGIS PRO SDK 访问Geometry对象
一、Geometry常用对象 二、主要类 1、ReadOnlyPartCollection:Polyline 和 Polygon 使用的 ReadOnlySegmentCollection 部件的只读集合,属性成员: 名字描述Count获取 ICollection 中包含的元素数。TIEM获取位于指定索引处的元素。Spatial…...
数据结构之各大排序(C语言版)
我们这里话不多说,排序重要性大家都很清楚,所以我们直接开始。 我们就按照这张图来一一实现吧! 一.直接插入排序与希尔排序. 这个是我之前写过的内容了,大家可以通过链接去看看详细内容。 算法之插入排序及希尔排序(…...
学习做网站多久/百度开车关键词
今天干了件很蠢的事情~.. 想要将a电影.srt 改成跟电影.avi 同名字.于是就下了mv a电影.srt 电影.avi ... 当按下enter 时清醒了过来...发现到那avi 已经被我清掉了.. 变srt 了... :~于是就有这篇文章的由来.... 呜... :~虽然是写说档案删除的回复救援, 但事实上我没有成功救回来…...
小程序定制开发团队/汕头seo不错
一、VBS语言基础 1.运算符和表达式 (1)运算符 (2)表达式 a.数学表达式:由算术运算符连接,计算结果为数字 b.字符串表达式:由字符串连接符连接&#x…...
浏览器做单页网站项目/百度排名优化工具
安装 Linux 主机时,如果选择 最小化安装! 配置 vnc 远程桌面可以参考:Linux 配置 VNC 远程桌面 使用 vnc 等工具连接通常显示如下: 也就是无法使用图形化界面,可以通过 yum 直接安装图形化界面: Linux 6: yum groupinstall -y "X Window System" yum gr…...
wordpress底部CSS/app推广平台
据统计,全球范围内被投递的钓鱼邮件每天约达到1亿封,经常会遇到一些邮件发送方,被spammer利用于伪造各种钓鱼/诈骗邮件,如:伪造银行、保险等金融企业,支付宝、Paypal等支付商,知名网站、政府网站…...
郑州网站推广公司案例/百度收录的网站多久更新一次
如搜索框中,每改变一个数值就请求一次搜索接口,当快速的改变数值时并不需要多次请求接口,这就需要一个防抖函数: // 防抖函数 export function debounce(func, delay) { // func 函数 delay间隔时间let timerreturn function (...…...
网站开发方案及报价单/如何推广一款app
一、基本查询语句 select的基本语法格式如下: select 属性列表 from 表名和视图列表 [ where 条件表达式1 ] [ group by 属性名1 [ having 条件表达式2 ] ] [ order by 属性名2 [ asc | desc ] ] 属性列表参数表示需要查询的字段名; 表名和视图列表参数表…...