RNA-seq分析流程二:DEseq2做差别组间差别表达分析

分享
源码 2024-10-8 07:05:59 27 0 来自 中国
利用DEseq2循环做多组间差别表达分析

    当有多组RNA-seq数据时,偶然须要对多个组合举行差别表达分析,比方当我有CIM0/CIM7/CIM14/CIM28四组时,我须要得到每个组合间的差别表达情况,CIM7:CIM0; CIM14:CIM0; CIM14:CIM7等。利用ANOVA的方式也可以举行多组间比力,但由于ANOVA是指定同一个CK,而且不能得到具体是哪组相对于CK有差别表达,不能精准的办理我的需求,因此选择利用DEseq2循环对差别组举行差别表达分析。
一. R脚本

    现在脚本中DEGs(差别表达基因)筛选尺度为log2FoldChange>1或log2FoldChange<-1以及pvalue<0.05, qvalue<0.05,可根据需求更改阈值。
###
library(GenomicFeatures)
library(DESeq2)
library(dplyr)
###load and set output file
file_in <- "counts.txt"
file_design <- "design.txt"
file_compare <- "compare2.txt"
file_deg_num = paste("data//","DE_",file_in,sep="")   ##number of DEGs in all compare
file_final_csv  = paste("data//","DE_",file_in, "_Final_Out.csv",sep="")
file_final_genelist = paste("data//","DEG_geneid_allcomapre.txt",sep="")


################################################################
###########get DEGs in different compare
# counts file
data_in = read.table(file_in, head=TRUE,row.names =1, check.names = FALSE)
mycompare=read.table(file_compare,head=TRUE)
mydesign=read.table(file_design, head = TRUE)
##filter counts
countData = as.data.frame(data_in)
dim(countData)
mycounts_filter <- countData[rowSums(countData) != 0,]
dim(mycounts_filter)
##check group
head(mycompare)
head(mydesign)
total_num=dim(mycompare)[1]  ##get the num of groups
tracking = 0
gene_num_out = c()
pvalue_cut = 0.01
condition_name = c()
for (index_num in c(1:total_num)){
  tracking = tracking + 1
  test_name = as.character(mycompare[index_num,1])
  group1 = as.character(mycompare[index_num, 2])
  group2 = as.character(mycompare[index_num, 3])
  sample1 = mydesign[mydesign$Group == group1,]
  sample2 = mydesign[mydesign$Group == group2,]
  allsample = rbind(sample1, sample2)
  counts_sample = as.character(allsample$counts_id)
  groupreal = as.character(allsample$Group)
  countData = mycounts_filter[, counts_sample]
  colData = as.data.frame(cbind(counts_sample, groupreal))
  names(colData) = c("sample", "condition")
  dds <- DESeqDataSetFromMatrix(countData = countData,
                                colData = colData,
                                design = ~ condition)
  dds <- DESeq(dds)
  resSFtreatment <- results(dds, cooksCutoff =FALSE, contrast=c("condition",group2,group1))
  out_test = as.data.frame(resSFtreatment)
  final_each = cbind( out_test$log2FoldChange,out_test$pvalue,out_test$padj)
  rownames(final_each) = resSFtreatment@rownames
  names = c('(logFC)','(pvalue)','(Qvalue)')
  final_name = paste(test_name,'_',names,sep="")
  colnames(final_each) = final_name
  if (tracking == 1){
    final_table = final_each
  }else{
    final_table = cbind(final_table, final_each)
  }
  gene_sel = out_test[((!is.na(out_test$pvalue))&(!is.na(out_test$log2FoldChange)))&out_test$pvalue < 0.05 & abs(out_test$log2FoldChange) >= 1 & out_test$padj<0.05, ]
  gene_sel<-na.omit(gene_sel)   ##delete rows contain NA
  gene_sel_up = gene_sel [gene_sel$log2FoldChange>0,]
  gene_sel_do = gene_sel [gene_sel$log2FoldChange<0,]
  file_out_up = paste("data//DEGsid//","UP_",test_name,".txt",sep="")
  file_out_do = paste("data//DEGsid//","DOWN_",test_name,".txt",sep="")
  gene_list_up = rownames(gene_sel_up)
  gene_list_do = rownames(gene_sel_do)
  all_ub_down = list(gene_list_up, gene_list_do)
  nameup <- paste(test_name,"_up",sep="")
  namedown <- paste(test_name,"_down",sep="")
  names(all_ub_down) = c(nameup, namedown)
  if (tracking == 1){
    final_genelist = all_ub_down
  }else{
    final_genelist = c(final_genelist, all_ub_down)
  }
  gene_num_up = length(gene_list_up)
  gene_num_do = length(gene_list_do)
  write.table(gene_sel_up, file= file_out_up , row.names = TRUE,col.names = TRUE)
  write.table(gene_sel_do, file= file_out_do , row.names = TRUE,col.names = TRUE)
  condition_name = c(condition_name, paste("UP_",test_name,sep=""),paste("DO_",test_name,sep=""))
  gene_num_out = c(gene_num_out,gene_num_up,gene_num_do)
}
final_DEGs_list<-do.call(cbind, lapply(lapply(final_genelist, unlist), `length<-`, max(lengths(final_genelist))))
final_DEGs_list
out_final2 = cbind(condition_name, gene_num_out)
colnames(out_final2) = c("Tests", "DEG number")
write.table(final_DEGs_list, file=file_final_genelist, row.names = FALSE, sep="\t", na = "") ###all DEGs list
write.table(out_final2, file=file_deg_num, row.names = FALSE, sep="\t")    ##DEG number in different compare
write.csv(final_table, file=file_final_csv, row.names = TRUE,quote=TRUE)  ##allgene in all compare
二. 输入文件

2.1.counts.txt


    DEseq2的输入counts数据需为整数;假如是RSEM结果,利用expected
counts取整。
2.2.design.txt


文件共四列,格式如下:
1.png sample_id: 样本id;
counts_id: counts文件中的样品id;
Group: 样天职组;
rep: 样本重复;
2.3.compare.txt


指定比力组
2.png Test:举行比力的组
denominator: 对照组
numerator: 比力组
注:对照组在前,比力组在后,假如做CIM7相对于CIM0的差别表达基因分析,denominatorCIM0numeratorCIM7
三.输出文件

    运行脚本前在当前目次下创建data文件夹,并在data下创建DEGsid文件夹,用于存放输出文件。脚本运行完成后会在data文件夹下输出3个文件:
3.1.DE_counts.txt_Final_Out.csv

输出总表,格式如下:
3.png 第一列为基因id,包罗全部基因,后几列为全部比力组的logFC,pvalue以及qvaue值。
3.2.DE_counts.txt

全部比力组中差别表达基因的数目。
3.3. DEG_geneid_allcomapre.txt

    全部比力组中差别表达基因的基因id,该文件用于筛选差别组间的重叠基因。“-up”体现比力组内上调的基因,“_down”体现比力组内下调的基因。
3.4.DEGsid


    DEGsid文件夹下天生一系列文件,为每个比力组的DEseq2的分析结果, “UP_”和“DOWN”分别体现上调和下调的基因。
四.找差别比力组间的重叠DEGs


得到这些输出结果后我还想找到差别组间的重叠DEGs:


6.png names(mydegs)得到差别比力组的名称,指定mycom1和mycom2后利用intersect()找到交集。
您需要登录后才可以回帖 登录 | 立即注册

Powered by CangBaoKu v1.0 小黑屋藏宝库It社区( 冀ICP备14008649号 )

GMT+8, 2024-10-18 16:47, Processed in 0.171215 second(s), 35 queries.© 2003-2025 cbk Team.

快速回复 返回顶部 返回列表