The dataset is pbmc_small, which ships with Seurat. It contains 80 cells, and grouping by RNA_snn_res.1 gives 3 types.
> pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 2 dimensional reductions calculated: pca, tsne
> unique(pbmc_small$RNA_snn_res.1)
[1] 0 2 1
Levels: 0 1 2

With sp.size=10 and grouping by RNA_snn_res.1, 10 cells are drawn from each type.
> all <- sample_seob(pbmc_small,sp.size=10,group.by='RNA_snn_res.1')
> all
An object of class Seurat
230 features across 30 samples within 1 assay
Active assay: RNA (230 features, 0 variable features)

With sp.total=50 and grouping by RNA_snn_res.1, roughly 50 cells are drawn in total.
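The total is only approximate because the target has to be split across groups. As a minimal sketch, assuming each group is sampled at the same fraction and the per-group count is rounded up, the arithmetic looks like this. The group sizes and the rounding rule are my own assumptions for illustration; I have not confirmed that sample_seob allocates cells this way.

```r
# Hypothetical group sizes (assumed for illustration; they sum to 80 cells)
group_sizes <- c("0" = 36, "1" = 25, "2" = 19)
sp_total <- 50  # requested approximate total

# Assumed rule: sample every group at the same fraction, rounding up
frac <- sp_total / sum(group_sizes)
per_group <- ceiling(group_sizes * frac)

print(per_group)       # cells to draw from each group
print(sum(per_group))  # may exceed sp_total slightly because of rounding
```

Under these assumptions the per-group counts add up to 51 rather than exactly 50, which is one way a requested total can be overshot by a cell or two.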
> all <- sample_seob(pbmc_small,sp.total=50,group.by='RNA_snn_res.1')
> all
An object of class Seurat
230 features across 51 samples within 1 assay
Active assay: RNA (230 features, 0 variable features)

Hands-on: finding DEGs for each cluster after grouped random sampling
In real analyses, I found that FindAllMarkers often stops partway through a run with the following message:
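Warning: When testing 15 versus all: The total size of the 5 globals that need to be exported for the future expression (‘FUN()’) is 2.06 GiB. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). The three largest globals are ‘data.use’ (2.06 GiB of class ‘S4’), ‘group.info’ (2.56 MiB of class ‘list’) and ‘FUN’ (9.76 KiB of class ‘function’).

This usually happens when the dataset is too large: the data exported to the parallel workers exceeds the 'future.globals.maxSize' limit named in the message, and memory runs short. The workaround I have found so far is to take a random subsample and find DEGs on that. The results are fairly reliable, and the run is much faster. Below is an example wrapper function, **subset_deg**. Some of its parameters are the same as above, and the rest match those of FindAllMarkers.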
subset_deg <- function(obj, group.by = "seurat_clusters", sp.size = NULL, output = "./",
                       min.pct = 0.25, logfc.threshold = 0.25, only.pos = FALSE,
                       assays = "RNA", order = FALSE) {
  # Slim the object down before sampling
  all <- DietSeurat(obj)
  if (!is.null(sp.size)) {
    seob_list <- list()
    i <- 1
    for (sc in unique(all@meta.data[, group.by])) {
      # Cells belonging to the current group
      cellist <- colnames(all)[which(all@meta.data[, group.by] == sc)]
      ob <- subset(all, cells = cellist)
      # Downsample only the groups that have more than sp.size cells
      if (length(colnames(ob)) > sp.size) {
        ob <- subset(ob, cells = sample(colnames(ob), sp.size))
      }
      seob_list[[i]] <- ob
      i <- i + 1
    }
    # Merge the per-group subsets back into a single object
    all <- Reduce(merge, seob_list)
  }
  # FindAllMarkers names this argument `assay` (singular)
  all_markers <- FindAllMarkers(all, only.pos = only.pos, min.pct = min.pct,
                                logfc.threshold = logfc.threshold, verbose = FALSE,
                                assay = assays, order = order)
  write.table(all_markers, paste0(output, "deg_sample", sp.size, ".xls"), sep = "\t", quote = FALSE)
}

It can also be written more compactly as follows:
subset_deg <- function(obj, group.by = "seurat_clusters", sp.size = NULL, output = "./",
                       min.pct = 0.25, logfc.threshold = 0.25, only.pos = FALSE,
                       assays = "RNA", order = FALSE, diet = "true", sp.total = 1000) {
  # Delegate the grouped random sampling to sample_seob
  all <- sample_seob(obj, sp.size = sp.size, group.by = group.by, diet = diet, sp.total = sp.total)
  # FindAllMarkers names this argument `assay` (singular)
  all_markers <- FindAllMarkers(all, only.pos = only.pos, min.pct = min.pct,
                                logfc.threshold = logfc.threshold, verbose = FALSE,
                                assay = assays, order = order)
  write.table(all_markers, paste0(output, "deg_sample", sp.size, ".xls"), sep = "\t", quote = FALSE)
}

Upgraded version
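I later noticed that the official Seurat site documents a simpler way to do this kind of grouped random sampling: once Idents is set to the grouping variable, cells can be drawn at random within each identity class.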
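Conceptually, per-class downsampling caps every class at a fixed number of cells and leaves smaller classes untouched. The base R sketch below illustrates that idea on made-up cell names and class labels (all hypothetical). It is only meant to show the logic, not how Seurat implements it internally.

```r
set.seed(42)  # make the random draw reproducible

# Hypothetical cell barcodes and identity classes (not real data)
cells <- paste0("cell_", seq_len(80))
ident <- rep(c("A", "B", "C"), times = c(50, 25, 5))

downsample <- 10  # maximum number of cells to keep per class

# Split barcodes by class, then cap each class at `downsample` cells
kept <- unlist(lapply(split(cells, ident), function(x) {
  if (length(x) > downsample) sample(x, downsample) else x
}), use.names = FALSE)

# Count the kept cells per class
print(table(ident[match(kept, cells)]))
```

Classes A and B are cut down to 10 cells each, while class C keeps all 5 of its cells because it is already below the cap.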
Idents(pbmc) <- "orig.ident"
# Downsample the number of cells per identity class
ob1 <- subset(x = pbmc, downsample = 100)

Summary and additions