Seurat in practice: randomly sampling a fixed number of cells per group (finding subgroup DEGs after random sampling) 2022-06-01

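  • obj: the Seurat object.
  • The grouping column name; defaults to the clustering result, seurat_clusters.
  • sp.size: the sample size, i.e. the number of cells to take from each class in the grouping. For example, with 100, 100 cells are taken from cluster 1, 100 from cluster 2, and so on; if a cluster has fewer than 100 cells, all of its cells are taken.
  • diet: whether to slim the Seurat object with the DietSeurat function; defaults to true. A Seurat object that carries scale.data and similar data uses a lot of memory, so slimming it reduces memory use and speeds up the analysis.
  • sp.total: the total number of cells to take; optional. Sometimes it is unclear how many cells to take per subgroup and you only want roughly a certain total, for example 1000; sp.total covers that case. Note that sp.total only takes effect when sp.size is not set, so you supply one of the two.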
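As a rough illustration of how sp.size and sp.total interact, here is a minimal base-R sketch on a toy metadata table. It does not use Seurat: the helper name pick_cells, the toy data frame and the column name cluster are hypothetical and only stand in for a Seurat object's cell names and grouping column. When sp.size is not given, the sketch assumes the per-group size is ceiling(sp.total / number of groups), which matches the pbmc_small output in the script example (3 groups, about 50 requested, 51 cells returned).

```r
# Toy stand-in for Seurat metadata: one row per cell (hypothetical data).
set.seed(1)
meta <- data.frame(
  cell = paste0("cell", 1:80),
  cluster = rep(c("0", "1", "2"), times = c(40, 25, 15)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick up to `sp.size` cell names per group.
# If sp.size is NULL, it is derived from sp.total (assumed rule, see lead-in).
pick_cells <- function(meta, group.col, sp.size = NULL, sp.total = 1000) {
  groups <- unique(meta[[group.col]])
  if (is.null(sp.size)) {
    sp.size <- ceiling(sp.total / length(groups))
  }
  picked <- c()
  for (g in groups) {
    cells <- meta$cell[meta[[group.col]] == g]
    # Groups that are not larger than sp.size are kept whole.
    if (length(cells) > sp.size) {
      cells <- sample(cells, sp.size)
    }
    picked <- c(picked, cells)
  }
  picked
}

by_size  <- pick_cells(meta, "cluster", sp.size = 10)   # 10 cells per group
by_total <- pick_cells(meta, "cluster", sp.total = 50)  # ceiling(50 / 3) = 17 per group at most
length(by_size)
length(by_total)
```

In this toy table the third group has only 15 cells, so it is kept whole and the sp.total run returns 17 + 17 + 15 = 49 cells rather than 51.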
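# NOTE: several tokens were lost from the source listing. The grouping argument
# name (group.by), the sp.total default (1000, the example value from the text),
# the meta.data lookups and the list index [[i]] are reconstructed assumptions.
sample_seob <- function(obj, group.by="seurat_clusters", sp.size=NULL, diet="true", sp.total=1000) {
  all <- obj
  # Optionally slim the object to save memory
  if (diet=="true") {
    all <- DietSeurat(all)
  }
  # No per-group size given: derive it from the requested total
  if (is.null(sp.size)) {
    nlen <- length(unique(all@meta.data[,group.by]))
    sp.size <- ceiling(sp.total/nlen)
  }
  seob_list <- list()
  i <- 1
  for (sc in unique(all@meta.data[,group.by])) {
    cellist <- colnames(all)[which(all@meta.data[,group.by] == sc)]
    ob <- subset(all, cells=cellist)
    # Downsample only the groups that are larger than sp.size
    if (length(colnames(ob)) > sp.size) {
      ob <- subset(ob, cells=sample(colnames(ob), sp.size))
    }
    seob_list[[i]] <- ob
    i <- i+1
  }
  # Merge the per-group objects back into one
  all <- Reduce(merge, seob_list)
  return(all)
}

Script example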

> pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 2 dimensional reductions calculated: pca, tsne
> unique(pbmc_small$RNA_snn_res.1)
[1] 0 2 1
Levels: 0 1 2

With sp.size=10 and grouping by RNA_snn_res.1, 10 cells are taken from each class.
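> all <- sample_seob(pbmc_small,sp.size=10,'RNA_snn_res.1')
> all
An object of class Seurat
230 features across 30 samples within 1 assay
Active assay: RNA (230 features, 0 variable features)

With sp.total (the argument was lost from the source text; the requested total is about 50) and grouping by RNA_snn_res.1, roughly 50 cells are taken.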
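> all <- sample_seob(pbmc_small,sp.total=50,'RNA_snn_res.1')  # sp.total=50 reconstructed; the argument is missing in the source
> all
An object of class Seurat
230 features across 51 samples within 1 assay
Active assay: RNA (230 features, 0 variable features)

Hands-on analysis: finding the DEGs of each subgroup after grouped random sampling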

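Warning: When testing 15 versus all:
        The total size of the 5 globals that need to be exported for the future expression (‘FUN()’) is 2.06 GiB. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). The three largest globals are ‘data.use’ (2.06 GiB of class ‘S4’), ‘’ (2.56 MiB of class ‘list’) and ‘FUN’ (9.76 KiB of class ‘function’).

This generally means the dataset is too large: the data that has to be exported for parallel execution exceeds the configured limit, and memory is usually tight as well. The workaround found so far is to randomly subsample the cells and then find the DEGs; the results are still fairly reliable and the run is much faster. Below is an example wrapper function, subset_deg. Some of its parameters are the same as above, and the rest are the same as in FindAllMarkers.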
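# NOTE: the grouping argument name (group.by), the meta.data lookups and the
# list index [[i]] were lost from the source listing and are reconstructed assumptions.
subset_deg <- function(obj, group.by="seurat_clusters", sp.size=NULL, output="./",
                       min.pct=0.25, logfc.threshold=0.25, only.pos=F, assays="RNA", order=F) {
  all <- DietSeurat(obj)
  # Subsample each group only when sp.size is given
  if (!is.null(sp.size)) {
    seob_list <- list()
    i <- 1
    for (sc in unique(all@meta.data[,group.by])) {
      cellist <- colnames(all)[which(all@meta.data[,group.by] == sc)]
      ob <- subset(all, cells=cellist)
      if (length(colnames(ob)) > sp.size) {
        ob <- subset(ob, cells=sample(colnames(ob), sp.size))
      }
      seob_list[[i]] <- ob
      i <- i+1
    }
    all <- Reduce(merge, seob_list)
  }
  all_markers <- FindAllMarkers(all, only.pos = only.pos, min.pct = min.pct, logfc.threshold = logfc.threshold, verbose = F, assays = assays, order = order)
  # Write the marker table as tab-separated text
  write.table(all_markers, paste0(output, "deg_sample", sp.size, ".xls"), sep="\t", quote=F)
}

It can also be shortened to the following form: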
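# NOTE: the grouping argument name (group.by), the sp.total argument with its
# default (1000, the example value from the text) and the all_markers assignment
# were lost from the source listing and are reconstructed assumptions.
subset_deg <- function(obj, group.by="seurat_clusters", sp.size=NULL, output="./",
                       min.pct=0.25, logfc.threshold=0.25, only.pos=F, assays="RNA", order=F, diet="true", sp.total=1000) {
  # Reuse sample_seob for the subsampling step
  all <- sample_seob(obj, sp.size=sp.size, group.by=group.by, diet=diet, sp.total=sp.total)
  all_markers <- FindAllMarkers(all, only.pos = only.pos, min.pct = min.pct, logfc.threshold = logfc.threshold, verbose = F, assays = assays, order = order)
  write.table(all_markers, paste0(output, "deg_sample", sp.size, ".xls"), sep="\t", quote=F)
}

Upgraded version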

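# NOTE: the grouping argument name (group.by), the sp.total default (1000, the
# example value from the text) and the meta.data lookups were lost from the
# source listing and are reconstructed assumptions.
Sample_seob <- function(obj, group.by="seurat_clusters", sp.size=NULL, diet="true", sp.total=1000) {
  all <- obj
  if (diet=="true") {
    # Keep the pca and umap reductions while slimming
    all <- DietSeurat(all, dimreducs = c('pca','umap'))
  }
  if (is.null(sp.size)) {
    nlen <- length(unique(all@meta.data[,group.by]))
    sp.size <- ceiling(sp.total/nlen)
  }
  # Collect the selected cell names first, then subset the object once
  ncellist <- c()
  for (sc in unique(all@meta.data[,group.by])) {
    cellist <- colnames(all)[which(all@meta.data[,group.by] == sc)]
    if (length(cellist) > sp.size) {
      cellist <- sample(cellist, sp.size)
    }
    ncellist <- c(ncellist, cellist)
  }
  all <- subset(all, cells=ncellist)
  return(all)
}

Minimal version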

Idents(pbmc) <- "orig.ident"
# Downsample the number of cells per identity class
ob1 <- subset(x = pbmc, downsample = 100)

Summary and supplements
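One point worth adding: the functions above that call sample() pick different cells on each run, so the resulting DEG lists may differ slightly between runs. A minimal base-R sketch of fixing the random seed before sampling follows. The cell names and the seed value are arbitrary placeholders; in a real analysis, set.seed would presumably be called just before sample_seob or subset_deg.

```r
cells <- paste0("cell", 1:200)  # placeholder cell names

set.seed(42)                    # arbitrary seed value
first_draw <- sample(cells, 20)

set.seed(42)                    # same seed again, so the same cells are drawn
second_draw <- sample(cells, 20)

identical(first_draw, second_draw)
```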

