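I ran into all sorts of pitfalls while learning to download GEO microarray data. My notes follow.
Following my instructor's walkthrough, I first tried downloading with GEOquery: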
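library('GEOquery')
library(dplyr)
library(tidyverse)
gset <- getGEO(GEO='GSE87211', destdir=".", getGPL = F)
### destdir is the directory where the files are stored; getGPL = F skips downloading the platform annotation file
This failed. The download crawled and then errored with: Timeout of 60 seconds was reached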
Found 3 file(s)
GSE12417-GPL570_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE12nnn/GSE12417/matrix/GSE12417-GPL570_series_matrix.txt.gz'
Content type 'application/x-gzip' length 23572020 bytes (22.5 MB)
========================
> options(timeout=60)
> gset <- getGEO(GEO='GSE87211', destdir=".",getGPL = F)
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
downloaded 688 KB
Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
  download from 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz' failed
In addition: Warning messages:
1: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
  downloaded length 704512 != reported length 35235899
2: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
  URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz': Timeout of 60 seconds was reached
Fixing "Timeout of 60 seconds was reached" (the timeout on my RStudio Server was originally set to only 60 s):
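# Check the current timeout
> getOption('timeout')
[1] 60
# Set a new timeout
> options(timeout=100000)
## Confirm the change
> getOption('timeout')
[1] 1e+05
I then ran GEOquery's getGEO again. This time the code ran without the timeout error, but for some reason the download was still extremely slow.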
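The timeout change above stays in effect for the rest of the R session. As a minimal sketch, the same idea can be wrapped so that the old value is restored afterwards. The helper name with_timeout is hypothetical (it is not part of GEOquery or base R), and 600 seconds is just an illustrative value:

```r
# Hypothetical helper (not from GEOquery): evaluate `expr` with a larger
# download timeout, then put the previous timeout back.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)  # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore even if `expr` throws an error
  force(expr)                        # `expr` is lazy, so it runs only now
}

# Quick check without touching the network: the timeout seen inside the call
timeout_inside <- with_timeout(600, getOption("timeout"))
timeout_inside

# Intended use (commented out because it downloads about 34 MB):
# gset <- with_timeout(600, GEOquery::getGEO("GSE87211", destdir = ".", getGPL = FALSE))
```

Because R evaluates arguments lazily, the download only starts after the new timeout has been set, and on.exit restores the previous value whether or not the download succeeds.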
Someone suggested the following fix:
options( 'download.file.method.GEOquery' = 'libcurl' )
## libcurl is a free URL transfer library
This helped only slightly; the download was still painfully slow.
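Before switching the download method, it may be worth confirming that the R build actually has libcurl support. The sketch below uses base R's capabilities() for that check; the function name use_libcurl_for_geoquery is hypothetical and only wraps the option shown above:

```r
# Hypothetical wrapper: set the GEOquery download method to libcurl only
# when this R build reports libcurl support.
use_libcurl_for_geoquery <- function() {
  has_libcurl <- capabilities("libcurl")[[1]]  # TRUE/FALSE for this R build
  if (has_libcurl) {
    options(download.file.method.GEOquery = "libcurl")
  }
  has_libcurl
}

libcurl_enabled <- use_libcurl_for_geoquery()
getOption("download.file.method.GEOquery")
```

This only changes how the file is fetched. It does nothing about a slow route to the NCBI server, which fits the small improvement noted above.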
I turned to Baidu for help and tried geoChina, a function provided by the AnnoProbe package. Install AnnoProbe first.
> install.packages('AnnoProbe')
> library(AnnoProbe)
# Update the mirror package
> devtools::install_git("https://gitee.com/jmzeng/GEOmirror")
# Download the GEO data from a mirror in China
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
# Of the mirror options, only tencent downloaded successfully for me.
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
==
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
trying URL 'http://49.235.27.111/GEOmirror/GSE87nnn/GSE87211_eSet.Rdata'
Content type 'application/octet-stream' length 31922908 bytes (30.4 MB)
==================================================
downloaded 30.4 MB
file downloaded in .
you can also use getGEO from GEOquery, by
getGEO('GSE87211', destdir=".", AnnotGPL = F, getGPL = F)
On comparison, the data are no different from what getGEO downloads.
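The post does not say how the comparison was done. One possible way, shown here only as a sketch, is to compare the two expression matrices directly. The helper same_expression and the toy matrices are hypothetical. The commented call at the end assumes that both downloads return a list whose first element is an ExpressionSet, and that gset_geo and gset_china are the names given to the two results:

```r
# Hypothetical helper: TRUE when two expression matrices have the same
# shape, the same probe/sample names, and (numerically) the same values.
same_expression <- function(m1, m2) {
  if (!identical(dim(m1), dim(m2))) return(FALSE)
  if (!identical(dimnames(m1), dimnames(m2))) return(FALSE)
  isTRUE(all.equal(m1, m2))  # all.equal tolerates tiny floating-point noise
}

# Toy data standing in for real expression matrices
a <- matrix(c(1.5, 2, 3, 4), nrow = 2,
            dimnames = list(c("probe1", "probe2"), c("s1", "s2")))
b <- a
unchanged <- same_expression(a, b)  # compares two identical copies
b[1, 1] <- 9
changed <- same_expression(a, b)    # one value now differs

# With the real objects (assumed structure, needs Biobase):
# same_expression(Biobase::exprs(gset_geo[[1]]), Biobase::exprs(gset_china[[1]]))
```

Checking dimensions and row/column names first gives a quick answer when the two downloads differ in which probes or samples they contain, before any values are compared.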