[BioC] merge error: 'by.x' and 'by.y' specify different numbers of columns

James W. MacDonald jmacdon at uw.edu
Fri Jun 27 15:45:10 CEST 2014


Hi Nhu Quynh Tran,

You are having problems with merge(), which is a base R function, so 
this question is better asked on R-help.

On 6/26/2014 6:35 PM, Tran, Nhu Quynh T wrote:
> Hi bioconductor group,
>
> I'm working on a ChIP-seq dataset and trying to annotate my peaks.  After getting the gene symbols using biomaRt, I want to merge them back into my BED file, but merge() gives me the error below.  Any help is appreciated.  Thanks. QT
>
>
>              bed_file_orig <- read.delim(file_name, header=FALSE, skip=1)
>              bed_file <- bed_file_orig[!(stri_sub(bed_file_orig$V1, 1, 2)%in%c("HG", "MT", "GL")),]
>              peakList <- BED2RangedData(bed_file)
>              annotatedPeak = annotatePeakInBatch(peakList, AnnotationData=hs_annotation_tss)
>
>              #add gene ids to the peak: using addGeneIDs gives error if the database does not contain the feature.  So use biomart
>              #annotatedPeak_tss <- addGeneIDs(annotatedPeak_tss,"org.Hs.eg.db",c("symbol", "genename"))
>              feature_ids <- unique(annotatedPeak$feature)
>              feature_ids<-feature_ids[!is.na(feature_ids)]
>              feature_ids<-feature_ids[feature_ids!=""]
>              IDs2Add<-getBM(attributes=c("ensembl_gene_id","external_gene_id"),filters = "ensembl_gene_id", values = feature_ids, mart=mart)
>
>              out_file_name <- paste("../data/processed/",patient,"_",TF,"_",cond_out, ".csv", sep="")
>              write.csv(annotatedPeak, file=out_file_name)
>              annotatedPeak <- read.csv(out_file_name)
>              annotatedPeak_reorder <- annotatedPeak[,c(9, 1:8, 10:15)]
>              annotatedPeak_tss <- merge(annotatedPeak_reorder, IDs2Add, by.x="feature", b.y="ensembl_gene_id")
>

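Is that the actual code you used? If so, note that you used the argument 
'b.y', not 'by.y'. merge() does not recognize 'b.y', so 'by.y' is left at 
its default (the column names the two data frames share) and no longer 
matches the single column given in 'by.x'. That is likely to be the 
reason for the error.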
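
Here is a minimal sketch of what is probably going on. The data frames 
'peaks' and 'ids' are made-up stand-ins for annotatedPeak_reorder and 
IDs2Add (their contents are assumptions, not your data); the only thing 
they share with your case is that they have no column names in common. 
The misspelled call should reproduce the message you saw, and the 
corrected call should merge as expected.

```r
# Hypothetical stand-ins for annotatedPeak_reorder and IDs2Add
peaks <- data.frame(feature = c("ENSG01", "ENSG02", "ENSG03"),
                    start = c(100L, 200L, 300L),
                    stringsAsFactors = FALSE)
ids <- data.frame(ensembl_gene_id = c("ENSG01", "ENSG02"),
                  external_gene_id = c("GENE_A", "GENE_B"),
                  stringsAsFactors = FALSE)

# Misspelled argument: 'b.y' is absorbed by '...', so by.y keeps its
# default, intersect(names(x), names(y)), which is empty here.
# by.x names one column and by.y names none, hence the error.
res <- tryCatch(
    merge(peaks, ids, by.x = "feature", b.y = "ensembl_gene_id"),
    error = function(e) conditionMessage(e))
print(res)

# Correct spelling: one key column on each side
merged <- merge(peaks, ids, by.x = "feature", by.y = "ensembl_gene_id")
print(merged)
```

Note that merge() drops rows without a match by default, so the third 
made-up peak is not in 'merged'. Adding all.x = TRUE would keep it, with 
NA for the gene name.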

Best,

Jim


>
> Error in merge.data.frame(annotatedPeak_reorder, IDs2Add, by.x = "feature",  :
>    'by.x' and 'by.y' specify different numbers of columns
> _______________________________
> Nhu Quynh T. Tran, Ph.D.
> Assistant Professor of Preventive Medicine
> University of Tennessee Health Science Center
> 66 N. Pauline, Suite 633
> Memphis, TN 38105
> Phone: 901-448-1361
> Fax: 901-448-7041
> Email: qtran1 at uthsc.edu<mailto:qtran1 at uthsc.edu>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
