[R] negative vector length when merging data frames

Michael Dewey lists at dewey.myzen.co.uk
Thu Oct 24 12:42:28 CEST 2019


Dear Ana

Since this appears to be genetics data, have you thought of looking at 
Bioconductor for help? I do not use genetic data sets, but people there 
must use big files every day, three times before breakfast.

Michael

On 24/10/2019 00:33, Ana Marija wrote:
> Thanks, but I would need a solution in R.
> 
> On Wed, Oct 23, 2019 at 6:31 PM Jim Lemon <drjimlemon using gmail.com> wrote:
>>
>> I don't have it installed - that was merely a suggestion. I notice
>> that both data.table and dplyr packages are mentioned as possibilities
>> for "merge big datasets in r". Apparently the best way to do it if you
>> have a database manager is to read the two datasets into tables and do
>> the join via SQL or whatever language is available.
>>
>> Jim
>>
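A minimal sketch of the data.table route mentioned above, on made-up toy rows (the values are illustrative, not taken from the real files). The error message in the original post names merge.data.frame, which suggests l4 was a plain data frame; converting it with setDT() should make merge() dispatch to data.table's own method instead. Whether that is enough for tables of this size is an assumption that would need testing.

```r
library(data.table)

# Hypothetical toy tables using the same key column names as in the thread.
l4 <- data.frame(X1 = c("chr1", "chr1", "chr1"),
                 X2 = c("13550", "14671", "14677"),
                 pval_nominal = c(0.375614, 0.474708, 0.699887),
                 stringsAsFactors = FALSE)
asign <- data.table(chr = c("chr1", "chr1"),
                    pos = c("13550", "14677"),
                    p.val.Retina = c("0.38", "0.70"))

# Convert l4 in place so that merge() uses merge.data.table,
# not merge.data.frame.
setDT(l4)

# Same call as in the original post; the key columns must have the same
# type in both tables (here both are character).
m <- merge(l4, asign, by.x = c("X1", "X2"), by.y = c("chr", "pos"))
print(m)
```

If the keys are heavily duplicated on both sides, data.table is documented to stop and ask for allow.cartesian = TRUE, which at least points at a many-to-many join.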
>> On Thu, Oct 24, 2019 at 10:17 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
>>>
>>> No. Can you please send me an example of what the command would look like in my case?
>>>
>>> On Wed, Oct 23, 2019 at 6:16 PM Jim Lemon <drjimlemon using gmail.com> wrote:
>>>>
>>>> Yes. Have you tried the bigmemory package?
>>>>
>>>> Jim
>>>>
>>>> On Thu, Oct 24, 2019 at 10:08 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
>>>>>
>>>>> Hi Jim,
>>>>>
>>>>> I think one of the issues is that the data frames are so big:
>>>>> > dim(l4)
>>>>> [1] 166941635         8
>>>>> > dim(asign)
>>>>> [1] 107371528         5
>>>>>
>>>>> So my example would not reproduce the error.
>>>>>
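One possible explanation, offered as an assumption rather than a diagnosis: "negative length vectors are not allowed" often shows up when a result would need more than .Machine$integer.max (2147483647) rows, which can happen in a many-to-many merge when the same key occurs many times in both tables. The sketch below counts, on hypothetical toy data, how many rows a merge on two key columns would produce before actually running it.

```r
# Hypothetical toy tables; only the key columns matter here.
x <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                pos = c("100", "100", "200", "300"),
                stringsAsFactors = FALSE)
y <- data.frame(chr = c("chr1", "chr1", "chr2"),
                pos = c("100", "100", "300"),
                stringsAsFactors = FALSE)

# Count how often each (chr, pos) key occurs in each table.
n_x <- table(paste(x$chr, x$pos, sep = ":"))
n_y <- table(paste(y$chr, y$pos, sep = ":"))

# A key seen a times in x and b times in y contributes a * b rows.
# Doubles are used so that the count itself cannot overflow.
shared <- intersect(names(n_x), names(n_y))
expected_rows <- sum(as.numeric(n_x[shared]) * as.numeric(n_y[shared]))

print(expected_rows)                         # rows an inner merge would produce
print(expected_rows > .Machine$integer.max)  # TRUE would suggest the merge cannot succeed as is
```

On the real tables the same count could be run on the X1/X2 and chr/pos columns; a very large number would suggest adding another key column (for example the alleles or the gene) or removing duplicates before merging.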
>>>>> On Wed, Oct 23, 2019 at 6:05 PM Jim Lemon <drjimlemon using gmail.com> wrote:
>>>>>>
>>>>>> Hi Ana,
>>>>>> When I run this example taken from your email:
>>>>>>
>>>>>> l4<-read.table(text="X1 X2 X3 X4  X5 variant_id pval_nominal gene_id.LCL
>>>>>> chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
>>>>>> chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
>>>>>> chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
>>>>>> chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
>>>>>> chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
>>>>>> chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232",
>>>>>> header=TRUE,stringsAsFactors=FALSE)
>>>>>> asign<-read.table(text="gene  chr  chr_pos   pos p.val.Retina
>>>>>> ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
>>>>>> ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
>>>>>> ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
>>>>>> ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
>>>>>> ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
>>>>>> ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572",
>>>>>> header=TRUE,stringsAsFactors=FALSE)
>>>>>> merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
>>>>>>  [1] X1           X2           X3           X4           X5
>>>>>>  [6] variant_id   pval_nominal gene_id.LCL  gene         chr_pos
>>>>>> [11] p.val.Retina
>>>>>> <0 rows> (or 0-length row.names)
>>>>>>
>>>>>> It works okay, but there are no matches in the join. So I can't even
>>>>>> guess what the problem is.
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>> On Thu, Oct 24, 2019 at 9:33 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have two data frames like this:
>>>>>>>
>>>>>>> > head(l4)
>>>>>>>      X1    X2 X3 X4  X5  variant_id pval_nominal     gene_id.LCL
>>>>>>> 1 chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
>>>>>>> 2 chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
>>>>>>> 3 chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
>>>>>>> 4 chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
>>>>>>> 5 chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
>>>>>>> 6 chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232
>>>>>>> > head(asign)
>>>>>>>                gene  chr                chr_pos   pos p.val.Retina
>>>>>>> 1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
>>>>>>> 2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
>>>>>>> 3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
>>>>>>> 4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
>>>>>>> 5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
>>>>>>> 6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
>>>>>>> > m = merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
>>>>>>> Error in merge.data.frame(l4, asign, by.x = c("X1", "X2"), by.y = c("chr",  :
>>>>>>>    negative length vectors are not allowed
>>>>>>> > sapply(l4,class)
>>>>>>>            X1           X2           X3           X4           X5   variant_id
>>>>>>>   "character"  "character"  "character"  "character"  "character"  "character"
>>>>>>> pval_nominal  gene_id.LCL
>>>>>>>     "numeric"  "character"
>>>>>>> > sapply(asign,class)
>>>>>>>          gene          chr      chr_pos          pos p.val.Retina
>>>>>>>   "character"  "character"  "character"  "character"  "character"
>>>>>>>
>>>>>>> Please advise as to why I am getting this error when merging?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ana
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Michael
http://www.dewey.myzen.co.uk/home.html