[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's

Sarah Goslee sarah.goslee at gmail.com
Wed Apr 22 15:37:35 CEST 2009


Hi,

How about this:

> SNP5 <- merge(SNP4, SNP1[,2:3], all.x=TRUE)
> SNP5
  Marker    Animal        Y x
1  P1001 194073197 0.021088 2
2  P1002 194073197 0.021088 1
3  P1004 194073197 0.021088 2
4  P1005 194073197 0.021088 0
5  P1006 194073197 0.021088 2
6  P1007 194073197 0.021088 0

This merges on Marker only and ignores Animal, which may or may not be
what you want - it wasn't clear from your question.
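
If Animal should be matched as well, a minimal sketch of merging on both
keys might look like the following. The two data frames here are toy
stand-ins with invented values, not the real SNP4 and SNP1.

```r
# Toy stand-ins for SNP4 and SNP1; the values are invented for illustration.
SNP4 <- data.frame(
  Animal = c(101, 101, 102, 102),
  Marker = c("P1001", "P1002", "P1001", "P1002"),
  Y      = c(0.021, 0.021, 0.035, 0.035)
)
SNP1 <- data.frame(
  Animal = c(101, 101, 102),
  Marker = c("P1001", "P1002", "P1001"),
  x      = c(2, 1, 0)
)

# Match on both keys. all.x = TRUE keeps every row of SNP4 and fills x
# with NA where SNP1 has no matching Animal/Marker pair.
SNP5 <- merge(SNP4, SNP1, by = c("Animal", "Marker"), all.x = TRUE)
print(SNP5)
```

With both keys in `by`, each SNP4 row matches at most one SNP1 row as long
as each Animal/Marker pair occurs only once in SNP1.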

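But your error is due to memory limitations. That could come from
specifying the wrong merge (for example, merging on Marker alone pairs
every SNP4 row with every SNP1 row that shares that Marker, so the result
can be far larger than either input), or from having files larger than
your computer can handle. This is a good job for a proper database.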
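
One way to check whether the merge itself is the problem is to estimate
the size of the result before running it. The sketch below uses toy data
with invented values and counts only the matched rows; with all = TRUE,
unmatched rows from either side would be added on top.

```r
# Toy data: 3 animals x 2 markers in each data frame (invented values).
SNP4 <- data.frame(
  Animal = rep(c(101, 102, 103), each = 2),
  Marker = rep(c("P1001", "P1002"), times = 3),
  Y      = rep(c(0.021, 0.035, 0.050), each = 2)
)
SNP1 <- data.frame(
  Animal = rep(c(101, 102, 103), each = 2),
  Marker = rep(c("P1001", "P1002"), times = 3),
  x      = c(2, 1, 0, 2, 1, 0)
)

# Rows per Marker in each data frame.
n4 <- table(SNP4$Marker)
n1 <- table(SNP1$Marker)

# For each shared Marker, merge() emits (rows in SNP4) * (rows in SNP1) rows.
common <- intersect(names(n4), names(n1))
expected_rows <- sum(as.numeric(n4[common]) * as.numeric(n1[common]))

expected_rows                            # estimate, computed without merging
nrow(merge(SNP4, SNP1, by = "Marker"))   # actual size on the toy data
```

If the estimate is much larger than either input, the key is probably not
unique and the merge needs more columns in `by`.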

>> SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
> Error in fix.by(by.y, y) : 'by' must specify valid column(s)

If you pass just SNP1$x, you are passing a bare vector, so there is no
Marker column to merge on. You need to include at least two columns:
Marker and x.
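
As a small illustration of the difference, assuming toy data with
invented values in which SNP1 is missing one marker:

```r
# Toy data with invented values; SNP1 lacks marker P1002.
SNP4 <- data.frame(
  Animal = c(101, 101, 101),
  Marker = c("P1001", "P1002", "P1004"),
  Y      = c(0.021, 0.021, 0.021)
)
SNP1 <- data.frame(
  Animal = c(101, 101),
  Marker = c("P1001", "P1004"),
  x      = c(2, 2)
)

is.data.frame(SNP1$x)            # a bare vector, nothing to key on
y <- SNP1[, c("Marker", "x")]    # a data frame that still has Marker
is.data.frame(y)

# Keep every SNP4 row; P1002 has no match in SNP1, so its x is NA.
SNP5 <- merge(SNP4, y, by = "Marker", all.x = TRUE)
print(SNP5)
```

Selecting the columns by name rather than by position (2:3) keeps the call
working if the column order of SNP1 changes.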

On Wed, Apr 22, 2009 at 3:30 AM, joe1985 <johannes at dsr.life.ku.dk> wrote:
>
> Hello
>
> I have two data frames, SNP4 and SNP1:
>
>> head(SNP4)
>         Animal Marker        Y
> 3213 194073197  P1001 0.021088
> 1295 194073197  P1002 0.021088
> 915  194073197  P1004 0.021088
> 2833 194073197  P1005 0.021088
> 1487 194073197  P1006 0.021088
> 1885 194073197  P1007 0.021088
>
>> head(SNP1)
>         Animal Marker x
> 3213 194073197  P1001 2
> 1295 194073197  P1002 1
> 915  194073197  P1004 2
> 2833 194073197  P1005 0
> 1487 194073197  P1006 2
> 1885 194073197  P1007 0
>
> I want these two data frames merged by 'Marker', but when I try
>
>> SNP5 <- merge(SNP4, SNP1, by = 'Marker', all = TRUE)
> Error: cannot allocate vector of size 2.4 Gb
> In addition: Warning messages:
> 1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
> 2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
> 3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
> 4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
>
> An error occurs.
>
> What I want is the column SNP1$x merged together with SNP4 by Marker, so
> some markers will have NA's in the 'x'-column in the SNP5 dataset.
>
> I also tried this
>
>> SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
> Error in fix.by(by.y, y) : 'by' must specify valid column(s)
>
> It won't work either.
>
> Does anyone have any idea how to solve this?
>
> Regards,
>
> Johannes.
>
>
>
>



-- 
Sarah Goslee
http://www.functionaldiversity.org



