[R] Merging two columns of unequal length

Bailey Hewitt bailster at hotmail.com
Wed Dec 14 16:13:49 CET 2016


Sorry for the delay! Thank you very much! I think I am getting a better understanding of my options from what you have said. Thanks again for the quick replies and the information, I really appreciate it!


Bailey


________________________________
From: R-help <r-help-bounces at r-project.org> on behalf of Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Sent: December 13, 2016 9:23 AM
To: William Michels; William Michels via R-help; r-help at R-project.org
Subject: Re: [R] Merging two columns of unequal length

I frequently work with mismatched-length data, but I think I would rarely want this behaviour because there is no compelling reason to believe that all of the NA values should wind up at the end of the data as you suggest. Normally there is a second column that controls where things should line up, and the merge function handles that reliably. If merge is not appropriate then I usually regard that as a warning that those data should perhaps be rbinded or stacked rather than cbinded.
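
As a minimal sketch of the keyed merge described above (the data frames lake_a and lake_b and their values are hypothetical, not taken from the original poster's file), two series of different lengths can be aligned on a shared Year column so that each NA lands in the year it belongs to rather than at the end:

```r
# Hypothetical data: two lakes observed in different, partly overlapping years.
lake_a <- data.frame(Year = 1950:1954, LakeA = c(120, 118, 121, 119, 117))
lake_b <- data.frame(Year = c(1951L, 1953L), LakeB = c(130, 128))

# all = TRUE keeps every Year found in either input (a full outer join);
# NA fills the cells where a lake has no observation for that year.
merged <- merge(lake_a, lake_b, by = "Year", all = TRUE)
print(merged)
```

Here LakeB is NA for 1950, 1952 and 1954, which is usually more informative than padding the shorter column at the bottom.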

I think Hadley Wickham's paper on tidy data [1] describes this philosophy well.
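
A rough sketch of the stacked ("long") alternative for the residuals in the original question, assuming the unequal lengths come from NA values in the lake columns (the small mydata below is a hypothetical stand-in, and the column names Lake and Residual are my own choice):

```r
# Hypothetical stand-in for 'mydata': Year plus one column per lake,
# with missing observations recorded as NA.
mydata <- data.frame(
  Year  = 1950:1955,
  LakeA = c(120, 118, NA, 119, 117, 116),
  LakeB = c(NA, 130, 129, NA, 128, 126)
)

lakes <- setdiff(names(mydata), "Year")

# Fit one regression per lake and keep each residual next to its Year.
res_list <- lapply(lakes, function(lake) {
  d <- mydata[!is.na(mydata[[lake]]), c("Year", lake)]
  fit <- lm(d[[lake]] ~ d$Year)
  data.frame(Lake = lake, Year = d$Year, Residual = unname(residuals(fit)))
})

# Stack the per-lake pieces; unequal lengths are no problem in long form.
res_long <- do.call(rbind, res_list)
print(res_long)
```

If a wide table is still wanted, fitting with lm(..., na.action = na.exclude) makes residuals() return a vector padded with NA to the full length of the input, so the columns line up by row.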

[1] https://www.jstatsoft.org/article/view/v059i10



--
Sent from my phone. Please excuse my brevity.

On December 13, 2016 2:15:15 AM PST, William Michels via R-help <r-help at r-project.org> wrote:
>You should review "The Recycling Rule in R" before attempting to
>perform functions on 2 or more vectors of unequal lengths:
>
>https://cran.r-project.org/doc/manuals/R-intro.html#The-recycling-rule
>
>Most often, the "Recycling Rule" does exactly what the researcher
>intends (automatically). And in many cases, performing functions on
>data of unequal (or not evenly divisible) lengths is either 1) an
>indication of problems with the input data, or 2) an indication that
>the researcher is unnecessarily 'forcing' data into a rectangular data
>structure, when another approach might be better (e.g. the use of the
>tapply function).
>
>However, if you see no other way, the functions "cbind.na" and/or
>"rbind.na" available from Andrej-Nikolai Spiess perform binding of
>vectors without recycling:
>
>http://www.dr-spiess.de/Rscripts.html
>
>All you have to do is download and source the correct R-script, and
>call the function:
>
>> cbind(1:5, 1:2)
>     [,1] [,2]
>[1,]    1    1
>[2,]    2    2
>[3,]    3    1
>[4,]    4    2
>[5,]    5    1
>
>Warning message:
>In cbind(1:5, 1:2) :
>  number of rows of result is not a multiple of vector length (arg 2)
>
>> source("/Users/myhomedirectory/Downloads/cbind.na.R")
>> cbind.na(1:5, 1:2)
>     [,1] [,2]
>[1,]    1    1
>[2,]    2    2
>[3,]    3   NA
>[4,]    4   NA
>[5,]    5   NA
>>
>
>This issue arises so often, Dr. Spiess's two scripts "rbind.na" and
>"cbind.na" have my vote for inclusion into the base-R distribution.
>
>Best of luck,
>
>W Michels, Ph.D.
>
>
>On Mon, Dec 12, 2016 at 3:41 PM, Bailey Hewitt <bailster at hotmail.com>
>wrote:
>>
>> Dear R Help,
>>
>>
>> I am trying to put together two columns of unequal length in a data
>> frame. Unfortunately, the functions I have tried so far (such as cbind)
>> have not worked. The code I am currently using is below (I have marked
>> the line that is not working with a comment):
>>
>>
>> y <- mydata[, 2:75]
>> year <- mydata$Year
>> res <- data.frame()
>>
>> for (i in 1:74) {
>>   y.val <- y[, i]
>>   lake.lm <- lm(y.val ~ year)
>>   lake.res <- residuals(lake.lm)
>>   new.res <- data.frame(lake.res = lake.res)
>>   colnames(new.res) <- colnames(y)[i]
>>   # cbind doesn't work because of the unequal lengths of my data columns
>>   res <- cbind(res, new.res)
>>   print(res)
>> }
>>
>>
>> mydata is a csv file with "Year" from 1950 on as my first column, and
>> each subsequent column has a lake name and a day of year (single
>> number) in each row.
>>
>>
>> Please let me know if there is any more information I can provide as
[[elided Hotmail spam]]
>>
>>
>> Bailey Hewitt
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help

>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>