[R] comparing two strings from data

Eric Berger ericjberger at gmail.com
Fri Oct 13 06:54:20 CEST 2017


One additional comment: if you want 0 instead of NA when there is no match,
then the match statement should read:

match_list <- match( data_2$data1, data_2$data2, nomatch=0)
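
A small sketch of the difference, assuming two short character vectors in
place of data_2$data1 and data_2$data2. The names col1 and col2 and the
ordering of the IDs are made up for illustration; they are not the poster's
actual data.

```r
# Hypothetical stand-ins for data_2$data1 and data_2$data2
col1 <- c("AST2017000005534", "CTS2017000079930", "CTS2017000079931")
col2 <- c("TUR2017000001428", "CTS2017000079931", "CTS2017000071989",
          "AST2017000005534")

# Default behaviour: NA where an entry of col1 does not occur in col2
with_na <- match(col1, col2)

# nomatch = 0: the same positions, but 0 instead of NA
with_zero <- match(col1, col2, nomatch = 0)

print(with_na)
print(with_zero)
```

Which form is more convenient probably depends on what happens next. Note
that using a 0 as a subscript silently drops that element, so
col2[with_zero] is shorter than col1, whereas col2[with_na] keeps an NA in
the unmatched position.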



On Fri, Oct 13, 2017 at 7:39 AM, Eric Berger <ericjberger at gmail.com> wrote:

> Combining and completing the advice from Greg and Boris, the complete
> solution is two lines:
>
> data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
> match_list <- match( data_2$data1, data_2$data2 )
>
> The vector match_list will hold, for each entry of data_2$data1, its
> position in data_2$data2 when a match exists and NA otherwise. Its length
> will be the same as the length of data_2$data1.
>
> You should get experience in reading the help information for R functions.
> In this case, type ?match to get information about the 'match' function.
>
> HTH,
> Eric
>
>
> On Fri, Oct 13, 2017 at 12:16 AM, Boris Steipe <boris.steipe at utoronto.ca>
> wrote:
>
>> It's generally a very good idea to examine the structure of data after
>> you have read it in. str(data_2) would have shown you that read.csv() turned
>> your strings into factors, and that's why the == operator no longer does
>> what you think it does.
>>
>> Use ...
>>
>> data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
>>
>> ... to turn this off. Also, the %in% operator will achieve more directly
>> what you are trying to do. No need for loops.
>>
>> B.
>>
>>
>>
>>
>> > On Oct 12, 2017, at 4:25 PM, Yasin Gocgun <yasing053 at gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I have two columns that contain numbers along with letters (as shown below)
>> > and have different lengths. Each entry in the first column is likely to be
>> > found in the second column at most once.
>> >
>> > For each entry of the first column, if that entry is found in the second
>> > column, I would like to get the corresponding index. For instance, if the
>> > first entry of the first column is the 5th entry in the second column, I
>> > would like to keep this index 5.
>> >
>> > AST2017000005534   TUR2017000001428
>> > CTS2017000079930   CTS2017000071989
>> > CTS2017000079931   CTS2017000072015
>> >
>> > In a loop, when I use the following code to get those indices,
>> >
>> >
>> > data_2 = read.csv("excel_data.csv")
>> > column_1 = data_2$data1
>> > column_2 = data_2$data2
>> >
>> > match_list <- array(0,dim=c(310,1));  # 310 is the length of the first column
>> >
>> > for (indx in 1: 310){
>> >    for(indx2 in 1:713){ # 713 is the length of the second column
>> >        if(column_1[indx] == column_2[indx2] ){
>> >            match_list[indx,1] = indx2;
>> >            break;
>> >        }
>> >    }
>> > }
>> >
>> >
>> > R provides the following error:
>> >
>> > Error in Ops.factor(column_1[indx], column_2[indx2]) :
>> >  level sets of factors are different
>> >
>> > So can someone explain to me how I can resolve this issue?
>> >
>> > Thank you,
>> >
>> > Yasin
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
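
For anyone who wants to reproduce the factor error quoted above without the
original CSV file, here is a minimal sketch. The vectors f1 and f2 are
invented stand-ins for the two columns, built as factors to mimic what
read.csv() returned by default at the time (the default for
stringsAsFactors changed to FALSE in R 4.0.0).

```r
# Hypothetical stand-ins for the two CSV columns, stored as factors
f1 <- factor(c("AST2017000005534", "CTS2017000079930"))
f2 <- factor(c("TUR2017000001428", "AST2017000005534", "CTS2017000071989"))

# Comparing factors whose level sets differ raises an error;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(f1[1] == f2[2], error = function(e) conditionMessage(e))
print(res)

# After converting to character, == compares the strings themselves
c1 <- as.character(f1)
c2 <- as.character(f2)
print(c1[1] == c2[2])

# %in% reports whether each entry of c1 occurs in c2;
# match() reports where it occurs (NA when it does not)
present  <- c1 %in% c2
position <- match(c1, c2)
print(present)
print(position)
```

%in% is enough when only presence matters; match() is the one to use when
the index itself is needed, as in the original question.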
