[R] Problem with comparing multiple data sets
David L Carlson
dcarlson at tamu.edu
Thu May 28 02:31:45 CEST 2015
cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")
David
From: Mohammad Alimohammadi [mailto:mxalimohamma at ualr.edu]
Sent: Wednesday, May 27, 2015 2:29 PM
To: David L Carlson; r-help at r-project.org
Subject: Re: [R] Problem with comparing multiple data sets
Thanks David it worked !
One more thing. I hope it's not complicated. Is it also possible to display the terms for each row next to it?
for example:
[1] #dac 2
[2] #dac 0
[3] #dac 1
...
On Wed, May 27, 2015 at 2:18 PM, David L Carlson <dcarlson at tamu.edu<mailto:dcarlson at tamu.edu>> wrote:
Save the result of the apply() function:
Out <- apply(df[ ,2:length(df)], 1, mfv)
Then there are several options:
Approximately what you asked for
data.frame(Out)
t(t(Out))
More typing but exactly what you asked for
cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")
David L. Carlson
Department of Anthropology
Texas A&M University
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org<mailto:r-help-bounces at r-project.org>] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help at r-project.org<mailto:r-help at r-project.org>
Subject: Re: [R] Problem with comparing multiple data sets
Ok. so I read about the ("modeest") package that gives the results that I
am looking for (most repeated value).
I modified the data frame a little and moved the text to the first column.
This is the data frame with all 3 possible classes for each term.
=================================
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
"#mac,#security",
"accountability,anonymous", "data security,encryption,security"
), class = "factor"), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
"class.2", "class.3"), class = "data.frame", row.names = c(NA,
-49L))
=============================================
#Then I applied the function below:
======================
library(modeest)
df<- read.csv(file="short.csv", head= TRUE, sep=",")
apply(df[ ,2:length(df)], 1, mfv)
============================
# It gives the most frequent value for each row which is what I need. The
only problem is that all the values are displayed in one single row.
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 1 1 1 1 0 0 0 0 2 1 2
It would be much better to show them in separate rows.
For example:
[1] 0
[2] 0
[3] 1
....
Any idea how to do this?
On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi <
mxalimohamma at ualr.edu<mailto:mxalimohamma at ualr.edu>> wrote:
> Hi Jim,
>
> Thank you for your advice.
>
> I'm not sure how to exactly incorporate this function though. I added a
> portion of the actual data sets. all 3 data sets have the same items (text)
> with different class values. So I need to assign the most repeated class
> (0,1,2) for each text.
>
> For example: if line1 has text "aaa". It may be assigned to class 0 in
> dat1, 2 in dat 2 and 0 in dat3. in this case the "aaa" will be assigned to
> 0 (most repeated value). So it goes for each text.
>
> I really appreciate your help.
>
> =========================================
>
> *dat1*
>
> structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
> 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
> c("#dac",
> "#mac,#security", "accountability,anonymous", "data
> security,encryption,security"
> ), class = "factor")), .Names = c("class.1", "terms"), class =
> "data.frame", row.names = c(NA,
> -49L))
>
>
> *dat2*
>
> structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
> 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
> c("#dac",
> "#mac,#security", "accountability,anonymous", "data
> security,encryption,security"
> ), class = "factor")), .Names = c("class.2", "terms"), class =
> "data.frame", row.names = c(NA,
> -49L))
>
>
> *dat3*
>
> structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
> 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
> c("#dac",
> "#mac,#security", "accountability,anonymous", "data
> security,encryption,security"
> ), class = "factor")), .Names = c("class.3", "terms"), class =
> "data.frame", row.names = c(NA,
> -49L))
>
> ===========================================================
>
>
> On Sun, May 24, 2015 at 1:15 AM, Jim Lemon <drjimlemon at gmail.com<mailto:drjimlemon at gmail.com>> wrote:
>
>> Hi Mohammad,
>> You know, I thought this would be fairly easy, but it wasn't really.
>>
>> df1<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
>> Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
>> df2<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
>> Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
>> df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
>> Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
>> dflist<-list(df1,df2,df3)
>> dflist
>>
>> # define a function that extracts the value from one field
>> # selected by a value in another field
>> extract_by_value<-function(x,field1,value1,field2) {
>> return(x[x[,field1]==value1,field2])
>> }
>>
>> # define another function that equates all of the values
>> sub_value<-function(x,field1,value1,field2,value2) {
>> x[x[,field1]==value1,field2]<-value2
>> return(x)
>> }
>>
>> conformity<-function(x,fieldname1,value1,fieldname2) {
>> # get the most frequent value in fieldname2
>> # for the desired value in fieldname1
>> most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
>> extract_by_value,fieldname1,value1,fieldname2))))))
>> # now set all the values to the most frequent
>> for(i in 1:length(x))
>> x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
>> return(x)
>> }
>>
>> conformity(dflist,"Text","text1","Class")
>>
>> Jim
>>
>> On Sat, May 23, 2015 at 11:23 PM, John Kane <jrkrideau at inbox.com<mailto:jrkrideau at inbox.com>> wrote:
>> > Hi Mohammad
>> >
>> > Welcome to the R-help list.
>> >
>> > There probably is a fairly easy way to what you want but I think we
>> probably need a bit more background information on what you are trying to
>> achieve. I know I'm not exactly clear on your decision rule(s).
>> >
>> > It would also be very useful to see some actual sample data in useable
>> R format.Have a look at these links
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> and http://adv-r.had.co.nz/Reproducibility.html for some hints on what
>> you might want to include in your question.
>> >
>> > In particular, read up about dput() in those links and/or see ?dput.
>> This is the generally preferred way to supply sample or illustrative data
>> to the R-help list. It basically creates a perfect copy of the data as it
>> exists on 'your' machine so that R-help readers see exactly what you do.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > John Kane
>> > Kingston ON Canada
>> >
>> >
>> >> -----Original Message-----
>> >> From: mxalimohamma at ualr.edu<mailto:mxalimohamma at ualr.edu>
>> >> Sent: Fri, 22 May 2015 12:37:50 -0500
>> >> To: r-help at r-project.org<mailto:r-help at r-project.org>
>> >> Subject: [R] Problem with comparing multiple data sets
>> >>
>> >> Hi everyone,
>> >>
>> >> I am very new to R and I have a task to do. I appreciate any help. I
>> have
>> >> 3
>> >> data sets. Each data set has 4 columns. For example:
>> >>
>> >> Class Comment Term Text
>> >> 0 com1 aac text1
>> >> 2 com2 aax text2
>> >> 1 com3 vvx text3
>> >>
>> >> Now I need t compare the class section between 3 data sets and assign
>> the
>> >> most available class to that text. For example if text1 is assigned to
>> >> class 0 in data set 1&2 but assigned as 2 in data set 3 then it should
>> be
>> >> assigned to class 0. If they are all the same so the class will be the
>> >> same. The ideal thing would be to keep the same format and just update
>> >> the
>> >> class. Is there any easy way to do this?
>> >>
>> >> Thanks a lot.
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > ____________________________________________________________
>> > FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!
>> >
>> > ______________________________________________
>> > R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Mohammad Alimohammadi | Graduate Assistant
> University of Arkansas at Little Rock | College of Science and Mathematics
> (CSAM)
> | mxalimohamma at ualr.edu<mailto:mxalimohamma at ualr.edu> | ualr.edu<http://ualr.edu>
>
> Public URL: http://scholar.google.com/citations?user=MsfN_i8AAAAJ
>
--
Mohammad Alimohammadi | Graduate Assistant
University of Arkansas at Little Rock | College of Science and Mathematics
(CSAM)
501.346.8007<tel:501.346.8007> | mxalimohamma at ualr.edu<mailto:mxalimohamma at ualr.edu> | ualr.edu<http://ualr.edu>
Public URL: http://scholar.google.com/citations?user=MsfN_i8AAAAJ
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Mohammad Alimohammadi | Graduate Assistant
University of Arkansas at Little Rock | College of Science and Mathematics (CSAM)
501.346.8007 | mxalimohamma at ualr.edu<mailto:mxalimohamma at ualr.edu> | ualr.edu<http://ualr.edu/>
Public URL: http://scholar.google.com/citations?user=MsfN_i8AAAAJ
[https://mailtrack.io/trace/mail/d062b2c3f56ab0e306570c96c3e24fb7c7b80685.png]
[[alternative HTML version deleted]]
More information about the R-help
mailing list