[R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
myVariableNames <- names(x.val)
results <- vector("list", length(myVariableNames))  # one list slot per variable
names(results) <- myVariableNames
for (i in myVariableNames) {
  # x.val[[i]] looks up the element whose name is stored in i (x.val$i would return NULL)
  results[[i]] <- names(x.val[[i]])
}
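A minimal illustration of why `x.val$i` returns NULL while `x.val[[i]]` works. `$` does not evaluate its right-hand side, so it looks for an element literally named "i". `[[ ]]` evaluates `i` and uses the variable name stored in it. The toy `x.val` below is made-up data standing in for the real list of tables.

```r
# Hypothetical toy stand-in for x.val: a named list of tables
x.val <- list(PR14 = table(c("V", "V", "I")),
              PR15 = table(c("G", "E")))

i <- "PR14"
is.null(x.val$i)    # TRUE: $ looks for an element literally named "i"
names(x.val[[i]])   # "I" "V": [[ ]] uses the name stored in i

# Collect the distinct values of every variable into a named list
results <- lapply(names(x.val), function(nm) names(x.val[[nm]]))
names(results) <- names(x.val)
results$PR15        # "E" "G"
```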
> Hi All,
> I am having difficulties finding a substitute for the command "names(x.val$PR14)" so that I can generate the command on the fly for all variables from PR14 to PR200 (please see the previous discussion below to understand what the object x.val contains). I have tried the following:
>
> >results=character()
> >myVariableNames=names(x.val)
> >results[length(myVariableNames)]<-NA
>
> >for (i in as.vector(unlist(strsplit(str,",")),mode="list")){
> + results[i]<-names(x.val$i) # this does not work: it returns NULL (how can I convert this to x.val$"somevalue"?)
> >}
>
> Allan.
>
>
>
> Thanks so much Jim, Adaikalavan, Gabor and others for the help and suggestions.
> The solution will be a matrix containing nested matrices, so that each variable name, each variable's distinct values and the count of each distinct value can be accessed individually.
> The main matrix will contain the variable names, the first-level nested matrices will contain each variable's unique values, and each unique value will map to a one-element vector holding its count (occurrence frequency).
> This matrix can then be used to compare variable values and their frequencies with other, similar datasets.
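For the comparison step, a long table (one row per variable/value pair) may be easier to work with than nested matrices. Below is a minimal sketch, assuming both datasets share column names. `freq_long()` is a hypothetical helper, and the two data frames are made-up data. `merge(all = TRUE)` keeps values that occur in only one dataset, and their missing counts are set to 0.

```r
# freq_long() is a hypothetical helper: one row per variable/value pair
freq_long <- function(d) {
  tabs <- lapply(d, table)
  do.call(rbind, lapply(names(tabs), function(var)
    data.frame(variable = var, value = names(tabs[[var]]),
               count = as.vector(tabs[[var]]), stringsAsFactors = FALSE)))
}

# Two small made-up datasets with the same column
a <- data.frame(PR11 = c("S", "S", "T"), stringsAsFactors = FALSE)
b <- data.frame(PR11 = c("S", "T", "T", "A"), stringsAsFactors = FALSE)

# Full outer join on variable and value; counts missing on one side become 0
cmp <- merge(freq_long(a), freq_long(b), by = c("variable", "value"),
             all = TRUE, suffixes = c(".a", ".b"))
cmp[is.na(cmp)] <- 0
cmp$diff <- cmp$count.b - cmp$count.a   # change in frequency from a to b
cmp
```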
>
> Building on the input received so far, a possible way to build the matrix involves the following steps.
>
>
> 1) I read the CSV file (which contains column headers):
> >my_data=read.table("<path/to/my/data.csv>",header=TRUE,sep=",",dec=".",fill=TRUE)
>
> 2) I tabulate the values in each variable, producing an occurrence count (frequency):
> >x.val<-apply(my_data,2,table)
>
> 3) I obtain a vector of the variable names in x.val:
> >names(x.val)
>
> 4) Next, I use the names (obtained in step 3) to obtain a vector of the distinct values of a given variable (in the example below, the variable is PR14):
> >names(x.val$PR14)
>
> 5) I obtain a one-element vector holding the frequency of a value obtained in the step above (in our example, the value is "V"):
> >as.vector(x.val$PR14["V"])
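One caveat with step 2: `apply()` first converts the data frame to a matrix. If every column's table has the same length, `apply()` also simplifies the result into a matrix instead of a list, and `x.val$PR14` then stops working. `lapply()` over the columns always returns a named list of tables. The sketch below uses made-up data in which both columns have exactly two distinct values, so the simplification occurs.

```r
# Hypothetical toy data: both columns have exactly two distinct values
my_data <- data.frame(PR14 = c("V", "V", "I"), PR15 = c("G", "E", "G"),
                      stringsAsFactors = FALSE)

via_apply <- apply(my_data, 2, table)
is.list(via_apply)           # FALSE: the tables were simplified into a 2 x 2 matrix

x.val <- lapply(my_data, table)   # always a named list of tables
is.list(x.val)               # TRUE
as.vector(x.val$PR14["V"])   # 2
```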
>
> To do:
> Next, I need to put the steps above into a script (using loops) to build the matrix; steps 4 and 5 seem tricky to do programmatically.
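Steps 3 to 5 can be sketched as nested loops that index with `[[ ]]` and `[ ]`. This assumes `x.val` is a named list of tables. Instead of nested matrices, the sketch builds a named list of named integer vectors (variable, then value, then count), which is one common R structure for this. The data are three columns of the 10-row sample shown later in the thread. `freq` is a hypothetical name.

```r
# Three columns of the 10-row sample shown further down the thread
my_data <- data.frame(
  PR10 = rep("V", 10),
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  stringsAsFactors = FALSE)
x.val <- lapply(my_data, table)

freq <- list()
for (var in names(x.val)) {                  # step 3: variable names
  counts <- integer(0)
  for (val in names(x.val[[var]])) {         # step 4: distinct values
    counts[val] <- as.vector(x.val[[var]][val])   # step 5: count of that value
  }
  freq[[var]] <- counts
}
freq$PR11[["S"]]   # 7
```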
>
> Allan.
>
>
>
> Also if you want to access the individual values, you can just leave
> it as a list:
>
> > x.val <- apply(x, 2, table)
> > # access each value
> > x.val$PR14["V"]
> V
> 8
>
>
>
> On 7/25/07, Allan Kamau <kamauallan at yahoo.com> wrote:
> > A subset of the data looks as follows
> >
> > > df[1:10,14:20]
> > PR10 PR11 PR12 PR13 PR14 PR15 PR16
> > 1 V T I K V G D
> > 2 V S I K V G G
> > 3 V T I R V G G
> > 4 V S I K I G G
> > 5 V S I K V G G
> > 6 V S I R V G G
> > 7 V T I K I G G
> > 8 V S I K V E G
> > 9 V S I K V G G
> > 10 V S I K V G G
> >
> > The result I would like is as follows
> >
> > PR10 PR11 PR12 ...
> > [V:10] [S:7,T:3] [I:10]
> >
> > The result can be a matrix or a vector, but each variable name, value and frequency should be accessible so that they can be compared with another dataset later.
> > The frequency can be a count or a percentage.
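The requested layout can be produced with `table()` and `paste()` applied to each column. Below is a minimal sketch using columns PR10 to PR12 of the sample rows above. `summ` is a hypothetical name. `table()` sorts values alphabetically, so S is listed before T, matching the requested output.

```r
# Columns PR10-PR12 of the 10-row sample above
df <- data.frame(
  PR10 = rep("V", 10),
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR12 = rep("I", 10),
  stringsAsFactors = FALSE)

# One "[value:count,...]" string per column
summ <- sapply(df, function(col) {
  tb <- table(col)
  paste0("[", paste(names(tb), tb, sep = ":", collapse = ","), "]")
})
summ   # c(PR10 = "[V:10]", PR11 = "[S:7,T:3]", PR12 = "[I:10]")
```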
> >
> >
> > Allan.
> >
> >
> >
> > The names of the table should give you the "values". And if you have a
> > matrix, you just need to convert it into a vector first.
> >
> > > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
> > > m
> > [,1] [,2] [,3]
> > [1,] "A" "C" "B"
> > [2,] "B" "D" "C"
> > [3,] "C" "E" "D"
> > > tb <- table( as.vector(m) )
> > > tb
> >
> > A B C D E
> > 1 2 3 2 1
> > > paste( names(tb), ":", tb, sep="" )
> > [1] "A:1" "B:2" "C:3" "D:2" "E:1"
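The question also allows percentages. The same example can report shares with `prop.table()`, and a single count can be read back by value name with `[[ ]]`. This is a small sketch; `pct` is a hypothetical name, and rounding to one decimal place is an arbitrary choice.

```r
m  <- matrix(LETTERS[c(1:3, 3:5, 2:4)], ncol = 3)
tb <- table(as.vector(m))
pct <- round(100 * prop.table(tb), 1)    # share of each value, in percent
paste(names(tb), ":", pct, "%", sep = "")
# "A:11.1%" "B:22.2%" "C:33.3%" "D:22.2%" "E:11.1%"
tb[["C"]]                                # count for a single value: 3
```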
> >
> > If this is not what you want, then please give a simple example.
> >
> > Regards, Adai
> >
> >
> >
> > Allan Kamau wrote:
> > > Hi all,
> > > If the question below has been answered before, I
> > > apologize for the posting.
> > > I would like to get the frequencies of occurrence of
> > > all values in a given variable in a multivariate
> > > dataset. In short, for each variable (or field) I want a
> > > summary of the values it contains as value:frequency
> > > pairs; there can be many such pairs for a given
> > > variable. I would like to do the same for several such
> > > variables.
> > > I have used table() but am unable to extract the
> > > individual value and frequency values.
> > > Please advise.
> > >
> > > Allan.
> > >
>
