[R] closeness of codes

Jim Lemon jim at bitwrit.com.au
Tue Sep 20 11:15:46 CEST 2011


On 09/19/2011 04:46 PM, Henri-Paul Indiogine wrote:
> Greetings!
>
> I am using the R library RQDA to assign certain codes to paragraphs of
> documents in a collection.   Several paragraphs are assigned more than
> 1 code.  E.g. often the codes "poverty" and "education" will be
> assigned to the same paragraph.   Often also "math" and "career" will
> be given to the same paragraphs.  Other codes are never given to the
> same paragraphs.
>
> I would like to calculate the relationship or "closeness" of certain
> codes.  RQDA will generate a cross-codes table.  It has the form of an
> upper triangular matrix where the upper triangle has the number of
> cross occurrences of 2 codes at their intersection.  The lower
> triangle is filled with NA.  The diagonal simply has the number of
> occurrences of the codes by themselves.
>
> The row names are the names of the codes and the column names are the
> IDs of the codes.  E.g.
>
>        1  2  3  4
> code1  3  0  2  1
> code2 NA  4  1  0
> code3 NA NA  2  0
> code4 NA NA NA  3
>
> We can see that code1 is associated 2 out of 3 times with code3.
> Code2 is present 1 out of 4 times with code3.  Code2 is never assigned
> to the same paragraph as code1 or code4, and so on.
>
> I am trying to understand how to create some sort of graph or diagram
> to represent this.  Should I use a cluster diagram or a network graph?
> Also, what sort of R code could I use?

Hi Henri,
The intersectDiagram function in the plotrix package displays the 
intersections of sets as rectangles with widths (and areas) proportional 
to the number of members of each set intersection. This may be a way for 
you to represent your codes. For your example, you could proceed as 
follows. Create a file ("hp.csv") containing the following, with one line 
per paragraph-code assignment:

paragraph,attribute
p1,code1
p1,code3
p2,code1
p2,code3
p3,code1
p3,code4
p4,code2
p5,code2
p6,code2
p7,code2
p7,code3
p8,code3
p9,code3
p10,code4
p11,code4
p12,code4
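If you would rather not type the file by hand, here is a minimal sketch (an addition, not part of the original reply) that writes the same paragraph/code pairs to hp.csv from R. The paste0() shorthand for building the labels is just a compact way to reproduce the listing above.

```r
# One row per (paragraph, code) assignment, matching the hp.csv listing above
hp <- data.frame(
  paragraph = paste0("p", c(1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12)),
  attribute = paste0("code", c(1, 3, 1, 3, 1, 4, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4))
)
# row.names = FALSE and quote = FALSE give exactly the two unquoted columns shown above
write.csv(hp, "hp.csv", row.names = FALSE, quote = FALSE)
```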

then:

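library(plotrix)
# read the paragraph,attribute pairs; each row assigns one code to one paragraph
hp <- read.csv("hp.csv")
# draw the combinations of codes as rectangles sized by the number of paragraphs
intersectDiagram(hp, main = "Combinations of codes")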
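To see how this file relates to the cross-codes table, here is a minimal base-R sketch (an addition, not part of the reply) that rebuilds the co-occurrence counts from the same pairs. The data frame is re-created inline so the snippet runs on its own; crossprod() of the paragraph-by-code incidence table counts, for every pair of codes, the paragraphs that carry both.

```r
# Same paragraph/code pairs as hp.csv (equivalent to hp <- read.csv("hp.csv"))
hp <- data.frame(
  paragraph = paste0("p", c(1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12)),
  attribute = paste0("code", c(1, 3, 1, 3, 1, 4, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4))
)
# Incidence matrix: rows are paragraphs, columns are codes, 1 = code assigned
incidence <- unclass(table(hp$paragraph, hp$attribute))
# Entry [i, j] = number of paragraphs carrying both code i and code j;
# the diagonal is the number of paragraphs carrying each code
cooc <- crossprod(incidence)
# Mimic the layout of the RQDA table by blanking the lower triangle
cooc[lower.tri(cooc)] <- NA
print(cooc)
```

The off-diagonal counts reproduce the example table (code1/code3 = 2, code1/code4 = 1, code2/code3 = 1, all others 0). The diagonal here is the total number of paragraphs carrying each code (3, 4, 5, 4), which need not match how RQDA fills its own diagonal.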

intersectDiagram can also read your original data in other forms, which 
you might like to try.
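On the cluster-versus-network question, a cluster diagram needs only base R. The sketch below is one possible approach, not the plotrix method above: it converts co-occurrence counts into Jaccard similarities (paragraphs sharing both codes divided by paragraphs carrying either) and clusters the codes with average linkage via hclust(). Both the Jaccard measure and the average-linkage choice are assumptions; other similarity measures would also work.

```r
# Same paragraph/code pairs as hp.csv (equivalent to hp <- read.csv("hp.csv"))
hp <- data.frame(
  paragraph = paste0("p", c(1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12)),
  attribute = paste0("code", c(1, 3, 1, 3, 1, 4, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4))
)
incidence <- unclass(table(hp$paragraph, hp$attribute))
cooc <- crossprod(incidence)   # full symmetric co-occurrence matrix
n <- diag(cooc)                # paragraphs carrying each code
# Jaccard similarity: shared paragraphs / paragraphs carrying either code
jaccard <- cooc / (outer(n, n, "+") - cooc)
# Turn similarity into a distance and cluster the codes
d <- as.dist(1 - jaccard)
hc <- hclust(d, method = "average")
plot(hc, main = "Codes clustered by co-occurrence (Jaccard)")
```

With this example, code1 and code3 join first (distance 1 - 2/6 = 2/3), then code4, then code2.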

Jim


