[R] Changing the label name in the plot

Jim Lemon drjimlemon using gmail.com
Wed Jun 12 08:54:38 CEST 2019


Hi Subhamitra,
I think I understand better now. I have made up the second set of
top-level domains, as I have no idea what the real ones are. I hope that
this is an improvement.

spdf<-read.table(text="States1 DMs States2 EMs
JP 2.071 YA 2.038
CH 2.0548 YB 2.017
AT 2.0544 YC 2.007
CL 2.047 YD 1.963
ES 2.033 YE 1.947
PT 2.0327 YF 1.942
PL 2.0321 YG 1.932
FR 2.031 YH 1.924
SE 2.0293 YI 1.913
DE 2.0291 YJ 1.906
DK 2.027 YK 1.892
UK 2.022 YL 1.877
TW 1.9934 YM 1.869
NL 1.993 YN 1.849
HK 1.989 YO 1.848
LU 1.988 YP 1.836
CA 1.987 YQ 1.835
NZ 1.9849 YR 1.819
US 1.9842 YS 1.798
AU 1.981 YT 1.771
MY 1.978 YU 1.762
HU 1.968 YV 1.717
LT 1.96 YW 1.707
SG 1.958 YX 1.688
FI 1.955 YZ 1.683
CR 1.953 ZA 1.671
BY 1.952 ZB 1.664
IL 1.95 ZC 1.646
EE 1.948 ZD 1.633
NO 1.945 ZE 1.624
IE 1.937 ZF 1.621
SI 1.913 ZG 1.584
LV 1.901 ZH 1.487
SK 1.871 ZI 1.482
BH 1.801 ZJ 1.23
SK 1.761 ZK 1.129
AE 1.751 ZL 1.168
IS 1.699 ZM 0.941
BM 1.687 ZN 0.591
KW 1.668 ZO 0.387
CY 1.633 ZP 0.16
AP 1.56 ZQ 0.0002",
header = TRUE,stringsAsFactors=FALSE)
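# kmeans() is in the stats package, which is attached by default
# cluster each set of values separately
k1<-kmeans(spdf[,2],centers=2,nstart=25)
k2<-kmeans(spdf[,4],centers=2,nstart=25)
n<-nrow(spdf)
# DM points use colors 1 and 2, EM points use colors 3 and 4
plot(spdf[,2],col=k1$cluster,pch=19,ylim=c(0,2.1))
points(spdf[,4],col=k2$cluster+2,pch=19)
# label each point with its state name, offset to reduce overlap
text(1:n+0.5,spdf[,2]+0.05,spdf[,1],col=k1$cluster)
text(1:n-0.5,spdf[,4]-0.05,spdf[,3],col=k2$cluster+2)
legend(10,1,c("cluster 1 DM","cluster 2 DM","cluster 1 EM","cluster 2 EM"),
 col=1:4,pch=19)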

The crowding of points and labels remains.
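
One possible way to ease the crowding, offered only as a rough sketch: draw
each set of states in its own panel and alternate the labels left and right
of the points. The six-row subset of the made-up data, the set.seed() call,
the panel layout, the 0.4 offset and the names dm_groups, em_groups and xoff
are all assumptions for illustration, not part of the code above. The sketch
also shows that kmeans() returns one cluster number per row, in row order,
so the state names can be grouped by cluster afterwards.

```r
# Subset of the made-up data above, only so that the sketch is self-contained
spdf <- data.frame(
  States1 = c("JP", "CH", "AT", "KW", "CY", "AP"),
  DMs = c(2.071, 2.0548, 2.0544, 1.668, 1.633, 1.56),
  States2 = c("YA", "YB", "YC", "ZO", "ZP", "ZQ"),
  EMs = c(2.038, 2.017, 2.007, 0.387, 0.16, 0.0002),
  stringsAsFactors = FALSE
)
n <- nrow(spdf)

# Assumption: fix the seed so the arbitrary cluster numbering is repeatable
set.seed(1)
k1 <- kmeans(spdf$DMs, centers = 2, nstart = 25)
k2 <- kmeans(spdf$EMs, centers = 2, nstart = 25)

# kmeans() returns one cluster number per row, in row order, so the
# state names can be grouped by cluster after the fact
dm_groups <- split(spdf$States1, k1$cluster)
em_groups <- split(spdf$States2, k2$cluster)

# One panel per set of states, labels alternately right and left of the point
xoff <- rep(c(0.4, -0.4), length.out = n)
op <- par(mfrow = c(1, 2))
plot(spdf$DMs, col = k1$cluster, pch = 19, xlim = c(0, n + 1),
     xlab = "Index", ylab = "DMs", main = "DM states")
text(seq_len(n) + xoff, spdf$DMs, spdf$States1, col = k1$cluster, cex = 0.8)
plot(spdf$EMs, col = k2$cluster + 2, pch = 19, xlim = c(0, n + 1),
     xlab = "Index", ylab = "EMs", main = "EM states")
text(seq_len(n) + xoff, spdf$EMs, spdf$States2, col = k2$cluster + 2, cex = 0.8)
par(op)
```

With all 42 rows the labels will probably still overlap in places, so the
offsets may need further adjustment by hand.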

Jim

On Wed, Jun 12, 2019 at 2:37 PM Subhamitra Patra <subhamitra.patra using gmail.com>
wrote:

> Hello Sir,
>
> Thank you very much for your help, for which I shall always be grateful.
>
> Concerning your question,
> "1) Are the states in column 3 the same as those in column 1? As
> you initially named the data frame "ts", perhaps the values in columns 2
> and 4 are taken at different times. If not, perhaps they are measured in
> another set of countries, as yet unknown. Perhaps "DMs" and "EMs" are codes
> that will resolve this."
> the states in column 3 are not the same as those in column 1. The values in
> columns 2 and 4 are measured for different sets of countries, so the two
> columns do not share one label per point. Column 2 holds 45 values measured
> for 45 countries (named in column 1), whereas column 4 holds 42 values
> measured for another set of 42 countries (named in column 3). I tried
> clustering columns 2 and 4 separately, each with its own name column, but
> was unable to, because k-means clusters only numeric data and does not
> consider non-numeric columns such as the state names. I therefore
> clustered both value columns together; after removing NAs from the data
> table, both columns have 42 values. As a result, observation numbers rather
> than state names appear in the cluster plot. I am now stuck on how to give
> the points of columns 2 and 4 their own labels (from columns 1 and 3).
>
> I hope this answers your question.
>
> Thank you.
>
> On Wed, Jun 12, 2019 at 6:04 AM Jim Lemon <drjimlemon using gmail.com> wrote:
>
>> Hi Subhamitra,
>> It is time to admit that I had the wrong idea about what you wanted to
>> do, due to the combination of trying to solve two problems at once
>> while I was very tired. I appreciate your patience.
>>
>> From your last email, you have a data frame with four columns. The
>> first and third are cryptic names for political states and the second
>> and fourth are values that I assume are measured in those states.
>> 1) Are the states in column 3 the same as those in column 1? As you
>> initially named the data frame "ts", perhaps the values in columns 2
>> and 4 are taken at different times. If not, perhaps they are measured
>> in another set of countries, as yet unknown. Perhaps "DMs" and "EMs"
>> are codes that will resolve this.
>>
>> I assumed that "DMs" and "EMs" should be used as the X and Y values on
>> a scatterplot, as your initial example seemed to indicate. If so, they
>> are different values for the same country and only require one label
>> for each point. Proceeding from this, you can do something like this:
>>
>> spdf<-read.table(text="State DMs EMs
>> JP 2.071 2.038
>> CH 2.0548 2.017
>> AT 2.0544 2.007
>> CL 2.047 1.963
>> ES 2.033 1.947
>> PT 2.0327 1.942
>> PL 2.0321 1.932
>> FR 2.031 1.924
>> SE 2.0293 1.913
>> DE 2.0291 1.906
>> DK 2.027 1.892
>> UK 2.022 1.877
>> TW 1.9934 1.869
>> NL 1.993 1.849
>> HK 1.989 1.848
>> LU 1.988 1.836
>> CA 1.987 1.835
>> NZ 1.9849 1.819
>> US 1.9842 1.798
>> AU 1.981 1.771
>> MY 1.978 1.762
>> HU 1.968 1.717
>> LT 1.96 1.707
>> SG 1.958 1.688
>> FI 1.955 1.683
>> CR 1.953 1.671
>> BY 1.952 1.664
>> IL 1.95 1.646
>> EE 1.948 1.633
>> NO 1.945 1.624
>> IE 1.937 1.621
>> SI 1.913 1.584
>> LV 1.901 1.487
>> SK 1.871 1.482
>> BH 1.801 1.23
>> SK 1.761 1.129
>> AE 1.751 1.168
>> IS 1.699 0.941
>> BM 1.687 0.591
>> KW 1.668 0.387
>> CY 1.633 0.16
>> AP 1.56 0.0002",
>> header = TRUE,stringsAsFactors=FALSE)
>> # kmeans() is in the stats package, which is attached by default
>> k2 <- kmeans(spdf[,c(2,3)], centers = 2, nstart = 25)
>> # alternate the label offsets, one pair per row
>> xoff<-rep(c(0.02,-0.02),length.out=nrow(spdf))
>> yoff<-rep(c(-0.05,0.05),length.out=nrow(spdf))
>> plot(spdf[,c(2,3)],col=k2$cluster,pch=19,xlim=c(1.55,2.1))
>> text(spdf[,2]+xoff,spdf[,3]+yoff,spdf[,1],col=k2$cluster)
>> # join each label to its point
>> segments(spdf[,2],spdf[,3],spdf[,2]+xoff,spdf[,3]+yoff,col=k2$cluster)
>>
>> I took the liberty of replacing your abbreviations with internet top
>> level domains. As I hope you can see, you have a problem with crowded
>> points and labels, even with the trick of spreading the labels out.
>> You could modify the X and Y offsets by hand and get a much more
>> readable plot.
>>
>> If this is not what you want, a bit more explanation of what you do
>> want may get you there.
>>
>> Jim
>>
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
