[R] Finding strings in a dataset

Tuhin Chakraborty tuh|nch@kr@borty50 @end|ng |rom gm@||@com
Sun May 16 07:31:22 CEST 2021


Thank you, everyone, for the very helpful suggestions. I understand that my
question was not altogether clear, so let me share an example.
Below is part of the dataset, which has around 40,000 rows.
LI(PPM) SC(PPM) TI(PPM) V(PPM)
3.1/0.5 ? ? ?
? ? 0.2/0.3
?
? 2.8/0.75 ? >0.2
0.0389 108.6591 0.0214 85.18818
0.0688 146.1739 0.0117 108.0221
0.0265 121.3268 0.00749 85.34932
0.139901 125.3066 0.00984 97.23175

Now, values such as 0.2/0.3 and >0.2 are treated as strings. When I use the
spec(Dataset) function in R, it shows me which columns contain strings; for
example, it tells me that LI(PPM), SC(PPM), etc. contain strings. But I
would like to know whether there is some way to find exactly where the
string values are, such as the value for LI(PPM) in the top row. As this is
a huge dataset, it is difficult to go through all the rows manually.
Thank you again, and thanks in advance.
Tuhin
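A minimal base-R sketch of one way to locate such cells, assuming the affected columns were read in as character (as readr does when a column mixes numbers and text). The small data frame is made up to mirror the excerpt, and the helper name `not_numeric` is hypothetical. A cell is flagged when it is not missing but `as.numeric()` cannot parse it; `which(..., arr.ind = TRUE)` then turns the flags into row/column positions.

```r
# Made-up data mirroring the excerpt (assumption: every column was read in
# as character, as readr does when a column mixes numbers and text).
Dataset <- data.frame(
  "LI(PPM)" = c("3.1/0.5", "0.0389", "0.0688"),
  "SC(PPM)" = c("?", "108.6591", "146.1739"),
  "TI(PPM)" = c("?", "0.0214", ">0.2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a cell is present but is not a number.
# as.numeric() turns unparseable text into NA with a warning, which is
# expected here, so the warning is suppressed.
not_numeric <- function(x) {
  !is.na(x) & is.na(suppressWarnings(as.numeric(x)))
}

# Logical matrix with one column per data column (needs at least 2 rows,
# otherwise sapply() simplifies to a plain vector).
flags <- sapply(Dataset, not_numeric)

# Row and column index of every flagged cell, in column-major order.
where <- which(flags, arr.ind = TRUE)

# Pair each location with the offending value for inspection.
locations <- data.frame(
  row    = where[, "row"],
  column = colnames(Dataset)[where[, "col"]],
  value  = as.matrix(Dataset)[where],
  stringsAsFactors = FALSE
)
print(locations)
```

The same steps should apply unchanged to the full 40,000-row data; if only the affected row numbers are needed, `which(rowSums(flags) > 0)` gives those.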



On Sun, May 16, 2021 at 4:25 AM Avi Gross via R-help <r-help@r-project.org> wrote:

> Tuhin,
>
> What do you mean by a 2-D dataset? You say some columns contain strings, so
> it does not sound like you are using a matrix, as then ALL columns would be
> of the same type.
>
> So are you using a data.frame or tibble or something you made on your own?
>
> Can you address one column at a time, and would that column be a vector?
> Some methods work easily on vectors, and some also on lists.
>
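A minimal sketch of how one might answer these questions in base R, using a made-up two-column data frame (an assumption, not Tuhin's data). It checks the object's class and each column's type, shows why a matrix would not fit mixed data, and pulls one column out as a plain vector.

```r
# Made-up example (assumption): one text column and one numeric column.
Dataset <- data.frame(
  "LI(PPM)" = c("3.1/0.5", "0.0389"),
  "SC(PPM)" = c(108.6591, 146.1739),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

class(Dataset)                  # "data.frame" (a tibble also lists "tbl_df")
col_types <- sapply(Dataset, class)
col_types                       # "character" for LI(PPM), "numeric" for SC(PPM)

# A matrix holds a single type, so converting a mixed data frame turns
# every cell, including the numbers, into character.
typeof(as.matrix(Dataset))      # "character"

# [[ ]] pulls one column out as a plain vector; a name containing
# parentheses has to be quoted like this (or backticked after $).
li <- Dataset[["LI(PPM)"]]
is.vector(li)                   # TRUE
```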
> Once you have that vector, there are quite a few ways to find what you
> want. Is it fixed text, such as an exact full match where "theta" must be
> matched in full, or do you want to match "the" so that both "theta" and
> "lathe" match? Or are you matching a more complex pattern, such as all text
> that has two vowels in a row?
>
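The three kinds of matching described above can be sketched in base R as follows; the word list is illustrative, and "vowel" is assumed to mean a lowercase a, e, i, o or u.

```r
x <- c("theta", "lathe", "beta", "queen")

# 1. Exact, whole-string match: only "theta".
exact <- x == "theta"
# 2. Fixed substring: "the" occurs inside "theta" and "lathe".
substring_hit <- grepl("the", x, fixed = TRUE)
# 3. Regular expression: two lowercase vowels in a row (here only "queen").
pattern_hit <- grepl("[aeiou]{2}", x)

rbind(exact, substring_hit, pattern_hit)
```

`fixed = TRUE` treats the pattern as literal text, which avoids surprises when the search string contains regex metacharacters such as `?` or `.`.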
> Once you figure out what you have and what you want, how do you want to
> identify what you are looking for? Will there be one match, possibly many,
> or even all? Many methods return either a TRUE/FALSE vector of the same
> length as the input or the integer position of a match, such as telling
> you it is the fifth item.
>
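A small sketch contrasting the two result shapes, using values like those in Tuhin's excerpt. The set of "marker" characters (`/`, `>`, `?`) is an assumption about what distinguishes the text entries.

```r
x <- c("0.0389", "?", "108.6591", ">0.2", "0.2/0.3")

# Assumed marker characters for "text" entries: "/", ">" or "?".
hits <- grepl("[/>?]", x)   # logical, same length as x
pos  <- grep("[/>?]", x)    # integer positions of the matches: 2 4 5
same <- which(hits)         # the same positions, derived from the logical vector
n    <- sum(hits)           # how many matched: 3
```

The logical form is convenient for subsetting or combining conditions; the integer form is what you would report as "where" the matches are.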
> R has collections of string functions, including those in packages like
> stringr/stringi, that deal well with many things you might need. For
> matching patterns, there is a family of functions built around "grep".
>
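As one illustration of the grep family, here is a sketch that flags non-numeric entries with a regular expression instead of `as.numeric()`. The pattern `num_re` is a hypothetical definition of "plain decimal number" and may need adjusting for real data.

```r
x <- c("0.0389", "?", "108.6591", ">0.2", "0.2/0.3", "85.18818")

# Hypothetical pattern for a plain decimal number: optional sign, digits with
# an optional decimal point, optional exponent. Everything else counts as text.
num_re <- "^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?$"

bad_pos  <- grep(num_re, x, invert = TRUE)               # 2 4 5
bad_vals <- grep(num_re, x, invert = TRUE, value = TRUE) # "?" ">0.2" "0.2/0.3"

# sub() rewrites the first match in each element, e.g. dropping a leading ">".
stripped <- sub("^>", "", x)
```

A regex check is stricter and more explicit than `as.numeric()`, which also accepts forms such as surrounding whitespace or "Inf".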
> Good luck.
>
> -----Original Message-----
> From: R-help <r-help-bounces@r-project.org> On Behalf Of Tuhin Chakraborty
> Sent: Saturday, May 15, 2021 1:08 PM
> To: r-help@r-project.org
> Subject: [R] Finding strings in a dataset
>
> Hi,
> How can I find the location of string data in my 2D dataset? spec(Dataset)
> will reveal the columns that contain the strings. But can I know where
> exactly the string values are in the column?
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
