[R] Finding strings in a dataset

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Sun May 16 07:55:03 CEST 2021


Do look at the mess below that we received, and make an effort not to send HTML email to this list. What you saw when you sent it is not what we see when it gets to us.
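
As for the question quoted below, here is a minimal sketch of one way to locate the offending cells. It is not the poster's code. It assumes every column was read as character, that the "?" cells in the quoted sample are blanks (entered here as NA), and that "string" means "present but not convertible to a number". The names dat, not_numeric, bad, pos and hits are made up for illustration.

```r
# Hypothetical stand-in for the poster's data: every column read as character.
# The "?" cells in the quoted sample are assumed to be blanks (NA here).
dat <- data.frame(
  LI = c("3.1/0.5", NA, NA, "0.0389", "0.0688"),
  SC = c(NA, NA, "2.8/0.75", "108.6591", "146.1739"),
  TI = c(NA, "0.2/0.3", NA, "0.0214", "0.0117"),
  V  = c(NA, NA, ">0.2", "85.18818", "108.0221"),
  stringsAsFactors = FALSE
)

# TRUE where a cell is present but cannot be converted to a number.
not_numeric <- function(x) {
  !is.na(x) & is.na(suppressWarnings(as.numeric(x)))
}
bad <- sapply(dat, not_numeric)   # logical matrix, same shape as dat

# Row/column positions of the offending cells, plus their contents.
pos <- which(bad, arr.ind = TRUE)
hits <- data.frame(
  row    = pos[, "row"],
  column = colnames(dat)[pos[, "col"]],
  value  = dat[pos],
  stringsAsFactors = FALSE
)
print(hits)
```

For a single column, which(not_numeric(dat$LI)) gives the row numbers directly, and not_numeric(dat$LI) gives the TRUE/FALSE vector of the same length.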

On May 15, 2021 10:31:22 PM PDT, Tuhin Chakraborty <tuhinchakraborty50 using gmail.com> wrote:
>Thank you everyone, for the very helpful suggestions. I understand that my
>question is not altogether clear, so let me share an example.
>Below is part of the dataset, which has around 40000 rows.
>LI(PPM)   SC(PPM)   TI(PPM)   V(PPM)
>3.1/0.5   ?         ?         ?
>?         ?         0.2/0.3   ?
>?         2.8/0.75  ?         >0.2
>0.0389    108.6591  0.0214    85.18818
>0.0688    146.1739  0.0117    108.0221
>0.0265    121.3268  0.00749   85.34932
>0.139901  125.3066  0.00984   97.23175
>
>Now 0.2/0.3 and >0.2 are treated as strings. When I use the
>spec(Dataset) function in R, it shows me which columns contain strings;
>for example, it will tell me that LI(PPM), SC(PPM), etc. contain strings.
>But I would like to know if there is some way to learn exactly where the
>string values are, such as the one in the top row of LI(PPM). As this is a
>huge dataset, it is difficult to go through all the rows manually.
>Thank you again and in anticipation.
>Tuhin
>
>
>
>On Sun, May 16, 2021 at 4:25 AM Avi Gross via R-help <r-help using r-project.org> wrote:
>
>> Tuhin,
>>
>> What do you mean by a 2-D dataset? You say some columns contain strings so
>> it does not sound like you are using a matrix as then ALL columns would be
>> of the same type.
>>
>> So are you using a data.frame or tibble or something you made on your own?
>>
>> Can you address one column at a time and would that be of type vector? Some
>> methods work fairly easily on those and some also on lists.
>>
>> Once you have that vector, there are quite a few ways to find what you
>> want. Is it fixed text, where you want an exact full match, so that only
>> "theta" matches "theta"? Or do you want a partial match, where "the" would
>> match both "theta" and "lathe"? Or are you matching a more complex
>> pattern, such as all text that has two vowels in a row?
>>
>> Once you figure out what you have and what you want, how do you want to
>> identify what you are looking for? Will there be one match, possibly many,
>> or even all? Many methods return either a TRUE/FALSE vector of the same
>> length as the input or the integer offset of each match, such as telling
>> you it is the fifth item.
>>
>> R has collections of string functions, including in packages like
>> stringr/stringi, that deal well with many things you might need. For
>> matching patterns, there is the "grep" family of functions.
>>
>> Good luck.
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Tuhin Chakraborty
>> Sent: Saturday, May 15, 2021 1:08 PM
>> To: r-help using r-project.org
>> Subject: [R] Finding strings in a dataset
>>
>> Hi,
>> How can I find the location of string data in my 2D dataset? spec(Dataset)
>> will reveal the columns that contain the strings. But can I know where
>> exactly the string values are in the column?
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>	[[alternative HTML version deleted]]
>

-- 
Sent from my phone. Please excuse my brevity.


