[R] Finding strings in a dataset

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Sun May 16 09:30:41 CEST 2021


Hello,

The data makes it clearer.
Do you want to know where the values that cannot be coerced to numeric are?
The auxiliary function f outputs a logical vector, sapply applies it 
column by column and which(., arr.ind = TRUE) gives the TRUE values as (row, 
col) pairs. Note that the "?" entries are flagged too, since they cannot be 
coerced to numeric either.


txt <- "
LI(PPM) SC(PPM) TI(PPM) V(PPM)
3.1/0.5 ? ? ?
? ? 0.2/0.3 ?
? 2.8/0.75 ? >0.2
0.0389 108.6591 0.0214 85.18818
0.0688 146.1739 0.0117 108.0221
0.0265 121.3268 0.00749 85.34932
0.139901 125.3066 0.00984 97.23175
"
df1 <- read.table(text = txt, header = TRUE)
df1

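# TRUE where a value cannot be coerced to numeric;
# the coercion warnings are expected, so they are suppressed
f <- function(x){
   suppressWarnings(is.na(as.numeric(x)))
}
# apply f to each column, giving a logical matrix shaped like df1
found <- sapply(df1, f)
# (row, col) positions of the values that are not numeric
which(found, arr.ind = TRUE)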
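
If the "?" entries are only placeholders for missing values (this is an assumption on my part, the original post does not say so), a variant along the following lines might be closer to what is wanted. It is only a sketch: the names df2, g and bad are made up for the illustration. It reads "?" as NA, flags only the values that are present but not numeric, and lists each one with its row and column name.

```r
txt <- "
LI(PPM) SC(PPM) TI(PPM) V(PPM)
3.1/0.5 ? ? ?
? ? 0.2/0.3 ?
? 2.8/0.75 ? >0.2
0.0389 108.6591 0.0214 85.18818
0.0688 146.1739 0.0117 108.0221
0.0265 121.3268 0.00749 85.34932
0.139901 125.3066 0.00984 97.23175
"
# Assumption: "?" marks a missing value, so it is read as NA.
# colClasses = "character" keeps every column as text.
df2 <- read.table(text = txt, header = TRUE, na.strings = "?",
                  colClasses = "character")

# TRUE only where a value is present but cannot be coerced to numeric
g <- function(x){
   !is.na(x) & suppressWarnings(is.na(as.numeric(x)))
}
idx <- which(sapply(df2, g), arr.ind = TRUE)

# one line per offending cell: row, column name and the value itself
bad <- data.frame(
   row = idx[, "row"],
   column = names(df2)[idx[, "col"]],
   value = as.matrix(df2)[idx]
)
bad
```

The column names in the result are the ones read.table produces with its default check.names = TRUE, so LI(PPM) shows up as LI.PPM. and so on.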



Hope this helps,

Rui Barradas


Às 06:31 de 16/05/21, Tuhin Chakraborty escreveu:
> Thank you everyone, for the very helpful suggestions. I understand that my
> question is not altogether clear. So let me share an example.
> Below is part of the dataset, which has around 40000 rows.
> LI(PPM) SC(PPM) TI(PPM) V(PPM)
> 3.1/0.5 ? ? ?
> ? ? 0.2/0.3 ?
> ? 2.8/0.75 ? >0.2
> 0.0389 108.6591 0.0214 85.18818
> 0.0688 146.1739 0.0117 108.0221
> 0.0265 121.3268 0.00749 85.34932
> 0.139901 125.3066 0.00984 97.23175
> 
> Now values such as 0.2/0.3 and >0.2 are treated as strings. When I use the
> spec(Dataset) function in R, it shows me which columns contain strings.
> For example, it will tell me that LI(PPM), SC(PPM) etc. contain strings. But I
> would like to know if there is some way I can learn exactly where the
> string values are, like for LI(PPM) in the top row. As this is a huge
> dataset, it is difficult to go through all the rows manually.
> Thank you again and in anticipation.
> Tuhin
> 
> 
> 
> On Sun, May 16, 2021 at 4:25 AM Avi Gross via R-help <r-help using r-project.org>
> wrote:
> 
>> Tuhin,
>>
>> What do you mean by a 2-D dataset? You say some columns contain strings so
>> it does not sound like you are using a matrix as then  ALL columns would be
>> of the same type.
>>
>> So are you using a data.frame or tibble or something you made on your own?
>>
>> Can you address one column at a time and would that be of type vector? Some
>> methods work fairly easily on those and some also on lists.
>>
>> Once you have that vector, there are quite a few ways to find what you want.
>> Is it fixed text like looking for an exact full match so it would be
>> something like "theta" to be matched in full, or would you want to match
>> "the" and both "theta" and "lathe" would match? Or are you matching a
>> pattern that is more complex like looking for all text that has two vowels
>> in a row in it?
>>
>> Once you figure out what you have and what you want, how do you want to
>> identify what you are looking for? Will there be one match or possibly many
>> or even all? Many methods will return a TRUE/FALSE vector of the same length
>> or the integer offset of a match such as telling you it is the fifth item.
>>
>> R has collections of string functions including in packages like
>> stringr/stringi that deal well with many things you might need. For matching
>> patterns, there is a family of functions using "grep" and so on.
>>
>> Good luck.
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Tuhin Chakraborty
>> Sent: Saturday, May 15, 2021 1:08 PM
>> To: r-help using r-project.org
>> Subject: [R] Finding strings in a dataset
>>
>> Hi,
>> How can I find the location of string data in my 2D dataset? spec(Dataset)
>> will reveal the columns that contain the strings. But can I know where
>> exactly the string values are in the column?
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 


