[R] StrSplit

Sat Oct 9 19:04:37 CEST 2010

On Oct 9, 2010, at 12:46 PM, Jeffrey Spies wrote:

> Jim's solution is the ideal way to read in the data: using the sep=";"
> argument in read.table.
>
> However, if you do for some reason have a vector of strings like the
> following (maybe someone gives you an Rdata file instead of the raw
> data file):
>
> MF_Data <- c("106506;AIG India Liquid Fund-Institutional Plan-Daily
> Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010","106511;AIG
> India Liquid Fund-Institutional Plan-Growth
> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010")
>
> Then you can use this to get a data frame:
>
> as.data.frame(do.call(rbind, lapply(MF_Data, function(x)
> unlist(strsplit(x, ';')))))
>

If you are suggesting that Jim's solution would not work here, then I  
would disagree and suggest you try offering your vector (without the  
<cr>'s inserted by our mail clients) to his code. It should work just  
fine and be far more readable.
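
A minimal sketch of that suggestion, assuming the two strings from your message are rejoined onto single lines. The explicit `con` variable is my own addition so that only the connection opened here gets closed; otherwise this is Jim's call:

```r
# Jeff's two strings, each rejoined onto a single line
MF_Data <- c(
  "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
  "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
)

# Jim's approach: read the character vector through a text connection
con <- textConnection(MF_Data)
myData <- read.table(con, sep = ";")
close(con)  # close only the connection opened above

str(myData)  # 2 obs. of 6 variables; V1 and V3:V5 are read as numbers
```

Unlike the strsplit/rbind route, the numeric columns come back as numbers rather than as character strings.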

On the other hand, if you were offering this with an explanation that  
strsplit's split argument is more flexible than the sep argument in  
the read functions, because it accepts regular expressions and so can  
handle situations where multiple separators exist in the same line,  
then I would applaud you.
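
To sketch that case, here is a made-up vector (hypothetical, not the poster's data) in which both ';' and '|' occur as separators within a line. The names `mixed`, `parts`, `fields` and `df` are for illustration only:

```r
# Hypothetical input: two different separators within the same line
mixed <- c("106506;AIG Fund A|1001.0000;02-Oct-2010",
           "106511|AIG Fund B;1210.4612|02-Oct-2010")

# split= is treated as a regular expression unless fixed = TRUE,
# so a character class matches either separator
parts <- strsplit(mixed, "[;|]")

# Every element has the same number of fields, so rbind gives a matrix
fields <- do.call(rbind, parts)

# All columns are still character and would need converting,
# e.g. with as.numeric(), before any arithmetic
df <- as.data.frame(fields, stringsAsFactors = FALSE)
```

The sep argument of read.table takes a single character, so it has no direct equivalent of the "[;|]" pattern.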

-- 
David.

> Cheers,
>
> Jeff.
>
> On Sat, Oct 9, 2010 at 12:30 PM, jim holtman <jholtman at gmail.com>  
> wrote:
>> Is this what you are after:
>>
>>> x <- c("Scheme Code;Scheme Name;Net Asset Value;Repurchase  
>>> Price;Sale Price;Date"
>> + , ""
>> +  ,"Open Ended Schemes ( Liquid )"
>> + , ""
>> + , ""
>> + , "AIG Global Investment Group Mutual Fund"
>> + , "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend
>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>> + , "106511;AIG India Liquid Fund-Institutional Plan-Growth
>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
>> + , "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend
>> Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
>> + , "106503;AIG India Liquid Fund-Retail Plan-DailyDividend
>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010")
>>>
>>> myData <- read.table(textConnection(x[7:10]), sep=';')
>>> closeAllConnections()
>>> str(myData)
>> 'data.frame':   4 obs. of  6 variables:
>>  $ V1: int  106506 106511 106507 106503
>>  $ V2: Factor w/ 4 levels "AIG India Liquid Fund-Institutional
>> Plan-Daily Dividend Option",..: 1 2 3 4
>>  $ V3: num  1001 1210 1002 1001
>>  $ V4: num  1001 1210 1002 1001
>>  $ V5: num  1001 1210 1002 1001
>>  $ V6: Factor w/ 1 level "02-Oct-2010": 1 1 1 1
>>> myData
>>      V1
>> V2       V3       V4       V5          V6
>> 1 106506  AIG India Liquid Fund-Institutional Plan-Daily Dividend
>> Option 1001.000 1001.000 1001.000 02-Oct-2010
>> 2 106511          AIG India Liquid Fund-Institutional Plan-Growth
>> Option 1210.461 1210.461 1210.461 02-Oct-2010
>> 3 106507 AIG India Liquid Fund-Institutional Plan-Weekly Dividend
>> Option 1001.876 1001.876 1001.876 02-Oct-2010
>> 4 106503          AIG India Liquid Fund-Retail Plan-DailyDividend
>> Option 1001.000 1001.000 1001.000 02-Oct-2010
>>>
>>>
>>
>>
>> On Sat, Oct 9, 2010 at 12:18 PM, Santosh Srinivas
>> <santosh.srinivas at gmail.com> wrote:
>>> Newbie question ...
>>>
>>> I am looking for something equivalent to read.delim but which accepts  
>>> a text line as a parameter instead of a file input.
>>>
>>> Below is my problem, I'm unable to get the exact output which is a  
>>> simple data frame of the data where the delimiter exists ...  
>>> coming quite close though
>>>
>>> I have a data frame with 10 lines called MF_Data
>>>> MF_Data [1:10]
>>>  [1] "Scheme Code;Scheme Name;Net Asset Value;Repurchase  
>>> Price;Sale Price;Date"
>>>  [2] ""
>>>  [3] "Open Ended Schemes ( Liquid )"
>>>  [4] ""
>>>  [5] ""
>>>  [6] "AIG Global Investment Group Mutual Fund"
>>>  [7] "106506;AIG India Liquid Fund-Institutional Plan-Daily  
>>> Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>>  [8] "106511;AIG India Liquid Fund-Institutional Plan-Growth  
>>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
>>>  [9] "106507;AIG India Liquid Fund-Institutional Plan-Weekly  
>>> Dividend Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
>>> [10] "106503;AIG India Liquid Fund-Retail Plan-DailyDividend  
>>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>>
>>>
>>> Now for the lines below .. they are delimited by ; ... I am using
>>>
>>>  tempTxt <- MF_Data[7]
>>>  MF_Data_F <-   unlist(strsplit(tempTxt,";", fixed = TRUE))
>>>  tempTxt <- MF_Data[8]
>>>  MF_Data_F1 <-  unlist(strsplit(tempTxt,";", fixed = TRUE))
>>>  MF_Data_F <- rbind(MF_Data_F,MF_Data_F1)
>>>
>>> But MF_Data_F is not a simple 2 x 6 data frame, which is what I want
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>>
>

David Winsemius, MD
West Hartford, CT