[R] StrSplit

Jeffrey Spies jspies at virginia.edu
Sat Oct 9 20:14:16 CEST 2010


Obviously Jim's solution does work, and I did not intend to imply it
didn't.  In fact, his read.table solution would work both if the OP
had a semi-colon delimited file to begin with (which I was trying to
say was ideal from a workflow standpoint) or a vector of strings (for
use when paired with textConnections).  Using strsplit is merely
another solution for the latter situation.  I thought the OP might
appreciate seeing how to use the function that they indicated they
were having problems with.  Plus, I have a penchant for R-ishly
"unreadable" code. ;)

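For anyone comparing the two routes on a vector of strings, here is a
minimal sketch.  The strings are the two data lines from the thread
with the mail-client line breaks removed; the names df_read, df_split
and mixed are made up for illustration, and the mixed-separator input
is hypothetical.

```r
# Two of the data lines from the thread, as a character vector
MF_Data <- c(
  "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
  "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
)

# Route 1 (Jim's): read the vector as if it were a file, via a text connection
con <- textConnection(MF_Data)
df_read <- read.table(con, sep = ";", stringsAsFactors = FALSE)
close(con)

# Route 2: split each string on ";" and bind the pieces row-wise
parts <- strsplit(MF_Data, ";", fixed = TRUE)
df_split <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# read.table converts column types; the strsplit route keeps every column as character
str(df_read)
str(df_split)

# With fixed = FALSE (the default), split is a regular expression,
# so several separators can be handled in one pass (hypothetical input)
mixed <- "a;b,c|d"
strsplit(mixed, "[;,|]")[[1]]
```

Both routes give a 2 x 6 data frame.  The practical difference is that
read.table guesses numeric types for you, while the strsplit result
needs an explicit conversion (as.numeric and friends) afterwards.
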
Thanks for clarifying,

Jeff.

On Sat, Oct 9, 2010 at 1:04 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Oct 9, 2010, at 12:46 PM, Jeffrey Spies wrote:
>
>> Jim's solution is the ideal way to read in the data: using the sep=";"
>> argument in read.table.
>>
>> However, if you do for some reason have a vector of strings like the
>> following (maybe someone gives you an Rdata file instead of the raw
>> data file):
>>
>> MF_Data <- c("106506;AIG India Liquid Fund-Institutional Plan-Daily
>> Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010","106511;AIG
>> India Liquid Fund-Institutional Plan-Growth
>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010")
>>
>> Then you can use this to get a data frame:
>>
>> as.data.frame(do.call(rbind, lapply(MF_Data, function(x)
>> unlist(strsplit(x, ';')))))
>>
>
> If you are suggesting that Jim's solution would not work here, then I would
> disagree and suggest you try offering your vector (without the <cr>'s
> inserted by our mail clients) to his code. It should work just fine and be
> far more readable.
>
> On the other hand if you were offering this with an explanation that
> strsplit's split argument is more flexible than the sep argument in the read
> functions because it accepts regular expressions and so can handle
> situations where multiple separators exist in the same line, then I would
> applaud you.
>
> --
> David.
>
>> Cheers,
>>
>> Jeff.
>>
>> On Sat, Oct 9, 2010 at 12:30 PM, jim holtman <jholtman at gmail.com> wrote:
>>>
>>> Is this what you are after:
>>>
>>>> x <- c("Scheme Code;Scheme Name;Net Asset Value;Repurchase Price;Sale
>>>> Price;Date"
>>>
>>> + , ""
>>> +  ,"Open Ended Schemes ( Liquid )"
>>> + , ""
>>> + , ""
>>> + , "AIG Global Investment Group Mutual Fund"
>>> + , "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend
>>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>> + , "106511;AIG India Liquid Fund-Institutional Plan-Growth
>>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
>>> + , "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend
>>> Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
>>> + , "106503;AIG India Liquid Fund-Retail Plan-DailyDividend
>>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010")
>>>>
>>>> myData <- read.table(textConnection(x[7:10]), sep=';')
>>>> closeAllConnections()
>>>> str(myData)
>>>
>>> 'data.frame':   4 obs. of  6 variables:
>>>  $ V1: int  106506 106511 106507 106503
>>>  $ V2: Factor w/ 4 levels "AIG India Liquid Fund-Institutional
>>> Plan-Daily Dividend Option",..: 1 2 3 4
>>>  $ V3: num  1001 1210 1002 1001
>>>  $ V4: num  1001 1210 1002 1001
>>>  $ V5: num  1001 1210 1002 1001
>>>  $ V6: Factor w/ 1 level "02-Oct-2010": 1 1 1 1
>>>>
>>>> myData
>>>
>>>     V1
>>> V2       V3       V4       V5          V6
>>> 1 106506  AIG India Liquid Fund-Institutional Plan-Daily Dividend
>>> Option 1001.000 1001.000 1001.000 02-Oct-2010
>>> 2 106511          AIG India Liquid Fund-Institutional Plan-Growth
>>> Option 1210.461 1210.461 1210.461 02-Oct-2010
>>> 3 106507 AIG India Liquid Fund-Institutional Plan-Weekly Dividend
>>> Option 1001.876 1001.876 1001.876 02-Oct-2010
>>> 4 106503          AIG India Liquid Fund-Retail Plan-DailyDividend
>>> Option 1001.000 1001.000 1001.000 02-Oct-2010
>>>>
>>>>
>>>
>>>
>>> On Sat, Oct 9, 2010 at 12:18 PM, Santosh Srinivas
>>> <santosh.srinivas at gmail.com> wrote:
>>>>
>>>> Newbie question ...
>>>>
>>>> I am looking for something equivalent to read.delim, but which accepts a
>>>> text line as a parameter instead of a file input.
>>>>
>>>> Below is my problem: I'm unable to get the exact output I want, which is a
>>>> simple data frame of the data split at the delimiter. I am coming quite
>>>> close, though.
>>>>
>>>> I have a character vector with 10 lines called MF_Data
>>>>>
>>>>> MF_Data [1:10]
>>>>
>>>>  [1] "Scheme Code;Scheme Name;Net Asset Value;Repurchase Price;Sale
>>>> Price;Date"
>>>>  [2] ""
>>>>  [3] "Open Ended Schemes ( Liquid )"
>>>>  [4] ""
>>>>  [5] ""
>>>>  [6] "AIG Global Investment Group Mutual Fund"
>>>>  [7] "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend
>>>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>>>  [8] "106511;AIG India Liquid Fund-Institutional Plan-Growth
>>>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
>>>>  [9] "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend
>>>> Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
>>>> [10] "106503;AIG India Liquid Fund-Retail Plan-DailyDividend
>>>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>>>
>>>>
>>>> Now, the data lines are delimited by ";" and I am using
>>>>
>>>>  tempTxt <- MF_Data[7]
>>>>  MF_Data_F <-   unlist(strsplit(tempTxt,";", fixed = TRUE))
>>>>  tempTxt <- MF_Data[8]
>>>>  MF_Data_F1 <-  unlist(strsplit(tempTxt,";", fixed = TRUE))
>>>>  MF_Data_F <- rbind(MF_Data_F,MF_Data_F1)
>>>>
>>>> But MF_Data_F is not a simple 2x6 data frame, which is what I want.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>>
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>


