[R] Find String Between Characters

jim holtman jholtman at gmail.com
Sun May 15 23:40:27 CEST 2011


I would assume that you have lines of text that do not include 'CIK='.
On those lines the pattern does not match, so 'sub' makes no
substitution and returns the original string unchanged.  If you only
want the lines with 'CIK=', use 'grepl' to select those lines before
processing.
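
A minimal sketch of that filtering step, assuming the page text is
already available as a character vector.  The vector 'txt' below is
made up for illustration; it is not the actual output of scrape().

```r
# Hypothetical input: a character vector, one element per line of text.
txt <- c(
  "<a href=\"/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude\">",
  "<pointer: 0x00000000001265c0>",
  "no identifier on this line"
)

# Without filtering, sub() returns the non-matching elements unchanged.
sub("^.*CIK=([0-9]+).*", "\\1", txt)

# Keep only the elements that contain 'CIK=' followed by digits,
# then extract the digits from those elements.
has_cik <- grepl("CIK=[0-9]+", txt)
num <- sub("^.*CIK=([0-9]+).*", "\\1", txt[has_cik])
num
```

With this input, 'num' holds only the value taken from the first
element; the other two elements are dropped by the 'grepl' filter
instead of being passed through unchanged.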

On Sat, May 14, 2011 at 10:14 PM, Sparks, John James <jspark4 at uic.edu> wrote:
> Hi Jim,
>
> Thanks for your note.
>
> Unfortunately, when I attempt your solution in my exact setting, I get a
> weird and slightly different answer.
>
> First, let me be more clear.  What I am attempting to do is pull the CIK
> number out of the information from the web page itself after it has loaded
> to R (this may not be optimal, but I am new at this), not from the web
> page reference (as you have done).
>
> So, when I execute the following as per your suggestion:
>
> require(scrapeR)
> mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40")
>
> num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
>
> I get
> [1] "<pointer: 0x00000000001265c0>"
>
> Is this just a hex representation of the same number, or is something else
> going on here?
>
> Comments from any and all would be much appreciated.
>
> --John J. Sparks, Ph.D.
>
> On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
>> Is this what you want:
>>
>>> mmm<-"http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40"
>>> num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
>>> num
>> [1] "0000320193"
>>>
>>
>>
>> On Sat, May 14, 2011 at 8:20 PM, Sparks, John James <jspark4 at uic.edu>
>> wrote:
>>> Dear R Helpers,
>>>
>>> I am trying to isolate a set of characters between two other characters
>>> in a long string file.  I tried some of the examples on the R help pages
>>> and elsewhere, but I am not able to get it.  Your help would be much
>>> appreciated.
>>>
>>> require(scrapeR)
>>> mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40")
>>> str(mmm)
>>>
>>> I want to get the number 0000320193 that is between the CIK= and the &.
>>> I have tried
>>>
>>> g <- grep( "CIK=|&", mmm )
>>> and
>>> temp<-grep(mmm,\CIK=\&)
>>>
>>> and variations on these themes, but they all either won't run or come
>>> back as an empty object.  How can I grab this number?
>>>
>>> Best wishes,
>>> --John J. Sparks, Ph.D.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>>
>>
>
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


