[R] strange strsplit gsub problem 0 is this a bug or a string length limitation?

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 10 17:51:29 CEST 2009


Marc has already answered your question.  It may also be
possible to avoid the long formula in the first place when working
with lm and certain similar functions, since we can write this

lm(y1 ~ x1 + x2 + x3 + x4, anscombe)

as

lm(y1 ~ ., anscombe[1:5])

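where anscombe is a data set that is built into R.  Here anscombe[1:5]
keeps only the columns x1, x2, x3, x4 and y1, so the dot on the right
hand side stands for every column other than the response y1.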
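
As a rough check of the above, the sketch below fits both forms and
compares them, then applies the same idea to a formula of the size you
describe.  The data frame df, its random contents and the names
n_assets and asset_names are made up purely for illustration; I have
not checked whether portfolioBacktesting itself accepts the dot form.

```r
# Both calls should describe the same model; anscombe ships with R.
fit_long <- lm(y1 ~ x1 + x2 + x3 + x4, data = anscombe)
fit_dot  <- lm(y1 ~ ., data = anscombe[1:5])

# Same terms and same fitted values in both fits.
identical(names(coef(fit_long)), names(coef(fit_dot)))
all.equal(fitted(fit_long), fitted(fit_dot))

# Hypothetical data frame with a response SPX and 113 predictors A1..A113
# (random numbers, for illustration only).
set.seed(1)
n_assets <- 113
df <- as.data.frame(matrix(rnorm(200 * (n_assets + 1)), nrow = 200))
names(df) <- c("SPX", paste0("A", seq_len(n_assets)))

# terms() expands the dot against the data, so the predictor names can be
# read off without converting the formula to a character string.
asset_names <- attr(terms(SPX ~ ., data = df), "term.labels")
length(asset_names)
```

Reading the names from the terms object sidesteps the as.character,
gsub and strsplit round trip entirely, so it does not depend on how a
long formula happens to be deparsed.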

On Fri, Jul 10, 2009 at 8:18 AM, tradenet<nodecorum at yahoo.com> wrote:
>
> I was working with the rmetrics portfolioBacktesting function and dug into
> the code to try to find why my formula with 113 items, i.e. A1 thru A113,
> was being truncated and I only get 85 items, not 113.
>
> Is it due to a string length limitation in R or is it a bug in the strsplit
> or gsub functions, or in my string?
>
> I'd very much appreciate any suggestions
>
>
> ============Input script:
>
> backtestFormula<-SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
> benchmarkName = as.character(backtestFormula)[2]
> print(as.character(backtestFormula)[3])
> print(benchmarkName)
>    assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]),
> "\\+")[[1]]
>    nAssets = length(assetsNames)
> print(nAssets)
> list(assetsNames)
>
> ===============output:
>
>
>> backtestFormula<-SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
>
>> benchmarkName = as.character(backtestFormula)[2]
>
>> print(benchmarkName)
> [1] "SPX"
>
>> print(as.character(backtestFormula)[3])
> [1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 +
> A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24 + A25 + A26
> + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 + A36 + A37 + A38 +
> A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47 + A48 + A49 + A50 + A51
> + A52 + A53 + A54 + A55 + A56 + A57 + A58 + A59 + A60 + A61 + A62 + A63 +
> A64 + A65 + A66 + A67 + A68 + A69 + A70 + A71 + A72 + A73 + A74 + A75 + A76
> + A77 + A78 + A79 + A80 + A81 + A82 + A83 + A84 + A85 + "
>
>> assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]),
>> "\\+")[[1]]
>
>> print(nAssets)
> [1] 85
>
>> nAssets = length(assetsNames)
>
>> print(nAssets)
> [1] 85
>
>> list(assetsNames)
> [[1]]
>  [1] "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "A10" "A11" "A12"
> "A13" "A14" "A15" "A16" "A17" "A18" "A19" "A20" "A21" "A22" "A23" "A24"
> "A25" "A26" "A27" "A28" "A29" "A30" "A31" "A32" "A33"
> [34] "A34" "A35" "A36" "A37" "A38" "A39" "A40" "A41" "A42" "A43" "A44" "A45"
> "A46" "A47" "A48" "A49" "A50" "A51" "A52" "A53" "A54" "A55" "A56" "A57"
> "A58" "A59" "A60" "A61" "A62" "A63" "A64" "A65" "A66"
> [67] "A67" "A68" "A69" "A70" "A71" "A72" "A73" "A74" "A75" "A76" "A77" "A78"
> "A79" "A80" "A81" "A82" "A83" "A84" "A85"
>