[R] Does the function "c" have a character limit?

@vi@e@gross m@iii@g oii gm@ii@com
Wed Jul 13 19:46:49 CEST 2022


If I follow this thread, it looks clear that the problem lies at the surface, in how the input is read, and not really in the c() function, which sits much deeper.

Is this also a problem if you replace c() with max() or list(), as I think it may be? If so, it is more about the input length the interpreter is able to handle, whether everywhere, on your installation, or under your memory constraints.

There are, of course, lots of ways to work around it, and some have been mentioned. You can clearly have data.frames with underlying vectors that are millions of units long, including character data like yours.

I was able to reproduce your problem within RStudio and noted that the editor window actually cuts off the text and asks you to click to see more, which may be a hint. Have you tried pasting this long line directly into an R interpreter rather than through RStudio?

I did an experiment and broke up the big monster that failed into multiple short lines and it works fine. It looks like a LINE LENGTH limit, not a statement limit.

So if your data was entered say like this:

MES=c(
  "A2M",
  "ABRACL",
  "ACADVL",
  "ACAP2",
  ...,
  "TIMP1",
  "TJP1"
  )

Then it should work for much larger amounts of data.

And, of course, you can enter multiple smaller units, concatenate them together in the code, and remove the originals, as long as each unit is small enough. Reading the data in from a file should also bypass the issue if done right.
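
A minimal sketch of both workarounds, using only a handful of the gene names from the original post. The names part1, part2, gene_file, and MES_from_file are made up for illustration, and a temporary file stands in for a real data file that would hold the full list, one name per line.

```r
# Workaround 1: enter the vector in smaller pieces, combine them,
# then remove the pieces.
part1 <- c("A2M", "ABRACL", "ACADVL")
part2 <- c("ACAP2", "TIMP1", "TJP1")
MES <- c(part1, part2)
rm(part1, part2)

# Workaround 2: keep the names in a text file, one per line, and read them in.
# The temporary file is only a stand-in for a real data file.
gene_file <- tempfile(fileext = ".txt")
writeLines(MES, gene_file)
MES_from_file <- readLines(gene_file)

# Both routes should give the same character vector.
identical(MES, MES_from_file)
```

Either way, no single line typed at the console has to carry the whole vector.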

There is no reason every programmer should try to make everything a one-liner.


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Ebert,Timothy Aaron
Sent: Wednesday, July 13, 2022 9:21 AM
To: Rui Barradas <ruipbarradas using sapo.pt>; core_contingency <ccontingency using gmail.com>; r-help using r-project.org
Subject: Re: [R] Does the function "c" have a character limit?

The question of size limits for vectors, matrices, data frames, lists, or other data structures does not have a simple answer.
1) 2^31 - 1 is the maximum number of rows. https://stackoverflow.com/questions/5233769/practical-limits-of-r-data-frame
2) help(Memory) suggests that the default limit for all variables is 6 Mb. The help page tells you how to change this.
Neither of these two factors has any bearing on this problem, other than to show that your vector is not close to these limits.
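
For reference, the 2^31 - 1 figure quoted in item 1 is the largest value of R's integer type, which can be checked directly. This is only a sketch: the short vector genes is a made-up stand-in for the one in the original post.

```r
# The largest representable integer in R equals 2^31 - 1.
.Machine$integer.max == 2^31 - 1

# A vector of gene names is tiny by comparison, in length and in memory.
genes <- c("A2M", "ABRACL", "ACADVL", "ACAP2")
length(genes) < .Machine$integer.max
format(object.size(genes), units = "Kb")
```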

I got the same result you did when I entered your vector into my system (R 4.2 in RStudio, on 64 bit Windows). I shortened it by removing the first entry and it works.

I can copy the entire line into Microsoft Word, and count the number of characters (including spaces) and I get 4089. There were seven characters in the first entry including the comma and space. If I add seven spaces between MES and the equal sign I get the original outcome. So the limit is on the number of characters in the line. You can get more entries by shortening each entry, or fewer if each entry was longer.
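
The same counting can be done in R instead of Word. The sketch below uses a short stand-in for the long one-line command (the variable names cmd, shorter, and padded are made up for illustration) and repeats the seven-character arithmetic described above.

```r
# A short stand-in for the long one-line command from the original post.
cmd <- 'MES = c("A2M", "ABRACL", "ACADVL")'

nchar(cmd, type = "chars")  # characters, as a word processor would count them
nchar(cmd, type = "bytes")  # bytes; identical here because the text is ASCII

# The first entry, with its trailing comma and space, is seven characters long.
nchar('"A2M", ')

# Dropping that entry and padding with seven spaces after MES
# brings the line back to its original length.
shorter <- sub('"A2M", ', "", cmd, fixed = TRUE)
padded  <- sub("MES", paste0("MES", strrep(" ", 7)), shorter, fixed = TRUE)
nchar(padded) == nchar(cmd)
```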

As others have suggested, I would break the line into two pieces and then combine the pieces.

Tim



-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
Sent: Wednesday, July 13, 2022 6:36 AM
To: core_contingency <ccontingency using gmail.com>; r-help using r-project.org
Subject: Re: [R] Does the function "c" have a character limit?

Hello,

This is documented behavior.
From R-intro, last line of section 1.8 [1]:


Command lines entered at the console are limited4 to about 4095 bytes (not characters).


The number 4 in limited4 is a footnote link:


some of the consoles will not allow you to enter more, and amongst those which do some will silently discard the excess and some will use it as the start of the next line.



Prof. Ripley called the r-devel mailing list's attention to this in August 2006, when the limit was 1024 [2]; it was then increased to the current 4095. I remember seeing a limit of 2048 (?) but couldn't find where.


Try creating a file with your command as its only content, then run


x <- readLines("rhelp.txt")
nchar(x)
# [1] 4096


You are above the limit by 1 byte.
Standard solutions are to break the command line, in your case into at least 2 lines, or to source the command from file, like David proposed.
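
A sketch of the "source the command from file" route. It assumes, as the advice above implies, that the console limit applies only to lines typed or pasted at the prompt and not to files read by source(). The identifiers g1, g2, ... are placeholders rather than the gene names from the original post, and a temporary file stands in for a real script.

```r
# Build a one-line command well over 4095 bytes from placeholder names.
ids <- paste0("g", seq_len(1000))
cmd <- paste0("MES <- c(", paste0('"', ids, '"', collapse = ", "), ")")
nchar(cmd, type = "bytes") > 4095

# Write the command to a file and source it instead of entering it
# at the console.
script <- tempfile(fileext = ".R")
writeLines(cmd, script)
source(script)

# The whole vector should have been created.
length(MES)
```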


[1] https://cran.r-project.org/doc/manuals/R-intro.html#R-commands_003b-case-sensitivity-etc
[2] https://stat.ethz.ch/pipermail/r-devel/2006-August/038985.html


Hope this helps,

Rui Barradas


At 00:36 on 13/07/2022, core_contingency wrote:
> To Whom it May Concern,
>
> I am creating a vector with the base R function "c", with many 
> arguments as shown below:
>
>       $ R
>       > MES = c("A2M", "ABRACL", "ACADVL", "ACAP2", "ACTA2", "ACTN1", 
> "ADAM19", "ADAM9", "ADAMTS5", "ADGRE5", "ADGRG6", "AEBP1", "AJUBA", 
> "ALDH1A3", "AMMECR1", "ANTXR1", "ANXA1", "ANXA2", "ANXA5", "ANXA6", 
> "APOE", "APP", "ARHGAP1", "ARHGEF40", "ARL1", "ARL4A", "ARMCX2", 
> "ARPC1B", "ASPH", "ATP10D", "ATP1B1", "ATP2B1", "ATP2B4", "ATP6V0E1", 
> "ATP8B2", "ATXN1", "B2M", "BAG3", "BGN", "BMP5", "BNC2", "BOC", 
> "BTN3A2", "C1orf198", "C1orf54", "C4orf32", "C6orf120", "CALD1", 
> "CALU", "CAPN2", "CAPN6", "CBFB", "CBLB", "CCDC80", "CD164", "CD44", 
> "CD59", "CD63", "CDH11", "CETN2", "CFH", "CFI", "CILP", "CKAP4", 
> "CLIC4", "CMTM3", "CMTM6", "CNN3", "COL11A1", "COL12A1", "COL1A1", 
> "COL27A1", "COL3A1", "COL4A1", "COL4A2", "COL5A1", "COL5A2", "COL6A1", 
> "COL6A2", "COL6A3", "COPA", "CPED1", "CPS1", "CRABP2", "CREB3L2", 
> "CREG1", "CRELD2", "CRISPLD1", "CRTAP", "CSRP1", "CTDSP2", "CTNNA1", 
> "CTSB", "CTSC", "CTSO", "CXCL12", "CYBRD1", "CYFIP1", "CYP26A1", 
> "CYR61", "DCAF6", "DDOST", "DDR2", "DESI2", "DKK3", "DLC1", "DLX1", 
> "DLX2", "DMD", "DNAJC1", "DNAJC10", "DNAJC3", "DNM3OS", "DPY19L1", 
> "DSE", "DUSP14", "DUSP5", "DUSP6", "EDEM1", "EDNRA", "EFEMP2", "EGFR", 
> "EGR1", "EGR3", "EHD2", "ELAVL1", "ELF1", "ELK3", "ELK4", "EMILIN1", 
> "EMP1", "ENAH", "EPHA3", "EPS8", "ERBIN", "ERLIN1", "ERRFI1", "ETS1", 
> "EVA1A", "EXT1", "EXTL2", "F2R", "F2RL2", "FAM102B", "FAM114A1", 
> "FAM120A", "FAM129A", "FAM3C", "FAM43A", "FAM46A", "FAT1", "FBN1", 
> "FBN2", "FGFR1", "FIBIN", "FILIP1L", "FKBP14", "FLNA", "FLRT2", 
> "FMOD", "FN1", "FNDC3B", "FSTL1", "FUCA2", "FZD1", "FZD2", "FZD7", 
> "GABRR1", "GALNT10", "GAS1", "GAS2", "GDF15", "GJA1", "GNAI1", 
> "GNG12", "GNS", "GORAB", "GPC6", "GPR137B", "GPX8", "GRN", "GSN", 
> "HES1", "HEXB", "HIBADH", "HIPK3", "HIST1H2AC", "HIST1H2BK", "HLA-A", 
> "HLA-B", "HLA-C", "HLA-F", "HLX", "HNMT", "HOMER1", "HS3ST3A1", 
> "HSP90B1", "HSPA5", "HSPB1", "HTRA1", "HYOU1", "ID1", "ID3", "IFI16", 
> "IFITM2", "IFITM3", "IGF2R", "IGFBP5", "IGFBP6", "IL13RA1", "IL6ST", 
> "INSIG1", "IQGAP2", "ITGA10", "ITGA4", "ITGAV", "ITGB1", "ITM2B", 
> "ITM2C", "ITPR1", "ITPRIPL2", "JAK1", "JAM3", "KANK2", "KCNK2", 
> "KCTD12", "KDELC2", "KDELR2", "KDELR3", "KDM5B", "KIAA1462", "KIF13A", 
> "KIRREL", "KLF10", "KLF4", "KLF6", "L3HYPDH", "LAMB1", "LAMC1", 
> "LAMP1", "LAPTM4A", "LASP1", "LATS2", "LEPROT", "LGALS1", "LHFP", 
> "LHX8", "LIFR", "LIPA", "LITAF", "LIX1L", "LMAN1", "LMNA", "LOXL2", 
> "LPP", "LRP10", "LRRC17", "LRRC8C", "LTBP1", "LUZP1", "MAGT1", 
> "MAML2", "MAN2A1", "MANF", "MBD2", "MBNL1", "MBTPS1", "MEOX1", 
> "MEOX2", "MEST", "MGAT2", "MGP", "MGST1", "MICAL2", "MMP2", "MOB1A", 
> "MRC2", "MXRA5", "MYADM", "MYDGF", "MYL12A", "MYL12B", "MYLIP", 
> "NANS", "NBR1", "NEK7", "NES", "NFIA", "NFIC", "NID1", "NID2", 
> "NOTCH2", "NOTCH2NL", "NPC2", "NPTN", "NQO1", "NR3C1", "NRP1", 
> "OGFRL1", "OLFML2A", "OLFML2B", "OLFML3", "OSTC", "P4HA1", "PALLD", 
> "PAPSS2", "PCDH18", "PCOLCE2", "PCSK5", "PDE3A", "PDE7B", "PDGFC", 
> "PDIA3", "PDIA4", "PDIA6", "PDLIM1", "PEA15", "PEAK1", "PHLDA3", 
> "PHLDB2", "PHTF2", "PIAS3", "PLAGL1", "PLEKHA2", "PLEKHH2", "PLK2", 
> "PLOD2", "PLOD3", "PLPP1", "PLS3", "PLSCR1", "PLSCR4", "PLXDC2", 
> "POLR2L", "PON2", "POSTN", "PPIB", "PPIC", "PPT1", "PRCP", "PRDM6", 
> "PRDX4", "PRDX6", "PROM1", "PRRX1", "PTBP1", "PTGER4", "PTGFRN", 
> "PTN", "PTPN14", "PTPRG", "PTPRK", "PTRF", "PXDC1", "PXDN", "PYGL", 
> "QKI", "QSOX1", "RAB13", "RAB29", "RAB31", "RAP1A", "RAP1B", "RBMS1", 
> "RCN1", "RECK", "REST", "RGL1", "RGS10", "RGS3", "RHOC", "RHOJ", 
> "RIN2", "RIT1", "RNFT1", "RNH1", "ROBO1", "ROR1", "RRBP1", "S1PR3", 
> "SASH1", "SCPEP1", "SCRG1", "SDC2", "SDC4", "SDCBP", "SDF4", 
> "SEC14L1", "SEL1L3", "SEMA3C", "SEMA3F", "SEPT10", "SERPINE2", 
> "SERPINH1", "SFT2D1", "SFT2D2", "SGK1", "SH3BGRL", "SHC1", "SHROOM3", 
> "SIX1", "SIX4", "SKIL", "SLC16A4", "SLC30A1", "SLC30A7", "SLC35F5", 
> "SLC38A2", "SLC38A6", "SLC39A14", "SMAD3", "SNAI2", "SNAP23", 
> "SOSTDC1", "SOX9", "SPARC", "SPARCL1", "SPATA20", "SPCS3", "SPRED1", 
> "SPRY1", "SPRY4", "SPRY4-IT1", "SQSTM1", "SRPX", "SSBP4", "SSR1", 
> "SSR3", "STAT1", "STAT3", "STEAP1", "STK38L", "SUCLG2", "SURF4", "SVIL", "SYDE1", "SYNJ2", "SYPL1", "TCF7L2", "TFE3", "TFPI", "TGFB1I1", "TGFBR2", "THBS1", "TIMP1", "TJP1")
>       +
>
> For some reason, the R console does not display a ">" symbol, 
> indicating that it has completed the function, but displays a "+"
> symbol instead, which indicates that the function is still waiting for more input.
> However, I believe that my syntax is correct. If I shorten my command 
> by a few characters by removing the last entry, "TJP1":
>
>       $ R
>       > MES = c("A2M", "ABRACL", "ACADVL", "ACAP2", "ACTA2", "ACTN1", 
> "ADAM19", "ADAM9", "ADAMTS5", "ADGRE5", "ADGRG6", "AEBP1", "AJUBA", 
> "ALDH1A3", "AMMECR1", "ANTXR1", "ANXA1", "ANXA2", "ANXA5", "ANXA6", 
> "APOE", "APP", "ARHGAP1", "ARHGEF40", "ARL1", "ARL4A", "ARMCX2", 
> "ARPC1B", "ASPH", "ATP10D", "ATP1B1", "ATP2B1", "ATP2B4", "ATP6V0E1", 
> "ATP8B2", "ATXN1", "B2M", "BAG3", "BGN", "BMP5", "BNC2", "BOC", 
> "BTN3A2", "C1orf198", "C1orf54", "C4orf32", "C6orf120", "CALD1", 
> "CALU", "CAPN2", "CAPN6", "CBFB", "CBLB", "CCDC80", "CD164", "CD44", 
> "CD59", "CD63", "CDH11", "CETN2", "CFH", "CFI", "CILP", "CKAP4", 
> "CLIC4", "CMTM3", "CMTM6", "CNN3", "COL11A1", "COL12A1", "COL1A1", 
> "COL27A1", "COL3A1", "COL4A1", "COL4A2", "COL5A1", "COL5A2", "COL6A1", 
> "COL6A2", "COL6A3", "COPA", "CPED1", "CPS1", "CRABP2", "CREB3L2", 
> "CREG1", "CRELD2", "CRISPLD1", "CRTAP", "CSRP1", "CTDSP2", "CTNNA1", 
> "CTSB", "CTSC", "CTSO", "CXCL12", "CYBRD1", "CYFIP1", "CYP26A1", 
> "CYR61", "DCAF6", "DDOST", "DDR2", "DESI2", "DKK3", "DLC1", "DLX1", 
> "DLX2", "DMD", "DNAJC1", "DNAJC10", "DNAJC3", "DNM3OS", "DPY19L1", 
> "DSE", "DUSP14", "DUSP5", "DUSP6", "EDEM1", "EDNRA", "EFEMP2", "EGFR", 
> "EGR1", "EGR3", "EHD2", "ELAVL1", "ELF1", "ELK3", "ELK4", "EMILIN1", 
> "EMP1", "ENAH", "EPHA3", "EPS8", "ERBIN", "ERLIN1", "ERRFI1", "ETS1", 
> "EVA1A", "EXT1", "EXTL2", "F2R", "F2RL2", "FAM102B", "FAM114A1", 
> "FAM120A", "FAM129A", "FAM3C", "FAM43A", "FAM46A", "FAT1", "FBN1", 
> "FBN2", "FGFR1", "FIBIN", "FILIP1L", "FKBP14", "FLNA", "FLRT2", 
> "FMOD", "FN1", "FNDC3B", "FSTL1", "FUCA2", "FZD1", "FZD2", "FZD7", 
> "GABRR1", "GALNT10", "GAS1", "GAS2", "GDF15", "GJA1", "GNAI1", 
> "GNG12", "GNS", "GORAB", "GPC6", "GPR137B", "GPX8", "GRN", "GSN", 
> "HES1", "HEXB", "HIBADH", "HIPK3", "HIST1H2AC", "HIST1H2BK", "HLA-A", 
> "HLA-B", "HLA-C", "HLA-F", "HLX", "HNMT", "HOMER1", "HS3ST3A1", 
> "HSP90B1", "HSPA5", "HSPB1", "HTRA1", "HYOU1", "ID1", "ID3", "IFI16", 
> "IFITM2", "IFITM3", "IGF2R", "IGFBP5", "IGFBP6", "IL13RA1", "IL6ST", 
> "INSIG1", "IQGAP2", "ITGA10", "ITGA4", "ITGAV", "ITGB1", "ITM2B", 
> "ITM2C", "ITPR1", "ITPRIPL2", "JAK1", "JAM3", "KANK2", "KCNK2", 
> "KCTD12", "KDELC2", "KDELR2", "KDELR3", "KDM5B", "KIAA1462", "KIF13A", 
> "KIRREL", "KLF10", "KLF4", "KLF6", "L3HYPDH", "LAMB1", "LAMC1", 
> "LAMP1", "LAPTM4A", "LASP1", "LATS2", "LEPROT", "LGALS1", "LHFP", 
> "LHX8", "LIFR", "LIPA", "LITAF", "LIX1L", "LMAN1", "LMNA", "LOXL2", 
> "LPP", "LRP10", "LRRC17", "LRRC8C", "LTBP1", "LUZP1", "MAGT1", 
> "MAML2", "MAN2A1", "MANF", "MBD2", "MBNL1", "MBTPS1", "MEOX1", 
> "MEOX2", "MEST", "MGAT2", "MGP", "MGST1", "MICAL2", "MMP2", "MOB1A", 
> "MRC2", "MXRA5", "MYADM", "MYDGF", "MYL12A", "MYL12B", "MYLIP", 
> "NANS", "NBR1", "NEK7", "NES", "NFIA", "NFIC", "NID1", "NID2", 
> "NOTCH2", "NOTCH2NL", "NPC2", "NPTN", "NQO1", "NR3C1", "NRP1", 
> "OGFRL1", "OLFML2A", "OLFML2B", "OLFML3", "OSTC", "P4HA1", "PALLD", 
> "PAPSS2", "PCDH18", "PCOLCE2", "PCSK5", "PDE3A", "PDE7B", "PDGFC", 
> "PDIA3", "PDIA4", "PDIA6", "PDLIM1", "PEA15", "PEAK1", "PHLDA3", 
> "PHLDB2", "PHTF2", "PIAS3", "PLAGL1", "PLEKHA2", "PLEKHH2", "PLK2", 
> "PLOD2", "PLOD3", "PLPP1", "PLS3", "PLSCR1", "PLSCR4", "PLXDC2", 
> "POLR2L", "PON2", "POSTN", "PPIB", "PPIC", "PPT1", "PRCP", "PRDM6", 
> "PRDX4", "PRDX6", "PROM1", "PRRX1", "PTBP1", "PTGER4", "PTGFRN", 
> "PTN", "PTPN14", "PTPRG", "PTPRK", "PTRF", "PXDC1", "PXDN", "PYGL", 
> "QKI", "QSOX1", "RAB13", "RAB29", "RAB31", "RAP1A", "RAP1B", "RBMS1", 
> "RCN1", "RECK", "REST", "RGL1", "RGS10", "RGS3", "RHOC", "RHOJ", 
> "RIN2", "RIT1", "RNFT1", "RNH1", "ROBO1", "ROR1", "RRBP1", "S1PR3", 
> "SASH1", "SCPEP1", "SCRG1", "SDC2", "SDC4", "SDCBP", "SDF4", 
> "SEC14L1", "SEL1L3", "SEMA3C", "SEMA3F", "SEPT10", "SERPINE2", 
> "SERPINH1", "SFT2D1", "SFT2D2", "SGK1", "SH3BGRL", "SHC1", "SHROOM3", 
> "SIX1", "SIX4", "SKIL", "SLC16A4", "SLC30A1", "SLC30A7", "SLC35F5", 
> "SLC38A2", "SLC38A6", "SLC39A14", "SMAD3", "SNAI2", "SNAP23", 
> "SOSTDC1", "SOX9", "SPARC", "SPARCL1", "SPATA20", "SPCS3", "SPRED1", 
> "SPRY1", "SPRY4", "SPRY4-IT1", "SQSTM1", "SRPX", "SSBP4", "SSR1", 
> "SSR3", "STAT1", "STAT3", "STEAP1", "STK38L", "SUCLG2", "SURF4", "SVIL", "SYDE1", "SYNJ2", "SYPL1", "TCF7L2", "TFE3", "TFPI", "TGFB1I1", "TGFBR2", "THBS1", "TIMP1")
>       >
>
> The function now works, and returns a ">" symbol, indicating that the 
> function completed successfully. The ls() function proves it:
>
>       $ R
>       > ls()
>       [1] "MES"
>
> Is this a bug in the base R "c" function? It seems like the "c" 
> function can only accept so many characters before it fails.
>
> Thank you for your time,
> core_contingency
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



