[R] Combine recursive lists in a single list or data frame and write it to file

Jim Lemon drjimlemon @ending from gm@il@com
Thu Dec 20 03:35:56 CET 2018


Hi Ek,
Look at unlist and its argument "recursive". With recursive = FALSE you
can step down through the levels of a nested list, one level per call,
to convert it to a single-level list.
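
A minimal sketch of that idea, assuming MyTables is a list of lists of
character matrices as in the str() output quoted below. The toy data, the
OutFile path and the "### Table k" separator lines are made up for
illustration and are not part of the original post.

```r
# Toy stand-in for MyTables: one element per PDF, each a list of matrices
MyTables <- list(
  list(matrix(c("a", "b", "c", "d"), nrow = 2),
       matrix(c("e", "f"), nrow = 1)),
  list(matrix(c("g", "h", "i", "j", "k", "l"), nrow = 2))
)

# Remove one level of nesting only; the matrices themselves stay intact
FlatTables <- unlist(MyTables, recursive = FALSE)

# Hypothetical output file; start from an empty file on every run
OutFile <- file.path(tempdir(), "temp.txt")
if (file.exists(OutFile)) file.remove(OutFile)

# Append each matrix to the same text file, preceded by a separator line
for (k in seq_along(FlatTables)) {
  cat("### Table", k, "\n", file = OutFile, append = TRUE)
  write.table(FlatTables[[k]], file = OutFile, append = TRUE,
              sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
```

The separator lines are only there so that the tables can be told apart
again after manual editing; any marker that does not occur in the data
would do.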

Jim

On Thu, Dec 20, 2018 at 1:33 PM Ek Esawi <esawiek using gmail.com> wrote:
>
> Thank you Bert. I don't see how unlist will help. I want to combine
> them but keep the "rectangular structure", e.g. list, data frame,
> matrix, because I want to get the tables in their original form.
> unlist converts the whole output to a single vector, unless I am
> missing something.
>
> On Wed, Dec 19, 2018 at 9:10 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > Does ?unlist not help? Why not?
> >
> > Bert
> >
> >
> > On Wed, Dec 19, 2018, 5:13 PM Ek Esawi <esawiek using gmail.com wrote:
> >>
> >> Hi All—
> >>
> >>  I am using the R tabulizer package to extract tables from pdf files.
> >> The output is a set of lists of matrices. The package extracts tables
> >> and a lot of extra stuff which is nearly impossible to clean with
> >> RegEx. So, I want to clean it manually.
> >> To do so I need to (1) combine all lists in a single list or data
> >> frame and (2) then write the single entity to a text file to edit it.
> >> I could not figure out how.
> >>
> > >> I tried something like this, but it did not work.
> >> lapply(MyTables, function(x)
> >> lapply(x,write.table(file="temp.txt",append = TRUE)))
> >>
> >>  Any help is greatly appreciated.
> >>
> >>  Here is my code:
> >>
> > >> # install.packages(c("rJava", "tabulizer"))  # run once if not installed
> > >> library(rJava)
> > >> library(tabulizer)
> > >> MyPath <- "C:/Users/name/Documents/tEMP"
> > >> ExtTable <- function(Path, CalOrd) {
> > >>   # Match only file names ending in .pdf or .PDF
> > >>   FileNames <- dir(Path, pattern = "\\.(pdf|PDF)$", full.names = TRUE)
> > >>   MyFiles <- lapply(FileNames, function(i) extract_tables(i, method = "stream"))
> > >>   if (CalOrd == "Yes") {
> > >>     # File names are assumed to start with a month name; sort by calendar month
> > >>     MyOFiles <- gsub("(\\s.*)|(\\.pdf|\\.PDF)", "", basename(FileNames))
> > >>     MyOFiles <- match(MyOFiles, month.name)
> > >>     MyFiles[order(MyOFiles)]
> > >>   } else {
> > >>     MyFiles
> > >>   }
> > >> }
> > >> MyTables <- ExtTable(Path = MyPath, CalOrd = "No")
> >>
> > >> Here is a cleaned portion of the output. The whole output consists of 3
> > >> lists, which contain 12, 15, and 12 sub-lists respectively.
> >>
> >>  [[2]][[2]]
> >>  [,1]        [,2]    [,3]    [,4]  [,5]    [,6]    [,7]    [,8]    [,9]  [,10]
> >>  [1,] ""          "Avg."  "+_ lo" "n"   "Med."  ""      "Avg."  "+_
> >> lo" "n"   "Med."
> >>  [2,] "SiOz"      "44.0"  "1.26"  "375" "44.1"  "Nb"    "4.8"   "6.3"
> >>  "58"  "2.7"
> >>  [3,] "T i O  2"  "0.09"  "0.09"  "561" "0.09"  "Mo(b)" "50"    "30"
> >>  "3"   "35"
> >>  [4,] "A1203"     "2.27"  "1.10"  "375" "2.20"  "Ru(b)" "12.4"  "4.1"
> >>  "3"   "12"
> >>  [5,] "FeO total" "8.43"  "1.14"  "375" "8.19"  "Pd(b)" "3.9"   "2.1"
> >>  "19"  "4.1"
> >>  [6,] "MnO"       "0.14"  "0.03"  "366" "0.14"  "Ag(b)" "6.8"   "8.3"
> >>  "17"  "4.8"
> >>  [7,] "MgO"       "41.4"  "3.00"  "375" "41.2"  "Cd(b)" "41"    "14"
> >>  "16"  "37"
> >>  [8,] "CaO"       "2.15"  "1.11"  "374" "2.20"  "In(b)" "12"    "4"
> >>  "19"  "12"
> >>  [9,] "Na20"      "0.24"  "0.16"  "341" "0.21"  "Sn(b)" "54"    "31"
> >>  "6"   "36"
> >> [10,] "K20"       "0.054" "0.11"  "330" "0.028" "Sb(b)" "3.9"   "3.9"
> >>  "11"  "3.2"
> >> [11,] "P205"      "0.056" "0.11"  "233" "0.030" "Te(b)" "11"    "4"
> >>  "18"  "10"
> >> [12,] "Total"     "98.88" ""      ""    "98.43" "Cs(b)" "10"    "16"
> >>  "17"  "1.5"
> >> [13,] ""          ""      ""      ""    ""      "Ba"    "33"    "52"
> >>  "75"  "17"
> >> [14,] "Mg-value"  "89.8"  "1.1"   "375" "90.0"  "La"    "2.60"  "5.70"
> >>  "208" "0.77"
> >> [15,] "Ca/AI"     "1.28"  "1.6"   "374" "1.35"  "Ce"    "6.29"  "11.7"
> >>  "197" "2.08"
> >> [16,] "AI/Ti"     "22"    "29"    "361" "22"    "Pr"    "0.56"  "0.87"
> >>  "40"  "0.21"
> >> [17,] "F e / M n" "60"    "10"    "366" "59"    "Nd"    "2.67"  "4.31"
> >>  "162" "1.52"
> >> [18,] ""          ""      ""      ""    ""      "Sm"    "0.47"  "0.69"
> >>  "214" "0.25"
> >> [19,] "Li"        "1.5"   "0.3"   "6"   "1.5"   "Eu"    "0.16"  "0.21"
> >>  "201" "0.097"
> >> [20,] "B"         "0.53"  "0.07"  "6"   "0.55"  "Gd"    "0.60"  "0.83"
> >>  "67"  "0.31"
> >> [21,] "C"         "110"   "50"    "13"  "93"    "Tb"    "0.070"
> >> "0.064" "146" "0.056"
> >> [22,] "F"         "88"    "71"    "15"  "100"   "Dy"    "0.51"  "0.35"
> >>  "58"  "0.47"
> >> [23,] "S"         "157"   "77"    "22"  "152"   "Ho"    "0.12"  "0.14"
> >>  "54"  "0.090"
> >> [24,] "C1"        "53"    "45"    "15"  "75"    "Er"    "0.30"  "0.22"
> >>  "52"  "0.28"
> >> [25,] "Sc"        "12.2"  "6.4"   "220" "12.0"  "Tm"    "0.038"
> >> "0.026" "40"  "0.035"
> >> [26,] "V"         "56"    "21"    "132" "53"    "Yb"    "0.26"  "0.14"
> >>  "201" "0.27"
> >> [27,] "Cr"        "2690"  "705"   "325" "2690"  "Lu"    "0.043"
> >> "0.023" "172" "0.045"
> >> [28,] "Co"        "112"   "10"    "166" "111"   "Hf"    "0.27"  "0.30"
> >>  "71"  "0.17"
> >> [29,] "Ni"        "2160"  "304"   "308" "2140"  "Ta"    "0.40"  "0.51"
> >>  "38"  "0.23"
> >> [30,] "Cu"        "11"    "9"     "94"  "9"     "W(b)"  "7.2"   "5.2"
> >>  "6"   "4.0"
> >> [31,] "Zn"        "65"    "20"    "129" "60"    "Re(b)" "0.13"  "0.11"
> >>  "18"  "0.09"
> >> [32,] "Ga"        "2.4"   "1.3"   "49"  "2.4"   "Os(b)" "4.0"   "1.8"
> >>  "18"  "3.7"
> >> [33,] "Ge"        "0.96"  "0.19"  "19"  "0.92"  "Ir(b)" "3.7"   "0.9"
> >>  "34"  "3.0"
> >> [34,] "As"        "0.11"  "0.07"  "7"   "0.10"  "Pt(b)" "7"     "-"
> >>  "1"   "-"
> >> [35,] "Se"        "0.041" "0.056" "18"  "0.025" "Au(b)" "0.65"  "0.53"
> >>  "30"  "0.5"
> >> [36,] "Br"        "0.01"  "0.01"  "6"   "0.01"  "Tl(b)" "1.2"   "1.0"
> >>  "13"  "0.9"
> >> [37,] "Rb"        "1,9"   "4.8"   "97"  "0.38"  "Pb"    "0.16"  "0.11"
> >>  "17"  "0.16"
> >> [38,] "Sr"        "49"    "60"    "110" "20"    "Bi(b)" "1.7"   "0.7"
> >>  "13"  "1.6"
> >> [39,] "Y"         "4.4"   "5.5"   "86"  "3.1"   "Th*"   "0.71"  "1.2"
> >>  "71"  "0.22"
> >> [40,] "Zr"        "21"    "42"    "82"  "8.0"   "U"     "0.12"  "0.23"
> >>  "48"  "0.040"
> >> [[2]][[4]]
> >> [,1]       [,2]                 [,3]     [,4]                  [,5]
> >>  [,6]
> >>  [1,] ""         "Spinel peridotites" ""       "Garnet  peridotites"
> >> ""       "Primitive"
> >>  [2,] ""         "Avg. Meal."         "M-A sp" "M-A gt B-M"
> >> "Jordan" "mantle"
> >>  [3,] "SiO 2"    "44.0 44.1"          "44.15"  "44.99 45.00"
> >> "45.55"  "44.8"
> >>  [4,] "TiO 2"    "0.09 0.09"          "0.07"   "0.06 0.08"
> >> "0.11"   "0.21"
> >>  [5,] "A1203"    "2.27 2.20"          "1.96"   "1.40 1.31"
> >> "1.43"   "4.45"
> >>  [6,] "Cr203"    "0.39 0.39"          "0.44"   "0.32 0.38"
> >> "0.34"   "0.43"
> >>  [7,] "FeOtotal" "8.43 8.19"          "8.28"   "7.89 6.97"
> >> "7.61"   "8.40"
> >>  [8,] "Mn O"     "0.14 0.14"          "0.12"   "0.11 0.13"
> >> "0.11"   "0.14"
> >>  [9,] "MgO"      "41.4 41.2"          "42.25"  "42.60 44.86"
> >> "43.55"  "37.2"
> >> [10,] "NiO"      "0.27 0.27"          "0.27"   "0.26 0.29"
> >> "-"      "0.24"
> >> [11,] "CaO"      "2.15 2.20"          "2.08"   "0.82 0.77"
> >> "1.05"   "3.60"
> >> [12,] "Na  20"   "0.24 0.21"          "0.18"   "0.11 0.09"
> >> "0.14"   "0.34"
> >> [13,] "K 2 0"    "0.054 0.028"        "0.05"   "0.04 0.10"
> >> "0.11"   "0.028"
> >> [14,] "P205"     "0.056 0.030"        "0.02"   "- 0.01"
> >> "-"      "0.022"
> >> [15,] "Total"    "99.49 99.05"        "99.87"  "98.60 100.00"
> >> "100.00" "99.86"
> >> [16,] "Mg-value" "89.8 90.0"          "90.1"   "90.6 92.0"
> >> "91.1"   "88.8"
> >> [17,] "olivine"  "62 63"              "67"     "65 68"
> >> "66"     "56 57"
> >> [18,] "opx"      "24 24"              "22"     "28 25"
> >> "28"     "22 17"
> >> [19,] "cpx"      "12 11"              "9"      "3 2"
> >> "3"      "19 10"
> >> [20,] "spinel"   "2 2"                "2"      "- -"
> >> "-"      "3 -"
> >>
> > >> Here is a portion of the output for str(MyTables):
> >>
> >> str(MyTables)
> >>
> >> List of 3
> >>  $ :List of 12
> > >>   ..$ : chr [1:3, 1:2] "south of the artificial lake Lokka. Intrusive
> >> complexes" "of alkaline rocks are found at Sokli (phosphorite-bear-"
> >> "ing and a possible Nb-occurrence) in Finland, and at" "(Eriksson,
> >> 1992). During this period, Northern Europe" ...
> >>   ..$ : chr [1:55, 1:15] "Element" "Ag" "Al" "Al_XRF" ...
> >>   ..$ : chr [1:56, 1:2] "in the till is mainly of local origin,
> >> although some cob-" "bles and boulders may have been transported over
> >> sev-" "eral kilometres. The moraine formations in the study" "area are
> >> mostly gravelly and sandy tills, locally hum-" ...
> >>   ..$ : chr [1:53, 1:2] "requisites. PCA accounts for maximum variance
> >> of all" "variables, while FA is based on the correlation structure"
> >> "of the variables. The model of factor analysis allows that" "the
> >> common factors do not explain the total variation of" ...
> >>   ..$ : chr [1:54, 1:7] "lished examples of the use of factor
> >> analysis, it is neglec-" "ted that regional geochemical (and
> >> environmental) data" "almost never follow a normal distribution.
> >> Continuing Method" "with factor analysis in such a case must lead to
> >> biased" ...
> >>   ..$ : chr [1:16, 1:2] "shows the factor loadings of the different
> >> variables" "entering each factor. Names of variables with an abso-"
> >> "lute value of the loadings <0.3 are not plotted. Fig. 5" "shows 8
> >> results of factor analyses using a selection of all" ...
> >>   ..$ : chr [1:21, 1:2] "pretable results, notwithstanding the fact
> >> that on the" "basis of the foregoing discussion it should probably
> >> not" "be used with these data. Do these results warrant the use" "of a
> >> quite work-intensive method? Unfortunately not," ...
> >>   ..$ : chr [1:55, 1:8] "" "Ag" "Al" "Al_XRF" ...
> >>   ..$ : chr [1:23, 1:2] "addition, geochemical reasoning (e.g.
> >> geochemical asso-" "ciations and/or pathfinder elements for different
> >> types of" "ore deposits) was used to select further sub-sets of vari-"
> >> "ables. In geochemistry, the selection of elements entered" ...
> >>   ..$ : chr [1:55, 1:2] "Fig. 10C cuts several geological units, and
> >> is most likely" "indicative of alteration processes related to a
> >> deep-" "seated fault. It was revealed again in a factor analysis"
> >> "carried out with all those elements extracted by aqua" ...
> >>   ..$ : chr [1:50, 1:2] "well justified in stating that it is not very
> >> scientific to" "play with the selection of elements and number of
> > >> fac-" "tors extracted until one ‘‘finds’’ an ‘‘interesting’’ result."
> > >> "On the other hand, even all the different results pre-" ...
> >>   ..$ : chr [1:24, 1:2] "Niemelä, J., Ekman, I., Lukashov, A. (Eds.),
> >> 1993. Quaternary" "Deposits of Finland and Northwestern Part of
> >> Russian Fed-" "eration and Their Resources 1:1,000,000. Geological
> >> Survey" "of Finland, Espoo, Finland." ...
> >>  $ :List of 15
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.


