[R] convert list to Dataframe

onyourmark william108 at gmail.com
Sun Nov 1 14:24:54 CET 2009


Hello. The "fields" are separated by ';'. I think the data is "rectangular"
in the sense that every row has the same number of fields (14 in the rows
shown below), although some of the fields are empty. In the dput() output
below, each row appears as a separate double-quoted string, i.e. as one
element of a character vector.
Any ideas, based on this?
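If each row really is one string of ';'-separated fields, base R's read.table() can split a character vector of such rows directly through its text argument. A minimal sketch, assuming the rows have already been extracted from the corpus as a plain character vector. The three rows are copied from the dput() output (mail line-wraps removed), and the column names are hypothetical because the post does not name the fields. Setting quote = "" and comment.char = "" matters here: the tweets contain quotation marks, apostrophes and '#' hashtags, which read.table() would otherwise treat as quoting or comment characters.

```r
# Three rows copied from the dput() output (one string per row).
rows <- c(
  "4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings  15K  Manage Holiday Rush [Black Friday] http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
  "4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored attentions  people  formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973"
)

# Hypothetical column names: the post does not say what the 14 fields are.
cols <- c("id", "time", "day", "month", "year", "user", "text",
          "location", "country", "region", "district", "field12",
          "lat", "lon")

tweets <- read.table(text = rows, sep = ";", header = FALSE,
                     quote = "",          # tweets contain ' and " characters
                     comment.char = "",   # tweets contain '#' (hashtags)
                     col.names = cols, stringsAsFactors = FALSE)
str(tweets)
```

str(tweets) should report 3 observations of 14 variables, with lat and lon read as numbers.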

Here is the end of the output of dput(twitter):

"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings  15K  Manage Holiday Rush [Black Friday] http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136", 
"4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114", 
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored attentions  people  formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973", 
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight  conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136", 
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n. kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526", 
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care reform  rural America: By Christopher Smart The health-care crisis  .. http://bit.ly/49Iqcu;London;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362", 
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters  Studio O+A: San Francisco based interior design firm Studio O+A  designed  .. http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0), DateTimeStamp = structure(list(sec = 56.4049999713898, 
    min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
    wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXt", 
"POSIXlt"), tzone = "GMT"), Description = character(0), Heading = character(0), 
ID = "1", Language = "en", LocalMetaData = list(), Origin = character(0), 
class = c("PlainTextDocument", "TextDocument", "character"))), CMetaData = structure(list(NodeID = 0, 
    MetaData = structure(list(create_date = structure(list(sec = 56.4059998989105, 
        min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
        wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", 
    "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
    ), class = c("POSIXt", "POSIXlt"), tzone = "GMT"), creator = structure("", .Names = "LOGNAME")), .Names = c("create_date", 
    "creator")), Children = NULL), .Names = c("NodeID", "MetaData", 
"Children"), class = "MetaDataNode"), DMetaData = structure(list(
    MetaID = 0), .Names = "MetaID", row.names = c(NA, -1L), class = "data.frame"), class = c("VCorpus", 
"Corpus", "list"))

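The dput() output suggests that twitter is a tm "VCorpus" (a list) holding a single document, and that this document is a character vector with one element per row. as.data.frame() has no way of knowing that ';' separates fields, so a more promising route is to pull out that character vector and check the field counts before parsing. A minimal sketch under that assumption: twitter_demo is a hypothetical hand-built stand-in with the same classes as in the dput() output (metadata omitted), the two rows come from this thread, and count_fields() is a hypothetical helper. With tm loaded, the accessor for a document's text may differ between tm versions (recent versions provide content(), for example), so check the documentation of the installed version.

```r
# Hypothetical stand-in for the object shown by dput(twitter): a list of
# class "VCorpus" holding ONE document, which is itself a character vector
# (one element per row) classed "PlainTextDocument". Metadata is omitted;
# the two row strings are copied from this thread.
doc <- structure(
  c("4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight  conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136",
    "547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\\;Someone's Daughter&gt\\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051"),
  class = c("PlainTextDocument", "TextDocument", "character")
)
twitter_demo <- structure(list(doc), class = c("VCorpus", "Corpus", "list"))

# The rows live in the first (and only) document. unclass() drops the tm
# classes and as.character() drops any remaining attributes.
rows <- as.character(unclass(twitter_demo[[1]]))

# Hypothetical helper: fields per row = number of ';' characters + 1.
count_fields <- function(x) nchar(gsub("[^;]", "", x)) + 1
fields_before <- count_fields(rows)
print(fields_before)  # [1] 14 16

# Assumption: "\;" is a literal backslash-escaped semicolon left over from
# the HTML entities &lt; and &gt;. Decode them so they no longer split fields.
rows <- gsub("&lt\\;", "<", rows, fixed = TRUE)
rows <- gsub("&gt\\;", ">", rows, fixed = TRUE)
fields_after <- count_fields(rows)
print(fields_after)   # [1] 14 14
```

The Y_T_ row from the quoted message reports 16 fields instead of 14 because its text contains the HTML entities &lt; and &gt;, whose semicolons appear as \; in the output. Once those are decoded, every row has 14 fields and the vector can be passed to read.table(text = rows, sep = ";", quote = "", comment.char = "").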



onyourmark wrote:
> 
> Hi. I have a huge list called twitter:
> 
>> dim(twitter)
> NULL
>> str(twitter)
> List of 1
>  $ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic
> [1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons
> For Governance From Campaigner-in-chief: President obama jumps  campaign
> 09  tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
> 12210;10:47:37;20;10;2009;David_Stringer;William Hague heading  Washington 
> meets  Gen. Jim Jones, Sen. John McCain  others. Will Obama team raise
> worries  EU ties?;London, England;United Kingdom;Greater
> London;Westminster;;51.5001524;-0.1262362
> 12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses
> wearing thin  Obama, media pals... http://tinyurl.com/yfw6cd9;So.
> California;USA;CA;;;36.778261;-119.4179324
> 12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama   Afghanistan
> troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
> #video;USA;USA;;;;37.09024;-95.712891 ...
>   .. ..- attr(*, "Author")= chr(0) 
>   .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31 04:46:56"
>   .. ..- attr(*, "Description")= chr(0) 
>   .. ..- attr(*, "Heading")= chr(0) 
>   .. ..- attr(*, "ID")= chr "1"
>   .. ..- attr(*, "Language")= chr "en"
>   .. ..- attr(*, "LocalMetaData")= list()
>   .. ..- attr(*, "Origin")= chr(0) 
>  - attr(*, "CMetaData")=List of 3
>   ..$ NodeID  : num 0
>   ..$ MetaData:List of 2
>   .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
>   .. ..$ creator    : Named chr ""
>   .. .. ..- attr(*, "names")= chr "LOGNAME"
>   ..$ Children: NULL
>   ..- attr(*, "class")= chr "MetaDataNode"
>  - attr(*, "DMetaData")='data.frame':   1 obs. of  1 variable:
>   ..$ MetaID: num 0
>  - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
> 
> It contains tweets in many languages. The "columns" are separated by
> semicolons. I am using the tm package, and the object is a "corpus".
> 
> It looks like this:
> 
> 547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1   day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
> 547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro computador da facool? BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226
> 547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ" опубликовали список стран со свободой слова, из 173 Грузия на 81 месте опережая Украину. Успехи,успехи...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
> 547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\;Someone's Daughter&gt\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051
> Error: invalid input '547286;06:37:18;21;10;2009;Atogey;支持你,国家需要他们,但是国家的未来不能靠他们…RT @zuola ￿我觉得 @wenyunc
> 
> I want to split it into "fields" or columns, so I thought I should
> convert it to a data frame. I tried
> 
>> twitterDF<-as.data.frame(twitter)
> Error in sort.list(y) : 
>   invalid input '547286;06:37:18;21;10;2009;Atogey;支持你,国家需要他们,但是国家的未来不能靠他们…RT @zuola ￿我觉得 @wenyunchao 一点都不乐观。真正的乐观应该是:你关我又怎么样,反正政治斗争不会丢掉性命,老子出来后更是一条好汉。北风还是舍不得*霸地位、肉、书、女人和网络的,不过牢里不会提供这些。另…;山西,浙江;China;Zhejiang;;;28.695035;119.751054' in 'utf8towcs'
>> 
> 
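The message "invalid input '...' in 'utf8towcs'" is R's way of reporting a string that is not valid UTF-8, so at least one tweet (here row 547286) seems to contain bytes that do not form valid UTF-8; the error is raised from sort.list(), which as.data.frame() ends up calling. A minimal sketch, assuming that is the cause: base R's validUTF8() (available since R 3.3.0) flags such rows so they can be inspected and dropped before converting anything. Both rows are hypothetical, and the \xff byte merely stands in for whatever is actually wrong in row 547286.

```r
# Hypothetical rows: the second contains the byte 0xFF, which never occurs
# in valid UTF-8 (a stand-in for whatever is wrong in row 547286).
rows <- c("547282;06:37:17;21;10;2009;dani_jade18;first row",
          "547286;06:37:18;21;10;2009;Atogey;broken \xff row")

ok <- validUTF8(rows)  # FALSE where the bytes are not valid UTF-8
message("rows with invalid UTF-8: ", paste(which(!ok), collapse = ", "))

# Drop the offending rows (after inspecting them) before parsing.
clean_rows <- rows[ok]
```

If dropping whole rows is too drastic, iconv() with its sub argument can replace just the offending bytes instead.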
> Can anyone suggest what I can do? 
> 
> P.S. Actually, I would love to remove all the non-English tweets, but I
> have no idea how to do that.
> 
> 
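For the P.S., a crude first pass (a heuristic, not real language detection) is to keep only tweets whose text is pure ASCII. It removes the Cyrillic, Chinese and accented Portuguese tweets shown above, but it keeps ASCII-only non-English text such as the German tweet, and it also drops English tweets that contain any non-ASCII character. A minimal sketch; the four strings are text fields from this thread (two of them shortened), with non-ASCII characters written as \u escapes. For proper language identification, an n-gram classifier such as the CRAN package textcat (not mentioned in the original thread) may be worth a look.

```r
# Text fields of four tweets from this thread (two shortened); non-ASCII
# characters are written as \u escapes so the script itself stays ASCII.
texts <- c(
  "Rachel  master chef  cook anytime!",
  "algu\u00e9m traz mais lenha pro computador da facool? BOM DIA.",
  "\u652f\u6301\u4f60",  # start of the Chinese tweet in row 547286
  "Gestern:Achso,ja okey,um 5 nach las ich jemanden komen"
)

# Crude heuristic (an assumption, not language detection): a string counts
# as "pure ASCII" if converting it to ASCII does not fail (iconv gives NA).
is_ascii <- !is.na(iconv(texts, from = "UTF-8", to = "ASCII"))
print(is_ascii)  # TRUE FALSE FALSE TRUE: the German tweet slips through
kept <- texts[is_ascii]
```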

-- 
View this message in context: http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148893.html
Sent from the R help mailing list archive at Nabble.com.



