[R] Novice question about getting data into R

Petr PIKAL petr.pikal at precheza.cz
Thu Jun 21 15:21:00 CEST 2012


Hi

I can read the example you provided without much problem.
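Below is a minimal sketch of one way to find the lines that make scan() stop with "line ... did not have 10 elements". It counts the whitespace-separated fields on every line with count.fields() and flags any line that does not have 10. The sketch makes several assumptions: the file is whitespace-separated UTF-8 text with no blank lines, the temporary file and its contents are hypothetical stand-ins for the real .pu file, and the record split over two lines is an invented example of how a line can end up short.

```r
# Hypothetical stand-in for the .pu file: header plus three records, with
# record 10 broken over two lines. \u escapes keep this script plain ASCII.
path <- tempfile(fileext = ".pu")
writeLines(c(
  "n   start   dur   pause   par   ins   del   sid   tid   str",
  "0   11185   1   28344   0   2   0   1   1   \u5c3d\u7ba1",
  "10   136033   1202   1309   100.00   5   0   -1+4+5   15+16+17   \u8bf4\u8f9e\u662f",
  "\u53ef\u4ee5",
  "11   138544   468   3682   100.00   3   0   -1   18+19   \u7406\u89e3\u7684"
), path, useBytes = TRUE)   # write the UTF-8 bytes unchanged

# Number of whitespace-separated fields on each line (blank lines are skipped,
# hence the no-blank-lines assumption when matching against readLines() below).
nf <- count.fields(path, sep = "")
bad <- which(nf != 10)
bad                         # 4: the stray tail of record 10

# Glue every short line onto the line before it, then read the repaired text.
raw <- readLines(path, encoding = "UTF-8")
rec <- cumsum(nf == 10)     # record number of each physical line
fixed <- vapply(split(raw, rec), paste, character(1), collapse = "")
fixed_path <- tempfile(fileext = ".pu")
writeLines(fixed, fixed_path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8; it does not re-encode them.
pu <- read.table(fixed_path, header = TRUE, encoding = "UTF-8",
                 stringsAsFactors = FALSE)
str(pu)
```

Gluing short lines back onto the previous record is only appropriate if the short lines really are wrapped continuations. If they turn out to be something else, such as stray header or footer lines, simply dropping them with raw[nf == 10] may be the better repair.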

dput(head(test))
structure(list(n = 0:5, X = c(NA, NA, NA, NA, NA, NA), start = c(11185L, 
39530L, 40544L, 109684L, 114629L, 118841L), X.1 = c(NA, NA, NA, 
NA, NA, NA), dur = c(1L, 2L, 1L, 1L, 0L, 1L), X.2 = c(NA, NA, 
NA, NA, NA, NA), pause = c(28344L, 1012L, 69139L, 4944L, 4212L, 
2558L), X.3 = c(NA, NA, NA, NA, NA, NA), par = c(0, 100, 100, 
100, 0, 100), X.4 = c(NA, NA, NA, NA, NA, NA), ins = c(2L, 3L, 
2L, 2L, 1L, 2L), X.5 = c(NA, NA, NA, NA, NA, NA), del = c(0L, 
0L, 0L, 0L, 0L, 0L), X.6 = c(NA, NA, NA, NA, NA, NA), sid = structure(c(10L, 
13L, 16L, 1L, 11L, 12L), .Label = c(" -1", " -1+11+13+15", " -1+110", 
" -1+16", " -1+26+29", " -1+27+30", " -1+32", " -1+4+5", " -1+48", 
" 1", " 17", " 18+19", " 2", " 20", " 28", " 3", " 36", " 37", 
" 38", " 42", " 43", " 45", " 49", " 50", " 53", " 54", " 58", 
" 59", " 61+64"), class = "factor"), X.7 = c(NA, NA, NA, NA, 
NA, NA), tid = structure(c(1L, 6L, 20L, 30L, 38L, 39L), .Label = c(" 1", 
" 10+11+12", " 13+14", " 15+16+17", " 18+19", " 2+3", " 20", 
" 21", " 22", " 23", " 24+25", " 26", " 27+28+29", " 30+31+32", 
" 33+34", " 35", " 36+37", " 38", " 39", " 4", " 40", " 41", 
" 42", " 43", " 44+45", " 46", " 47", " 48", " 49", " 5", " 50", 
" 51", " 52+93", " 53", " 54", " 55", " 56", " 6", " 7", " 8", 
" 9"), class = "factor"), X.8 = c(NA, NA, NA, NA, NA, NA), str = structure(c(5L, 
6L, 5L, 5L, 4L, 5L), .Label = c(" ,", " ,_", " .", " ・", " ・・", 
" ・・・", " ・・・.", " ・・・・", " ・・・・・"), class = "factor")), .Names = c("n", 
"X", "start", "X.1", "dur", "X.2", "pause", "X.3", "par", "X.4", 
"ins", "X.5", "del", "X.6", "sid", "X.7", "tid", "X.8", "str"
), row.names = c(NA, 6L), class = "data.frame")

Only the Chinese characters are missing, and some extra (empty) columns appear:

> str(test)
'data.frame':   41 obs. of  19 variables:
 $ n    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ X    : logi  NA NA NA NA NA NA ...
 $ start: int  11185 39530 40544 109684 114629 118841 121400 128201 129793 131852 ...
 $ X.1  : logi  NA NA NA NA NA NA ...
 $ dur  : int  1 2 1 1 0 1 1 1 436 608 ...
 $ X.2  : logi  NA NA NA NA NA NA ...
 $ pause: int  28344 1012 69139 4944 4212 2558 6800 1591 1623 3573 ...
 $ X.3  : logi  NA NA NA NA NA NA ...
 $ par  : num  0 100 100 100 0 100 100 100 0 100 ...
 $ X.4  : logi  NA NA NA NA NA NA ...
 $ ins  : int  2 3 2 2 1 2 2 2 3 3 ...
 $ X.5  : logi  NA NA NA NA NA NA ...
 $ del  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ X.6  : logi  NA NA NA NA NA NA ...
 $ sid  : Factor w/ 29 levels " -1"," -1+11+13+15",..: 10 13 16 1 11 12 1 1 2 4 ...
 $ X.7  : logi  NA NA NA NA NA NA ...
 $ tid  : Factor w/ 41 levels " 1"," 10+11+12",..: 1 6 20 30 38 39 40 41 2 3 ...
 $ X.8  : logi  NA NA NA NA NA NA ...
 $ str  : Factor w/ 9 levels " ,"," ,_"," .",..: 5 6 5 5 4 5 5 5 6 6 ...
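The all-NA logical columns X, X.1, ..., X.8 in the str() output above look like empty fields sitting between separators. One plausible cause, though it is only an assumption about how the file was read, is a single-character separator such as sep = "\t" applied to fields that are separated by two tabs. The sketch below reproduces this on a made-up three-column sample and shows two ways around it. The first is to read with the default sep = "", where any run of spaces or tabs counts as a single separator. The second is to drop the all-NA columns after reading.

```r
# Hypothetical sample in which fields are separated by two tabs.
txt <- c("n\t\tstart\t\tdur", "0\t\t11185\t\t1", "1\t\t39530\t\t2")

# A single-character separator turns every doubled tab into an empty field;
# the unnamed header fields become X, X.1, ... and the columns are all NA.
wide <- read.table(text = txt, header = TRUE, sep = "\t")
names(wide)      # "n" "X" "start" "X.1" "dur"

# Option 1: default sep = "", so any run of whitespace is one separator.
narrow <- read.table(text = txt, header = TRUE)
names(narrow)    # "n" "start" "dur"

# Option 2: keep only the columns that have at least one non-missing value.
trimmed <- wide[, colSums(!is.na(wide)) > 0, drop = FALSE]
names(trimmed)   # "n" "start" "dur"
```

Option 1 relies on the text columns (sid, tid, str) containing no spaces. The factor levels above suggest this holds for the posted sample, but it is worth checking on the full file.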

> sessionInfo()
R Under development (unstable) (2012-03-03 r58569)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250 
[3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C  
[5] LC_TIME=Czech_Czech Republic.1250 
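The Czech_Czech Republic.1250 locale above uses a Central European code page that contains no Chinese characters, so the console cannot print them. In the str() output they apparently show up as one ・ per character. You can check whether the characters themselves survived without printing them, for example by counting characters and looking at their Unicode code points. The small sketch below uses the first str value of the sample (尽管) as an illustrative test string.

```r
# First 'str' value of the sample, written with \u escapes so the script
# itself is plain ASCII (an illustrative value, not read from the file).
s <- "\u5c3d\u7ba1"

nchar(s, type = "chars")   # 2 characters
nchar(s, type = "bytes")   # 6 bytes: each of these characters takes 3 in UTF-8
utf8ToInt(s)               # 23613 31649, i.e. U+5C3D and U+7BA1

# The locale R is running in; CJK text can only be printed if it supports it.
Sys.getlocale("LC_CTYPE")
```

If the counts and code points come out right, the data were read correctly and only the display is affected. In that case, choosing a different encoding when reading will not make the characters appear in this console.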

Regards
Petr

> Dear Professor Dalgaard,
> 
> I am beginning to take part in a research project on statistical modelling
> of translators' activity data. I recently installed R and tried to generate
> a Translation Progress Graph, as my colleagues do (with success), but on my
> Windows platform I got the error below. According to the R FAQ it seems to
> be a very common error, but as I am not familiar with either R or ProGra,
> could you help me? Please!
> 
> Note: the Translation Progress Graph is composed of quintuple data
> {S, T, A, F, K} for Source Text, Target Text, Alignment, Fixation and
> Keyboard data, respectively.
> 
> 
> >ReadData("C:/Users/schmaltz/Dropbox/EN-CH/proGra/EN-ZH_P2_T4_T2")
> Reading Fixation Units:
> C:/Users/schmaltz/Dropbox/EN-CH/proGra/EN-ZH_P2_T4_T2 .fu
> Reading Production Units:
> C:/Users/schmaltz/Dropbox/EN-CH/proGra/EN-ZH_P2_T4_T2 .pu
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : 
> line 38 did not have 10 elements
> 
> Note: we tried deleting line 38, but then the program reports the error on
> another line. Even if we delete all the lines after it, the same error
> occurs. I think it is not an encoding error, due to the fact that my
> colleague uses Linux and I use Windows.
> 
> Sample of the file above:
> n   start   dur   pause   par   ins   del   sid   tid   str
> 0   11185   1   28344   0   2   0   1   1   尽管
> 1   39530   2   1012   100.00   3   0   2   2+3   发展中
> 2   40544   1   69139   100.00   2   0   3   4   国家
> 3   109684   1   4944   100.00   2   0   -1   5   关于
> 4   114629   0   4212   0   1   0   17   6   为
> 5   118841   1   2558   100.00   2   0   18+19   7   贫困
> 6   121400   1   6800   100.00   2   0   -1   8   人民
> 7   128201   1   1591   100.00   2   0   -1   9   争取
> 8   129793   436   1623   0   3   0   -1+11+13+15   10+11+12   更好的
> 9   131852   608   3573   100.00   3   0   -1+16   13+14   生活的
> 10   136033   1202   1309   100.00   5   0   -1+4+5   15+16+17   说辞是可以
> 11   138544   468   3682   100.00   3   0   -1   18+19   理解的
> 12   142694   359   10811   0   2   0   20   20   ,_
> 13   153864   0   2121   0   1   0   -1   21   但
> 14   155985   1   2838   100.00   2   0   -1   22   其实
> 15   158824   1   1435   100.00   2   0   -1   23   保护
> 16   160260   421   3619   87.65   3   0   -1   24+25   环境和
> 17   164300   1   1075   100.00   2   0   28   26   经济
> 18   165376   1108   1030   100.00   4   0   -1+26+29   27+28+29   发展是不
> 19   167514   1466   8440   54.98   4   0   -1+27+30   30+31+32   冲突的.
> 20   177420   906   4023   100.00   4   0   -1+32   33+34   我们必须
> 21   182349   1   1622   100.00   2   0   36   35   鼓励
> 22   183972   2   1573   100.00   3   0   37   36+37   发展中
> 23   185547   1   15381   100.00   2   0   38   38   国家
> 24   200929   1   1934   100.00   2   0   42   39   扩展
> 25   202864   1   5864   100.00   2   0   43   40   绿色
> 26   208729   1   4383   100.00   2   0   -1   41   植被
> 27   213113   0   1497   0   1   0   45   42   ,
> 28   214610   1   2963   100.00   2   0   -1   43   发展
> 29   217574   906   5085   100.00   4   0   -1+48   44+45   节能科技
> 30   223565   0   1575   0   1   0   49   46   ,
> 31   225140   1   2683   100.00   2   0   50   47   并且
> 32   227824   1   2136   100.00   2   0   53   48   帮助
> 33   229961   1   6613   100.00   2   0   -1   49   它们
> 34   236575   1   6068   100.00   2   0   54   50   减少
> 35   242644   1   2635   100.00   2   0   -1   51   环境
> 36   245280   343   8315   100.00   3   0   -1+110   52+93   污染和
> 37   253938   1   1653   100.00   2   0   -1   53   破坏
> 38   255592   0   25381   0   1   0   58   54   .
> 39   280973   1   1809   100.00   2   0   59   55   一些
> 40   282783   1   16878   100.00   2   0   61+64   56   国家
> 
> Thank you very much!
> 
> Marcia
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Novice-question-about-getting-data-into-R-tp866806p4633954.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


