[R] select portion of text file using R

Duncan Mackay dulcalma at bigpond.com
Tue Apr 28 03:30:00 CEST 2015


Hi Luigi

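I think there may be a problem with the \t in your sample not being equivalent to a real tab character (ASCII code 9).

Therefore try the following, which splits each line on any run of whitespace instead: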

xlines <-
readLines(textConnection("* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode =
* Experiment Comments =
* Experiment File Name = F:\\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative Cт (ΔΔCт)
* Experiment User Name =
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2
Well  Cycle   Target  Name  Rn
  1   1   Adeno 1   0.82
  1   2   Adeno 1   0.93
  2   1   Adeno 2   0.78") )
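# blank out the "* ..." metadata lines, then drop the empty lines
xlines = sub("^\\*.*$","", xlines)
xlines = xlines[nchar(xlines)>0]
# remove leading whitespace and drop the header line
xlines = sub("^[[:space:]]+","", xlines)
xlines = xlines[-1]
# split each row on runs of whitespace ("Adeno 1" becomes two fields)
datc = data.frame(do.call(rbind, lapply(xlines, function(x) unlist(strsplit(x, "[[:space:]]+")))),
                  stringsAsFactors = FALSE)
names(datc) = c("Well","Cycle","Target","Name","Rn")
dat = datc
# convert the numeric columns; stringsAsFactors = FALSE avoids getting factor codes
for (j in c(1,2,4,5)) dat[,j] = as.numeric(dat[,j])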
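Whether the whitespace split is needed depends on what the separator really is. A quick check, sketched below, is to test each line for a real tab versus the literal two-character text \t; if real tabs are present, read.table(sep = "\t") can read the rows directly and keeps "Adeno 1" in one field. The vector txt and the "Well / Cycle / Target Name / Rn" layout are assumptions modelled on Luigi's sample, not his actual file.

```r
# Toy lines modelled on the sample above, assuming the data rows use
# real tab characters between fields (values are illustrative only).
txt <- c("* Chemistry = TAQMAN",
         "* Passive Reference = ROX",
         "[Amplification Data]",
         "Well\tCycle\tTarget Name\tRn",
         "1\t1\tAdeno 1\t0.82",
         "1\t2\tAdeno 1\t0.93",
         "2\t1\tAdeno 2\t0.78")

# TRUE where a line contains a real tab character (ASCII 9)
real_tab <- grepl("\t", txt, fixed = TRUE)
# TRUE where a line contains the two characters backslash + "t"
literal_tab <- grepl("\\t", txt, fixed = TRUE)
print(real_tab)
print(literal_tab)

# With real tabs, splitting on "\t" keeps "Adeno 1" as a single field,
# so there is no need to split Target and Name apart.
dat_tab <- read.table(text = txt[real_tab], header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
str(dat_tab)
```

If literal_tab is TRUE for the data rows instead, the file (or the copy pasted into the e-mail) contains the text \t rather than tabs, and a whitespace split like the one above is the safer route.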

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mackay at northnet.com.au


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Luigi Marongiu
Sent: Tuesday, 28 April 2015 07:20
To: Duncan Murdoch; r-help
Subject: Re: [R] select portion of text file using R

Dear Duncan,
thank you for your reply,
I tried to read the file using skip and nrows, but it did not work.
Here I am pasting the code I wrote and the head of the file I need to
read. Probably the error is due to the fact that the column "Well" has
duplicates, but how can I add a column with unique row names? How
can I overcome this error?
Best regards
Luigi

CODE
raw.data<-read.table(
      mydata,
      header=TRUE,
      row.names=31,
      dec=".",
      sep="\t",
      skip = 30,
      nrows = 17281,
      row.names = 1:17281
    )
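For reference, the call above passes row.names twice, which R rejects with "formal argument "row.names" matched by multiple actual arguments", and pointing row.names at a column with repeated values such as Well fails with "duplicate 'row.names' are not allowed". The minimal sketch below gives row.names only once, as NULL, which ?read.table documents as forcing plain row numbering 1..n. The file contents and the temporary path stored in mydata are hypothetical stand-ins; the skip and nrows values fit only this toy file, whereas Luigi's real call used skip = 30 and nrows = 17281, which depend on his file and are not checked here.

```r
# Hypothetical stand-in for "mydata": metadata, a section marker, then a
# tab-delimited table in which Well repeats (values taken from the sample).
lines <- c("* Chemistry = TAQMAN",
           "* Passive Reference = ROX",
           "",
           "[Amplification Data]",
           "",
           "Well\tCycle\tTarget Name\tRn",
           "1\t1\tAdeno 1\t0.82",
           "1\t2\tAdeno 1\t0.93",
           "2\t1\tAdeno 2\t0.78")
mydata <- tempfile(fileext = ".txt")
writeLines(lines, mydata)

raw.data <- read.table(
  mydata,
  header = TRUE,
  dec = ".",
  sep = "\t",
  skip = 5,            # lines before the header row in this toy file
  nrows = 3,           # number of data rows to read after the header
  row.names = NULL,    # given only once; rows are numbered 1..n
  stringsAsFactors = FALSE
)
print(raw.data)
unlink(mydata)
```

Because the row names are just 1..n, the repeated Well values stay in an ordinary column and no extra "unique ID" column is required.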


HEAD OF MYDATA
* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode =
* Experiment Comments =
* Experiment File Name = F:\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative Cт (ΔΔCт)
* Experiment User Name =
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2

[Amplification Data]

Well \tCycle \tTarget \tName \tRn
\t1 \t1 \tAdeno 1 \t0.82
\t1 \t2 \tAdeno 1 \t0.93
...
\t2 \t1 \tAdeno 2 \t0.78
...

On Mon, Apr 20, 2015 at 12:17 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 20/04/2015 3:28 AM, Luigi Marongiu wrote:
>> Dear all,
>> I have a flat file (tab delimited) derived from an Excel file which is
>> subdivided into different parts: the first part reports metadata,
>> then there is a first spreadsheet indicated by [ ], then its data,
>> then a second spreadsheet in the same [ ] format, and then its
>> data.
>> How can I import such a file using, for instance, read.table()?
>
> read.table() by itself can't recognize where the data starts, but it has
> arguments "skip" and "nrows" to control how much gets read.  If you
> don't know the values for those arguments, you can use readLines() to
> read the entire file, then use grep() to recognize your table data, and
> either re-read the file, or just extract those lines and read from them
> as a textConnection.
>
> Duncan Murdoch
>
>> Many thanks
>> regards
>> Luigi
>>
>> Here is a sample of the file:
>> * Experiment Barcode =
>> * Experiment Comments =
>> * Experiment File Name = F:\array 59
>> * Experiment Name = 2015-04-13 171216
>> * Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
>> ...
>> [Amplification Data]
>> Well    Cycle    Target Name    Rn    Delta Rn
>> 1    1    Adeno 1-Adeno 1    0.820    -0.051
>> 1    2    Adeno 1-Adeno 1    0.827    -0.042
>> 1    3    Adeno 1-Adeno 1    0.843    -0.025
>> 1    4    Adeno 1-Adeno 1    0.852    -0.015
>> 1    5    Adeno 1-Adeno 1    0.858    -0.008
>> 1    6    Adeno 1-Adeno 1    0.862    -0.002
>> ...
>> [Results]
>> Well    Well Position    Omit    Sample Name    Target Name    Task
>> Reporter    Quencher    RQ    RQ Min    RQ Max    CT    Ct Mean    Ct
>> SD    Quantity    Delta Ct Mean    Delta Ct SD    Delta Delta Ct
>> Automatic Ct Threshold    Ct Threshold    Automatic Baseline
>> Baseline Start    Baseline End    Efficiency    Comments    Custom1
>> Custom2    Custom3    Custom4    Custom5    Custom6    NOAMP
>> EXPFAIL
>> 1    A1    false    P17    Adeno 1-Adeno 1    UNKNOWN    FAM
>> NFQ-MGB                Undetermined                            false
>>  0.200    true    3    44    1.000    N/A                            N
>>    Y
>> 2    A2    false    P17    Adeno 40/41 EH-AIQJCT3    UNKNOWN    FAM
>> NFQ-MGB                Undetermined
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
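Duncan Murdoch's suggestion (readLines(), then grep() to locate the table, then read from those lines through a text connection) can be sketched as follows. The helper read_section() is hypothetical, not part of R, and assumes each table is introduced by a line such as "[Amplification Data]" and ends at the next "[...]" line or at the end of the file, as in the sample quoted above. The toy lines are shortened from that sample; for the real file, lines would come from readLines() on the file path. read.table(text = ...) reads its input through a text connection.

```r
# Toy version of the layout quoted above: metadata, then sections that
# start with a "[Name]" marker line (values shortened from the sample).
lines <- c("* Experiment Name = 2015-04-13 171216",
           "[Amplification Data]",
           "Well\tCycle\tTarget Name\tRn\tDelta Rn",
           "1\t1\tAdeno 1-Adeno 1\t0.820\t-0.051",
           "1\t2\tAdeno 1-Adeno 1\t0.827\t-0.042",
           "[Results]",
           "Well\tWell Position\tOmit",
           "1\tA1\tfalse")

# Hypothetical helper: read the table that follows "[section]", stopping
# at the next "[...]" marker line or at the end of the input.
read_section <- function(lines, section) {
  trimmed <- trimws(lines)
  start <- which(trimmed == paste0("[", section, "]"))
  if (length(start) != 1) stop("expected exactly one marker for: ", section)
  markers <- grep("^\\[.*\\]$", trimmed)      # all section marker lines
  later <- markers[markers > start]
  end <- if (length(later) > 0) later[1] - 1 else length(lines)
  body <- lines[start + seq_len(end - start)]  # lines start+1 .. end
  body <- body[nzchar(trimws(body))]           # drop blank lines
  read.table(text = body, header = TRUE, sep = "\t",
             stringsAsFactors = FALSE, check.names = FALSE)
}

amp <- read_section(lines, "Amplification Data")
res <- read_section(lines, "Results")
print(amp)
print(res)
```

Because the section boundaries are found from the marker lines, skip and nrows do not have to be hard-coded and recounted whenever the number of wells or cycles changes.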
