[R] select portion of text file using R

Luigi Marongiu marongiu.luigi at gmail.com
Mon Apr 27 23:20:10 CEST 2015


Dear Duncan,
thank you for your reply.
I tried to read the file using skip and nrows, but it did not work.
Below I am pasting the code I wrote and the head of the file I need to
read. The error is probably due to the "Well" column containing
duplicated values, but how can I add a column with unique row names? How
can I overcome this error?
Best regards,
Luigi

CODE
raw.data<-read.table(
      mydata,
      header=TRUE,
      row.names=31,
      dec=".",
      sep="\t",
      skip = 30,
      nrows = 17281,
      row.names = 1:17281
    )
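
One thing that stands out in the call above is that row.names is passed twice (row.names=31 and row.names = 1:17281). R rejects a call in which the same formal argument is matched by more than one actual argument, so this fails before any line of the file is read. A minimal sketch of a call that supplies it once follows. It assumes automatic row numbering is acceptable, since the repeated "Well" values cannot serve as row names. The temporary file and its shortened contents are a hypothetical stand-in for the real export, so skip and nrows are adjusted to match the toy file, not the real one.

```r
# Hypothetical stand-in for the real export: 3 metadata lines, a blank line,
# the section marker, then a tab-delimited table.
sample_lines <- c(
  "* Chemistry = TAQMAN",
  "* Instrument Type = ViiA 7",
  "* Passive Reference = ROX",
  "",
  "[Amplification Data]",
  "Well\tCycle\tTarget Name\tRn",
  "1\t1\tAdeno 1\t0.82",
  "1\t2\tAdeno 1\t0.93",
  "2\t1\tAdeno 2\t0.78"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

raw.data <- read.table(
  path,
  header = TRUE,
  sep = "\t",
  dec = ".",
  skip = 5,            # lines before the header row in this toy file
  nrows = 3,           # data rows to read (17281 in the original call)
  row.names = NULL,    # supplied once; NULL forces automatic row numbering
  check.names = FALSE  # keep "Target Name" instead of "Target.Name"
)
```

With row.names = NULL the repeated "Well" values stay in an ordinary column and the rows are simply numbered from 1.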


HEAD OF MYDATA
* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode =
* Experiment Comments =
* Experiment File Name = F:\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative Cт (ΔΔCт)
* Experiment User Name =
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2

[Amplification Data]

Well \tCycle \tTarget Name \tRn
\t1 \t1 \tAdeno 1 \t0.82
\t1 \t2 \tAdeno 1 \t0.93
...
\t2 \t1 \tAdeno 2 \t0.78
...
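
When the number of metadata lines is not known in advance, the readLines()/grep() approach suggested in the quoted reply below can be sketched roughly as follows. This is only an illustration under stated assumptions: read_section() is a hypothetical helper (not part of R), each section is assumed to start with a line such as "[Amplification Data]" and to run until the next "[...]" line or the end of the file, and the toy lines stand in for readLines(mydata).

```r
# Hypothetical helper: extract one "[Section]" block from the lines of the
# export and parse it as a tab-delimited table.
read_section <- function(lines, section) {
  # Positions of every "[...]" marker line
  markers <- grep("^\\[.+\\][[:space:]]*$", lines)
  start <- markers[trimws(lines[markers]) == paste0("[", section, "]")]
  if (length(start) != 1L) stop("section not found exactly once: ", section)
  # The block ends just before the next marker, or at the end of the file
  later <- markers[markers > start]
  end <- if (length(later) > 0L) later[1L] - 1L else length(lines)
  body <- lines[seq(start + 1L, length.out = end - start)]
  body <- body[nzchar(trimws(body))]  # drop blank lines
  read.table(text = body, header = TRUE, sep = "\t",
             row.names = NULL, check.names = FALSE,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Toy input; for the real file this would be: lines <- readLines(mydata)
lines <- c(
  "* Chemistry = TAQMAN",
  "* Passive Reference = ROX",
  "",
  "[Amplification Data]",
  "Well\tCycle\tTarget Name\tRn",
  "1\t1\tAdeno 1\t0.82",
  "1\t2\tAdeno 1\t0.93",
  "2\t1\tAdeno 2\t0.78",
  "",
  "[Results]",
  "Well\tWell Position\tCT",
  "1\tA1\tUndetermined",
  "2\tA2\tUndetermined"
)

amp <- read_section(lines, "Amplification Data")
res <- read_section(lines, "Results")
```

Locating the markers with grep() avoids hard-coding skip and nrows, so the same helper can read both the amplification data and the results from one pass over the file.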

On Mon, Apr 20, 2015 at 12:17 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 20/04/2015 3:28 AM, Luigi Marongiu wrote:
>> Dear all,
>> I have a flat file (tab delimited) derived from an excel file which is
>> subdivided in different parts: a first part is reporting metadata,
>> then there is a first spreadsheet indicated by [ ], then the actual
>> data and the second spreadsheet with the same format [ ] and then the
>> data.
>> How can I import such file using for instance read.table()?
>
> read.table() by itself can't recognize where the data starts, but it has
> arguments "skip" and "nrows" to control how much gets read.  If you
> don't know the values for those arguments, you can use readLines() to
> read the entire file, then use grep() to recognize your table data, and
> either re-read the file, or just extract those lines and read from them
> as a textConnection.
>
> Duncan Murdoch
>
>> Many thanks
>> regards
>> Luigi
>>
>> Here is a sample of the file:
>> * Experiment Barcode =
>> * Experiment Comments =
>> * Experiment File Name = F:\array 59
>> * Experiment Name = 2015-04-13 171216
>> * Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
>> ...
>> [Amplification Data]
>> Well    Cycle    Target Name    Rn    Delta Rn
>> 1    1    Adeno 1-Adeno 1    0.820    -0.051
>> 1    2    Adeno 1-Adeno 1    0.827    -0.042
>> 1    3    Adeno 1-Adeno 1    0.843    -0.025
>> 1    4    Adeno 1-Adeno 1    0.852    -0.015
>> 1    5    Adeno 1-Adeno 1    0.858    -0.008
>> 1    6    Adeno 1-Adeno 1    0.862    -0.002
>> ...
>> [Results]
>> Well    Well Position    Omit    Sample Name    Target Name    Task
>> Reporter    Quencher    RQ    RQ Min    RQ Max    CT    Ct Mean    Ct
>> SD    Quantity    Delta Ct Mean    Delta Ct SD    Delta Delta Ct
>> Automatic Ct Threshold    Ct Threshold    Automatic Baseline
>> Baseline Start    Baseline End    Efficiency    Comments    Custom1
>> Custom2    Custom3    Custom4    Custom5    Custom6    NOAMP
>> EXPFAIL
>> 1    A1    false    P17    Adeno 1-Adeno 1    UNKNOWN    FAM
>> NFQ-MGB                Undetermined                            false
>>  0.200    true    3    44    1.000    N/A                            N
>>    Y
>> 2    A2    false    P17    Adeno 40/41 EH-AIQJCT3    UNKNOWN    FAM
>> NFQ-MGB                Undetermined
>>
>>
>


