[BioC] edgeR

Davis McCarthy dmccarthy at wehi.EDU.AU
Thu Feb 17 05:03:06 CET 2011


Hi Sridhara

This is not a problem as such; your issue should be resolved by a little more explanation of how readDGE() (an edgeR function) and read.delim() (a base R function) work.

You will see in the documentation for readDGE() 
> ?readDGE
that it accepts several named arguments plus a '...' argument, which means that any further arguments are passed on to read.delim(). In your example from the User's Guide, 'skip=5' and 'comment.char="!"' are both passed on to read.delim() in this way. 'skip=x' skips the first x lines of each file being read in. 'comment.char="!"' treats '!' as a comment character: everything from a '!' to the end of the line is ignored, so any line beginning with '!' is not read in at all. These extra arguments were needed for the dataset used in the User's Guide example, but are not needed in general. That is perhaps not clear in the User's Guide (an overhaul of it is on my TODO list).
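
To make the effect of these two arguments concrete, here is a minimal sketch in base R only. The file contents are made up for the illustration, and read_counts() is a hypothetical wrapper (it is not part of edgeR) that simply forwards '...' to read.delim(), in the same spirit as readDGE():

```r
# Made-up file contents: two junk lines, a '!' comment line,
# a header line, and two data rows.
lines <- c("junk line 1",
           "junk line 2",
           "!a comment line",
           "tag\tcount",
           "geneA\t10",
           "geneB\t20")

# Hypothetical wrapper: anything supplied via '...' goes straight to read.delim().
read_counts <- function(txt, ...) {
  read.delim(text = paste(txt, collapse = "\n"), ...)
}

# skip = 2 drops the two junk lines; comment.char = "!" drops the comment line.
x <- read_counts(lines, skip = 2, comment.char = "!")
print(x)
```

With these settings only the header and the two data rows should survive, so x should have two rows and the columns 'tag' and 'count'.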

You are missing one line even when you set 'skip=0' because one of the defaults for read.delim() is 'header=TRUE': by default, read.delim() assumes that the first line of your file is a header, uses it for the column names, and does not read it in as a row of data. You can change this behaviour by setting 'header=FALSE'.
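
A small sketch of the header behaviour, again using only base R and made-up data (a three-line file with no header row):

```r
# Made-up three-line file with no header row.
txt <- "geneA\t10\ngeneB\t20\ngeneC\t30"

with_header    <- read.delim(text = txt)                  # header = TRUE is the default
without_header <- read.delim(text = txt, header = FALSE)  # treat every line as data

nrow(with_header)     # 2: the first line was used as column names
nrow(without_header)  # 3: every line is a data row
```

In the first call the line for geneA ends up in names(with_header) rather than in the table, which is the "missing" entry you are seeing.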

See 
> ?read.delim 
for more information about the use of read.delim. 

For your example, the call
> d <- readDGE(targets, header=FALSE)
should get all of your data read into a DGEList object in your R session.
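
If you want to convince yourself that nothing has been dropped, one possible check for a single file is to compare the number of lines in the file with the number of rows read. This is only a sketch in base R: the temporary file and its contents are invented for the example, so substitute one of your own count files.

```r
# Hypothetical count file written to a temporary location for the example.
f <- tempfile(fileext = ".txt")
writeLines(c("geneA\t10", "geneB\t20", "geneC\t30"), f)

n_lines <- length(readLines(f))                  # lines in the file
n_rows  <- nrow(read.delim(f, header = FALSE))   # rows actually read

stopifnot(n_lines == n_rows)  # stops with an error if a line was dropped
unlink(f)
```

Note that this comparison only makes sense for a file without a header, comment lines or blank lines.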

Hope that explains why your issue was arising and fixes it so that you can proceed with your analysis.

Best wishes
Davis





On Feb 17, 2011, at 1:47 PM, Sridhara Gupta Kunjeti wrote:

> Hello,
> I was using edgeR to read in data and create DGEList objects.
> I followed the instructions in the edgeR User's Guide.
> 
> I noticed that not all the entries in the files were loaded into R,
> especially the first few entries.
> The steps / code that I used were:
> 
> Step 1. Created a plain text file with the following information:
> files    group    description
> F0a    PhyP18B1.txt    PhyP18    Phytophthora phaseoli
> F0b    PhyP18B2.txt    PhyP18    Phytophthora phaseoli
> P3a    PPLB3dpiB1.txt    PPLB3dpi    Phytophthora phaseoli
> P3b    PPLB3dpiB2.txt    PPLB3dpi    Phytophthora phaseoli
> 
> step 2.
>> setwd("C:/Users/SRIDHARA/Documents/test/bowtie/0_mismatch")
>> targets <- read.delim(file = "targets.txt", stringsAsFactors = FALSE)
>> targets
>                     files    group           description
> F0a   PhyP18B1.txt   PhyP18 Phytophthora phaseoli
> F0b   PhyP18B2.txt   PhyP18 Phytophthora phaseoli
> P3a PPLB3dpiB1.txt PPLB3dpi Phytophthora phaseoli
> P3b PPLB3dpiB2.txt PPLB3dpi Phytophthora phaseoli
> P6a PPLB6dpiB1.txt PPLB6dpi Phytophthora phaseoli
> P6b PPLB6dpiB2.txt PPLB6dpi Phytophthora phaseoli
> 
> Step 3.
>> d <- readDGE(targets, skip = 5, comment.char = "!")
>> d
> An object of class "DGEList"
> $samples
>             files    group           description lib.size norm.factors
> F0a   PhyP18B1.txt   PhyP18 Phytophthora phaseoli  2442435            1
> F0b   PhyP18B2.txt   PhyP18 Phytophthora phaseoli  7355562            1
> P3a PPLB3dpiB1.txt PPLB3dpi Phytophthora phaseoli   474592            1
> P3b PPLB3dpiB2.txt PPLB3dpi Phytophthora phaseoli    13778            1
> P6a PPLB6dpiB1.txt PPLB6dpi Phytophthora phaseoli  3280812            1
> P6b PPLB6dpiB2.txt PPLB6dpi Phytophthora phaseoli  3906611            1
> 
> $counts
> 
>                                                                     F0a F0b P3a P3b P6a P6b
> PITG_23029 | Pi Crinkler (CRN) family protein, pseudogene (1794 nt) 170 109   0   0  12   8
> PITG_14644 | Pi AMP-binding enzyme, putative (2568 nt)                5  46   0   0  44  44
> PITG_09824 | Pi metalloprotease family M12A, putative (1230 nt)       7  17   1   0  33   8
> 
> Here the first five entries were removed.
> When I use the following code:
>> d <- readDGE(targets, skip = 0, comment.char = "!")
> OR
>> d <- readDGE(targets)
> I noticed that the first entry is removed. The first entry has counts,
> and I want it to be taken into account for the DGE analysis.
> 
> I was wondering if I am doing something wrong, or is there a way to fix this
> problem?
> 
> Any comments or suggestions will be appreciated.
> 
> Many thanks in advance,
> Sridhara
> 
> -- 
> Sridhara G Kunjeti
> PhD Candidate
> University of Delaware
> Department of Plant and Soil Science
> email- sridhara at udel.edu
> Ph: 832-566-0011
> 

------------------------------------------------------------------------
Davis J McCarthy
Research Technician
Bioinformatics Division
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia
dmccarthy at wehi.edu.au
http://www.wehi.edu.au


