[BioC] Problems selecting rows from dataframe (exprs) of GNF Atlas data....

Bas Jansen bjhjansen at gmail.com
Tue Jan 3 19:23:25 CET 2012


Hi Sean (and Axel and Sebastian!):

My apologies for the typo, you were right. That said, I'm now pretty
confident I did the right thing all along, but I have been fooled by
the article, and a bit confused by the GEO entries associated with it
(GSE vs GDS etc). From the PNAS study (see: A gene atlas of the mouse
and human protein-encoding transcriptomes. Su AI, Wiltshire T, Batalov
S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G,
Cooke MP, Walker JR, Hogenesch JB. Proc Natl Acad Sci U S A. 2004 Apr
20;101(16):6062-7.) it is clear that both Affy HG-U133A and the custom
array GNF1H have been used. The probesets that have been mapped in the
UCSC Genome Browser include *all* data, of both platforms. Of course,
the list of probes I have has been derived from that source. Anyway,
at first I assumed that they used a 'hybrid' array, consisting of both
the HG-U133A and their own probe sets, and called them collectively
GNF1H. I have now figured out that this is not the case, and that I
have been looking for HG-U133A probesets in their custom arrays. So
far I have therefore analyzed only half of the data. My bad, and I
apologize for wasting your time.
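
For anyone who finds this thread later, here is a minimal sketch of a
check that separates the two failure modes discussed below: probe IDs
that are not in the row names at all (typos), and probe IDs that are
present but carry only NA values (the other platform's probesets). The
matrix, the sample names and the object names expr_mat and wanted are
made up for illustration; only the row-name patterns mirror the thread.

```r
# Toy stand-in for exprs(eset). The numbers are invented; the all-NA row
# imitates a probeset whose values are "null" in the GEO record.
expr_mat <- matrix(
  c(NA,  NA,
    5.8, 7.7,
    6.3, 6.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("200000_s_at", "gnf1h00499_at", "gnf1h00500_at"),
    c("sampleA", "sampleB")
  )
)

# Hypothetical wish list: one ID from each platform plus one mistyped ID.
wanted <- c("200000_s_at", "gnf1h00499_at", "gnf1h500_at")

# IDs that do not occur in the row names at all (typos, wrong platform).
missing_ids <- setdiff(wanted, rownames(expr_mat))

# IDs that occur, split into "only NA values" and "has real values".
present_ids <- intersect(wanted, rownames(expr_mat))
all_na <- rowSums(!is.na(expr_mat[present_ids, , drop = FALSE])) == 0
empty_ids <- present_ids[all_na]

# Keep only the rows that carry at least one measurement.
usable <- expr_mat[present_ids[!all_na], , drop = FALSE]
```

Inspecting missing_ids and empty_ids before subsetting makes it obvious
whether an NA row comes from a misspelled ID or from the data itself.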

Kind regards,
Bas

On Tue, Jan 3, 2012 at 3:55 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
> On Tue, Jan 3, 2012 at 9:37 AM, Bas Jansen <bjhjansen at gmail.com> wrote:
>>
>> Hi Axel, hi Sebastian:
>>
>> Thanks for the cookie, Axel. Anyway, I have done the following:
>>
>> > exprs <- as.data.frame(exprs(eset))
>> > rownames(exprs)
>>    [1] "200000_s_at"                 "200001_at"
>>    [3] "200002_at"                   "200003_s_at"
>>    [5] "200004_at"                   "200005_at"
>>    [7] "200006_at"                   "200007_at"
>>    [9] "200008_s_at"                 "200009_at"
>>   [11] "200010_at"                   "200011_s_at"
>>   [13] "200012_x_at"                 "200013_at"
>>   [15] "200014_s_at"                 "200015_s_at"
>>   [17] "200016_x_at"                 "200017_at"
>> etc.
>>
>
> Hi, Bas.
>
> These are recognized as rownames, yes.  However, if you look at the original
> data from GEO, you will see that these all have "null" for the value; these
> "null" values become NAs in R.  So, if you are concerned about rows of NAs
> when selecting these rownames, you should not be, as this is the correct
> result.
>
> See my note below about your original question, also.
>
>>
>> So I would argue that the 'numbers' are recognized as rownames here,
>> but I cannot select them as indicated in a previous email. Strange,
>> isn't it?
>> I still need to try Sebastian's suggestions though, so let's not run
>> off the cliff just yet. Below the sessionInfo.
>>
>> Kind regards,
>> Bas
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C/UTF-8/C/C/C/C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] fortunes_1.4-2 Biobase_2.14.0
>>
>> loaded via a namespace (and not attached):
>> [1] tcltk_2.14.0 tools_2.14.0
>>
>>
>> On Tue, Jan 3, 2012 at 2:33 PM,  <axel.klenk at actelion.com> wrote:
>> > Dear Bas,
>> >
>> > I think you'll need to show us your original code, in particular what
>> > your 'exprs' is and how you have obtained it. If you have "extracted
>> > the expression values" from an ExpressionSet ES like
>> >
>> > x <- exprs(ES)
>> >
>> > then x is a matrix and not a data.frame -- but then your output would
>> > look slightly different. If you have done something like
>> >
>> > x <- data.frame(exprs(ES))
>> >
>> > I can reproduce your output, including rows that are all NA -- for
>> > rownames that do not exist.
>> >
>> > So: how did you create 'exprs' and are you sure your rownames are ok?
>> >
>> > Cheers,
>> >
>> >  - axel
>> >
>> >
>> > BTW: try
>> >
>> > install.packages("fortunes")
>> > library("fortunes")
>> > fortune("dog")
>> >
>> > to see why 'exprs' may not be a good name for your object... :-)
>> >
>> >
>> >
>> > Axel Klenk
>> > Research Informatician
>> > Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil /
>> > Switzerland
>> >
>> >
>> >
>> >
>> > From:
>> > Bas Jansen <bjhjansen at gmail.com>
>> > To:
>> > Sebastian Thieme <thieme at mi.fu-berlin.de>
>> > Cc:
>> > bioconductor at r-project.org
>> > Date:
>> > 03.01.2012 13:48
>> > Subject:
>> > Re: [BioC] Problems selecting rows from dataframe (exprs) of GNF Atlas
>> > data....
>> > Sent by:
>> > bioconductor-bounces at r-project.org
>> >
>> >
>> >
>> > Dear Sebastian:
>> >
>> > Thanks for your swift reply. It works, but only for the probe IDs that
>> > start with a character (only ~15 out of the > 100 probe IDs I want to
>> > investigate). Those that start with a number report back with "<0
>> > rows> (or 0-length row.names)". The motto for the New Year seems to be
>> > 'Solve a problem, only to find new ones'. Phew.
>> >
>> > Kind regards,
>> > Bas
>> >
>> > On Tue, Jan 3, 2012 at 11:19 AM, Sebastian Thieme
>> > <thieme at mi.fu-berlin.de> wrote:
>> >> Hello,
>> >>
>> >> happy new year too =)
>> >>
>> >> you can use exprs[ rownames(exprs) %in% "gnf1h00499_at",] or exprs[
>> >> rownames(exprs) %in% vectorOfNames,], where vectorOfNames is a list or
>> >> a vector of the names you are looking for. It is important that the
>> >> object you are searching in is the first argument. If you want to
>> >> request a large number of names, use lists instead of data frames.
>> >>
>> >> best
>> >>
>> >> Basti
>> >>
>> >> 2012/1/3 Bas Jansen <bjhjansen at gmail.com>:
>> >>> Dear fellow Bioconductor users:
>> >>>
>> >>> Happy New Year!
>> >>> At the moment I am analyzing the GNF Atlas data. I retrieved the data
>> >>> from the Gene Expression Omnibus using the package GEOquery, converted
>> >>> it to an expressionSet and extracted the expression values. So now I
>> >>> have a data frame from which I would like to extract the expression
>> >>> values of > 100 probe IDs for 79 tissues. Thing is, if I use a single
>> >>> probe ID, things go fine. However, whenever I use a string of probe
>> >>> IDs, things go awry.
>> >>>
>> >>> See below:
>> >>>
>> >>> ***
>> >>>> exprs[c("gnf1h00499_at"),]
>> >>>               GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
>> >>> gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
>> >>> (abbreviated for reasons of clarity)
>> >>> ***
>> >>>
>> >>> As stated above: whenever I use a string of probe IDs (say, like 2
>> >>> probe IDs), things go awry:
>> >>>
>> >>> ***
>> >>>> exprs[c("gnf1h00499_at","gnf1h500_at"),]
>> >>>               GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
>> >>> gnf1h00499_at 5.770829 7.708739 5.161888 7.459432 6.332708 6.902074 4.472488
>> >>> NA                  NA       NA       NA       NA       NA       NA       NA
>> >>> etc.
>> >>> ***
>> >>>
>> >>> The gnf1h00500 probe is reported as NA, and I'm pretty sure it has
>> >>> real expression values associated with it.
>
>
> Yes, the gnf1h00500_at probeset and rowname will work fine.  However, your
> code used "gnf1h500_at" and NOT "gnf1h00500_at".  The latter works fine for
> me.
>
> Sean
>
>>
>> >>> The following just works fine:
>> >>>
>> >>> ***
>> >>>> exprs[c(1:20,30:70),]
>> >>>             GSM18768 GSM18769 GSM18756 GSM18757 GSM18780 GSM18781 GSM18774
>> >>> 200000_s_at        0        0        0        0        0        0        0
>> >>> 200001_at          0        0        0        0        0        0        0
>> >>> 200002_at          0        0        0        0        0        0        0
>> >>> 200003_s_at        0        0        0        0        0        0        0
>> >>> etc.
>> >>> ***
>> >>>
>> >>> So, how do I select rows on the basis of probe IDs? Or better yet:
>> >>> what am I overlooking????
>> >>>
>> >>> Thanks & kind regards,
>> >>> Bas
>> >>>
>> >>> _______________________________________________
>> >>> Bioconductor mailing list
>> >>> Bioconductor at r-project.org
>> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>> Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>
>
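
A short sketch of the data.frame versus matrix behaviour that Axel
describes in the thread, as I understand base R; the toy values and the
object names df, m and ids are invented for illustration. Indexing a
data frame by an unknown row name quietly returns a row of NAs, whereas
the same lookup on a matrix stops with an error, and logical indexing
with %in% simply skips IDs that are not there.

```r
# Toy data frame; the values are made up, only the IDs follow the thread.
df <- data.frame(
  sampleA = c(5.8, 6.3),
  sampleB = c(7.7, 6.9),
  row.names = c("gnf1h00499_at", "gnf1h00500_at")
)
m <- as.matrix(df)

ids <- c("gnf1h00499_at", "gnf1h500_at")  # the second ID is mistyped

# Data frame: the unknown name yields an extra row filled with NA.
df_rows <- df[ids, ]

# Matrix: the same lookup raises an error, captured here as a message.
m_result <- tryCatch(m[ids, ], error = function(e) conditionMessage(e))

# Logical indexing with %in% keeps only rows whose names are in ids.
df_hits <- df[rownames(df) %in% ids, ]
```

Note that the %in% form returns rows in the order of the data frame,
not in the order of ids, and gives no hint that an ID was dropped.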


