[BioC] merging two sets of genes

Kasper Daniel Hansen khansen at stat.Berkeley.EDU
Sat Dec 31 15:59:30 CET 2005


On Dec 29, 2005, at 12:56 PM, kfbargad at ehu.es wrote:

> Hi, sorry, I think I found the answer to my previous email.
>
> setdiff() will do the trick, right?

Yes, although it will depend on how your lists are structured, since
setdiff works on vectors.
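
A minimal sketch of what that looks like in base R. The probe IDs and the
small matrix are made up for illustration; in practice the ID vectors would
come from something like geneNames(PinC) and geneNames(PinS), and the matrix
from the expression values of the original eset. If the IDs are held in a
list, unlist() turns them into a vector first.

```r
# Hypothetical probe IDs standing in for geneNames(PinC) and geneNames(PinS)
idsC <- c("1000_at", "1001_at", "1002_f_at")
idsS <- c("1001_at", "1003_s_at", "1004_at")

# Set operations expect plain vectors; flatten a list with unlist() first
idsS_list <- list("1001_at", "1003_s_at", "1004_at")
idsS_vec  <- unlist(idsS_list)

both    <- union(idsC, idsS_vec)      # probes selected by either approach
shared  <- intersect(idsC, idsS_vec)  # probes selected by both approaches
onlyInC <- setdiff(idsC, idsS_vec)    # in C but not in S (order of arguments matters)

# Toy expression matrix standing in for the original experiment's values,
# with one extra probe that neither approach selected
allIds <- c(both, "1005_at")
x <- matrix(seq_len(length(allIds) * 2), nrow = length(allIds),
            dimnames = list(allIds, c("s1", "s2")))

# Keep only the rows whose probe ID is in the union
merged <- x[rownames(x) %in% both, , drop = FALSE]
```

Because both sets come from the same experiment, subsetting the original
matrix by the union of IDs avoids having to paste two matrices together.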

/Kasper


> I also found the help for %in%, it was under ?'%in%'
>
> best,
>
> David
>
>> Hi,
>>   thanks for the clarification. Then it depends on whether you want to
>> use the union or the intersection of the probes you selected in the two
>> different ways.
>> union and intersect, applied to geneNames(PinS) and geneNames(PinC),
>> should get you somewhere close; you might also want to consider match
>> and %in%, depending on just how you want to select.
>> After that, you will need to create a matrix with the combined
>> expressions and use that as input in a call to new. The vignettes for
>> Biobase should demonstrate how to make an exprSet from a matrix, but
>> please ask if anything is not clear.
>>
>> best wishes
>>    Robert
>>
>> kfbargad at ehu.es wrote:
>>> Dear Seth and Robert,
>>>
>>> I apologise, but I didn't make myself clear.
>>>
>>> PinS and PinC come from the same experiment, i.e. the same eset. It is
>>> just that I followed two different approaches to the analysis and now
>>> I want to continue working with the union of these two lists. So I am
>>> not intending to match across different arrays.
>>>
>>> Hope this explains my question
>>>
>>> David
>>>
>>>
>>>> Hi,
>>>>  I think that the problem is that the arrays are not the same - and
>>>> then life is much harder. There are some papers on it (G. Parmigiani et
>>>> al have produced MergeMaid, as one option). I have done some work on
>>>> this problem, with Wolfgang Huber and Markus Ruschhaupt (you can find
>>>> the technical report under the Bioconductor publications link - I hope).
>>>>
>>>>  It is not so simple to match across different arrays, where different
>>>> probes were used (you can take the expedient of mapping to some common
>>>> set of IDs and matching on those, some code in packages GeneMeta and
>>>> GeneMetaEx, if I recall correctly), but just because they map to the
>>>> same Entrez gene id (for example) does not mean that the same thing was
>>>> measured - whence MergeMaid and similar tools.
>>>>
>>>>  And if this is correct, then combining them is contra-indicated and
>>>> some of the tools for synthesizing experiments, such as meta-analysis or
>>>> the more general random effects models, will be needed. Just because you
>>>> can jam either the raw data or the processed data together does not
>>>> mean that it is sensible to do so.
>>>>
>>>> And finally, even if the arrays are identical, unless they were all
>>>> essentially done at the same time under very similar conditions I would
>>>> still take the approach in the paragraph above and use a random effects
>>>> model.
>>>>
>>>>  best wishes
>>>>    Robert
>>>>
>>>>
>>>> Seth Falcon wrote:
>>>>
>>>>> On 26 Dec 2005, kfbargad at ehu.es wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Dear list,
>>>>>>
>>>>>> I have two sets of genes from the same experiment,
>>>>>>
>>>>>>
>>>>>>
>>>>>>> PinC
>>>>>>
>>>>>> Expression Set (exprSet) with
>>>>>> 1310 genes
>>>>>> 8 samples
>>>>>> phenoData object with 2 variables and 8 cases
>>>>>> varLabels
>>>>>> FileName: read from file
>>>>>> Target: read from file
>>>>>>
>>>>>>
>>>>>>> PinS
>>>>>>
>>>>>> Expression Set (exprSet) with
>>>>>> 2891 genes
>>>>>> 8 samples
>>>>>> phenoData object with 2 variables and 8 cases
>>>>>> varLabels
>>>>>> FileName: read from file
>>>>>> Target: read from file
>>>>>>
>>>>>>
>>>>>> How can I merge these two sets? I tried union() on two vectors
>>>>>> created from the probe IDs but failed. Any hints?
>>>>>
>>>>>
>>>>> One approach would be to create a new exprSet object manually using
>>>>> the data from PinC and PinS.  Basically, create a new phenoData object
>>>>> with the data for all 16 cases, and a new expression matrix with 16
>>>>> columns (assuming the two original exprSets represent disjoint sets of
>>>>> samples).
>>>>>
>>>>> Thinking out loud, is this a common enough operation to warrant a
>>>>> method for exprSets?  I could imagine c() being defined on exprSets
>>>>> such that if the phenoData columns are the same and the "sample ids"
>>>>> as given by the rownames of phenoData/colnames of exprs are disjoint,
>>>>> then do the obvious thing, else error.
>>>>>
>>>>> + seth
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>
>>>>
>>>> -- 
>>>> Robert Gentleman, PhD
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M2-B876
>>>> PO Box 19024
>>>> Seattle, Washington 98109-1024
>>>> 206-667-7700
>>>> rgentlem at fhcrc.org
>>>>


