[BioC] Combining data from different versions of Illumina HumanHT-12 v3

Wei Shi shi at wehi.EDU.AU
Fri Apr 15 13:24:05 CEST 2011


Dear Gavin:

	Thanks for the further information. The probe "ILMN_2038777" is both a gene probe and a positive control probe (control type: housekeeping). You can find more information about this probe in the HT12 manifest file. I do not know why it is absent from your TB2 dataset. In any case, it should be quite safe to remove the housekeeping copy of "ILMN_2038777" from your TB1 dataset and then combine the two datasets. Below is the code to do this:

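library(limma)
# Read each batch: probe profile first, control probe profile second
x1 <- read.ilmn("your_TB1_probe_profile","your_TB1_control_probe_profile")
x2 <- read.ilmn("your_TB2_probe_profile","your_TB2_control_probe_profile")
# Drop the housekeeping (control) copy of ILMN_2038777 from TB1
x1 <- x1[!(x1$genes$Probe_Id == "ILMN_2038777" & tolower(x1$genes$Status) == "housekeeping"),]
# Reorder TB2 to follow the probe order of TB1, then bind the arrays column-wise
m <- match(x1$genes$Probe_Id, x2$genes$Probe_Id)
x.merged <- cbind(x1,x2[m,])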

This will combine TB1 with TB2. For the other four datasets, you can merge them into x.merged using the same procedure (first remove the housekeeping "ILMN_2038777" from the dataset if it is present, then use the match and cbind commands to merge them).
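
As a rough illustration of repeating the procedure over several batches, here is a minimal sketch in base R. It uses small toy matrices rather than real read.ilmn() output, and the helper names (batch, drop_dups, align_to) are made up for this example; they are not limma functions. It also simply keeps the first occurrence of a repeated probe ID, whereas for the real data you would pick the copy to drop by its Status, as in the code above.

```r
# Hypothetical toy batches: each is a list holding probe IDs and an expression
# matrix (rows = probes, columns = samples). They stand in for the real data.
batch <- function(ids, samples) {
  E <- matrix(seq_len(length(ids) * length(samples)),
              nrow = length(ids), dimnames = list(NULL, samples))
  list(ids = ids, E = E)
}
b1 <- batch(c("P1", "P2", "P3", "P2"), c("s1", "s2"))  # "P2" occurs twice
b2 <- batch(c("P3", "P1", "P2"), "s3")
b3 <- batch(c("P2", "P3", "P1"), c("s4", "s5"))

# Report which probe IDs are repeated within a batch.
b1$ids[duplicated(b1$ids)]

# Keep only the first occurrence of each probe ID (a simplification).
drop_dups <- function(b) {
  keep <- !duplicated(b$ids)
  list(ids = b$ids[keep], E = b$E[keep, , drop = FALSE])
}

# Reorder a batch to follow the probe order in ref_ids and return its matrix.
align_to <- function(b, ref_ids) {
  m <- match(ref_ids, b$ids)
  stopifnot(!anyNA(m))  # every reference probe must be present in this batch
  b$E[m, , drop = FALSE]
}

batches <- lapply(list(b1, b2, b3), drop_dups)
ref_ids <- batches[[1]]$ids
merged <- do.call(cbind, lapply(batches, align_to, ref_ids = ref_ids))
rownames(merged) <- ref_ids
merged
```

The stopifnot() check is there so that a probe missing from one batch causes an error instead of silently producing a row of NAs after match.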

I hope this works for you. Let me know if it doesn't.

Cheers,
Wei


On Apr 15, 2011, at 9:16 PM, Gavin Koh wrote:

> Dear Wei,
> 
> Thank you for replying so quickly. There appear to be 6 batches in
> this dataset (TB1 to 6)
> 
>> TB1$genes[1:10]
> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337"
> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
>> TB2$genes[1:10]
> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229"
> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
>> TB3$genes[1:10]
> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337"
> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
>> TB4$genes[1:10]
> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229"
> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
>> TB5$genes[1:10]
> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337"
> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
>> TB6$genes[1:10]
> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229"
> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
> 
> Thank you very much for your help!
> 
> Gavin
> 
> On 15 April 2011 11:45, Wei Shi <shi at wehi.edu.au> wrote:
>> Hi Gavin:
>> 
>>        It would be best if you can match the two batches using the probe identifiers because they are much less likely to have duplicates. Would it be possible to show the first several probes in each dataset so that I can write some code to help you do this?
>> 
>> Cheers,
>> Wei
>> 
>> 
>> On Apr 15, 2011, at 7:54 PM, Gavin Koh wrote:
>> 
>>> Dear Wei,
>>> 
>>> A little more information: the difference seems to be a single duplicated probe.
>>> Just comparing two batches (TB1 and TB2) with different probe numbers:
>>>> length(TB1$genes)
>>> [1] 48804
>>>> length(TB2$genes)
>>> [1] 48803
>>>> length(unique(TB2$genes))
>>> [1] 48803
>>>> length(unique(TB1$genes))
>>> [1] 48803
>>>> setdiff(TB1$genes,TB2$genes)
>>> character(0)
>>>> setequal(TB1$genes,TB2$genes)
>>> [1] TRUE
>>> 
>>> That still leaves me the problem that I don't know how to identify the
>>> repeated probe or how to cbind TB1 and TB2... :-(
>>> 
>>> Gavin
>>> 
>>> On 15 April 2011 02:38, Wei Shi <shi at wehi.edu.au> wrote:
>>>> Hi Gavin:
>>>> 
>>>>        The number of probes present in one batch but not in the others should be very small, so you can use the probes that are common to all batches for your analysis.
>>>> 
>>>>        Hope this helps.
>>>> 
>>>> Cheers,
>>>> Wei
>>>> 
>>>> On Apr 15, 2011, at 1:20 AM, Gavin Koh wrote:
>>>> 
>>>>> I am trying to analyse data from ArrayExpress E-GEOD-22098 (published
>>>>> Dec last year).
>>>>> According to the study methods, the data are Illumina HumanHT-12 v3
>>>>> Expression BeadChips, but the hybridisation seems to have been done in
>>>>> several batches, with different numbers of probes in each batch,
>>>>> alternating between 48803 and 48804. Can anyone tell me how to combine
>>>>> these different batches into the same file, please? I am trying to
>>>>> read the probe data using the read.ilmn() function in limma, but
>>>>> failing, because cbind complains the matrices are not the same length
>>>>> (precise error is "Error in cbind(out$E, objects[[i]]$E) : number of
>>>>> rows of matrices must match (see arg 2)").
>>>>> 
>>>>> Thank you in advance,
>>>>> 
>>>>> Gavin Koh
>>>>> 
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>>>> 
>>>> ______________________________________________________________________
>>>> The information in this email is confidential and intended solely for the addressee.
>>>> You must not disclose, forward, print or use it without the permission of the sender.
>>>> ______________________________________________________________________
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Hofstadter's Law: It always takes longer than you expect, even when
>>> you take into account Hofstadter's Law.
>>> —Douglas Hofstadter (in Gödel, Escher, Bach, 1979)
>> 
> 

