[BioC] RE : RE : maping SNPs

Thu Jan 6 18:47:53 CET 2011

   Thank you.  Sorry for the delay.  As I say, on my computer, thing run really
   slowy...  I have no error but I have some warning and no result...

   library("IRanges")
   library("GenomicRanges")
   library("GenomicFeatures")
   #Ã€ changer si une version plus rÃ©cente de la librairie est tÃ©lÃ©chargÃ©e.
   library(SNPlocs.Hsapiens.dbSNP.20101109)

   > getSNPcount()
       ch1     ch2     ch3     ch4     ch5     ch6     ch7     ch8     ch9    c
   h10
   1849438 1936836 1613418 1613633 1453710 1446827 1335745 1243129  995075
   1158707
      ch11    ch12    ch13    ch14    ch15    ch16    ch17    ch18    ch19    c
   h20
   1147722 1105364  815729  740129  657719  757926  641905  645646  520666
   586708
      ch21    ch22     chX     chY     chMT
    338254  331060  529608   67438     624
   > ch22snps <- getSNPlocs("ch22")
   > ch22snps[1:5, ]
     RefSNP_id alleles_as_ambig      loc
   1  56342815                K 16050353
   2   7288968                S 16050994
   3   6518357                M 16051107
   4   7292503                R 16051209
   5   6518368                Y 16051241
   >
   >
   > makeGRangesFromRefSNPids <- function(myids)
   + {
   +      ans_seqnames <- character(length(myids))
   +      ans_seqnames[] <- "unknown"
   +      ans_locs <- integer(length(myids))
   +      for (seqname in names(getSNPcount())) {
   +          locs <- getSNPlocs(seqname)
   +          ids <- paste("rs", locs$RefSNP_id, sep="")
   +          myrows <- match(myids, ids)
   +          ans_seqnames[!is.na(myrows)] <- seqname
   +          ans_locs[!is.na(myrows)] <- locs$loc[myrows]
   +      }
   +      GRanges(seqnames=ans_seqnames,
   +              IRanges(start=ans_locs, width=1),
   +              RefSNP_id=myids)
   + }
   >
   >
   > myids   <-  c("rs7547453",  "rs2840542",  "rs1999527",  "rs4648545",
   "rs10915459", "rs16838750", "rs12128230", "rs999999999")
   > mysnps <- makeGRangesFromRefSNPids(myids)
   Warning message:
   In ans_locs[!is.na(myrows)] <- locs$loc[myrows] :
     number of items to replace is not a multiple of replacement length
   > mysnps  # a GRanges object with 1 SNP per row
    GRangeswith 8 ranges and 1 elementMetadata value
       seqnames             ranges strand |   RefSNP_id
          <Rle>          <IRanges>  <Rle> | <character>
   [1]      ch1 [2195117, 2195117]      * |   rs7547453
   [2]      ch1 [2291680, 2291680]      * |   rs2840542
   [3]      ch1 [3256108, 3256108]      * |   rs1999527
   [4]      ch1 [3577321, 3577321]      * |   rs4648545
   [5]      ch1 [4230463, 4230463]      * |  rs10915459
   [6]      ch1 [4404344, 4404344]      * |  rs16838750
   [7]      ch1 [4501911, 4501911]      * |  rs12128230
   [8]  unknown [      0,       0]      * | rs999999999
   seqlengths
        ch1 unknown
         NA      NA
   > txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene")
    Downloadthe refGene table ... OK
    Downloadthe refLink table ... OK
    Extractthe 'transcripts' data frame ... OK
    Extractthe 'splicings' data frame ... OK
    Downloadand preprocess the 'chrominfo' data frame ... OK
    Preparethe 'metadata' data frame ... OK
    Makethe TranscriptDb object ... OK
   Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50 premiers)
   > txdb
    TranscriptDbobject:
   | Db type: TranscriptDb
   | Data source: UCSC
   | Genome: hg19
   | UCSC Table: refGene
   | Type of Gene ID: Entrez Gene ID
   | Full dataset: yes
   | transcript_nrow: 38098
   | exon_nrow: 230201
   | cds_nrow: 204683
   | Db created by: GenomicFeatures package from Bioconductor
   | Creation time: 2011-01-06 11:55:44 -0500 (Thu, 06 Jan 2011)
   | GenomicFeatures version at creation time: 1.2.3
   | RSQLite version at creation time: 0.9-4
   | DBSCHEMAVERSION: 1.0
   >
   > #voir pour exonsBy() Ã  la place de tx
   > txExon = exonsBy(txdb, by= "tx")
   > txExon
    GRangesListof length 38098
   $1
   GRanges with 25 ranges and 3 elementMetadata values
        seqnames               ranges strand
   |   exon_id   exon_name exon_rank
            <Rle>             <IRanges>   <Rle>   | <integer> <character>
   <integer>
    [1]     chr1  [66999825,  67000051]       +   |         1          NA
   1
    [2]     chr1  [67091530,  67091593]       +   |         2          NA
   2
    [3]     chr1  [67098753,  67098777]       +   |         3          NA
   3
    [4]     chr1  [67101627,  67101698]       +   |         4          NA
   4
    [5]     chr1  [67105460,  67105516]       +   |         5          NA
   5
    [6]     chr1  [67108493,  67108547]       +   |         6          NA
   6
    [7]     chr1  [67109227,  67109402]       +   |         7          NA
   7
    [8]     chr1  [67126196,  67126207]       +   |         8          NA
   8
    [9]     chr1  [67133213,  67133224]       +   |         9          NA
   9
    ...       ...                   ...     ... ...       ...         ...
   ...
   [17]     chr1  [67155873,  67155999]       +   |        17          NA
   17
   [18]     chr1  [67161117,  67161176]       +   |        18          NA
   18
   [19]     chr1  [67184977,  67185088]       +   |        19          NA
   19
   [20]     chr1  [67194947,  67195102]       +   |        20          NA
   20
   [21]     chr1  [67199431,  67199563]       +   |        21          NA
   21
   [22]     chr1  [67205018,  67205220]       +   |        22          NA
   22
   [23]     chr1  [67206341,  67206405]       +   |        23          NA
   23
   [24]     chr1  [67206955,  67207119]       +   |        24          NA
   24
   [25]     chr1  [67208756,  67210768]       +   |        25          NA
   25
   ...
   <38097 more elements>
   seqlengths
                     chr1                  chr2 ... chr18_gl000207_random
                249250621             243199373 ...                  4262
   >
   > tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id"))
   > tx  # a GRanges object with 1 transcript per row
    GRangeswith 38098 ranges and 3 elementMetadata values
           seqnames               ranges strand   |     tx_id      tx_name
              <Rle>            <IRanges>  <Rle>   | <integer>  <character>
       [1]     chr1     [ 69091,  70008]      +   |      1021 NM_001005484
       [2]     chr1     [323892, 328581]      +   |      1023    NR_028327
       [3]     chr1     [323892, 328581]      +   |      1024    NR_028322
       [4]     chr1     [323892, 328581]      +   |      1025    NR_028325
       [5]     chr1     [367659, 368597]      +   |      1022 NM_001005277
       [6]     chr1     [367659, 368597]      +   |      1026 NM_001005221
       [7]     chr1     [367659, 368597]      +   |      1027 NM_001005224
       [8]     chr1     [763064, 789740]      +   |       174    NR_015368
       [9]     chr1     [861121, 879961]      +   |      1035    NM_152486
       ...      ...                  ...    ... ...       ...          ...
   [38090]     chrY [27177050, 27198251]      -   |     18991    NM_004678
   [38091]     chrY [27177050, 27198251]      -   |     18992 NM_001002760
   [38092]     chrY [27177050, 27198251]      -   |     18993 NM_001002761
   [38093]     chrY [27209230, 27246039]      -   |     18994    NR_001525
   [38094]     chrY [27209230, 27246039]      -   |     18995    NR_002177
   [38095]     chrY [27209230, 27246039]      -   |     18996    NR_002178
   [38096]     chrY [27329790, 27330920]      -   |     18997    NR_001526
   [38097]     chrY [27329790, 27330920]      -   |     18998    NR_002179
   [38098]     chrY [27329790, 27330920]      -   |     18999    NR_002180
                             gene_id
           <CompressedCharacterList>
       [1]                     79501
       [2]                 100133331
       [3]                 100132287
       [4]                 100132062
       [5]                     81399
       [6]                    729759
       [7]                     26683
       [8]                    643837
       [9]                    148398
       ...                       ...
   [38090]                      9083
   [38091]                    442867
   [38092]                    442868
   [38093]                    114761
   [38094]                    474150
   [38095]                    474149
   [38096]                    252949
   [38097]                    474152
   [38098]                    474151
   seqlengths
                     chr1                  chr2 ... chr18_gl000207_random
                249250621             243199373 ...                  4262
   >
   > seqnames(mysnps) <- sub("ch", "chr", seqnames(mysnps))
   > map <- as.matrix(findOverlaps(mysnps, tx))
   Message d'avis :
   In .local(query, subject, maxgap, minoverlap, type, select, ...) :
     No seqnames from the 'query' and 'subject' were identical
   > mapExon <- as.matrix(findOverlaps(mysnps, txExon))
   Message d'avis :
   In .local(query, subject, maxgap, minoverlap, type, select, ...) :
     No seqnames from the 'query' and 'subject' were identical
   >
   > mapped_genes <- values(tx)$gene_id[map[, 2]]
   > mapped_snps     <-    rep.int(values(mysnps)$RefSNP_id[map[,    1]],
   elementLengths(mapped_genes))
   > snp2gene           <-          unique(data.frame(snp_id=mapped_snps,
   gene_id=unlist(mapped_genes)))
   > rownames(snp2gene) <- NULL
   > snp2gene[1:4, ]
        snp_id gene_id
   NA     <NA>    <NA>
   NA.1   <NA>    <NA>
   NA.2   <NA>    <NA>
   NA.3   <NA>    <NA>

   On our super computer, I can't update manualy the package.  They are suposed
   to     update automatically     once     per month according    to our
   informactician.  The result of sessionInfo() is :

   > sessionInfo()
   R version 2.12.0 (2010-10-15)
   Platform: x86_64-pc-linux-gnu (64-bit)
   locale:
    [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
    [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
    [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
    [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
    [9] LC_ADDRESS=C               LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
    attachedbase packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base
    otherattached packages:
   [1] GenomicFeatures_1.2.3
   [2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2
   [3] GenomicRanges_1.2.2
   [4] IRanges_1.8.7
    loadedvia a namespace (and not attached):
    [1] Biobase_2.10.0     biomaRt_2.6.0      Biostrings_2.18.2  BSgenome_1.18.
   2
    [5] DBI_0.2-5          RCurl_1.5-0
   RSQLite_0.9-4      rtracklayer_1.10.6
    [9] tools_2.12.0       XML_3.2-0

   On my computer , something stragge happen.  The package get unstalled when
   I run
   source([1]"http://bioconductor.org/biocLite.R")
   update.packages(repos=biocinstallRepos(), ask=FALSE, checkBuilt=TRUE)

   And when I try to reinstall SNPlocs.Hsapiens.dbSNP.20101109, I now get
   an error message...

   > source("[2]http://www.bioconductor.org/biocLite.R")
    BioC_mirror= [3]http://www.bioconductor.org
   Change using chooseBioCmirror().
   > biocLite("SNPlocs.Hsapiens.dbSNP.20101109")
   Using R version 2.12.0, biocinstall version 2.7.4.
    InstallingBioconductor version 2.7 packages:
   [1] "SNPlocs.Hsapiens.dbSNP.20101109"
    Pleasewait...
   Message d'avis :
   In getDependencies(pkgs, dependencies, available, lib) :
     package â€˜SNPlocs.Hsapiens.dbSNP.20101109â€™ is not available

   But for sessionInfo, it's :

   > sessionInfo()
   R version 2.12.0 (2010-10-15)
   Platform: i386-pc-mingw32/i386 (32-bit)

   locale:
   [1] LC_COLLATE=French_Canada.1252  LC_CTYPE=French_Canada.1252
   [3] LC_MONETARY=French_Canada.1252 LC_NUMERIC=C
   [5] LC_TIME=French_Canada.1252

    attachedbase packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base

    otherattached packages:
   [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.2   IRanges_1.8.8

    loadedvia a namespace (and not attached):
   [1] Biobase_2.10.0     biomaRt_2.6.0      Biostrings_2.18.2  BSgenome_1.18.2

   [5]      DBI_0.2-5               RCurl_1.5-0.1           RSQLite_0.9-4
   rtracklayer_1.10.6
   [9] XML_3.2-0.2

   Now I  have  a  super  computer that can't run  the program because of
   a subscript  out of bounds and a personal computer than have a missing
   library...      So now nothing     work:(  Other problem    : why when
   I install again IRanges I got a version older than your for exemple?

   Simon NoÃ«l
   CdeC

References

   1. http://bioconductor.org/biocLite.R
   2. http://www.bioconductor.org/biocLite.R
   3. http://www.bioconductor.org/