[R] Help with DNA Methylation Analysis

Spencer Brackett spbrackett20 using saintjosephhs.com
Mon Aug 27 05:49:23 CEST 2018


Hello all,

  To begin my analysis, I downloaded two TCGA datasets (GBM and LGG), both
CSV files, into an R script after loading the cBioLite package. Following
this, I entered the following command:

> the_data<-read.csv(file=“c:/file_name.csv,header=TRUE,sep=“,”)

Upon running the line I received this...

+

If I continue to press Enter, the + sign continues to appear on every
subsequent line.

Does anyone know what this indicates and how I can continue with
my analysis?
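
A lone + is R's continuation prompt: the interpreter treats the command typed so far as incomplete and waits for the rest. Below is a minimal sketch of one likely cause, assuming the line was typed with straight quotes and the closing quote after the file name was left out. The helper parses_ok is hypothetical, introduced only for illustration. Curly quotes (“ ”) are also not valid string delimiters in R, so the quotes should be typed as plain ".

```r
# Hypothetical reproduction with straight quotes. In `broken`, the closing
# quote after the file name is missing, so the final quote opens a string
# that never ends and R keeps waiting for more input.
broken <- 'the_data <- read.csv(file="c:/file_name.csv,header=TRUE,sep=",")'
fixed  <- 'the_data <- read.csv(file="c:/file_name.csv", header=TRUE, sep=",")'

# parse() only checks syntax (it does not read any file); it raises an
# error when the text is incomplete or otherwise invalid.
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses_ok(broken)  # FALSE
parses_ok(fixed)   # TRUE
```

At the console, pressing Esc (in RStudio or the Windows R GUI) or Ctrl+C (in a terminal) cancels the pending + prompt so that the corrected line can be entered.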

My next step after this would have been the following (the numbers before
each command are line markers, not part of the code):

1 library(TCGAbiolinks)
2
3 # Download the DNA methylation data: HumanMethylation450 LGG and GBM.
4 path <– "."
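
On line 4, "." refers to the current working directory, so downloads and file lookups are resolved relative to wherever R is running. Below is a minimal sketch for checking that location, using only base R. The file name file_name.csv is the placeholder from the read.csv command above, not a real file.

```r
# The assignment arrow must be the two ASCII characters "<" and "-"; an
# arrow copied from a web page or PDF may contain a typographic dash,
# which R does not accept.
path <- "."

# Where is R currently running?
getwd()

# The full path that "." resolves to
normalizePath(path)

# CSV files visible in that directory
list.files(path, pattern = "\\.csv$")

# TRUE only if the placeholder file sits in the working directory
file.exists(file.path(path, "file_name.csv"))
```

If the files live elsewhere, setwd() can change the working directory, or the full path can be passed to read.csv() directly.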

Best wishes,

Spencer Brackett

On Sun, Aug 26, 2018 at 9:13 PM Caitlin <bioprogrammer using gmail.com> wrote:

> You're welcome Spencer :)
>
> I hope I was able to help you. If this problem persists, or a new one
> appears, feel free to post or email. You might also like:
>
> https://www.biostars.org/
>
> It is quite similar to StackOverflow but with a biological sciences focus.
>
> Hope this helps!
>
> ~Caitlin
>
>
>
> On Sun, Aug 26, 2018 at 6:02 PM Spencer Brackett <
> spbrackett20 using saintjosephhs.com> wrote:
>
>> Caitlin,
>>
>>  Thanks again! I already have the data stored in two CSV files on my
>> desktop, but if running those through this function does not work, then I
>> will try it with a flash drive.
>>
>> Best,
>>
>> Spencer Brackett
>>
>> On Sun, Aug 26, 2018 at 8:56 PM Caitlin <bioprogrammer using gmail.com> wrote:
>>
>>> Hmm...could you store each in its own file (a flash drive would be fine)
>>> then use:
>>>
>>> the_data <- read.csv(file="c:/file_name.csv", header=TRUE, sep=",")
>>>
>>> to read each into your script? The data would then exist as a dataframe object that you could then work with.
>>>
>>>
>>> On Sun, Aug 26, 2018 at 5:50 PM Spencer Brackett <
>>> spbrackett20 using saintjosephhs.com> wrote:
>>>
>>>> Caitlin,
>>>>
>>>>  Perhaps that is the problem. To be more specific, the data was
>>>> transferred from the TCGA database to a CSV file... there are technically
>>>> two separate CSV files for this analysis, one for GBM and one for LGG.
>>>> Both CSV files were then individually loaded into my open R console.
>>>> Upon summarizing them with the summary() function, the output expanded and
>>>> took up the whole console page, seemingly even displacing the commands
>>>> that loaded the data into R in the first place. Are
>>>> you suggesting that I would need to use a flash drive to successfully
>>>> use the function you suggested? Or could I perhaps do so with the CSV
>>>> files I mentioned? If so, how?
>>>>
>>>> -Spencer B
>>>>
>>>> On Sun, Aug 26, 2018 at 8:42 PM Caitlin <bioprogrammer using gmail.com>
>>>> wrote:
>>>>
>>>>> No worries Spencer. There is no downloaded data? Nothing is physically
>>>>> stored on your hard drive? The dot in the path would be interpreted (no pun
>>>>> intended!) as something like the following:
>>>>>
>>>>> If the TCGA data was stored in a file named "tcga_data.dat" and it was
>>>>> in a directory named "C:\spencer", the 4th line of that script would set
>>>>> the path to "C:\spencer", so the data would be found at
>>>>> "C:\spencer\tcga_data.dat" if you ran the script from that same
>>>>> folder. If your TCGA data is not stored in the same folder from which the
>>>>> script is being run, it won't find any data to work with. Does this help?
>>>>>
>>>>>
>>>>> On Sun, Aug 26, 2018 at 5:34 PM Spencer Brackett <
>>>>> spbrackett20 using saintjosephhs.com> wrote:
>>>>>
>>>>>> Caitlin,
>>>>>>
>>>>>>   Forgive me, but I’m not quite sure what you are
>>>>>> asking. The data is originally from the TCGA and I have it loaded in
>>>>>> another R script. I opened a new script to run the commands I posted
>>>>>> to this forum because I was unable to enter any other commands into the
>>>>>> console, since the printed data filled the entirety of
>>>>>> said console. Perhaps it overloaded it? Regardless, I was unable to enter any
>>>>>> further commands.
>>>>>>
>>>>>> -Spencer Brackett
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 26, 2018 at 8:27 PM Caitlin <bioprogrammer using gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> You're welcome Spencer :)
>>>>>>>
>>>>>>> The 4th line:
>>>>>>>
>>>>>>> path <– "."
>>>>>>>
>>>>>>> refers to the current directory (the dot in other words). Is the
>>>>>>> data stored in the same directory where the code is being run?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Aug 26, 2018 at 5:22 PM Spencer Brackett <
>>>>>>> spbrackett20 using saintjosephhs.com> wrote:
>>>>>>>
>>>>>>>>  Thank you! I will make note of that. Unfortunately, lines 1 and 4
>>>>>>>> of the first portion of this analysis appear to be where the errors
>>>>>>>> begin, and several subsequent lines are also reported as errors.
>>>>>>>> Perhaps this is an issue of capitalization and/or spacing (something
>>>>>>>> within the text)? The proposed method for methylation data extraction is
>>>>>>>> based on the first third of the following TCGA workflow:
>>>>>>>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5302158/#!po=0.0715308
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Spencer Brackett
>>>>>>>>
>>>>>>>> On Sun, Aug 26, 2018 at 8:07 PM Caitlin <bioprogrammer using gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Spencer.
>>>>>>>>>
>>>>>>>>> Should you capitalize the following library import?
>>>>>>>>>
>>>>>>>>> library(summarizedExperiment)
>>>>>>>>>
>>>>>>>>> In other words, I think that line should be:
>>>>>>>>>
>>>>>>>>> library(SummarizedExperiment)
>>>>>>>>>
>>>>>>>>> Hope this helps.
>>>>>>>>>
>>>>>>>>> ~Caitlin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 26, 2018 at 2:09 PM Spencer Brackett <
>>>>>>>>> spbrackett20 using saintjosephhs.com> wrote:
>>>>>>>>>
>>>>>>>>>> Good evening,
>>>>>>>>>>
>>>>>>>>>>   I am attempting to run the following analysis on TCGA data;
>>>>>>>>>> however, something is being reported as an error in my arguments... any
>>>>>>>>>> ideas as to what is incorrect in the following? Thanks!
>>>>>>>>>>
>>>>>>>>>> 1 library(TCGAbiolinks)
>>>>>>>>>> 2
>>>>>>>>>> 3 # Download the DNA methylation data: HumanMethylation450 LGG and GBM.
>>>>>>>>>> 4 path <– "."
>>>>>>>>>> 5
>>>>>>>>>> 6 query.met <– TCGAquery(tumor = c("LGG","GBM"),"HumanMethylation450", level = 3)
>>>>>>>>>> 7 TCGAdownload(query.met, path = path )
>>>>>>>>>> 8 met <– TCGAprepare(query = query.met,dir = path,
>>>>>>>>>> 9                    add.subtype = TRUE, add.clinical = TRUE,
>>>>>>>>>> 10                   summarizedExperiment = TRUE,
>>>>>>>>>> 11                   save = TRUE, filename = "lgg_gbm_met.rda")
>>>>>>>>>> 12
>>>>>>>>>> 13 # Download the expression data: IlluminaHiSeq_RNASeqV2 LGG and GBM.
>>>>>>>>>> 14 query.exp <– TCGAquery(tumor = c("lgg","gbm"), platform = "IlluminaHiSeq_RNASeqV2",level = 3)
>>>>>>>>>> 15
>>>>>>>>>> 16 TCGAdownload(query.exp,path = path, type = "rsem.genes.normalized_results")
>>>>>>>>>> 17
>>>>>>>>>> 18 exp <– TCGAprepare(query = query.exp, dir = path,
>>>>>>>>>> 19                    summarizedExperiment = TRUE,
>>>>>>>>>> 20                    add.subtype = TRUE, add.clinical = TRUE,
>>>>>>>>>> 21                    type = "rsem.genes.normalized_results",
>>>>>>>>>> 22                    save = T,filename = "lgg_gbm_exp.rda")
>>>>>>>>>>
>>>>>>>>>> To download data on DNA methylation and gene expression…
>>>>>>>>>>
>>>>>>>>>> 1 library(summarizedExperiment)
>>>>>>>>>> 2 # get expression matrix
>>>>>>>>>> 3 data <– assay(exp)
>>>>>>>>>> 4
>>>>>>>>>> 5 # get sample information
>>>>>>>>>> 6 sample.info <– colData(exp)
>>>>>>>>>> 7
>>>>>>>>>> 8 # get genes information
>>>>>>>>>> 9 genes.info <– rowRanges(exp)
>>>>>>>>>>
>>>>>>>>>> The following is the stepwise procedure for obtaining GBM and LGG
>>>>>>>>>> clinical data…
>>>>>>>>>>
>>>>>>>>>> 1 # get clinical patient data for GBM samples
>>>>>>>>>> 2 gbm_clin <– TCGAquery_clinic("gbm","clinical_patient")
>>>>>>>>>> 3
>>>>>>>>>> 4 # get clinical patient data for LGG samples
>>>>>>>>>> 5 lgg_clin <– TCGAquery_clinic("lgg","clinical_patient")
>>>>>>>>>> 6
>>>>>>>>>> 7 # Bind the results, as the columns might not be the same,
>>>>>>>>>> 8 # we will plyr rbind.fill , to have all columns from both files
>>>>>>>>>> 9 clinical <– plyr::rbind.fill(gbm_clin ,lgg_clin)
>>>>>>>>>> 10
>>>>>>>>>> 11 # Other clinical files can be downloaded,
>>>>>>>>>> 12 # Use ?TCGAquery_clinic for more information
>>>>>>>>>> 13 clin_radiation <– TCGAquery_clinic("lgg","clinical_radiation")
>>>>>>>>>> 14
>>>>>>>>>> 15 # Also, you can get clinical information from different tumor types.
>>>>>>>>>> 16 # For example sample 1 is GBM, sample 2 and 3 are TGCT
>>>>>>>>>> 17 data <– TCGAquery_clinic(clinical_data_type = "clinical_patient",
>>>>>>>>>> 18    samples = c("TCGA-06-5416-01A-01D-1481-05",
>>>>>>>>>> 19                "TCGA-2G-AAEW-01A-11D-A42Z-05",
>>>>>>>>>> 20                "TCGA-2G-AAEX-01A-11D-A42Z-05"))
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> # Searching idat file for DNA methylation
>>>>>>>>>> query <- GDCquery(project = "TCGA-GBM",
>>>>>>>>>>                  data.category = "Raw microarray data",
>>>>>>>>>>                  data.type = "Raw intensities",
>>>>>>>>>>                  experimental.strategy = "Methylation array",
>>>>>>>>>>                  legacy = TRUE,
>>>>>>>>>>                  file.type = ".idat",
>>>>>>>>>>                  platform = "Illumina Human Methylation 450")
>>>>>>>>>>
>>>>>>>>>> **Repeat for LGG**
>>>>>>>>>>
>>>>>>>>>> To access mutational information concerning TMZ methylation…
>>>>>>>>>>
>>>>>>>>>> > mutation <– TCGAquery_maf(tumor = "lgg")
>>>>>>>>>> 2   Getting maf tables
>>>>>>>>>> 3   Source: https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files
>>>>>>>>>> 4   We found these maf files below:
>>>>>>>>>> 5       MAF.File.Name
>>>>>>>>>> 6   2   hgsc.bcm.edu_LGG.IlluminaGA_DNASeq.1.somatic.maf
>>>>>>>>>> 7
>>>>>>>>>> 8   3   LGG_FINAL_ANALYSIS.aggregated.capture.tcga.uuid.curated.somatic.maf
>>>>>>>>>> 9
>>>>>>>>>> 10      Archive.Name                                                Deploy.Date
>>>>>>>>>> 11  2   hgsc.bcm.edu_LGG.IlluminaGA_DNASeq_automated.Level_2.1.0.0  10-DEC-13
>>>>>>>>>> 12  3   broad.mit.edu_LGG.IlluminaGA_DNASeq_curated.Level_2.1.3.0   24-DEC-14
>>>>>>>>>> 13
>>>>>>>>>> 14   Please, select the line that you want to download: 3
>>>>>>>>>>
>>>>>>>>>> **Repeat this for GBM**
>>>>>>>>>>
>>>>>>>>>> Selecting specified lines to download…
>>>>>>>>>>
>>>>>>>>>> 1 gbm.subtypes <− TCGAquery_subtype(tumor = "gbm")
>>>>>>>>>> 2 lgg.subtypes <− TCGAquery_subtype(tumor = "lgg”)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Downloading data via the Bioconductor package RTCGAToolbox…
>>>>>>>>>>
>>>>>>>>>> 1 library(RTCGAToolbox)
>>>>>>>>>> 2
>>>>>>>>>> 3 # Get the last run dates
>>>>>>>>>> 4 lastRunDate <− getFirehoseRunningDates()[1]
>>>>>>>>>> 5 lastAnalyseDate <− getFirehoseAnalyzeDates(1)
>>>>>>>>>> 6
>>>>>>>>>> 7 # get DNA methylation data, RNAseq2 and clinical data for LGG
>>>>>>>>>> 8 lgg.data <− getFirehoseData(dataset = "LGG",
>>>>>>>>>> 9       gistic2_Date = getFirehoseAnalyzeDates(1), runDate = lastRunDate,
>>>>>>>>>> 10       Methylation = TRUE, RNAseq2_Gene_Norm = TRUE, Clinic = TRUE,
>>>>>>>>>> 11       Mutation = T,
>>>>>>>>>> 12       fileSizeLimit = 10000)
>>>>>>>>>> 13
>>>>>>>>>> 14 # get DNA methylation data, RNAseq2 and clinical data for GBM
>>>>>>>>>> 15 gbm.data <− getFirehoseData(dataset = "GBM",
>>>>>>>>>> 16       runDate = lastDate, gistic2_Date = getFirehoseAnalyzeDates(1),
>>>>>>>>>> 17       Methylation = TRUE, Clinic = TRUE, RNAseq2_Gene_Norm = TRUE,
>>>>>>>>>> 18       fileSizeLimit = 10000)
>>>>>>>>>> 19
>>>>>>>>>> 20 # To access the data you should use the getData function
>>>>>>>>>> 21 # or simply access with @ (for example gbm.data@Clinical)
>>>>>>>>>> 22 gbm.mut <− getData(gbm.data,"Mutations")
>>>>>>>>>> 23 gbm.clin <− getData(gbm.data,"Clinical")
>>>>>>>>>> 24 gbm.gistic <− getData(gbm.data,"GISTIC")
>>>>>>>>>>
>>>>>>>>>> Genomic Analysis/Final data extraction:
>>>>>>>>>>
>>>>>>>>>> Enable “getData” to access the data
>>>>>>>>>>
>>>>>>>>>> Obtaining GISTIC results…
>>>>>>>>>>
>>>>>>>>>> 1 # Download GISTIC results
>>>>>>>>>> 2 gistic <− getFirehoseData("GBM",gistic2_Date ="20141017" )
>>>>>>>>>> 3
>>>>>>>>>> 4 # get GISTIC results
>>>>>>>>>> 5 gistic.allbygene <− gistic@GISTIC@AllByGene
>>>>>>>>>> 6 gistic.thresholedbygene <− gistic@GISTIC@ThresholedByGene
>>>>>>>>>>
>>>>>>>>>> Repeat this procedure to obtain LGG GISTIC results.
>>>>>>>>>>
>>>>>>>>>> ***Please ignore the 'non-coded' text as they are procedural
>>>>>>>>>> steps/classifications***
>>>>>>>>>>
>>>>>>>>>
