[BioC] how to change file format

Uri David Akavia uridavid at netvision.net.il
Tue Jul 26 10:12:53 CEST 2005


If you have a UNIX system you can use AWK.
Assuming that the original file (ORIGINAL) is separated by tabs, I would 
use something like (in one line)
cat ORIGINAL | awk -F"\t" '{print $1"\t"$2" - "$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}'

If the original file is separated by something else, say commas, replace 
the -F"\t" with the appropriate separator (-F"," and so forth).
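
One caveat with the one-liner above: it treats the header row like any other row, so the header's second column would come out as "Name - Description" rather than "Name". A minimal sketch that handles this is given here, assuming the first line of the file is the header and that the output should use the same separator as the input. The function name merge_name_description is made up for illustration; the separator is an optional argument that defaults to a tab.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Reads a delimited table on stdin and writes the converted table to stdout.
# $1 is the field separator; it defaults to a tab if omitted.
merge_name_description() {
    sep="${1:-\\t}"
    awk -F"$sep" -v OFS="$sep" '
    {
        # Header row (assumed to be line 1): keep only "Name".
        # Data rows: join columns 2 and 3 as "Name - Description".
        second = (NR == 1) ? $2 : ($2 " - " $3)
        line = $1 OFS second
        # Copy the remaining columns through unchanged, however many there are.
        for (i = 4; i <= NF; i++) line = line OFS $i
        print line
    }'
}

# Small demonstration on made-up data in the layout described in the question.
printf 'Gene\tName\tDescription\tArray 1\tArray 2\nGene 1\tName 1\tDescription 1\t0.2\t-0.1\n' |
    merge_name_description
```

On a real file this would be called as merge_name_description < ORIGINAL > CONVERTED, or with a comma as the argument for comma-separated input. Because the columns after the third are copied in a loop, the number of arrays does not have to be written out by hand.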

Or you could try using something like Excel.
I'm not sure R would be very useful, since I believe it would have to 
read the entire file into memory, which might be slow.

Yours,

Uri David Akavia

weinong han wrote:
> Dear All,
>  
> My question seems not to be fit for the mail list, however, I really need your help. Crouching tigers and Hidden dragons are There!
>  
> Now, I have a file with 10 column headers (gene, name, description, array 1, array 2 ... array 7):
> Gene Name Description Array 1 Array 2 Array 3 Array 4 Array 5 Array 6 Array 7
> Gene 1 Name 1  Description 1   0.2 -0.1 -1.1 0.4 -4 -2 0.2
> Gene 2 Name 2  Description 2   2.3 2.1 -3 1.1 1.2 -1.6 0.1
> Gene 3 Name 3  Description 3   0.1 1.6 1.2 1.5 2.7 0.4 -0.4
> Gene 4 Name 4  Description 4   0.3 -1.5 -1.7 0.2 0.4 2 -2.1
> Gene 5 Name 5  Description 5   1.7 2.3 2.3 2.3 3 -2 2.1
> Gene 6 Name 6  Description 6   0.2 4 4 4 0.2 -3 -4
> Gene 7 Name 7  Description 7   -0.3 1.5 1.5 1.5 -0.2 1.7 3
> Gene 8 Name 8  Description 8   1.4 -0.6 -1.1 -0.3 -3 -3 1.4
>  
> I want to get the following file format:
>  
> 
> Gene	Name	Array 1	Array 2	Array 3	Array 4	Array 5	Array 6	Array 7
> Gene 1	Name 1 - Description 1	0.2	-0.1	-1.1	0.4	-4	-2	0.2
> Gene 2	Name 2 - Description 2	2.3	2.1	-3	1.1	1.2	-1.6	0.1
> Gene 3	Name 3 - Description 3	0.1	1.6	1.2	1.5	2.7	0.4	-0.4
> Gene 4	Name 4 - Description 4	0.3	-1.5	-1.7	0.2	0.4	2	-2.1
> Gene 5	Name 5 - Description 5	1.7	2.3	2.3	2.3	3	-2	2.1
> Gene 6	Name 6 - Description 6	0.2	4	4	4	0.2	-3	-4
> Gene 7	Name 7 - Description 7	-0.3	1.5	1.5	1.5	-0.2	1.7	3
> Gene 8	Name 8 - Description 8	1.4	-0.6	-1.1	-0.3	-3	-3	1.4
> 
> In the above file format, the first row is a header row, where the names of the 
> arrays/experiments are specified from column 3 and on. The second row and on specify 
> expression data for each gene, where the first column is the unique identifier of each gene, 
> the second column specifies the name and the description of the gene, where the name 
> and description are separated by " - " (the surrounding spaces are important), and column 3 
> and on specify the expression data for the gene across all experiments.
> 
> thanks much for your help in advance
> 
> Any suggestions and advice will be much appreciated.
> 
> 
> 
> Best Regards
>  
> Han Weinong  
> hanweinong at yahoo.com
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor