[R] filter a tab delimited text file

Gabor Grothendieck ggrothendieck at gmail.com
Fri Sep 10 20:49:17 CEST 2010


On Fri, Sep 10, 2010 at 1:24 PM, Duke <duke.lists at gmx.com> wrote:
>  Hi all,
>
> I have to filter a tab-delimited text file like below:
>
> "GeneNames"    "value1"    "value2"    "log2(Fold_change)"
>  "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change)
> normalized) > 4)"
> ENSG00000209350    4    35    -3.81131293562629    -4.14357714689656    TRUE
> ENSG00000177133    142    2    5.46771720082336    5.13545298955309    FALSE
> ENSG00000116285    115    1669    -4.54130810709955    -4.87357231836982
>  TRUE
> ENSG00000009724    10    162    -4.69995182667858    -5.03221603794886
>  FALSE
> ENSG00000162460    3    31    -4.05126372834704    -4.38352793961731    TRUE
>
> based on the last column (TRUE), and then write to a new text file, meaning
> I should get something like below:
>
> "GeneNames"    "value1"    "value2"    "log2(Fold_change)"
>  "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change)
> normalized) > 4)"
> ENSG00000209350    4    35    -3.81131293562629    -4.14357714689656    TRUE
> ENSG00000116285    115    1669    -4.54130810709955    -4.87357231836982
>  TRUE
> ENSG00000162460    3    31    -4.05126372834704    -4.38352793961731    TRUE
>
> I used read.table and write.table but I am still not very satisfied with the
> results. Here is what I did:
>
> expFC <- read.table( "test.txt", header=T, sep="\t" )
> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
> write.table (expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t" )
>
> Result:
>
> "GeneNames"    "value1"    "value2"    "log2.Fold_change."
>  "log2.Fold_change..normalized"
>  "Signature.abs.log2.Fold_change..normalized....4."
> "ENSG00000209350"    4    35    -3.81131293562629    -4.14357714689656
>  TRUE
> "ENSG00000116285"    115    1669    -4.54130810709955    -4.87357231836982
>  TRUE
> "ENSG00000162460"    3    31    -4.05126372834704    -4.38352793961731
>  TRUE
>
> As you can see, there are two points:
>
> 1. The headers were altered. All the special characters were converted to
> dots (.).
> 2. The gene names (first column) were quoted (they were not quoted in the
> original file).
>
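
Both effects come from defaults: read.table has check.names = TRUE, which
runs the header through make.names(), and write.table has quote = TRUE,
which quotes character and factor columns. A small sketch of the first
effect, using three of the header fields from the post (the variable names
hdr, txt and kept are only for illustration):

```r
# Header fields as they appear in the posted file
hdr <- c("GeneNames", "log2(Fold_change)",
	"Signature(abs(log2(Fold_change) normalized) > 4)")

# make.names() replaces every character that is not valid in an R name
# with ".", which is what read.table does when check.names = TRUE
make.names(hdr)

# With check.names = FALSE the names are kept as they are in the file
# (txt is an abbreviated, in-memory copy of one row from the post)
txt <- "\"GeneNames\"\t\"log2(Fold_change)\"\nENSG00000209350\t-3.81131293562629\n"
kept <- read.table(text = txt, header = TRUE, sep = "\t", check.names = FALSE)
names(kept)
```

The first call should give the dotted names shown in the result above; the
second should give back "GeneNames" and "log2(Fold_change)" unchanged.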

This will copy the header and every input line matching pattern to
the output verbatim, preserving all quotes, spacing, etc.

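myFilter <- function(infile, outfile, pattern = "TRUE$") {
	# read the whole file as plain lines; nothing is parsed or re-quoted
	L <- readLines(infile)
	# data lines (everything after the header) that match the pattern
	L2 <- grep(pattern, L[-1], value = TRUE)
	# writeLines adds only a newline per line; cat(el, "\n") would also
	# insert a space before each newline
	writeLines(c(L[1], L2), outfile)
}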

# e.g.
myFilter("infile.txt", "outfile.txt")
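
If you would rather stay with read.table and write.table, something along
these lines might also work. It is only a sketch: the name filterTable and
the small test file are made up for illustration, and it assumes the last
column holds TRUE/FALSE. The header line is copied as text, and the data
rows are written unquoted without a second header.

```r
# Hypothetical helper, not part of base R
filterTable <- function(infile, outfile, sep = "\t") {
	# copy the original header line untouched
	writeLines(readLines(infile, n = 1), outfile)
	# read the data without letting R rewrite the column names
	DF <- read.table(infile, header = TRUE, sep = sep, check.names = FALSE)
	# rows whose last column is TRUE (which() drops any NA)
	keep <- which(as.logical(as.character(DF[[ncol(DF)]])))
	# append the kept rows, unquoted, with no row names or header
	write.table(DF[keep, , drop = FALSE], file = outfile, append = TRUE,
		sep = sep, quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Abbreviated copy of the posted data (only three columns kept)
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c(
	"\"GeneNames\"\t\"value1\"\t\"Signature(abs(log2(Fold_change) normalized) > 4)\"",
	"ENSG00000209350\t4\tTRUE",
	"ENSG00000177133\t142\tFALSE",
	"ENSG00000162460\t3\tTRUE"), infile)
filterTable(infile, outfile)
readLines(outfile)
```

Unlike the line-based version, this re-formats the numeric columns on
output, so they are not guaranteed to come out character for character as
they were in the input.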

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


