Intro to the UNIX environment

Session 3. Putting it all together: extracting useful information from files

1. Connecting commands: pipes

% command1 | command2 [pipe]

% command | more [read long outputs one screen at a time]
% command | wc [count lines, words and characters of the output]
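Here is a minimal sketch of a pipe, using `printf` to produce a small hypothetical list and `wc` to count it. The `-l` option restricts `wc` to the line count, which is usually what you want when a command prints one record per line.

```shell
#!/bin/sh
# Hypothetical input: printf writes three lines to standard output.
# The pipe sends that output into wc instead of the screen.
printf 'alpha\nbeta\ngamma\n' | wc        # prints lines, words, characters

# -l keeps only the line count; store it in a variable for later use
nlines=$(printf 'alpha\nbeta\ngamma\n' | wc -l)
echo "lines: $nlines"
```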

2. The GREP command:

% grep pattern file [regular expression search]

% command | grep "pattern" [keep only lines matching the pattern]
% command | grep -v "pattern" [keep only lines NOT matching the pattern]
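The sketch below runs `grep` on a small hypothetical file. It shows that a pattern matches anywhere in the line (so `chr2` also matches `chr21`), that `-c` counts matching lines, and that `-v` inverts the filter. The `-w` option (whole-word match) is available in GNU and BSD grep but is not part of POSIX.

```shell
#!/bin/sh
# Hypothetical three-line input file
printf 'chr1 geneA\nchr21 geneB\nchr2 geneC\n' > grep_sample.txt

grep "chr21" grep_sample.txt                       # prints: chr21 geneB

substr=$(grep -c "chr2" grep_sample.txt)           # 2: "chr2" is also inside "chr21"
whole=$(grep -cw "chr2" grep_sample.txt)           # 1: whole-word matches only
others=$(grep -v "chr1" grep_sample.txt | wc -l)   # 2: lines without "chr1"
echo "$substr $whole $others"
```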

3. The SORT command:

% sort file [sort alphabetically]

% sort -n file [sort numerically]
% sort -r file [reverse sort]
% sort -k x,x file [sort by column x only, counting from 1]
% sort file | uniq [remove repeated lines]
% sort file | uniq -c [count occurrences of each distinct line]
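A short sketch on a hypothetical two-column file contrasts text and numeric ordering. `-k2,2` limits the sort key to column 2; it is the POSIX spelling of the older zero-based `+1 -2` form, which current GNU sort may read as a file name instead. `uniq` only removes adjacent repeats, which is why its input is sorted first.

```shell
#!/bin/sh
# Hypothetical data: a name and a number on each line (one line repeated)
printf 'b 10\na 9\nc 100\na 9\n' > sort_sample.txt

sort sort_sample.txt                # alphabetical order on the whole line
sort -k2,2n sort_sample.txt         # column 2 as numbers: 9 9 10 100
sort -k2,2 sort_sample.txt          # column 2 as text: 10 100 9 9
sort sort_sample.txt | uniq -c      # "a 9" is reported with a count of 2

# Largest value in column 2 (reverse numeric) and number of distinct lines
top=$(sort -k2,2nr sort_sample.txt | head -n 1)
distinct=$(sort sort_sample.txt | uniq | wc -l)
echo "$top / $distinct"
```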

4. The JOIN command (sorted files):

% join file1 file2 [lines whose column 1 matches in both files]

% join -1 i -2 j file1 file2 [join on column i of file1 and column j of file2]
% join -v 1 file1 file2 [lines from file1 with no match in file2]
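The sketch below joins small hypothetical files. Each file is sorted on its join column first, because `join` expects sorted input. The output line starts with the join field, followed by the remaining fields of file1 and then those of file2.

```shell
#!/bin/sh
# Hypothetical files: gene name -> chromosome, gene name -> exon count
printf 'g1 chr1\ng2 chr21\ng3 chr2\n' | sort > loc.txt
printf 'g1 5\ng3 12\ng4 7\n' | sort > exons.txt
# Same exon counts, but with the gene name in column 2 (sorted on that column)
printf '12 g3\n5 g1\n' | sort -k2,2 > rev.txt

join loc.txt exons.txt                 # g1 chr1 5 / g3 chr2 12
join -v 1 loc.txt exons.txt            # g2 chr21 (only in loc.txt)
join -1 1 -2 2 loc.txt rev.txt         # key: column 1 of loc.txt, column 2 of rev.txt

both=$(join loc.txt exons.txt | wc -l)
only1=$(join -v 1 loc.txt exons.txt)
swapped=$(join -1 1 -2 2 loc.txt rev.txt | head -n 1)
echo "$both / $only1 / $swapped"
```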

Practice 3. Putting all together: extracting useful information from files

Go to the UCSC human genome browser home page at http://genome.cse.ucsc.edu

Click on Downloads (on the left frame)

Select the Database directory

Save the file refGene.txt.gz

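Unzip the file (% gunzip refGene.txt.gz) and move the resulting refGene.txt into the work3 directory created below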

Type the following commands (tutorial):

% cd

% pwd

% mkdir work3

% cd work3

% wc refGene.txt (the first number, the line count, is the number of genes)

% grep "chr21" refGene.txt | wc (number of genes located on chr21)

% grep "+" refGene.txt | wc (number of genes on the positive strand)

% grep "chr21" refGene.txt | sort -k4,4n | more (chr21 genes sorted by start position)

% sort -k8,8nr refGene.txt | more (genes sorted by number of exons, largest first)
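The practice pipelines can be tried on a tiny hypothetical file before downloading the real one. The four lines below are made up, and the column layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount) is an assumption chosen to match the column numbers used in the practice commands. Current downloads of refGene.txt start with an extra `bin` column, which shifts every column number by one.

```shell
#!/bin/sh
# Hypothetical toy data, tab-separated. Assumed columns:
# name chrom strand txStart txEnd cdsStart cdsEnd exonCount
printf 'NM_A\tchr21\t+\t300\t900\t320\t880\t5\n'   > toyGene.txt
printf 'NM_B\tchr21\t-\t100\t500\t120\t480\t12\n' >> toyGene.txt
printf 'NM_C\tchr1\t+\t50\t700\t60\t690\t20\n'    >> toyGene.txt
printf 'NM_D\tchr2\t-\t10\t90\t15\t85\t1\n'       >> toyGene.txt

ngenes=$(wc -l < toyGene.txt)                    # one gene per line
nchr21=$(grep "chr21" toyGene.txt | wc -l)       # genes on chr21
nplus=$(grep "+" toyGene.txt | wc -l)            # genes on the positive strand

# chr21 gene with the smallest start (column 4), name taken from column 1
first21=$(grep "chr21" toyGene.txt | sort -k4,4n | head -n 1 | cut -f1)
# gene with the most exons (column 8, reverse numeric)
mostexons=$(sort -k8,8nr toyGene.txt | head -n 1 | cut -f1)

echo "$ngenes $nchr21 $nplus $first21 $mostexons"
```

Note that `grep "chr21"` matches the text anywhere on the line, so on real data it is worth checking that no other column or chromosome name contains the same string.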

Enrique Blanco © 2004 -- eblanco@imim.es
