Session 3. Putting it all together: extracting useful information from files
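1. Connecting commands: pipes
   % command1 | command2    [pipe: the output of command1 becomes the input of command2]
   % command | more         [read a long output one screen at a time]
   % command | wc           [size of a long output: lines, words and characters]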
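As a minimal sketch of how a pipe passes data from one command to the next, the example below builds a small file and counts its lines. The file name and its contents are hypothetical sample data, and head and tr are standard commands that do not appear in this session.

```shell
# Hypothetical sample data: five lines written to a temporary file.
tmpdir=$(mktemp -d)
printf 'geneA\ngeneB\ngeneC\ngeneD\ngeneE\n' > "$tmpdir/genes.txt"

# cat prints the file, wc -l counts the lines it receives through the pipe.
# tr -d ' ' only removes the padding spaces that some wc versions print.
line_count=$(cat "$tmpdir/genes.txt" | wc -l | tr -d ' ')
echo "lines in file: $line_count"

# Pipes can be chained: head keeps the first two lines, wc -l counts them.
first_two=$(cat "$tmpdir/genes.txt" | head -n 2 | wc -l | tr -d ' ')
echo "lines after head: $first_two"
```

Each command in the chain only sees what the previous one wrote, so the second count reflects the output of head, not the original file.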
2. The GREP command:
   % grep pattern file              [search a file for lines matching a regular expression]
   % command | grep "pattern"       [filter: keep only the lines that match]
   % command | grep -v "pattern"    [filter: keep only the lines that do not match]
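The filters above can be tried on a small file first. This is only a sketch: the three-column sample (name, chromosome, strand) is made up for illustration, and the -w option (match whole words only) is an addition that this session does not cover.

```shell
# Hypothetical tab-separated sample; not real annotation data.
tmpdir=$(mktemp -d)
printf 'g1\tchr21\t+\ng2\tchr1\t-\ng3\tchr21\t-\ng4\tchr2\t+\n' > "$tmpdir/sample.txt"

# Lines that contain chr21.
on_chr21=$(grep "chr21" "$tmpdir/sample.txt" | wc -l | tr -d ' ')

# Lines that do NOT contain chr21.
not_chr21=$(grep -v "chr21" "$tmpdir/sample.txt" | wc -l | tr -d ' ')

# Filters can be chained: chr21 lines first, then drop those with a "+".
chr21_minus=$(grep "chr21" "$tmpdir/sample.txt" | grep -v "+" | wc -l | tr -d ' ')

# A pattern matches anywhere in the line, so "chr2" also matches chr21.
loose=$(grep "chr2" "$tmpdir/sample.txt" | wc -l | tr -d ' ')
# -w restricts the match to whole words, so only chr2 itself is kept.
strict=$(grep -w "chr2" "$tmpdir/sample.txt" | wc -l | tr -d ' ')

echo "chr21: $on_chr21, not chr21: $not_chr21, chr21 minus strand: $chr21_minus"
echo "pattern chr2: $loose lines, whole word chr2: $strict lines"
```

The last two counts differ because a short pattern can match inside a longer name, which is worth keeping in mind when counting by chromosome.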
3. The SORT command:
   % sort file              [sort alphabetically]
   % sort -n file           [sort numerically]
   % sort -r file           [reverse sort]
   % sort +x file           [sort by column x+1; obsolete syntax, e.g. sort +3 file is sort -k 4 file on current systems]
   % sort file | uniq       [remove repeated lines]
   % sort file | uniq -c    [count how many times each line occurs]
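The difference between the sort options is easiest to see on a few lines of data. The sketch below uses made-up values and the -k option, which is the current POSIX way of writing the column selection shown above as +x; head is used only to pick the first line of each result.

```shell
tmpdir=$(mktemp -d)

# Alphabetical versus numerical order on the same three numbers.
printf '10\n9\n100\n' > "$tmpdir/nums.txt"
alpha_first=$(sort "$tmpdir/nums.txt" | head -n 1)       # compared as text
numeric_first=$(sort -n "$tmpdir/nums.txt" | head -n 1)  # compared as numbers

# uniq only merges adjacent lines, which is why the input is sorted first.
printf 'chr2\nchr10\nchr1\nchr2\nchr10\nchr2\n' > "$tmpdir/chroms.txt"
distinct=$(sort "$tmpdir/chroms.txt" | uniq | wc -l | tr -d ' ')

# uniq -c prefixes each line with its count; a second, numeric reverse
# sort puts the most frequent value on top.
top=$(sort "$tmpdir/chroms.txt" | uniq -c | sort -n -r | head -n 1)

# Sort by the second column, numerically (-k 2,2n means: key is column 2 only).
printf 'a 30\nb 4\nc 200\n' > "$tmpdir/table.txt"
smallest=$(sort -k 2,2n "$tmpdir/table.txt" | head -n 1)

echo "alphabetical first: $alpha_first, numerical first: $numeric_first"
echo "distinct values: $distinct, most frequent: $top"
echo "smallest in column 2: $smallest"
```

The pattern sort | uniq -c | sort -n -r is a common way to build a frequency table from any column of text.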
4. The JOIN command (both files must be sorted on the join column):
   % join file1 file2              [lines whose column 1 matches in both files]
   % join -1 i -2 j file1 file2    [match column i of file1 against column j of file2]
   % join -v 1 file1 file2         [lines from file1 with no match in file2]
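A short sketch of the three join forms, run on two tiny tables. File names and contents are hypothetical; both files are passed through sort first because join expects its inputs ordered on the join column.

```shell
tmpdir=$(mktemp -d)

# Hypothetical tables: gene -> chromosome, and gene -> function.
printf 'g3 chr21\ng1 chr21\ng2 chr1\n' | sort > "$tmpdir/names.txt"
printf 'g3 receptor\ng1 kinase\n' | sort > "$tmpdir/funcs.txt"

# Default: match on column 1 of both files.
matched=$(join "$tmpdir/names.txt" "$tmpdir/funcs.txt")

# Lines of the first file that have no partner in the second.
only_first=$(join -v 1 "$tmpdir/names.txt" "$tmpdir/funcs.txt")

# Same data with the gene name in column 2 of the first file
# (already ordered on that column).
printf 'chr21 g1\nchr1 g2\nchr21 g3\n' > "$tmpdir/by_chrom.txt"
by_columns=$(join -1 2 -2 1 "$tmpdir/by_chrom.txt" "$tmpdir/funcs.txt")

echo "matched:"
echo "$matched"
echo "only in names.txt: $only_first"
echo "matched on column 2 of the first file:"
echo "$by_columns"
```

In each output line join prints the shared column first, followed by the remaining columns of the first file and then those of the second.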
Practice 3. Putting it all together: extracting useful information from files
- Go to the UCSC human genome browser home page at http://genome.cse.ucsc.edu
- Click on Downloads (on the left frame)
- Select the Database directory
- Save the file refGene.txt.gz
- Unzip the file
Type the following commands (tutorial):
- % cd
- % pwd
- % mkdir work3
- % cd work3
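- % wc refGene.txt (lines, words and characters; the first number, the line count, is the number of genes)
- % grep "chr21" refGene.txt | wc (number of genes located in chr21)
- % grep "+" refGene.txt | wc (number of genes on the positive strand)
- % grep "chr21" refGene.txt | sort +3n | more (chr21 genes sorted by position, column 4; on current systems: sort -k 4n)
- % sort +7nr refGene.txt | more (genes sorted by number of exons, column 8, largest first; on current systems: sort -k 8nr)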
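The tutorial commands can be rehearsed on a miniature file before running them on the real download. This is a sketch under stated assumptions: the four records are invented, and the column layout (name, chromosome, strand, start, end, coding start, coding end, exon count) is inferred from the sort keys used above, so the real refGene.txt may be arranged differently. The -k option replaces the older +x syntax, and cut -f 1 is used only to print the first column.

```shell
tmpdir=$(mktemp -d)

# Invented records in the assumed 8-column, tab-separated layout.
printf 'NM_A\tchr21\t+\t500\t900\t550\t850\t3\n'      >  "$tmpdir/mini.txt"
printf 'NM_B\tchr1\t-\t100\t400\t150\t350\t5\n'       >> "$tmpdir/mini.txt"
printf 'NM_C\tchr21\t-\t200\t300\t210\t290\t1\n'      >> "$tmpdir/mini.txt"
printf 'NM_D\tchr21\t+\t1000\t2000\t1100\t1900\t12\n' >> "$tmpdir/mini.txt"

# Number of entries, entries on chr21, entries on the positive strand.
n_total=$(wc -l < "$tmpdir/mini.txt" | tr -d ' ')
n_chr21=$(grep "chr21" "$tmpdir/mini.txt" | wc -l | tr -d ' ')
n_plus=$(grep "+" "$tmpdir/mini.txt" | wc -l | tr -d ' ')

# chr21 entries ordered by start position (column 4); keep the first name.
first_by_pos=$(grep "chr21" "$tmpdir/mini.txt" | sort -k 4,4n | head -n 1 | cut -f 1)

# All entries ordered by exon count (column 8), largest first.
most_exons=$(sort -k 8,8nr "$tmpdir/mini.txt" | head -n 1 | cut -f 1)

echo "entries: $n_total, on chr21: $n_chr21, positive strand: $n_plus"
echo "first chr21 entry by position: $first_by_pos"
echo "entry with most exons: $most_exons"
```

Checking the results by eye on four lines makes it easier to trust the same pipelines on a file with thousands of entries.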
Enrique Blanco © 2004 -- eblanco@imim.es