Session 5. Bioinformatics research in UNIX environments
A. Local bioinformatics resources (data):
- Directory genomes:
A.aegyptis/ C.elegans/ F.catus/ M.crystallinum/ P.knowlesi/ S.propinquum/ A.albopictus/ C.familiaris/ F.sporotrichioides/ M.grisea/ P.taeda/ S.salar/ A.flavus/ C.immitis/ G.arboreum/ M.mulatta/ P.tremulaxP.tremuloides/ S.scrofa/ A.fumigatus/ C.intestinalis/ G.gallus/ M.musculus/ P.troglodytes/ S.tuberosum/ A.gambiae/ C.neoformans/ G.hirsutum/ M.polymorpha/ P.vivax/ T.aestivum/ A.mellifera/ C.parvum/ G.lamblia/ M.sativa/ P.yezoensis/ T.annulata/ A.nidulans/ C.pipiens/ G.max/ M.truncatula/ P.yoelii/ T.brucei/ A.parasiticus/ C.reinhardtii/ G.morsitans/ N.caninum/ readme/ T.cruzi/ A.sativa/ Cryptococcus/ H.annuus/ N.crassa/ README T.gondii/ A.thaliana/ C.savignyi/ H.sapiens/ O.cuniculus/ #README# T.nigroviridis/ A.triseriatus/ data/ H.vulgare/ O.latipes/ R.norvegicus/ T.parva/ A.variegatum/ D.discoideum/ I.punctatus/ O.mykiss/ S.bicolor/ T.pseudonana/ B.bovis/ D.melanogaster/ Leishmania/ O.sp/ S.cereale/ T.rubripes/ B.malayi/ D.pseudoobscura/ L.esculentum/ O.volvulus/ S.cerevisiae/ T.thermophila/ B.oleracea/ D.rerio/ Lettuce/ P.berghei/ SeaUrchin/ X.laevis/ B.taurus/ E.caballus/ L.hirsutum/ P.chabaudi/ S.japonicum/ X.tropicalis/ B.vulgaris/ E.cuniculi/ L.japonicus/ P.chrysosporium/ S.mansoni/ Z.mays/ C.albicans/ E.hystolitica/ L.major/ P.cynocephalus/ S.neurona/ C.briggsae/ E.tenella/ L.pennellii/ P.falciparum/ S.pombe/
- Directory H.sapiens:
celera_1.0_20010216/ ensembl_4_28/ golden_path_20010806/ golden_path_20030410/ sanger_2.3_20000519.tar dbEST/ ensembl_7_29/ golden_path_20011222/ golden_path_200307/ TIGR_GI/ ensembl-0.7.4/ ensembl_8_30/ golden_path_20020405/ golden_path_200405/ UniGene/ ensembl_20030401/ eVOC-1.9/ golden_path_20020628/ H-InvDB/ ensembl_3_26/ golden_path_20010401/ golden_path_20021114/ ncbi_20001219/
- Directory golden_path_200405:
bigZips/ chromFa/ chromFa_msk/ database/
- Directory chromFa/:
chr10.fa chr14.fa chr18.fa chr21.fa chr4.fa chr6_random.fa chrM.fa chr10_random.fa chr15.fa chr18_random.fa chr22.fa chr4_random.fa chr7.fa chrX.fa chr11.fa chr15_random.fa chr19.fa chr22_random.fa chr5.fa chr7_random.fa chrX_random.fa chr12.fa chr16.fa chr19_random.fa chr2.fa chr5_random.fa chr8.fa chrY.fa chr12_random.fa chr16_random.fa chr1.fa chr2_random.fa chr6.fa chr8_random.fa temp/ chr13.fa chr17.fa chr1_random.fa chr3.fa chr6_hla_hap1.fa chr9.fa chr13_random.fa chr17_random.fa chr20.fa chr3_random.fa chr6_hla_hap2.fa chr9_random.fa
- chr1.fa (chunk):
>chr1
taaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
accctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaac
cctaacccaaccctaaccctaaccctaaccctaaccctaaccctaacccc
taaccctaaccctaaccctaaccctaacctaaccctaaccctaaccctaa
ccctaaccctaaccctaaccctaaccctaacccctaaccctaaccctaaa
ccctaaaccctaaccctaaccctaaccctaaccctaaccccaaccccaac
cccaaccccaaccccaaccccaaccctaacccctaaccctaaccctaacc
ctaccctaaccctaaccctaaccctaaccctaaccctaacccctaacccc
taaccctaaccctaaccctaaccctaaccctaa
- chr21.fa (chunk):
>chr21 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNN
- Directory seq (genomic databases):
blastncbi/ blastwu/ databases/ datasets/ fasta/ genomes/ lost+found/ predictions/ scratch/ tools/
- Directory databases:
ASD/ embl-Release_77@ hs.gbff pdb/ pfam-15@ refseq/ swiss-prot-Release45@ trembl-Release28@ EGO/ hs.faa hssp/ pdb_seq/ pir/ sp_tr_nrdb/ tgi/ embl/ hs.fna mysql/ pfam/ pir-Release72@ swiss-prot/ trembl/
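The chr1.fa and chr21.fa chunks above illustrate two features worth checking before a genome-wide run: lowercase bases, which in UCSC chromFa files mark repeats soft-masked by RepeatMasker and Tandem Repeats Finder, and runs of N, which stand for unknown bases (mostly assembly gaps). A minimal awk-based sketch that tallies both per record is shown here; the function name fasta_composition is hypothetical and is not one of the installed tools, and reading lowercase as "repeat" assumes the UCSC soft-masking convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-record composition of a FASTA file.
# Prints one line per record: name, length, N count (N or n),
# lowercase count (a, c, g, t, n; soft-masked bases under the UCSC convention).
# Usage: fasta_composition FILE...   (reads standard input if no FILE)
fasta_composition() {
  awk '
    function report() {
      if (name != "") print name, len, n, lower
    }
    /^>/ {
      report()                      # flush the previous record
      name = substr($1, 2)          # ">chr21 ..." -> "chr21"
      len = n = lower = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")           # ignore stray whitespace
      len   += length($0)
      n     += gsub(/[Nn]/, "&")    # gsub returns the match count; "&" leaves the line unchanged
      lower += gsub(/[acgtn]/, "&")
    }
    END { report() }
  ' "$@"
}
```

For example, fasta_composition chromFa/chr21.fa prints a single line with the record name, its total length, its N count and its lowercase count.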
B. Local bioinformatics resources (tools):
- Directory /localmolbio/bin (ls -F markers: * = executable, @ = symbolic link, / = directory):
abiview* cusp* fastasplit@ lrna* pir2fasta@ showdb* acdc* cutseq* fastasubseq@ lsadt* plotcon* showfeat* addlongorfs* cytoscape.sh* FastaToTbl* MakeCons* plotorf* showorf* ali2gff* dan* fastatranslate@ makemat@ poincare@ showseq* AlignACE* das-client* faToNib@ mapview* polydot* shuffleseq* alistat* DateRepeats@ faToTwoBit@ markov@ predator* sigcleave* AnalyseDists* dba@ fgenesh@ marscan* preg* silent* AnalyseSeqs* dbiblast* findall* MaskerAid@ pressdb@ sim2gff.pl* anomaly* dbifasta* findkm* MaskerAidMP@ prettyplot* Singular@ antigenic* dbiflat* fitch@ maskfeat* prettyseq* snap* apollo* dbigcg* formatdb@ maskseq* prima* sp2fasta@ avid@ degapseq* freak* mast@ primersearch* spidey* backtranseq* descseq* fuzznuc* matcher* printsextract* splitter* banana* dicodontable* fuzzpro* MatrixAid@ PrintStrat* sreformat* bin/ diffseq* fuzztran* mcl@ ProcessRepeats@ ssaha@ bl2seq@ digest* garnier* mclpipeline@ profit* stretcher* blast2gff* dnacomp@ gb2fasta@ mcx@ proml@ stssearch* blast3* dnadist@ gde* mcxarray@ promlk@ supermatcher* blasta@ dnainvar@ geecee* mcxassemble@ prophecy* syco* blastall@ dnal@ gendist@ mcxconvert@ prophet* tblastn@ blastcl3@ dnaml@ geneid@ mcximac* prosextract* tblastx@ blastclust@ dnamlk@ geneid1.1.OLD* mcxmap@ protdist@ TblToFasta* blastn@ dnamove@ geneid_p* mcxsubs@ protpars@ t_coffee* blastp@ dnapars@ generage* megablast@ prss33* textsearch* blastpgp@ dnapenny@ genewise@ megamerger* pscan* tfasta* blastruntool.pl* dollop@ genewisedb@ meme@ pslPretty@ tfextract* blastx@ dolmove@ genModel@ memfile@ pslReps@ tfm* blat@ dolpenny@ genomewise@ merger* pslSort@ tfscan* blat2gff.pl* domainer* genotator* mfasta2split.pl* psw@ tmap* BLOSUM62@ dotmatcher* genscan* minimize@ pswdb@ transeq* btwisted* dotpath* genSymm@ mix@ readseq* translate* build-icm* DotPlotTool* getorf* mocca* rebaseextract* Translate* CAP2* dottup* getseq* move@ recode* treedist@ chaos* drawgram@ gfClient@ msbar* redata* treetool* charge* drawtree@ gff2aplot* needle* remap* trf@ checktrans* 
dreg* gff2aplot.pl* neighbor@ RepeatMasker@ tribe-families@ chips* einverted* gff2ps* newcpgreport* restdist@ tribe-matrix@ circuits@ embossdata* gfServer@ newcpgseek* restml@ tribemcl@ cirdna* emma* glimmer2* newseq* restover* tribe-parse@ cleanup@ emowse* glimmerm@ nibFrag@ restrict* trimseq* clique@ entret* glimmerm_malaria@ noreturn* Restriction* trnascan-1.4@ clmconf* equicktandem* gmhmme@ notseq* retree@ tRNAscan-SE@ clmdist@ ESingular@ graver@ nrdb@ revseq* twoBitInfo@ clmformat@ est2genome* groebner@ nrscope* RNAdistance* twoBitToFa@ clmimac@ estwise@ gt2fasta@ nthseq* RNAeval* varpos* clminfo@ estwisedb@ heapsortHGL* octanol* RNAfold* vcluster@ clmmate@ etandem* helixturnhelix* oddcomp* RNAheat* vectorstrip* clmmeet@ eufindtRNA@ hilbert@ output@ RNAinverse* VERA* clmresidue@ evaluation* hmmalign@ pabla* RNApdist* water* clus* exonerate@ hmmbuild@ palindrome* rose* webblast* clustalv* exstral.pl* hmmcalibrate@ pam@ rpsblast@ wobble* clustalw* exstralpwise.pl* hmmcalibrate-pvm* pars@ run-glimmer2* wordcount* clustalx* extract* hmmconvert@ parse* SAM* wordmatch* coallike* extract_from_pdb* hmmemit@ parseblast@ scanblastpairs* wossname* codcmp* extractseq* hmmfetch@ parseblast.pl* scan_for_matches* wu-blastall@ coderet* factor@ hmmindex@ parseblast.pl.old* scluster@ wu-blastn@ compare-lists* fasta* hmmpfam@ pasteseq* scope* wu-blastp@ comparemasked@ fastaclean@ hmmpfam-pvm* patdb@ search-launcher* wu-blastx@ complex* fastaclip@ hmmsearch@ patmatdb* SearchLauncher* wu-formatdb@ compseq* fastacmd@ hmmsearch-pvm* patmatmotifs* seaview@ wu-tblastn@ cons* fastacomposition@ hmoment* pb@ SECISearch@ wu-tblastx@ consense@ fastadiff@ iep* pbold* seealso* xblastmathtool.pl* Consto01mask* fastaexplode@ impala@ penny@ seedtop@ xblastperltool.pl* contml@ fastafetch@ imview@ pepcoil* seg* xblast.pl* contrast@ fastahardmask@ infoseq* pepinfo* seqboot@ xblastseqtool.pl* copymat@ fastaindex@ ipcress@ pepnet* seqmatchall* xblastxtool.pl* count* fastalength@ isochore* pepstats* 
seqret* xdformat@ covels-SE@ fastanrdb@ kitsch@ pepwheel* seqretall* xdget@ coves-SE@ fastaoverlap@ lalign2list* pepwindow* seqretset* xnu* cpgplot* fastareformat@ LibraryAid@ pepwindowall* seqretsplit* ZUKERGDE.sh* cpgreport* fastaremove@ liftOver@ Pick70_script1* seqstat* Zuk_to_gen* critica* fastarevcomp@ lindna* Pick70_script1_contig* setdb@ crna* fastasoftmask@ long-orfs* Pick70_script2* share/ cross_match@ fastasort@ LoopTool* pictogram@ sho_helix*
- Programming and scripting languages:
- Perl, C, Java, GAWK, ...
- Shell programming (bash, tcsh, ...); the example below uses bash syntax:
% ls plots/ps | while read -r file; do echo "$file"; convert "plots/ps/$file" "plots/jpgs/${file%.ps}.jpg"; done
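The one-liner above pipes ls into a while loop. A slightly more robust sketch of the same batch job loops over a glob instead, which avoids parsing ls output, skips files that do not end in .ps, and drops the .ps suffix from the output name. The function name ps2jpg_batch and its dry-run default are illustrative assumptions; convert is ImageMagick's converter, and it is only invoked when the third argument is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: batch-convert PostScript plots to JPEG.
# Usage: ps2jpg_batch SRC_DIR DST_DIR [run]
# Without "run" it only prints the commands (dry run); with "run" it
# calls ImageMagick's convert for each SRC_DIR/*.ps file.
ps2jpg_batch() {
  local src=$1 dst=$2 mode=${3:-dry}
  local f base
  for f in "$src"/*.ps; do
    [ -e "$f" ] || continue       # glob matched nothing: skip the literal pattern
    base=$(basename "$f" .ps)     # plots/ps/fig1.ps -> fig1
    if [ "$mode" = run ]; then
      convert "$f" "$dst/$base.jpg"
    else
      echo "convert $f $dst/$base.jpg"
    fi
  done
}
```

A typical session would preview with ps2jpg_batch plots/ps plots/jpgs and then convert with ps2jpg_batch plots/ps plots/jpgs run. The dry run is a cheap way to check paths before touching hundreds of files.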
C. How to tackle genome-wide analysis:
- Extract samples from the human genome:
- Gene prediction in the human genome:
% ls chromFa | grep '\.fa$' | while read -r file; do echo "$file"; geneid -P human.param "chromFa/$file" > "predictions/${file%.fa}.genes"; done
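The "Extract samples" step has no command on the slide. One way to cut a sample region out of a chromosome file such as chromFa/chr21.fa is sketched here with awk; the function name fasta_region, its 1-based coordinates and its name:start-end header are assumptions for illustration. Installed tools such as fastasubseq or extractseq may do the same job, but their options are not covered here. The sketch streams the file and stops reading once the region is complete, so it does not hold a whole chromosome in memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cut LEN bases starting at 1-based position START
# out of the first record of a FASTA file and print them as a new record.
# Usage: fasta_region FILE START LEN   (exit status 1 if nothing was found)
fasta_region() {
  awk -v start="$2" -v len="$3" '
    BEGIN { start += 0; stop = start + len - 1 }
    /^>/ {
      if (seen++) exit              # only the first record is read
      name = substr($1, 2)          # ">chr21 ..." -> "chr21"
      next
    }
    {
      gsub(/[ \t\r]/, "")
      lo = pos + 1; hi = pos + length($0); pos = hi   # 1-based span of this line
      a = (start > lo) ? start : lo                   # overlap with [start, stop]
      b = (stop < hi) ? stop : hi
      if (a <= b) out = out substr($0, a - lo + 1, b - a + 1)
      if (pos >= stop) exit                           # region complete: stop reading
    }
    END {
      if (out == "") exit 1
      print ">" name ":" start "-" (start + length(out) - 1)
      for (i = 1; i <= length(out); i += 50)          # wrap at 50 columns like chromFa
        print substr(out, i, 50)
    }
  ' "$1"
}
```

For instance, fasta_region chromFa/chr21.fa 20000001 100000 > chr21_sample.fa would write a 100 kb sample that could be passed to geneid in place of a whole chromosome while testing parameters.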
Enrique Blanco © 2004 eblanco@imim.es