Links :: Bioinformatics
Bioinformatics :: Programming Portals
- MySQL
- PostgreSQL
- Unix Home
- FreeBSD
- Perl Home
- Perl Central Directory for All things Perl
- CPAN (Comprehensive Perl Archive Network)
- CPAN Search Index of Perl Modules
- PHP Home
- Oracle Home
- MySQL Home
- SQL Server Central
- Linux SQL Databases and Tools
Bioinformatics :: Biology Portals
- ENTREZ
The Entrez Global Query Cross-Database Search System is a powerful federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website
- National Center for Biotechnology Information (NCBI)
- Open Wetware
Useful protocols section for finding out how things are done
- The Arabidopsis Information Resource (TAIR)
Bioinformatics :: Bioinformatic Portals
- Bioinformatics.Org
- BioPerl
Perl libraries that provide useful biological functions
- EMBOSS
EMBOSS is "The European Molecular Biology Open Software Suite" which includes a variety of different applications for sequence analysis. http://emboss.sourceforge.net/
- Biopython
Similar to BioPerl, but for Python
- Sequence Manipulation Tools
A useful suite of tools for many common sequence manipulation tasks
- UCSC Genome Browser
- Generic Model Organism Database
Collection of open source software tools for creating and managing genome-scale biological databases
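Two of the most common tasks handled by sequence manipulation suites such as the Sequence Manipulation Tools are reverse-complementing a DNA sequence and measuring its GC content. The following is a minimal plain-Python sketch of both. The function names are hypothetical and do not belong to any of the listed packages, and Biopython provides its own equivalents.

```python
# Minimal sketch: reverse complement and GC content of a DNA string.
# Function names are illustrative only, not from any listed package.

# Translation table for complementing bases; N (unknown) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    dna = "ATGCGTTAGC"
    print(reverse_complement(dna))  # GCTAACGCAT
    print(gc_content(dna))          # 0.5
```

Characters outside ACGTN pass through the translation unchanged. Real tools usually validate the input or handle the full IUPAC ambiguity alphabet.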
Bioinformatics :: Statistics
- The R Project
- TASSEL
TASSEL evaluates linkage disequilibrium, nucleotide diversity, and trait associations
- Bioconductor
Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data
- SAPS
SAPS (Statistical Analysis of Protein Sequences) evaluates by statistical criteria a wide variety of protein sequence properties.
- RPy
Interface for using R from Python
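TASSEL is listed as evaluating nucleotide diversity, among other statistics. The usual per-site measure (π) is the average number of nucleotide differences between all pairs of sequences, divided by the alignment length. The sketch below computes that quantity for aligned, gap-free sequences. It is an illustration under that assumption, not TASSEL's implementation, which may treat missing data and gaps differently.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Per-site nucleotide diversity (pi): mean pairwise differences / sequence length.

    Assumes the sequences are already aligned, equal in length and gap-free.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)
```

For example, the three sequences `AAAA`, `AAAT` and `AATT` differ at 1, 2 and 1 sites pairwise, giving 4 / (3 × 4) ≈ 0.333.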
Bioinformatics :: Web Applications
- BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences
- FGENE web interface
Web interface for a gene prediction program; see the program's entry (FGENESH) below
- PLACE
Database of plant cis-acting regulatory DNA elements (final update 1/2007)
- European Union DGXII Biotechnology FW IV Research Programme
Development, optimization and validation of molecular tools for assessment of biodiversity in forest trees
- mVista
mVISTA is a comprehensive suite of programs and databases for comparative analysis of genomic sequences. It provides a web interface for LAGAN and AVID
- SIM4 web interface
A program to align cDNA with genomic DNA; see the sim4 entry below
- CD Search
Searches the Conserved Domain Database with Reverse Position-Specific BLAST (RPS-BLAST).
- ENZYME
ENZYME is a repository of information relating to the nomenclature of enzymes. It is primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB), and it describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.
- FSA (Fast Statistical Alignment)
FSA uses an explicit statistical model to produce multiple alignments accompanied by estimates of alignment accuracy and uncertainty for every column and character of the alignment. Such estimates were previously available only from alignment programs that use computationally expensive Markov chain Monte Carlo.
Bioinformatics :: Alignment and Phylogenetics Software
- BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences
- zPicture
Z-Picture is a dynamic alignment and visualization tool used for comparative genomics
- Clustal
ClustalW is a general-purpose multiple alignment program for DNA or proteins
- CompareProspector
CompareProspector uses comparative genomics information to aid in sequence motif finding.
- PhyloVista
PHYLO-VISTA is an interactive tool for analyzing multiple DNA sequence alignments by visualizing a similarity measure for DNA sequences from different species while taking their phylogenetic relationships into account
- TaxPlot
A tool for 3-way comparisons of genomes on the basis of the protein sequences they encode.
- Dotter
A dot-matrix program with interactive greyscale rendering for genomic DNA and protein sequence analysis
- Jdotter
Java interactive interface for the Linux version of Dotter
- Gepard
Dotplots for large data sets
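Dotter and Gepard both build dot plots. Sequence A runs down one axis and sequence B along the other, and a mark is placed wherever the two sequences match locally, so shared regions show up as diagonal runs. The sketch below uses the simplest possible rule, an exact match of length-`word` substrings. It deliberately leaves out the windowed scoring and greyscale rendering that Dotter uses, and the function names are hypothetical.

```python
def dot_matrix(a: str, b: str, word: int = 1) -> list[list[bool]]:
    """Mark cell (i, j) when a[i:i+word] equals b[j:j+word] exactly."""
    rows = len(a) - word + 1
    cols = len(b) - word + 1
    return [[a[i:i + word] == b[j:j + word] for j in range(cols)] for i in range(rows)]


def render(matrix: list[list[bool]]) -> str:
    """Text rendering: '*' for a match, '.' otherwise; rows follow sequence a."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in matrix)


if __name__ == "__main__":
    # A self-comparison yields the main diagonal; repeats add off-diagonal runs.
    print(render(dot_matrix("ACGTACGT", "ACGTACGT", word=3)))
```

Increasing `word` suppresses random single-base matches at the cost of sensitivity. Tools such as Gepard rely on indexing rather than this quadratic scan to handle large data sets.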
Bioinformatics :: Gene Prediction
- GeneMark
Determines the protein-coding potential of a DNA sequence by using species-specific parameters of the Markov models of coding and non-coding regions.
- GENSCAN
Genscan predicts the locations and exon-intron structures of genes in genomic sequences from a variety of organisms
- FGENESH
FGENESH uses HMMs and protein similarity to perform gene prediction
- GRAILEXP
Grail-EXP is a software package that predicts exons, genes, promoters, poly(A) sites, CpG islands, EST similarities, and repetitive elements within DNA sequence
- GeneMachine
GeneMachine is an integrated tool intended to perform both comparative and predictive gene identification (requires registration)
- HMMgene
HMMgene is a program for prediction of genes in anonymous DNA
- Geneid
Geneid is a program that predicts genes, exons, splice sites and other signals along a DNA sequence
- Glimmer
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses
- MZEF
MZEF provides a method for predicting internal coding exons in genomic DNA sequences. The program allows users to predict putative internal protein-coding exons, adjust the prior probability, and output alternative overlapping exons. Its prediction algorithm uses the quadratic discriminant function for multivariate statistical pattern recognition
- EST2Genome
Est2genome is a software tool to aid the prediction of genes by sequence homology
- Spidey
Spidey is an mRNA-to-genomic alignment program
- RBSfinder
RBSfinder searches the vicinity of the gene start for regions where the ribosome might bind. Based on its findings, RBSfinder may propose a different gene start.
- GeneZilla
GeneZilla (formerly TIGRscan) is a program for computational prediction of protein-coding genes in eukaryotic DNA. It is based on the Generalized Hidden Markov Model (GHMM) framework, similar to GENSCAN and GENIE.
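Several of the gene finders above, GeneMark in particular, score DNA against separate Markov models of coding and non-coding sequence. The sketch below shows that idea in its simplest form. It trains a homogeneous first-order Markov chain for each class, with additive pseudocounts, and sums the log-odds of each adjacent base pair. GeneMark's actual models are more elaborate (for example, they account for codon position). The function names, the pseudocount and the toy training data are assumptions made for illustration.

```python
import math
from itertools import product

BASES = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev) from upper-case DNA."""
    counts = {(a, b): pseudocount for a, b in product(BASES, repeat=2)}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            if (a, b) in counts:  # skip pairs containing non-ACGT characters
                counts[(a, b)] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[(a, b)] for b in BASES)
        for b in BASES:
            probs[(a, b)] = counts[(a, b)] / total
    return probs


def log_odds(seq, coding, noncoding):
    """Sum of log2(P_coding / P_noncoding) over adjacent pairs; > 0 favours 'coding'."""
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if (a, b) in coding:
            score += math.log2(coding[(a, b)] / noncoding[(a, b)])
    return score


if __name__ == "__main__":
    # Toy training sets, far too small for real use.
    coding = train_markov(["ATGATGATGATG"])
    noncoding = train_markov(["AAAAAAAATTTTTTTT"])
    print(log_odds("ATGATG", coding, noncoding))
```

In practice the models are trained on known genes and intergenic regions from the target species, which is what "species-specific parameters" refers to in the GeneMark entry.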
Bioinformatics :: Mapping and Assembly
- Phred
Primary tool used for base calling and quality value determination for sequencing data.
- Phrap
Primary tool used for shotgun DNA sequence assembly
- Consed
This tool generates a visual/graphical overview of assembled sequence data, allowing manual detection and correction of assembly errors.
- Staden Package
Assembler package that includes programs such as PreGap4 and Gap4.
- Genetic Mapping Primer/Tutorial
- Genetic Mapping Primer from NCBI
- T-DNA Express
Gene mapping tool for Arabidopsis
- QTL Cartographer
A suite of programs to map quantitative traits using a map of molecular markers
- MAPMAKER 3
Contains two programs:
1. MAPMAKER/EXP - a linkage analysis package designed to help construct primary linkage maps of markers segregating in experimental crosses.
2. MAPMAKER/QTL - a companion program to MAPMAKER/EXP that allows one to map genes controlling polygenic quantitative traits in F2 intercrosses and BC1 backcrosses relative to a genetic linkage map
- QTL Express
Provides analysis of quantitative trait data from outbred populations (last updated Feb 2008; replaced by GridQTL, which requires registration)
- QTX
QTX detects and localizes quantitative trait loci (development ended)
- MapQTL
MapQTL is computer software for the calculation of positions of quantitative trait loci (QTLs) on genetic maps
- MultiQTL
MultiQTL software integrates a broad spectrum of data mining, statistical analysis, interactive visualization and modeling tools that allow QTL analysis based on advanced and sophisticated methods for maximum extraction of the mapping information from data
- Map Viewer
The Map Viewer provides a wide variety of genome mapping and sequencing data.
- EagleView
EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations.
- Gblocks
Gblocks eliminates poorly aligned positions and divergent regions of a DNA or protein alignment so that it becomes more suitable for phylogenetic analysis.
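Phred quality values, which also appear in assembly viewers such as Consed and EagleView, are defined by Q = -10 · log10(P), where P is the estimated probability that a base call is wrong. Q = 20 therefore means a 1-in-100 error chance and Q = 30 means 1 in 1,000. The sketch below converts between the two scales. It also decodes a quality string under the assumption that it uses the Phred+33 ASCII offset of Sanger-style FASTQ files. That encoding is a later convention and not part of Phred's own output format, and the helper names are hypothetical.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(P), with P the probability the call is wrong."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse mapping: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def decode_phred33(qual: str) -> list[int]:
    """Decode a FASTQ quality string, assuming the Sanger/Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in qual]
```

Because the scale is logarithmic, quality values should be averaged as error probabilities rather than averaged directly.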
Bioinformatics :: Sequence Annotation
- ESTScan
ESTScan is a program that can detect coding regions in DNA sequences, even if they are of low quality. ESTScan will also detect and correct sequencing errors that lead to frameshifts
- ORF Finder
The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool that finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database
- AlignAce
AlignACE (Aligns Nucleic Acid Conserved Elements) is a program that finds sequence elements conserved in a set of DNA sequences
- BEARR
BEARR (Batch Extraction and Analysis of cis-Regulatory Regions) is a software suite for analyzing transcriptional regulatory regions in the genome
- BioProspector
BioProspector, a C program using a Gibbs sampling strategy, examines the upstream regions of genes in the same gene expression pattern group and looks for regulatory sequence motifs
- CONFAC
Conserved Transcription Factor Binding Site Finder (CONFAC) takes a list of human gene names and identifiers as input, and compares them with their mouse orthologues to identify conserved transcription factor binding sites
- Eponine
Eponine is a probabilistic method for detecting transcription start sites (TSS) in mammalian genomic sequence
- FirstEF
FirstEF is a first-exon (5'-terminal exon) and promoter prediction program
- IslandPath
IslandPath incorporates both DNA sequence signal features and annotation features to aid the identification of genomic islands, mostly in prokaryotes
- IsoFinder
The IsoFinder web server allows accurate and reliable prediction of isochores (long genomic regions of relatively uniform GC content)
- JASPAR
JASPAR is a collection of transcription factor DNA-binding preferences, modeled as matrices. These can be converted into Position Weight Matrices (PWMs or PSSMs), used for scanning genomic sequences.
- McPromoter
McPromoter is a program aimed at the exact localization of eukaryotic RNA polymerase II transcription start sites, mostly focused on D. melanogaster DNA
- PromoSer
PromoSer is a web-based service aimed specifically at the extraction of large numbers of promoter sequences from mammalian genomes. To identify the transcription start site (TSS) of a gene, it maps all available mRNA and EST sequence data onto the genome and tracks the overlapping alignments (denoted as a cluster)
- rVista
rVista combines database searches with comparative sequence analysis to find potential regulatory elements in noncoding regions of the human genome
- seqVista
seqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. It allows easy searching for sequence motifs or extraction of particular subsequences
- tRNAscan-SE
Searches for tRNA genes in genomic sequence
- VecScreen
A tool for identifying segments of a nucleic acid sequence that may be of vector, linker, or adapter origin prior to sequence analysis or submission.
- Gibbs Motif Sampler
Gibbs Motif Sampler identifies motifs (conserved regions) in DNA or protein sequences. It can be applied to the detection of transcription factor binding sites (TFBS).
- SNAP
SNAP (SNP Annotation and Proxy Search) finds proxy SNPs based on linkage disequilibrium, physical distance and/or membership in selected commercial genotyping arrays.
- Signal Search Analysis
SSA is a software package for the analysis of nucleic acid sequence motifs that are positionally correlated with a functional site such as a transcription initiation site.
- T-Coffee, M-Coffee, Expresso
A collection of tools for computing, evaluating and manipulating multiple alignments of DNA, RNA and protein sequences and structures.
- MaxAlign
A tool that removes sequences (taxa) with many gaps when post-processing alignments, in order to improve the alignment area. It maximizes the number of characters present in gap-free columns (the alignment area) by selecting an optimal subset of sequences
- sim4
Sim4 is a similarity-based tool designed to align an expressed DNA sequence with a genomic sequence, allowing for introns. SIBsim4 is a derivative work of sim4
- MAKER
MAKER is a genome annotation pipeline for eukaryotic genomes that also creates genome databases. It identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into gene annotations with evidence-based quality indices.
- Apollo Genome Annotation Curation Tool
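The JASPAR entry notes that its binding-preference matrices can be converted into position weight matrices and used to scan genomic sequence. The sketch below shows one common way to do both steps on the forward strand. Each column of counts becomes log2-odds weights against a background distribution, with a pseudocount spread according to that background, and every window of the sequence is then scored against those weights. The pseudocount of 1.0, the uniform background, the threshold and the toy matrix are all illustrative assumptions, not JASPAR's published procedure.

```python
import math

BASES = "ACGT"


def counts_to_pwm(counts, pseudocount=1.0, background=None):
    """Convert a count matrix {base: [count per position]} into log2-odds weights."""
    background = background or {b: 0.25 for b in BASES}
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        column = {}
        for b in BASES:
            # Pseudocount is distributed in proportion to the background frequencies.
            p = (counts[b][i] + pseudocount * background[b]) / total
            column[b] = math.log2(p / background[b])
        pwm.append(column)
    return pwm


def scan(seq, pwm, threshold):
    """Yield (start, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing N or other symbols
        score = sum(col[ch] for col, ch in zip(pwm, window))
        if score >= threshold:
            yield start, score


if __name__ == "__main__":
    # Hypothetical toy matrix favouring "TATA" (10 observed sites).
    counts = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
              "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
    print(list(scan("GGTATAGG", counts_to_pwm(counts), threshold=4.0)))
```

A full scanner would also score the reverse complement and choose the threshold from the score distribution rather than by hand.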
Bioinformatics :: Proteins and Pathways
- DALI
Dali is a network service for comparing protein structures in 3D
- STRUCTAL
STRUCTAL provides database comparison of 3D protein structures (its age is unclear; the main page gives little information about it)
- UNIPROT
UniProt is a comprehensive database resource for protein sequence and annotation data, consisting of UniProtKB, UniRef and UniParc
- PROSITE
PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
- CN3D
Cn3D is a 3-dimensional visualization tool for biomolecular structures, sequences, and sequence alignments
- CATH
CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). The boundaries and assignments for each protein domain are determined using a combination of automated and manual procedures which include computational techniques, empirical and statistical evidence, literature review and expert analysis.
- SSAP
The SSAP server allows users to compare the structures of two proteins and view the resulting structural alignment
- VAST
VAST Search is the NCBI structure-structure similarity search service
- SSM
Secondary Structure Matching (SSM) is an interactive service for comparing protein structures in 3D
- SCOP
SCOP is a database, created by manual inspection and abetted by a battery of automated methods, that aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known (Alexey Murzin)
- KEGG
KEGG is a database of biological systems, consisting of genetic building blocks of genes and proteins, chemical building blocks of both endogenous and exogenous substances, molecular wiring diagrams of interaction and reaction networks, and hierarchies and relationships of various biological objects.
- PICR
The Protein Identifier Cross-Reference (PICR) service is a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc.
- PDB
The PDB (Protein Data Bank) archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies.
- Pfam
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
- PSIPRED
The PSIPRED protein structure prediction server allows you to submit a protein sequence, perform a prediction of your choice and receive the results of the prediction via e-mail.
- TMHMM
Prediction of transmembrane helices in proteins based on a hidden Markov model. The main website offers many other useful tools and services
- SignalP
SignalP 3.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes.
- ALDENTE
Aldente is a tool to identify proteins from peptide mass fingerprinting data. It uses the Hough transform for spectrum recalibration and outlier exclusion.
- Motif Scan
Motif scanning means finding all known motifs that occur in a sequence. This form lets you paste a protein sequence, select the collections of motifs to scan for, and launch the search.
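Many motifs used by PROSITE and by motif scanners are written as PROSITE patterns. In this syntax, elements are separated by `-`, `x` matches any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. The sketch below translates the basic form of this syntax into a Python regular expression, so that a sequence can be scanned with `re.finditer`. It does not handle rarer constructs such as `>` inside brackets, and the function name and example pattern are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns in PROSITE files end with '.'
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:  # e(n) or e(n,m) becomes a regex quantifier
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):  # excluded residues
            core = "[^" + element[1:-1] + "]"
        else:  # single residue or [..] class, already valid regex
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
    print(regex)
    for hit in re.finditer(regex, "MKCAACGGGLQ"):
        print(hit.start(), hit.group())
```

Note that `re.finditer` reports non-overlapping matches only. A scanner that must report overlapping motif occurrences would test each start position separately.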
Bioinformatics :: Sequence Retrieval and Submissions
- ENTREZ
The Entrez Global Query Cross-Database Search System is a powerful federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website
- Sequin
Sequin is a stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases
- BankIt
BankIt is a web-based sequence submission tool
- FASTA
FASTA provides sequence similarity searching against nucleotide and protein databases using the FASTA programs
- dbSNP
Database of known Single Nucleotide Polymorphisms
- GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences
- MatchMiner
MatchMiner is a set of tools that enables the user to translate between disparate IDs for the same gene. It uses data from the UCSC, LocusLink, UniGene, OMIM, Affymetrix and Jackson data sources to determine how different IDs relate. Supported ID types include gene symbols and names, IMAGE and FISH clones, GenBank accession numbers and UniGene cluster IDs.
- Miamexpress
MIAME (minimum information about a microarray experiment) compliant microarray data submission tool
- IBM Genome Annotation
The IBM Bio-Dictionary-based Annotations Of Completed Genomes page lists annotations for over 75 complete genomes (very old). A complete list can be found at http://cbcsrv.watson.ibm.com/Annotations/ and some tools at http://cbcsrv.watson.ibm.com/Tspd.html
Bioinformatics :: Expression (microarray)
- Laboratory for Genomics and Bioinformatics - University of Georgia
MAGIC Database: the database can be thought of as consisting of two major components, a segment that focuses on gene discovery through DNA sequencing (with a focus on EST projects) and a second that deals with microarray data.
- Genevestigator
Reference expression database and meta-analysis system
- caGEDA
A web application for the integrated analysis of global gene expression patterns in cancer
- UNIGENE
An Organized View of the Transcriptome
- Expression Profiler
An open, extensible, web-based collaborative platform for microarray gene expression, sequence and PPI data analysis, exposing distinct chainable components for clustering, pattern discovery, statistics (through R), machine-learning algorithms and visualization
- ACID
ACID is a web-based comprehensive database server for information about reporters/probes/genes used in microarray experiments.
- ArrayXPath
ArrayXPath is a web-based service for matching microarray gene-expression profiles with known biological pathways.
- BAGEL
Bayesian Analysis of Gene Expression Levels (BAGEL) is a program that allows statistical inferences to be made regarding differential gene expression between two or more samples measured on spotted microarrays
- CIBEX
The Center for Information Biology gene EXpression database (CIBEX) is a public repository for gene expression experimental data
- CARRIE
CARRIE is a program that takes two-condition microarray data and applies promoter analysis to infer the stimulated/repressed transcriptional regulatory network.
- GOAL
GOAL is a resource designed for functional analysis of DNA microarrays and maintained at the "Data Mining for Analysis of DNA Microarrays"
- KARMA
KARMA (Keck Array Manager and Annotator) allows you to compare and annotate your own microarrays against other available arrays
- Microarray and Gene Expression Database
MGED aims to facilitate the sharing of data generated using the microarray and other functional genomics technologies for a variety of applications including expression profiling
- Stanford Microarray Database
SMD stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization
- PrimerBank
PrimerBank is a public resource for PCR primers. These primers are designed for gene expression detection or quantification (real-time PCR)
- RTPrimerDB
RTPrimerDB is a public database for primer and probe sequences used in real-time PCR assays employing popular chemistries (SYBR Green I, TaqMan, hybridisation probes, molecular beacons). It aims to spare researchers time-consuming primer design and experimental optimisation, and to introduce a degree of uniformity and standardisation among different laboratories
- GEO
The Gene Expression Omnibus (GEO) provides several tools to assist with the visualization and exploration of GEO data.
Bioinformatics :: Repeat Masking
- Tallymer
k-mer counting method for identifying repeats
- RepeatMasker
Program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence
- Tandem Repeat Occurrence Locator (troll)
Lightweight SSR finder based on a slight modification of the Aho-Corasick algorithm.
- Tandem Repeat Finder (TRF)
- RAP
This tool identifies repeated sequences using a word-counting algorithm.
- Piler
Can search for different types of repeats; performs poorly on short sequences (<50 bp)
- RECON
De novo identification and classification of repeat sequence families from genomic sequences. RECON's extensions use multiple alignment information to define the boundaries of individual repeat copies and to distinguish homologous but distinct repeat element families. RECON should be useful for first-pass automatic classification of repeats in newly sequenced genomes.
- RepeatScout
Identifies repeat family sequences de novo in genomes for which hand-curated repeat databases (such as RepBase Update) are not available.
- AAARF
AAARF characterizes high-copy-number repeats using a combination of sample sequencing, bioinformatics and novel software design approaches. It consists of a single Perl program that uses the freely available BioPerl module libraries, and it allows a high level of automation
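Tallymer and RAP both rest on the same basic observation: repeated sequence produces words (k-mers) that occur unusually often. The sketch below counts every overlapping k-mer in a single sequence and soft-masks (lower-cases) every base covered by a k-mer that occurs at least `min_count` times. The values of `k` and `min_count`, and the function names, are illustrative assumptions. Dedicated tools count across whole genomes using far more memory-efficient index structures than a Python dictionary.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the (upper-cased) sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def soft_mask_repeats(seq: str, k: int, min_count: int) -> str:
    """Lower-case each base covered by a k-mer occurring at least min_count times."""
    counts = kmer_counts(seq, k)
    upper = seq.upper()
    masked = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[upper[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(ch.lower() if m else ch.upper() for ch, m in zip(seq, masked))


if __name__ == "__main__":
    print(soft_mask_repeats("ACGTACGTGGG", k=4, min_count=2))
```

Soft-masking, rather than replacing bases with N, keeps the sequence intact while letting downstream aligners or gene finders ignore the flagged regions.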