Utility to convert the binary file format from an ABI PRISM™ 377 DNA Sequencer to an XML file.
“Motivation: The motivation is to identify, through machine learning techniques, specific patterns in HIV and HCV viral polyprotein amino acid residues where viral protease cleaves the polyprotein as it leaves the ribosome. An understanding of viral protease specificity may help the development of future anti-viral drugs involving protease inhibitors by identifying specific features of protease activity for further experimental investigation. While viral sequence information is growing at a fast rate, there is still comparatively little understanding of how viral polyproteins are cut into their functional unit lengths. The aim of the work reported here is to investigate whether it is possible to generalise from known cleavage sites to unknown cleavage sites for two specific viruses, HIV and HCV. An understanding of proteolytic activity for specific viruses will contribute to our understanding of viral protease function in general, thereby leading to a greater understanding of protease families and their substrate characteristics. Results: Our results show that artificial neural networks and symbolic learning techniques (See5) capture some fundamental and new substrate attributes, but neural networks outperform their symbolic counterpart. Availability: Publicly available software was used (Stuttgart Neural Network Simulator, http://www-ra.informatik.uni-tuebingen.de/SNNS/, and See5, http://www.rulequest.com). The datasets used (HIV, HCV) for See5 are available at: http://www.dcs.ex.ac.uk/~anarayan/bioinf/ismbdatasets/ Keywords: protease cleavage; protease inhibitors; machine learning; neural networks; decision trees. Contact: a.narayanan@ex.ac.uk; wuxikun@yahoo.com; z.r.yang@ex.ac.uk”
Meskalin is a *nix parallel port control program. It allows you to easily control LEDs or wetwire devices like brainwave machines. Ported for: FreeBSD, NetBSD, OpenBSD and Linux.
MOTIVATION: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. RESULTS: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 x L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature. AVAILABILITY: http://predictioncenter.org/Services/FragHMMent/.
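The "top 0.2 x L" accuracy quoted above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function name, the score dictionary layout and the long-range cutoff of 24 residues are assumptions made for illustration only, since the abstract does not define them.

```python
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]


def top_fraction_accuracy(
    scored: Dict[Contact, float],
    true_contacts: Set[Contact],
    seq_len: int,
    fraction: float = 0.2,
    min_separation: int = 24,  # assumed long-range cutoff, not from the abstract
) -> float:
    """Fraction of the top (fraction * L) predicted pairs that are true contacts."""
    # Keep only pairs whose sequence separation meets the cutoff,
    # normalising each pair so the smaller index comes first.
    candidates = [
        (score, (min(i, j), max(i, j)))
        for (i, j), score in scored.items()
        if abs(i - j) >= min_separation
    ]
    # Highest-scoring predictions first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    k = max(1, int(fraction * seq_len))
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


if __name__ == "__main__":
    # Toy, made-up scores for a hypothetical 10-residue numbering scheme.
    scores = {(1, 30): 0.9, (2, 40): 0.8, (3, 50): 0.1, (5, 6): 0.99}
    print(top_fraction_accuracy(scores, {(1, 30)}, seq_len=10))
```

With L = 10 the sketch keeps the two best-scoring pairs that pass the separation filter, so the short-range pair (5, 6) is ignored despite its high score.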
MOTIVATION: Reverse phase protein arrays (RPPA) measure the relative expression levels of a protein in many samples simultaneously. A set of identically spotted arrays can be used to measure the levels of more than one protein. Protein expression within each sample on an array is estimated by borrowing strength across all the samples, but using only within-array information. When comparing across slides, it is essential to account for sample loading, the total amount of protein printed per sample. Currently, total protein is estimated using either a housekeeping protein or the sample median across all slides. When the variability in sample loading is large, these methods are suboptimal because they do not account for the fact that the protein expression for each slide is estimated separately. RESULTS: We propose a new normalization method for RPPA data, called variable slope (VS) normalization, that takes into account that quantification of RPPA slides is performed separately. This method is better able to remove loading bias and recover true correlation structures between proteins. AVAILABILITY: Code to implement the method in the statistical package R and anonymized data are available at (http://bioinformatics.mdanderson.org/supplements.html).
SUMMARY: The Generic Genome Browser (GBrowse) is one of the most widely used tools for visualizing genomic features along a reference sequence. However, the installation and configuration of GBrowse are not trivial for biologists. We have developed a web server, WebGBrowse, that allows users to upload genome annotation in the GFF3 format, configure the display of each genomic feature by simply using a web browser and visualize the configured genomic features with the integrated GBrowse software. AVAILABILITY: WebGBrowse is accessible via http://webgbrowse.cgb.indiana.edu/ and the system is also freely available for local installations.
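For readers unfamiliar with the GFF3 annotation format mentioned above, the following Python sketch parses one feature line into its nine tab-separated columns. It is an illustration only and is unrelated to the WebGBrowse or GBrowse code base. The function name and the example record are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import unquote

# The nine columns of a GFF3 feature line, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one GFF3 feature line; return None for blank lines and comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record: Dict[str, object] = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based integers.
    record["start"] = int(fields[3])
    record["end"] = int(fields[4])
    # Column 9 holds semicolon-separated tag=value pairs, URL-escaped.
    attributes: Dict[str, str] = {}
    if fields[8] != ".":
        for item in fields[8].split(";"):
            if item:
                key, _, value = item.partition("=")
                attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


if __name__ == "__main__":
    # Hypothetical example record, not taken from any real annotation.
    example = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo"
    print(parse_gff3_line(example))
```

The sketch keeps the score, strand and phase columns as raw strings because a "." placeholder is valid in each of them.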
ViaComplex is an open-source application that builds landscape maps of gene expression networks. The motivation for this software comes from two previous publications (Nucleic Acids Res., 35, 1859-1867, 2007; Nucleic Acids Res., 36, 6269-6283, 2008). The first article presents a network-based model of genome stability pathways where we defined a set of genes that characterizes each genetic system. In the second article we analyzed this model by projecting functional information from several experiments onto the gene network topology. In order to systematize the methods developed in these articles, ViaComplex provides tools that may help potential users to assess different high-throughput experiments in the context of six core genome maintenance mechanisms. This model illustrates how different gene networks can be analyzed by the same algorithm. AVAILABILITY: (http://lief.if.ufrgs.br/pub/biosoftwares/viacomplex).
DESCRIPTION: VARNA is a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as companion software for web servers and databases. FEATURES: VARNA implements four drawing algorithms, supports input/output using the classic formats dbn, ct, bpseq and RNAML and exports the drawing in five picture formats, either pixel-based (JPEG, PNG) or vector-based (SVG, EPS and XFIG). It also allows manual modification and structural annotation of the resulting drawing using either an interactive point-and-click approach, within a web server or through command-line arguments. AVAILABILITY: VARNA is free software, released under the terms of the GPLv3.0 license and available at http://varna.lri.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
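As an illustration of the dbn input format listed above, assuming it refers to plain dot-bracket notation, the Python sketch below converts a structure string into a list of base pairs using a stack. It does not reproduce VARNA's own parser, which may accept additional symbols such as pseudoknot brackets. The function name is hypothetical.

```python
from typing import List, Tuple


def dot_bracket_to_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return 1-based (opening, closing) base-pair positions from dot-bracket text."""
    open_positions: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            # Remember the opening base until its partner is found.
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            # Only '(', ')' and '.' are handled in this simplified sketch.
            raise ValueError(f"unsupported symbol {symbol!r} at position {position}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A small hairpin: three stacked pairs closing a three-base loop.
    print(dot_bracket_to_pairs("(((...)))"))
```

The resulting pair list carries the same pairing information that the ct and bpseq formats store per position, which is why a stack-based pass is a common first step when converting between them.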