Career in Bioinformatics
Part A: Mapping the genome
Part B: Computing the genome
Key Bioinformatic Areas

Seamless High Performance Computing: The megabases of DNA sequence being analyzed each day will strain the capacity of existing supercomputing centers. Interoperability between high-performance computing centers will be needed to provide the aggregate computing power, managed through sophisticated resource management tools. The system must tolerate machine and network failures so that no data or results are lost.

Sequence Annotation: Computers can be used very effectively to indicate the location of genes and of the regions that control gene expression, and to discover relationships between each new sequence and known sequences from many different organisms. This process is referred to as sequence annotation. Annotation (the elucidation and description of biologically relevant features in the sequence) is the essential prerequisite before genome sequence data can be useful, and the quality of the annotation directly affects the value of the sequence.

Simulation: This involves using known information about a system, along with a mathematical or physicochemical model, to simulate properties of the system. The category is very diverse, ranging from simulating the motion of interacting protein molecules to modelling the flow of chemicals through biochemical pathways.

Data Mining and Information Retrieval: Methods are needed to locate and retrieve information relevant to newly discovered genes. When similar genes or proteins are found through sequence comparison, experiments have often already been performed on one or more homologues, and these can provide insight into the newly discovered gene or protein. Relevant information is contained in more than 100 databases scattered throughout the world, including DNA and protein sequence databases, genome mapping databases, metabolic pathway databases, gene expression databases, gene function and phenotype databases, and protein structure databases.
This data can provide insight into a gene’s biochemical or whole-organism function, pattern of expression in tissues, protein structure type or class, functional family, metabolic role, and potential relationship to disease phenotypes.

Data Warehousing: The information retrieved by intelligent agents or calculated by the analysis system must be collected and stored in a local repository, from which it can be retrieved for further analysis, viewed by researchers, or downloaded into community databases. Large amounts of data of many types need to be stored and managed in such a way that descriptions of genomic regions and links to external data can be maintained and updated continually. In addition, large volumes of data in the warehouse must be accessible at a moment’s notice to analysis systems running at multiple sites.

Visualization for Data and Collaboration: The sheer volume and complexity of the analyzed information, and of the links to data in many remote databases, require advanced data visualization methods to give users access to the data. Users need to interface with the raw sequence data; the analysis process; the resulting synthesis of gene models, features, patterns, genome map data, and anatomical or disease phenotypes; and other relevant data. In addition, collaborations among multiple sites are required for most large genome analysis problems. Even more complex and hierarchical displays are needed that can zoom in from each chromosome to the chromosome fragments (or clones) that have been sequenced, and then display the genes and other functional features at the sequence level. Linked (or hyperlinked) to each feature will be detailed information about its properties, the computational or experimental methods used for its characterization, and further links to the many remote databases that contain additional information.
Analysis processes and intelligent retrieval agents will provide the feature details available in the interface and dynamically construct links to remote data.

Parallel Algorithms for Sequence Analysis: The recognition of important features in a sequence, such as genes, must be highly automated to eliminate the need for time-consuming manual gene model building. Five distinct types of algorithms (pattern recognition, statistical measurement, sequence comparison, gene modeling, and data mining) must be combined into a coordinated toolkit to synthesize the complete analysis. One of the key types of algorithms needed is pattern recognition. Methods need to be designed to detect the subtle statistical patterns characteristic of biologically important sequence features, such as genes or gene regulatory regions. DNA sequences are remarkably difficult to interpret through visual examination.

Key skills: Knowledge of relational databases such as Oracle and Sybase, the ability to work comfortably in a command-line scripting environment, and knowledge of programming languages such as C and C++ and of a scripting language such as Perl are the fundamental skills. The skills needed for various posts are listed below.

Software Engineer Informatics: If you want to be a software engineer (informatics), you must know relational databases such as Oracle, Sybase or SQL. Strong object-oriented design and development skills in Java or C++ would be of great help.

Software Engineer Bioinformatics: Strong object-oriented design skills in Java, C and C++, along with knowledge of Oracle PL/SQL. Experience with XML middleware or application servers is a plus.

Support Engineers: Here again, strong object-oriented design skills in Java, C and C++, along with knowledge of Oracle PL/SQL, are needed. Experience with XML middleware or application servers is a plus.

Quality Engineers: Familiarity with sequence analysis tools such as BLAST and FASTA is desired. Other desired skills are Perl and shell programming, Oracle SQL and Unix computing.

Programmer Analyst: Knowledge of the Unix operating environment and of database management systems such as SQL, Sybase and Oracle is a plus. Knowledge of user application software such as PC database packages, spreadsheets and word processing programs would also be helpful.
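
To give a flavour of the scripting work these posts involve, and of the pattern recognition described under Parallel Algorithms for Sequence Analysis, here is a minimal sketch that scans a DNA string for open reading frames (a start codon followed, in the same reading frame, by a stop codon). It is an illustration only, written in Python rather than the Perl or C mentioned above. The function name `find_orfs` and the `min_codons` parameter are hypothetical, and the sketch assumes the forward strand only, `ATG` as the sole start codon and the three standard stop codons. Real gene-finding tools rely on the statistical models the article describes, not on this simple rule.

```python
# Minimal open-reading-frame (ORF) scan: an illustrative sketch, not a gene finder.
# Assumptions: forward strand only, ATG as the only start codon, standard stops.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return a sorted list of (start, end, frame) tuples.

    start/end are 0-based, end-exclusive positions of an ATG...stop stretch;
    frame is 0, 1 or 2. min_codons counts both the start and the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF, if any
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    sequence = "CCATGAAATAGGG"
    for start, end, frame in find_orfs(sequence):
        print(frame, start, end, sequence[start:end])  # prints: 2 2 11 ATGAAATAG
```

Scanning each of the three reading frames separately keeps the logic simple; a fuller version would also scan the reverse complement and score candidates statistically.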