The Story So Far
Around the turn of this century, scientists began exploring genomics (the study of genetic material) and biological computation. The size of traditional datasets suddenly exploded, and the term “Big Data” was born. Fast forward to 2017, and analyzing that much data to extract insight and knowledge remains a very real problem. This challenge led to an entirely new discipline: High Performance Data Analytics (HPDA).
To be fair, the CERN folks working on the Large Hadron Collider were churning out 30 petabytes of data per year even earlier. Their techniques for handling and analyzing that much data (e.g., data staging and refining data analytics) have considerably influenced numerous disciplines. In the case of data generated by genomic sequencers, however, the scale caught practitioners by surprise and forced them to adopt new methods for handling and analyzing genomic data in genome assembly, gene expression, protein structure determination and protein-protein interaction. This required building high performance computing (HPC) infrastructure to address each stage of the data explosion. Research labs and academic institutions, as well as private companies, quickly assembled teams of specialists to merge traditional techniques with newer computational approaches.
Bioinformatics, an interdisciplinary field that uses supercomputers, HPC and other computational methods to manage biological data, sprang from the need to make sense of this data. Genomics and a huge variety of other scientific disciplines still struggle with the problem of extracting information from data and, ultimately, knowledge from the extracted information. The problem has broadened significantly with the rise of social media and smart devices. Social data mining has become an enormous business, fueled by information mining and the fusion of data from disparate sources to build up profiles.
We’re Going to Need a Bigger Computer
When faced with extreme data size or complexity, data analytics practitioners now look to HPC, reusing well-established techniques that target parallelism and scalability to conquer problems on the scale of grand challenges: fundamental problems in science and engineering that can’t yet be tackled, even with state-of-the-art computing tools, because they require enormous quantities of CPU time. Applying HPC techniques to such problems will have considerable economic and scientific impact.
Many of Engility’s technically savvy customers are looking to us for a clearer understanding of how to apply data analytics, machine learning and artificial intelligence techniques to data of interest. Engility is helping to address these requirements by enabling the creation of data analytics platforms that combine databases, parallel file systems, AI/ML techniques, statistical programming languages and tools on an HPC platform with easy-to-use graphical user interfaces (GUIs). Engility also provides training to new users on HPDA awareness and applications, enabling them to break into this new field.
Now What?
The crystal ball is slightly cloudy at this stage, and the HPC industry is still exploring how best to address HPDA. Right now, Engility is studying AI-based approaches that apply known techniques to scientific data analysis. These approaches could automatically examine scientific data and contextually extract information from the results of intensive computations, ultimately synthesizing knowledge from that information and guiding researchers and practitioners in designing better experiments or significantly refining and enhancing products. Engility’s Synthetic Analyst platform, when tied into HPC resources, could provide human-grade analytic workflows while dramatically lowering training, integration and other costs. The application potential for these tools is exciting, and it opens up a new world of possibilities.