The creation and understanding of binary code was the cornerstone of the modern digital age. The language of 0’s and 1’s opened the door to many new technologies, including the internet, modern media platforms, telecommunications, smartphone applications and more. Something similar happened in Life Sciences when we unlocked the genetic code. The language of A’s, T’s, C’s and G’s could describe all life forms, and it led to an explosion of products and services in pharmaceuticals, healthcare, and agriculture.
In 1976, Bacteriophage MS2, a single-stranded RNA virus that infects E. coli, became the first genome to be completely sequenced, at 3,569 bases. Then in 1995, the Institute for Genomic Research completed the genome sequence of Haemophilus influenzae, the first full genome of a living organism, with 1.8 million base pairs (Mb). But when the Human Genome Project published its draft of the human genome, with 3.2 billion base pairs, in 2001, it opened a whole new world of possibilities.
Demands on IT divisions aka “The NGS Data Challenge”
The field of Life Sciences remains at the cutting edge of scientific research, with continuing improvements in technologies such as next-generation sequencing (NGS) and proteomics. An NGS instrument such as the Illumina HiSeq X Ten system can now sequence over 18,000 human genomes per year, which is remarkable when you consider that it took more than a decade to sequence the entire human genome as part of the Human Genome Project. Keep in mind that each human genome needs about 100 GB of storage, so a single HiSeq X Ten system needs about 1.8 PB of storage per year. Improvements in NGS technology and its growing range of applications mean that more and more organizations are using NGS services and generating data at an unprecedented rate. In this era of flattening IT budgets, the IT departments in these organizations cannot keep up with the pace of data generation.
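The storage figure above is simple arithmetic, and a short Python sketch makes it easy to rerun with your own numbers. The two inputs are the figures quoted in this post; the function names and the use of decimal units (1 PB = 1,000,000 GB) are assumptions made for illustration.

```python
# Figures quoted in the post; replace them with your own instrument numbers.
GENOMES_PER_YEAR = 18_000   # throughput quoted for a HiSeq X Ten system
GB_PER_GENOME = 100         # approximate storage per human genome

# Assumption: decimal units, i.e. 1 PB = 1,000,000 GB and 1 TB = 1,000 GB.
GB_PER_PB = 1_000_000
GB_PER_TB = 1_000


def annual_storage_pb(genomes_per_year: int, gb_per_genome: float) -> float:
    """Return the storage generated per year, in petabytes."""
    return genomes_per_year * gb_per_genome / GB_PER_PB


def daily_storage_tb(genomes_per_year: int, gb_per_genome: float) -> float:
    """Return the average storage generated per day, in terabytes."""
    return genomes_per_year * gb_per_genome / GB_PER_TB / 365


print(annual_storage_pb(GENOMES_PER_YEAR, GB_PER_GENOME))           # 1.8
print(round(daily_storage_tb(GENOMES_PER_YEAR, GB_PER_GENOME), 2))  # 4.93
```

Seen per day, the same numbers come to roughly 5 TB of new data around the clock from a single system, which is the pace the production storage tier has to absorb.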
For most organizations performing NGS, the IT department has to deliver on the following fronts:
- Meet the performance and capacity needs: Most Life Sciences workflows containing NGS include data generation, analysis, and archiving stages. Each of these stages has unique capacity and performance needs. Companies performing NGS usually require sequencing to run 24/7 and it is left to the IT department to ensure that the right production performance and storage tiers are available as needed.
- Plan for the future: For IT departments with a longer IT procurement cycle, it is a struggle to ensure that the IT purchase process keeps up with the storage needs of the exponentially growing data produced from NGS processes. IT divisions have to ensure that they plan for today as well as the future to meet the demanding customer SLAs.
- Cost-efficient solutions: IT budget growth is typically nowhere close to the rate of data growth. This means that IT departments have to be smart and cost-efficient with their storage solutions.
Isilon, a trusted partner for Life Sciences organizations – now and in the future
EMC Isilon is a leader and trusted partner for hundreds of Life Sciences organizations worldwide, including leading genome centers, pharmaceutical companies, and academic research centers. Isilon provides a highly available and reliable single file system, OneFS, for Life Sciences workflows, and consolidates all the workflows into a single, scalable volume for ease of management. Isilon can host multiple types of nodes (S, X, NL and HD) in a single cluster. S and X nodes are ideal for high-performance workflows such as mapping and alignment of NGS data, while NL and HD nodes provide high-density, low-cost storage of raw data and analysis results for archive needs. Isilon SmartFail and Auto Balance features also ensure that data is protected across the entire cluster.
Isilon gets better
Recently, EMC Isilon announced Data Lake 2.0, which includes OneFS 8.0, IsilonSD Edge and Isilon CloudPools, to extend the data lake from the data center to edge locations, including remote and branch offices, and to both public and private clouds.
Data Lake 2.0 offers substantial benefits for Life Sciences organizations using Isilon. With OneFS 8.0, organizations gain non-disruptive upgrade and rollback capabilities that help keep operations running continuously. IsilonSD Edge is a software-defined storage solution that can be deployed on commodity hardware at remote research facilities. This gives Life Sciences organizations a cost-efficient and easily managed way to connect geographically dispersed research facilities.
Isilon CloudPools is a game changer for organizations performing NGS, providing a cost-efficient way of archiving data that allows the IT department to make the most of its budget. Isilon CloudPools offers transparent, policy-based, automated tiering of data from an on-premises Isilon cluster to a private cloud based on EMC Elastic Cloud Storage (ECS), to another Isilon cluster, or to a public cloud like AWS or Azure for long-term storage of Life Sciences research data and other archiving needs. By moving cold archive data to a public or private cloud, CloudPools helps IT departments free up space in production environments and ensure that NGS analysis pipelines are not affected by a lack of high-performance storage. This also allows IT departments to deliver on customer SLAs while reducing costs.
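To make the idea of policy-based tiering concrete, here is a minimal Python sketch of one kind of rule: files that have not been accessed for a given number of days become candidates for the archive tier. This is not the CloudPools API or its policy syntax. The names `FileRecord`, `select_cold_files` and `reclaimable_gb`, the 180-day threshold, and the example paths and sizes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata for one file on the production tier."""
    path: str
    size_gb: float
    last_accessed: datetime


def select_cold_files(files, now, max_idle_days=180):
    """Return the files last accessed more than max_idle_days before now."""
    cutoff = now - timedelta(days=max_idle_days)
    return [f for f in files if f.last_accessed < cutoff]


def reclaimable_gb(files, now, max_idle_days=180):
    """Return the production capacity freed if the cold files were tiered out."""
    return sum(f.size_gb for f in select_cold_files(files, now, max_idle_days))


# Hypothetical inventory: one old raw sequencing run, one recently used alignment.
now = datetime(2016, 6, 1)
files = [
    FileRecord("/data/run_001/raw.bcl", 900.0, datetime(2015, 3, 1)),
    FileRecord("/data/run_042/aligned.bam", 120.0, datetime(2016, 5, 20)),
]

print([f.path for f in select_cold_files(files, now)])  # ['/data/run_001/raw.bcl']
print(reclaimable_gb(files, now))                       # 900.0
```

In this sketch only the raw run from 2015 is selected, so the recently used alignment stays on high-performance storage for the analysis pipeline. A real policy would likely combine several criteria, such as file type, path and size, rather than access time alone.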
Improve Life Sciences Data Management
We understand that Life Sciences organizations need a versatile storage solution that balances their capacity and performance requirements. That’s why we continue to evolve our Isilon platform to meet the needs of our Life Sciences customers. New features like EMC Isilon OneFS 8.0, IsilonSD Edge and Isilon CloudPools make sure that our Life Sciences customers have a proven solution to run NGS non-stop and achieve faster time to insight. With EMC Isilon, Life Sciences organizations have a trusted partner to handle their NGS workloads, not only today but in the future too.