GT-Scan2: Bringing Bioinformatics to Alibaba Cloud

本文涉及的产品
Serverless 应用引擎免费试用套餐包,4320000 CU,有效期3个月
函数计算FC,每月15万CU 3个月
简介: Learn how Alibaba Cloud powers the cutting-edge genome sequence search tool, GT-Scan2, with its suite of big data products and serverless computing platform.

CRISPR-Cas9 is a genome editing tool that is creating a buzz in the science world. It is faster, cheaper and more accurate than previous techniques for editing the genome of living cells. It hence has the potential to revolutionize a wide range of applications.

CRISPR-Cas9 has a lot of potential especially in the health space as it allows the treatment of medical conditions that have a genetic component, including cancer, hepatitis B or even high cholesterol. Clinical trials have already started for patients with specific blood and solid cancer types.

CRISPR-Cas9 is suitable for these applications because it can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. However, for robust application in the clinic, the efficiency of CRISPR-Cas9 needs to be increased as does the speed with which target sites can be designed.

Researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia, developed GT-Scan2, a novel software tool to address both issues.

GT-Scan2 can help researchers find the most effective CRISPR/Cas9 targets in a genomic region by ranking targets by the predicted cutting efficiency. You can think of it as the "search-engine for the genome". GT-Scan2 will also report the number of potential off-targets for each target, where potential off-targets are other regions in the genome with 0-3 mismatches to the target.

  • Identifies optimal CRISPR-Cas9 targets in the human genome.
  • Combines information about the chromatin environment and sequence of the target site.

Architecture

A Web Application front end is used to access the GT Scan2 application and to submit the relevant jobs.

Image_1

When a user submits a job, GT-Scan2 inserts the job parameters as an item into a TableStore table via an API call. This allows the solution to be freely scalable without creating a bottleneck. The database entry triggers the first Function Compute function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon user submission). Potential CRISPR target sites have fixed rules and can be easily found using a regular expression that completes in seconds and are inserted into a second TableStore table.

Image_2
GT-Scan2 is served directly from OSS making it a static web app without server-side processing. It retrieves the dynamic content (such as job results and parameters) via API calls using API Gateway from a NoSQL database (TableStore) using a JavaScript framework.

Applying Serverless Computing

All potential targets need to be evaluated for their off-target risk using the efficient string matching tool, Bowtie. Though Bowtie only requires a reduced representation of the 3 billion letter genomic sequence, the size of these index files still reaches 915 MB for the human genome. Even though Alibaba Cloud Function Compute supports temp spaces of this size, the implementation divides the genome into smaller blocks to enable parallel processing. For an average run, GT-Scan2 hence triggers 200-500 individual Function Compute functions, which simultaneously update the scores for the different putative targets in TableStore. During this process, the frontend is polling this table via API Gateway and updating the webpage as results come in, eliminating the need for server-side compute.

Alibaba Cloud Function Compute provides a framework to develop a future-ready software package that is able to support medical genome engineering applications. It has the ability to instantaneously scale at run time to the optimal capability by spawning the appropriate number of functions to cope with the varying complexity of different genes. Other benefits include only paying for the storage when no compute is triggered; jobs not competing with web server resources as the website is a static page with dynamic content being updated through Angular 2 and the API Gateway; as well as not needing to maintain compute instances (security patches of OS).

Improvements

GT-Scan deployment benefitted from the Alibaba Cloud specific architectural patterns and services. Some of them are listed below.

  • Uses asynchronous invoke method instead of queue based triggers. This allows shorter invoke times and removes the dependency on message queue.
  • Applies Batch read/write when accessing data from the NoSQL database, making IO more efficient.
  • GT Scan deployment streams all logs to Alibaba Cloud Log Service, which allows easier troubleshooting of issues with the workflow operations. Access to logs in a single ___location allows user to pin point issues easily without having to spend time on logging into server or individual service consoles.

Image_3

Automated Deployments

The open sourced Fun Tool (Fun with Serverless) will enable automated deployments of API Gateway and Function Compute resources making deployments of new GT Scan versions a breeze. The tool allows automated deployments of components defined in a simple YAML file.

What's Next?

Analytics

Leverage Alibaba Cloud's award winning big data platform to create a Machine Learning Pipeline will enable sophisticated analyses to be integrated in the application. This is of specific relevance for personalized health applications, which identify editing strategies for individual patients.

Image_4

Log Analysis

Alibaba Cloud Log Service allows exporting log files for future analysis leveraging Alibaba Cloud's big data platform of existing open sources analysis platforms available at CSIRO's disposal. The log file exports can then be plugged into an existing machine learning pipeline to learn from the usage patterns of the GT-Scan application.

Image_5

Ref

https://community.alibabacloud.com/blog/gt-scan2%253A-bringing-bioinformatics-to-alibaba-cloud_593841?spm=a2c65.11461537.0.0.62ef5355hBhpcO

相关实践学习
消息队列+Serverless+Tablestore:实现高弹性的电商订单系统
基于消息队列以及函数计算,快速部署一个高弹性的商品订单系统,能够应对抢购场景下的高并发情况。
阿里云表格存储使用教程
表格存储(Table Store)是构建在阿里云飞天分布式系统之上的分布式NoSQL数据存储服务,根据99.99%的高可用以及11个9的数据可靠性的标准设计。表格存储通过数据分片和负载均衡技术,实现数据规模与访问并发上的无缝扩展,提供海量结构化数据的存储和实时访问。 产品详情:https://www.aliyun.com/product/ots
目录
相关文章
|
8月前
|
运维 监控 Linux
服务器管理面板大盘点: 8款开源面板助你轻松管理Linux服务器
在数字化时代,服务器作为数据存储和计算的核心设备,其管理效率与安全性直接关系到业务的稳定性和可持续发展。随着技术的不断进步,开源社区涌现出众多服务器管理面板,这些工具以其强大的功能、灵活的配置和友好的用户界面,极大地简化了Linux服务器的管理工作。本文将详细介绍8款开源的服务器管理面板,包括Websoft9、宝塔、cPanel、1Panel等,旨在帮助运维人员更好地选择和使用这些工具,提升服务器管理效率。
|
10月前
|
Python
anaconda下载安装,镜像源配置修改及虚拟环境的创建
这篇文章介绍了Anaconda的下载安装过程,包括Anaconda的简介、安装步骤、配置修改、创建虚拟环境以及一些常用命令的使用方法。文章还提供了如何修改conda的镜像源为国内镜像源以加速下载的步骤。
anaconda下载安装,镜像源配置修改及虚拟环境的创建
|
11月前
|
小程序
|
数据可视化 SDN Python
复动力系统 | 混沌 | Lozi 映射吸引子的可视化与交互式探索
该文介绍了一篇关于Lozi映射吸引子可视化和交互式探索的文章。Lozi映射是混沌理论中的一个模型,展示非线性动力系统的复杂性。通过Python和matplotlib,作者实现了Lozi映射的可视化,并添加交互功能,允许用户缩放以详细观察混沌吸引子。文中还给出了Lozi映射的数学定义,并提供了Python代码示例,演示如何绘制和动态调整吸引子的显示。
|
Python
ERROR: Failed building wheel for osgeo
ERROR: Failed building wheel for osgeo
535 0
|
机器学习/深度学习 算法 数据挖掘
周志华《机器学习》课后习题(第九章):聚类
周志华《机器学习》课后习题(第九章):聚类
1337 0
周志华《机器学习》课后习题(第九章):聚类
|
存储 JavaScript 编译器
【TypeScript教程】# 3:TS的类型声明
【TypeScript教程】# 3:TS的类型声明
303 0
【TypeScript教程】# 3:TS的类型声明
|
弹性计算 容灾 安全
阿里云服务器购买流程(新手入门教程)
阿里云服务器购买流程(新手入门教程),选购云服务器有两个入口,一个是选择活动机,只需要选择云服务器地域、系统、带宽即可;另一个是在云服务器页面,自定义选择云服务器配置,这种方式购买云服务器较为复杂,需要选付费方式、地域及可用区、ECS实例规格、镜像、网络、公网IP、安全组等配置
阿里云服务器购买流程(新手入门教程)
|
SQL 关系型数据库 数据库
|
存储 数据采集 人工智能
数据中台的智能进化—阿里巴巴十二年数据平台发展历程
从2016年诞生起,“中台”概念就一路火热至今,对互联网与金融行业数字化转型产生了极为深远的影响。 作为“中台”概念的提出者和先行者,阿里巴巴用12年的实践探索了中台能力建设和数据应用。在不断升级和重构的过程中,阿里巴巴的中台建设经历了从分散的数据分析到数据中台化能力整合,再到全局数据智能化的时代。
10447 8
数据中台的智能进化—阿里巴巴十二年数据平台发展历程