Simple and scalable scripting for large sequencing data sets in Hadoop

[Date: 2014-02-03]

André Schumacher, Luca Pireddu, Matti Niemenmaa, Aleksi Kallio, Eija Korpelainen, Gianluigi Zanetti and Keijo Heljanko

Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig’s scalability over many computing nodes and illustrate its use with example scripts.
