Fundamentals of Data Mining in Genomics and Proteomics (Dubitzky, Granzow & Berrar, 2006-12-19)

FUNDAMENTALS OF DATA MINING IN
GENOMICS AND PROTEOMICS
Edited by
Werner Dubitzky
University of Ulster, Coleraine, Northern Ireland
Martin Granzow
Quantiom Bioinformatics GmbH & Co. KG, Weingarten/Baden, Germany
Daniel Berrar
University of Ulster, Coleraine, Northern Ireland
Springer
Library of Congress Control Number: 2006934109
ISBN-13: 978-0-387-47508-0
e-ISBN-13: 978-0-387-47509-7
ISBN-10: 0-387-47508-7
e-ISBN-10: 0-387-47509-5
Printed on acid-free paper.
© 2007 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street,
New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
987654321
springer.com
Preface
As natural phenomena are being probed and mapped in ever-greater detail,
scientists in genomics and proteomics are facing an exponentially growing
volume of increasingly complex-structured data, information, and knowledge.
Examples include data from microarray gene expression experiments, bead-based
and microfluidic technologies, and advanced high-throughput mass spectrometry.
A fundamental challenge for life scientists is to explore, analyze, and
interpret this information effectively and efficiently. To address this
challenge, traditional statistical methods are being complemented by methods
from data mining, machine learning, and artificial intelligence, by
visualization techniques, and by emerging technologies such as Web services
and grid computing.
There exists a broad consensus that sophisticated methods and tools from
statistics and data mining are required to address the growing data analysis
and interpretation needs in the life sciences. However, there is also a great deal
of confusion about the arsenal of available techniques and how these should
be used to solve concrete analysis problems. This confusion is partly due to
a lack of mutual understanding caused by the different concepts, languages,
methodologies, and practices prevailing in the various disciplines.
A typical scenario from pharmaceutical research illustrates some of these
issues. A molecular biologist conducts nearly one hundred experiments
examining the toxic effect of certain compounds on cultured cells, using a
microarray gene expression platform. The experiments cover different compounds
and doses and involve nearly 20 000 genes. After the experiments are
completed, the biologist presents the data to the bioinformatics department
and briefly explains what kind of questions the data is supposed to answer.
Two days later, the biologist receives the results, which describe the output
of a cluster analysis separating the genes into groups according to activity
and dose. While the groups seem to show interesting relationships, they do not
directly address the questions the biologist has in mind. Also, the data sheet
accompanying the results shows the original data, but in a different order and
somehow transformed. When the biologist discusses this with the
bioinformatician again, it turns out that what