Faculty of Science, The Chinese University of Hong Kong (CUHK) - Dr. Fazel Famili (3 November 2009)

Knowledge Discovery and Management in Life Sciences


Date: 3 November 2009 (Tuesday)
Time: 4:30pm - 5:30pm
Venue: ERB LT, William M. W. Mong Engineering Building (Engineering Building Complex Phase 2)
Speaker: Dr. Fazel Famili, Knowledge Discovery (KD) Group, Institute for Information Technology, National Research Council of Canada, Ottawa, Canada

 

Abstract: Knowledge discovery has emerged as a fundamental approach to understanding the real value of the large amounts of data that we collect. Particular examples come from the life sciences, physical systems (e.g. sensor-based systems), and the financial domain. Among the more complex of these examples is the life sciences domain, where one tries to integrate and analyze large amounts of high-throughput genomics and proteomics data obtained from either single time point or time-series applications. As in many other domains, various methods have been developed in the life sciences, and many data mining tools (commercial and non-commercial) have been introduced. These applications have all contributed to: (i) the identification of certain genes or proteins and their functions, (ii) gene response analysis in biological studies, such as in vitro, in vivo, or ex vivo research, and (iii) understanding the molecular mechanisms of certain species and their associated biological pathways. This wealth of newly discovered and existing knowledge has prompted a question: what is the best way to properly manage all discovered knowledge once it is validated? This question has also been one of the motivations behind several data mining research projects that we have initiated in the KD group. Here, in addition to searching for patterns in genomics and proteomics data, we have been working to identify proper ways to represent, structure, and distribute all forms of knowledge, preferably taking an AI approach. This talk consists of two parts. In part one, we provide an overview of knowledge discovery focusing on life sciences and describe the main motivations for developing and applying knowledge discovery methods to analyze complex biological data. We also briefly describe a few of our case studies where we have analyzed high-throughput biological data using unsupervised or supervised machine learning techniques.
These are cases in which real biological data sets (obtained from public or private sources) have been analyzed and studied for tasks such as gene function identification and gene response analysis. In part two of this talk, we describe how discovered and validated knowledge could be structured into knowledge bases where it can be integrated with other forms of knowledge, for dissemination to multiple users. We conclude our talk with some lessons learned and the research directions that we are currently pursuing.