To summarize, in this chapter we have seen that:
- In clustering, we identify structure in unlabeled data. For example, we might use clustering to identify groups of genes that display similar expression profiles.
- – Partitioning clustering algorithms, construct non-overlapping clusters such that each item is assigned to exactly one cluster. Example: k-means
- – Agglomerative clustering algorithms construct a hierarchical set of nested clusters, indicating the relatedness between clusters. Example: hierarchical clustering
- – By using clustering algorithms, we can reveal hidden structure of a gene expression matrix, which gives us valuable clues for understanding the mechanism of complicated diseases and categorizing different diseases
- In classification, we partition data into known labels. For example, we might construct a classifier to partition a set of tumor samples into those likely to respond to a given drug and those unlikely to respond to a given drug based on their gene expression profiles. We will focus on classification in the next chapter.
 J.Z. Huang, M.K. Ng, Hongqiang Rong, and Zichen Li. Automated variable weighting in k-means type clustering. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(5):657 –668, may 2005.
 Christopher A. Maher, Chandan Kumar-Sinha, Xuhong Cao, Shanker Kalyana-Sundaram, Bo Han, Xiao- jun Jing, Lee Sam, Terrence Barrette, Nallasivam Palanisamy, and Arul M. Chinnaiyan. Transcriptome sequencing to detect gene fusions in cancer. Nature, 458(7234):97–101, Mar 05 2009.