LEARNING DISCRIMINATIVE REPRESENTATIONS FOR CHROMOSOME CLASSIFICATION WITH SMALL DATASETS

Abstract

Chromosome classification is crucial for karyotype analysis in cytogenetics. Karyotype analysis is a fundamental approach for clinical cytogeneticists to identify numerical and structural chromosomal abnormalities. However, classifying chromosomes accurately and robustly in clinical applications remains challenging due to: 1) rich deformations of chromosome shape, 2) similarity between chromosomes, and 3) imbalanced and insufficient labelled data. This paper proposes a novel pipeline for the automatic classification of chromosomes. Unlike existing methods, our approach is primarily based on learning meaningful data representations rather than only finding classification features in the given samples. The proposed pipeline comprises three stages. The first stage extracts meaningful visual features of chromosomes using a ResNet trained with triplet loss. The second stage optimizes the features from stage one to obtain a linear discriminative representation via maximal coding rate reduction. This ensures that clusters representing different chromosome types lie far apart, while embeddings of the same type stay close together within their cluster. The third stage classifies the chromosomes. Given the feature representation learned in the previous stage, traditional machine learning algorithms such as SVM are adequate for the classification task. Evaluation results on a publicly available dataset show that our method achieves 97.22% accuracy and outperforms state-of-the-art methods.
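To make the two representation-learning objectives named in the abstract concrete, the following is a minimal NumPy sketch of a triplet loss and of the coding rate reduction quantity that maximal coding rate reduction seeks to maximize. It follows the commonly cited definitions of both objectives, not necessarily the exact formulation used in this paper. The function names, the margin of 0.2, and the distortion parameter eps = 0.5 are illustrative assumptions, and no ResNet or SVM is included.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss over triplets of embeddings, each of shape (n, d).

    The margin value is an assumption chosen for illustration.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # squared distance to same-class sample
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # squared distance to other-class sample
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z), with one embedding per row of Z."""
    n, d = Z.shape
    mat = np.eye(d) + (d / (n * eps ** 2)) * (Z.T @ Z)
    return 0.5 * np.linalg.slogdet(mat)[1]

def rate_reduction(Z, labels, eps=0.5):
    """Coding rate of all embeddings minus the size-weighted coding rates of each class."""
    n = Z.shape[0]
    within = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        within += (Zc.shape[0] / n) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - within

# Toy check: two classes on orthogonal directions versus two classes collapsed together.
labels = np.array([0, 0, 0, 1, 1, 1])
separated = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
collapsed = np.array([[1.0, 0.0]] * 6)
gain_separated = rate_reduction(separated, labels)
gain_collapsed = rate_reduction(collapsed, labels)
```

In this toy setting the rate reduction is larger when the two classes occupy orthogonal directions than when they collapse onto the same direction, which is the behaviour the second stage relies on: different types spread apart while each type stays compact.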

1. INTRODUCTION

Human chromosome classification is crucial for karyotype analysis in cytogenetics. Karyotype analysis is a fundamental approach for clinical cytogeneticists to identify numerical and structural chromosomal abnormalities, such as Turner syndrome, chronic myelogenous leukaemia, Edwards syndrome, and Down syndrome (Stebbins & Ledyard., 1950; Sharma et al., 2017). In clinical practice, karyotyping requires preparing a complete set of micro-photographed metaphase chromosomes from a cell, or more precisely, a karyogram (Figure 1). To do so, cytogeneticists must classify and sort these chromosomes into 23 pairs: 22 pairs of autosomes and one pair of sex chromosomes (X and Y in male cells, two X chromosomes in female cells) (Jindal et al., 2017).



Figure 1: (a) A G-stained microscopic image of male chromosomes for one case. (b) The karyogram of (a).

