Assessing Copy Number Variation Using Genome-Wide Alignments

Abstract: The sequencing of entire mammalian genomes is a major achievement that is expected to have profound implications for biology and medicine. Many modern high-throughput experimental methods, such as optical mapping and short-read sequencing, rely on genome-wide alignments to such a reference sequence. However, this implicitly assumes that the genome of the individual being studied is largely identical to the reference genome, an assumption that does not always hold. One important type of difference is copy number variation, whereby relatively large segments of the genome are present in fewer or more copies than expected. Copy number alterations have been implicated in diseases such as cancer. Copy number variation can also be a nuisance factor when comparing data from two or more individuals. In this talk, I will outline a method that uses alignments to a reference genome to study copy number variation. The problem is framed as a comparison of two non-homogeneous Poisson processes, where changes in copy number are equivalent to changes in the relative intensities of the processes, and a hidden Markov model is used to detect such changes.
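
Illustrative sketch (not the method presented in the talk): one way to see why relative intensities are convenient is that, when the alignment locations from two samples are merged and sorted by genomic position, each hit comes from the first sample with probability r / (1 + r), where r is the local ratio of the two intensities, whatever the shared non-homogeneous baseline looks like. The hypothetical function below assumes a small fixed set of candidate ratios, a single "stay" probability for the hidden state, and at least two states; it decodes the most likely state for each hit with the Viterbi algorithm. All names and parameter values are assumptions made for illustration.

```python
import math

def viterbi_relative_intensity(labels, ratios=(1.0, 3.0), stay=0.99):
    """Most likely hidden state per hit, via the Viterbi algorithm.

    labels: 0/1 values ordered by genomic position; 1 means the hit came
            from the sample under study, 0 from the comparison sample.
    ratios: assumed relative intensities, one per hidden state (>= 2 states).
    stay:   assumed probability of remaining in the same state between hits.
    """
    if not labels:
        return []
    n = len(ratios)
    # Given a hit, P(it came from the studied sample | ratio r) = r / (1 + r).
    p_test = [r / (1.0 + r) for r in ratios]
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def log_emit(state, label):
        p = p_test[state]
        return math.log(p if label == 1 else 1.0 - p)

    # Uniform prior over the initial state.
    score = [-math.log(n) + log_emit(s, labels[0]) for s in range(n)]
    backptrs = []
    for label in labels[1:]:
        new_score, ptr = [], []
        for s in range(n):
            cands = [score[q] + (log_stay if q == s else log_move)
                     for q in range(n)]
            best = max(range(n), key=cands.__getitem__)
            ptr.append(best)
            new_score.append(cands[best] + log_emit(s, label))
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

A run of hits decoded into a state with a ratio above (or below) the baseline ratio would then be read as a candidate gain (or loss) in copy number. In practice the candidate ratios and the transition probability would have to be estimated from data rather than fixed in advance.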