Partial Least Squares Correspondence Analysis: A Framework to Simultaneously Analyze Behavioral and Genetic Data
Hervé Abdi, Derek Beaton, Joseph Dunlop, "Partial Least Squares Correspondence Analysis: A Framework to Simultaneously Analyze Behavioral and Genetic Data", in Psychological Methods, vol. 21(4), Dec. 2016, pp. 621-651.
For nearly a century, detecting the genetic contributions to cognitive and behavioral phenomena has been a core interest of psychological research. Recently, this interest has been reinvigorated by the availability of genotyping technologies (e.g., microarrays) that provide new genetic data, such as single nucleotide polymorphisms (SNPs). These SNPs, which represent pairs of nucleotide letters (e.g., AA, AG, or GG) found at specific positions on human chromosomes, are best considered as categorical variables. This coding scheme, however, complicates the multivariate analysis of their relationships with behavioral measurements, because most multivariate techniques developed for analyzing the relationships between sets of variables are designed for quantitative variables. To address this problem, we present a generalization of partial least squares, a technique used to extract the information common to 2 different data tables measured on the same observations. This generalization, called partial least squares correspondence analysis, is specifically tailored for the analysis of categorical and mixed (“heterogeneous”) data types. Here, we formally define and illustrate, in a tutorial format, how partial least squares correspondence analysis extends to various types of data and design problems that are particularly relevant for psychological research that includes genetic data. We illustrate partial least squares correspondence analysis with genetic, behavioral, and neuroimaging data from the Alzheimer’s Disease Neuroimaging Initiative. R code is available on the Comprehensive R Archive Network and via the authors’ websites.
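The core idea of the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R implementation. It assumes that each categorical variable is recoded as a table of 0/1 indicator columns (one per level), and that the information common to the two tables is extracted by a correspondence-analysis-style singular value decomposition of their cross-product. The function names `disjunctive` and `plsca_sketch` and the toy data are hypothetical.

```python
import numpy as np

def disjunctive(codes):
    """Recode one categorical variable as 0/1 indicator columns (one per level)."""
    levels = sorted(set(codes))
    table = np.array([[1.0 if c == lev else 0.0 for lev in levels] for c in codes])
    return table, levels

def plsca_sketch(X, Y):
    """Correspondence-analysis-style SVD of the cross-product of two indicator tables.

    Assumption: every level occurs at least once, so no row or column mass is zero.
    """
    R = X.T @ Y                       # co-occurrence counts between levels of X and Y
    Z = R / R.sum()                   # relative frequencies
    r = Z.sum(axis=1)                 # row masses (levels of X)
    c = Z.sum(axis=0)                 # column masses (levels of Y)
    expected = np.outer(r, c)         # frequencies expected under independence
    S = (Z - expected) / np.sqrt(expected)   # standardized residuals
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    return U, d, Vt

# Hypothetical toy data: one SNP and one categorical behavioral score,
# both measured on the same six observations.
snp = ["AA", "AG", "GG", "AG", "AA", "GG"]
score = ["low", "mid", "high", "mid", "low", "high"]

X, snp_levels = disjunctive(snp)
Y, score_levels = disjunctive(score)
U, d, Vt = plsca_sketch(X, Y)

inertia = float(np.sum(d ** 2))       # total association between the two tables
print(snp_levels, score_levels)
print(np.round(d, 3), round(inertia, 3))
```

The squared singular values sum to the total inertia, which equals the chi-square statistic of the cross-product table divided by its grand total. With several SNPs or several behavioral variables, the indicator tables would be concatenated column-wise before calling `plsca_sketch`.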