Molecular Medicine Israel

Studying the genetics of participation using footprints left on the ascertained genotypes

Abstract

The trait of participating in a genetic study probably has a genetic component. Identifying this component is difficult because the genetic information of participants cannot be compared directly with that of nonparticipants, the latter being unavailable. Here, we show that alleles that are more common in participants than nonparticipants would be further enriched in genetic segments shared by two related participants. Genome-wide analysis was performed by comparing allele frequencies in shared and not-shared genetic segments of first-degree relative pairs of the UK Biobank. In nonoverlapping samples, a polygenic score constructed from that analysis is significantly associated with educational attainment, body mass index and being invited to a dietary study. The correlation between the genetic components underlying participation in UK Biobank and educational attainment is estimated to be 36.6%, which is substantial but far from total. Taking participation behaviour into account would improve the analyses of the study data, including those of health traits.

Main

For all sample surveys, ascertainment bias, meaning that the sample is not representative of the target population, could lead to seriously misleading conclusions1,2. By its very nature, ascertainment bias usually cannot be evaluated based on the sample alone3. Typically, other variables (covariates) that have known distributions for both sample and population are needed for adjustments1,3. Such adjustments are inherently imperfect as the covariates are unlikely to fully capture the underlying bias1,3. For genetic studies, among participants of the primary study who have contributed DNA, further engagement in optional components of the study has been demonstrated to have associations with genotypes and phenotypes4,5,6,7. That, however, does not address the genotypic difference between the primary study participants and the target population. Thus, it is striking that one can investigate how the sampled genotypes are biased using the sampled genotypes alone. A recent study identified single nucleotide polymorphisms (SNPs) that had significant allele frequency differences between the sampled males and females, and proposed that those variants have differential participation effects for the sexes8. This approach, however, cannot identify variants that affect primary study participation of both sexes in a similar manner. Here, we show how to do so.

Results

Three allele comparisons

All individuals are genetically related to some degree. Furthermore, each individual has two copies of genetic segments on autosomal chromosomes, and some of these segments are identical by descent (IBD), that is, inherited from a recent common ancestor, with genetic segments in a relative. Instead of comparing individuals, we compare genetic segments. The key idea is that an allele that has higher frequency in participants than nonparticipants would also have higher frequency in segments that are in two participants than in segments that are in only one. Following this observation, we present below three principles of genetic induced participation bias, and show how to use only the sampled genetic data to perform genome-wide association scans (GWAS) for study participation that capture only direct genetic effects9,10, and are unaffected by population stratification11.
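
As a hedged illustration of the key idea, consider a deliberately simplified haploid model that is not taken from the study. An allele has population frequency $p$; a carrier participates with probability $a_1$ and a non-carrier with probability $a_0$, where $a_1 > a_0 > 0$. The symbols $p$, $a_0$ and $a_1$, and the assumption that two relatives participate independently given the allele on their shared segment, are introduced here only for illustration.

```latex
\begin{align*}
\text{odds of the allele on a segment in one participant:} \quad
  & \frac{p\,a_1}{(1-p)\,a_0} = \frac{p}{1-p}\cdot\frac{a_1}{a_0}, \\
\text{odds of the allele on a segment shared by two participants:} \quad
  & \frac{p\,a_1^{2}}{(1-p)\,a_0^{2}} = \frac{p}{1-p}\cdot\left(\frac{a_1}{a_0}\right)^{2}.
\end{align*}
```

Because $a_1/a_0 > 1$, the odds on the shared segment exceed the odds in a single participant, which in turn exceed the population odds $p/(1-p)$. Under these assumptions the enrichment is effectively applied twice on a segment carried by two participants.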

First principle of genetic induced ascertainment bias

On average, between two ascertained individuals, genetic segments shared IBD, relative to segments that are not, are enriched with alleles that have positive direct effects on ascertainment probability. Figure 1a illustrates this general principle. With a large sample of individuals of the same ancestry, at a specific SNP locus, many pairs of individuals would share one long haplotype inherited IBD from a not-very-distant common ancestor. Each such pair has one distinct shared haplotype and two distinct not-shared haplotypes. The SNP allele that promotes participation would tend to have a higher frequency in the shared than the not-shared haplotypes. The shared and not-shared alleles are considered as cases and controls, respectively, and are matched because they are in the same individuals. Still, that does not remove potential confounding entirely, as haplotypes driven to higher frequency through natural selection would also be shared by more individuals. Ascertainment bias is a form of selection and, to cleanly distinguish it from other forms of selection, requires more stringent matching of shared and not-shared haplotypes. That is achieved by using ascertained parent–offspring and sibling pairs (sib-pairs)…
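
The first principle can be checked in a small simulation. The sketch below is a toy model and not the study's analysis pipeline: it assumes a single biallelic SNP, random mating, and a participation probability that is linear in the allele count. The function name `simulate_sib_pairs` and the parameters `p`, `base` and `beta` are hypothetical and chosen only for illustration. It simulates sibling pairs, keeps the pairs in which both siblings participate and that share exactly one haplotype IBD at the locus, and compares the allele frequency on shared and not-shared haplotypes.

```python
import numpy as np


def simulate_sib_pairs(n_pairs=200_000, p=0.5, base=0.2, beta=0.1, seed=0):
    """Toy model: return (shared_freq, not_shared_freq) in ascertained IBD1 sib-pairs.

    Assumptions (illustrative only): one SNP with allele frequency p,
    and P(participate) = base + beta * (number of copies of the allele).
    """
    rng = np.random.default_rng(seed)

    # Parental alleles, shape (pair, parent, haplotype); 1 = participation-promoting allele.
    parents = rng.binomial(1, p, size=(n_pairs, 2, 2))

    # Which haplotype each sibling inherits from each parent, shape (pair, sib, parent).
    pick = rng.integers(0, 2, size=(n_pairs, 2, 2))
    pair_idx = np.arange(n_pairs)[:, None, None]
    parent_idx = np.arange(2)[None, None, :]
    sib_alleles = parents[pair_idx, parent_idx, pick]  # shape (pair, sib, parent)

    # Each sibling participates independently given their own genotype.
    genotype = sib_alleles.sum(axis=2)
    participates = rng.random((n_pairs, 2)) < base + beta * genotype
    both_participate = participates.all(axis=1)

    # A parent's haplotype is shared IBD when both siblings inherited the same copy.
    same = pick[:, 0, :] == pick[:, 1, :]  # shape (pair, parent)
    ibd1 = same.sum(axis=1) == 1           # exactly one shared haplotype

    keep = both_participate & ibd1
    same_k = same[keep]
    alleles_k = sib_alleles[keep]

    # One shared allele per pair (read from sibling 0, identical in sibling 1).
    shared = alleles_k[:, 0, :][same_k]
    # Two not-shared alleles per pair, one in each sibling, from the other parent.
    not_shared_mask = ~np.broadcast_to(same_k[:, None, :], alleles_k.shape)
    not_shared = alleles_k[not_shared_mask]

    return shared.mean(), not_shared.mean()


if __name__ == "__main__":
    shared_freq, not_shared_freq = simulate_sib_pairs()
    print(f"shared: {shared_freq:.3f}  not-shared: {not_shared_freq:.3f}")
```

In this toy setting the shared allele affects the participation of both siblings, whereas each not-shared allele affects only one, so the frequency on shared haplotypes should exceed that on not-shared haplotypes, and both should exceed the population frequency `p`.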
