Introduction

Cognitive functioning is positively associated with greater longevity and less physical and psychiatric morbidity, and negatively associated with many quantitative disease risk factors and indices.1 Some specific associations between cognitive functions and health appear to arise because an illness has lowered prior levels of cognitive function.2, 3 For others, the direction of causation appears to be the reverse: there are many examples of associations between lower cognitive functions in youth, even childhood, and higher risk of later mental and physical illness and earlier death.4, 5, 6, 7, 8 In some cases, it is not clear whether illness affects cognitive functioning or vice versa, or whether both are influenced by some common factors. Many examples of these phenotypic and cognitive–illness associations are shown in Supplementary Table 1. Overall, the causes of these cognitive–health associations remain unknown and warrant further investigation. It is also well recognized that lower educational attainment is associated with adverse health outcomes,9 and educational attainment has been used as a successful proxy for cognitive ability in genetic research.10, 11 A study that included three cohorts of twins indicated that the association between higher cognitive function and increased lifespan was mostly owing to common genetic effects.12

The associations between cognitive and health and illness variables may, in part, reflect shared genetic influences. Cognitive functions show moderate heritability,13 and so do many physical and mental illnesses and health-associated anthropometric measures.14 Therefore, researchers have begun to examine pleiotropy between scores on tests of cognitive ability and health-related variables.15, 16 Pleiotropy is the overlap between the genetic architecture of two or more traits, perhaps owing to a variety of shared causal pathways.17 Originally, the possibility of pleiotropy in cognitive–health associations was tested using family- and twin-based designs.18 However, now data from single-nucleotide polymorphism (SNP) genotyping can assess pleiotropy, opening the possibility for larger-scale, population-generalizable studies.

Multiple methods can be used to test for pleiotropy using SNP-based genetic data. Calculating genetic correlations between health measures using the summary results of previous genome-wide association studies (GWAS) has become possible using the method of linkage disequilibrium (LD) score regression.19 In addition, the method of polygenic profile scoring20 also uses summary GWAS data to test whether genetic liability to a given illness or health-related anthropometric measurement is associated with phenotypes such as cognitive test scores in a second independent data set. For example, lower cognitive functioning was associated significantly in healthy older people with higher polygenic risk for schizophrenia21 and stroke,22 but not for dementia.23 However, most polygenic profile studies to date have been limited in the information they provide on this important topic: they have reported on single health outcomes, and have used relatively small cohorts with available cognitive data. Two recent papers, using polygenic profile scoring and/or LD score regression, have identified pleiotropy between a number of health-related traits and diseases, and educational attainment15 and between a number of psychiatric disorders and cognitive traits, and behavioural traits.16

We aimed to discover whether cognitive functioning is associated with many physical and mental health and health-related anthropometric measurements, in part, because of their shared genetic aetiology using the recently released UK Biobank genetic data (http://www.ukbiobank.ac.uk).24 We curated GWAS meta-analyses for 24 health-related measures, and used them in two complementary methods to test for cognitive–health pleiotropy. First, we used LD score regression to derive genetic correlations between cognitive function and educational attainment traits measured in UK Biobank, and 24 health-related measures from the GWAS meta-analyses. Second, to provide a measure of the phenotypic variance that these genetic correlations account for, the summary data from GWAS meta-analyses were used to calculate polygenic profile scores in UK Biobank, which includes cognitive, educational and genome-wide SNP data on over 110 000 individuals. We calculated the associations between polygenic profile scores for the 24 health-related measures and the cognitive domains of memory, processing speed and verbal-numerical reasoning, and educational attainment in UK Biobank participants. These new data and results provide a substantial advance in understanding the aetiology of cognitive–health associations.

Materials and methods

Study design and participants

This study includes baseline data from the UK Biobank Study (http://www.ukbiobank.ac.uk).24 UK Biobank received ethical approval from the Research Ethics Committee. The Research Ethics Committee reference for UK Biobank is 11/NW/0382. UK Biobank is a health resource for researchers that aims to improve the prevention, diagnosis and treatment of a range of illnesses. A total 502 655 community-dwelling participants aged between 37 and 73 years were recruited between 2006 and 2010 in the United Kingdom. They underwent cognitive and physical assessments, provided blood, urine and saliva samples for future analysis, gave detailed information about their backgrounds and lifestyles, and agreed to have their health followed longitudinally. For the present study, genome-wide genotyping data were available on 112 151 individuals (58 914 females) aged 40–73 years (mean age=56.9 years, s.d.=7.9) after the quality control process (see below).

Procedures

Cognitive phenotypes

Three cognitive tests were used in the present study. These tests, which cover three important cognitive domains, were reaction time (n=496 891; of whom 111 484 also had genotyping data), memory (n=498 486; 112 067 with genotyping) and verbal-numerical reasoning (n=180 919; 36 035 with genotyping). The reaction time test was a computerized ‘Snap’ game, in which participants were to press a button as quickly as possible when two ‘cards’ on screen were matching. There were eight experimental trials, with a Cronbach α reliability of 0.85. In the memory test, participants were shown a set of 12 cards (six pairs) on a computer screen for 5 s, and had to recall which were matching after the cards had been obscured. We used the number of errors in this task as the (inverse) measure of memory ability. The verbal-numerical reasoning task involved a series of 13 items assessing verbal and arithmetical deduction (Cronbach α reliability=0.62). A total 492 513 participants also reported whether or not they had a college or university degree (henceforth referred to as ‘educational attainment’), 111 114 of whom had genotyping data. Full details on the content and administration of each test (and the educational attainment question) are provided in the Supplementary Materials. Supplementary Figure 1 shows that the broad age distribution of the cognitive tests did not differ between the full and genotyped samples.

Genotyping and quality control

A total 152 729 UK Biobank blood samples were genotyped using either the UK BiLEVE array (N=49 979)25 or the UK Biobank axiom array (N=102 750). Details of the array design, genotyping, quality control and imputation are available in a publication25 and in the Supplementary Materials. Quality control was performed by Affymetrix, the Wellcome Trust Centre for Human Genetics, and by the present authors; this included removal of participants based on missingness, relatedness, gender mismatch, non-British ancestry and other criteria, and is described in the Supplementary Materials.

GWAS analyses in the UK Biobank sample

GWAS analyses were performed on the three UK Biobank cognitive test scores and on the Educational Attainment data to use the summary results for LD score regression. Details of the GWAS procedures are provided in the Supplementary Materials.

Curation of summary results from GWAS consortia on health-related variables

To conduct LD score regression and polygenic profile score analyses between the UK Biobank cognitive data and the genetic predisposition to a large number of health-related variables, selected because of previous associations with cognitive functions (Supplementary Table 1), we gathered 24 sets of summary results from international GWAS consortia. Details of the health-related variables, the consortia’s websites, key references and number of subjects included in each consortia’s GWAS are given in Supplementary Table 2.

Statistical analysis

Computing genetic associations between health and cognitive variables

We used two methods to compute genetic associations between health variables from 24 GWAS consortia, and cognitive and educational attainment variables measured in UK Biobank: LD score regression and polygenic profile scoring. Each provides a different metric to infer the existence of pleiotropy between pairs of traits. LD score regression was used to derive genetic correlations to determine the degree to which the polygenic architecture of a trait overlaps with that of another. Next, the polygenic profile score method was used to test the extent to which these genetic correlations are predictive of phenotypic variance across samples. Both LD score regression and polygenic profile scores are dependent on the traits analysed being highly polygenic in nature, that is, where a large number of variants of small effect contribute toward phenotypic variation.19, 20 This was tested for each trait before running the LD score regression and polygenic profile analyses.

LD score regression

To quantify the level of pleiotropy between the traits assessed here, LD score regression was used.15, 19 LD score regression is a class of techniques that exploits the correlational structure of the SNPs found across the genome. In the Supplementary Materials, we provide more details of LD score regression. Here, we use LD score regression to derive genetic correlations between health-related and cognitive traits using 24 large GWAS consortia data sets that enable pleiotropy of their health-related traits to be quantified with the cognitive traits in UK Biobank. We followed the data processing pipeline devised by Bulik-Sullivan et al.,19 described in more detail in the Supplementary Materials. To ensure that the genetic correlation for the Alzheimer’s disease phenotype was not driven by a single locus or biased the fit of the regression model, a 500 kb region centred on the APOE locus was removed and this phenotype was re-run. This additional model is referred to in the tables and figures below as ‘Alzheimer’s disease (500 kb)’.

Polygenic profiling

The UK Biobank genotyping data required recoding from numeric (1, 2) allele coding to standard ACGT format before being used in polygenic profile scoring analyses. This was achieved using a bespoke programme developed by one of the present authors (DCML), details of which are provided in the Supplementary Materials.

Polygenic profiles were created for 24 health-related phenotypes (see Table 3, and Supplementary Table 1) in all genotyped participants using PRSice.26 PRSice calculates the sum of alleles associated with the phenotype of interest across many genetic loci, weighted by their effect sizes estimated from a GWAS of that phenotype in an independent sample. Before creating the scores, SNPs with a minor allele frequency <0.01 were removed and clumping was used to obtain SNPs in linkage equilibrium with an r2<0.25 within a 200 bp window. Multiple scores were then created for each phenotype containing SNPs selected according to the significance of their association with the phenotype. The GWAS summary data for each of the 24 health-related phenotypes were used to create five polygenic profiles in the UK Biobank participants, at thresholds of P<0.01, P<0.05, P<0.1, P<0.5 and all SNPs.

Correlation coefficients were calculated between each of the UK Biobank cognitive phenotypes. The associations between the polygenic profiles and the target phenotype were examined in regression models (linear regression for the continuous cognitive traits and logistic regression for the binary education variable), adjusting for age at measurement, sex, genotyping batch and array, assessment centre and the first 10 genetic principal components to adjust for population stratification. We corrected for multiple testing across all polygenic profile scores at all significance thresholds for associations with all cognitive phenotypes (470 tests) using the false discovery rate method.27 We conducted sensitivity analyses as follows. Where the original findings were false discovery rate significant, UK Biobank participants with cardiovascular disease (N=5300) were then removed from analyses of coronary artery disease, those with diabetes (N=5800) were removed from type 2 diabetes analyses, and those with hypertension (N=26 912) were removed from systolic blood pressure analyses. See Supplementary Materials for further details of these sensitivity analysis. Four multivariate regression models were then performed, including all 24 polygenic profile scores and the covariates described above.

Results

Phenotypic and genetic associations among the UK Biobank’s cognitive traits

In addition to descriptive statistics, Table 1 shows phenotypic correlations and genetic correlations (calculated using LD score regression) among the cognitive traits in UK Biobank. There were modest correlations between the three cognitive test scores; those who did well on one test tended to do well on the other two. Verbal-numerical reasoning showed the highest phenotypic correlation with educational attainment (r=0.30). The strongest genetic correlation was also between verbal-numerical reasoning and educational attainment (rg=0.79). All but one of the other genetic correlations were statistically significant: there was a nonsignificant genetic correlation between educational attainment and reaction time. These results show that different cognitive traits and educational attainment correlate, in part, due to overlapping genetic architecture.

Table 1 Descriptive statistics and phenotypic (below diagonal) and genetic (above diagonal) correlations for the UK Biobank cognitive and educational variables in all genotyped participants

Cognitive–health pleiotropy: overview

To test for pleiotropy between cognitive and health traits, we present the LD score regression (Table 2, Figure 1) and polygenic profile score (Table 3, Supplementary Tables 4a and d) results for the four UK Biobank cognitive function and educational attainment traits, and the 24 health-related traits from GWAS consortia. Results are false discovery rate-corrected for multiple comparisons.

Table 2 Genetic correlations between the cognitive and education phenotypes documented in the UK Biobank data set and the health-related variables collected from GWAS consortia
Figure 1
figure 1

Heat map of genetic correlations calculated using LD regression between cognitive phenotypes in UK Biobank and health-related variables from GWAS consortia. Hues and colours depict, respectively, the strength and direction of the genetic correlation between the cognitive phenotypes in UK Biobank and the health-related variables. Red and blue indicate positive and negative correlations, respectively. Correlations with the darker shade associated with a stronger association. Based on results in Table 2. ADHD, attention deficit hyperactivity disorder; FEV1, forced expiratory volume in 1 s; GWAS, genome-wide association study; LD, linkage disequilibrium; NA, not available.

PowerPoint slide

Table 3 Associations between polygenic profiles of health-related traits created from GWAS consortia summary data, and UK Biobank cognitive and education phenotypes controlling for age, sex, assessment centre, genotyping batch and array and 10 principal components for population structure

In overview, using LD score regression results, educational attainment showed significant genetic correlations with 14 of the 22 health-related traits, verbal-numerical reasoning had significant genetic correlations with 10 of the 24, and memory and reaction time both correlated significantly with 3 of the 24 (Table 2, Figure 1).

In the polygenic profile score results, using the best SNP threshold of the five that were created, summary GWAS data from 19 of 22 consortia predicted significant phenotypic variation in educational attainment in the UK Biobank sample (Table 3). For verbal-numerical reasoning, results from 15 of the 24 consortia predicted significant phenotypic variation. The numbers for memory and reaction time were seven and six, respectively. The numbers of SNPs included in each polygenic threshold score for each of the 24 health-related traits are shown in Supplementary Table 3. The fuller results relating all five of the SNP thresholds for all the 24 health-associated variables with the UK Biobank cognitive phenotypes are shown in Supplementary Tables 4a–d.

Cognitive–health pleiotropy: brain-cognitive traits

Cognitive and brain traits provided a first check before moving to more mainstream health measures whose phenotypes appear more distant from cognitive function. In LD score regression analyses, we expected significant genetic correlations between GWAS consortia results from cognitive- and brain-related traits and the UK Biobank GWAS results for cognitive variables. In polygenic risk score analyses, we expected the polygenic scores based on the health-related GWAS consortia’s results to predict phenotypic variation in UK Biobank’s cognitive traits.

Using LD score regression, genetic correlations of greater than 0.9 were found between UK Biobank’s Education Attainment variable and GWAS consortia results from childhood cognitive ability, college degree attainment and years of education (Table 2, Figure 1). The genetic correlation between UK Biobank’s Educational Attainment variable and ENIGMA’s intracranial volume was 0.44 and with Early Growth Genetics Consortium’s infant head circumference was 0.25. Verbal-numerical reasoning in UK Biobank had a genetic correlation of 0.81 with childhood cognitive ability, of 0.75 with attaining a college degree (Social Science Genetic Association Consortium), 0.72 with years of education (Social Science Genetic Association Consortium), and 0.25 with intracranial volume. These results demonstrate substantial shared genetic aetiology between brain size, cognitive ability and educational attainment. Reaction time and memory did not have significant genetic correlations with the brain, cognitive or educational variables.

In summarizing polygenic profile analyses’ results in the text, we shall use standardized betas (β). Polygenic profiles for higher childhood cognitive ability and a higher level of educational attainment (Social Science Genetic Association Consortium) were significantly associated with increased likelihood of a college degree (β between 0.12 and 0.28), higher scores on verbal-numerical reasoning (β between 0.08 and 0.11), and faster reaction time (β about 0.001). Only childhood cognitive ability was significantly associated with memory (β=0.01). Polygenic profiles for greater intracranial volume and greater infant head circumference were significantly associated with increased likelihood of a college degree (β=0.04 and 0.05, respectively), and higher scores on verbal-numerical reasoning (both β=0.02).

Cognitive–health pleiotropy: neuropsychiatric disorders

Using LD score regression, there were negative genetic correlations between Alzheimer’s disease and UK Biobank’s educational attainment variable and verbal-numerical reasoning (rg between −0.27 and −0.39; Table 2, Figure 1). Autism had positive genetic correlations with the same two Biobank variables: 0.34 for educational attainment and 0.19 for verbal-numerical reasoning. Bipolar disorder and schizophrenia showed a pattern of having a positive genetic correlation with educational attainment (0.22 and 0.13, respectively) and a negative genetic correlation with verbal-numerical reasoning (−0.11 and −0.30, respectively). Schizophrenia was genetically associated with slower reaction time (−0.24) and poorer memory (−0.34). Bipolar disorder was also genetically associated with poorer memory (−0.24). Major depressive disorder was genetically associated with slower reaction time (−0.25).

Polygenic profile scoring replicated the directions of association found with LD score regression-estimated genetic correlations (Table 3). Higher polygenic risk for Alzheimer’s disease was associated with lower educational attainment and a lower score on verbal-numerical reasoning and memory (β between −0.01 and −0.05). Higher polygenic risk for ADHD was associated with lower educational attainment (β=−0.03). Higher polygenic risk for autism was associated with higher educational attainment and better verbal-numerical reasoning (β=0.07 and 0.02, respectively). Higher polygenic risk for bipolar disorder and schizophrenia had a positive association with educational attainment (β=0.06 and 0.03, respectively), and the genetic risk for schizophrenia and major depressive disorder had a negative association with verbal-numerical reasoning (β=−0.06 and −0.02, respectively).

Cognitive–health pleiotropy: vascular–metabolic diseases

There were significant negative genetic correlations between UK Biobank’s educational attainment variable and coronary artery disease (−0.26), ischaemic stroke (−0.17) and large vessel disease stroke (−0.34; Table 2, Figure 1). There was a significant negative genetic correlation between ischaemic stroke and verbal-numerical reasoning (−0.23).

Greater polygenic risk for coronary artery disease, type 2 diabetes, and ischaemic, and large and small vessel disease stroke were all associated with lower educational attainment (β between −0.02 and −0.05). Greater polygenic risk for coronary artery disease, and ischaemic and large vessel disease stroke were associated with lower verbal-numerical reasoning scores (β between −0.01 and −0.02). Greater polygenic risk for large vessel disease stroke was associated with more errors in the memory task (β=−0.01). There was little change to the results for coronary artery disease when individuals diagnosed with cardiovascular disease were removed from the analyses, and little change for type 2 diabetes results when individuals with diabetes were removed (Table 3).

Cognitive–health pleiotropy: physical, physiological and anthropometric measures

There were significant genetic correlations between UK Biobank’s educational attainment variable and body mass index (−0.23), and height (0.12), both obtained from the GIANT Consortium. There was a significant negative genetic correlation between body mass index and verbal-numerical reasoning (−0.12) and a significant positive genetic correlation with memory (0.15).

Greater polygenic risk for diastolic and systolic blood pressure (obtained from International Consortium for Blood Pressure), and body mass index were all associated with lower educational attainment (β of −0.02, −0.04 and −0.09, respectively). A polygenic profile for greater height was associated with higher educational attainment (β=0.07). Higher verbal-numerical reasoning scores were associated with lower polygenic risk for body mass index (β=−0.03), and a polygenic profile for greater height (β=0.02) and higher systolic blood pressure (β=0.01). Results for systolic blood pressure changed little when individuals with hypertension were removed from the analysis (Table 3).

Multivariate models predicting cognitive variance using many polygenic profile scores

We next ran four multivariate regression models that included all 24 polygenic profile scores alongside the same covariates as were described above. This tested whether there was redundancy among the polygenic profile scores, and the extent to which including them all together in a multivariate model would improve the prediction of the cognitive phenotype. We compared the R2 value of models including all the profile scores to models including only the covariates. The polygenic profile scores alone accounted for 3.33% of the variance in educational attainment of a college or university degree, 2.26% of the variance in verbal-numerical reasoning scores, 0.12% in reaction time and 0.16% in memory scores. See Supplementary Table 5 for full results.

Discussion

The present study has combined the power of UK Biobank’s very large genotyped and cognitively tested sample with the summary results of 24 large international GWAS consortia of physical and mental disorders and health-related traits. Two methods—LD score regression and polygenic profile scoring based on previous GWAS findings—discovered extensive cognitive–health pleiotropy and showed that it can be used to predict phenotypic variance between GWAS data sets. Our results provide comprehensive new findings on the overlaps between phenotypic cognitive ability levels, genetic bases for health-related characteristics such as height and blood pressure, and liabilities to physical and psychiatric disorders even in mostly healthy, non-diagnosed individuals. They make important steps toward understanding the specific patterns of overlap between biological influences on health and their consequences for key cognitive abilities. For example, some of the association between educational attainment—often used as a social background indicator—and health appears to have a genetic aetiology. These results should stimulate further research that will be informative about the specific genetic mechanisms of the associations found here, which likely involves both protective and detrimental effects of different genetic variants.

Findings for polygenic risk for coronary artery disease were not confounded by individuals with a diagnosis of cardiovascular disease, findings for type 2 diabetes were not confounded by individuals with a diagnosis of diabetes and findings for systolic blood pressure were not confounded by individuals with a diagnosis of hypertension. These results indicate that even in healthy individuals, being at high polygenic risk for coronary artery disease, type 2 diabetes or high blood pressure is associated with lower cognitive function and lower educational attainment.

Using LD score regression, we quantified for the genetic correlations from molecular genetic evidence between tests of cognitive ability and a wealth of health and anthropometric traits in over 100 000 individuals. As shown in Table 2, verbal-numerical reasoning and educational attainment showed a greater degree of pleiotropy than reaction time and memory, with many of the health and anthropometric variables studied here. Novel genetic correlations were quantified between cognitive function, using verbal-numerical reasoning, and schizophrenia, Alzheimer’s disease and ischaemic stroke. It has not escaped our notice that there are multiple possible interpretations of these genetic correlations. Not only might particular genes contribute both to cognitive and health-related traits, but genetic variants relating to health conditions could have indirect effects on cognitive ability (for example, via medications used to treat disorders), and vice versa (for example, via cognitively associated lifestyle choices). See Solovieff et al.17 for discussion of these issues of causality and pleiotropy.

Perhaps counter-intuitively, our results indicated that the genetic variants associated with obtaining a college degree were also related to higher genetic risk of schizophrenia, bipolar disorder and autism, which for bipolar disorder and autism support the findings from a previous study.15 For the cognitive tests, only polygenic risk for autism was related with higher cognitive ability, in agreement with a previous study.28 Genes related to bipolar disorder were negatively related to all of the cognitive tests, and genes related to schizophrenia even more so. Previous epidemiological studies indicate that both very high and very low educational achievement is associated with an increased risk of bipolar disorder29 and high polygenic risk of schizophrenia and bipolar disorder were recently associated with higher levels of creativity.30 The discrepancy between the cognitive and educational results may be explained by the age of the participants: if schizophrenia genes are detrimental to cognitive functioning only later in life, they may have differential effects on educational attainment, which tends to peak before age 30, and the cognitive tests, which were taken in UK Biobank at an average age of 56.9 years. It should also be noted the UK Biobank sample consists of individuals who are, on average, older than those in most schizophrenia genetic studies, have a higher level of education and are of a higher social class. It is also likely to have been the case that individuals in middle age with a history of serious mental illness will have been less likely than those without such a history to volunteer as UK Biobank participants. These demographic and clinical factors might have contributed to apparently contradictory findings with respect to cognitive function and previous educational attainment.

The educational attainment variable demonstrated pleiotropy with 20 of the 24 health-related variables, indicating that the genetic variants that collectively act to facilitate an individual’s progress through the educational system to degree level make important contributions to many important health outcomes. One explanation for this is that the educational attainment variable shows the greatest degree of pleiotropy with general cognitive ability in childhood with a genetic correlation of 0.906. This could indicate that it is genes related to cognitive ability early in life that are responsible for the pleiotropy with health variables: educational attainment, therefore, might act as a proxy phenotype for general cognitive ability, as others have demonstrated.8

The significant genetic correlations across traits enabled the use of polygenic profile scores to predict phenotypic cognitive variance in the UK Biobank sample. The amount of variance explained by the polygenic profile scores for each UK Biobank cognitive phenotype is small, as would be expected by the fact that not all SNPs were genotyped, and those that were do not necessarily accurately tag the causal genetic variants. The multivariate polygenic risk score analyses showed that additional variance can be accounted for when the polygenic liabilities of multiple disorders and traits are combined; this implies that there are risk alleles unique to each disorder and trait that affect cognitive and educational traits.

The results of the present study are supported by a previous study examining the genetic associations between polygenic profile scores for psychiatric and cognitive traits, and many phenotypic traits, showing comparable directions of effect.16 However, owing to the much smaller sample size (3000 versus 112 000 in the present study), the previous study yields insufficient power to detect several of the associations found in the present study. The same previous study also supported the results of the LD regression analyses of the present study between several psychiatric and cognitive traits, but the following traits show novel associations with cognitive ability and educational attainment in the present study: ischaemic stroke, infant head circumference and years of education.

The polygenic profile analysis replicated previous, smaller studies that showed associations between higher cognitive function and higher polygenic risk for autism28 and lower polygenic risk for schizophrenia21 and stroke.22 We did not replicate the previous finding that higher cognitive function is associated with higher type 2 diabetes genetic risk,31 but we did find that higher type 2 diabetes genetic risk is associated with decreased likelihood of obtaining a college degree. Unlike a previous small, underpowered, study23 we found that higher polygenic risk for Alzheimer’s disease is associated with lower cognitive function.

To the extent that these genetic associations between cognitive and health measures are explained by shared genetic influences, they support the theoretical construct of bodily system integrity.32 System integrity was formulated as a latent trait which is manifest as individual differences in how effectively people meet cognitive and health challenges from the environment, and which has some genetic aetiology. Although it is recognized that some illnesses will cause changes in cognitive functions, system integrity suggests, in addition, that there is shared variance in how well different complex bodily systems operate and that this underlies correlations between higher cognitive functioning and good health and longevity.

The present study has a number of strengths. First, the large sample size of UK Biobank (N>100 000) affords powerful, robust tests of genetic association. Second, the participants took identical cognitive tests, which were always administered in the same computerized fashion, reducing any potential bias owing to heterogeneity in test content and administration. Third, all of the UK Biobank genetic data were processed in a consistent matter, on the same platform and at the same location. Fourth, our use of summary data from a large number of international GWAS studies allowed a comprehensive and detailed examination of shared genetic aetiology with cognitive ability across a wide range of health-related phenotypes, producing many of the first estimates of the genetic correlation between traits.

The present study has some limitations. The three cognitive tests were brief, bespoke measures. However, they covered three major cognitive domains (reasoning, processing speed and memory), showed acceptable internal consistency and had validity in that they showed the expected correlations with one another and with age and educational attainment.33, 34 The verbal-numerical reasoning test has types of item that are the same as those found in tests of general cognitive ability. In addition, verbal-numerical reasoning and educational attainment showed strong genetic correlations (0.812 and 0.906, respectively) with childhood general cognitive function. This suggests that the variance in these traits is largely the product of the same genetic variants that underpin general cognitive function. The GWAS studies we curated to perform LD score regression and extract the polygenic profile scores were often consortia studies, involving meta-analyses across data sets with substantial heterogeneity in sample size, genome-wide imputation quality and phenotypic measurement. We expect that, with larger and more consistent independent data sets, we would be able to use the polygenic profile scores to predict more variance in cognitive test performance. Some of the GWAS consortia studies did not have ethical approval to be used for genetic correlation and polygenic profile scoring analyses associated with education, meaning that we were unable to estimate a few correlations. Clustering in genetic population structure meant that we restricted the genotyped samples to individuals of white British ancestry. Our results thus need to be replicated in large samples of different genetic backgrounds; the sample sizes available in UK Biobank were not large enough for us to model, with adequate power, data from UK Biobank individuals of other ancestries.

The best method for showing pleiotropy using GWAS data would be to obtain the genome-wide significant hits from two GWAS and correlate the effect sizes. However, there are two reasons why this is suboptimal at present. The first reason is that many significant SNP hits are needed in multiple GWAS data sets, which is currently not possible. The second reason is that GCTA has shown that many true associations do not reach statistical significance owing to low power, therefore, SNPs that do not attain statistical significance should also be considered. Both LD regression and polygenic profile score analyses provide the opportunity to use the full GWAS output to examine pleiotropy.

Because the optimum number of SNPs used to generate a particular polygenic profile can differ between traits, we created five profile scores per physical and mental health trait and tested each in a regression against each of the UK Biobank cognitive traits to determine the score that explained the greatest variance in each cognitive trait. We found that these did differ between different trait combinations, suggesting that the amount of shared genetic aetiology differs between different pairs of traits. However, as pleiotropy was quantified using LD score regression to perform a single test for each pair of phenotypes, the multiple testing problems associated with the polygenic profile score method did not confound the estimates of pleiotropy shown here. Although the estimate of phenotypic variance explained by the polygenic profile method was small, this should be considered as the minimum estimate of the variance explained. Owing to pruning SNPs in LD, the PGRS method makes the assumption of a single causal variant being tagged in each LD block considered. If this assumption is not true for the phenotypes considered, the proportion of variance explained will be underestimated here.

Conclusion

It is notable that, a short while ago, a single result from several of the findings reported here would have been considered a major novel finding and reported as a study in itself.20 With so many findings, it has not been possible fully to discuss their implications. For example, the genetic associations between infant head circumference and intracranial volume with educational attainment and verbal-numerical reasoning are important in themselves, as are many other cognitive–mental health and cognitive–physical health associations. Taken all together, these results provide a resource that advances the study of aetiology in cognitive epidemiology substantially.