Populations and plant material
Four breeding populations, collectively comprising 5805 inbred barley lines, were included in this study. The populations originate from Nordic Seed A/S and differ in their combination of growth habit (winter versus spring) and row-type (2-rowed versus 6-rowed). Further information on the populations can be found in Table 1. In summary, the 2-rowed spring population (2RS) comprised 3346 lines, the 2-rowed winter (2RW) population comprised 1501 lines, the 6RW population comprised 511 lines, and the 6-rowed spring (6RS) population comprised 447 lines. All stated population sizes are reported after outlier removal as described in the section on principal component analysis (PCA).
Field experiments and phenotypic data
All populations were phenotyped for heading date and lodging. The number of phenotyped plots varied by population and trait. In total, the phenotypic data included growing seasons from 2013 to 2023, and a set of 16 geographical locations representing fields in Denmark, Germany, Finland, and France (Table S1). Each year, new crosses were introduced and tested, while some older lines were discontinued. Consequently, the set of lines differed between years with a degree of overlap between adjacent years. Testing was done annually in multiple locations. Within each combination of year-location, genotypes were organized in trials in an alpha-lattice design with 2–3 replicates. Plants were grown in plots with sizes of 10–15 m2. X and Y field coordinates for each plot were noted within each trial and used to allow for modelling spatial effects by moving average over adjacent plots within trials.
The heading date of a plot was measured as the date where 50% of the main spikes (the first spike of the plant) had 1–2 cm of visible awns protruding the flag leaf (BBCH scale 49–51). Heading dates were recorded as days starting from May 1st. Stem lodging was visually scored on a scale from 1–9, where 9 is most severe. Further details can be found in Table S2.
Genotyping, imputation and SNP filtration
DNA was extracted from 2-week-old seedlings using a Cetyl Trimethyl Ammonium Bromide method as described by Orabi et al. (2014). Individuals genotyped after 2013 were genotyped with the iSelect Illumina Infinium 15 K SNP chip, remaining individuals were genotyped with the iSelect Illumina Infinium 9 K SNP array. Genotyping by both arrays was outsourced to TraitGenetics GmbH (Gatersleben, Germany). Genotypes with the 15 K SNP were used as a reference for within-population imputation of the 9 K genotypes. Imputation was performed using Beagle v.5.4 with default settings and a sliding window of 150 cM. The imputation accuracy was tested within populations using a cross-validation (CV) scheme where the genotypes of q random individuals were set to missing at the 15 K chip-specific SNPs. The parameter q was set so that it fitted the number of genotypes to predict in the 9 K array. The procedure was repeated 100 times. An average imputation concordance of 0.97–0.98 was achieved. Prior to imputation, a SNP call rate filter of >0.8 was applied to a larger set of both unphenotyped and phenotyped individuals (6RW = 1668; 2RW = 5431; 6RS = 945; 2RS = 17,732). After population-wise imputation, the resulting VCF files were merged to produce one file containing all genotypes. This file was filtered to only include phenotyped lines (n = 5813) and SNPs genotyped in all four populations with a minor allele count (MAC) of minimum 20. This yielded a total of 12,644 SNPs for further analyses. The physical positions of all SNPs were obtained by aligning them to the Morex v3 reference genome (Mascher et al. 2021).
The SNP density plot was made using the R-package “CMplot” (Yin et al. 2021). The median number of SNPs per million base pair (Mbp) was calculated using VCFtools v.0.1.16 (Danecek et al. 2011). Finally, the median distance between neighbouring SNPs was calculated using a custom R-script.
Population structure and outlier removal
To examine the overall population structure and to detect extreme population outliers, PCA was performed in PLINK v.1.9 (Purcell et al. 2007). Extreme population outliers were defined as genotypes where principal components 1 to 3 were closer to the median value of another population than the assigned. In total this yielded 8 outliers; 3 from 6RS, 3 from 2RS and 2 from 2RW, that were removed from the data. In addition, the genomic relationship and genetic redundancy within populations were assessed using the off-diagonal elements of the genomic relation matrix (G) as shown in Fig. S1. G was calculated by VanRaden method 1 as described in VanRaden (2008).
Genetic structure and admixture were further analysed with the software ADMIXTURE setting K from 2 to 40 (Alexander et al. 2009). To find the optimal value of K, the software-integrated CV function was used to perform a 10-fold CV at each K value.
Linkage disequilibrium
Intra-chromosomal LD was estimated between all pairs of SNPs within populations. Two measurements of LD were calculated—one estimated the traditional squared allele-frequency correlations (r2) and the other corrected for kinship relationships (rV2) using the method proposed by Mangin et al. (2012). LD decay was examined by ordering SNP pairs by distance followed by binning of data into groups according to their physical distance. Bins consisted of 10 kilobase pair (kbp) intervals, i.e. the first bin contained all SNP pairs with a distance of 0–10 kbp, the next bin all pairs with a distance of 10–20 kbp etc. For each bin the average r2 or rV2 value was plotted against the average distance between marker pairs and a smooth curve was added in R using the loess function with a span of 0.3.
The persistence of allele phases between populations over distance were examined by considering the described SNP bins. Using the same approach as Schopp et al. (2017), linkage phase similarity (LPS) was calculated within each bin by computing the cosine similarity of rv values between population pairs. The results were plotted similarly to the LD decay plots of single populations.
Variance components and heritability estimation
Traits were analysed by fitting the following linear mixed model using the software package DMU (Madsen and Jensen 2013):
$${\boldsymbol{y}}\,={{\mathbf{X}}}_{\mathbf{1}}\mu +\,{{\mathbf{X}}}_{\mathbf{2}}{\mathbf{l}}+{{\mathbf{Z}}}_{\mathbf{1}}{{\mathbf{g}}}_{{\rm{a}}}+{{\mathbf{Z}}}_{\mathbf{2}}{{\mathbf{g}}}_{{\mathbf{l}}}+{{\mathbf{Z}}}_{\mathbf{3}}{\mathbf{w}}\,+\mathop{\sum }\limits_{j=1}^{15}{{\mathbf{Z}}}_{\boldsymbol{j}}{\mathbf{s}}+{\mathbf{e}}$$
(1)
Where y is the vector of phenotypic observations; µ is the overall mean; l is the vector of the fixed effects associated with year x location effects; ga is the vector of genomic breeding values for lines with \({{\bf{g}}}_{{\rm{a}}}\sim N({\bf{0}},{\bf{G}}{\sigma }_{{ga}}^{2})\) where \({\sigma }_{{ga}}^{2}\) denotes the additive genetic variance and G is the genomic relationship matrix; gl is the vector of residual line effects (not captured by the additive marker effects); w denotes the interaction between environment and genotype (year x location x line) with \({\bf{w}}\sim N({\bf{0}},{\bf{I}}{\sigma }_{w}^{2})\); s is a vector of spatial effects with \({\bf{s}}\sim N({\bf{0}},{\bf{I}}{\sigma }_{s}^{2})\), which captures a moving window containing the plot itself and the 14 surrounding plots within trials when analysing the 6RW, 2RW and 6RS populations (Fig. S2). In the 2RS population, no y-coordinates were provided, and therefore spatial effects were modelled as independent fixed effects of x-coordinates. e is a vector of residual effects with \({\bf{e}}\sim N({\bf{0}},{\bf{I}}{\sigma }_{e}^{2})\). X1 and X2 are the design matrices for the fixed effects; µ and l, respectively. Z1, Z2, Z3 and Zj are the design matrices for the random effects; ga, gl, w, and s, respectively.
To account for inbreeding, the reported additive genetic variance component (\({\sigma }_{{ga}}^{2}\)) and its standard error (SE) have been multiplied by the average diagonal of the relevant G matrix. When the spatial variance was calculated as a moving window summing over 15 plots, the spatial variance component (\({\sigma }_{{gs}}^{2}\)) and its SE were reported as the estimated value multiplied by 15 to account for the contribution of 15 plots within the window.
The broad sense heritability (H2) and narrow sense heritability (h2) were calculated as follows:
$${H}^{2}=\,\frac{\widehat{{\sigma }_{{ga}}^{2}}+\widehat{{\sigma }_{{gl}}^{2}}}{\widehat{{\sigma }_{p}^{2}}}$$
(2)
$${h}^{2}=\,\frac{\widehat{{\sigma }_{{ga}}^{2}}}{\widehat{{\sigma }_{p}^{2}}}$$
(3)
Both heritability measurements were calculated on a single plot as well as a line mean (entry) level. For the single plot level, phenotypic variance was calculated as:
$$\widehat{{\sigma }_{p}^{2}}=\widehat{{\sigma }_{{ga}}^{2}}+\widehat{{\sigma }_{{gl}}^{2}}+\widehat{{\sigma }_{w}^{2}}+\widehat{{\sigma }_{s}^{2}}+\widehat{{\sigma }_{e}^{2}}$$
(4)
For the entry level, phenotypic variance was calculated as:
$$\widehat{{\sigma }_{p}^{2}}=\widehat{{\,\sigma }_{{ga}}^{2}}+\widehat{{\sigma }_{{gl}}^{2}}+\frac{\widehat{{\sigma }_{w}^{2}}}{{n}_{e}}+\frac{\widehat{{\sigma }_{s}^{2}}}{{n}_{r}}+\frac{\widehat{{\sigma }_{e}^{2}}}{{n}_{r}}$$
(5)
Where ne and nr indicate the number of environments and the average number of replicates per line across all environments, respectively.
GWAS models
All GWAS analyses were performed using the software DMU (Madsen and Jensen 2013). GWAS was performed on a set of SNPs with MAC ≥ 30. When combining populations, SNPs with a total MAC < 30 or with a MAC < 10 in either population were excluded. GWAS was done as a single SNP regression using an expanded version of the model stated in Eq. 1. It was performed within populations (single-population GWAS) and by combining populations (multi-population GWAS) using either a univariate model (MP1) or a multivariate model combining populations (MP2). In the MP1 model, the GWAS trait is treated as the same across populations. The model is an expansion of the single-population model and includes a fixed population effect. Further, the model nests fixed effects of environments (l, Eq. 1) within populations. In contrast, MP2 treats the same trait in different populations as genetically correlated using a multivariate version of the previously presented GWAS models. In addition to the estimation variance of SNP effect estimates, the covariance between estimation errors from SNP regression was estimated for each pair of populations in MP2.
GWAS significance test and correction for multiple testing
Significance of GWAS marker effects were assessed using p values calculated by applying a Wald test to z scores:
$${z}_{i}=\frac{\widehat{{b}_{i}}}{{SE}(\widehat{{b}_{i}})}$$
(6)
Using the single-population and MP1 models, \({\widehat{{b}_{i}}}\) is the estimated regression coefficient of SNP i. When applying the test to MP2, \({\widehat{{b}_{i}}}\) is the sum of the np scaled estimated regression coefficients when combining np populations. Thus in MP2 \({SE}({\widehat{{b}_{i}}})\) the following equations apply:
$${b}_{{scaled},i}=\mathop{\sum }\limits_{j=1}^{{n}_{p}}\frac{\widehat{{b}_{i,j}}}{{SE}(\widehat{{b}_{i,j}})}$$
(7)
\({\widehat{{b}_{i,{j}}}}\) refers to the estimated regression coefficient of SNP i in population j.
In order to calculate the z scores of SNP regressions from MP2, the SE of the estimated SNP effects are calculated as follows:
$${SE}\left({\widehat{b}}_{{scaled},i}\right)=\sqrt{{{\bf{a}}}^{{\bf{T}}}{{\bf{V}}}_{{\bf{i}}}{\bf{a}}}$$
(8)
Where a is a column vector of ones with dimension 1 × np where np is the number of included populations. Vi denotes the variance-covariance matrix of the scaled regressions for SNP i and has the dimension of np × np. Due to the scaled nature of the SNP regressions, all the diagonal elements in V are one. Each covariance between SNP regressions in different populations were estimated as follows:
$${\mathrm{cov}}({\widehat{b}}_{{scaled}\,i,j},{\widehat{b}}_{{scaled}\,i,k})\,=\,\frac{{\mathrm{cov}}({\widehat{{b}_{i,j}}},\,{\widehat{{b}_{i,k}}})}{{SE}({\widehat{{b}_{i,j}}}){SE}({\widehat{{b}_{i,k}}})}$$
(9)
Where \({\widehat{b}}_{{scaled\; i},j}\) and \({\widehat{b}}_{{scaled\; i},k}\) are the scaled regressions of the ith SNP in populations j and k, respectively.
To correct for multiple testing a Bonferroni correction was applied setting the genome-wide significance threshold to 0.05/12,644 = 3.95E-06.
Further statical analyses and modelling
The theoretical proportion of additive genetic variance explained (PVE) by the ith marker was calculated as follows:
$${{PVE}}_{i}\,=\,\frac{{2* p}_{i}* (1-{p}_{i})* {\widehat{{b}_{i}}}^{2}}{\widehat{{\sigma }_{g\alpha }^{2}}}$$
(10)
Where pi and \({b}_{i}\) are the allele frequency and the estimated regression coefficient of the ith marker, and \(\widehat{{\sigma }_{g\alpha }^{2}}\) (not to be confused with \(\widehat{{\sigma }_{{ga}}^{2}}\)) is the additive genetic variance before multiplication with the average diagonal elements of the G matrix.
Prior to modelling the genetic correlation between heading date and lodging within a population, the data was limited to observations of plots scored for both traits. The genetic correlation was then modelled by expanding Eq. 1 to a bivariate model which included residual covariance.
Power calculations
Power analyses were performed for each GWAS by iterating over values of SNP effects (b) ranging from 0 to \(\sqrt{2}\) by steps of 0.01. For each b value, the power calculations were performed for a set of non-centrality parameter (NCP) values as described by Sham and Purcell (2014):
$${\rm{NCP}}={\left(\frac{{\rm{b}}}{{\rm{SE}}({\widehat{{\rm{b}}}})}\right)}^{2}$$
(11)
The SE of each SNP effect (\({\rm{SE}}({\widehat{{\rm{b}}}})\)) originated from our actual GWAS on each trait, which was reduced to the common set of 4812 markers with a MAC ≥ 30 in all populations.
In the single-population and MP1 models, NCPs were obtained by dividing the given b value by the vector of \({SE}(\hat{b})\) produced by the GWAS models. For MP2, NCP ratios were obtained by dividing the corresponding scaled b value (Eq. 7) by the vector of SE of scaled b values (Eq. 8). The NCPs were used as test statistics in a non-central chi-squared test with 1 degree of freedom and a type 1 error rate set to the overall Bonferroni significance threshold. As power calculations were applied to each NCP value, we averaged over SNPs to obtain one power measurement per value of b2.