Piter Bijma, Andries D Hulst, Mart C M de Jong, The quantitative genetics of the prevalence of infectious diseases: hidden genetic variation due to indirect genetic effects dominates heritable variation and response to selection, Genetics, Volume 220, Issue 1, January 2022, iyab141, https://doi.org/10.1093/genetics/iyab141
Abstract
Infectious diseases have profound effects on life, both in nature and agriculture. However, a quantitative genetic theory of the host population for the endemic prevalence of infectious diseases is almost entirely lacking. While several studies have demonstrated the relevance of transmission of infections for heritable variation and response to selection, current quantitative genetics ignores transmission. Thus, we lack concepts of breeding value and heritable variation for endemic prevalence, and poorly understand response of endemic prevalence to selection. Here, we integrate quantitative genetics and epidemiology, and propose a quantitative genetic theory for the basic reproduction number R0 and for the endemic prevalence of an infection. We first identify the genetic factors that determine the prevalence. Subsequently, we investigate the population-level consequences of individual genetic variation, for both R0 and the endemic prevalence. Next, we present expressions for the breeding value and heritable variation, for endemic prevalence and individual binary disease status, and show that these depend strongly on the prevalence. Results show that heritable variation for endemic prevalence is substantially greater than currently believed, and increases strongly when prevalence decreases, while heritability of disease status approaches zero. As a consequence, response of the endemic prevalence to selection for lower disease status accelerates considerably when prevalence decreases, in contrast to classical predictions. Finally, we show that most heritable variation for the endemic prevalence is hidden in indirect genetic effects, suggesting a key role for kin-group selection in the evolutionary history of current populations and for genetic improvement in animals and plants.
Introduction
Pathogens have profound effects on life on earth, both in nature and agriculture, and also directly on the human population (Schrag and Wiener 1995; Russel 2013). In nature, infectious pathogens are a major force shaping evolution of populations by natural selection, both in animals and plants (reviewed in Karlsson et al. 2014; Ebert and Fields 2020). In livestock, the annual cost of fighting and controlling epidemic and endemic infectious diseases is substantial, and much greater than the annual value of genetic improvement (Rushton 2009; Knap and Doeschl-Wilson 2020). Moreover, while antimicrobials have revolutionized medicine, the rapid appearance of resistant strains has resulted in a global health problem, both in the human population and in livestock (EFSA 2012; Thanner et al. 2016). Thus, there is an urgent need for additional methods and tools to combat infectious diseases. For livestock and plant production, artificial genetic selection of (host) populations for infectious disease traits may provide such a tool. To quantify and optimize the potential benefits of such selection, however, we need to understand the quantitative genetics of infectious disease traits.
The integration of quantitative genetics and epidemiology for livestock populations was pioneered by Bishop and co-workers. They demonstrated unexpected effects, such as responses to selection for gastro-intestinal parasite infections clearly greater than expected from ordinary quantitative genetics (Bishop and Stear 1997, 1999, 2003). Bishop and co-workers also identified the basic reproduction number, R0, as a key parameter for genetic improvement and demonstrated the need for further integration of quantitative genetics and epidemiology (e.g., MacKenzie and Bishop 1999, 2001; Bishop and MacKenzie 2003; Nieuwhof et al. 2009; see also Doeschl-Wilson et al. 2021). These studies clearly show that classical quantitative genetic approaches do not predict response to genetic selection for disease traits, because they ignore the feedback dynamics in the transmission of the infection. Several later studies have demonstrated the relevance of these transmission dynamics for heritable variation and response to selection in the host population (Lipschutz-Powell et al. 2012; Anche et al. 2014; Tsairidou et al. 2019; Hulst et al. 2021), mostly using stochastic simulation.
However, despite the findings of Bishop et al. (see references above) and the availability of well-established epidemiological theory (e.g., Diekmann et al. 2012), a quantitative genetic theory of the host population for the prevalence of infectious diseases is almost entirely lacking. The current theoretical framework of quantitative genetics and the approaches for genetic selection against infectious diseases in livestock and crops are largely based on the individual host response, ignoring transmission dynamics of the infection in the population. Moreover, we lack general expressions for the breeding value and genetic variance in key epidemiological parameters, in particular, for the basic reproduction number R0, even though such parameters may have a genetic basis.
Infections for which recovery does not confer any long-lasting immunity typically show endemic behavior, where the infection remains present in the population. For such infections, the endemic prevalence is defined as the expected fraction of the population that is infected. Because we lack a theoretical quantitative genetic framework for infectious diseases, we do not know which genetic effects of the host population determine the prevalence of an infectious disease, and have no concepts of breeding value and heritable variation for endemic prevalence. Hence, we do not understand the response of the endemic prevalence to genetic selection for disease traits at present. The main parameter determining the prevalence of endemic infections is the basic reproduction number R0, defined as the average number of individuals that get infected by a typical infected individual in an otherwise noninfected population. In this article, we will propose a quantitative genetic framework for heritable variation and response to selection for R0 and for the endemic prevalence of infectious diseases.
Individual phenotypes for infectious diseases are often recorded as the binary infection status of an individual, zero indicating noninfected and one indicating infected. The prevalence of an infection is then defined as the fraction of individuals that is infected, which is the fraction of individuals that has infection status y = 1. Because the average value of individual binary infection status is equal to the fraction of individuals infected, response to genetic selection in binary infection status is identical to response in prevalence, and vice versa. Binary infection status (0/1) typically shows low heritability, which suggests that response to selection is limited, also for prevalence (Bishop and Woolliams 2010; Bishop et al. 2012; Martin et al. 2018).
Geneticists have long realized that the categorical distribution of binary traits does not agree well with quantitative genetic models for polygenic traits, such as the infinitesimal model (Fisher 1919). For this reason, models have been developed that link an underlying normally distributed trait to the observed binary phenotype, such as the threshold model (Dempster and Lerner 1950; Gianola 1982) and the equivalent generalized linear mixed model with a probit link function (e.g., de Villemereuil et al. 2016). In such models, the underlying scale is interpreted as causal, and genetic parameters are assumed to represent “biological constants” on this scale. The genetic parameters on the observed scale, in contrast, depend on the mean of the trait, and thus change with the mean even when the change in allele frequencies at causal loci is infinitesimally small. In a landmark paper, Robertson (1950) showed that the observed-scale heritability of binary traits reaches a maximum at a prevalence of 0.5, and approaches zero when the prevalence is close to 0 or 1. Hence, observed-scale heritability vanishes when artificial selection moves prevalence close to zero, hampering further genetic change.
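The dependence of observed-scale heritability on prevalence can be made concrete with a minimal sketch. It assumes the classical threshold-model transformation, in which observed-scale heritability equals liability-scale heritability times z²/(p(1 − p)), with p the prevalence and z the standard normal density at the threshold. The function name and the liability-scale heritability of 0.3 are illustrative choices, not values from this article.

```python
from statistics import NormalDist


def observed_scale_heritability(h2_liability, prevalence):
    """Observed-scale heritability of a binary trait under a threshold model.

    Assumes a standard normal liability; individuals above the threshold
    are affected, so a fraction `prevalence` lies above it.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)  # density of the liability at the threshold
    return h2_liability * z ** 2 / (prevalence * (1.0 - prevalence))


if __name__ == "__main__":
    h2_liability = 0.3  # hypothetical liability-scale heritability
    for p in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
        print(f"prevalence={p:4.2f}  h2_observed={observed_scale_heritability(h2_liability, p):.4f}")
```

Under these assumptions the printed values peak at a prevalence of 0.5 and shrink toward both extremes, which is the pattern the paragraph above attributes to Robertson (1950).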
Infectious disease status, however, differs fundamentally from binary phenotypes for noncommunicable traits, such as, say, heart failure. Because pathogens can be transmitted between host individuals, either directly or via the environment, the infection status of an individual depends on the status of other individuals in the population. This suggests that indirect genetic effects (IGEs) may play a role, which would fundamentally alter heritable variation and response to selection (Griffing 1967; Moore et al. 1997; Wolf et al. 1998; Bijma and Wade 2008; Bijma 2011). Results of simulation studies indeed suggest that selection response in the prevalence of infectious diseases may differ qualitatively from response in noncommunicable traits (Nieuwhof et al. 2009; Doeschl-Wilson et al. 2011; Anche et al. 2014; Hulst et al. 2021), and this has also been observed in an actual population (Heringstad et al. 2007). Results of Hulst et al. (2021), for example, show that genetic selection may result in the eradication of an infection via the mechanism of herd immunity, just like with vaccination (Fine 1993). This result contradicts predictions based on the observed-scale heritability for noncommunicable binary traits, where heritability vanishes when prevalence approaches zero (Robertson 1950).
While quantitative geneticists and breeders typically focus on individual disease status and (implicitly) interpret prevalence as an average of individual trait values, epidemiologists interpret the endemic prevalence of an infectious disease as the result of a population-level process of transmission of the infection (Kermack and McKendrick 1927; Keeling and Rohani 2011; Diekmann et al. 2012). In the latter perspective, both R0 and the prevalence are emergent properties of a population, similar to the size of a termite colony or the number of prey caught by a hunting pack, rather than an average of individual trait values. Because such emergent traits do not belong to single individuals, we cannot apply the common partitioning of individual phenotypic values into individual additive genetic values (breeding values) and nonheritable residuals (“environment”). Nevertheless, the genetic effects that determine the response to selection in an emergent trait and the heritable variation for an emergent trait can be defined based on the so-called total heritable variation (Bijma 2011). The total heritable variation in a trait is based on the individual genetic effects on the level of the emergent trait, rather than on a decomposition of individual trait values into genetic and residual effects. This suggests we can develop a quantitative genetic theory for the endemic prevalence of infectious diseases by combining epidemiological theory with the theory of total heritable variation.
Here, we propose a quantitative genetic theory for the basic reproduction number R0 and for the endemic prevalence of infectious diseases. We first identify the genetic factors that determine the prevalence of an infectious disease. Similar to the threshold model, we will assume an underlying additive infinitesimal model for those genetic factors. However, the link between the underlying additive scale and the observed endemic prevalence will be founded in epidemiological theory, with a key role for R0. Subsequently, we investigate the population-level consequences of genetic variation in individual disease traits for R0 and for the endemic prevalence. Next, we move to the individual level, and derive expressions for the breeding value and heritable variation, for R0, endemic prevalence and individual binary infection status, and show how these parameters depend on the level of the endemic prevalence. Results will show that heritable variation for endemic prevalence increases when prevalence approaches zero, while heritability of individual infection status goes to zero. Then we investigate response to selection against individual binary infection status (0/1), and show that response of prevalence to selection accelerates considerably when prevalence goes down. Finally, we partition the breeding value for prevalence into direct and indirect genetic effects, and show that most of the heritable variation in the endemic prevalence of the infection is indirect, and thus hidden to classical genetic analysis and selection. We focus solely on the development of quantitative genetic theory, and do not consider the statistical estimation of the genetic effects underlying prevalence. Such methods have been developed elsewhere (Anacleto et al. 2015; Biemans et al. 2017; Pooley et al. 2020).
The theory we develop here applies to endemic microparasitic infections, i.e., where transmission depends just on whether an individual is infected or not, and where the infection is endemic in the local population (e.g., farm). It may also apply to endemic macroparasitic infections, such as coccidiosis or parasites, but we do not study this here. Endemic infections are of daily concern to farmers, and the very fact that they are endemic indicates that the existing management tools are insufficient. Thus, those are the infections likely to be targeted by breeding. Examples include mastitis, infectious claw disorders, respiratory infections in young animals (young replacement stock, meat calves, and fattening pigs), and fecal-oral transmitted infections causing gastro-intestinal diseases (diarrhea and so on), but also several endemic and potentially zoonotic infections, such as Salmonella spp. and Campylobacter jejuni in poultry, Hepatitis E virus and MRSA in pigs, and Bovine Tb, Leptospira, Brucella, and Johne’s disease in cows.
Theory and Results
The genetic factors that determine R0 and the endemic prevalence
We consider an endemic infectious disease, where individuals can either be susceptible (i.e., in the noninfected state), denoted by S, or in the infected state, denoted by I. We use corresponding symbols in italics to denote the number of individuals with that status. Thus, with a total of N individuals in the population in which the endemic takes place, S denotes the number of susceptible individuals, I the number of infected individuals, and N = S + I (see Table 1 for a notation key). We will assume that infected individuals are also infectious, and can thus infect others. When individuals recover they become susceptible again. This model is known as the SIS compartmental model (Hethcote 1989), and was first discussed by Weiss and Dishon (1971). In the Discussion, we will consider the validity of our results for other compartmental models.
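As an illustration of the SIS dynamics described above, the following sketch integrates the standard deterministic model dI/dt = βSI/N − αI with a simple Euler scheme. It assumes frequency-dependent transmission in a homogeneous population. The symbol β for the transmission rate parameter, the function name, and the parameter values are assumptions made here for illustration.

```python
def simulate_sis(beta, alpha, n, i0, dt=0.01, t_end=200.0):
    """Euler integration of dI/dt = beta * S * I / N - alpha * I, with S = N - I.

    Returns the infected fraction I/N at the end of the simulated period.
    """
    infected = float(i0)
    for _ in range(int(t_end / dt)):
        susceptible = n - infected
        new_infections = beta * susceptible * infected / n
        recoveries = alpha * infected  # recovered individuals become susceptible again
        infected += dt * (new_infections - recoveries)
    return infected / n


if __name__ == "__main__":
    # Hypothetical rates: transmission twice as fast as recovery, then slower than recovery.
    print(simulate_sis(beta=2.0, alpha=1.0, n=100, i0=1))
    print(simulate_sis(beta=0.8, alpha=1.0, n=100, i0=1))
```

With these hypothetical values the first run settles at an infected fraction close to one half, while in the second run the infection dies out because recovery outpaces transmission.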
Table 1 Notation key

| Symbol | Meaning |
|---|---|
| N | Total number of individuals in the population in which the endemic takes place; N = I + S |
| I | Number of infected individuals in the population |
| S | Number of susceptible (i.e., noninfected) individuals in the population |
| P | Prevalence in the endemic equilibrium; P = I/N; |
| R0 | Basic reproduction number |
|  | Response of prevalence to selection, i.e., change in prevalence per generation |
|  | R0-like quantity that determines the prevalence for type i (see Equation 20) |
| A | Breeding value |
|  | Breeding value for the logarithm of susceptibility, infectivity and recovery rate |
|  | Breeding value for the logarithm of R0; breeding value for R0 |
|  | Breeding value for individual binary disease status; breeding value for prevalence |
|  | Genotypic value for R0 |
|  | Genotypic value for individual disease status; genotypic value for prevalence |
|  | Ratio of variance in breeding value for prevalence over phenotypic variance in y |
| c | Effective contact rate. Without heterogeneity R0 = c. |
|  | Heritability of individual binary disease status |
| y | Individual binary infection status; infected: y = 1; noninfected: y = 0 |
| i, j | Subscript denoting an individual |
| α | Recovery rate (relative to a value of 1) |
|  | Transmission rate parameter from individual j to individual i |
| γ | Susceptibility (relative to a value of 1) |
| φ | Infectivity (per unit of time, relative to a value of 1) |
|  | Mean infectivity of the infected individuals in the endemic equilibrium |
|  | Life time infectivity (relative to a value of 1) |
|  | for the typical infected individual in an otherwise noninfected population |
|  | Intensity of selection; selection differential expressed in SD units |
|  | Accuracy of mass selection, correlation of and in the selection candidates |
|  | Variance of among individuals; analogous for and |
|  | Covariance of and ; analogous for and |
|  | Variance of the breeding values for the logarithm of R0 |
|  | Additive genetic variance for endemic prevalence |
|  | Additive genetic variance in individual binary infection status |
|  | Direct and indirect additive genetic variance for endemic prevalence, respectively |
|  | Direct-indirect additive genetic covariance for endemic prevalence |
The prevalence of an infectious disease is determined by R0. R0 is defined as the average number of individuals that get infected by a typical (i.e., average) infected individual in an otherwise noninfected population, and is a property of the population (Kermack and McKendrick 1927; Anderson and May 1979; Diekmann et al. 1990). When R0 > 1, an average infected individual on average infects more than one new individual in an infection free population, and the infection can persist in the population.
Throughout, we will use the symbol P to denote the endemic prevalence. The actual prevalence tends to fluctuate around the equilibrium value because of random perturbations and transient effects, for example when new animals replace some of the resident animals. Equation (3) is an approximation when there is variation among individuals, which is commonly referred to as “heterogeneity” in the epidemiological literature, and which will be addressed in the section on the impact of genetic variation on the endemic prevalence below.
Figure 1 illustrates the relationship between the endemic prevalence and R0. When R0 is smaller than one the endemic prevalence is zero (the infection is not present in the long run), and Equation (3) does not apply. For large R0 the endemic prevalence asymptotes to 1. This threshold phenomenon, i.e., P = 0 when R0 ≤ 1 and P > 0 when R0 > 1, is exact also with heterogeneity (Diekmann et al. 1990). Note that the curve is steeper the closer R0 is to 1. This pattern will have considerable consequences for the relationship between the heritable variation in the endemic prevalence and the level of the endemic prevalence, as will be shown in the section on individual genetic effects for the endemic prevalence below.
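The shape of this relationship can be sketched numerically. The code below assumes that Equation (3) is the standard homogeneous SIS result P = 1 − 1/R0 for R0 > 1 (and P = 0 otherwise), which matches the behavior described for Figure 1. The function names are ours, and the slope 1/R0² is simply the derivative of that assumed expression.

```python
def endemic_prevalence(r0):
    """Equilibrium prevalence of a homogeneous SIS model (assumed form of Equation 3)."""
    return 1.0 - 1.0 / r0 if r0 > 1.0 else 0.0


def prevalence_slope(r0):
    """Derivative dP/dR0 of the assumed expression; zero below the threshold."""
    return 1.0 / r0 ** 2 if r0 > 1.0 else 0.0


if __name__ == "__main__":
    for r0 in (0.8, 1.1, 1.5, 2.0, 5.0, 20.0):
        print(f"R0={r0:5.1f}  P={endemic_prevalence(r0):.3f}  dP/dR0={prevalence_slope(r0):.3f}")
```

Under this assumption, a given change in R0 moves the prevalence far more when R0 is just above one than when R0 is large.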
Because the endemic prevalence is determined by R0 (Equation 3), the response of prevalence to selection (on any criterion), i.e., the genetic change in the endemic prevalence from one (host) generation to the next, follows from the genetic change in R0. Thus, to measure the value of an individual with respect to response to selection, we should base this measure on the genetic impact of the individual on R0. In other words, the definition of an individual breeding value for endemic prevalence should be based on R0. The next step, therefore, is to find the individual genetic factors underlying R0.
Equations (3) through (5) show that the factors underlying the endemic prevalence of an infection are the contact rate c, the susceptibility, γ, the infectivity, φ, and the recovery rate α. We define c as a fixed parameter for the population (or, for example, for a sex, herd or age class combination), whereas γ, φ, and α are quantitative traits that may show random variation among individuals. Note that, while the actual contact rate may vary among individuals, it is convenient to include such variation in the individual susceptibility and infectivity traits. Thus, we also assume that all the individuals are mixing randomly within the (local) population, such as a herd. In principle, the transmission rate parameter from individual j to individual i might also depend on the specific combination of i and j, so that we cannot fully separate it into a product of components due to i and j. However, in a quantitative genetic perspective, such a combination effect represents interactions between genes in distinct individuals (say “between-individual epistasis”), which does not contribute to the heritable variation, and which we will therefore ignore. In epidemiological terminology, we assume separable mixing (Diekmann et al. 1990). Hence, conceptually we define c as the average effective contact rate for the population, while variation in contact rate among individuals is included in γ and φ. Moreover, to define the scale of Equations (4) and (5), it is convenient to include the scale in c, and to express γ, φ, and α relative to a value of 1. Hence, with this parameterization, c is on the scale of R0, and R0 and c are identical in the absence of heterogeneity. With heterogeneity, however, R0 may deviate from c (see below).
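A minimal sketch of this parameterization, assuming that the pair-wise transmission rate parameter takes the separable form c·γ_i·φ_j, with γ_i the susceptibility of the recipient i and φ_j the infectivity of the infected individual j. This is our reading of separable mixing, since Equation (5) is not reproduced here. The function names and trait values are hypothetical.

```python
def transmission_rate(c, susceptibility_i, infectivity_j):
    """Rate at which infected individual j transmits to susceptible individual i."""
    return c * susceptibility_i * infectivity_j


def transmission_matrix(c, susceptibility, infectivity):
    """Matrix with entry [i][j] = c * susceptibility[i] * infectivity[j]."""
    return [[transmission_rate(c, g, f) for f in infectivity] for g in susceptibility]


if __name__ == "__main__":
    c = 1.5  # hypothetical average effective contact rate
    # Homogeneous population: every trait equals 1, so every pair-wise rate equals c.
    print(transmission_matrix(c, [1.0, 1.0], [1.0, 1.0]))
    # Heterogeneous population: individual 0 is twice as susceptible as average,
    # individual 1 is half as infectious as average.
    print(transmission_matrix(c, [2.0, 1.0], [1.0, 0.5]))
```

Because every entry is a product of a row effect and a column effect, no pair-specific term remains, which is the sense in which a combination effect of i and j is ignored.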
Genetic models for susceptibility, infectivity, recovery, and R0
Genetic variation is potentially present in susceptibility, infectivity, and the recovery rate. In this section, we propose a genetic model for these traits, which subsequently leads to a genetic model for R0.
Throughout, we use subscript l to denote the natural logarithm. Thus, the breeding values for the logarithms of susceptibility, infectivity, and recovery rate follow a multivariate normal distribution, as common in quantitative genetics. Moreover, for the average individual these breeding values are equal to 0, so that its rates are equal to one (γ = φ = α = 1). Hence, those rates should be interpreted relative to a value of 1. An individual with γ = 2, for example, is twice as susceptible as the average individual. Also note that, by defining breeding values to have a mean of zero, we put the mean into the contact rate c. Hence, in the following, c will refer to the model where the mean breeding value on the log scale is equal to zero (Equation 7, see also Discussion).
The breeding values on the log-scale can approximately be interpreted as a relative change of the corresponding rate. For example, since e^x ≈ 1 + x for small x, a breeding value for the logarithm of susceptibility of 0.1 corresponds approximately to a 10% greater than average susceptibility (e^0.1 ≈ 1.105). Similarly, a breeding value of −0.1 corresponds approximately to a 10% smaller than average susceptibility (e^−0.1 ≈ 0.905). Realistic values for the genetic variances on the log-scale are probably smaller than ∼0.5² (Hulst et al. 2021). For example, with a genetic standard deviation of 0.5 on the log-scale, the 10% least susceptible individuals have an average susceptibility of about 0.42, while the 10% most susceptible individuals have an average susceptibility of about 2.40. Thus, the average susceptibilities of these top and bottom 10% of individuals differ by a factor of 5.7, which is substantial. Therefore, we will consider additive genetic variances on the log-scale no greater than 0.5². With a prevalence of 0.3, this value corresponds to an observed-scale heritability of individual binary infection status of about 0.05 (Hulst et al. 2021).
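The arithmetic behind these numbers can be checked with a short sketch. It assumes that the tail values are obtained by exponentiating the mean log-scale breeding value of the top and bottom 10% of a normal distribution with standard deviation 0.5. That is our interpretation, and it differs slightly from averaging the exponentiated values. The function names are ours.

```python
from math import exp
from statistics import NormalDist


def mean_of_upper_tail(fraction):
    """Mean of a standard normal variable within its upper `fraction` tail."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - fraction)
    return std_normal.pdf(cutoff) / fraction


def tail_susceptibilities(sd_log, fraction):
    """Susceptibility at the mean log-scale breeding value of each tail.

    Returns (least susceptible tail, most susceptible tail), both relative
    to an average susceptibility of 1.
    """
    shift = sd_log * mean_of_upper_tail(fraction)
    return exp(-shift), exp(shift)


if __name__ == "__main__":
    print(exp(0.1), exp(-0.1))  # relative rates for log-scale values of +0.1 and -0.1
    least, most = tail_susceptibilities(sd_log=0.5, fraction=0.10)
    print(least, most, most / least)
```

Under this interpretation the two tails come out near 0.42 and 2.40, and their ratio is a little under 6.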
Genotypic value and breeding value for R0
In contrast to the pair-wise transmission rate parameter in Equation (5), an individual’s genotypic value for R0 is entirely a function of its own rates, as can be seen from the index i on all elements of Equation (8). This is because the genotypic value for R0 refers to the genetic effects that originate from the individual, rather than to those that affect its trait value. As a consequence these rates may be correlated, as defined in Equation (7) above. Hence, the genotypic value for R0 represents a total genotypic value (Bijma et al. 2007; Bijma 2011). We focus on the total genotypic value, because our ultimate interest is in response to selection. In the next section of this manuscript, we will show that R0 is indeed the simple population average of the individual genotypic values for R0.
The genotypic value for R0 of the average individual, which has all rates equal to one, is equal to the contact rate, c. Hence, the genotypic value is defined here including its average; it is not expressed as a deviation from the mean. Moreover, we refer to it as a genotypic value, rather than a breeding value, because the exponential in Equation (9) is a nonlinear function, so that the genotypic value for R0 will show some nonadditive genetic variance, even though the breeding value for the logarithm of R0 is additive.
Equations (12) and (10c) show that genetic (co)variation in susceptibility, infectivity and/or recovery, and thus in the breeding value for the logarithm of R0, leads to an increase in the mean genotypic value for R0. For example, for a variance of 0.5² in the breeding value for the logarithm of R0, the mean genotypic value for R0 is ≈1.13c. While this 13% increase in R0 may suggest limited impact of heterogeneity, a 13% increase in R0 has a considerable impact on the endemic prevalence when R0 is close to one (Figure 1).
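The 13% figure follows from the mean of a log-normal variable, as the sketch below illustrates. It assumes that the genotypic value for R0 equals c·exp(A), with A a normally distributed breeding value for the logarithm of R0 with mean zero, which is our reading of Equations (8) and (9). The function names, the sample size, and the seed are arbitrary choices.

```python
import random
from math import exp


def mean_genotypic_r0(c, var_log_r0):
    """Expected value of c * exp(A) when A ~ Normal(0, var_log_r0)."""
    return c * exp(var_log_r0 / 2.0)


def simulated_mean_genotypic_r0(c, var_log_r0, n=200_000, seed=1):
    """Monte Carlo check of the closed-form mean, using a fixed seed."""
    rng = random.Random(seed)
    sd = var_log_r0 ** 0.5
    return sum(c * exp(rng.gauss(0.0, sd)) for _ in range(n)) / n


if __name__ == "__main__":
    c = 1.0  # contact rate; the mean genotypic value scales linearly with c
    print(mean_genotypic_r0(c, 0.5 ** 2))
    print(simulated_mean_genotypic_r0(c, 0.5 ** 2))
```

Both values should lie close to 1.13 for a log-scale variance of 0.5², so the mean exceeds c even though the mean breeding value on the log scale is zero.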
Equations (12) and (13) show that a log-normal distribution for susceptibility, infectivity and recovery results in a positive mean-variance relationship for R0. Figure 2 illustrates this relationship for genetic variation in susceptibility only. The x-axis shows the contact rate, which is equal to the genotypic value for R0 of the average individual in the population (i.e., an individual with γ = φ = α = 1). Hence, the x-axis reflects the level of R0. The small circle represents a population with a prevalence of ∼0.33, for which observed-scale heritability of binary infection status is ∼0.02 (Hulst et al. 2021). For that population, R0 is ∼1.5, and the genetic standard deviation in R0 is ∼0.48. Hence, despite the small observed-scale heritability, R0 has considerable genetic variation and some individuals will have a genotypic value smaller than 1, which agrees with the findings of Hulst et al. (2021). In the context of artificial selection against infectious diseases, the positive mean-variance relationship resulting from our model may be interpreted as conservative, because it implies a reduction of the genetic variance in R0 with continued selection for lower prevalence.
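A positive mean-variance relationship of this kind is a general property of log-normal variables, which the following sketch makes explicit. It assumes a genotypic value of the form c·exp(A) with A normal with mean zero. The contact rate of 1.43 and the log-scale variance of 0.1 are hypothetical values chosen for illustration and are not claimed to be the ones behind Figure 2.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def genotypic_r0_mean_sd(c, var_log_r0):
    """Mean and standard deviation of c * exp(A), A ~ Normal(0, var_log_r0)."""
    mean = c * exp(var_log_r0 / 2.0)
    sd = mean * sqrt(exp(var_log_r0) - 1.0)  # SD is proportional to the mean
    return mean, sd


def fraction_below_one(c, var_log_r0):
    """Fraction of individuals whose genotypic value c * exp(A) is below 1."""
    # c * exp(A) < 1 is equivalent to A < -ln(c).
    return NormalDist(0.0, sqrt(var_log_r0)).cdf(-log(c))


if __name__ == "__main__":
    var_log_r0 = 0.1  # hypothetical log-scale genetic variance
    for c in (1.0, 1.43, 2.0, 4.0):  # hypothetical contact rates
        mean, sd = genotypic_r0_mean_sd(c, var_log_r0)
        print(f"c={c:4.2f}  mean={mean:.3f}  sd={sd:.3f}  below_one={fraction_below_one(c, var_log_r0):.3f}")
```

Because the standard deviation is a fixed multiple of the mean for a given log-scale variance, lowering c lowers the genetic standard deviation in proportion.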
In summary, this section has presented a genetic model for susceptibility, infectivity and recovery, leading to expressions for the genotypic value and genetic variance in R0 (Equations 8, 9, and 11). Note, however, that we have not yet provided formal proof that the individual genotypic value for R0 indeed predicts the actual R0 (of the population, so to say). In fact, the definition of the genotypic value for R0 in Equation (8) is an educated guess based on the expression for R0 in a homogeneous population (Equation 4). In epidemiology, however, R0 is an emergent property of a population-level process of the transmission of an infection, rather than an average of individual (genotypic) values. Thus, it remains to be proven that the genotypic value defined in Equation (8) indeed predicts the R0 of a genetically heterogeneous population. In the next two sections, therefore, we will focus on the population-level consequences of genetic heterogeneity, and investigate the impact of genetic variation on the level of R0 and on the endemic prevalence.
The impact of genetic heterogeneity on R0
R0 is a key parameter for infectious diseases, because infections can persist in a population if and only if R0 is greater than one (Kermack and McKendrick 1927; Diekmann et al. 1990). In other words, an endemic equilibrium can exist only when R0 is greater than 1. Conversely, eradication of an infectious disease, either by vaccination or other measures such as genetic selection of the host population, requires that R0 is reduced to a value smaller than one. Here, we address the consequences of genetic (co)variation in susceptibility, infectivity and recovery for the value of R0, and provide a proof that R0 is indeed the simple population average of the individual genotypic values for R0, as defined in Equations (8), (9), and (11). Because our interest is in the impact of genetic heterogeneity on R0 and in the genotypic value for R0, we consider genetic (co)variation only, disregarding environmental sources of (co)variation. Note that R0 is strictly defined for the infection-free state of the population (i.e., where the infected fraction is infinitesimally small). Hence, in this section we consider the infection-free state, while the endemic equilibrium will be addressed in the next section.
Genetic (co)variation in susceptibility, infectivity and recovery has two consequences for R0. First, it increases the mean genotypic value for R0 because the expectation of a log-normal variate increases with the variance on the log-scale. This effect is trivial; it follows directly from Equations (12) and (10c) and is not the main focus of this section. Second, as stated above, R0 is the average number of individuals that get infected by a typical infected individual in an otherwise noninfected (large) population (Kermack and McKendrick 1927; Diekmann et al. 1990). The expression for R0 given in Equation (4) ignores the “typical” term in the definition of R0, and is therefore an approximation in case of heterogeneity (Diekmann et al. 1990). The focus of this section is on the consequences of heterogeneity for the properties of the typical infected individual, and thus for R0.
The properties of the “typical infected individual” will depend on the magnitude and nature of the heterogeneity among the individuals in the population, because the susceptibility and recovery determine which animals are infected, while the infectivity of those individuals may differ from the population average. In contrast to the conclusion of Springbett et al. (2003), therefore, genetic heterogeneity can affect R0 (Diekmann et al. 1990, 2012). Suppose, for example, that individuals differ in both susceptibility and infectivity, and that susceptibility is positively correlated to infectivity. Because individuals with greater susceptibility are more likely to become infected, the typical infected individual will have an above-average susceptibility. Moreover, because of the positive correlation with infectivity, this will also translate into an above-average infectivity of the typical infected individual, leading to higher R0. Hence, variation among individuals together with a positive (negative) correlation between susceptibility and infectivity results in an increase (decrease) in R0 (Diekmann et al. 1990). A similar argument holds for genetic covariation between recovery and infectivity. For this reason, R0 in general deviates from the right-hand side of Equation (4) obtained using the averages of α, γ, and φ.
Thus, R0 depends on the variance in the breeding value for the logarithm of R0, but is still equal to the mean genotypic value for R0. In other words, a positive covariance between susceptibility and life-time infectivity indeed increases R0, but this effect is fully captured by the effect of the variance in the breeding values for the logarithm of R0 on the mean genotypic value for R0 (the term in Equation 18). This result, therefore, provides formal proof that the genotypic value for R0, as defined in Equations (8)–(11), indeed represents the individual genetic value for R0.
Note that, while is equal to the simple average genotypic value for , it still differs from the product of the simple averages of the rates when susceptibility, infectivity and/or recovery are correlated; Moreover, may also differ from the simple with heterogeneity. A numerical investigation of the term in Equation (16) shows that a correlation between susceptibility and life time infectivity may change by a maximum of about 25% for realistic levels of heterogeneity and log-normally distributed genetic effects. For example, for = = , and a correlation , is 22% greater than . For values of close to 1, this 22% may be the difference between absence of an infection vs. a significant endemic prevalence. Thus, a correlation between susceptibility, infectivity and/or recovery may have a meaningful impact on .
In summary, this section has shown that heterogeneity and a positive correlation between susceptibility and life-time infectivity lead to an increase of R0, and thus increase the probability that an infectious disease persists in the population. However, when genotypic values for R0 follow a log-normal distribution, R0 is still equal to the simple average of those genotypic values.
The impact of genetic variation on the endemic prevalence
In this section, we present an expression for the endemic prevalence in a population with genetic variation in susceptibility, infectivity and recovery, and also briefly investigate the quantitative effect of such variation for the endemic prevalence. Figure 1 and Equation (3) show the relationship between R0 and the endemic prevalence for a homogeneous population. With variation among individuals, however, more susceptible individuals are more likely to be in the infected state in the endemic equilibrium. For this reason, the mean susceptibility of the remaining noninfected individuals will be lower than the population average susceptibility. This in turn translates into an endemic prevalence lower than expected based on R0 [Equation 3; Springbett et al. 2003; Diekmann et al. 2012; note, however, that the threshold value of R0 remains, so that endemic prevalence is zero if and only if R0 does not exceed one, and in that sense the R0 given in Equation (18) is exact]. Similar arguments can be used to show that prevalence depends on the variation in the recovery rate, and on the covariation of infectivity, susceptibility and recovery. Thus, Equation (3) is exact only in the absence of heterogeneity in these parameters.
The endemic prevalence in a heterogeneous population can be found by realizing that the prevalence must have reached an equilibrium value for each type of individual (Biemans et al. 2017; Aznar et al. 2018). Suppose, for example, that susceptibility, infectivity, and recovery would be governed by the same single bi-allelic locus in a diploid organism. Then, for the entire population to be in equilibrium, each of the three genotypic classes should be in equilibrium as well. In other words, the prevalence should have reached an equilibrium value within each genotypic class, but this value may differ among the three classes. Here, we adapt this approach to continuous variation in polygenic traits.
Equations (20a) and (20b) make no assumptions on the distribution of , , and , and are thus not restricted to log-normal distributions. Although Equation (20b) is similar to Equation (8), note that differs from the genotypic value for (; We use a symbol slightly different from to highlight this difference). The is a function of the mean infectivity of the infected individuals in the endemic equilibrium (), while is a function of the infectivity of the individual itself (). Our interest here is in the prevalence for an individual with susceptibility and recovery rate in the endemic equilibrium, where i is exposed to the mean infectivity of the infected individuals. For this reason, is a function of rather than . The , in contrast, defines the contribution of an individual’s genes to (Equation 18), which is relevant for response to selection. Note that depends on the multivariate distribution of γ, φ, and α.
To find the endemic prevalence, we need to solve Equations (20a) and (20b) for P. While we found an approximate analytical solution for the case without (correlated) genetic variation in infectivity, the resulting expression is very complex (not shown). We therefore used a numerical solution, which is easily obtained (see Appendix B for methods, and Supplementary Material 1 for R code). We validated the numerically obtained solution using full stochastic simulation of actual endemics, following standard methods in epidemiology. Results of these simulations confirmed the numerically obtained solutions (Appendix C).
The solutions of Equations (20a) and (20b) show that variation in susceptibility and/or recovery reduces the endemic prevalence, compared to the simple prediction based on R0 (Equation 3). Hence, with variation in susceptibility and/or recovery, prevalence is always lower than predicted by Equation (3) (Figure 3; as expected with heterogeneity; Greenhalgh et al. 2000). Note that genetic variation in infectivity has no effect on the prevalence (beyond its trivial effect on the mean of R0, Equation 18), as long as infectivity is not correlated to susceptibility and/or recovery.
Figure 3 illustrates the impact of heterogeneity on the endemic prevalence for a limited number of scenarios with genetic variation in susceptibility only. For , the effect of heterogeneity is imperceptible. For , the true prevalence is up to 2 percent point lower than the value from Equation (3). For , true prevalence is up to 6 percent point lower. This maximum difference occurs at a contact rate of two. Moreover, when c = 2 and there is no variation in infectivity, prevalence is always equal to , irrespective of the genetic variation in susceptibility and recovery. (This is not visible in Figure 3, because the x-axis shows rather than c). This occurs because the two opposing effects mentioned at the beginning of this paragraph exactly cancel each other. More detailed results can be found in Supplementary Material 2.
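The kind of numerical solution described in this section can be sketched as follows. This is not the authors' R code and does not reproduce Equations (20a) and (20b); it is a minimal illustration under stated assumptions: only susceptibility varies, its logarithm is normal with mean zero, recovery rate and infectivity equal one, and an individual with susceptibility x has equilibrium prevalence c·x·P / (c·x·P + 1). The population prevalence P is then the value at which the average of these individual prevalences equals P, found here by bisection. All function and parameter names are hypothetical.

```python
import math

def endemic_prevalence(c, sigma, n_grid=2001, width=8.0):
    """Equilibrium prevalence when only susceptibility varies (assumed model).

    Individual prevalence: P_i = c*x_i*P / (c*x_i*P + 1), ln(x_i) ~ N(0, sigma**2).
    Returns the P in [0, 1) that satisfies P = mean(P_i).
    """
    # Discretise the standard normal on a symmetric grid.
    zs = [-width + 2.0 * width * k / (n_grid - 1) for k in range(n_grid)]
    ws = [math.exp(-0.5 * z * z) for z in zs]
    total = sum(ws)
    rates = [c * math.exp(sigma * z) for z in zs]

    def mean_pi(p):
        return sum(w * r * p / (r * p + 1.0) for w, r in zip(ws, rates)) / total

    lo, hi = 1e-9, 1.0
    if mean_pi(lo) <= lo:      # infection cannot persist
        return 0.0
    for _ in range(60):        # bisection on mean_pi(p) - p
        mid = 0.5 * (lo + hi)
        if mean_pi(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(endemic_prevalence(2.0, 0.0), endemic_prevalence(2.0, 0.5))
```

Under these assumptions, a contact rate of 2 yields a prevalence of 0.5 both with and without variation in susceptibility, which is the cancellation described above. At a fixed mean R0, the heterogeneous prevalence stays below 1 - 1/R0.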
Genotypic value for individual binary infection status
In the previous two sections, we have considered the population-level effects of genetic heterogeneity. In the next two sections, we move to the individual level. This section focusses on the effects of an individual’s genes on its own infection status, while the next section focusses on the effect of an individual’s genes on the prevalence in the population.
Thus, the represents the direct genetic effect (DGE) on the own phenotypic value (including the mean, , here; Note the distinction between subscript y, indicating individual binary infection status, and γ, indicating susceptibility). The genotypic value of an individual is not a function of its breeding value for log-infectivity, since an individual’s infectivity does not affect its own infection status. Hence, Equation (23) does not condition on .
Calculation of the genotypic value for infection status from Equation (24) requires knowledge of the endemic prevalence P. In the previous section, we used a numerical approach to find P, because our interest was in the effects of heterogeneity on P. In applied breeding, however, breeders may often have a reasonable idea of realistic values for the endemic prevalence, so that a numerical solution may not be needed, or may have little added value. (The dependence of the breeding value for binary infection status on the endemic prevalence will be given in Equation 34 below.)
Validation
We used stochastic simulation of endemics, following standard methods in epidemiology (Appendix C), to validate Equation (24). Figure 4, A–C shows the mean observed infection status of individuals as a function of their genotypic value for infection status. For all three panels in Figure 4, regression coefficients were very close to 1, showing that this genotypic value is an unbiased linear predictor of individual infection status.
Individual genetic effects for the endemic prevalence
The previous section focused on the genetic effects of individuals on their own infection status. In this section, we will consider the genetic effects of individuals on the endemic prevalence in the population. In other words, the previous section focused on the contribution of genetic effects to the variation in infection status among individuals, while this section considers the genetic effects that are relevant for response to selection. We will present expressions for the genotypic value, breeding value and additive genetic variance for the endemic prevalence. The genotypic value will reflect the full genetic effect of an individual on the endemic prevalence in the population, while the breeding value reflects the additive component thereof. The last part of this section contains a comparison of the breeding value for endemic prevalence and that for individual infection status.
Equation (30b) is approximate, because the relationship between P and R0 given in Equation (3) is approximate with heterogeneity. Equations (30a) and (30b) show how the genotypic variance for endemic prevalence depends on the level of R0 or, equivalently, on the level of the endemic prevalence. Hence, in contrast to ordinary additive genetic traits, the genetic variance for endemic prevalence is a function of the level of the endemic prevalence (Equation 30b).
Figures 5A and B illustrate that the standard deviation in genetic values for endemic prevalence is considerably larger at lower R0, or equivalently, at lower prevalence. Hence, even though the genetic variance in R0 decreases with the level of R0 (Figure 2), the genetic variance for prevalence increases strongly when R0 decreases. This result originates from the increasing slope of the relationship between prevalence and R0 when R0 decreases (Figure 1). In other words, an equal change in R0 has much greater impact on the endemic prevalence at low R0 than at high R0, which is well-known in epidemiology (e.g., Metz 1978; Bolker and Grenfell 1996). Hence, for a constant variance in the breeding value for the logarithm of R0, the genetic variance for endemic prevalence is much greater at lower prevalence. Moreover, genetic selection for lower prevalence will lead to an increase in the genetic variance for prevalence.
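The slope argument can be illustrated numerically. The sketch below assumes that Equation (3) is the standard relationship for a homogeneous SIS model, P = 1 - 1/R0 for R0 above one, which is consistent with the pairs of R0 and prevalence quoted in the text. The function names and the step of 0.1 in R0 are illustrative.

```python
def sis_prevalence(r0):
    """Endemic prevalence of a homogeneous SIS model (assumed form of Equation 3)."""
    return max(0.0, 1.0 - 1.0 / r0)

def prevalence_drop(r0, delta=0.1):
    """Reduction in prevalence when R0 is lowered by delta."""
    return sis_prevalence(r0) - sis_prevalence(r0 - delta)

# The same reduction in R0 has a larger effect when R0 is close to one.
for r0 in (3.0, 2.0, 1.2):
    print(f"R0 = {r0}: drop in prevalence = {prevalence_drop(r0):.3f}")
```

Under this assumption the slope of prevalence in R0 is 1/R0², so a fixed change in R0 moves prevalence several times more at an R0 of 1.2 than at an R0 of 3.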
Figure 6 shows some examples of the distribution of the genotypic value for endemic prevalence, for different values of R0 and the corresponding endemic prevalence. For the scenarios in Figure 6, the observed-scale heritability of individual infection status does not exceed 0.022 (see Figure 7 below). The panels illustrate that the genotypic standard deviation for endemic prevalence is relatively large, particularly when prevalence is small. For example, for R0 = 1.67 (P ≈ 0.4; Panel B), the standard deviation in genotypic values for prevalence is around 0.19 (see also Figure 5), and values between ∼0 and ∼0.7 are quite probable. Hence, despite the low observed-scale heritability of individual infection status, the probable genotypic values for prevalence span as much as 70% of the full 0-1 range of endemic prevalence.
Breeding value and additive genetic variance for prevalence
Equation (32c) shows that the additive genetic variance in endemic prevalence increases strongly when prevalence decreases, similar to the relationship between the genotypic variance and endemic prevalence (Figure 5). This result suggests that response of endemic prevalence to selection will be greater at lower levels of the prevalence, which we will further investigate below in the section on response to selection.
The relative amount of nonadditive genetic variance in the endemic prevalence is determined by the magnitude of the genetic variance in the logarithm of R0 (Appendix D). For realistic values of this variance, the vast majority of the genotypic variance in prevalence is additive. For example, for a genetic variance of 0.5² in the logarithm of R0, 88% of the variance in the genotypic value for prevalence is additive. Hence, the distinction between the breeding value for prevalence and the genotypic value for prevalence seems of minor importance, and results in Figures 5 and 6 will closely resemble those for the additive genetic effects.
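The additive share can be approximated with a short calculation. This sketch is not the derivation of Appendix D; it assumes that the genotypic value for prevalence is a linear function of the exponential of a normally distributed breeding value on the log scale. Under that assumption, the share of variance captured by linear regression on the breeding value is the log-scale variance divided by (exp(variance) - 1). The function name is hypothetical.

```python
import math

def additive_fraction(var_log):
    """Share of the variance of exp(A) explained by linear regression on A,
    for A ~ N(0, var_log) with var_log > 0 (assumed model)."""
    return var_log / math.expm1(var_log)

# Log-scale genetic standard deviations of 0.1, 0.3 and 0.5 (illustrative).
for sd in (0.1, 0.3, 0.5):
    print(f"sd = {sd}: additive fraction = {additive_fraction(sd ** 2):.2f}")
```

For a log-scale variance of 0.5² this gives about 0.88, in line with the percentage quoted above, and the share approaches one as the variance becomes small.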
Breeding value and heritability for infection status vs prevalence
Note that, in contrast to genotypic values, breeding values are expressed here as deviations from their mean. The breeding value for individual infection status is the ordinary observed-scale breeding value for binary infection status that breeders are familiar with.
Equation (33) implies that the impact of an individual’s genes on the response of the endemic prevalence to selection is considerably larger than their impact on the infection status of the individual itself, particularly when the endemic prevalence is small. Consider, for example, an individual with a breeding value for infection status of -0.02 in a population with an endemic prevalence of P = 20%. The expected infection status of this individual in the current population equals 0.20 - 0.02 = 0.18. Hence, on average, this individual will be infected 18% of the time. However, its breeding value for prevalence equals -0.02/0.20 = -0.10. Hence, if we select individuals with this breeding value as parents of the next generation, then the endemic prevalence will go down to 0.20 - 0.10 = 0.10. In other words, the response to selection will be fivefold greater than suggested by the ordinary breeding value for individual infection status (since 1/P = 1/0.2 = 5). We will numerically validate this theoretical result in the section on response to selection below.
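The arithmetic of this example can be written out as a small sketch. It assumes, as the text indicates, that the breeding value for prevalence is the breeding value for individual infection status scaled by 1/P. The function and variable names are illustrative.

```python
def breeding_value_prevalence(bv_status, prevalence):
    """Scale an observed-scale breeding value for infection status by 1/P."""
    return bv_status / prevalence

P = 0.20                      # current endemic prevalence
expected_status = 0.18        # expected infection status of the individual
bv_status = expected_status - P            # deviation from the population mean
bv_prev = breeding_value_prevalence(bv_status, P)
new_prevalence = P + bv_prev  # prevalence if such individuals become parents

print(round(bv_status, 2), round(bv_prev, 2), round(new_prevalence, 2))
```

The individual-level effect of -0.02 thus corresponds to a population-level effect of -0.10, a fivefold difference at a prevalence of 20%.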
We used stochastic simulation to validate this expression and investigate its precision. Results show that Equation (34) closely matches the regression of individual binary infection status on the breeding value for the logarithm of for realistic levels of heterogeneity (; Supplementary Material 3; the good fit results from compensating errors due to the approximations). Hence, Equation (34) is sufficiently precise for practical purposes. Note that, since infectivity does not affect the infection status of an individual itself, a potential component due to infectivity has to be left out of the term when calculating Equation (34). In other words, in Equation (34) the should include only the breeding values for the logarithm of susceptibility and recovery (see Equation 10a).
Hence, the observed-scale heritability for binary infection status has a maximum at a prevalence of 0.5, and goes to zero at a prevalence of zero or one, just like the heritability of binary phenotypes for noncommunicable polygenic traits (Robertson 1950; Figure 7A; assuming the infinitesimal model at the level of the logarithm of R0, so that the genetic variance on that scale is constant).
Figures 7A and B show a comparison of the heritability of individual infection status and the heritability of endemic prevalence for a population without genetic variation in infectivity, with genetic variances in the logarithm of R0 ranging from 0.1² through 0.5². In Figure 7A, the maximum heritability of infection status equals 0.0625, for P = 0.5 and a genetic variance of 0.5². Given that genetic variances greater than 0.5² are very large (as argued above), observed-scale heritabilities of binary infection status greater than ∼0.06 are unlikely for endemic infectious diseases. The heritabilities in Figure 7A agree with the findings of Hulst et al. (2021), who used stochastic simulation of actual endemics and analysis of the resulting binary infection status data with a linear animal model. Figure 7B shows that the heritability of endemic prevalence increases strongly when prevalence goes down. Figure 7 illustrates that the two heritabilities differ by a factor of approximately P², so that the additive genetic variance in prevalence is (much) greater than the additive genetic variance in individual infection status, and may even exceed the phenotypic variance at low values of the endemic prevalence (i.e., a heritability greater than 1).
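The pattern in Figure 7 can be approximated with two short functions. Both expressions are assumptions made for illustration rather than the equations of the paper: the heritability of infection status is taken as P(1 - P) times the genetic variance in the logarithm of R0, which reproduces the maximum of 0.0625 quoted above, and the heritability of prevalence is taken as that value divided by P².

```python
def h2_status(p, var_log):
    """Observed-scale heritability of binary infection status (assumed form)."""
    return p * (1.0 - p) * var_log

def h2_prevalence(p, var_log):
    """Heritability of endemic prevalence, assuming a 1/P**2 scaling."""
    return h2_status(p, var_log) / p ** 2

var_log = 0.5 ** 2   # illustrative genetic variance on the log scale
for p in (0.5, 0.2, 0.05):
    print(f"P = {p}: h2 status = {h2_status(p, var_log):.4f}, "
          f"h2 prevalence = {h2_prevalence(p, var_log):.2f}")
```

Under these assumptions the heritability of infection status shrinks toward zero at low prevalence, while the ratio for prevalence grows and exceeds one once prevalence is small enough.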
In conclusion, in this section, we have presented expressions for the breeding value for prevalence (Equation 31) and for individual infection status (Equation 34), and for the corresponding genetic variances. With realistic levels of heterogeneity, the breeding value for prevalence is a factor 1/P greater than the breeding value for individual infection status. This result suggests that response to selection should be considerably greater than expected based on ordinary heritability of individual infection status. We will test this hypothesis in the next section.
Response to selection
The higher genetic variance for prevalence at lower values of the prevalence (Equations 30 and 32, Figures 5B and 6) suggests that the response of the endemic prevalence to selection should increase when the prevalence decreases. To validate and illustrate this hypothesis, we stochastically simulated an endemic infectious disease in a large population undergoing mass selection for individual infection status. Hence, the individuals with the lowest observed average infection status were selected as parents of the next generation. Simulations were based on standard methods in epidemiology, not making use of the above theory (Appendix F).
Figure 8A shows the observed prevalence (i.e., the mean binary infection status in each generation), the mean breeding value for prevalence and the mean breeding value for binary infection status, for ∼70 generations of selection. Response in prevalence increases strongly when prevalence decreases, and the infection disappears in the final generation. There is excellent agreement between the observed prevalence and the breeding value for prevalence, showing that the change in the breeding value for prevalence indeed predicts the change in prevalence. In contrast, the response in prevalence deviates substantially from the response in the breeding value for individual infection status, particularly at lower values of the prevalence. Hence, while the breeding value for infection status correctly predicts the average individual infection status within a generation (Figure 4), the change in this breeding value considerably underestimates the response to selection. Furthermore, given the weak selection and the low value of the observed-scale heritability of binary infection status, which did not exceed 0.022 in Figure 8A, response to selection in prevalence is remarkably large, unless prevalence is high. This result agrees with findings of Hulst et al. (2021).
Figure 8B shows a comparison of observed and predicted prevalence. Above a prevalence of ∼0.5, response predicted from Equation (38b) is somewhat larger than observed response, while the reverse is true below a prevalence of ∼0.5 (Note, response to selection in a generation is reflected by the slope of the figure). Nevertheless, agreement between observed and predicted response is remarkably good given the very unrealistic assumption of linearity in Equation (38b) (i.e., bivariate normality of and ). Because selection was based on mean individual infection status recorded over a period lasting on average only 1.25 events per individual (see legend Figure 8), many values were either 0 or 1, implying strong deviations from normality.
For a prevalence smaller than ∼0.5, predictions from Equation (39) were very close to the observed response in prevalence [Figure 8B; for P > 0.5, results of Equations (38b) and (39) are almost identical].
In conclusion, results in this section show that response to selection in the prevalence of endemic infectious diseases is a factor 1/P greater than suggested by the ordinary breeding values for individual binary infection status. Thus, breeders can predict response to selection by upscaling the selection differential in the usual estimated breeding values for binary infection status by a factor 1/P.
Direct and indirect genetic variance for endemic prevalence
Figure 9 shows the total additive genetic variance for the endemic prevalence and the fractions due to DGE, IGE and their covariance, for a scenario with equal genetic variances in susceptibility, infectivity and recovery and covariances equal to zero. For an endemic prevalence smaller than 0.5, IGE contribute the majority of the genetic variance. For example, for an endemic prevalence of 0.3, the total additive genetic variance consists of 6% direct genetic variance, 66% indirect genetic variance and 28% direct-indirect genetic covariance. These results imply that IGE dominate the heritable variation and response to selection for the endemic prevalence of infectious diseases, unless prevalence is high.
Discussion
We have presented a quantitative genetic theory for endemic infectious diseases, with a focus on the genetic factors that determine the endemic prevalence. We defined an additive model for the logarithm of individual susceptibility, infectivity and rate of recovery, which results in normally distributed breeding values for the logarithm of R0. Next, we investigated the impact of genetic heterogeneity on the population level, for both R0 and the endemic prevalence. Results show that, despite heterogeneity, R0 remains equal to the mean individual genotypic value for R0. Subsequently, we considered genetic effects of individuals on their own infection status and on the endemic prevalence in the population. Building on the breeding value for the logarithm of R0, we showed that genotypic values and genetic parameters for the prevalence follow from the known properties of the log-normal distribution. In the absence of genetic variation in infectivity, genetic effects for the endemic prevalence are a factor 1/prevalence greater than the ordinary genetic values for individual binary infection status. Hence, even though prevalence is the simple average of individual binary infection status, breeding values for prevalence show much more variation than those for individual infection status. These results imply that the genetic variance that determines the potential response of the endemic prevalence to selection is largely due to IGE, and thus hidden to classical genetic analysis and selection. For susceptibility and recovery, a fraction 1-P of the full genetic effect on endemic prevalence is due to IGE, whereas the effect of infectivity is entirely due to IGE. Hence, the genetic variance that determines the potential response of the endemic prevalence to selection must be much greater than expected based on classical quantitative genetic theory, particularly at low levels of the prevalence (Figure 7).
We evaluated this implication using stochastic simulation of endemics following standard methods in epidemiology, where parents of the next generation were selected based on their own infection status (mass selection). The results of these simulations show that response to selection in the observed prevalence and in the breeding value for prevalence increases strongly when prevalence decreases, and closely matches our predictions, which supports the theoretical findings presented here.
Model assumptions
Following Anacleto et al. (2015, 2019), Biemans et al. (2019), and Pooley et al. (2020), we assumed a linear additive model with normally distributed effects for the logarithm of susceptibility, infectivity and recovery, leading to a normal distribution of the additive genetic values for the logarithm of R0 (Equation 10). For complex traits, it is common to assume normally distributed genetic effects, based on the central limit theorem (Fisher 1919). Because susceptibility, infectivity and recovery act multiplicatively in the expression for R0, and because R0 is nonnegative, we specified a normal distribution for its logarithm. This resulted in an additive model on the log scale, which agrees with the infinitesimal model, and also translates the domain of R0 to the domain of the normal distribution. Hence, we assumed constant genetic parameters for the logarithm of R0. The same approach has been used to model genetic variation in the residual variance, which is also restricted to nonnegative values (SanCristobal-Gaudy et al. 1998; Hill and Mulder 2010). The log-normal distribution of genotypic values for R0 results in a decrease of the genetic standard deviation in R0 with decreasing R0 (Figure 2), which seems reasonable given the presence of a lower bound for R0. Moreover, the log-normal distribution for R0 is convenient, because it results in simple expressions for the breeding value and the genetic variance for prevalence (Equations 31 and 32).
The assumption of a normal distribution for the logarithm of genotypic values for R0 also agrees with the standard implementation of generalized linear (mixed) models (GLMM; Nelder and Wedderburn 1972). R0 refers to an expected number of infected individuals. In other words, R0 is the expected value of count data. In GLMM, the default link function for count data is the log-link (McCullagh and Nelder 2019). Hence, our linear model for the logarithm of R0 also agrees with common statistical practice.
The strong increase of the genetic variance in prevalence with decreasing R0 (Figure 5A) is not due to the assumption of lognormality of R0. On the contrary, the log-normal distribution results in a decrease of the genetic standard deviation in R0 with decreasing R0 (Figure 2). The strong increase in the genetic variance in prevalence with decreasing R0 results from the relationship between R0 and the prevalence in the endemic steady state (Figure 1; Equation 3), which becomes steeper when R0 is closer to one. This relationship has been very well established in epidemiology since Weiss and Dishon (1971; e.g., Keeling and Rohani 2011).
Hence, the represents the additive component of the genotypic value for . However, because our model is additive on the log-scale, while the genotypic value for includes nonadditive genetic effects, we decided to build our theory on the breeding value for the logarithm of .
Hence, to move the mean breeding value on the log scale into the contact rate, we have to multiply the original contact rate by . While both parameterizations are obviously equivalent, the second results in simpler expressions and has been used throughout this manuscript. Thus, c rather than must be used when applying our results. This is essential, because breeding values and genetic variances for the endemic prevalence depend on the contact rate (e.g., Equations 31 and 32).
In our results for response to selection (Figure 8), we have assumed that the population has reached the endemic steady state at any time. In other words, we assumed that, after a selection, the population has reached the new endemic prevalence before the next selection takes place. Whether this assumption holds true will depend on the rate of convergence of the feedback process in the disease transmission (discussed below and illustrated in Figure 10) versus the rate of genetic improvement. Hulst et al. (2021) show examples of convergence to the new equilibrium. If the genetic improvement goes gradually, for example when replacing part of the animals each year like in dairy cattle, and when the pathogen survives only briefly in the environment, then the prevalence of the local population will track the gradual genetic changes in the population and the improvements predicted will be observed immediately. On the other hand, if the genetic changes are large and abrupt, like when restocking broilers or fattening pigs with a new genetic stock, and when the new stock is exposed to the infectious material from the previous stock, either because the pathogen survives in the environment or from neighboring stables or pens, then it may take some time before the full effect of genetic improvement materializes.
Nevertheless, the full effect of genetic improvement will materialize over time, even when the next selection takes place before the population has reached the new endemic prevalence due to the previous selection. Thus, incomplete convergence to the new equilibrium before the next selection takes place is a transient phenomenon. It does not affect the ultimate genetic improvement, because the ultimate endemic prevalence is determined by the genetic value for R0, not by the previous prevalence. Incomplete convergence to the new equilibrium may actually lead to a slightly greater response, because the accuracy of selection will be somewhat higher at higher prevalence, leading to a larger genetic change. In other words, the in Equation (38b) will be higher, due to higher heritability (this follows from Figure 7A, where heritability is higher at higher prevalence, as long as P < 0.5).
Other compartmental models
In this study, we focused on endemic infectious diseases following an SIS model, where individuals can be either susceptible (S, i.e., noninfected) or infected (I). Hence, we assumed the infection does not confer any long-lasting immunity, and we ignored the potential existence of classes (“compartments”) other than S and I, such as recovered individuals that are not yet susceptible again. Moreover, we ignored the influx of new individuals into the population due to births, and the removal of individuals due to deaths.
A key condition for the validity of our results is that the pathogen can replicate only in the host individual, meaning that a reduction in infected individuals fully translates into reduced exposure of the host population to the pathogen. The mere survival of the pathogen in the environment does not violate our assumptions (see Hulst et al. 2021 for a discussion). If this condition is met, our conclusions are not limited to SIS models, but apply to all models without long-lasting immunity. For models with temporary immunity (e.g., SIRS) or lifelong immunity (e.g., SIR), the conclusions with respect to infectivity and susceptibility will hold, but the genetic variation in recovery may play a different, more restricted role.
Also, infections that do confer long-lasting immunity may show endemic behavior when a population is large enough. Measles in the human population before the introduction of vaccination is a well-known example. For such infections, the same mechanisms as discussed above will play a role and the endemic prevalence for a homogeneous population still follows from Equation (3). However, the introduction of new susceptibles by birth can no longer be ignored, and recovery of infected individuals does not result in new susceptible individuals. Thus, the role of recovery will change, and the genetic make-up of the newborn individuals becomes relevant, particularly in populations undergoing selection.
Positive feedback
The increasing difference between the breeding value for prevalence and the breeding value for individual infection status at lower prevalence (Equation 33) is a result of the increasing slope of the relationship between R0 and the endemic prevalence (Equation 3, Figure 1). Equation (3) follows directly from a simple equilibrium condition (see text above Equation 3). However, the focus on the equilibrium partly obscures the underlying mechanism.
For a disease caused by exposure, the effect of genetic selection depends on future exposure. For an infectious disease, future exposure depends on the future number of infected individuals in the local population and on their (lifetime) infectivity, both of which are affected by the genetic selection. Thus, for infectious diseases future exposure depends on selection, leading to feedback effects. Figure 10, A and B, illustrates that the difference between the breeding value for prevalence and the breeding value for individual infection status originates from positive feedback effects in the transmission dynamics. (Figure 10 shows results for selection against susceptibility; selection for faster recovery would yield identical results.) With lower susceptibility, fewer individuals become infected, which subsequently translates into a reduced transmission rate, followed by a further reduction in the number of infected individuals, and so on, resulting in a positive feedback loop (Figure 10A). The initial change in prevalence before feedback effects manifest is equal to the selection differential in breeding value for individual infection status (horizontal lines in Figure 9). This change represents the direct response due to reduced susceptibility, and does not include any change in exposure of susceptible individuals to infected herd mates. Next, prevalence decreases further because the initial decrease in prevalence reduces the exposure of susceptible individuals to infected herd mates. This additional decrease represents the indirect response to selection via the “social” environment. Without genetic variation in infectivity, the direct response makes up a fraction P of the total response in prevalence, and the indirect response a fraction 1 − P.
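The split into a direct and an indirect response can be illustrated numerically. The sketch below is not the method behind Figure 10. It assumes a homogeneous SIS population in which a host with reproduction number R0, exposed to a fraction P of infected herd mates, is infected a fraction R0·P/(1 + R0·P) of the time, and it treats each "cycle" of the feedback loop as one application of that map. The function names and the chosen values of R0 are hypothetical. The fixed point of the map is P = 1 − 1/R0, the standard SIS equilibrium.

```python
def next_prevalence(r0, exposure):
    """Expected fraction of time infected, given the fraction of infected
    herd mates (`exposure`). Assumed homogeneous SIS form."""
    x = r0 * exposure
    return x / (1.0 + x)


def feedback_path(r0_new, p_start, cycles=200):
    """Iterate the map, starting from the prevalence before selection."""
    path = [p_start]
    for _ in range(cycles):
        path.append(next_prevalence(r0_new, path[-1]))
    return path


r0_old, r0_new = 2.0, 1.9          # hypothetical small genetic change in R0
p_old = 1.0 - 1.0 / r0_old         # equilibrium prevalence before selection
path = feedback_path(r0_new, p_old)

direct = p_old - path[1]           # first step: exposure still at the old level
total = p_old - path[-1]           # after feedback has converged
print(f"direct response   : {direct:.4f}")
print(f"total response    : {total:.4f}")
print(f"direct / total    : {direct / total:.3f}  (prevalence was {p_old:.2f})")
```

Under these assumptions the first step accounts for roughly a fraction P of the total reduction, and the remainder accrues over subsequent cycles through reduced exposure.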
The feedback mechanisms outlined here will also play a role in macroparasitic infections. For example, in macroparasitic infections too, infectivity will have a nonlinear effect, susceptibility does not have to be zero to eradicate an infection, and prevalence will decrease more than linearly with a genetic decrease in infectivity. However, we did not investigate how this works out precisely, for example for gastrointestinal parasites.
Herd immunity
In Figure 8A, the infection ultimately goes extinct due to mass selection for individual infection status. This happens due to a phenomenon known as herd immunity (Fine 1993). In the final generation, the infection disappears because R0 falls below a value of one, not because all the individuals have become fully resistant to infection. This result is similar to the eradication of an infection by means of vaccination, which also does not require full immunity of all individuals and can also be achieved when only part of a population is vaccinated (Anderson and May 1985). As can be seen in Figure 10 and in simulation results of Hulst et al. (2021), herd immunity develops over cycles of the transmission-recovery loop. Thus, the full benefits of genetic selection or vaccination do not manifest immediately, as it takes some time for a population to converge to the new endemic steady state.
The relevance of herd immunity for response to genetic selection can be illustrated using the data underlying Figure 8A. For the population starting at a prevalence of 0.5, the contact rate is equal to two (c = 2), and the mean breeding value for log-susceptibility is equal to zero in the initial generation, so that R0 = 2. In the final generation, the mean breeding value for log-susceptibility has dropped to −0.73, so that R0 = 0.96. Hence, R0 < 1, explaining extinction. However, if the average individual of the final generation had been exposed to the infection pressure of the first generation, then the expected prevalence for this individual would have been 0.32 (from Equation 20a, with the final-generation mean breeding value and P = 0.5). Hence, the individual would have been infected 32% of the time. Nevertheless, in a population consisting entirely of this type of individual, as is the case in the final generation, the infection will no longer be present in the long term. This example illustrates the relevance of reduced exposure due to indirect effects for herd immunity and for response to selection of infectious diseases.
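The arithmetic of this example can be retraced in a few lines. The sketch below is an illustration under two assumptions that are not shown in this text. First, R0 for the average individual is taken as the contact rate times the exponent of the mean breeding value for log-susceptibility. Second, the individual expected prevalence is taken as γcP/(1 + γcP), where γ is the individual's susceptibility and P the prevalence among herd mates, which is assumed to correspond to Equation 20a. Variable names are hypothetical.

```python
import math

c = 2.0                  # contact rate (from the text)
p_first = 0.5            # prevalence in the first generation (from the text)
mean_bv_final = -0.73    # mean breeding value for log-susceptibility, final generation

gamma_final = math.exp(mean_bv_final)   # susceptibility of the average final-generation individual
r0_first = c * math.exp(0.0)            # mean breeding value of zero in the first generation
r0_final = c * gamma_final

# Final-generation individual exposed to the first-generation infection pressure
# (assumed form of the individual equilibrium).
x = gamma_final * c * p_first
p_exposed = x / (1.0 + x)

# Prevalence when the whole population consists of such individuals
# (standard SIS equilibrium, floored at zero).
p_own = max(0.0, 1.0 - 1.0 / r0_final)

print(f"R0 first generation : {r0_first:.2f}")
print(f"R0 final generation : {r0_final:.2f}")
print(f"prevalence under first-generation exposure : {p_exposed:.3f}")
print(f"prevalence among its own kind              : {p_own:.3f}")
```

With the rounded breeding value of −0.73, the individual prevalence under first-generation exposure comes out between 0.32 and 0.33, in line with the value quoted above, while the equilibrium prevalence in a population of such individuals is zero.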
Relationship to previous work
Bishop and co-workers have pioneered the integration of quantitative genetics and epidemiology for livestock populations (see references in the Introduction). Some of their work considers both the prevalence of an infection and the negative effect of the infection on performance traits (resilience) in an integrated approach, mostly using stochastic simulation. In this study, in contrast, we focus exclusively on prevalence, since our primary purpose was to develop a quantitative genetic theory for the endemic prevalence of an infection. In particular, we aimed to find expressions for the breeding value and the additive genetic variance in the endemic prevalence. Our results show that these are fundamentally different from quantitative genetic expressions for noncommunicable traits, exhibiting a very large component due to IGE. The effect of an infectious disease on performance traits, in contrast, can be modelled using classical quantitative genetic approaches, such as reaction norm models where trait values are regressed on pathogen load. Hence, resilience may not exhibit indirect genetic variation, i.e., when it is independent of susceptibility, recovery or infectivity, and there is no need to include resilience in theoretical models for prevalence.
MacKenzie and Bishop (2001) and Tsairidou et al. (2019) investigated the prediction of response to selection in the prevalence of infectious diseases, considering both quantitative genetics and epidemiology. MacKenzie and Bishop (2001) directly modeled a constant rate of genetic improvement for the transmission parameter β, treated as a genetic property of the susceptible (i.e., recipient) individual only, and used a stochastic epidemic model to study the impact of genetic improvement in β on R0 and on the probability of a major epidemic. Tsairidou et al. (2019) used a similar approach, but considered both infectivity and susceptibility. They directly modelled response in susceptibility and infectivity, assuming a fixed accuracy of selection for these two traits, and also used a stochastic epidemic model to study the impact of genetic improvement in susceptibility and infectivity on R0 and on the severity of the epidemic. Hence, these two studies combine a classical quantitative genetic approach for response to selection for the parameters of an epidemiological model with stochastic simulation of epidemics. In this study, in contrast, we extend quantitative genetic theory to include R0 and the endemic prevalence, aiming to understand the genetic variation and potential response to selection in these population-level parameters. Hence, we aim to bring epidemiology into the quantitative genetic domain, rather than to combine classical quantitative genetics models for epidemiological parameters with simulation of epidemics.
The breeding value and additive genetic variance for the logarithm of R0 are central to this work. Anche et al. (2014) and Biemans et al. (2019) presented a breeding value and additive genetic variance for R0, rather than its logarithm. Anche et al. (2014) considered a two-locus model with additive effects for susceptibility and infectivity. They derived a breeding value for R0 using partial derivatives of R0 with respect to the allele frequencies at each of the two loci. While their model is additive for susceptibility and infectivity, it contains some epistasis for R0 because R0 depends on the product of these two parameters. For locus-based models with a few loci of fixed effect, it is probably not very relevant on which scale the model is additive (if any). For polygenic models, in contrast, an additive model on the scale of susceptibility and infectivity, or on the scale of R0, may result in negative values for R0 and in an unrealistically large additive genetic variance in R0 with recurrent genetic selection for lower prevalence. Hence, for polygenic traits, an additive model on the log scale is more appropriate, as argued above.
Biemans et al. (2019) presented an expression for the additive genetic variance for R0 treated as a polygenic trait, with the aim to quantify the amount of heritable variation in R0 in a data analysis. Their expression extends the concept of Anche et al. (2014) to the polygenic case, but can also be interpreted as the variance of a first-order Taylor-series linearization of R0, assuming independence of susceptibility and infectivity (combining Equations 4 and 5 of the current manuscript). The expression of Biemans et al. (2019) is suitable when the objective is to find a point estimate for the additive genetic variance in R0 in a population, and when the additive genetic variances in susceptibility and infectivity are not too large and susceptibility and infectivity are independent. For a quantitative genetic theory of R0, however, an approach based on the breeding value for the logarithm of R0 is superior, as argued above.
Utilization of hidden genetic variation for genetic improvement
In this study, we have shown that a fraction 1 − P of the full individual genetic effect on the endemic prevalence represents an IGE, because only a fraction P of the full effect surfaces in the infection status of the individual itself (excluding genetic variation in infectivity; Equation 33 and Appendix E). In other words, a fraction 1 − P of the individual genetic effects of susceptibility and recovery on the prevalence is hidden to direct selection and classical genetic analysis. Nevertheless, results in Figure 8 show that prevalence responds rapidly to selection, particularly when prevalence is small. Hence, prevalence responds faster to selection when a greater proportion of its heritable variation is hidden, and when heritability is low (Figure 7A), which seems a paradox.
The response due to the IGE of susceptibility and recovery arises naturally when selecting for lower individual infection status (i.e., for the direct effect); it does not require any specific measures of the breeder. Thus, on the one hand, our results imply that response to genetic selection against infectious diseases should be considerably greater than currently believed, even when no changes are made to the selection strategy. While empirical studies are scarce, the available results support this expectation (discussed in Hulst et al. 2021).
On the other hand, however, classical selection for direct effects is not the optimal way to reduce prevalence, for the following two reasons. First, classical selection does not target genetic effects on infectivity, because an individual’s infectivity does not affect its own infection status (Lipschutz-Powell et al. 2012). Hence, infectivity changes merely due to a potential genetic correlation with susceptibility and/or recovery. When this correlation is unfavorable, infectivity will increase and response in prevalence will be smaller than expected based on the genetic selection differentials for susceptibility and recovery (and thus smaller than the result of Equation 38b). In theory, this could even lead to a negative net response (Griffing 1967). This is similar to the case with social behavior-related IGEs on survival in laying hens and Japanese quail, where selection for individual survival has sometimes increased mortality (Craig and Muir 1996; Muir 2005). This scenario seems unlikely for infectious diseases, but at present we lack knowledge of the multivariate genetic parameters of susceptibility, infectivity and recovery to make well-founded statements.
Second, even in the absence of genetic variation in infectivity, individual selection for susceptibility and recovery is nonoptimal because the accuracy of selection is limited due to limited heritability, particularly at low prevalence (Figure 7A). The response to selection in traits affected by IGE can be increased by using kin selection and/or group selection (Griffing 1976; Muir 1996; Bijma 2011), and by including IGE in the genetic analysis (Muir 2005; Bijma et al. 2007b; Muir et al. 2013; Anacleto et al. 2015; Biemans et al. 2019; Pooley et al. 2020). Kin selection occurs when transmission takes place between related individuals, for example within groups of relatives (Anche et al. 2014). Group selection refers to the selection of parents for the next generation based on the prevalence in the group in which transmission takes place, rather than on individual infection status (Griffing 1976). Both theoretical and empirical work shows that kin and group selection lead to utilization of the full genetic variation, including both DGE and IGE (Griffing 1976; Muir 1996, 2005; Bijma and Wade 2008; Bijma 2010, 2011). For infectious diseases, the work of Anche et al. (2014) illustrates the effect of kin selection, where favorable alleles for susceptibility increase much faster in frequency when disease transmission is between related individuals. Simulation studies on IGE in pig populations suggest that the benefits of kin selection also apply to breeding schemes based on genomic prediction (Chu et al. 2021).
Do pathogens create kin selection?
Exposure to infectious pathogens is a major driver of the evolution of host populations by natural selection, both in animals and plants (reviewed in Karlsson et al. 2014 and Ebert and Fields 2020). In the human species, for example, a study of genetic variation in 50 worldwide populations reveals that exposure to infectious pathogens is the primary driver of local adaptation and the strongest selective force that shapes the human genome (Barreiro and Quintana-Murci 2010; Fumagalli et al. 2011). The key role of infectious pathogens in natural selection, together with the large contribution of IGE to the genetic variation in prevalence in the host population, indicates that IGE must have been an important fitness component in the evolutionary history of populations. This, in turn, suggests that associating with kin may have evolved as an adaptive behavior. In other words, the key role of infectious diseases in natural selection might lead to social structures where individuals associate preferably with kin, because such behavior has indirect fitness benefits. This is because interactions among kin lead to utilization of the full heritable variation in fitness, including both DGE and IGE (Bijma 2010), and thus considerably accelerate response of fitness to selection. At low to moderate levels of the endemic prevalence the indirect genetic variance in prevalence might be sufficiently large for such behavior to evolve, even in the absence of direct fitness benefits such as preferential behavior toward kin. While this is a complex issue requiring careful quantitative modelling, including migration and the emergence of selfish mutants, the key role of pathogens in natural selection together with the large IGE demonstrated here strongly suggest the importance of kin selection in the history of life.
In agriculture, the implementation of kin selection may be feasible when animals can be kept in kin groups or plants can be grown in plots of a single genotype or a family in the breeding population. In many cases, however, this will not be feasible, and other methods will be required to optimally capture the IGE underlying the prevalence of infectious diseases. In particular, we need more and better phenotypic data on disease traits (Bishop and Woolliams 2014). Current developments in sensing technology and artificial intelligence enable the development of tools for large-scale automated collection of longitudinal data on individual infection status, and also on the contact structure between individuals (relevant mainly in animals). These advances, together with genomic prediction and recently developed statistical methods for the estimation of the direct and indirect genetic effects underlying the transmission of the infection (Biemans et al. 2019; Pooley et al. 2020), could represent a much-needed breakthrough in the artificial selection against infectious diseases in agriculture. Our results on genetic variation and response to selection suggest that such selection is considerably more promising than currently believed.
Data availability
R code to numerically find the endemic equilibrium prevalence is provided in the file Supplementary Material 1 - Numerical Solution Prevalence Heterogeneity.
Supplementary material is available at GENETICS online.
Acknowledgments
The authors thank Jack C. M. Dekkers for helpful comments on the manuscript.
Funding
Funding for this study was received from the host institutions of the authors.
Conflicts of interest
The authors declare that there is no conflict of interest.
Literature cited
EFSA Panel on Animal Health and Welfare (AHAW).
Appendices
Appendix A
R0 with heterogeneity and log-normally distributed susceptibility, infectivity and recovery
So there is no interaction between i and j. (This property is known as separable mixing in the epidemiological literature; Diekmann et al. 1990, 2013). Moreover, we assume that susceptibility, infectivity, and recovery follow a log-normal distribution (Equations 6 and 7). We also assume that the population is not very small, so that in the early phase of an endemic where only few individuals are infected, the composition of the remaining susceptible individuals is not affected.
Hence, we now have the elements of , but still need to solve the integral expression.
The right-hand side of this expression is identical to the mean genotypic value for R0 (Equation 12).
Appendix B
Numerical solution to find the endemic prevalence with heterogeneity
Appendix C
Methods for simulation of epidemics and validation of prevalence and genotypic value for individual disease status
We simulated endemics according to standard epidemiological theory to validate the numerical solution of the endemic prevalence (Equations 20a and 20b) and the genotypic values for binary disease status (Equation 24). We considered two compartments of individuals, susceptible individuals (S) and infected (I) individuals, and a so-called stochastic SIS-model where susceptible individuals can become infected, and infected individuals can recover and then immediately become susceptible again (Weiss and Dishon 1971). For simplicity, we simulated genetic variation in susceptibility only, with γi ∼ Lnorm(0, ).
To limit Monte-Carlo error, we simulated a relatively large population of N = 2,000 genetically unrelated individuals for a total of 300,000 events (infection or recovery). We used a burn-in of 100,000 events before recording data on individual binary disease status. Hence, in the recorded data, the average individual experienced 100 events (50 infections and 50 recoveries).
The endemic was started by infecting a proportion P0 = 1-1/c of the individuals, chosen at random. Subsequently, we sampled events (infection or recovery) and the individual involved using Gillespie’s algorithm (Gillespie 1977). For each infected individual, the probability of recovery was proportional to the recovery rate,. For susceptible individual i the probability of infection was proportional to , I/N denoting the fraction of the population that is infected. Probabilities were accumulated over all individuals and scaled to a sum of 1 by dividing them by their sum. Finally, the specific event was sampled by drawing a random number, say x, from a standard uniform distribution and finding the event and the corresponding individual belonging to the probability interval [], where . The disease status of that individual and I were updated before sampling the next event. The time of each event was not simulated. After 300,000 events, prevalence was calculated as the disease status averaged over the entire population, and also by individual, discarding the burn-in period. The regression coefficient of average individual disease status on was also estimated.
Additive genetic variance in log-susceptibility was . Three scenarios were considered, differing in contact rate: c = 1.22 giving P = 0.2, c = 2 giving P = 0.5 and c = 5.15 giving P = 0.8. Those combinations of , c and P were found by numerically solving Equations (20a) and (20b). The actual prevalences observed in the simulations were equal to these numerical solutions.
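The event-based procedure described in this appendix can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the infection weight of susceptible individual i is its susceptibility times c times I/N, that the recovery rate equals 1 for every individual, and that log-susceptibility has a standard deviation `sigma` chosen by the user. These choices, the function name `simulate_sis`, and its parameters are hypothetical. Only the population prevalence is recorded, and event times are not simulated, as in the text. Each event costs O(N) here, so the population and event counts used in the text would run slowly in pure Python.

```python
import math
import random


def simulate_sis(n=2000, c=2.0, sigma=0.5, n_events=300_000, burn_in=100_000, seed=1):
    """Stochastic SIS sketch: sample one event (infection or recovery) at a time.

    Returns the prevalence averaged over the recorded (post burn-in) events.
    """
    rng = random.Random(seed)
    # Assumed: log-susceptibility ~ Normal(0, sigma^2), so susceptibility is log-normal.
    gamma = [math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]

    # Start by infecting a proportion 1 - 1/c of the individuals, chosen at random.
    infected = [False] * n
    for i in rng.sample(range(n), round(n * (1.0 - 1.0 / c))):
        infected[i] = True
    n_inf = sum(infected)

    prevalence_sum, recorded = 0.0, 0
    for event in range(n_events):
        if n_inf == 0:
            break  # infection went extinct
        # Unnormalised weights: recovery (rate 1, assumed) for infected individuals,
        # infection (gamma_i * c * I/N, assumed) for susceptible individuals.
        force = c * n_inf / n
        weights = [1.0 if infected[i] else gamma[i] * force for i in range(n)]
        i = rng.choices(range(n), weights=weights)[0]  # normalisation is done internally
        infected[i] = not infected[i]
        n_inf += 1 if infected[i] else -1
        if event >= burn_in:
            prevalence_sum += n_inf / n
            recorded += 1
    return prevalence_sum / recorded if recorded else 0.0


# Small homogeneous example (sigma = 0), where the expected prevalence is 1 - 1/c.
prevalence = simulate_sis(n=200, c=2.0, sigma=0.0, n_events=20_000, burn_in=5_000)
print(f"simulated prevalence: {prevalence:.3f}")
```

The homogeneous case is used in the example because its expected prevalence, 1 − 1/c, is known without solving Equations (20a) and (20b). With `sigma` greater than zero the simulated prevalence should instead be compared with the numerical solution.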
Appendix D
Additive genetic variance in log-normal traits.
We assumed log-normally distributed genotypic values for susceptibility, infectivity and recovery, also resulting in a log-normal distribution for and for . Hence, genetic effects are additive on the log-scale, but taking the exponent introduces some nonadditive genetic variance on the actual scale. Here, we derive the fraction of the variance that is additive on the actual scale.
Figure D1 illustrates that the additive fraction of the variance in z approaches 1 when the variance on the log scale goes to zero. For a variance on the log scale of 0.5², ∼88% of the variance in z is additive. Variances on the log scale larger than 0.5² are unrealistic (see main text). This indicates that at least 88% of the genetic variance in susceptibility, infectivity, recovery, R0 and prevalence is additive when they follow a log-normal distribution.
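The 88% figure can be checked with a short calculation. The derivation itself is not reproduced in this text, so the sketch below assumes the standard result for a log-normal variable z = exp(A) with A normally distributed with variance s: the linear regression of z on A explains a fraction s / (exp(s) − 1) of the variance in z. The function name is hypothetical.

```python
import math


def additive_fraction(var_log):
    """Fraction of the variance of z = exp(A) explained by linear regression on A,
    for A ~ Normal(mu, var_log). Assumed closed form: var_log / (exp(var_log) - 1)."""
    if var_log == 0.0:
        return 1.0  # limiting value as the log-scale variance goes to zero
    return var_log / math.expm1(var_log)


for sd in (0.1, 0.25, 0.5):
    print(f"sd on log scale = {sd:4.2f}  additive fraction = {additive_fraction(sd ** 2):.3f}")
```

For a standard deviation of 0.5 on the log scale this gives approximately 0.88, and the fraction approaches 1 as the variance on the log scale shrinks.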
Appendix E
Breeding value for individual disease status vs breeding value for prevalence, without genetic variation in infectivity
Hence, without genetic variation in infectivity, and are identical, and we will use the symbol in the following.
Hence, when expressed relative to their mean, and differ approximately by a factor P (see also Figure 4 in Bijma 2020). This result is approximate, because the true relationship is nonlinear and the expression is approximate with variation among individuals. For realistic magnitudes of the genetic variance, however, the nonlinearity is limited. Note that the above derivation does not require the assumption of a log-normal distribution of susceptibility and recovery.
Appendix F
Methods for observed response to selection
First a base population was generated of N = 4000 unrelated individuals, with genetic variation in susceptibility only. No distinction was made between males and females. For each individual, breeding values for the logarithm of susceptibility were sampled from , and individual susceptibility was calculated as . The expected prevalence for the base generation was calculated as , with a c of either 2 or 10, and the initial disease status of base generation individuals was sampled at random from .
Next, an endemic was simulated following methods described in Appendix C, for a total of 15,000 events (sum of infections and recoveries), consisting of a burn-in of 10,000 events and 5,000 recorded events. The 4,000 individuals were ordered based on their mean individual disease status over the 5,000 recorded events (so based on 1.25 events on average per individual), and the 2,000 individuals with the lowest values were selected as parents of the next generation (corresponding to a selected proportion of 0.5).
Selected parents were mated at random. Each pair of parents produced two offspring, resulting in N = 4,000 offspring. Offspring inherited the breeding value for the logarithm of susceptibility in a Mendelian fashion; . The initial disease status of offspring (i.e., at the start of the burn-in period of their generation) was sampled at random from , where denotes the expected prevalence in the offspring generation, calculated as . The 0.02 guaranteed an average of at least 80 infected individuals at the start of the endemic in any generation, also when the expected prevalence was zero (i.e, when ). Then an endemic was started, as described above for the base generation, etc. This process was repeated until the number of infected individuals dropped to zero, implying extinction of the infection.