The Georgetown Smoking-Lung Cancer Macro Model aims to provide a simplified strategy for predicting lung cancer rates based on current and past smoking. The model begins with the two-stage clonal expansion (TSCE) model, which was developed using Cancer Prevention Study II (CPS-II) data to incorporate the roles of three factors in lung cancer mortality: smoking duration, smoking intensity, and time since quitting (for former smokers). Separate estimates of predicted lung cancer rates by age and gender were developed for current, former, and never smokers, and were then aggregated over the smoking groups for each age and gender.
To correct for variations by cohort and age (where the cohort effects picked up temporal effects), regression models were estimated that allow the predicted lung cancer death rates to deviate from historical lung cancer death rates. These models were estimated separately by gender, using data for ages 30 to 84 for each of the years from 1969 to 2000 to calibrate the model. We then validated the model by comparing estimates by age and gender to lung cancer rates for the years 2001 through 2010. A specification that interacted the predicted rates with age, age-squared, cohort, and cohort-squared, and that also included independent cohort and cohort-squared effects, was found to predict best. It was used as our final model for both males and females.
Methods used to predict lung cancer death rates by smoking-related factors
To incorporate the role of smoking duration and intensity in lung cancer mortality, we used the two-stage clonal expansion (TSCE) model as applied by Hazelton et al. (1). They estimated a series of non-linear equations that related the lung cancer death rate to rates of initiation, cell division, apoptosis of initiated cells, and malignant conversion of initiated cells which, in turn, were a function of smoking intensity and duration. Separate models were developed using CPS-I and CPS-II data.
Programming for the models was made available to us as a Microsoft Excel add-in. Data on smoking intensity, age of initiation, and years since quitting (by age, gender, and year) were applied to the TSCE models to determine lung cancer death rates separately for never, current, and former smokers. The data were developed by Holford et al. (2–4) using age-period-cohort models applied to National Health Interview Survey data from 1965 to 2010. For current and former smokers, intensity was measured as the average number of cigarettes smoked per day. Smoking duration was measured as the current age minus the smoking initiation age for current smokers, and as the current age minus the sum of smoking initiation age and years since quitting for former smokers. Lung cancer mortality rates were estimated for each age, gender, and year by smoking status (current, former, and never).
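The duration definition above can be illustrated with a minimal Python sketch. The function name and arguments are hypothetical and chosen only for illustration; the original calculations were carried out in the Excel add-in, not in this code.

```python
def smoking_duration(age, initiation_age, years_since_quitting=0):
    """Years smoked, following the definition in the text.

    Current smokers: age - initiation_age (years_since_quitting = 0).
    Former smokers:  age - (initiation_age + years_since_quitting).
    Names are illustrative assumptions, not taken from the original add-in.
    """
    duration = age - (initiation_age + years_since_quitting)
    if duration < 0:
        raise ValueError("initiation age plus years since quitting exceeds current age")
    return duration


# A 50-year-old current smoker who started at 18 has smoked for 32 years.
current = smoking_duration(50, 18)
# A 50-year-old former smoker who started at 18 and quit 10 years ago smoked for 22 years.
former = smoking_duration(50, 18, years_since_quitting=10)
print(current, former)
```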
We developed separate estimates of lung cancer death rates by age, gender, and year (1969–2010) for the CPS-I and CPS-II models, yielding four sets of estimates: male CPS-I, female CPS-I, male CPS-II, and female CPS-II. For each set, death rates by smoking status, age, and year were applied to the population in the respective categories (measured as the prevalence of a particular smoking status multiplied by the total population) to obtain total deaths. The deaths were summed over the three smoking status categories to obtain predicted total lung cancer deaths by age, gender, and year, which were divided by population to obtain overall lung cancer mortality rates. These predicted lung cancer mortality rates were compared to the historical lung cancer mortality rates (by age, gender, and year) obtained from the National Center for Health Statistics (5).
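As a sketch of the aggregation step just described, the following Python fragment computes predicted deaths and the overall rate for a single age, gender, and year cell. The function names, the per-100,000 rate units, and the numbers in the example are assumptions made for illustration; they are not values from the model.

```python
STATUSES = ("never", "current", "former")


def predicted_deaths(rate_per_100k, prevalence, population):
    """Sum deaths over smoking statuses for one age-gender-year cell.

    rate_per_100k: death rate per 100,000 for each smoking status
    prevalence:    share of the cell's population in each status (sums to 1)
    population:    total population of the cell
    """
    return sum(
        rate_per_100k[s] / 100_000 * prevalence[s] * population
        for s in STATUSES
    )


def overall_rate_per_100k(rate_per_100k, prevalence, population):
    """Predicted overall rate: total predicted deaths divided by population."""
    deaths = predicted_deaths(rate_per_100k, prevalence, population)
    return deaths / population * 100_000


# Made-up inputs for one cell (illustration only).
rates = {"never": 10.0, "current": 200.0, "former": 80.0}
shares = {"never": 0.5, "current": 0.25, "former": 0.25}
print(predicted_deaths(rates, shares, 1_000_000))
print(overall_rate_per_100k(rates, shares, 1_000_000))
```

Because the population term cancels, the overall rate is simply the prevalence-weighted average of the status-specific rates; the deaths are computed explicitly here only to mirror the steps in the text.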
Regression Analysis
Using data by age, gender, and year, we then estimated a single model that regressed the historical lung cancer rate (HLCR) on the predicted lung cancer rate (PLCR) and an intercept; the results clearly indicated a systematic error related to age and year. Since age, period, and cohort are collinear, we focused on age and cohort, where age is known to affect lung cancer rates and cohort is known to distinguish whether smoking risks vary over time. We added age, age², and age³ and cohort, cohort², and cohort³ sequentially to the equations; we found that the cubic terms added little explanatory power and did not reduce correlation of the error terms. The basic equation was:
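$$\mathrm{HLCR}_{a,t} = b_0 + \left[ b_1 + b_2\,\mathrm{Age}_{a,t} + b_3\,\mathrm{Age}_{a,t}^2 + b_4\,\mathrm{Cohort}_{a,t} + b_5\,\mathrm{Cohort}_{a,t}^2 \right] \mathrm{PLCR}_{a,t} + e_{a,t}$$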
The first term in brackets (b1) indicates general biases in the TSCE estimates (unrelated to age or year). The next two terms correct the predictions of the smoking models for linear (b2) and non-linear (b3) biases in age, and the last two correct for linear (b4) and non-linear (b5) biases by cohort. The cohort coefficients were used to consider the changing relationship of smoking to lung cancer over time.
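To make the structure of the basic equation concrete, the sketch below builds the regressors for one age-year cell and evaluates the fitted rate for a given coefficient vector. It is an illustration under stated assumptions, not the estimation code used for the model: the function names are hypothetical, cohort is assumed to be birth year (year minus age), and the coefficients in the example are placeholders rather than estimates.

```python
def design_row(age, cohort, plcr):
    """Regressors for one cell, in the order of coefficients b0..b5:
    intercept, then PLCR interacted with 1, age, age^2, cohort, cohort^2."""
    return [
        1.0,
        plcr,
        age * plcr,
        age ** 2 * plcr,
        cohort * plcr,
        cohort ** 2 * plcr,
    ]


def calibrated_rate(coefs, age, cohort, plcr):
    """Fitted HLCR = b0 + [b1 + b2*age + b3*age^2 + b4*cohort + b5*cohort^2] * PLCR."""
    if len(coefs) != 6:
        raise ValueError("expected six coefficients (b0..b5)")
    return sum(b * x for b, x in zip(coefs, design_row(age, cohort, plcr)))


# Placeholder coefficients: b1 = 1 and all others 0 leave the TSCE prediction unchanged.
no_adjustment = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
year, age = 2000, 60
cohort = year - age  # assumption: cohort measured as birth year
print(calibrated_rate(no_adjustment, age, cohort, plcr=75.0))
```

Rows produced by `design_row` for every age and year in the calibration period could be stacked into a design matrix and passed to any ordinary least squares routine to estimate b0 through b5, separately by gender.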
References
1. Hazelton WD, Clements MS, Moolgavkar SH. Multistage carcinogenesis and lung cancer mortality in three cohorts, Cancer Epidemiol Biomarkers Prev 2005: 14: 1171-1181.
2. Anderson CM, Burns DM, Dodd KW, Feuer EJ. Chapter 2: Birth-cohort-specific estimates of smoking behaviors for the U.S. population, Risk Anal 2012: 32 Suppl 1: S14-24.
3. Holford TR, Levy DT, McKay LA, Clarke L, Racine B, Meza R, et al. Patterns of birth cohort-specific smoking histories, 1965-2009, Am J Prev Med 2014: 46: e31-37.
4. Holford TR, Meza R, Warner KE, Meernik C, Jeon J, Moolgavkar S, et al. Tobacco control and the reduction in smoking-related premature deaths in the United States, 1964-2012, JAMA 2014: 311: 164-171.
5. National Center for Health Statistics. Lung Cancer Death Rates, Hyattsville MD: Public Health Service; 2015.