Purpose: The Georgetown-Einstein model is called 'Spectrum' (Simulating Population Effects of Cancer Control Interventions -- Race and Understanding Mortality). Developed at Georgetown University and Albert Einstein College of Medicine, it is a continuous-time, parallel-universes state transition model programmed in C++, an object-oriented programming language. The model first simulates breast cancer incidence and mortality by estrogen receptor (ER)/human epidermal growth factor receptor 2 (HER2) status in the absence of screening or adjuvant treatment, and then overlays screening and/or treatment.
Overview: Spectrum/G-E is a microsimulation of breast cancer in the United States population, implemented in C++, that is specifically oriented towards estimating the impact of the screening and adjuvant treatment innovations that have taken place since 1975. The approach is phenomenological: there is no attempt to model any specific biology of breast cancer. The impact of screening and treatment is handled by creating “parallel universes” in which the same life history is subjected to different real or counterfactual screening or treatment strategies, and the resulting outcomes are compared directly. The model’s inputs have been calibrated to produce a reasonable approximation to Surveillance, Epidemiology, and End Results (SEER) incidence and mortality over the period 1975-2010.
The Natural History of Breast Cancer: Breast cancer is assumed to exist in two forms: progressive and non-progressive. Non-progressive lesions have a transient existence, are never identified clinically, but may be detected through screening and present as ductal carcinoma in situ (DCIS) when they are screen detected. Non-progressive breast cancer has no mortality associated with it. Progressive lesions may present clinically, or through screen detection, in any of the AJCC stages (DCIS, I, IIa, IIb, III, and IV), and all of these lesions carry a risk of breast cancer mortality. The incidence of breast cancer depends on a woman’s birth cohort, and varies with age. The age-specific incidence rates, in turn, depend on the woman’s breast density. All breast cancers, progressive or non-progressive, may be classified by the presence or absence of two biomarkers: estrogen receptors (ER) and HER2. The mortality risk conferred by any given breast cancer depends upon these biomarkers, the patient’s age at diagnosis, the stage at diagnosis, and the treatment provided.
In the simulation, construction of a life history begins by selecting a birth cohort for each simulated woman. The cohort is sampled from the distribution of population birth years, based on U.S. Census data; in some applications, a single birth cohort is simulated instead. Conditional on her birth cohort, a date of death from non-breast cancer causes is sampled from a cohort-specific 1-year life table. She is also assigned breast densities at ages 40, 50, and 65 based on inputs specifying the prevalence of the four Breast Imaging Reporting and Data System (BI-RADS) density categories at age 40, and estimates of the probabilities of transition among those categories at ages 50 and 65.
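As an illustration of these first steps, the following is a minimal sketch, not the model's actual code. It assumes a life table stored as annual death probabilities and density transitions stored as 4x4 row-stochastic matrices. All names (sampleOtherCauseDeathAge, DensityHistory, and so on) are hypothetical, and placing the death uniformly within the year is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 1-year life table: q[a] is the probability of dying from
// non-breast-cancer causes between ages a and a+1, given survival to age a.
// Returns a sampled age at death in fractional years.
double sampleOtherCauseDeathAge(const std::vector<double>& q, std::mt19937& rng) {
    assert(!q.empty());
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (std::size_t age = 0; age < q.size(); ++age) {
        if (unif(rng) < q[age]) {
            // Assumption: death falls uniformly within the one-year interval.
            return static_cast<double>(age) + unif(rng);
        }
    }
    // Assumption: a woman who outlives the table dies within the next year.
    return static_cast<double>(q.size()) + unif(rng);
}

constexpr std::size_t kDensityCategories = 4;  // the four BI-RADS categories
using DensityVector = std::array<double, kDensityCategories>;
using DensityMatrix = std::array<DensityVector, kDensityCategories>;

// Draw a category index (0..3) from a vector of probabilities.
int sampleCategory(const DensityVector& probs, std::mt19937& rng) {
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}

struct DensityHistory {
    int at40;
    int at50;
    int at65;
};

// Row i of each transition matrix holds the probabilities of moving from
// category i to each category at the next age point.
DensityHistory sampleDensityHistory(const DensityVector& prevalenceAt40,
                                    const DensityMatrix& transitionAt50,
                                    const DensityMatrix& transitionAt65,
                                    std::mt19937& rng) {
    DensityHistory h{};
    h.at40 = sampleCategory(prevalenceAt40, rng);
    h.at50 = sampleCategory(transitionAt50[static_cast<std::size_t>(h.at40)], rng);
    h.at65 = sampleCategory(transitionAt65[static_cast<std::size_t>(h.at50)], rng);
    return h;
}
```

In this sketch the density at age 50 depends only on the density at age 40, and the density at age 65 only on the density at age 50, which is one plausible reading of "probabilities of transition among those categories".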
Incidence of breast cancer in the (counterfactual) absence of screening is based on a modification of the Holford age-period-cohort (APC) model, extended beyond its covered ages and cohorts by applying year-on-year incidence ratios from the Gagnon APC model, and then further calibrated to improve the match to SEER incidence for 1975-2010. In addition, the basic incidence rates are adjusted up or down by applying hazard ratios that depend on the woman’s breast density at each age.
A time-to-event distribution for onset of clinical breast cancer is sampled to determine when, if ever, the woman will develop clinically apparent breast cancer. If a clinically apparent breast cancer will develop, it is assigned a stage by sampling the age-specific stage distribution for clinically detected cancers, and is then given a biomarker classification by sampling the biomarker distribution conditional on age and stage. Survival from the time of clinical diagnosis in the absence of treatment is then sampled from a time-to-event distribution conditional on age, stage, and biomarkers, using the survival functions describing the prognosis of breast cancer in 1975. The corresponding date of death from breast cancer (which may be before or after the date of non-breast cancer death) is then calculated.
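One way to picture the onset step is inverse-transform sampling against a piecewise-constant hazard. The sketch below assumes the cohort-specific incidence rates are available as one rate per year of age, each multiplied by the density hazard ratio for that age. The function name, the OnsetResult type, and the annual granularity are assumptions for illustration, not the model's documented implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct OnsetResult {
    bool occurs;  // false if no clinical onset within the ages covered
    double age;   // age at clinical onset; meaningful only if occurs is true
};

// annualRate[a]: hypothetical baseline incidence rate (per woman-year) at age a
// for the woman's birth cohort, in the absence of screening.
// densityHazardRatio[a]: hazard ratio for her breast density at age a.
OnsetResult sampleClinicalOnsetAge(const std::vector<double>& annualRate,
                                   const std::vector<double>& densityHazardRatio,
                                   std::mt19937& rng) {
    assert(annualRate.size() == densityHazardRatio.size());
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    // Onset occurs when the cumulative hazard reaches a unit-exponential
    // threshold, -log(1 - U) with U uniform on [0, 1).
    const double threshold = -std::log(1.0 - unif(rng));
    double cumulative = 0.0;
    for (std::size_t age = 0; age < annualRate.size(); ++age) {
        const double hazard = annualRate[age] * densityHazardRatio[age];
        if (hazard > 0.0 && cumulative + hazard >= threshold) {
            // Interpolate within the year, where the hazard is constant.
            const double fraction = (threshold - cumulative) / hazard;
            return OnsetResult{true, static_cast<double>(age) + fraction};
        }
        cumulative += hazard;
    }
    return OnsetResult{false, 0.0};
}
```

Stage and biomarker assignment would then be categorical draws conditional on the sampled age, in the same spirit as any discrete sampling step.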
Finally, a sojourn time for the lesion is sampled. The sojourn time distributions are conditional on age at clinical presentation and on biomarkers. All of the conditional distributions are assumed to be gamma distributions with a common shape parameter. (The value of the shape parameter is an input, as are the age-biomarker specific means.) The date that precedes the clinical onset of breast cancer by the sampled sojourn time is identified as the onset of the sojourn period for this lesion. Note that the dates at which the lesion transitions from one stage to the next are not simulated; only the stage at clinical presentation is.
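The sojourn step can be sketched with the C++ standard library's gamma distribution, which is parameterized by shape and scale, so a distribution specified by shape and mean uses scale = mean / shape. The function names below are hypothetical, and the sketch works in ages (years) rather than calendar dates for simplicity.

```cpp
#include <cassert>
#include <random>

// Sample a sojourn time in years from a gamma distribution specified by its
// shape and mean. std::gamma_distribution takes (shape, scale), and the mean
// of a gamma distribution is shape * scale.
double sampleSojournTime(double shape, double mean, std::mt19937& rng) {
    assert(shape > 0.0 && mean > 0.0);
    std::gamma_distribution<double> dist(shape, mean / shape);
    return dist(rng);
}

// The sojourn period begins one sojourn time before clinical presentation.
double sojournOnsetAge(double clinicalOnsetAge, double sojournTime) {
    assert(sojournTime >= 0.0);
    return clinicalOnsetAge - sojournTime;
}
```

For example, a woman whose cancer would present clinically at age 60 with a sampled sojourn time of 2.5 years has a sojourn period starting at age 57.5.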
If no clinically apparent breast cancer is to develop, time-to-event distributions for onset and regression of non-progressive lesions are sampled to determine when, if ever, such a lesion develops. Parameters of these distributions are among those calibrated to produce a match to SEER incidence after the dissemination of screening in the United States. The non-progressive lesion is assigned an ER/HER2 classification by sampling from the biomarker distribution of all DCIS lesions for a woman of her age, and its stage is set to DCIS.
The above steps create a basic life history describing breast cancer in the absence of screening or adjuvant treatment, characterizing each simulated woman by a birth date, date of death from non-breast cancer causes, and, in women with breast cancer, dates of sojourn onset, clinical presentation, and death from breast cancer.
The Screening Process: Each woman is assigned a mammography screening schedule (or, in simulations including counterfactual screening strategies, several screening schedules). The “dissemination” screening schedule is randomly sampled to produce birth cohort-specific screening schedules that are thought to resemble actual screening behavior among women in the U.S. The dissemination screening model is based on the work of Cronin and Krapcho, and is a direct implementation of their algorithm. Counterfactual strictly periodic screening schedules such as “every two years from age 50 to age 74” can be simulated as well, as can scenarios in which screening intervals vary with age or with breast density. If a screening mammogram is performed during the sojourn interval of a breast cancer, there is a probability that it will be detected. This probability, known as the sensitivity, depends on the woman’s age at the time of the screening, whether it is an initial or subsequent screen, the woman’s breast density, and whether the mammogram uses film or digital technology. Note that this sensitivity is an abstract, unobservable parameter of the model that is calibrated to reproduce 1-year screen detection rates from the Breast Cancer Surveillance Consortium (BCSC). The actual outcome of the simulated screening is determined by sampling a uniform random number and comparing that to the sensitivity.
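The outcome of a single screening mammogram, as described above, can be sketched as follows. The function names are hypothetical, and the sensitivity is passed in as a number; in the model it would be looked up from inputs by age, initial versus subsequent screen, breast density, and film versus digital technology.

```cpp
#include <cassert>
#include <random>

// True if the screen falls inside the lesion's sojourn interval, taken here
// as the half-open interval [sojournOnsetAge, clinicalAge).
bool inSojournInterval(double screenAge, double sojournOnsetAge, double clinicalAge) {
    return screenAge >= sojournOnsetAge && screenAge < clinicalAge;
}

// Decide whether one screening mammogram detects the lesion by drawing a
// uniform random number and comparing it to the sensitivity.
bool screenDetects(double screenAge, double sojournOnsetAge, double clinicalAge,
                   double sensitivity, std::mt19937& rng) {
    assert(sensitivity >= 0.0 && sensitivity <= 1.0);
    if (!inSojournInterval(screenAge, sojournOnsetAge, clinicalAge)) {
        return false;  // the lesion is not screen-detectable at this age
    }
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(rng) < sensitivity;
}
```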
If screen detection occurs, a new stage, possibly earlier than the clinical stage, is assigned to the lesion. To do this, the model draws on distributions of stage dwell times. These distributions are assumed to be exponential, and their means are unconditional program inputs. Based on the lead-time obtained by screening and the dwell time distributions, a screened stage is sampled from a Bayesian posterior distribution. (The prior probability distribution is an input distribution of screened stages, the “data” is the lead-time, and the likelihood for the lead-time is calculated by convolving the stage dwell time distributions.) Survival in the absence of adjuvant treatment is then re-calculated based on the new age and stage at diagnosis (the biomarkers are assumed to be the same as would have been seen if the lesion were diagnosed clinically), and the date of death from breast cancer is revised accordingly.
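The Bayesian update can be sketched under explicit simplifying assumptions. The text above does not specify exactly which dwell times enter the convolution, so this sketch assumes, purely for illustration, that the lead-time from screened stage s to clinical presentation in stage c is the sum of the dwell times in stages s through c. It also assumes all dwell means are distinct, so that the convolution of exponentials has the closed hypoexponential form. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Density at time t of a sum of independent exponential dwell times with the
// given means (a hypoexponential distribution). Requires distinct means.
double sumOfExponentialsDensity(const std::vector<double>& means, double t) {
    assert(!means.empty() && t >= 0.0);
    double density = 0.0;
    for (std::size_t i = 0; i < means.size(); ++i) {
        const double rateI = 1.0 / means[i];
        double weight = 1.0;
        for (std::size_t j = 0; j < means.size(); ++j) {
            if (j == i) continue;
            const double rateJ = 1.0 / means[j];
            assert(rateJ != rateI);  // closed form is invalid for equal rates
            weight *= rateJ / (rateJ - rateI);
        }
        density += weight * rateI * std::exp(-rateI * t);
    }
    return density;
}

// Posterior probability of each candidate screened stage s = 0..clinicalStage,
// given the observed lead-time: posterior[s] is proportional to
// prior[s] * likelihood(leadTime | s).
std::vector<double> screenedStagePosterior(const std::vector<double>& prior,
                                           const std::vector<double>& dwellMeans,
                                           std::size_t clinicalStage,
                                           double leadTime) {
    assert(clinicalStage < dwellMeans.size());
    assert(prior.size() == clinicalStage + 1);
    std::vector<double> posterior(clinicalStage + 1, 0.0);
    double total = 0.0;
    for (std::size_t s = 0; s <= clinicalStage; ++s) {
        // Illustrative assumption: the lead-time is the sum of the dwell
        // times in stages s through clinicalStage, inclusive.
        const std::vector<double> path(
            dwellMeans.begin() + static_cast<std::ptrdiff_t>(s),
            dwellMeans.begin() + static_cast<std::ptrdiff_t>(clinicalStage + 1));
        posterior[s] = prior[s] * sumOfExponentialsDensity(path, leadTime);
        total += posterior[s];
    }
    assert(total > 0.0);
    for (double& p : posterior) p /= total;
    return posterior;
}
```

Under these assumptions a longer lead-time shifts posterior weight towards earlier screened stages, which matches the intuition behind stage shift. A screened stage would then be drawn from the returned probabilities.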
In light of the above, the effect of screening on breast cancer mortality is based entirely on stage shift and age shift.
Once a lesion has been screen detected, screening terminates. If a lesion goes undetected at every screening examination, it will still present clinically at its clinical presentation date (unless it is non-progressive, in which case it will eventually regress). Screening examinations conducted before the sojourn period, or in a woman with no breast cancer in her life history, have a probability of nevertheless leading to a false-positive result. This probability is one minus the specificity of the test. Test specificity is conditional on age, breast density, initial vs. subsequent screen, and screening technology. False-positive screening tests do not interrupt the screening schedule.
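The rules in this paragraph can be sketched as a single pass over a woman's screening schedule. For brevity, the sketch uses one constant sensitivity and one constant specificity rather than values conditional on age, density, screen round, and technology. The type and function names are hypothetical, and stopping the schedule once the lesion has presented clinically is a simplifying assumption of the sketch.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical summary of a lesion for screening purposes. hasLesion == false
// means there is no breast cancer in the life history.
struct Lesion {
    bool hasLesion;
    double sojournOnsetAge;  // start of the screen-detectable period
    double sojournEndAge;    // age at clinical presentation
};

struct ScreeningOutcome {
    bool screenDetected = false;
    double detectionAge = 0.0;  // meaningful only when screenDetected is true
    int falsePositives = 0;
};

// Walk a schedule of screening ages (assumed sorted in ascending order).
ScreeningOutcome runSchedule(const std::vector<double>& screenAges,
                             const Lesion& lesion,
                             double sensitivity, double specificity,
                             std::mt19937& rng) {
    assert(sensitivity >= 0.0 && sensitivity <= 1.0);
    assert(specificity >= 0.0 && specificity <= 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    ScreeningOutcome outcome;
    for (double age : screenAges) {
        if (lesion.hasLesion && age >= lesion.sojournEndAge) {
            break;  // simplifying assumption: no screens after clinical presentation
        }
        const bool inSojourn = lesion.hasLesion && age >= lesion.sojournOnsetAge;
        if (inSojourn) {
            if (unif(rng) < sensitivity) {
                outcome.screenDetected = true;
                outcome.detectionAge = age;
                break;  // screening terminates once a lesion is screen detected
            }
            continue;  // missed lesion; the next scheduled screen may find it
        }
        // No detectable lesion: a false positive occurs with probability
        // one minus the specificity, and the schedule continues regardless.
        if (unif(rng) < 1.0 - specificity) {
            ++outcome.falsePositives;
        }
    }
    return outcome;
}
```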
The Treatment Process: Upon clinical diagnosis or screen detection, a woman with a breast cancer diagnosis is assigned a treatment. In the basic model, this is done by sampling from an age, stage, year of diagnosis, biomarker-specific distribution of treatments. These distributions are program inputs thought to represent the dissemination of adjuvant therapies in the U.S. population. Counterfactual treatment distributions (e.g., every woman receives the most effective treatment available at the time for her age and biomarker combination) are also available.
Each combination of treatment and lesion characteristics (age at diagnosis, stage, biomarkers) is associated with a hazard ratio less than or equal to 1 which specifies the treatment effectiveness. Although the model is programmed to also apply cure fractions in association with treatment, all implementations of the model so far have assumed all cure fractions are zero and have relied exclusively on hazard reduction. These hazard ratios are derived from an expert review and meta-analysis of clinical trials of breast cancer treatment. The survival curve for the lesion, with the treatment-associated hazard ratio applied, is sampled to determine a new survival duration, and the date of death from breast cancer is modified accordingly.
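To make the hazard-ratio step concrete, the sketch below assumes a Weibull baseline survival curve, which is an illustrative choice and not the model's documented survival function. Under proportional hazards, a hazard ratio HR turns the baseline survival S0(t) into S0(t)^HR, and a survival time is obtained by solving S0(t)^HR = u for a uniform draw u. The function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed baseline: S0(t) = exp(-(t / scale)^shape), a Weibull curve.
// With hazard ratio HR the treated survival is S(t) = S0(t)^HR, so solving
// S(t) = u gives t = scale * (-log(u) / HR)^(1 / shape).
// u is a uniform draw strictly between 0 and 1.
double survivalTimeWithTreatment(double u, double scale, double shape,
                                 double hazardRatio) {
    assert(u > 0.0 && u < 1.0);
    assert(scale > 0.0 && shape > 0.0);
    assert(hazardRatio > 0.0 && hazardRatio <= 1.0);
    return scale * std::pow(-std::log(u) / hazardRatio, 1.0 / shape);
}
```

Passing the same u for the untreated and treated curves is a design choice of this sketch: it guarantees that a hazard ratio below 1 never shortens the sampled survival for a given woman. Whether the model reuses random numbers in this way is not stated here. A hazard ratio of exactly 1 reproduces the untreated survival time.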
Customization to Particular Analyses: The Spectrum/G-E model has been used in a number of different analyses over the years. Where a special population is the focus of analysis (e.g. obese women, a racial group, a geographic subpopulation, a high-risk group), customization is achieved by modifying the input parameters to reflect the circumstances of that population. A user who is familiar with the details of the model inputs will have no difficulty accomplishing those modifications using standard software tools such as spreadsheets or statistical packages.
Where the goal is to model counterfactual policies for screening and treatment, those policies are implemented through modification of the program code. Model GE is an object-oriented program coded in ISO standard C++. Its basic code includes a variety of mammogram scheduler classes that support the modeling of fixed periodic schedules, periodic schedules with variable intervals that depend on age or density, and the dissemination mammography schedule. Implementing a new screening policy is then a matter of adding code that creates those scheduler objects at the appropriate point in the program, adding lines of code that create copies of the basic life history, and applying the mammography schedule objects to those copies.
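The general pattern might look like the following sketch. The class and function names (MammogramScheduler, PeriodicScheduler, LifeHistory, makeParallelUniverses) are hypothetical stand-ins and are not taken from the model's source code. Only a fixed periodic scheduler is shown.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class: a scheduler produces the ages at which a woman
// receives screening mammograms.
class MammogramScheduler {
public:
    virtual ~MammogramScheduler() = default;
    virtual std::vector<double> screenAges() const = 0;
};

// Strictly periodic screening, e.g. every two years from age 50 to age 74.
class PeriodicScheduler : public MammogramScheduler {
public:
    PeriodicScheduler(double startAge, double stopAge, double interval)
        : startAge_(startAge), stopAge_(stopAge), interval_(interval) {
        assert(interval > 0.0 && startAge <= stopAge);
    }
    std::vector<double> screenAges() const override {
        std::vector<double> ages;
        for (int k = 0; startAge_ + k * interval_ <= stopAge_; ++k) {
            ages.push_back(startAge_ + k * interval_);
        }
        return ages;
    }
private:
    double startAge_;
    double stopAge_;
    double interval_;
};

// Hypothetical, heavily reduced life history.
struct LifeHistory {
    double birthYear;
    double otherCauseDeathAge;
    std::vector<double> screenAges;  // empty in the basic (unscreened) history
};

// Create one copy of the basic life history per screening strategy, so the
// same woman can be compared across "parallel universes".
std::vector<LifeHistory> makeParallelUniverses(
    const LifeHistory& basic,
    const std::vector<std::unique_ptr<MammogramScheduler>>& schedulers) {
    std::vector<LifeHistory> universes;
    for (const auto& scheduler : schedulers) {
        LifeHistory copy = basic;
        copy.screenAges = scheduler->screenAges();
        universes.push_back(std::move(copy));
    }
    return universes;
}
```

A caller would fill a vector with, say, a biennial scheduler for ages 50 to 74 and an annual scheduler for ages 40 to 74, pass it together with a basic life history, and then run the screening and treatment steps on each returned copy.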
References
- Schechter CB, Near AM, Jayasekera J, Chandler Y, Mandelblatt JS. Structure, function, and applications of the Georgetown-Einstein (GE) breast cancer simulation model. Med Decis Making. 2018 Apr;38(1_suppl):66S-77S.
- Mandelblatt J, Schechter CB, Lawrence W, Yi B, Cullen J. The SPECTRUM population model of the impact of screening and treatment on U.S. breast cancer trends from 1975 to 2000: principles and practice of the model methods. J Natl Cancer Inst Monogr. 2006;(36):47-55.