How McClatchy analyzed the data for its ‘Stricken’ investigation into veterans cancer

When veterans are approved for medical coverage by the Department of Veterans Affairs, they go to one of the almost 2,500 hospitals, outpatient clinics or medical centers run by the VA’s Veterans Health Administration.

The Veterans Health Administration also records and maintains data on the number of diseases and illnesses diagnosed or treated in those facilities.

Cancers, like other diseases, are identified in medical records in part by their International Classification of Diseases (ICD) code, a diagnosis code set by the World Health Organization. Many experts told McClatchy that when a veteran receives care at a VA health care facility, the attending physician or nurse is required to enter the diagnosis code in the patient’s record in order to close out the appointment.

Through multiple Freedom of Information Act requests, McClatchy obtained administrative cancer-treatment billings data from the Veterans Health Administration for all ICD cancer diagnosis codes from fiscal years 2000 through 2018.

Since veterans may schedule more than one appointment per fiscal year to treat their cancer at a VA health care facility, McClatchy requested the number of “unique” visits per fiscal year by cancer code. The Veterans Health Administration defines “unique” as one patient, identified by Social Security number, who is counted only once per fiscal year regardless of the number of visits.
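
The “unique” definition above can be illustrated with a minimal sketch. It assumes visit-level records, which McClatchy did not receive (the VA supplied counts), and every name and value here is hypothetical. Because each primary cancer is counted separately, the sketch counts a patient once per fiscal year for each cancer code.

```python
from collections import defaultdict

def count_unique_patients(visits):
    """Count each patient once per fiscal year and cancer code.

    `visits` is an iterable of (fiscal_year, cancer_code, patient_id)
    tuples. The record layout is an assumption made for illustration.
    """
    seen = set()
    counts = defaultdict(int)
    for fiscal_year, cancer_code, patient_id in visits:
        key = (fiscal_year, cancer_code, patient_id)
        if key not in seen:
            # First visit by this patient for this code in this year.
            seen.add(key)
            counts[(fiscal_year, cancer_code)] += 1
    return dict(counts)

# Hypothetical visits: patient-A is seen twice in fiscal year 2005.
visits = [
    (2005, "162", "patient-A"),
    (2005, "162", "patient-A"),
    (2005, "162", "patient-B"),
    (2006, "162", "patient-A"),
]
unique_counts = count_unique_patients(visits)
```

In this made-up example, the two fiscal year 2005 visits by patient-A count once, and the same patient is counted again in fiscal year 2006.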

From fiscal year 2000 to fiscal year 2018, the Veterans Health Administration recorded more than 12 million unique visits to treat cancer. That dataset includes both new diagnoses and returning patients. The data counts both primary cancers, such as lung or breast cancer, and secondary cancers, which occur when a primary cancer spreads to another location of the body. McClatchy excluded secondary cancers from its analysis. A patient can be diagnosed with more than one primary cancer at a time. Each primary cancer is counted separately in the VHA data. Five of the veterans in McClatchy’s reporting had more than one primary cancer.

The VA has reported it diagnoses about 45,000 new cases of cancer among veterans a year.

In its calculations, McClatchy included cases of malignant cancer and those that may turn malignant in the future, designated “in situ.”

McClatchy chose fiscal year 2000 as the starting point for the analysis because it reflected the last VA usage numbers before the military response to the September 11, 2001, attacks.

The data required some merging. The VA maintained its billing data in ICD 9, a prior version of ICD coding, from fiscal years 2000 through 2015. In fiscal year 2016, the VA moved to ICD 10. As a result, the FOIA data obtained by McClatchy came in two sets: one from fiscal year 2000 to 2015 and one from fiscal year 2016 to 2018.

McClatchy reconciled differences between the two datasets to give a continuous view of instances of cancer in the veteran community. McClatchy used a conversion table created by the National Cancer Institute to merge the two datasets. While several codes have one-to-one matches, many ICD 9 codes have multiple related ICD 10 codes. In those instances, McClatchy matched the sum of the counts for all related ICD 10 codes with the ICD 9 code.
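
The merging step can be sketched roughly as follows. This is an illustration only, not McClatchy’s actual code: the two-entry mapping stands in for the National Cancer Institute conversion table, the function name is hypothetical and the counts are invented.

```python
def icd10_counts_as_icd9(icd10_counts, icd9_to_icd10):
    """Express ICD 10 counts under ICD 9 codes.

    For each ICD 9 code, add up the counts of every related ICD 10
    code. A one-to-one match is just a sum over a single code.
    """
    merged = {}
    for icd9_code, icd10_codes in icd9_to_icd10.items():
        merged[icd9_code] = sum(icd10_counts.get(code, 0) for code in icd10_codes)
    return merged

# Illustrative mapping only (not the actual NCI table):
# one ICD 9 code with a single match, one with two related codes.
icd9_to_icd10 = {
    "185": ["C61"],
    "162": ["C33", "C34"],
}

# Invented counts for a single fiscal year reported in ICD 10.
icd10_counts = {"C61": 1000, "C33": 10, "C34": 490}

merged = icd10_counts_as_icd9(icd10_counts, icd9_to_icd10)
```

Once the ICD 10 years are expressed under ICD 9 codes in this way, they can be placed in the same series as the fiscal year 2000 to 2015 data.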

McClatchy calculated a rate by dividing the number of unique claims for each cancer type by the total number of claims filed that year and then multiplying by 100,000. In academia, the term “cancer rate” generally connotes the rate of newly diagnosed cancers, but the rates used by McClatchy include both newly diagnosed cancers and those that had been previously diagnosed, and reflect the overall burden of cancer cases on the VA health care system.
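
As a worked example of that arithmetic, here is a minimal sketch. The function name and the figures are hypothetical and are not drawn from the VA data.

```python
def rate_per_100k(cancer_claims, total_claims):
    """Unique claims for one cancer type per 100,000 total claims that year."""
    if total_claims <= 0:
        raise ValueError("total_claims must be positive")
    # Same as (cancer_claims / total_claims) * 100,000.
    return cancer_claims * 100_000 / total_claims

# Invented figures: 2,500 unique claims for one cancer type
# out of 5,000,000 total claims in the same fiscal year.
example_rate = rate_per_100k(2_500, 5_000_000)
print(example_rate)  # 50.0
```

Dividing by that year’s total claims means that growth in the number of patients using the VA does not by itself push the rate up.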

To make the data easier to view, some specific cancer types were also grouped into larger categories. For instance, cases of leukemia, lymphoma and myeloma were grouped as blood cancers, and cancers of the thorax, lungs and bronchi were grouped as respiratory cancers.
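
The grouping can be pictured as a simple lookup. The sketch below uses only the two groupings named above; the labels, the function name and the counts are assumptions for illustration, and cancer types without a listed group keep their own name.

```python
# Grouping taken from the two examples in the text; labels are illustrative.
CATEGORY_BY_TYPE = {
    "leukemia": "blood",
    "lymphoma": "blood",
    "myeloma": "blood",
    "thorax": "respiratory",
    "lungs": "respiratory",
    "bronchi": "respiratory",
}

def group_counts(counts_by_type):
    """Add up counts of specific cancer types into larger categories."""
    grouped = {}
    for cancer_type, count in counts_by_type.items():
        # Types with no listed category are kept under their own name.
        category = CATEGORY_BY_TYPE.get(cancer_type, cancer_type)
        grouped[category] = grouped.get(category, 0) + count
    return grouped

# Invented counts for one fiscal year.
grouped = group_counts({"leukemia": 30, "lymphoma": 50, "myeloma": 20, "prostate": 400})
```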

Not all veterans choose to receive, or are eligible for, health care at the VA. Eligibility is based on an agency assessment of veterans’ military records, their income and whether their injuries or illnesses are likely connected to their time in uniform. Studies have found that VA users tend to be older, sicker and in a lower income bracket than the general U.S. population.

Since McClatchy analyzed data for cancer cases treated in VA health care facilities, the analysis does not include veterans who were diagnosed with or treated for cancer outside the VA system.

McClatchy excluded from its analysis non-veterans like spouses, dependent children, or family caregivers who are eligible for VA benefits. McClatchy also did not include secondary cancers, such as a primary breast cancer that has spread to the lungs, in the analysis.

The rate of cancer treatments for veterans at VA health care centers spikes sharply from fiscal year 2000 to fiscal year 2001. Experts McClatchy consulted provided no specific explanation for the increase.

From fiscal year 2001, the rate increases until it peaks around fiscal year 2009. It then gradually decreases until another rise that starts in fiscal year 2014 and crests in fiscal year 2016. Experts consulted by McClatchy believe that rise is due to possible overcounting in those fiscal years, since the coding system transitioned from ICD 9 to ICD 10 at that time. The rate decreases again in fiscal year 2017 but starts to rise in fiscal year 2018, the final year included in McClatchy’s analysis.

Experts said that an accurate comparison of the rates of cancer cases treated in the VA health care system to that in the general U.S. population would only be possible with the creation of a statistical model using age and gender-specific data for each cancer case. Since McClatchy did not have access to data on patients’ age or gender, McClatchy was unable to compare the rates it found to rates in the general U.S. population.

Studies have found that ICD coding data, like the data McClatchy used primarily in its analysis, has a tendency to overcount, which experts told McClatchy is often due to human error when inputting the codes. Conversely, cancer registries (databases used to monitor and track cancer diagnoses, such as the VA’s Central Cancer Registry system) have a tendency to undercount.


When McClatchy presented initial findings to the VA, the agency said it disagreed with McClatchy’s approach. The VA said an analysis of billing data would create an overcount, and that its internal cancer registry system did not show a significant rise.

“According to the latest official VA cancer data, the annual total number of cancer cases among enrolled veterans peaked in 2010 and has been declining since,” the VA said in a statement. “Colorectal and prostate cancer have been declining, while hepatocellular and skin (melanoma) cancer have been increasing. These trends largely mirror national cancer trends.”

There are multiple ways to track cancer rates, and each has limitations.

Studies have found that the billing data used in McClatchy’s analysis, which covered all treatments provided by the VA coded as cancer according to the International Classification of Diseases (ICD), has a tendency to overcount, while data from cancer registries such as the one used by the VA has a tendency to undercount.

McClatchy asked the VA through a Freedom of Information Act request for the internal data that the VA referenced in the statement above.

FOIA Response

In its response to that FOIA request, the Veterans Health Administration said parts of the cancer registry system were not being maintained.

“[The VA Central Cancer Registry] is not a viable source of VA cancer registry data at this time,” the Veterans Health Administration wrote in a response to our open records request. “There are no staff working on [the VA Central Cancer Registry] so it is not functioning to any standard.”

The VA then sent McClatchy raw data from its cancer registry that could not be adjusted for population and did not include a breakdown by service.

While the VA in its statement noted a decrease in cancers from 2010, viewing the raw cancer registry data over a longer period, from 2000 to 2017, showed an increase in some cancers. It was a similar trend to McClatchy’s analysis of billing data over fiscal years 2000 to 2018.

The VA’s cancer registry data shows the number of blood cancers increased 41 percent, while bladder, kidney and ureter cancers increased 70 percent. Skin cancers have increased 48 percent, and brain cancers are up 20 percent. Liver and pancreatic cancers are up 151 percent, although they represent only a small number of actual cases. Prostate cancers are up 9 percent.

McClatchy’s analysis of billing data showed decreases in treatments for brain, respiratory and testicular cancers. The VA’s cancer registry system showed increases in all three. The differing results are due to differences in methods of calculation and the makeup of the two datasets.

“Don’t let anyone convince you that the information you’re pulling is wrong. It’s not wrong. It’s just different from what VA has used,” said Susan Lukas, a former VA official who now advocates for military reservists and veterans through the Reserve Officers Association. “One of the outcomes of your research comparing ICD codes to identify veterans with cancer is that it may be time for VA to use this international system instead of their internal cancer register.”

Why We Did This Story

Military correspondent Tara Copp was in Kuwait, preparing to embed with U.S. forces crossing into Iraq as the 2003 invasion began.

In one early March evening exchange at Kuwait’s Ahmed Al Jaber Air Base, where thousands of U.S. personnel were supporting the air campaign, Copp was in a pickup truck with a member of the National Guard driving around the base. The smell outside, a mixture of jet fuel and burnt air, had already become familiar.

The driver turned to Copp and talked about his concern that as a member of the National Guard, if he got sick, his health care may not be covered.

In the years since, as tens of thousands of service members returned home, questions have been raised about the health effects of their exposure to burn pits, toxic air, cancer-linked firefighting foam or emissions from advanced jets.

During the reporting of this story, the struggle of veterans from previous conflicts to get their chronic illnesses recognized also came up.

That’s why we went looking for the data, and reported what we found.

After several requests for comment, the VA sent McClatchy data on newly diagnosed cancers for calendar years 2000 to 2017 from “CDW ONC RAW,” one of the datasets in its cancer registry system.

That dataset shows increases in cancers such as brain, respiratory and testicular cancers. McClatchy’s analysis of billing data found that the rates for treatments of those cancers decreased.

The reason for the discrepancy is that the billing data McClatchy obtained includes both new diagnoses and returning patients, while the VA’s cancer registry data includes only new diagnoses, known as incidences. In addition, McClatchy adjusted the billing data to account for changes in the total number of veterans getting treated in VA health care facilities over the years.

Since McClatchy could not access patient-specific data from the cancer registry, the VA advised that only its raw incidence numbers should be used. The changes McClatchy reported from the VA’s cancer registry system reflect changes in the raw number of diagnosed cancer cases and not population-adjusted rates.
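
A change in raw numbers of this kind is a plain percent change between two counts, with no adjustment for population. The sketch below shows the arithmetic with invented counts; the function name is hypothetical and the figures are not the VA’s.

```python
def percent_change(start_count, end_count):
    """Percent change between two raw counts, with no population adjustment."""
    if start_count <= 0:
        raise ValueError("start_count must be positive")
    return (end_count - start_count) * 100 / start_count

# Invented counts: 200 newly diagnosed cases in the first year, 250 in the last.
example_change = percent_change(200, 250)
print(example_change)  # 25.0
```

Because nothing is divided by the number of veterans in the system, a rise computed this way can reflect a larger patient population as well as a change in how often a cancer occurs.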

Shirsho Dasgupta is a data reporting fellow at McClatchy’s DC bureau. A graduate of the University of Southern California, he also has reporting experience in India and started his journalism career writing about soccer for outlets like The Guardian and VICE.
Ben Wieder is a data reporter in McClatchy’s Washington bureau. He worked previously at the Center for Public Integrity and Stateline. His work has been honored by the Society of American Business Editors and Writers, National Press Foundation, Online News Association and Association of Health Care Journalists.