Earlier this month, the HHS’ Centers for Medicare and Medicaid Services (CMS) announced two significant changes to how it handles data access requests. Physical copies of data will no longer be available, and data can only be accessed via the CMS’s own virtual data center. Those wishing to access these data will also face higher fees for the privilege.
Given that these decisions come in the wake of recent data breaches at CMS contractors, the changes may appear sensible. In 2022, up to 254,000 Medicare beneficiaries were put at risk when a subcontractor, Healthcare Management Solutions (HMS), LLC, was the victim of a ransomware attack. In 2023, another contractor was the target of a cyberattack, and over 600,000 Medicare beneficiaries had their data exposed after a vulnerability in Progress Software’s MOVEit Transfer solution was exploited.
By hosting data in its own virtual environment, the CMS gains greater control over who can access it and can prevent further unauthorized distribution of the data. Yet the changes embody an issue that has only grown in magnitude over time: how do we balance the risks of health data breaches with the freedom of access needed to generate advances in public health?
Healthcare Data in the United States
Healthcare provision generates near-unparalleled volumes of data. Every interaction between a patient and their provider is documented and fed into a larger management system, culminating in millions of individual records covering hundreds of millions of test results, diagnoses, and outcomes. The quality of the information being collected has also improved. Once, we relied on medical or diagnostic proxies for conditions, but technological advances have cut out the medical middleman and can lead us more directly to the source of ill health.
Yet America’s federalized and privatized healthcare system gives rise to a fractured and complex health information landscape, with relatively few datasets spanning multiple care networks, let alone states. This greatly hinders public health research, which can only exist as a field if there is data from which to draw meaningful conclusions. Without sufficient data, researchers are unable to accurately assess trends, identify risk factors, develop effective interventions, or evaluate the impact of public health initiatives. Thus, the absence of data not only impedes progress in understanding and addressing health challenges but also undermines efforts to improve overall population health and well-being.
One commonly used dataset, the “Veterans Affairs” dataset, is widely used in epidemiological and medical research. Yet its use produces imperfect results. Women represent fewer than 20% of the patients, a significant deviation from what would be expected in the general population, which limits the generalizability of study results. Imbalances also exist in the data for a range of other social and demographic factors known to impact health, and even in the prevalence of certain conditions: more veterans are amputees, for example.
Therein lies the value of the data held by the CMS. The CMS is the United States federal agency that administers all public health insurance, which covers approximately one-third of the entire US population. This more representative sample has been used by researchers in several high-profile projects, including work that supported the development of the Affordable Care Act and work that documented racial disparities experienced by Medicare recipients, as well as in innumerable academic projects. That this research has had a positive impact on public health is undeniable, but perhaps one of its biggest benefits cannot be measured directly in academic papers. Scores of researchers were first exposed to the complexities and intricacies of public health research via the CMS dataset. As 375 academic researchers pointed out in a letter to Chiquita Brooks-LaSure, Administrator at the CMS, the new fees “will disproportionately dissuade research by and training of scholars from disadvantaged backgrounds and institutions.” Losing such voices will only serve to entrench an already inequitable academic landscape.
The CMS dataset is a unique resource on the state of Americans’ health. No other dataset can come close to its complexity, and restricting its access will have immeasurable consequences for the future of public health research and leadership.
Concerns Over Data Privacy
None of this is to say that the CMS’s decisions were completely unfounded. With such vast data reserves come great privacy concerns. In the United States, the use and disclosure of medical data are governed by the Health Insurance Portability and Accountability Act of 1996 (HIPAA). The Act has a broad remit, but one of its central focuses is ensuring that a patient’s protected health information remains private, with strict guidelines on the exact nature of the data that can be distributed, as well as its permitted use. HIPAA violations arising from both known and unknown mismanagement incur strict penalties, which strongly incentivizes compliance. The Office for Civil Rights, which oversees HIPAA enforcement, has collected $129 million in fines for non-compliance to date. Though the average size of these fines is $11,000, they vary widely: the largest penalty ever paid for HIPAA non-compliance totaled $16 million, paid by Anthem Inc. in 2018. A penalty of that size would require a very large breach, but even a fine of several thousand dollars would be enough to deter researchers, whose resources are overseen by universities and funders alike.
Under HIPAA, any researchers in receipt of data from the CMS are responsible for its protection and must sign an agreement setting out the exact conditions of its secure management. Though the data handed over has been stripped of direct personal identifiers, it still contains sensitive information on diagnoses, prognoses, health-related biological indicators, and even zip codes. Such information could be used to trace a patient’s identity, though experts believe that such an event would be unlikely.
But for those managing healthcare databases, the line between data accessibility and susceptibility to cyberattacks is narrow and constantly redrawn. Sensitive information is highly valuable on the black market, which is its own incentive for nefarious actors. A single health record can be worth $1,000, whereas a Social Security number goes for just $1. Attacks like those mentioned above are only expected to become more common, and although they did not involve the research partners of the CMS, they are pushing the CMS to tighten its reins and enforce greater restrictions on access to its data.
And yet, nearly all health researchers agree that the CMS has not fully considered the possible negative consequences of its new approach. In the same open letter to Administrator Brooks-LaSure, researchers emphasized that the recent decision “will put the future of such research in jeopardy, erode academic freedom, directly harm Medicaid and Medicare beneficiaries’ access to and quality of care, hamper the Administrator’s ability to protect the trust fund, and frustrate the agency’s rulemaking toward the goals mentioned above.” The researchers feel that their own work is threatened and that the productivity of public health research will be dampened. The purpose of introducing fees, which amount to thousands of dollars, to access the data is also unclear. Is it to maintain the virtual service? Or to dissuade those wanting to access the data with malicious intent? This lack of transparency is frustrating and does not foster trust in the CMS.
There is also an ethical question over the introduction of fees to a federal dataset. “Open access,” a term often used in academic circles, refers to the principle that research outputs, including data, should be freely available to the public without financial, legal, or technical barriers. This is particularly salient when the body managing and collecting the data is funded by taxpayers. Debates over open access have led many funding bodies (including governmental bodies) to require that any research resulting from their support be made accessible to the public for free. It is therefore counterintuitive that, at a time when research in general is on a more open trajectory, the CMS would choose to introduce fees.
Protecting Public Health
The data available in the CMS dataset has already been used in hundreds of academic studies, but in an age of incredible computing power and advancing statistical methods, it is likely that we are yet to leverage its full potential.
The data could, for example, be used to inform more accurate clinical decision support algorithms. These algorithms, which have been common for decades, are increasingly informed by insights from machine learning models and artificial intelligence. However, these insights can only be as good as the data from which they are derived. An algorithm trained primarily on patients under forty years old may fail to predict the best treatment plan for those over sixty-five. This “data gap” is even more extreme for historically marginalized communities, who have had unequal access to healthcare and have often faced extreme discrimination upon accessing it. Such gaps have the potential to do real harm and, in a healthcare setting, could lead to patients being misdiagnosed or underdiagnosed.
As pointed out by Rachel Miller, MPH, a health policy and payment specialist at the American Physical Therapy Association:
“The Biden administration has repeatedly called for more health data transparency, and HHS has responded by saying that it’s committed to carrying out that mission… These restrictions, which will actually make access to data more difficult, costly, and inefficient, are a harmful move in the exact opposite direction.”
Future algorithmic discrimination can only be prevented if researchers have the freedom and means to access the data they need, and if the research is carried out by more than the privileged few who will be able to afford it under the new rules. Health equity is a much-lauded goal amongst researchers and practitioners alike, and it will remain only a goal unless more nationally representative data, such as that held by the CMS, is made available.
Data-Driven Science Needs Scientific Data
Science is inherently data-driven. Without representative, high-quality data, none of the public health advancements of the last centuries would have been possible. The CMS’s decision runs counter to the broader goal of safeguarding the health of populations in a rigorous and equitable manner. To ensure that future populations can enjoy similar public health advancements, greater, not lesser, access to the CMS dataset is vital.
The post CMS Restricting Access to Healthcare Datasets Will Cause Long Term Damage to Public Health appeared first on HIPAA Journal.