The US Needs a Public CDP
A case study of how a lack of data integration in our public agencies has hamstrung our ability to tackle any systemic problem
There is a data problem at the heart of the US bureaucratic machine. As agencies try to modernize, they are stuck with the reality that tackling systemic issues like housing instability, the opioid and youth smoking epidemics, the rise in veteran suicide, and more requires interagency working groups…that have consistently failed to put a dent in these issues over the last decade. Without getting too deep into administrative law, working across agencies has been a problem since our founding. Writing in the Harvard Law Review in 2012, Professors Jody Freeman and Jim Rossi look at this traditional model of agency collaboration and argue that “interagency coordination is one of the central challenges of modern governance,” and they share common examples of how overlapping or redundant missions have plagued our bureaucracy.
From a data perspective, the public sector tackles problems by using large-scale economic metrics like Gross Domestic Product, inflation and unemployment rates, Consumer and Producer Price Indices, Consumer Confidence Indices, and many more to determine policy. Which report to use, and when, is an endlessly debated topic among economists and policymakers. As technology has developed, it has become clear that these macroeconomic indicators are not nearly detailed enough to set goals for real-time data systems or AI. A robust network of interaction that connects individual needs to these indicators is required to help guide technology and AI principles towards positive outcomes. As an example, the opioid epidemic was only discovered and understood very late in its process of ripping through the country, despite State and Federal data warehouses containing data on prescription rates. Even with data showing this explosion in poor clinical practices, pill prescription rates went unchecked without any public health official taking note. Between 2007 and 2012, 780 million opioid pills, almost entirely legal, poured into West Virginia. The article that uncovered that trend won a Pulitzer Prize for investigative journalism. No one was checking these supposedly critical data systems for issues like this, and the many large consulting firms that held the contracts for these systems have somehow escaped culpability for missing a literal epidemic under their noses.
During that time, in yet other separate databases, overdose death statistics exploded. In yet other databases with other agency owners, arrests for drug crimes and thefts skyrocketed. By the time the issue was classified as an epidemic and a public health emergency in 2017, it was far too late. Alabama had 143 prescriptions per 100 people, Tennessee had 94 per 100, and Kentucky had 97 per 100. As any public health expert knows, epidemics caught early can be managed if not mitigated, but if you only fully understand the extent of an issue when it has already grown to epidemic proportions, unwinding it is another task entirely. To date, hundreds of billions of dollars have been spent on the opioid epidemic with little to no success in abating it. Ironically, cracking down on ‘pill mill’ clinics has pushed the issue into the black market, helping fuel a resurgence in methamphetamines, an explosion in illicit fentanyl use, and other synthetics causing their own issues. This data flows to yet other data systems.
We saw the value in connecting this upstream data as a ‘check engine light’ for potential downstream issues when working with Oklahoma’s ODMHSAS, the agency that runs the state’s opioid-related programs. Combining insights allowed the team to get ahead of trends, change health surveillance tactics, and better assign funding to community programs, among other things. The opioid epidemic has shifted every year: from pill mill doctors, to the introduction of synthetics and fentanyl, to bans and black market heroin taking its place, to intersecting with the COVID-19 pandemic, and more. The idea that legacy systems tracking only where overdoses occur, the ethnicity of who overdosed, and some basic in-patient stats could be effective is patently absurd. Yet this is the reality for most states, even after billions in federal and lawsuit-related funding.
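To make the ‘check engine light’ idea concrete, here is a minimal sketch in Python. It is not the system used in Oklahoma; the county names, the monthly figures, and the 25% threshold are all invented for illustration. It simply flags any county whose latest upstream reading (prescriptions per 100 residents) jumps well above its own trailing average.

```python
from statistics import mean

# Hypothetical monthly opioid prescriptions per 100 residents, by county.
# Real figures would come from a prescription monitoring system.
prescriptions_per_100 = {
    "County A": [6.0, 6.1, 5.9, 6.0, 6.2, 9.5],
    "County B": [7.0, 7.1, 6.9, 7.0, 7.0, 7.1],
}


def check_engine_light(series, baseline_months=5, threshold=1.25):
    """Return True when the latest month exceeds the average of the
    preceding `baseline_months` months by more than `threshold` times.

    Both parameters are illustrative assumptions, not clinical guidance.
    """
    baseline = mean(series[-(baseline_months + 1):-1])
    return series[-1] > baseline * threshold


# Counties whose upstream indicator has spiked and deserve a closer look.
flagged = sorted(
    county
    for county, series in prescriptions_per_100.items()
    if check_engine_light(series)
)
print(flagged)
```

The point of a rule this simple is that it runs on upstream data every month, long before overdose or arrest statistics in other systems would show the same problem.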
As State and Federal agencies work to slowly tether these indicators together, it is becoming clear that most are ill-equipped to deal with the data in these systems even at the macro level of prescription, crime, and overdose rates by county. In my work on Google Cloud’s public sector team, I helped States connect communication systems that hold data on people researching care and treatment options, using that data as a proxy for an early warning system. Communications data that private sector firms use to understand rising interest in products from makeup to electric cars is totally absent as an indicator in our public sector agencies.
At one time, I was running over one hundred individual advertising campaigns for programs across the Department of Health and Human Services on the Google Ads platform. Every part of HHS that was running ads, from the Centers for Disease Control promoting opioid treatment, HIV/AIDS awareness, and the flu vaccine, to the FDA running ads for tobacco cessation, to the Substance Abuse and Mental Health Services Administration (SAMHSA) running suicide prevention campaigns, to the Health Resources and Services Administration (HRSA) recruiting more organ donors, came up with its own target audience by demographic and socio-economic risk separately. When it came time to run the ads and perform experiments on results, each initiative looked at its own campaigns in a silo, and so didn’t realize that in every case, almost half of the targeting criteria lined up. If you were a single man living in a rural impoverished area, especially in states hit hardest by the above issues like West Virginia, Ohio, or New Hampshire, you were being absolutely bombarded with hundreds if not thousands of individual ads from your local and federal public health authorities. Everything you saw begged you to get help, stop smoking, make sure to go to work, fill out a different set of forms depending on the issue you had, don’t trust your local sketchy doctors, and so on. Ad fatigue is a phenomenon where your audience sees your ads so often that they become bored with them and stop paying attention. It sets in quickly, after 5-10 ads of the same type. These people were getting thousands of these ads each, if not more. When we brought this data to program leaders, they stared at us blankly. The CDC controlled its own programs, so why would it care if the FDA was also running anti-tobacco ads or SAMHSA was running ads about suicide prevention? Their work was opioid related. Failure doesn’t stop government programs; funding drying up does.
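The overlap described above is easy to measure once the targeting criteria sit side by side. The sketch below is illustrative only: the campaign names echo the agencies mentioned, but the criteria sets and the sample person are invented, and real criteria would come from exports of each agency’s ad account.

```python
from itertools import combinations

# Hypothetical targeting criteria per campaign (invented for illustration).
campaigns = {
    "CDC opioid treatment": {"male", "rural", "low_income", "age_25_54"},
    "FDA tobacco cessation": {"male", "rural", "low_income", "age_18_34"},
    "SAMHSA suicide prevention": {"male", "rural", "veteran", "age_25_54"},
}


def overlap_share(a, b):
    """Share of the smaller criteria set that also appears in the other set."""
    return len(a & b) / min(len(a), len(b))


def campaigns_reaching(person_traits):
    """Names of campaigns whose every criterion matches this person."""
    return sorted(
        name for name, criteria in campaigns.items() if criteria <= person_traits
    )


# Pairwise overlap between every two campaigns.
overlaps = {
    (x, y): overlap_share(campaigns[x], campaigns[y])
    for x, y in combinations(sorted(campaigns), 2)
}
for pair, share in overlaps.items():
    print(pair, f"{share:.0%}")

# One hypothetical person and the campaigns that all target him at once.
person = {"male", "rural", "low_income", "age_25_54", "veteran"}
print(campaigns_reaching(person))
```

Nothing here requires sharing personal data between agencies; comparing the criteria alone is enough to show when separate programs are all aiming at the same person.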
In the industries that rely on data from these sorts of communications systems to better understand their customers’ shifting needs, the transparency-as-policy approach is not a piece of paper guiding action; it’s a literal technical system. The colloquial term for a system that connects individual-level changes in awareness, consideration, or purchase behavior to overall metrics of profitability, in a place that is easily connectable to other parts of the organization, is a ‘Customer Data Platform,’ or CDP. A CDP is not a discrete piece of technology, and different techniques exist in different places to build one, but fundamentally it is a series of methods to get a 360-degree view of a customer. CDPs rely most on the systems that touch individuals directly, which grow in number every day: transactional and web analytics systems, CRM platforms, leveraged or purchased third-party data sources, and so on. CDPs then overlay demographic information, behavioral data, purchase history, and other customer-related data points onto these more frequently touched systems. The CDP then typically uses machine learning and other advanced analytics techniques to segment and analyze the data, providing insights into customer behavior, preferences, and needs, which in turn connect back to production systems like recommendation models to direct users towards their likely needs. The key truth that CDPs have unlocked is that behavioral traits, meaning where you are in your life and what you need, are far more important to tactical program execution than the demographic traits assigned at birth or by circumstance.
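As a rough illustration of the pattern, and not any particular vendor’s product, the Python sketch below stitches events from three hypothetical source systems into one profile per person and then assigns a behavioral segment. The event fields, action names, and the two-signal rule are all assumptions; a real CDP would typically use a trained model where this sketch uses a hand-written rule.

```python
from collections import defaultdict

# Hypothetical events from three separate systems, keyed by a shared ID.
events = [
    {"id": "u1", "source": "web", "action": "viewed_treatment_page"},
    {"id": "u1", "source": "crm", "action": "called_helpline"},
    {"id": "u2", "source": "web", "action": "viewed_homepage"},
    {"id": "u1", "source": "transactions", "action": "filled_form"},
]


def build_profiles(events):
    """Merge events from every source into one profile per ID."""
    profiles = defaultdict(lambda: {"sources": set(), "actions": []})
    for event in events:
        profile = profiles[event["id"]]
        profile["sources"].add(event["source"])
        profile["actions"].append(event["action"])
    return dict(profiles)


def segment(profile):
    """Toy behavioral rule standing in for machine-learned segmentation."""
    care_signals = {"viewed_treatment_page", "called_helpline"}
    hits = care_signals.intersection(profile["actions"])
    return "actively_seeking_care" if len(hits) >= 2 else "browsing"


profiles = build_profiles(events)
for user_id, profile in sorted(profiles.items()):
    print(user_id, sorted(profile["sources"]), segment(profile))
```

Note that the segment comes entirely from what the person did across systems; no demographic field is needed to decide that u1 is looking for care.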
Our public sector needs a universal customer data platform that crosses agencies, empowered to help mission leaders but protected from technologically illiterate individual agency leadership chains. Even passive, fully anonymous metrics carry huge insights. When I was working with the United States Postal Service, for example, users checking the website a dozen times in a short period was highly correlated with package shipment issues. Yet our public sector has totally inverted this pyramid of data value. General data points like demographic details and annual income reports, which in a well-run system should be the macro variables that help balance what you are seeing at the micro level, instead take precedence, while the digital touchpoints that represent the most common relationship between citizens and public sector agencies are rarely if ever considered in policymaking. Despite the CDC having nearly 50M visitors monthly, it does basically nothing with its website other than use it as a billboard for PSA-style information. It’s a tremendous waste of one of the most valuable data sources public health has at its disposal.
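A signal like the repeated website checks can be detected with nothing more than anonymous visit timestamps. The following Python sketch is an assumption-laden illustration, not how USPS does it: the function name, the 12-visit minimum, and the 24-hour window are invented to stand in for “a dozen times in a short period.”

```python
from datetime import datetime, timedelta


def has_burst(timestamps, min_visits=12, window=timedelta(hours=24)):
    """True if any `window`-long span contains at least `min_visits` visits.

    The defaults are illustrative guesses, not measured thresholds.
    """
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window`.
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 >= min_visits:
            return True
    return False


base = datetime(2024, 1, 1, 8, 0)
# Twelve checks within a few hours: likely someone worried about a package.
anxious = [base + timedelta(minutes=30 * i) for i in range(12)]
# Twelve checks spread over a month: ordinary usage.
casual = [base + timedelta(days=3 * i) for i in range(12)]

print(has_burst(anxious), has_burst(casual))
```

No name, address, or demographic detail is involved; the pattern of visits alone is enough to route attention to a likely shipment problem.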
A CDP would also have the benefit of transparency, a critical requirement if we do not want the age of AI to widen inequality and empower sclerotic bureaucratic actors to continue in their isolated fiefdoms. We need to know what data agencies are collecting on individuals, how it is managed, what machine learning models are being created, how those models connect to macroeconomic indicators, and the decisions being made from them. Diffusing constituent data across a wide range of IT management systems with no connection between them in the name of security accomplishes neither security nor efficacy. It certainly accomplishes a lack of visibility into data usage.
A unified and transparent CDP would give defense and intelligence agencies, as well as overseas-focused agencies like the State Department and USAID, a model for how to build a successful and robust ecosystem while protecting critical data assets. Today the islands of control within sub-branches of the Department of Defense create enormous barriers to modernization, and they are facing huge recruitment challenges for the first time in decades. A paradigm shift is required, and interwoven systems analogous to a CDP already exist within these agencies as logistics management systems layered on top of personnel systems, equally complex as they try to deal with rapidly changing needs in areas of international conflict.
To be clear – I am not arguing for a monolithic, one-size-fits-all system of the kind that defined the enormous failures of the data-center-focused era of the 2000s. A public CDP, like its commercial counterparts, is less a unified storage location and web UI that you sign into, and more a set of architecture patterns that sits close to the IT stack: a theory of how to connect data across systems, brought to life. I have accessed data from customer data platforms inside data warehouses and Excel spreadsheets, pulled it into HR applications, queried it in data science notebooks to determine financial ROI on products or conduct sentiment and topic analysis, tapped into it to build audiences for new advertising campaigns, and so on. It is a data-first posture that helps guide frontline decision making as the critical backbone of any organization.
For example, Sephora connects 200+ data points between online and in-store interactions. Airbnb’s entire ecosystem relies on a similar 100+ data point connection between the app, customer feedback, and booking systems. Verizon’s includes over 200 data points between customer service calls, digital engagements, app usage, and network feedback. In each of these systems there may be a common platform that these companies use internally to tap into that data, but typically the ‘platform’ is really just an API, or application programming interface, call. As APIs become standard and easy to use across any modern technology, it’s the connectivity that matters far more than what you are logging into.
The construction of a public CDP would be a revolutionary change in bureaucratic process. Agencies today see their expertise as the frontline defense against issues of the day, whereas they really should be the final arbiters of data-defined issues that arise from a clear-eyed understanding of social and economic conditions. Centralizing authority and management of a CDP under existing frameworks (GSA’s DAP, OMB’s USDS team, the Data.gov folks, etc.) would turn MOU-style working groups into fully functional, data-first organizations for pennies on the dollar. Rather than reinvent the wheel of capability, as DoD agencies are starting to do with Information Operations, or as agencies are struggling to do with anything related to enterprise IT security and compliance processes, central teams should build outwards towards mission teams and provide staffing capital as well: sit with them, help them, and act as a team member who simply reports back up to the central CDP team. This is very common in the private sector, and it’s time to break the barriers of siloed control in our bureaucracy. We have no chance of tackling issues of national importance if we don’t.