Serving ads online immediately ran into regulatory problems. Advertising is governed by a huge body of law and policy controls over its content, including an entire wing of the Federal Trade Commission dedicated to the task. Court cases and later laws in the early 1900s fought deceptive advertising: snake oil salesmen touting fake medicines, for example, or competitors labeling their own products as more popular brands to boost sales.
By the 2000s, marketing and advertising regulations that had mainly been created to constrain ‘false advertising’ had turned toward restricting an expanding range of media content on the one hand and addressing privacy and security concerns on the other. The FTC sued Google in 2012 for misrepresenting the level of tracking in its ads, Facebook in 2019 for privacy violations, and Amazon in 2023 for using marketing and slick UI tactics to essentially auto-enroll people in Prime and then make subscriptions hard to cancel, among many other cases.
Content has been a constant thorn in the side of tech companies with digital ads platforms as well. When I was in Google Ads working with small businesses around 2014-2015, groups of locksmiths would constantly drive each other's Google reviews down. Locksmith is one of the most expensive keywords, because who googles ‘locksmith near me’ unless they need one at that moment? The price per click could run up to a third of the total revenue a job would bring in. Rivals would click on each other's ads, and Google would use AI and ML systems to try to filter out the invalid click activity. The problem was that budgets were spent immediately from click fraud, while the money was only refunded on the end-of-month invoice, so the scheme effectively worked as a way to dry up competitors' ad budgets and drive them out of local markets. Groups would make fake local listings to ‘own’ the local markets, like a Google Maps mafia. Google would then be sued (in 2017) by legitimate actors facing this nightmare of an ads market, as it had been three years earlier for essentially the same reason.
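The real filters behind that invalid-click detection are proprietary and far more sophisticated, but a minimal sketch of the rate-based heuristic such systems might start from could look like this; every name and threshold here is hypothetical:

```python
from collections import defaultdict

class ClickFilter:
    """Toy invalid-click heuristic: flag any source that clicks the same
    advertiser's ads more than a few times within a rolling window."""

    def __init__(self, window_seconds: int = 3600, max_clicks: int = 3):
        self.window_seconds = window_seconds
        self.max_clicks = max_clicks
        self._clicks = defaultdict(list)  # (source, advertiser) -> timestamps

    def is_valid(self, source_id: str, advertiser_id: str, ts: float) -> bool:
        key = (source_id, advertiser_id)
        # Keep only clicks inside the rolling window, then test the rate.
        recent = [t for t in self._clicks[key] if ts - t < self.window_seconds]
        recent.append(ts)
        self._clicks[key] = recent
        return len(recent) <= self.max_clicks
```

Production systems layer device fingerprinting, IP reputation, and learned models on top, but the shape is the same: score each click at ingestion time rather than in a batch audit later.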
Facebook is being sued by the medical industry for passive scraping of patient data, journalists are suing Facebook and Google for having better ads, and Google is being sued by the Department of Justice, which is broadly trying to break up its ads monopoly. The right-hand column of the ads policy section of the Google Ads help center reads like a laundry list of lawsuit results and industry disputes. Viewed over time, however, it also reads as a list of the expanding capabilities AI has brought to digital media policy.
Digital ads had a unique problem: they did not flow through a single publisher. Advertisers could push ads to various exchanges automatically, and with such a dynamic process, pausing for months of manual policy review across over a billion users is simply not feasible. Advertisers looking to capitalize on the current moment would simply take their business elsewhere, so machine learning and AI development focused on this area of policy compliance from the very beginning: vision models to identify CSAM, tobacco, or alcohol content; NLP models to identify false or misleading claims or ads that violate CCPA; models purpose-built to enforce trademark rules and other compliance requirements; and more. The locksmith ad fraud issue was eventually solved by simply redirecting credits for Invalid Activity back into advertiser accounts in real time. Going through each potentially fraudulent click one by one is impossible, so scaled problems must be met with scalable solutions.
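Continuing the toy ClickFilter sketch from above, the real-time fix amounts to charging and crediting in the same step rather than waiting for the invoice cycle. Again, this is a sketch under hypothetical names, not Google's actual billing logic:

```python
from dataclasses import dataclass

@dataclass
class Account:
    advertiser_id: str
    budget_cents: int

def settle_click(account: Account, cost_cents: int,
                 source_id: str, ts: float, f: ClickFilter) -> None:
    """Charge for a click, then immediately credit it back if the click
    is judged invalid, instead of waiting for the end-of-month invoice."""
    account.budget_cents -= cost_cents
    if not f.is_valid(source_id, account.advertiser_id, ts):
        account.budget_cents += cost_cents  # real-time Invalid Activity credit
```

The design point is that the competitor-drains-your-budget attack only works because of the lag between spend and refund; collapsing that lag to zero removes the incentive without adjudicating each click by hand.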
With the push to first-party data systems, many privacy and security questions have centered on the relationship between consent and data tracking. There is a recent legal tradition favoring first-party data because you, the consumer, provide it to companies when you engage with their apps and sites directly. That said, it’s precisely because of that fact that data tracking on first-party properties can be far more granular. Ireland’s recent case against Facebook shows that regulators are uncomfortable with this first-party tracking as well, essentially expanding GDPR to say that while all privacy rights declared in the law are equal, some are more equal than others. Companies must now outline their ‘legitimate interest’ in collecting user data on their own platforms to users in consent forms, rather than just to regulators as the law stipulated. Facebook’s claim that the interest was ‘targeted advertising,’ clearly a legitimate interest of Facebook’s, was not enough when balanced against users’ privacy rights, the Irish regulator held.
User policy always falls into a gray area like this, where ideally you have hard-and-fast rules about which actors and methods can handle user data, but those rules only stop the people who can't get past the hurdles, and then you’re right back where you started. YouTube has had a constant issue with nudity and pornographic material on its platform for years, because there are legitimate artistic reasons for nudity to be present in a video. What is YouTube supposed to do, build a model for whether a video is ‘art’ or not? Actually, yes: as this whitepaper shows, combining better automated systems with direct incorporation of user feedback has been extremely valuable in walking this line. Since I worked inside Google Ads over five years ago, that policy coverage has expanded to political content, lead form ads, financial products, editorial content, and more, all monitored with the aid of dedicated machine learning and AI models.
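The whitepaper's actual architecture is more involved, but the core loop can be sketched simply: let the model decide the clear cases, route borderline or user-flagged content to humans, and feed the human verdicts back in as training labels. The thresholds and names below are illustrative assumptions, not YouTube's real policy:

```python
def triage(video_id: str, violation_score: float, user_flags: int) -> str:
    """Toy human-in-the-loop moderation: the classifier handles the
    obvious cases; everything ambiguous goes to a reviewer."""
    if violation_score >= 0.95:
        return "remove"        # high-confidence policy violation
    if violation_score <= 0.05 and user_flags == 0:
        return "allow"         # high-confidence clean, no complaints
    return "human_review"      # borderline score or user flags

def record_review(video_id: str, reviewer_verdict: str, training_set: list) -> None:
    # Each human decision becomes a labeled example, so the model
    # gradually learns the 'art vs. not art' line reviewers are drawing.
    training_set.append((video_id, reviewer_verdict))
```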
As media systems diffuse over time, endless streams of consent forms will pop up for agreements about data tracking and portability between systems. Companies like Shopify and Pinterest, which have formed a data-based partnership, will have to figure out how to manage users who grant different levels of access to each platform. Say you’re an ads platform manager at Shopify, and a customer did not grant Shopify the right to view their activity on other platforms, but you're looking at data coming in from Pinterest, which that same user did grant the right to share info with third parties. What happens?
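There is no standard answer yet, but one conservative rule is to require consent on both ends of the transfer: the sender may share out, and the receiver may view cross-platform activity. A minimal sketch of that rule, with entirely hypothetical grants matching the scenario above:

```python
from enum import Flag, auto

class Consent(Flag):
    NONE = 0
    SHARE_OUT = auto()   # user lets this platform share their data with partners
    VIEW_CROSS = auto()  # user lets this platform view their activity elsewhere

# Hypothetical per-user, per-platform grants for the scenario above.
grants = {
    ("user_1", "pinterest"): Consent.SHARE_OUT,
    ("user_1", "shopify"): Consent.NONE,
}

def may_flow(user: str, sender: str, receiver: str) -> bool:
    """Allow a transfer only if the user consented on BOTH ends."""
    return (Consent.SHARE_OUT in grants.get((user, sender), Consent.NONE)
            and Consent.VIEW_CROSS in grants.get((user, receiver), Consent.NONE))

print(may_flow("user_1", "pinterest", "shopify"))  # False: Shopify lacks consent
```

Under this rule the Pinterest data never reaches the Shopify ads manager, because the user's grant to Pinterest cannot substitute for the grant they withheld from Shopify.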
AI and machine learning are tools nearly purpose-built for understanding and applying consistent policy controls across large data ecosystems. At Google, data is a common good, and I had the ability to query data tables that teams around the world were putting together. However, if I was doing research on YouTube, for example, and wanted to query a table that held sensitive user data, the system would flag that something about my job code, location, typical work, device ID, or something else was causing it to identify me as someone who shouldn't have that level of access. It would redirect me to a manual form I could use to request access from the relevant team directly. If approved, it wouldn't be an issue again until the approval expired and I had to re-request access, typically on an annual basis.
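I don't know how Google's internal system is built, but its observable behavior maps onto a simple attribute-based access pattern with expiring, manually granted exceptions. A sketch under those assumptions, with hypothetical field names:

```python
from datetime import date, timedelta

# (employee_id, table_name) -> expiry date, set by the owning team's review
approvals: dict[tuple[str, str], date] = {}

def check_access(employee: dict, table: dict, today: date) -> str:
    """Toy attribute-based access check: sensitive tables require either
    matching attributes or a still-valid manual approval."""
    if not table["sensitive"]:
        return "allow"
    if employee["org"] == table["owner_org"]:
        return "allow"  # attributes line up with the owning team
    expiry = approvals.get((employee["id"], table["name"]))
    if expiry is not None and today <= expiry:
        return "allow"  # previously approved, not yet expired
    return "redirect_to_manual_request"  # route to the relevant team's form

def grant_access(employee_id: str, table_name: str, today: date) -> None:
    # Approvals expire after a year, forcing the annual re-request.
    approvals[(employee_id, table_name)] = today + timedelta(days=365)
```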
Automated systems like this for data access could revolutionize the privacy space in the era of shared data platforms. Right now, most of that process is managed through API calls directly between platforms that simply return empty datasets if you try to access data beyond your credentials. That’s fine for now, but as the landscape gets more complex, there is a large greenfield space for an organization to manage that ecosystem. The market for data and security firms is exploding, with industry estimates reaching nearly $6B this year and doubling over the next decade. Google publicly touts the fact that it has “blocked or removed over 5.2 billion ads, restricted over 4.3 billion ads and suspended over 6.7 million advertiser accounts. And [Google] blocked or restricted ads from serving more than 1.57 billion publisher pages and across more than 143,000 publisher sites, up from 63,000 in 2021.” The AI policy world is just getting started, and the lessons these first movers learn in the AI policy debate will apply to regulation in other domains. Watch this space.