Next Gen SEO (SEaLLMO?)
In my last screed, I foretold a scenario where ChatGPT et al. share subscription, advertising, and other revenue with content creators (kinda like a Spotify) in exchange for access to their content for LLM training. So here’s what happens next… AI companies figure out LLM attribution and begin paying / sharing revenue with content creators whose content is used to train their models. Then, all of the content creators start begging to be included in chatbot LLMs rather than suing or blocking them - asking: “How do I get my content included and referenced by LLMs?” Because they’ll get paid not just when a model is trained on their content, but also anytime it is determined that new content generated from their original content has been delivered.
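For the curious, here is a minimal sketch of what a Spotify-style pro rata split could look like, assuming the attribution problem has already been solved and each delivered response can be credited to a creator. Everything in it (the `split_revenue` function, the creator names, the idea of counting one "attribution event" per credited response) is a hypothetical illustration, not how any AI company actually pays anyone.

```python
from collections import Counter


def split_revenue(pool_cents, attribution_events):
    """Split a revenue pool pro rata by attribution count.

    pool_cents: total revenue to share, in integer cents (hypothetical).
    attribution_events: one creator name per credited response (hypothetical).
    """
    counts = Counter(attribution_events)
    total = sum(counts.values())
    if total == 0:
        return {}

    # Round every share down to a whole cent first.
    payouts = {creator: pool_cents * n // total for creator, n in counts.items()}

    # Rounding down can leave a few cents unassigned; hand them out one at a
    # time, starting with the most-attributed creators.
    leftover = pool_cents - sum(payouts.values())
    for creator, _ in counts.most_common(leftover):
        payouts[creator] += 1
    return payouts


events = ["creator_a"] * 3 + ["creator_b"] * 2 + ["creator_c"]
print(split_revenue(1000, events))
```

Working in integer cents keeps the payouts summing to exactly the pool, which matters more than it sounds once millions of tiny shares are involved.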
Sound familiar? It should. This is (sort of) how the search engine optimization (SEO) business came to be. Remember when I said here that “…all Web1/2 companies / concepts will be rebuilt for Web3 [i.e., GPU-accelerated compute] to leverage the economic and practical opportunities unlocked (much like every sector was rebuilt for web and for social and for mobile and for cloud and so on)…”? This is one of those concepts.
Today, the SEO industry is about $80B annually and growing somewhere around 20% per year (according to some Googling, because ChatGPT, that little minx, refused to give me a straight answer). So it shouldn’t take Jobsian creative forethought to imagine spending shifting rapidly to content optimization for LLMs. (SEO for LLMs? Hmm, too long. LLM Optimization? Nah, that’s taken. Search Engine and Large Language Model Optimization (SEaLLMO)? Sure, that’s catchy…)
Already, SEO experts are weighing in on optimizing content for LLM ingestion (and egestion <- it’s actually a word, I looked it up). E.g., a quick Google search yields this digital agency piece that helps you unlock the power of SEO, LLMs, and Knowledge Graphs (according to the blog’s headline). And ChatGPT itself told me what language, prompts, formatting, data sets, etc. to use to optimize content for LLMs.
Perhaps the biggest challenge, however, comes not in trying to figure out how to increase your chances of an LLM using your content, but in determining that the *new* content the model generates is based on your content that participated in training said model. With Google et al., search output is the actual content from (and links to) the referenced page; by contrast, with LLMs, the output is newly generated content that the model created based on how it was trained. So how can we tell that the brand spankin’ new itinerary ChatGPT planned for my family trip to Italy next summer is based on content scraped from Fodors.com, TripAdvisor.com, and nomadicmatt.com? Or, more importantly, how do Fodors, TripAdvisor, and Nomadic Matt know that their content was used to create my itinerary?
Answer: I don’t know.
But I bet ChatGPT does. And I suspect the longer answer to how to track content through ingestion / training and then to creation / delivery of novel content involves blockchain. Why? Because it requires an impenetrably secure and neutral third party with an immutable ledger that catalogs the cascading tree of relationships and engagement with content, so that contributors are compensated accurately and fairly. (Mouthful, eh?)
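To make the "immutable ledger" idea a bit more concrete, here is a toy sketch of an append-only, hash-chained log, where each entry records which earlier content it derives from. To be loud about the assumptions: this is a single-process illustration, not a blockchain (there is no consensus, no network, and no neutral third party), and the `AttributionLedger` class, its field names, and the content IDs are all made up for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body):
    """SHA-256 over a canonical JSON encoding of an entry's fields."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class AttributionLedger:
    """Hypothetical append-only log; each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, content_id, event, parent_ids=()):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "content_id": content_id,
            "event": event,                # e.g. "ingested" or "generated"
            "parents": list(parent_ids),   # content this item derives from
            "prev_hash": prev,
        }
        self.entries.append({**body, "hash": entry_hash(body)})
        return self.entries[-1]["hash"]

    def verify(self):
        """Recompute every hash; any edit to an earlier entry breaks the chain."""
        prev = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


ledger = AttributionLedger()
ledger.append("article-1", "ingested")
ledger.append("itinerary-42", "generated", parent_ids=["article-1"])
print(ledger.verify())
```

The `parents` field is the "cascading tree" part: follow it backwards from a generated item and you get the list of contributors to pay. The hard part, deciding what goes in `parents` in the first place, is exactly the attribution question this sketch does not answer.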
So we’ve got the content, the LLM that ingests it all and creates new content, the revenue share model, and now the tagging / tracking mechanism. Perhaps there’s a business in here for automating this all for content creators. Or an exchange of sorts that matches LLMs with content. Or maybe it’s worth investigating: (1) buying (a ton of) content (on the cheap), (2) optimizing the acquired content for LLM ingestion / training / generation, and (3) tagging and tracking that content’s usage and influence as it makes its way through the interconnected world of LLMs. Regardless, we’re amidst a tectonic shift in not just how content and information are discovered and how the creators of that content are compensated, but also in the tooling and structure of how content is created.