All Roads Lead to Open-Source
Why open-source AI is the most likely long-term game-theoretic optimum
Welcome to Fully Distributed, a newsletter about AI, crypto, and other cutting-edge technology.
Can it be good business to give something away for free?
At first glance, that's what open-source software seems to be: freely available code that anyone can download and use however they want.
So how can an open-source company like Stability AI raise $100 million at a unicorn valuation? What drives developers to pour hundreds of hours of free labor into projects like LangChain and LlamaIndex? What motivated Meta to open-source its 65-billion-parameter Large Language Model? Are these actions purely altruistic?
In this essay, I will make the case for open-source software in the context of AI development. In particular, I will demonstrate that open-source can:
Provide a superior product for end-users
Offer a more sustainable business model in the long run
Let’s dig in.
What is Open-Source?
So, what exactly is open-source software?
At its core, open-source software refers to programs whose source code is made available to the public, allowing anyone to view, use, modify, and distribute the code. This is in contrast to closed-source software, where the source code is proprietary and kept secret.
Imagine a communal dinner where villagers gather to cook together, openly sharing recipes and culinary secrets. They tweak and improve upon each dish, embracing their diverse tastes and perspectives. Each cook can borrow a recipe and use their own ingredients to create the dish. This is the world of open-source software, where the "recipes," or source code, are accessible for everyone to experiment with, adapt, and share in a spirit of collective creativity.
In contrast, closed-source software is akin to an exclusive restaurant with a secret menu, where a single, masterful chef and their skilled team meticulously prepare the entire meal, delighting patrons with unique and exquisite dishes. The "secret sauce" is carefully guarded; only the chef can change the recipe, and users are forbidden from replicating the dishes at home.
One approach is decentralized and collaborative, while the other is top-down and centralized. So what does this mean for the end-users who consume the final product?
Advantages of Open-Source AI
Most end users are unaware of whether a given piece of software is open- or closed-source. This distinction becomes even less visible in AI, as users only interact with an application built on top of a model, which can be either open-source or proprietary.
Nevertheless, open-source software offers a number of significant direct and indirect benefits for its users:
Open models, private data: AI models are increasingly becoming essential infrastructure for large enterprises and governments, which often prioritize data security, control, and cost-effectiveness. Closed-source models may transfer too much power to private AI companies, whereas open, auditable, and interpretable models allow organizations to fine-tune them with their own private or regulated data and maintain control within a secure perimeter, while also reducing costs associated with licensing fees.
Customization and optimization: Closed-source models often come with restrictions (and sometimes censorship) on their potential use cases. Open-source models allow developers to modify and customize the model to fit their specific needs (e.g. a niche use case or industry). Additionally, open-source models allow for easier optimization since developers can modify the model’s architecture and parameters directly, which may lead to better performance and more efficient resource usage.
Interpretability, Auditability, Security: The transparency of open-source AI models fosters trust, facilitates audits, and provides educational opportunities, ensuring both security and quality. By examining the underlying code and documentation, individuals can enhance their understanding of AI systems. The "many eyes" effect, which relies on the active engagement of the developer community, helps identify and fix vulnerabilities that might otherwise go unnoticed.
Decentralization of AI Ecosystem: Open-source fosters a more inclusive and diverse development environment, reducing the concentration of power and promoting a more equitable future for AI technologies. By encouraging collaboration, lowering barriers to entry, and enabling customizability, open-source AI models contribute to faster innovation and a wider range of participants in AI research and development.
Beyond the benefits to the end-user, open-source will be a superior business model.
Let’s explore why.
Open-Source AI: A Game-Theoretic Perspective
Today, foundational language models follow scaling laws, where larger models typically yield better performance. However, there are theoretical limits to this scaling (especially in NLP and image generation), which will manifest in two ways:
Dataset asymptote - a point at which all large language models will be trained on the entirety of public data (i.e. the internet). As we get closer to this limit, adding more training data will yield diminishing marginal returns. Note: some additional gains might be achieved via longer training and RLHF.
Performance asymptote - a point at which humans are unable to distinguish between the performance of different general-purpose AI models (similar to how most viewers are unable to tell 4K from 8K TV resolution).
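To make the "diminishing marginal returns" point concrete, here is a minimal sketch of a toy scaling curve. It assumes loss falls as a power law in training tokens toward a fixed floor, which is a common way to write scaling laws. The function names and all constants below are hypothetical and chosen only for illustration; they are not fitted to any real model.

```python
def toy_loss(tokens: float,
             floor: float = 1.7,       # hypothetical irreducible loss
             scale: float = 400.0,     # hypothetical constant
             exponent: float = 0.3) -> float:  # hypothetical constant
    """Toy scaling curve: loss decays as a power law in data, toward a floor."""
    return floor + scale / tokens ** exponent


def gain_from_doubling(tokens: float) -> float:
    """How much the toy loss improves when the dataset is doubled."""
    return toy_loss(tokens) - toy_loss(2 * tokens)


# Each doubling of the dataset buys less improvement than the previous one.
for tokens in (1e9, 1e10, 1e11, 1e12):
    print(f"{tokens:.0e} tokens: loss={toy_loss(tokens):.3f}, "
          f"gain from doubling={gain_from_doubling(tokens):.4f}")
```

In this toy setup, the gain from doubling shrinks at every step while the loss never drops below the floor, which is the shape of the dataset asymptote described above.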
As we gradually approach these asymptotes, the game-theoretically rational move will increasingly be to compete on cost. The only remaining sources of a performance edge might be customization and/or optimization of models for specific use cases, as well as adding private proprietary data to the training sets. However, as discussed in the previous section, open models are much better positioned for this given their inherent flexibility, transparency, and lack of licensing fees.
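One way to see why indistinguishable products push competition toward cost is the textbook Bertrand price-competition model. The sketch below is my own illustration of that standard model, not a description of any real market: the function name, starting prices, and marginal cost are all hypothetical, and prices are integers (think cents) to keep the arithmetic exact.

```python
def simulate_price_war(start_prices, marginal_cost, tick=1):
    """Toy Bertrand competition between sellers of identical products.

    Each round, the most expensive seller that can still undercut does so,
    pricing one tick below the cheapest rival but never below marginal cost.
    Returns the final prices and the number of price cuts that occurred.
    """
    prices = list(start_prices)
    rounds = 0
    while True:
        # Price that would just undercut the current cheapest seller.
        target = max(min(prices) - tick, marginal_cost)
        # Sellers who are still priced above that undercutting price.
        movers = [i for i, p in enumerate(prices) if p > target]
        if not movers:
            break  # nobody can cut further without losing money
        priciest = max(movers, key=lambda i: prices[i])
        prices[priciest] = target
        rounds += 1
    return prices, rounds


final_prices, cuts = simulate_price_war([100, 90], marginal_cost=10)
print(final_prices)  # [10, 10]
```

With identical products, the undercutting only stops once every seller is priced at marginal cost. Any durable margin in this toy model has to come from something rivals cannot copy, which is where customization and private data enter the argument.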
This evolving landscape may lead to a world where all big tech companies and other major AI labs are forced to open-source their models and focus on monetization through customizations, scaling, hosting, and deployment services instead. In other words, open-source (or more accurately, open-core) could become the only viable long-term business model in the AI industry.
Some companies, like Stability AI, are already front-running this trend by offering open-source models and focusing on providing value through custom models trained on private data for enterprises. OpenAI has also been developing custom variants of its GPT-4 model for companies like Morgan Stanley and Duolingo. For now, OpenAI is the only game in town, but as Google, Meta, and others join the AI race, competitive pressure to open-source models will greatly intensify.
Risks and Considerations of Open-Source
While open-source offers numerous benefits, there are potential risks and drawbacks to consider, such as:
Misuse: The open nature of the software can enable malicious actors to exploit vulnerabilities or use the technology for nefarious purposes. Examples include using AI for hacking (uncovering vulnerabilities) or generating deepfakes. Appropriate regulation should target the use of technology, rather than the technology itself.
Resource Constraints: Open-source projects often rely on community contributions, which can lead to resource limitations and slower development compared to well-funded proprietary counterparts. This is particularly true for LLMs, which may cost hundreds of millions (or even billions) of dollars to train.
Fragmented Development and Quality Standards: The collaborative nature of open-source can sometimes lead to fragmented development, as different contributors may have varying opinions on the direction a project should take. This can potentially result in difficulties in establishing and maintaining consistent quality standards across projects.
Conclusion
The AI revolution has only just begun, and it is hard to extrapolate what the market will look like even 12 months from now. Today, OpenAI is dominating the news cycle, but we know that the field will be crowded with competition in no time, both from Big Tech (e.g. Google, Meta, Apple) and new players (e.g. Stability AI, Anthropic, Cohere, NVIDIA, and many others). Model performance is more likely to converge than diverge in the future, which should drive the costs of proprietary models close to zero. Open-source (or open-core) offers an alternative business model that can be sustainable and highly profitable. Most importantly, it has the potential to offer a much better experience for the end user.
Long live open-source :)
Let me know what you think! Will foundational models become commoditized? Will all major models eventually be open-sourced? Who will accrue the most value?
DMs always open on Twitter @leveredvlad