Welcome to Fully Distributed, a newsletter about AI, crypto, and other cutting-edge technology.
GitHub Copilot made headlines after it was reported that software engineers completed their tasks 55% faster on average when using the tool (in addition to higher completion rates, higher job satisfaction, etc.). Since then, it has become a consensus view that there will be a “Copilot” tool for every profession, from healthcare and education to finance and accounting.
Perhaps the most successful “Copilot for X” to date has been Harvey AI, a Series A startup whose tool promises to transform how legal work is carried out.
In this short essay I will explore the economics behind Harvey AI, explain why it is good business for Big Law, and why other professions may not reap the same benefits.
Let’s dig in.
Economics of Big Law
Before we dive into the economics of Harvey, it’s important to understand a few nuances around how Big Law works and makes money today, as this is directly tied to the value a “Copilot for Lawyers” can bring to the industry.
At a high level, lawyers are effectively legal consultants who charge a steep hourly rate for the work they produce. An entry-level associate can charge over $1,000 per hour, with more senior lawyers often charging several multiples of that.
Of note, lawyers do not charge for every hour of their work. On average, an entry-level associate may accrue around 2,000 billable hours per year, which means that about 1,500 hours are non-billable (assuming a 70-hour work week, or roughly 3,500 hours a year). Setting aside the debate over whether the number of billable hours would have to come down with any use of automation tools, the main pitch to Big Law firms is that they can greatly reduce the amount of non-billable work they do (i.e. work that does not directly generate revenue). For any law firm this presents two opportunities for value capture:
Do more work with the same number of full-time employees (FTEs)
Do the same amount of work with fewer FTEs
Based on my quick back-of-the-envelope math, the first approach is clearly preferable given the “gross margin” earned on an entry-level associate. In reality there will likely be some combination of both, but for simplicity I show the two bookends. There is potential to increase revenue by up to $750K or cut costs by up to $50K on a per-FTE basis.
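The two bookends can be reproduced with a few lines of arithmetic. This is a rough reconstruction, not the original spreadsheet: the hours, hourly rate, and 50% gain come from the text, while the fully loaded associate cost (`ASSOCIATE_COST`) is a hypothetical figure chosen only for illustration.

```python
# Back-of-the-envelope bookends for one entry-level associate.
BILLABLE_HOURS = 2_000        # per year, from the text
NON_BILLABLE_HOURS = 1_500    # per year, from the text (70-hour weeks)
HOURLY_RATE = 1_000           # USD per billable hour, from the text
ASSOCIATE_COST = 235_000      # USD per year; hypothetical, not from the text


def freed_hours(productivity_gain: float) -> float:
    """Non-billable hours freed up per associate per year."""
    return NON_BILLABLE_HOURS * productivity_gain


def revenue_upside(productivity_gain: float) -> float:
    """Bookend 1: same headcount, every freed hour is billed."""
    return freed_hours(productivity_gain) * HOURLY_RATE


def cost_savings(productivity_gain: float) -> float:
    """Bookend 2: same work, fewer FTEs, saving a share of associate cost."""
    total_hours = BILLABLE_HOURS + NON_BILLABLE_HOURS
    return ASSOCIATE_COST * freed_hours(productivity_gain) / total_hours


gain = 0.50  # upper-end productivity gain on non-billable work
print(f"Revenue upside per FTE: ${revenue_upside(gain):,.0f}")
print(f"Cost savings per FTE:   ${cost_savings(gain):,.0f}")
```

With these inputs the revenue bookend is $750K per FTE and the cost bookend lands near $50K, roughly a 15x gap. Changing the assumed associate cost moves the second number but not the conclusion.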
For context, Allen & Overy, an elite UK law firm that was the first pilot customer of Harvey AI, generated about $2.5 billion in revenue across its 3,500 attorneys, implying revenue of over $700K per FTE, or ~$360 per billable hour (UK firms charge less than US firms). Assuming our base case of a 25% productivity gain, this has the potential to boost A&O’s revenue per FTE by some $130K per year. Not bad!
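One plausible way to arrive at the A&O figures is sketched below. It assumes the same 2,000 billable / 1,500 non-billable split used earlier applies to A&O attorneys, and that every freed hour is billed at the implied rate. Both are simplifications rather than reported facts.

```python
# Allen & Overy figures quoted in the text.
REVENUE = 2.5e9               # USD per year
ATTORNEYS = 3_500

# Assumed to match the entry-level associate profile used earlier.
BILLABLE_HOURS = 2_000        # per attorney per year
NON_BILLABLE_HOURS = 1_500    # per attorney per year
PRODUCTIVITY_GAIN = 0.25      # base case from the text

revenue_per_fte = REVENUE / ATTORNEYS
rate_per_hour = revenue_per_fte / BILLABLE_HOURS

# Hours freed from non-billable work, billed at the implied hourly rate.
uplift_per_fte = NON_BILLABLE_HOURS * PRODUCTIVITY_GAIN * rate_per_hour

print(f"Revenue per FTE:        ${revenue_per_fte:,.0f}")
print(f"Implied rate per hour:  ${rate_per_hour:,.0f}")
print(f"Revenue uplift per FTE: ${uplift_per_fte:,.0f}")
```

The result is about $714K per FTE, about $357 per billable hour, and about $134K of uplift, in line with the rounded numbers above.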
Economics for Harvey AI
Ok, so the potential value capture for Big Law is obvious and pretty attractive. But what does this look like for an AI SaaS product like Harvey AI?
First, let’s talk about costs. Unlike traditional SaaS, which has a variable cost close to zero, AI tools incur hard compute costs (or API call costs) for every user interaction. This can be tricky, since most SaaS is priced at a fixed price per seat while the cost structure is largely variable. Indeed, a power user is LESS profitable than a casual user, creating an interesting incentive misalignment.
So what are the unit economics for Harvey? Skipping the relatively one-time costs of embedding a large corpus of legal text, hosting costs, and engineering overhead, the primary cost for Harvey (or any AI-first tool) will be its compute/API costs. Harvey uses GPT-4 (at least for now), so using the pricing from OpenAI’s website we see that:
Inference cost for prompt (input): ~$0.03 per 1K tokens (1 token = ~0.75 words)
Inference cost for completion (output): ~$0.06 per 1K tokens
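These two prices translate into a simple per-call cost function. The sketch below uses the quoted rates and the ~0.75 words-per-token rule of thumb. The helper names are made up for illustration, and real token counts depend on the tokenizer and the text.

```python
PROMPT_PRICE_PER_1K = 0.03      # USD per 1K prompt tokens, from the text
COMPLETION_PRICE_PER_1K = 0.06  # USD per 1K completion tokens, from the text
WORDS_PER_TOKEN = 0.75          # rule of thumb from the text


def words_to_tokens(words: float) -> float:
    """Approximate token count for a given number of English words."""
    return words / WORDS_PER_TOKEN


def call_cost(prompt_words: float, completion_words: float) -> float:
    """Approximate USD cost of a single LLM call."""
    prompt_cost = words_to_tokens(prompt_words) / 1_000 * PROMPT_PRICE_PER_1K
    completion_cost = (
        words_to_tokens(completion_words) / 1_000 * COMPLETION_PRICE_PER_1K
    )
    return prompt_cost + completion_cost


# 750 words in and 750 words out is roughly 1K tokens each way.
print(f"${call_cost(750, 750):.2f}")
```

Note that output tokens cost twice as much as input tokens, so long answers dominate the bill more than long prompts do.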
So the cost is directly linked to how much text is ingested and generated per user interaction. But wait! There is one more kink: some interactions may require multiple LLM calls. For example, if a lawyer asks a legal question that spans multiple legal documents, Harvey would have to fetch the several closest matches (using an embedding model) and then feed multiple chunks of text into its prompt to produce an output.
This is all to say that doing the exact math is very tricky without seeing actual usage data, but I am going to take a pass at it. Assuming ~20 words per user query, ~250 words for the chunks fetched from legal documents, and roughly 1,000 words in the final output, we get the following unit economics:
Of course, I also had to make some assumptions about the number of LLM calls per interaction (I assumed 2) as well as the number of times a day an average lawyer uses the tool. Based on my back-channel checks, Harvey currently charges around $500 per lawyer per year, resulting in a negative gross margin - not quite the “gold standard” of 80-90% for traditional SaaS.
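The unit economics can be roughly reconstructed as follows. The word counts, the 2 calls per interaction, and the $500 price come from the text. `INTERACTIONS_PER_DAY` and `WORKING_DAYS_PER_YEAR` are hypothetical values, since the usage assumption is not stated, and the sketch further assumes each call carries the full prompt and produces the full output.

```python
# GPT-4 list prices quoted in the text (USD per 1K tokens).
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06
WORDS_PER_TOKEN = 0.75

# Usage assumptions stated in the text.
QUERY_WORDS = 20
CHUNK_WORDS = 250
OUTPUT_WORDS = 1_000
CALLS_PER_INTERACTION = 2
PRICE_PER_SEAT = 500            # USD per lawyer per year

# Hypothetical: the text does not state these two values.
INTERACTIONS_PER_DAY = 12
WORKING_DAYS_PER_YEAR = 250


def tokens(words: float) -> float:
    """Approximate token count for a given number of words."""
    return words / WORDS_PER_TOKEN


def cost_per_call() -> float:
    """USD cost of one call: query plus retrieved chunks in, answer out."""
    prompt_cost = tokens(QUERY_WORDS + CHUNK_WORDS) / 1_000 * PROMPT_PRICE_PER_1K
    completion_cost = tokens(OUTPUT_WORDS) / 1_000 * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


cost_per_interaction = cost_per_call() * CALLS_PER_INTERACTION
annual_cost = cost_per_interaction * INTERACTIONS_PER_DAY * WORKING_DAYS_PER_YEAR
gross_margin = (PRICE_PER_SEAT - annual_cost) / PRICE_PER_SEAT

print(f"Cost per interaction: ${cost_per_interaction:.4f}")
print(f"Annual cost per seat: ${annual_cost:,.2f}")
print(f"Gross margin at $500: {gross_margin:.1%}")
```

Under these inputs an interaction costs about $0.18 and a seat costs roughly $545 a year to serve, so the margin at $500 is slightly negative. A heavier user pushes it further below zero, which is the power-user problem described earlier.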
A few caveats:
API / compute costs are likely to come down over time
Based on the value unlock to the end user, Harvey can charge a lot more
For example, if Harvey revises its price to $5,000 / year (still a fraction of the $130K in potential new revenue for A&O), its gross margin would be ~90%! Is it crazy for them to charge $10K? $20K? Time will tell - but it will be largely driven by the value and impact they bring to the end customer.
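To see how sensitive gross margin is to price, here is a small sweep. The annual serving cost per seat (`ANNUAL_COST`) is a hypothetical placeholder of about $545, picked to be consistent with a slightly negative margin at $500 and ~90% at $5,000. Actual costs depend on usage.

```python
ANNUAL_COST = 545.0  # USD per seat per year; hypothetical placeholder


def gross_margin(price: float, cost: float = ANNUAL_COST) -> float:
    """Gross margin as a fraction of the per-seat price."""
    return (price - cost) / price


def price_for_margin(target_margin: float, cost: float = ANNUAL_COST) -> float:
    """Per-seat price needed to hit a target gross margin."""
    return cost / (1.0 - target_margin)


for price in (500, 5_000, 10_000, 20_000):
    print(f"${price:>6,} per seat -> gross margin {gross_margin(price):.1%}")

print(f"Price for an 80% margin: ${price_for_margin(0.80):,.0f}")
```

Because the cost is fixed per unit of usage while the price is not, margin climbs quickly with price: most of the gain comes from the first jump to a few thousand dollars, and anything above that adds only a few points.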
Generalizing for “Copilot for X”
A similar analysis must be done for any aspiring “Copilot for X” product. If you are building a tool for a cost center, the math will likely be much less interesting than if you were augmenting a revenue-generating employee. As we can see from my monkey math above, even at a 50% productivity gain for a highly paid lawyer, you “only” bring $50K in annual savings, vs. some $750K in potential additional revenue (a 15x difference!). If you are automating a $50K/year FTE, this puts a ceiling on how much value you unlock (and how much you can ultimately charge).
Secondly, it’s important to consider whether the target user is a big enough cost item for your customer. If the target user amounts to 50% of your customer’s fixed cost, this is attractive - but if the target user is only 5% of the cost stack, even a 100% efficiency gain (i.e. fully removing the need for this FTE) barely moves the needle. This will also dictate how willing enterprises will be to adopt your (untested) tool, and how much they would be willing to pay for it.
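The cost-stack point is just a product of two fractions, as the sketch below shows. The function name is made up for illustration, and the 25% gain in the first example is borrowed from the base case used earlier.

```python
def total_cost_reduction(share_of_cost: float, efficiency_gain: float) -> float:
    """Fraction of the customer's total cost removed by the tool."""
    return share_of_cost * efficiency_gain


# Target user is 50% of the cost stack, with a 25% efficiency gain.
big_line_item = total_cost_reduction(0.50, 0.25)

# Target user is 5% of the cost stack, even with a 100% efficiency gain.
small_line_item = total_cost_reduction(0.05, 1.00)

print(f"50% of costs, 25% gain:  {big_line_item:.1%} of total cost")
print(f"5% of costs, 100% gain:  {small_line_item:.1%} of total cost")
```

A modest gain on a large line item (12.5% of total cost) beats full automation of a small one (5%).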
Conclusion
Harvey AI found itself in a sweet spot - it serves a highly profitable, revenue-generating employee who works in a very rules-based industry that deals primarily with text. This means that other industries may not benefit as much from an AI Copilot. If I were building a vertical copilot tool, I would think about the following:
Is my product primarily cutting costs or adding new revenue?
Does my product move the needle for the end customer? (is it a must have?)
Can the current tech handle the type of work my customer does? (how much precision do you need, do you deal with only text, or also numbers, images, audio?)
My view is that for anything ‘incremental’ a horizontal tool like Microsoft Copilot will probably suffice, so any new tool needs to be a very meaningful improvement over what incumbents will be able to offer enterprises for free.
Personally, I’ve been looking at Copilot applications in the financial services industry, which has a completely different business model and type of work than those of lawyers. I will write up a separate blog exploring my learnings, but if you are interested in jamming on this topic please reach out - I’d love to chat with you.
DMs always open on Twitter @leveredvlad
Thanks for doing the math! If these calculations are true (to some extent), Harvey is making an incredible bet on cheap inference cost of future, better models.