Building on Quicksand
The new rules for building AI products, from someone who shipped one of the biggest
“It feels like you’re building on quicksand. Every day you wake up and feel the anxiety and stress, wondering what’s going to shift? What new breakthrough is going to happen? Every day our research teams would announce something new - new demos built, new capabilities shared internally. That meant every day our product team was rethinking our roadmap and what was going to be most effective. Those who thrive in this era are those who are willing to just throw away something that they worked on for the last four weeks and start over again.”
That was one of the most impactful quotes from a roundtable my team at Crew Capital and I hosted with 30 founders and execs from our portfolio companies. Our guest was a product leader at one of the major AI labs, responsible for overseeing one of their most impactful products, loved by millions of users.
Last week, I wrote about how the AI labs are building task automation for every type of knowledge work at an industrial scale, and what that means for software companies and for knowledge workers.
This week, in the second part of this series, I want to focus on what it means for the people actually building AI products. Because if there's one thing I took away from speaking with the PM behind one of the most widely adopted AI products in the world, it's that the craft of building great products hasn’t gone away, but what that craft requires has fundamentally changed.
Model Taste is the New Product Taste
The conversation opened with the idea that the most important skill for product builders in AI today is not traditional product intuition.
Rather, it’s a deep, hands-on sense of what the AI models are good at. Product managers and product leaders need to understand what the current model generation is actually good at, what the last generation couldn’t do, and what the next one might unlock for their product.
In the past, PMs were expected to have a deep understanding of their users and what they wanted to accomplish with the product. They understood the user’s lived experience inside their product, and sought ways to improve it by building new features.
Today, that simply isn’t enough.
AI models are getting so good, so quickly that PMs who don’t have a sense for model capabilities will see their product fall behind competitors who do.
Our product leader guest explained that in 2026, product teams need to commit to radical flexibility - a willingness to throw away several weeks of work and start over when a new model release redefines what’s possible in a feature the engineering team just finished building. That can be really hard - try telling an engineer who just shipped a new feature that a new model made it obsolete. It’ll go about as well as you expect.
And yet, the right answer might mean killing that feature entirely. A PM who understands that builds for the world that’s coming, while the one who keeps a feature the latest AI model just obviated is building for a world that’s already gone.
The only way to develop that sense, though, is to be constantly talking to - and building with - the models. Becoming a tinkerer is more important than ever when it comes to shipping AI.
For founders who consider themselves to also be the head of product, carving out precious time to develop model taste isn’t optional. As an investor, it’s clear when founders have it and when they don’t: can they speak to the new capabilities the latest model release unlocked in their product, and to what their product can’t do yet but will once the models improve?
And it extends beyond founders to the PMs they hire, as shipping product in 2026 isn’t the same as shipping it in 2016, let alone 2024.
Eval-Driven Development
How can you measure if your company’s AI product is hitting the mark with customers?
If you follow AI news, you’ll have seen model providers talk about how their AI stacks up in benchmarks, which are standardized tests to compare models against each other on generic tasks like math or coding.
Benchmarks are like the SAT - helpful for making rough comparisons, but just as an SAT score can’t tell you which person will succeed in a job, benchmarks can’t tell you whether a model can actually do the specific job your product needs it to do.
As our guest put it, benchmarks have become a waste of time. The only thing that matters is an evaluation, or an eval - a test you build yourself tailored to the exact tasks that the AI needs to perform in your product.
But creating evals is hard. Very hard. What does “good” look like when AI is generating an output, and the judgement of that output is subjective? What is a good investment memo? A good marketing plan? A good product roadmap?
Many AI product teams skip this step. They try the latest AI model in their product, the output looks fine, they ship it. Even at the AI lab our guest worked at, he said most product teams - internal and external - were not great at building evals.
He suggested a very tactical path forward for AI teams.
Create a spreadsheet where each row is a task you expect users to accomplish, and each column is a step in the workflow or a generated output.
For each output, create a rubric for what “good” looks like.
Then generate outputs, grade them against the rubric, and diagnose why/where failures happened. Was there a prompting problem? An agent context issue? A data gap? A model limitation?
Fix it, and try again.
Don’t focus on any particular threshold, but rather continuous improvement. Make your AI product progressively better through your judgement and intuition, and hold yourself accountable to the eval you built.
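To make that loop concrete, here’s a minimal sketch in Python of the same process - each row is a task, each task carries a rubric for what “good” looks like, outputs get graded, and failures get flagged for diagnosis. Every name in it (EvalCase, generate_output, grade, the example rubric) is hypothetical and only illustrates the shape; our guest described the process, not this implementation, and a spreadsheet works just as well.

```python
import csv
from dataclasses import dataclass

@dataclass
class EvalCase:
    task: str     # a task you expect users to accomplish (one row)
    rubric: dict  # criterion name -> what "good" looks like for that output

def generate_output(task: str) -> str:
    """Call your product's model or agent here. Stubbed for the sketch."""
    raise NotImplementedError("wire this to your model/agent call")

def grade(output: str, rubric: dict) -> dict:
    """Grade one output against each rubric criterion (criterion -> pass/fail).

    In practice this is a human reviewer or an LLM-as-judge prompt; it is
    stubbed here so the shape of the loop stays clear."""
    raise NotImplementedError("wire this to a reviewer or a judge model")

def run_eval(cases, out_path="eval_results.csv"):
    """Run every case, grade it, and write one row per (task, criterion) to review."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["task", "criterion", "passed", "diagnosis"])
        for case in cases:
            output = generate_output(case.task)
            for criterion, passed in grade(output, case.rubric).items():
                # Failed rows get diagnosed by hand: prompting problem, agent
                # context issue, data gap, or genuine model limitation.
                diagnosis = "" if passed else "prompt / context / data / model?"
                writer.writerow([case.task, criterion, passed, diagnosis])

# Example row - entirely made up; swap in the tasks your product actually handles.
cases = [
    EvalCase(
        task="Draft an investment memo from a data room summary",
        rubric={
            "grounded": "every claim traces back to a provided document",
            "covers risks": "names at least three concrete, specific risks",
            "actionable": "ends with a clear recommendation",
        },
    ),
]
```

The tooling doesn’t matter; what matters is that every output is graded against an explicit rubric and every failure gets a diagnosis, so the product improves deliberately rather than on feel.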
Everyone in and around AI is searching for and talking about moats.
And while it may sound too subjective to matter, evaluating your AI product’s output in a rigorous, task-specific, systematic way can be a moat - especially when everyone else is shipping on vibes like they’re Rick Rubin.
The Delegation Problem
One of the most honest admissions from the product leader in our discussion - and remember, this is someone who helped lead one of the most popular AI products in the world today - came when he discussed the challenge his company faced in getting users to move beyond treating their AI as a search engine and instead let it actually take on tasks on their behalf.
He broke down three barriers that AI companies will face in getting users to delegate workflows to the AI.
User education. Model capabilities are progressing so rapidly that the vast majority of users don’t know what the models are currently capable of. Many users are on free or lower-tier models, and don’t realize the upper threshold of work that frontier models can take on.
Trust. Retrieving data, or even asking a model for feedback on an idea or a piece of work, is very different from trusting it to take an action on your behalf. Most users - especially those who aren’t the most tech savvy in the first place - simply do not yet trust AI to act for them.
Delegation. Delegation is a learned skill, and most people aren’t good at it even with other humans. Most people are individual contributors, not managers, so they’ve never had to hone that skill - and the difficulty carries over to handing tasks to an AI.
If you’re at an AI company struggling to get from the 10% of power users at your customers to the other 90%, you’ll recognize these problems intimately.
The solutions we discussed in the conversation centered on:
Abstraction. Make accomplishing a workflow in your product simple enough that any employee at your customer can understand it.
Progressive disclosure. Let the power users who want to go deeper click past the abstraction layer, but don’t compromise the interface trying to serve multiple personas that need different things from your product. Keep the UX clean, and give power users enough depth that you win over the early adopters.
Facilitate evangelization. Create a method for your users to share their wins with each other. Get your power users to create a company Slack channel showcasing their use cases, so that their colleagues can learn from them. Another example is building an online community with shared templates and tutorials.
The adoption curve of AI tools mirrors any other technology, just with a more compressed timeline. Companies shipping AI software products need to win the first 10% of power users, and let them pull in the other 90%.
Hiring PMs in 2026
As a VC, I’ve worked with founders on hiring product leaders, and I know how challenging it can be, especially given all of the changes in building products we’ve already discussed. It was reassuring to hear the product leader we spoke with describe how much hiring evolved at his AI lab even just over the course of 2025.
His AI lab’s initial view was that every PM needed to understand both traditional enterprise software and AI. That meant hiring people who had experience at a software company and then moved to an AI startup. The Venn diagram of overlap was small, making it hard for them to hire.
After a year of that, they realized their hiring strategy should be more nuanced, depending on which team was hiring the PM. Teams building traditional enterprise features - permissions, integrations, IT auditability - didn’t need the same AI expertise as the ones working on core AI agent features.
For those AI-focused PMs, they wanted to see much more than someone vibe coding prototypes to speed up their team’s engineering velocity. They looked for PMs who were so deep in the models that they could identify failure cases in AI outputs, build their own eval systems, use those evals to improve outputs, debug agent transcripts, and build software to address model failures - not merely prompt around them. In short, they wanted PMs at the cutting edge of the ‘scaffolding layer’, as our guest put it.
He said the best PMs were not just using Claude Code to ship prototypes and put PRs in sandboxes, but actually reviewing the changes the AI was making as they shipped with it. Looking at every file update, every PR, asking questions, and building intuition and model taste.
So the PM role is splitting, and not every product leader needs to be an AI expert, but the ones closest to the AI need to be much more than vibe coders. Companies need to know which track they’re hiring for.
Building an AI-first Culture
A fundamental thesis I have about the economy today is that as AI models continue to improve, the delta between the companies with an AI-first culture and those who are slow to adopt AI will widen.
The S&P 500 of 10 years from now is going to feature companies who were aggressive in putting AI at the center of their company. And the startups who do this will outcompete their peers who don’t.
That doesn’t mean every employee at every company needs to be an AI expert - far from it. But companies need to implement cultures and practices that emphasize AI as a core operating lever, not an R&D project.
We discussed a few tactics companies can use that actually work:
Celebrate internal tool builders. Historically, being an engineer or a PM focused on building internal tools wasn’t as celebrated as building for customers. If you were an engineer at Meta or Google, you wanted to work on a product used by billions of people around the world, not a new engineering portal that fosters alignment across your R&D team. That is changing. At the major AI labs, people who build internal tools are celebrated. Their products are showcased at all-hands meetings. They get promotions. People are expected to allocate 10% of their time to building systems that their teammates can use internally.
Recalibrate hiring expectations. As we talked about, while not every product leader needs to be an AI expert, the ones who are shipping core AI features do need to be deep in the models.
Redeploy PMs as internal AI workflow consultants. For larger startups with many different functional teams, this was the most effective approach our guest said he’d seen. Take a cohort of your most forward-thinking PMs and make it their sole job to go from team to team - finance, recruiting, sales ops, and so on - identifying repetitive workflows and building agents to solve them. Not a one-off hackathon, but a repeatable process and a structural commitment to embedding AI across the business.
The Shifting Ground
Building and investing in AI startups in 2026 is nothing if not disorienting. The speed of AI model development, the shifting competitive landscape, and the insatiable customer demand for AI create a confluence of factors that makes this one of the most exciting and most unsettling times to be working in technology.
I have to acknowledge that the insights in this piece come from one product leader’s experience at one AI lab. The AI labs have near-unlimited compute, world-class researchers, large customer bases, and huge balance sheets. Most startups don’t operate with the same resources. The urgency of these changes depends on how close your product is to the AI models themselves, and it may be further reduced if the pace of model development slows.
But that’s not the bet I’d make, because no matter how the future evolves, what won’t change is that the underlying craft of building great products will continue to separate the best companies from the rest.
And yet the craft of building products has evolved. Product taste has become model taste. Evals are the new metrics. The classic product org of PMs + engineers + designers is changing as everyone has the ability to ship code. The companies that build a culture around embracing these shifts - celebrating internal tool builders, redeploying PMs as AI consultants, pushing every employee to develop model fluency - will compound their advantage every quarter.
The founders, startups, and investors who view the instability and uncertainty caused by AI as a feature rather than a bug will be the ones who build something durable, even if its foundation was laid on what felt like quicksand.



