In the Meantime

A portfolio founder and I were on our monthly catch-up last month. The Zoom ended; the conversation kept going over text about how I’d been spending my time: building on top of agent harnesses, codifying how I work in markdown files, getting the muscle memory you only build by working with models and agents closely.

His take:

“does it matter though? Any gaps will be closed in months.”

“this is like ‘prompt engineering’ a year ago — temporary workaround until the models just anticipate all needs and fill all gaps.”

“NOT YET! talk to me in a year.”

It’s the most articulate version of a position I’ve been hearing from smart, technical, successful operators: don’t bother building skill on top of a layer that will get vendored or abstracted away.

He’s not wrong that the floor is rising. The question is how fast.

The floor and ceiling are both rising, at different speeds

I think his position only holds if the floor is rising faster than the ceiling, fast enough to wash out any specialized practice before it pays back. There’s a version of the bet where the floor and ceiling rise at the same speed; the practice is still temporary, you just trade today’s gap for one that re-opens as the floor rises.

What I’m actually betting on is different: the ceiling is rising faster than the floor. The reason is structural. The underlying intelligence of the models is improving faster than the user-facing UX, and we genuinely haven’t figured out how to teach most people to use AI well. The product surface lags the capability frontier, and that lag is where personal competitive advantage lives.

The shift from assembly to Python didn’t eliminate the advantage of being a systems thinker; it moved where the advantage lived. The systems thinker who learned Python on top of assembly intuition outperformed both the purist and the Python-only beginner.

Pushing the ceiling is a practice.

What the argument is actually about

His underlying bet runs deeper than the harness layer. He thinks the labs and AI providers will make the products intuitive enough, fast enough, that you won’t need any specialized know-how to get the value. The floor will rise high enough that the ceiling comes within reach without the work.

I wrote about why I think this is wrong last week. Even when the products are intuitive, the surface area of your work doesn’t become legible to a generic agent. Vendors ship generic judgment by design: customer-specific drafts, workflow templates, opinionated defaults. The specialization stays user-side: which prospects to push on this quarter, what “better” means inside the specific shape of your job.

The question is whether you know how to direct the model at the parts of your work that need the smarts versus the parts that need your judgment.

The meantime, made concrete

Last Friday, I was talking with my neighbor at a backyard gathering on our street. We started in the same investment-banking analyst class almost twenty years ago; he’s now an MD at a distressed-credit fund. He’d read one of my earlier posts and asked: how do you actually do this AI-writing-memos-with-all-your-diligence-information thing?

He described what he wanted: raw diligence files in, structured memos out. I told him it was a two-hour problem if he set it up right: persistent context, treating the agent like a junior employee who writes everything down for next time, scaffolding rules around the parts of his work that matter. Without that grounding, even Claude Cowork would disappoint and just produce the generic “not very good” version.

Claire Vo, the founder of ChatPRD, is what the other end of that question looks like. She recently put GPT-5.5 through three real jobs that span the full difficulty range of her work:

A teaching app for her second-grader on advanced subtraction. Planned and generated in 17 minutes, working first try.
A tech debt migration of millions of legacy chat threads in her own production codebase. It ran for six hours autonomously, with one edge case failure across two million rows.
Reverse-engineering a proprietary Chinese Bluetooth speaker protocol that earlier models had been failing at since January. GPT-5.5 cracked it.

Her own summary of what changed:

“It’s about raising the ambition of what’s possible… I’m now throwing GPT-5.5 at my bug backlog, flaky tests, and security assessments — the hard stuff.”

Both ends moved at once. The model raised the floor for everyone, but the practice is what got her to those three jobs together.

The people closer to the ceiling don’t experience it as ceiling work. They call it raising ambition.

My own bet

I’m aware I’m biased. The body of work below is the test, not the testimony.

I studied finance and accounting in college, started my career in investment banking, and have spent the last decade-plus leading product, design, and data teams and now investing in early-stage startups. I never pushed a line of code until about eight months ago, when I started working with coding agents. Everything below has been built since then.

Three apps for my kids

The first cluster is small applications I’ve built for my family. Hanzi Dojo, an app to help my kids learn traditional Chinese characters and Zhuyin (in addition to the simplified and pinyin they learn at their immersion school). A catalog to showcase my kids’ Lego creations. The Mini Mint, a synthetic banking and investment app for my kids to save, spend, and learn investing without having to open Greenlight and Robinhood accounts.

None of these would have warranted an outside vendor historically; too niche, too small, too household-specific. The cost-vs-fit equation I wrote about last week has flipped: I have gaps I want filled perfectly, and the cost is weekend time. The maintenance is the point. These three repos have accrued 287 commits between them, solving recurring problems for two kids whose interests keep moving.

Fifteen minutes on a Friday

A few weeks ago I came across a thread in a local parents’ Facebook group, about 40 comments deep with summer activity recommendations. The information was buried, redundant, and missing the metadata that would make it usable. I wanted a referenceable version, so I tried to build one.

I started in ChatGPT: screenshots in, dedupe, organize, sort by drive time, research official URLs, catalog by attributes. Ten minutes in I lost patience with it. It kept making mistakes (fix one thing, lose another). It had no context about how I work.

So I switched to Claude Code. “Set up a repo. Here’s what I want to do.” It handled the whole thing in two turns.

That difference is about the relationship to AI, not the model, which is what I wrote about a few months ago. Same task, same person, same underlying intelligence; one session was a disposable chat, the other an environment with months of accumulated configuration (root-level instruction file, custom skills, harness settings, conventions baked in across every repo). The scaffolding had already encoded what good looked like. The clean reference page that came out was personalized as a side effect of feeding the agent the right inputs.

“Spin up a personalized, useful artifact in fifteen background minutes on a Friday” is the new shape of what the practice produces. The fifteen-minute output isn’t downstream of the agent; it’s downstream of the months of investment and learning behind it.

What this body of work has produced for me

The same intuition shows up in more traditional product work. AI-assisted deal evaluation system at NextView: about twenty sessions with a coding agent got me to an MVP that compiled and worked last fall (when I was way less proficient with coding agents than I am now). My engineering partner’s cleanup was mostly streamlining the code and engineering best practices around production apps. The ongoing work is prompt iteration.

Three meta-skills have come out of this work, and they’re the parts that travel.

First, I can route a diagnosis when something doesn’t work: model limitation, context problem, or scaffolding setting.

Second, the practice is a self-reinforcement loop. The more clearly I understand how the pieces fit, the faster the new learning compounds. I can read a new technical approach (e.g. multi-agent orchestration, file-system memory patterns, async pipelines) and either apply it or flag where it doesn’t fit.

Third, this fluency is now a load-bearing input into building AI products. Building them well requires understanding which components are malleable: how to work with the models, how to work with the harness, where probabilistic and deterministic outputs trade off, where context leaks happen.

By the framework I use across Ground Truth, from Level 1 (single chat) to Level 4 (autonomous agent harness), I’m operating at Level 3 daily and Level 4 in specific workstreams.1

In the meantime

His “talk to me in a year” is right about the timeframe. In a year, both the floor and the ceiling will be higher, and my bet is the ceiling keeps rising faster. The question worth sitting with is whether you’re the kind of person who pushes ceilings or waits for them to be lowered.

The gap is where the practice pays off. That’s the bet I’m making in the meantime.

Let’s meet up.

NextView Founder Portal