The standard pitch in geospatial AI right now is some version of “deep learning makes your analysts more efficient.” You hear it in almost every vendor deck, and it sounds reasonable enough that most people don’t push back on it.
But after 30+ years of looking at imagery, I think the framing is off, and it’s off in a way that matters quite a bit if you’re the one writing the check. So here’s what I actually think happens.
Deep learning doesn’t really make analysts more efficient. What it does is move their work somewhere else, to a different focus area, and that’s often a very good move.
Let me walk through what I mean using a concrete example. Take a damage assessment problem where you’ve got something like ten thousand structures across a conflict-affected area, and you need to know which buildings are intact, which are damaged, and which are destroyed. The pre-deep-learning version of this workflow has an imagery analyst opening up the area of interest, walking it visually structure by structure, and making a determination on each one. That’s a multi-day job, and the truth is that probably nine out of every ten of those structures didn’t change at all between the pre-event and post-event imagery. So most of the analyst’s time gets spent confirming non-changes.
The deep-learning version looks different. You train a model on labeled damage data, you point it at the same area, and it surfaces the small fraction of structures where something actually appears to have changed. The analyst sees those structures, not the rest. Time on task drops from days to hours. On the broader event-detection pipelines we build at BlueLens, we routinely take a forty-eight-hour cycle down to under six, and we can hit that number repeatedly.
But here’s the part that doesn’t really show up in vendor pitches, and it should. The analyst is still on the job.
The model didn’t decide which of those structures were actually destroyed. What the model decided was which structures looked statistically different from their pre-event state. Some of the things it flags as changed are going to be construction debris that landed since the last satellite pass. Some are going to be a new tarp draped over an otherwise intact roof. Some are going to be a tree that fell across a courtyard. Figuring out what the change actually means, whether it represents an airstrike or a renovation or a goat that wandered into frame, is still a judgment call, and the model has no opinion on it. All the model is doing is showing you which pixels moved.
This is what I mean when I say the work gets relocated. The hours the analyst used to spend confirming non-changes have turned into hours they spend interpreting the changes that actually matter. Headcount on the analyst side doesn’t really drop. What drops is latency. And the part of the job that needs professional judgment, the part where the analyst’s experience and area knowledge and tradecraft do the heavy lifting, is exactly where it always was.
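If it helps to see the relocation in code, here is a minimal Python sketch of the triage step. Everything in it is an assumption made for illustration: the `Structure` record, the `change_score` field, the 0.5 threshold, and the sample values are hypothetical and are not taken from any BlueLens pipeline. The point is only that the model contributes a score and an ordering, while the label that matters stays empty until a person fills it in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Structure:
    structure_id: str
    # Hypothetical model output in [0, 1]: how different the structure
    # looks from its pre-event state. It says nothing about the cause.
    change_score: float
    # "intact", "damaged", or "destroyed". Only an analyst sets this.
    analyst_label: Optional[str] = None


def triage(structures: List[Structure], threshold: float = 0.5) -> List[Structure]:
    """Return structures at or above the threshold, highest score first."""
    flagged = [s for s in structures if s.change_score >= threshold]
    return sorted(flagged, key=lambda s: s.change_score, reverse=True)


# Illustrative values only.
structures = [
    Structure("A-001", 0.02),
    Structure("A-002", 0.91),
    Structure("A-003", 0.10),
    Structure("A-004", 0.64),
]

review_queue = triage(structures)
for s in review_queue:
    print(s.structure_id, s.change_score, s.analyst_label)
```

The queue gets shorter, but every item in it still arrives with `analyst_label` unset. Whether A-002 is an airstrike or a tarp is not something the sort order can tell you.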
The same pattern shows up everywhere we use these models. We run burn scar segmentation with a vision transformer and hit ninety-six percent accuracy on validation data. That’s a good number, and we’re rightly proud of it. It also means that roughly one polygon in twenty-five is wrong, and somebody who knows what they’re looking at still has to figure out which one. SAR change detection on a hundred-square-kilometer scene generates tens of thousands of candidate polygons, and a human still has to decide which ones are mass movement versus seasonal water versus sensor artifacts.
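The arithmetic behind that one-in-twenty-five figure is short enough to write down. The sketch below treats ninety-six percent as a per-polygon correctness rate, which is the simplification the paragraph above makes; validation accuracy for a segmentation model is often measured per pixel, so read this as a rough illustration. The batch size of 10,000 polygons is a hypothetical number chosen for the example.

```python
from fractions import Fraction

# Accuracy quoted above, treated here as a per-polygon rate (an assumption).
accuracy = Fraction(96, 100)
error_rate = 1 - accuracy  # share of polygons expected to be wrong

# Hypothetical batch size, for illustration only.
polygons = 10_000
expected_wrong = error_rate * polygons

print(f"error rate: 1 in {error_rate.denominator}")
print(f"expected wrong polygons in a batch of {polygons}: {expected_wrong}")
```

The model gives you the expected count of wrong polygons. It does not tell you which ones they are, and that is the part that still needs someone who knows what they’re looking at.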
The model is fast. Working out what things actually mean is still slow.
I think the reason the “efficiency” framing keeps getting used is that it’s about the only way to make this kind of work sound clean in a sales pitch. “We deliver deep learning that triages the imagery so your analyst can focus on the meaningful pixels” is the truth, but it isn’t as catchy as “we deliver AI that replaces your analyst.” The catchier version sells better, even though it isn’t actually what the technology does, and never has been.
If you’re trying to evaluate a vendor pitching this kind of work, the question I’d ask isn’t “what does the model do.” It’s “where does the human end up.” If the answer is some version of “we get rid of the human,” the vendor is either confused about what the technology actually does or hoping you don’t ask too many follow-up questions. Either way, you probably shouldn’t write that check. If the answer is closer to “we move the human from confirming non-changes to interpreting the changes that matter,” now you’re having a real conversation about what the pipeline does, what kind of throughput it gives you, and what your analyst staffing actually looks like on the other end.
There’s a deeper version of this that I think is worth sitting with for a minute.
Alan Watts has a line, paraphrased from a talk he gave on photography and recording, about busy-ness having nightmares of a world of echoes, because the recording is just an echo of an echo of an echo of an echo. The line keeps coming back to me when I think about how machine learning actually works under the hood. A model trained on labeled imagery is, when you really look at it, trained on a record of human judgments about what that imagery shows. The labels are the echo. The model’s predictions are the echo of that echo. When you deploy that model at scale, what you’re really deploying is a compressed, fast, slightly degraded version of judgments that human beings originally made, with all the human inheritances of context and error and bias baked in along the way.
That isn’t a knock on the technology, by the way. It’s just a description of it. And it implies something specific about where the actual value in a deep learning pipeline lives. The value isn’t really in the model, which by definition is downstream of human judgment. The value is in the humans at both ends of the pipeline, the labelers who taught the model what to look for and the analysts who interpret what the model surfaces. The model is the middle. The middle moves fast. The ends still do the work.
So no, I don’t think deep learning redefines efficiency. I think it moves the work to where it was always going to end up, which is in the hands of a person who knows what they’re looking at and can tell you what it actually means.
At BlueLens we try to build pipelines that respect that distinction, and we try to be honest about where the model is adding throughput versus where the human is adding meaning. It’s a less exciting pitch than “AI-powered everything.” It’s also, I’d argue, the only version of the pitch that holds up once you put it in front of an actual operational customer.
If you’re working on a hard geospatial problem and you want a pipeline that’s honest about which part of the work it actually does, we should talk.