Some time in 2024, I was running a change detection algorithm over Zaporizhzhia, Ukraine. I was not looking for anything in particular. I was testing a method. Pre-conflict median composite from Sentinel-1 GRD, post-conflict median composite, log ratio in decibels, a threshold somewhere around minus one and a half. Standard workflow. The kind of thing you build to see if it works before you trust it with anything that matters.
It worked.
The algorithm flagged a small cluster of structures in the urban core. I cross-referenced with open-source reporting and local news. The buildings had been hit by drone strikes within the previous two weeks. I had not been hunting. I had been testing. The method found the truth on its own.
That moment, more than any benchmark or paper, is what taught me what these tools are for and where they stop being useful. It also taught me something about the role of large language models in a working geospatial pipeline. Because I did not write that pipeline alone. I wrote it with an LLM. But the LLM never saw the imagery. The LLM never knew the coordinates. The LLM was not in the room when the algorithm flagged those buildings. And that separation was the entire point.
This is a field report from two years of using LLMs alongside PyQGIS in production work. Not vibes. Not vendor talking points. What I have actually been doing, what has worked, what still does not work, and why the workflow I developed when the models were genuinely bad is, somewhat to my surprise, still the workflow that holds up now that the models are good.
The Method, Not the Machine
Before I talk about LLMs, I want to be clear about what I mean by remote sensing work. I am a sensor-agnostic practitioner. The platform does not matter to me. Sentinel-1 SAR, HLS harmonized optical, Landsat, PlanetScope, NAIP, commercial VHR, a drone, a fixed-wing collection, a guy with a camera hanging out of a Cessna. They are all just inputs. The discipline is in what you do with them.
That posture matters for this piece because the workflow I am about to describe is not a SAR workflow or an HLS workflow. It is a remote sensing workflow. The same pattern that found those buildings in Zaporizhzhia also found burn scars in summer fire seasons within two weeks of running the test. Different sensor. Different physics. Same discipline. Same role for the LLM.
The temptation in this field is always to attach yourself to a platform and become a Sentinel-1 person or a Planet person or an ESRI person. Resist it. The phenomena come first. The sensor is whatever you have access to that lets you observe the phenomenon well enough to reason about it. Pipelines should be portable across sensors because the questions are portable across sensors. Once you accept that, the role of the LLM becomes clearer too. It is helping you build the pipeline, not the pipeline itself.
How I Actually Use the Thing
The pattern I have used since 2024 is what I call the consultant pattern. The LLM is in a different room. You walk into the room. You describe the problem. You get advice. You walk out. You apply the advice yourself, in your own environment, with your own data, and you verify everything before you trust it.
This sounds restrictive because it is. That is the point.
In practical terms, here is what that looks like in PyQGIS work. I am writing a script that needs to chain a few processing algorithms, maybe a buffer, a clip, a focal statistics, a raster calculator expression. I know what I want the pipeline to do. I am not sure of the exact processing.run() incantation for one of the steps. The algorithm IDs in QGIS are not memorable. They look like native:zonalstatisticsfb and gdal:cliprasterbymasklayer and they take parameter dictionaries that nobody can remember off the top of their head. I open a chat window. I describe the step. I get back a draft.
I do not paste that draft straight into my script.
I read it. I check the algorithm ID against the QGIS processing toolbox. I check the parameter names against the algorithm help text. I run the snippet in isolation in the Python console, against a small test layer, with print statements around it, and I look at the output. Only then does it go into the real pipeline. The LLM proposed. I disposed. The work is mine.
That sequence sounds slow. It is not. The LLM compresses the part that used to be slow, which was hunting through documentation and Stack Overflow for the right algorithm and the right parameter shape. The verification step is fast because you already know what good output looks like for the algorithm you asked about. The acceleration is real. It is just bounded by your judgment, not the model’s confidence.
Where It Genuinely Helps
Two years in, here is what the consultant pattern is actually good for in PyQGIS work.
Scaffolding processing.run() calls when you know the algorithm exists but cannot remember the parameter dictionary. This is the highest-frequency use case and the one with the cleanest return on time. You describe what you want. You get a parameter dictionary. You check the keys against the algorithm and you are off.
Walking through the QGIS object model when you are doing something less common. QgsProject, QgsMapLayer, QgsVectorLayer, QgsRasterLayer, the layer registry, the canvas, the iface object, the symbology classes. The relationships between these are not intuitive if you came to QGIS from ArcPy or from straight GDAL. An LLM is a good conversational partner for asking what a QgsFeatureRequest actually does or how to get from a layer to its data provider to its underlying source.
Drafting QML style XML. Nobody writes QML by hand. It is generated by the QGIS UI and then sometimes you need to tweak it programmatically or generate it from scratch for a batch styling job. LLMs are good at this because QML is verbose and structured and the patterns are learnable.
Debugging cryptic GDAL and SNAP error messages. GDAL errors are often a single line that means nothing unless you have seen it before. SNAP graph builder errors are worse. An LLM that has seen ten thousand of these in its training data is faster than your memory or your grep skills.
Writing the boring iteration code around an algorithm you already understand. You know the algorithm. You need to apply it to four hundred scenes in a directory structure that has some quirks. Writing that loop by hand is not interesting. Asking for a draft loop and then adjusting it is faster.
Generating test geometries and test rasters for unit-style verification. When you want to know whether your spatial logic is correct, the fastest path is often a tiny synthetic dataset where you know the right answer in advance. LLMs are good at producing this.
Explaining what a stranger’s PyQGIS plugin is actually doing. You inherit a workflow. The code is uncommented. The variable names are l and r and tmp2. Pasting a function into a chat and asking for a plain-English summary is faster than reading it line by line, and as long as you verify the summary against the actual behavior, it is reliable.
Where It Still Lies
The failures are also worth being specific about, because the failure modes are why the consultant pattern matters. If the model never lied, you could let it drive. The model still lies. So you cannot.
Hallucinated QGIS API methods. This is the most common failure and the most dangerous one because the hallucinated methods look exactly like real methods. QgsVectorLayer.getFeaturesInBoundingBox() does not exist, but it looks like it should, and a model that does not know any better will write code that calls it. The code will fail at runtime, not at parse time, which means if it sits in a branch you do not test, it will bite you in production.
Wrong CRS handling, confidently. CRS issues are subtle. They produce results that look reasonable but are wrong in ways that only become obvious when you overlay against ground truth. LLMs will confidently mix EPSG codes, treat geographic coordinates as projected, forget to reproject before measurement, and skip the QgsCoordinateTransform step entirely. This is a hard one to catch by reading the code. You catch it by checking the output against a known reference.
Confusing QGIS Python with ArcPy. They share concepts but not APIs. A model that has seen more ArcPy than PyQGIS in its training data will quietly substitute ArcPy idioms into PyQGIS code. The code looks reasonable. It does not run.
Outdated processing algorithm IDs. QGIS has been through multiple processing framework rewrites. Old tutorials use old IDs. Models trained on old tutorials suggest old IDs. The fix is usually a one-token substitution but you have to know to look.
Misunderstanding the layer, provider, and source distinction. A QgsVectorLayer is not the same as its data provider, which is not the same as its source URI. Operations that look like they should work on the layer actually need to happen on the provider. Models often blur this.
Anything involving the canvas versus the project versus the layer registry. The QGIS interface has three overlapping but distinct concepts of what is loaded. The canvas is what you see. The project is what is saved. The layer registry is what is in memory. They are usually in sync but not always, and operations that target the wrong one fail silently.
If you do not know PyQGIS well enough to catch these, an LLM driving QGIS for you will produce code that runs and is wrong. That is the worst possible failure mode in geospatial work. Code that crashes you fix. Code that lies about reality ships into a report.
The Air Gap, or, Bush Engineering for Serious Work
The consultant pattern began as a constraint. In 2024, LLMs were not integrated with anything. There was no Copilot in QGIS. There was no agent that could read your data. There was a chat window in a browser and there was your laptop and the data sat in a directory the chat window could not see. So you typed your problem, you got advice, you went and applied it.
That constraint turned out to be a feature.
When you work on conflict zones, you do not paste imagery into cloud chat windows. When you work on classified material, you do not paste anything into anything that is not in the same security boundary as the data. When you work on medical imagery or patient-adjacent data, the same logic applies. When you work for clients whose contracts include data handling clauses, the same again. The set of domains where you cannot send your data to a third-party LLM is much larger than the set where you can. And that set includes most of the work that actually pays.
The consultant pattern handles this cleanly because it never needed the data in the first place. The LLM is helping you write code. The code runs against the data on your machine or your authorized environment. The LLM is in the room next door, advising on syntax and structure. It never crosses the threshold into the room where the data lives.
This is bush engineering. Pip-boy style. You build the workflow with what you have, in the environment you are allowed to operate in, and you make the tool do the small thing it is good at without asking it to be the whole pipeline. The consultant pattern works with a hosted frontier model. It also works with a local model running on your workstation. OpenClaw, a quantized Llama variant, whatever fits in the VRAM you have. Local models are not as sharp. They make more mistakes. But the role you have constrained them to is small enough that they can play it. You are asking for a draft processing.run() call, not for a research assistant.
This portability is the part that I think the wider community has not fully absorbed. There is an enormous amount of geospatial work happening in environments where you cannot phone home. Government work. Defense work. Medical research. Critical infrastructure. International work where data residency is a hard requirement. The agent-driven, cloud-integrated, IDE-resident model that the industry is currently selling will not survive contact with any of these environments. The consultant pattern, executed against a local model, will. Quietly. Without ceremony. With a frugality that some readers will recognize from their own engineering traditions.
A Second Example: HLS and Burn Scars
I mentioned earlier that the same pattern worked for an HLS optical workflow. The mechanics are different but the discipline is the same.
HLS, the Harmonized Landsat Sentinel-2 product, gives you a consistent thirty-meter optical record across two satellite families. It is a beautiful dataset for vegetation work because the harmonization smooths over a lot of the cross-sensor headaches that used to require careful per-scene calibration. For burn scar detection, the standard move is to compute a Normalized Burn Ratio from the near-infrared and short-wave infrared bands, run pre and post composites, and difference them. The dNBR. Old technique. Reliable.
I built the pipeline the same way I built the SAR one. Sketched the design myself. Asked the LLM for help on the parts that are tedious to write from memory. Stacking the HLS scenes into a cube. Applying the right cloud and shadow masks from the QA bands. Vectorizing the dNBR threshold into polygons with sensible minimum mapping units. Writing out a GeoPackage with proper attribution. None of these are hard problems. All of them are easier to scaffold with a consultant than from cold memory.
When I ran the pipeline, the algorithm flagged buildings that had burned within roughly two weeks of the run. These were not wildfire scars. They were houses and apartments set alight by drone and rocket strikes. The dNBR does not care what started the fire. A burn is a burn. The algorithm did exactly what the algorithm was designed to do, and the result that came back was the same kind of result the SAR pipeline had produced over Zaporizhzhia. I was testing a method. The method found something real.
I want to be careful about the lesson here. The lesson is not that LLMs found the burn scars. They did not. A standard dNBR with cloud masking and a sensible threshold found the burn scars. The LLM helped me write the plumbing. The remote sensing was mine. The science was mine. The judgment about what threshold to use, what mapping unit to apply, what to validate against, what counts as a real signal versus a phenological artifact, all mine. The LLM was the consultant. The work was the work.
This is the part I want practitioners in any sensor community to take away. The LLM does not replace your remote sensing. It accelerates the parts of your remote sensing work that were never the interesting parts to begin with. You still have to know your physics. You still have to know your sensor. You still have to know what a real signal looks like in your data. If you do not, no amount of LLM acceleration will save you.
What I Tell People Starting Now
The market is going to push you toward agents. Toward IDE plug-ins that autocomplete your geospatial code. Toward integrations that let the LLM read your data, drive QGIS, and run your processing for you. Some of this will be useful. Most of it will be premature.
If you are starting now, my advice is the same advice the constraint forced on me two years ago. Do not start with an agent. Start with the consultant.
Learn enough PyQGIS to read it critically. You do not need to memorize the API. You do need to know enough about the object model, the processing framework, and the CRS handling to spot the lies when they appear. This is a few weeks of focused work. It is the most valuable few weeks you will spend.
Build verification into your workflow before you build acceleration into it. Every LLM-drafted snippet runs in isolation against a small test case before it goes into the real pipeline. Every spatial result is checked against a known reference before you trust it. The verification habit is the muscle that lets you move fast safely. Without it, you are not moving fast. You are moving wrong, quickly.
Treat the data and the model as two different rooms. Even when you do not have to. Even when the data is not sensitive. Even when the model is hosted and convenient. The habit of separation makes you portable across environments. Today’s casual project is tomorrow’s contract with a data handling clause. The workflow that works for both is the consultant pattern.
Stay sensor-agnostic. The model that helps you with Sentinel-1 helps you with HLS helps you with PlanetScope helps you with the drone footage your colleague brought back from a survey. The workflow is the same. The discipline is the same. The LLM is helping you write the same kind of code in all four cases. Resist the urge to specialize too narrowly. The phenomena are the work. The sensors are the means.
Closing
The buildings in Zaporizhzhia are why I write about this. Not because the algorithm was clever. The algorithm was standard. Log ratio change detection on Sentinel-1 GRD is a method any competent remote sensing person could implement in an afternoon. The point is what surrounded the algorithm. The discipline of method, the separation of model from data, the verification habit, the willingness to test something carefully before trusting it with anything that mattered. That is what made the result mean something when it appeared.
An LLM driving QGIS for me could not have produced that result. Not because the model was incapable. Because the workflow that produces results worth trusting is not a workflow that an agent runs. It is a workflow that a practitioner runs, with a consultant in the next room, with the data in its own room, with verification at every step, and with the practitioner’s judgment as the only thing that gets to decide what counts as real.
Two years in, that has not changed. The models have gotten better. The pattern has not changed. I suspect it will not change for a long time, because the pattern is not really about the model. It is about what disciplined work looks like when a powerful but unreliable tool enters the room. You give it a small job. You verify what it gives back. You keep the data on your side of the wall. You do the science yourself.
That was true when the models were bad. It is still true now that they are good. It will be true when they are better. The discipline is the thing.