Inference at the Source

For decades the Earth observation pipeline held a fixed shape. A sensor collected, a radio downlinked, and analysis happened on the ground. Between roughly 2019 and 2026 that shape broke. Inference moved off the ground and onto the spacecraft, first as narrow cloud filters on radiation-tested commercial vision processors, then as object detectors on embedded GPUs, and most recently as large vision-language models and datacenter-class accelerators operating in orbit. This review traces that migration. It organizes the field along four axes that the trade literature usually treats in isolation: the onboard processing pipeline and where value is created within it, the compute tiers that made each step possible, the sensor modalities that impose the actual constraints, and the ownership of the compute itself. It argues that the central economic driver throughout has been the downlink bottleneck, that the recent inflection is genuine and compressed into a window of months rather than years, and that the field’s defining unsolved problem is no longer whether a model can run in orbit but whether its output can be trusted and combined once it reaches the ground. That last problem, the absence of a portable output data model across sensors and vendors, is where the next decade of work sits.

Scope and method: how the evidence is graded

This is a narrative literature review rather than a systematic one. It does not follow the PRISMA protocol, and it makes no claim to exhaustive coverage of a field whose primary literature is large and whose commercial and classified activity is, by nature, only partially observable. The aim is synthesis: to connect a set of developments the trade and academic literatures usually report in isolation, and to organize them along the pipeline, compute-tier, modality, and ownership axes developed below.

Sources are drawn from three tiers, and the distinction is load-bearing throughout. Claims are graded by which tier supports them.

Tier A — Peer-reviewed publications and primary agency or mission documentation. Used for all foundational and technical claims about onboard inference, hardware, and model architectures.
Tier B — Vendor primary materials: mission pages, specifications, and official program announcements. Used for capability and deployment claims, read with awareness of promotional intent.
Tier C — Reputable trade reporting. Used only for events too recent to have entered the primary literature, and flagged in text as reported rather than established.

Two limitations are stated plainly. First, the single most novel event covered, the April 2026 orbital vision-language demonstration, rests at present on Tier C reporting alone, and is treated accordingly. Second, the field is moving faster than its documentation. This review is a snapshot as of mid-June 2026, and several Tier B and Tier C claims will warrant revision as primary sources appear.

§1 — The inflection

The conventional Earth observation architecture is a courier service. A satellite captures imagery over a target, stores it, and waits for a ground station pass to empty its buffer. Analysts on the ground then apply machine learning or their own trained eyes to decide what the data contains. The intelligence is produced entirely on the surface. The spacecraft is a camera with a radio.

The constraint that holds this architecture in place is bandwidth. A satellite sees far more than it can ever send home. The downlink, not the sensor, is the scarce resource, and the entire economics of the industry has organized itself around that scarcity. Onboard processing is the structural answer, formalized in the literature as orbital edge computing: moving machine inference onto the spacecraft so that it can decide what matters before spending downlink on it.

What makes the present moment worth a formal review is how quickly the answer arrived. The foundational demonstrations took the better part of a decade. The transition to operational, datacenter-grade capability took months. Consider the recent record. Starcloud placed a single Nvidia H100, on the order of a hundred times the onboard compute of the processors that preceded it in orbit, into very low Earth orbit on 2 November 2025. Loft Orbital deployed Yam-9, carrying a heterogeneous bank of four networked compute units, on 28 November 2025. Axiom Space stood up its first orbital data center nodes in January 2026. In April 2026, a vision-language model reportedly answered natural-language queries about imagery from orbit for the first time on record. Four structural firsts in roughly half a year.

The thesis of this review is that these are not isolated press events. They are the surface of a single migration, and they become legible only when arranged against the pipeline they are reshaping and the hardware that finally made the reshaping possible.

§2 — The pipeline: where value is created onboard

The phrase edge computing is borrowed from telecommunications, where it means processing near the network periphery. In Earth observation it means something more specific: processing aboard the spacecraft before the data ever reaches the ground. The phrase flattens a sequence of distinct steps, and the distinctions matter, because each step creates a different product, serves a different consumer, and carries a different risk.

Stage	Function	Effect on downlink
Acquisition	Sensor captures over the area of interest. Identical to any conventional mission.	None
Calibration	Radiometric correction, noise removal, quality adjustment.	Negligible, but consequential downstream
Filtering	Cloud masking, discarding empty terrain, dropping irrelevant frames.	First major reduction
Compression	Conventional or learned encoding. The compressed form can double as an embedding.	Order-of-magnitude reduction
Inference	Detection, classification, feature extraction. Imagery becomes information.	Reduces to a report rather than a scene
Output	What leaves the spacecraft, and in what form. The branch point.	Defines the product

The onboard pipeline, after the structure set out in the TerraWatch Space edge-computing review. Reduction compounds from filtering onward. The output stage is where architectures diverge and where business models are decided.
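
The compounding described in the table can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the stage names follow the table, but every retention factor is a hypothetical placeholder chosen for round numbers, not a measurement from any mission or from the TerraWatch review.

```python
# Illustrative only: every retention factor below is a hypothetical
# placeholder, not a measured value from any mission.
STAGES = [
    ("acquisition", 1.0),   # no reduction
    ("calibration", 1.0),   # negligible effect on volume
    ("filtering", 0.4),     # assume 60% of frames are cloudy or empty
    ("compression", 0.1),   # assume roughly a 10x encoding gain
    ("inference", 0.01),    # assume a report is ~1% of the compressed scene
]


def downlink_volume(raw_gb: float, stages=STAGES) -> list[tuple[str, float]]:
    """Return the data volume in GB that remains after each stage, in order."""
    remaining = raw_gb
    trace = []
    for name, keep_fraction in stages:
        remaining *= keep_fraction
        trace.append((name, remaining))
    return trace


if __name__ == "__main__":
    for name, gb in downlink_volume(1000.0):
        print(f"{name:<12} {gb:10.3f} GB")
```

Under these assumed factors the reductions multiply rather than add, which is the sense in which reduction compounds from filtering onward. The output stage is omitted because it decides the form of the product rather than its volume.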

Two of these stages deserve emphasis because they are routinely underweighted. The first is calibration. Performed poorly or inconsistently onboard, calibration produces a product that will not integrate cleanly with ground-processed data downstream. This is not a cosmetic concern. It is the seam along which the whole edge proposition either holds or fails, and we return to it in section nine.
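
To see why inconsistent calibration is more than cosmetic, consider the standard linear radiometric model, in which radiance is a gain times the raw digital number plus an offset. The sketch below is a toy under stated assumptions: the coefficients, the roughly 3 percent gain drift, and the detection threshold are all hypothetical values chosen to make the effect visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Linear radiometric calibration: radiance = gain * dn + offset."""
    gain: float
    offset: float

    def to_radiance(self, dn: int) -> float:
        return self.gain * dn + self.offset


# Hypothetical coefficients: the same sensor design, where the onboard
# gain has drifted about 3% away from the ground-processing reference.
ground_reference = Calibration(gain=0.0100, offset=0.50)
onboard_drifted = Calibration(gain=0.0103, offset=0.50)

# Hypothetical radiance threshold used by a downstream detector.
DETECTION_THRESHOLD = 10.6


def relative_difference(reference: float, other: float) -> float:
    """Relative difference of `other` with respect to `reference`."""
    return abs(other - reference) / abs(reference)


dn = 1000  # the same raw digital number fed through both chains
ref = ground_reference.to_radiance(dn)
drifted = onboard_drifted.to_radiance(dn)

print(f"reference={ref:.2f} drifted={drifted:.2f} "
      f"rel_diff={relative_difference(ref, drifted):.3f}")
print("reference fires:", ref > DETECTION_THRESHOLD)
print("drifted fires:  ", drifted > DETECTION_THRESHOLD)
```

With these assumed numbers the same raw pixel falls on opposite sides of the threshold depending on which calibration was applied. That is the integration failure the paragraph above describes.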

The second is the dual nature of learned compression. A neural network trained to encode a scene into a compact representation produces something that can serve two masters. It is a smaller file to downlink, and it is also an embedding — a numerical fingerprint that downstream models can consume directly for classification or change detection without ever decompressing the original image. The implication is quietly radical. If the downlinked product becomes the embedding rather than the picture, then the question of whose embedding schema everything conforms to becomes the entire commercial contest, a point developed in sections seven and nine.

§3 — The foundational decade

The recent inflection rests on years of patient demonstration, most of it European and institutional, and almost all of it concerned with a single unglamorous task: deciding not to send something. The concept itself was named in 2019, when orbital edge computing was framed as machine inference performed in space, and the operating challenges and opportunities of edge AI for space systems were surveyed shortly after.

The empirical reference point is Phi-Sat-1, flown by the European Space Agency in 2020 and recognized as the first satellite to run a deep convolutional neural network on a dedicated onboard accelerator. Its job was modest and exactly right for the constraint. A segmentation network called CloudScout, running on an Intel Movidius Myriad 2 vision processing unit, identified cloudy hyperspectral frames so they could be discarded before downlink. The mission’s quieter contribution was proving that a commercial inference accelerator could survive ionizing radiation and produce results matching its ground-based twin.
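
The discard decision itself is simple once a per-pixel cloud mask exists. The following is a minimal sketch of that last step, not a reconstruction of CloudScout: the mask representation, the function names, and the 0.7 cut-off are assumptions made for illustration.

```python
from typing import Iterable, Sequence

# A mask is a grid of 0/1 values: 1 = cloudy pixel, 0 = clear pixel.
Mask = Sequence[Sequence[int]]


def cloud_fraction(mask: Mask) -> float:
    """Fraction of pixels flagged as cloud in a segmentation mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("empty mask")
    cloudy = sum(sum(row) for row in mask)
    return cloudy / total


def frames_to_downlink(masks: Iterable[Mask],
                       max_cloud_fraction: float = 0.7) -> list[int]:
    """Indices of frames worth downlinking.

    The 0.7 default is a hypothetical threshold, not a mission value.
    """
    return [i for i, mask in enumerate(masks)
            if cloud_fraction(mask) <= max_cloud_fraction]


clear = [[0, 0], [0, 0]]
overcast = [[1, 1], [1, 1]]
partly = [[1, 0], [0, 1]]
kept = frames_to_downlink([clear, overcast, partly])
print(kept)
```

In this toy run the fully overcast frame is dropped and the other two are kept. The segmentation network that would produce the mask is where the real difficulty lies, and it is out of scope here.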

ESA’s OPS-SAT extended the idea from a fixed model to a reconfigurable platform, introducing the NanoSat MO Framework, which let software be deployed and updated in orbit much as applications are installed on a phone. Phi-Sat-2, with Open Cosmos as prime contractor, carried that architecture to an operational footing in 2024: a multispectral cubesat running six swappable AI applications, among them street-to-map generation, vessel detection, learned image compression, marine-anomaly detection, and a wildfire classifier from Thales Alenia Space. The applications could be uploaded after launch and operated from a simple ground interface, which is the conceptual ancestor of every hosted-edge platform discussed below.

Running parallel to the European effort, the Jet Propulsion Laboratory carried a long autonomy lineage into onboard Earth-science inference. JPL benchmarking work placed a family of its operational models onto the same Myriad X vision processor class used in commercial cubesats, and reported that their outputs matched ground references closely enough for operational use. Among the models so demonstrated was a six-class U-Net flood-mapping network derived from earlier convolutional flood work on UAVSAR radar imagery of Hurricane Harvey, which is significant as an early concrete instance of onboard inference applied to radar rather than optical data. That JPL lineage reached its current expression on the CogniSAT-6/HAMMER cubesat, a 6U platform launched in March 2024 with a hyperspectral instrument, used as a testbed for onboard inference across clouds, wildfires, volcanic activity, harmful algal blooms, surface water, vegetation, mineral mapping, and land-use classification.

By the time large models reached orbit, in other words, the practice of running classifiers and segmentation networks in space was mature. The novelty of 2026 sits on top of a deep stack, not on bare ground.

§4 — The compute ladder

The clearest way to read the field’s progress is by the class of processor flown, because the model you can run is a direct function of the silicon you can power and cool. Each tier opened a category of capability that the tier below it could not reach.

Tier	Class	What it unlocks
I — VPU	Intel Movidius Myriad 2 / Myriad X. Watt-class power. Latch-up protected.	Fixed CNNs and segmentation. The class that proved inference survives orbit. Phi-Sat-1, Phi-Sat-2, CogniSAT-6, JPL benchmarking.
II — Embedded GPU	Nvidia Jetson Orin class. Tens of watts.	Object detection, multi-task pipelines, small foundation models. The workhorse tier for operational commercial constellations. Loft Yam-9, Planet on-orbit object detection.
III — Datacenter GPU	Nvidia H100, Blackwell, and TPU-class parts. Hundreds of watts.	Large vision-language models, training, high-throughput SAR inference. Starcloud, Kepler, the Google space-TPU effort.

Movement up the ladder is gated less by the chip than by power generation, radiative cooling, and radiation tolerance. The leap from Tier II to Tier III is the substance of the 2025 to 2026 inflection.

The Tier III leap is the one worth dwelling on. Starcloud-1 carried a single H100 in a package roughly the size and weight of a dormitory refrigerator into an orbit near 350 kilometers, on a mission whose primary purpose was to determine whether a high-power datacenter GPU could operate reliably through vacuum, radiation, and repeated thermal cycling. That the satellite also ran Google’s Gemma model and trained a small language model on orbit was, in engineering terms, secondary. The real result was evidence that power delivery, thermal management, and sustained high-wattage operation can be maintained in space. If that holds across a full operational life, the ceiling on what can be inferred in orbit is no longer set by the spacecraft.

§5 — By modality: the sensor sets the constraint

Compute tiers describe what is possible. Sensor modality describes what is necessary, because each modality presents a different data-rate problem and therefore a different reason to process in orbit.

Synthetic aperture radar. SAR is the modality where onboard inference matters most, for a brutally simple reason. A SAR instrument can generate on the order of ten gigabytes per second, and downlinking that volume has always been the binding constraint on the architecture. SAR is also the modality where the all-weather, day-night collection that makes it valuable for defense and maritime work produces the most relentless data pressure. The research precedent is older than the commercial push: JPL demonstrated radar inference on flight-class hardware years earlier with its U-Net flood-mapping network on UAVSAR imagery. The defining recent commercial development is Starcloud’s plan to ingest SAR from Capella Space’s constellation, process it in orbit, and downlink only the resulting insights, with capsized-vessel and wildfire detection cited as early workloads. Processing radar where it is collected, rather than shipping it home raw, is the single clearest case for the entire edge thesis.
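
The scale of the mismatch is easiest to see as arithmetic. The sketch below takes the order-of-magnitude instrument rate quoted above and sets it against a downlink rate and a ground-station contact time that are both hypothetical round numbers, chosen only to show the shape of the calculation.

```python
SAR_RATE_GBYTES_PER_S = 10.0   # order of magnitude quoted in the text
DOWNLINK_GBITS_PER_S = 1.0     # hypothetical link rate, for illustration
PASS_SECONDS = 8 * 60          # hypothetical ground-station contact time


def downlink_seconds_per_collect_second(sar_gbytes_per_s: float,
                                        link_gbits_per_s: float) -> float:
    """Seconds of downlink needed to return one second of raw collection."""
    return sar_gbytes_per_s * 8.0 / link_gbits_per_s  # 8 bits per byte


def raw_seconds_per_pass(pass_seconds: float,
                         sar_gbytes_per_s: float,
                         link_gbits_per_s: float) -> float:
    """Seconds of raw collection that fit into one ground-station pass."""
    cost = downlink_seconds_per_collect_second(sar_gbytes_per_s,
                                               link_gbits_per_s)
    return pass_seconds / cost


ratio = downlink_seconds_per_collect_second(SAR_RATE_GBYTES_PER_S,
                                            DOWNLINK_GBITS_PER_S)
budget = raw_seconds_per_pass(PASS_SECONDS, SAR_RATE_GBYTES_PER_S,
                              DOWNLINK_GBITS_PER_S)
print(f"{ratio:.0f} s of downlink per second collected")
print(f"{budget:.0f} s of raw collection per pass")
```

Under these assumptions each second of raw collection costs 80 seconds of link time, so an eight-minute pass returns about six seconds of raw data. Real links and duty cycles differ, but the ratio is what makes in-orbit processing attractive.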

Optical and multispectral. The optical segment is where onboard inference is moving fastest from demonstration into fielded product. Planet already flies Jetson Orin processors for object detection and designs onboard GPU inference into its next-generation constellation from the start. BlackSky’s Gen-3 system pairs very high-resolution optical collection with automated, AI-enabled tip-and-cue tasking and shortwave-infrared bands that see through smoke and haze, a capability set discussed in section eight. The optical case is less about raw data rate than about latency: getting a detection to a decision-maker while the event is still unfolding.

Hyperspectral. Hyperspectral imaging has been the proving ground for onboard AI from the beginning, because its data cubes are large and most of the spectral richness in any given scene is irrelevant to any given question. The lineage runs straight through Phi-Sat-1’s hyperspectral cloud filter and CogniSAT-6’s spectral inference across geology, vegetation, and water targets. Hyperspectral is the modality where onboard feature extraction, rather than mere filtering, has the longest record.

Thermal, RF, and the multi-sensor turn. The newer architectures are explicitly multi-sensor. Loft’s forthcoming YAC-4 constellation is positioned as the first AI-enabled Earth observation platform combining multi-sensor collection with onboard processing in one system. The strategic significance is that fusing modalities onboard, rather than on the ground, raises the calibration and normalization stakes sharply, because the products now have to agree with each other before they ever leave the spacecraft.

§6 — Where the compute lives

A second structural axis concerns ownership of the compute. Two models have emerged, and they imply very different futures.

The hosted-edge model. Loft Orbital’s approach treats the spacecraft as infrastructure. Under its Virtual Missions program, customers deploy and update AI models on Loft’s on-orbit compute in real time, reach existing sensors on active satellites, and validate analytics without owning or managing any hardware. Yam-9 is, by Loft’s account, the first commercial satellite to fly four networked compute units as a single heterogeneous environment, and it is positioned explicitly as a benchmarking pathfinder for Loft’s forthcoming multi-sensor constellation. The model is closer to cloud infrastructure-as-a-service than to traditional satellite manufacturing, and the ecosystem around it is already populated: an analytics provider running maritime domain awareness on a Loft platform under a US Air Force Phase II STTR, a third-party edge platform processing JPL wildfire and flood algorithms onboard, and a partnership with Helsing aimed at a multi-sensor constellation for defense and security.

The orbital data center model. The more aggressive model treats orbit not as a place to process a satellite’s own data but as a destination for compute itself, driven by terrestrial constraints on power, land, and cooling. Starcloud is the clearest exponent, with a stated trajectory from its single-GPU pathfinder toward multi-GPU and Blackwell-class payloads on its October 2026 launch, integrated with a cloud platform module so customers can deploy workloads to orbit directly, and a long-horizon ambition of gigawatt-scale orbital facilities. Axiom Space’s orbital data center nodes and Kepler Communications, reported to operate the largest cluster of GPUs in space and to host undisclosed commercial workloads, occupy the same category. A reported collaboration to fly Google TPUs on a satellite bus signals that the largest compute vendors now regard space as a viable processing tier rather than a novelty.

The hosted edge processes the data a satellite already collects. The orbital data center brings the compute to space and lets the data come to it. The first reshapes Earth observation. The second reshapes computing.

§7 — The model layer: from convolutional nets to language in orbit

The hardware story has a software counterpart, and the two converged precisely at the recent inflection. For most of the foundational decade, onboard models were small, single-purpose convolutional networks — a cloud detector or a ship detector, trained for one task and frozen. The shift since 2023 has been toward geospatial foundation models, large networks pretrained on vast unlabeled imagery and adapted to many downstream tasks with little additional data.

The reference points are now well established. Prithvi, from NASA and IBM, is pretrained on harmonized Landsat and Sentinel-2 imagery and fine-tuned for cloud imputation, flood mapping, fire-scar segmentation, and crop classification. TerraMind, from IBM, ESA, and Forschungszentrum Jülich, pushes the concept to an any-to-any multimodal model spanning nine data types, including optical, SAR, elevation, and text. The open-source Clay model, from Development Seed, is pretrained over tens of millions of multi-sensor image chips. Benchmark evidence accumulated through late 2025 indicates that Earth-observation-specific models such as TerraMind, Prithvi, and Clay outperform natural-image pretrained backbones on multispectral tasks.

One development bears directly on the embedding-as-output argument from section two. Google DeepMind’s AlphaEarth Foundations produces a compact sixty-four-dimensional embedding per location, deliberately designed to be consumed by classical machine learning — random forests and gradient-boosted trees — inside a data warehouse, eliminating the GPU inference step at the point of use. If embeddings of that kind become the canonical exchange format for Earth observation, the question of which embedding space the industry standardizes on becomes foundational rather than incidental.
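
What "consumed by classical machine learning" means in practice can be shown with a toy example. In the sketch below a nearest-centroid classifier stands in for the random forests and gradient-boosted trees named above, so that the code needs only the standard library. The embeddings are synthetic Gaussian vectors and the class names are invented. Nothing here reproduces AlphaEarth Foundations or its data.

```python
import math
import random

DIM = 64  # embedding width quoted in the text


def centroid(vectors):
    """Component-wise mean of a list of DIM-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]


def fit(labelled):
    """labelled: dict mapping class name -> list of 64-d embeddings."""
    return {label: centroid(vecs) for label, vecs in labelled.items()}


def predict(model, embedding):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], embedding))


def synthetic_embedding(rng, shift):
    # Synthetic stand-in for a real per-location embedding.
    return [rng.gauss(shift, 0.1) for _ in range(DIM)]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
train = {
    "water": [synthetic_embedding(rng, -1.0) for _ in range(20)],
    "urban": [synthetic_embedding(rng, +1.0) for _ in range(20)],
}
model = fit(train)

query = synthetic_embedding(rng, +1.0)
print(predict(model, query))
```

The point of the design is that the expensive network runs once, upstream, to produce the vectors. Everything at the point of use is cheap arithmetic on 64 numbers per location.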

Moving foundation models onto the spacecraft is now a named research problem in its own right. The first comprehensive survey of onboard remote-sensing foundation-model deployment, published in early 2026, lays out a unified pipeline spanning model development, compression, and hardware optimization, and concludes that onboard deployment of these large models is feasible within current memory, energy, and computation envelopes. That conclusion is the bridge between the foundation-model literature and the orbital hardware described in section four.

The April 2026 vision-language demonstration is the visible apex of this software arc. As reported, a Gemma-class vision-language model, carried inside a JPL software harness on Loft’s Yam-9, answered plain-language tasking, such as identifying where natural terrain meets human development or finding infrastructure clustered around railway hubs, and returned the result without a human analyst in the loop. It is essential to be precise about what this is and is not. It is onboard semantic triage against a single scene: large-scale land-use and land-cover classification and object localization driven by natural language. It is not temporal change detection, which requires a registered baseline and a differencing operation the demonstration did not perform. The distinction is not pedantry. It marks the boundary between a triage signal and a precision analytic product, and that boundary is exactly where the field’s unsolved problem lives.

§8 — The defense vector: tip-and-cue, and the pull toward autonomy

The clearest operational demand for onboard inference comes from defense and intelligence, where the value of a detection decays by the minute. The organizing concept is tip-and-cue: a wide-area sensor detects activity and automatically cues a high-resolution sensor to inspect it, ideally fast enough to catch a moving target.
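
The loop can be sketched as a priority queue in which a detection's value decays with age. This is a minimal illustration under stated assumptions, not any operator's tasking logic: the exponential half-life decay, the ten-minute half-life, the confidence floor, and all names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Cue:
    priority: float                     # negated value, so smallest pops first
    target_id: str = field(compare=False)
    lat: float = field(compare=False)
    lon: float = field(compare=False)


def cue_priority(confidence: float, age_minutes: float,
                 half_life_minutes: float = 10.0) -> float:
    """Detection value that halves every half_life_minutes (assumed model)."""
    return confidence * 0.5 ** (age_minutes / half_life_minutes)


def build_cue_queue(tips, min_confidence: float = 0.5):
    """tips: iterable of (target_id, lat, lon, confidence, age_minutes)."""
    heap = []
    for target_id, lat, lon, confidence, age in tips:
        if confidence < min_confidence:
            continue  # not worth spending a high-resolution collect on
        # heapq is a min-heap, so negate the value to pop the best cue first.
        heapq.heappush(
            heap, Cue(-cue_priority(confidence, age), target_id, lat, lon))
    return heap


def next_cue(heap):
    """Pop the most valuable remaining cue, or None if the queue is empty."""
    return heapq.heappop(heap) if heap else None


tips = [
    ("vessel-A", 10.0, 20.0, 0.9, 30.0),  # confident but 30 minutes old
    ("vessel-B", 11.0, 21.0, 0.7, 0.0),   # less confident but fresh
    ("clutter", 12.0, 22.0, 0.3, 0.0),    # below the confidence floor
]
queue = build_cue_queue(tips)
first = next_cue(queue)
print(first.target_id)
```

In this toy run the fresh detection outranks the older, more confident one. That is the time-decay argument in miniature: every minute spent waiting on a ground loop is value lost.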

BlackSky has built its commercial posture around this loop. Its Gen-3 constellation performs automated detection and classification of vehicles, vessels, and aircraft, delivered through a tasking and analytics platform with low-latency AI analytics produced while a mission is ongoing. Its planned AROS wide-area surveillance satellites, funded through a National Reconnaissance Office contract and planned for the end of the decade, are explicitly designed to tip-and-cue the Gen-3 birds, pairing broad-area search with point-target inspection at national and regional scale. Adjacent awards, including a Defense Innovation Unit tactical GEOINT contract, point the same direction: real-time, AI-enabled intelligence pushed toward the tactical edge.

The trajectory implied by tip-and-cue is autonomy. Once a constellation can detect, cue, and characterize without a ground loop, the natural extension is standing tasking expressed in language — an instruction to watch a region and report anomalies, with the satellites resolving the rest among themselves. That is precisely the capability the orbital vision-language demonstration gestures toward, and it is why the April 2026 result reads as more consequential to defense planners than its modest technical scope would suggest.

§9 — The unsolved problem: interoperability, and the missing output data model

The field has largely answered the question it spent a decade asking. A model can run in orbit, on hardware ranging from a watt-class vision processor to a datacenter GPU, across every major sensor modality, and the onboard deployment even of large foundation models is now assessed as feasible. The open question is no longer feasibility. It is trust and combination.

Three problems compound at the output stage. The first is calibration consistency. When radiometric correction happens onboard, inconsistently across spacecraft or across a constellation’s operational life, the resulting products may not align cleanly with each other or with ground-processed archives. An analyst fusing detections from multiple sources inherits those inconsistencies silently. The second is schema fragmentation. Each operator is building its own onboard output format, its own confidence semantics, its own coordinate and metadata conventions, and there is no shared standard governing how a detection from one vendor’s spacecraft normalizes against a detection from another’s. The third is provenance. An onboard inference result that arrives as a terse report, rather than as imagery an analyst can re-examine, carries a heavier burden of explainability, because the evidence that would let a human adjudicate it never came down.

These are not peripheral engineering details. They are the seam along which the edge proposition either delivers fused, decision-grade intelligence or produces a proliferation of incompatible, unauditable alert streams. The decade of onboard inference work solved collection-side autonomy. The coming decade’s work is a portable, sensor-agnostic output data model: a normalization layer that lets a product mean the same thing, carry precise geocoordinates, and combine faithfully, regardless of which spacecraft, which sensor, or which vendor produced it. The surveyed literature on onboard foundation-model deployment is converging on architecture, optimization, and hardware as named subfields. The interoperability of what those models emit remains, for now, conspicuously underdeveloped.
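
What such a normalization layer might look like can be sketched in a few lines. Everything below is hypothetical: the common record, its field names, and both vendor message formats are invented for illustration and correspond to no existing standard or product. The sketch only shows the three concerns named above (confidence semantics, coordinate conventions, and provenance) being reconciled in one place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical common record; field names are illustrative only."""
    label: str
    lat_deg: float
    lon_deg: float
    confidence: float     # always a probability in [0, 1]
    sensor: str
    model_version: str    # provenance: which onboard model produced this
    calibration_id: str   # provenance: which calibration state was applied

    def __post_init__(self):
        if not -90.0 <= self.lat_deg <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.lon_deg <= 180.0:
            raise ValueError("longitude out of range")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be a probability")


def from_vendor_a(msg: dict) -> Detection:
    # Hypothetical vendor A: confidence in percent, position as [lon, lat].
    lon, lat = msg["pos"]
    return Detection(msg["class"].lower(), lat, lon, msg["conf_pct"] / 100.0,
                     msg["sensor"], msg["model"], msg["cal"])


def from_vendor_b(msg: dict) -> Detection:
    # Hypothetical vendor B: confidence in [0, 1], separate lat and lon keys.
    return Detection(msg["label"].lower(), msg["lat"], msg["lon"],
                     msg["score"], msg["sensor"], msg["model_version"],
                     msg["calibration"])


a = from_vendor_a({"class": "Vessel", "pos": [4.25, 51.5], "conf_pct": 85,
                   "sensor": "SAR", "model": "a-1.2", "cal": "cal-007"})
b = from_vendor_b({"label": "vessel", "lat": 51.5, "lon": 4.25, "score": 0.85,
                   "sensor": "optical", "model_version": "b-3.0",
                   "calibration": "cal-113"})
print(a.label == b.label, (a.lat_deg, a.lon_deg) == (b.lat_deg, b.lon_deg))
```

The range checks are deliberately strict. A swapped latitude and longitude, or a percentage passed where a probability is expected, should fail loudly at the adapter rather than surface later as a silent disagreement between fused sources.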

§10 — Outlook: what to watch

Four developments will indicate whether the inflection consolidates or stalls. First, the October 2026 Starcloud launch and its Blackwell-class payload will test whether datacenter-grade inference in orbit is operationally repeatable or a one-time demonstration. Second, the fielding of Loft’s multi-sensor YAC-4 constellation will test whether onboard sensor fusion can be made to hold the calibration line. Third, the maturation of compact embeddings as an exchange format will determine whether the industry converges on a shared numerical language or fragments into proprietary ones. Fourth, and most consequentially for the analytic community, the response to the interoperability gap will decide whether onboard intelligence becomes a trusted layer of the geospatial enterprise or a fast but unreliable tributary feeding it.

The unveiling, to borrow a word, is not that satellites can now think. It is that the place where Earth observation produces meaning has moved, from the ground to the source, and the architecture, the standards, and the tradecraft have not yet caught up to where the inference now lives.

References

Selected sources, graded by tier as described above. The full BibTeX is archived alongside this review.

Denby, B., & Lucia, B. (2019). Orbital Edge Computing: Machine Inference in Space. IEEE Computer Architecture Letters, 18(1), 59–62.
Furano, G., Meoni, G., Dunne, A., Moloney, D., et al. (2020). Towards the Use of Artificial Intelligence on the Edge in Space Systems: Challenges and Opportunities. IEEE Aerospace and Electronic Systems Magazine, 35(12), 44–56.
Giuffrida, G., Fanucci, L., Meoni, G., et al. (2021). The Phi-Sat-1 Mission: The First On-Board Deep Neural Network Demonstrator for Satellite Earth Observation. IEEE Transactions on Geoscience and Remote Sensing.
European Space Agency. OPS-SAT mission and the NanoSat MO Framework.
Open Cosmos / European Space Agency. (2024). Phi-Sat-2: Onboard AI Apps for Earth Observation.
Dunkel, E., Swope, J., et al. (NASA JPL). Onboard AI benchmarking of operational Earth-science models on flight-class vision processors (Myriad X).
Denbina, M., Towfic, Z., Thill, J., Bue, M., et al. Flood Mapping Using UAVSAR and Convolutional Neural Networks. IGARSS, 3247–3250.
Zilberstein, I., et al. (NASA JPL) (2025). Demonstrating Onboard Inference for Earth Science Applications with Spectral Analysis Algorithms and Deep Learning. arXiv:2508.15053.
TerraWatch Space. (2026). Edge Computing for Earth Observation: 2026 Edition.
IEEE Spectrum. (2025). Nvidia H100 in Space: Starcloud’s Orbital Data Center.
CNBC. (2025). Nvidia-backed Starcloud trains first AI model in space.
Data Center Frontier. (2025). Starcloud Launches Orbital AI Data Center with NVIDIA H100 GPU.
Loft Orbital. (2025–2026). YAM-9 and Virtual Missions program materials. (Tier B)
Fernholz, T. (2026). A satellite just learned to find things on its own. TechCrunch, 15 June 2026. (Tier C; sole source for the April 2026 demonstration.)
Jakubik, J., et al. (2023). Foundation Models for Generalist Geospatial Artificial Intelligence (Prithvi). arXiv:2310.18660.
IBM, European Space Agency, & Forschungszentrum Jülich. (2025). TerraMind: an any-to-any multimodal geospatial foundation model.
Development Seed. (2025). Clay Foundation Model.
Google DeepMind. AlphaEarth Foundations: 64-dimensional per-location embeddings. See also GEO-Bench-2 (Nov 2025) and “Harvesting AlphaEarth,” arXiv:2601.00857.
Onboard Deployment of Remote Sensing Foundation Models: A Comprehensive Review of Architecture, Optimization, and Hardware. (2026). Remote Sensing, 18(2), 298. doi:10.3390/rs18020298.
Léonard, C., Stober, D., & Schulz, M. (2025). FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review. arXiv:2506.03938 (TU Munich / DLR).
BlackSky Technology. (2025–2026). Gen-3 and Spectra platform materials. (Tier B)
Breaking Defense. (2026). NRO funds BlackSky for new satellites, AI-optimized image detection system.
BlackSky Technology / BusinessWire. (2025). Defense Innovation Unit Gen-3 Tactical GEOINT (TACGEO) expansion contract. (Tier B)