Engineering story

How a geospatial AI supply chain monitor was built, what went wrong, and what the architecture actually looks like under the hood. Written for the hiring panel, not the user.

1. The problem

Mid-market manufacturers and logistics operators have almost no visibility into supply chain disruptions until a shipment is late. When a port closes in Panama, a cyclone hits Mozambique, or labour action stalls Australian terminals, the typical manufacturer finds out when their freight forwarder sends an apologetic email — usually a week after the fact.

Tier-1 supply chain visibility platforms (project44, FourKites, MarineTraffic) cost USD 100k+ annually and require integration projects measured in months. They are excellent products, but they are built for enterprises with dedicated logistics teams. The mid-market — companies moving 50-500 containers a year — gets left with spreadsheets and carrier portals.

SupplyWatch demonstrates that a usable geospatial disruption monitor is achievable with entirely free satellite data (ESA Sentinel-2), a current-generation multimodal model (Gemini 2.5 Flash), and a handful of GCP services. Total monthly cost at demo scale: under AUD $25. Total time from first line of code to working dashboard with 70 ports: roughly 3 weeks of nights-and-weekends work.

2. Architecture

Cloud Scheduler (daily trigger, 2am UTC)
    ↓
Cloud Run Job (Python 3.13 pipeline)
    queries → CDSE STAC API (Sentinel-2 L2A)
    reads creds → Secret Manager
    calls → Vertex AI (Gemini 2.5 Flash)
    queries → Open-Meteo API (weather archive)
    ↓
GCS (imagery cache) and Firestore (briefings + watchlist)
    ↓
Cloud Run Service (Next.js 16 dashboard)

Daily pipeline (Cloud Run Job)

Triggered by Cloud Scheduler at 2am UTC. The job reads the watchlist from Firestore, reads the CDSE credentials from Secret Manager and exchanges them for a fresh OAuth token, then queries the Copernicus Data Space Ecosystem STAC API for the latest Sentinel-2 L2A scene per port with cloud cover under 10%. It downloads the true-colour composite, crops it to a 2 km x 2 km AOI around the port coordinates, and uploads the JPEG to GCS. Finally, it finds the previous month's image as a baseline and calls Gemini 2.5 Flash for the two-stage analysis.
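The crop step depends on turning a port's latitude and longitude into a 2 km x 2 km box. The following is a minimal sketch of that calculation under a spherical-Earth assumption. The helper name aoi_bbox, the radius constant, and the Beira coordinates are illustrative and are not taken from the SupplyWatch repository.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def aoi_bbox(lat: float, lon: float, size_m: float = 2_000.0) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a square AOI centred on a port.

    Hypothetical helper: a degree of longitude shrinks with cos(latitude), so the
    east-west half-width is scaled separately from the north-south half-width.
    """
    if not (-90.0 < lat < 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("expected WGS84 (lat, lon) in degrees")
    half = size_m / 2.0
    metres_per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    metres_per_deg_lon = metres_per_deg_lat * math.cos(math.radians(lat))
    d_lat = half / metres_per_deg_lat
    d_lon = half / metres_per_deg_lon
    return (lon - d_lon, lat - d_lat, lon + d_lon, lat + d_lat)


# Approximate coordinates for Beira, Mozambique (illustrative values).
west, south, east, north = aoi_bbox(-19.83, 34.84)
```

The sketch does not handle boxes that cross the antimeridian, which is acceptable for a 2 km AOI around most ports but would need a guard in a general-purpose version.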

Two-stage Gemini analysis

Stage 1 (Observe): Gemini describes both images in free text: vessel count and position, quay activity, yard fill, water conditions, landside activity. No disruption assessment happens at this stage.

Stage 2 (Assess): A separate call takes the observations plus external context (weather from Open-Meteo, labour events, and geopolitical signals from a curated JSON dataset of 18 active events) and produces a structured DisruptionAnalysis JSON object with severity score, confidence grade, disruption category, and quantitative metrics including vessel counts and yard fill percentage.
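One way the two stages could be chained is sketched below. The model call is abstracted as a plain callable so that no Vertex AI SDK signature is assumed, and the DisruptionAnalysis field names are guesses based on the description above rather than the actual schema.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Stand-in for a Gemini call: takes a prompt, returns the model's text reply.
ModelCall = Callable[[str], str]


@dataclass(frozen=True)
class DisruptionAnalysis:
    # Field names are assumptions based on the description above.
    severity: int          # 1 (normal) to 5 (severe)
    confidence: str        # e.g. "high", "medium", "low"
    category: str          # e.g. "weather", "congestion", "labour", "incident"
    vessel_count: int
    yard_fill_pct: float


def observe(model: ModelCall, image_refs: list[str]) -> str:
    """Stage 1: free-text description only, no disruption assessment."""
    prompt = (
        "Describe vessel count and position, quay activity, yard fill, water "
        "conditions and landside activity. Do not assess disruption.\n"
        f"Images: {', '.join(image_refs)}"
    )
    return model(prompt)


def assess(model: ModelCall, observations: str, context: dict) -> DisruptionAnalysis:
    """Stage 2: combine observations with external context into structured JSON."""
    prompt = (
        "Return a JSON object with keys severity, confidence, category, "
        "vessel_count, yard_fill_pct.\n"
        f"Observations: {observations}\nContext: {json.dumps(context)}"
    )
    analysis = DisruptionAnalysis(**json.loads(model(prompt)))
    if not 1 <= analysis.severity <= 5:
        raise ValueError(f"severity out of range: {analysis.severity}")
    return analysis
```

In a real pipeline the images would be passed to the model as image parts rather than as file names in the prompt. The point of the sketch is the separation: the second call never sees the pixels, only the first call's observations and the external context.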

Frontend (Cloud Run Service)

Next.js 16 App Router, TypeScript strict, Tailwind CSS. Reads from Firestore in real time via the Firebase Web SDK. The landing page at / is a server component that loads instantly with no Firebase dependency. The dashboard at /demo is a client component with real-time Firestore subscriptions. Leaflet renders the interactive map with AOI rectangles drawn from port coordinates.

Infrastructure as code

All GCP resources are managed by Terraform with remote state in GCS. This includes the Cloud Run Job, Cloud Run Service, Cloud Scheduler, Artifact Registry, GCS buckets, Secret Manager entries, and IAM bindings. Every service account follows least-privilege: the frontend runner has no direct GCP permissions (Firebase is accessed from the browser), the pipeline runner has exactly the roles it needs (Firestore user, Vertex AI user, GCS object admin on a single bucket, Secret Manager accessor on two secrets).

3. The geospatial accuracy problem

The hardest technical problem in this build was not the AI analysis. It was getting the geospatial code right. AI-generated geospatial code — coordinate transforms, bounding box calculations, CRS conversions, tile boundary handling — is wrong approximately 60% of the time when written without an execution environment to iterate against.

The failure modes are specific and reproducible. A model will compute a WGS84 bounding box correctly but then pass it to a function expecting EPSG:3857. It will swap latitude and longitude because the training data is ambiguous about axis order. It will compute a tile index that is off by one at a projection boundary. It will convert metres to degrees of longitude using the equatorial scale and then apply the result at 55 degrees north, where a degree of longitude covers only about 57% of the equatorial distance. These are not hallucinations. They are correct computations applied in the wrong coordinate space, and they are essentially invisible until you visualise the output on a map or run assertions against known reference points.
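The coordinate-space mismatch can be made concrete with the spherical Mercator formulas behind EPSG:3857. This is an illustrative sketch in plain Python, not code from the project. The function names are hypothetical, and a production pipeline would more likely use a projection library.

```python
import math

WEB_MERCATOR_RADIUS_M = 6_378_137.0  # sphere radius used by EPSG:3857


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Forward spherical Mercator: (lon, lat) in degrees -> (x, y) in metres."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon_deg)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale_factor(lat_deg: float) -> float:
    """How much EPSG:3857 stretches ground distances at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


# A 2 km ground distance at 55 degrees north spans about 3.49 km in
# EPSG:3857 units, so treating projected metres as ground metres is wrong.
projected_m = 2_000.0 * mercator_scale_factor(55.0)
```

Both numbers are "metres", which is why the bug survives code review: nothing in the types distinguishes a ground distance from a projected one.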

The standard approach — write the code, run the tests, fix the failures — breaks down when the code is generated by a model that cannot execute anything. You end up in a loop: the model generates code, you notice a problem, you describe the error back to the model, it generates new code with different bugs.

The agentic execution loop

I structured the geospatial development as an agentic loop: the AI wrote code, a sandboxed Python runtime executed it immediately against real coordinate pairs, and assertion failures — wrong CRS, axis-order mismatch, out-of-bounds tile, projection drift over distance — became automatic feedback. The model saw the failure, the stack trace, and the expected vs actual values, then iterated.
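The shape of that loop can be sketched in a few lines. The generator below is a stand-in for the model, and a subprocess is used for execution. That is an assumption for illustration only: a plain subprocess is not a security sandbox, and the actual runtime used in this build is not described here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Stand-in for the model: receives the previous failure text (or None) and
# returns a complete Python script that includes its own assertions.
Generator = Callable[[Optional[str]], str]


def run_candidate(source: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Execute a candidate script in a subprocess; return (passed, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=timeout_s,
        )
    return proc.returncode == 0, proc.stderr


def iterate_until_green(generate: Generator, max_rounds: int = 5) -> tuple[str, int]:
    """Feed each failure back to the generator until the assertions pass."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        source = generate(feedback)
        passed, stderr = run_candidate(source)
        if passed:
            return source, round_no
        feedback = stderr  # stack trace plus the failing assertion message
    raise RuntimeError(f"no passing candidate after {max_rounds} rounds")
```

The round cap matters: without it, a model that keeps trading one bug for another would loop indefinitely.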

After several iterations, the code stabilised. Each function acquired inline assertions: for every computed bounding box, check that the four corners form a valid WGS84 rectangle in the expected longitude range. For every tile index, check that the index matches a manual computation for a known reference point. For every distance calculation, validate against the Haversine formula as an independent check. The assertions became the guardrail — they remain in the committed code and run as part of the test suite.
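As an illustration of what such assertions might look like, the sketch below checks a bounding box against an independent Haversine measurement. The function names, the 1% tolerance, and the spherical radius are assumptions, not the committed SupplyWatch code.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assert_valid_bbox(bbox: tuple[float, float, float, float], expected_side_m: float,
                      tolerance: float = 0.01) -> None:
    """Check a (min_lon, min_lat, max_lon, max_lat) box against independent measures."""
    min_lon, min_lat, max_lon, max_lat = bbox
    assert -180.0 <= min_lon < max_lon <= 180.0, f"longitude order or range: {bbox}"
    assert -90.0 <= min_lat < max_lat <= 90.0, f"latitude order or range: {bbox}"
    mid_lat = (min_lat + max_lat) / 2
    width = haversine_m(mid_lat, min_lon, mid_lat, max_lon)
    height = haversine_m(min_lat, min_lon, max_lat, min_lon)
    for name, value in (("width", width), ("height", height)):
        assert abs(value - expected_side_m) <= tolerance * expected_side_m, (
            f"{name} {value:.1f} m, expected {expected_side_m:.1f} m"
        )
```

The failure message carries the expected and actual values, which is the part a model in a feedback loop can act on.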

This pattern — an AI writing code with an execution sandbox that provides immediate, machine-readable feedback — is directly relevant to the FDE role. Customer deployments routinely require coordinate-space code: geofencing, route matching, asset tracking, spatial queries. The ability to set up a feedback loop where the AI can run its own code and self-correct is the difference between a working integration and an endless debugging session.

4. Reasoning architecture

The Gemini analysis is not a one-shot call. The system separates observation from assessment — a two-stage chain-of-thought that prevents the model from jumping to conclusions based on superficial image features.

More importantly, the system weighs evidence types against each other. Satellite imagery is the primary signal, but it has a known failure mode: a port can look completely normal from orbit while being operationally dead. A court ruling voiding concessions (Panama, April 2026) stopped all vessel movement at Balboa and Cristobal, but the satellite image still showed berthed vessels and stacked containers. A single-modal system looking only at pixels would report normal operations. SupplyWatch injects external context — geopolitical events, weather data, labour actions — and has explicit rules about precedence.

Worked example: Beira, Mozambique

The Sentinel-2 image of Beira captured on 15 March 2026 showed vessels at berth, yard stacks at expected levels, and operating cranes. On imagery alone, a severity score of 1 or 2 would have been reasonable. But external weather data from Open-Meteo showed tropical storm conditions in the Mozambique Channel, and the port, which has limited sheltered berthing, was reporting an average vessel waiting time of 12.5 days. The system's assessment prompt instructs Gemini that severe weather with documented congestion at the specific port overrides ambiguous imagery. The result was a severity of 5/5 with high confidence, matching what a human analyst would conclude given the same evidence.

The inverse case also works: a port in a region with active geopolitical events but no direct impact on that specific terminal is not flagged. The prompt explicitly constrains external context to ports "directly in the conflict zone or subject to the court ruling." This prevents the system from painting an entire region red because one nearby port has a problem.
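In SupplyWatch these precedence rules live in the assessment prompt. The sketch below restates them as deterministic code purely to make the logic explicit. Every name in it, and the choice to take the maximum severity, is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalSignal:
    kind: str                # "weather", "labour" or "geopolitical"
    port_id: str             # the port the signal applies to directly
    severity: int            # 1-5, as judged from the external source
    documented_impact: bool  # e.g. reported congestion at that specific port


def resolve_severity(port_id: str, imagery_severity: int,
                     signals: list[ExternalSignal]) -> int:
    """Imagery is the primary signal; a direct, documented external signal overrides it."""
    direct = [s for s in signals if s.port_id == port_id and s.documented_impact]
    if not direct:
        # Regional events with no direct impact on this port are ignored.
        return imagery_severity
    return max(imagery_severity, max(s.severity for s in direct))
```

With a severity-5 weather signal documented at Beira, an imagery score of 2 resolves to 5, while a neighbouring port with the same imagery score and no direct signal stays at 2.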

This evidence-weighing pattern — primary signal from imagery, secondary signals from external data, explicit precedence rules — is the kind of architecture an FDE deploys for real customers. It is not "prompt engineering." It is reasoning architecture: deciding which evidence to trust when signals conflict, and encoding that decision in the system rather than hoping the model guesses right.

5. Cost engineering

The entire system runs on under AUD $25/month at current demo scale (70 ports, daily analysis). A production deployment with 200+ ports and twice-daily runs would stay comfortably under AUD $200/month.

Service                  | Monthly cost (AUD) | Notes
Cloud Run Job            | ~$3.00             | 2 vCPU, 4 GB, ~5 min/day. Scale to zero.
Cloud Run Service        | ~$2.00             | 256 Mi, 1 vCPU. Scale to zero.
Vertex AI (Gemini Flash) | ~$6.00             | 140 images/day, 2 calls each. Flash is ~25× cheaper than Pro.
Firestore                | ~$1.50             | Reads: dashboard queries. Writes: 1 doc/port/day.
GCS                      | ~$0.80             | ~140 JPEGs/month at ~50 KB each. Lifecycle policy deletes >90 days.
Cloud Scheduler          | ~$3.00             | 1 job/day = 30 invocations/month ($0.10 each).
Secret Manager           | ~$0.50             | 2 secrets, ~60 accesses/month.
Sentinel-2 data          | Free               | ESA Copernicus programme. No API key required.
Open-Meteo weather       | Free               | No API key. Archive API for historical backfill.
Total                    | ~$16.80            | Less than a single lunch for two in Sydney.

The key cost optimisations are all structural, not penny-pinching: Cloud Run scales to zero between runs so compute cost is proportional to usage, not uptime. Sentinel-2 data is free (taxpayer-funded by the EU Copernicus programme). Gemini Flash is deliberately chosen over Pro — at 25x the cost, Pro would make this uneconomic for the mid-market, and Flash's multimodal quality is more than sufficient for satellite imagery analysis. GCS object lifecycle policies automatically delete imagery older than 90 days, keeping storage costs near zero.
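The total can be checked directly from the line items. The snippet below only re-adds the figures from the cost table and compares them with the stated AUD $25 ceiling. It makes no claim about current GCP pricing.

```python
from decimal import Decimal

# Monthly line items in AUD, copied from the cost table above.
MONTHLY_COSTS_AUD = {
    "Cloud Run Job": Decimal("3.00"),
    "Cloud Run Service": Decimal("2.00"),
    "Vertex AI (Gemini Flash)": Decimal("6.00"),
    "Firestore": Decimal("1.50"),
    "GCS": Decimal("0.80"),
    "Cloud Scheduler": Decimal("3.00"),
    "Secret Manager": Decimal("0.50"),
    "Sentinel-2 data": Decimal("0.00"),
    "Open-Meteo weather": Decimal("0.00"),
}

total = sum(MONTHLY_COSTS_AUD.values(), Decimal("0"))
headroom = Decimal("25.00") - total  # margin under the stated demo-scale ceiling
print(f"total AUD {total}, headroom AUD {headroom}")
```

Decimal is used instead of float so the sum matches the table to the cent.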

6. What I'd build next

Written as if scoping work for a real customer with a supply chain visibility budget of ~AUD $5,000/month.

SAR integration (Sentinel-1) for cloud-covered regions

Sentinel-2 optical imagery fails when there is cloud cover, and many of the most disrupted ports are in tropical regions where cloud cover exceeds 50% on most days. Sentinel-1 C-band SAR penetrates cloud and provides backscatter intensity that can be used to detect vessel presence, yard fill changes, and infrastructure changes. The technical challenge is that SAR imagery requires different preprocessing (radiometric calibration, speckle filtering, geocoding), and the Gemini prompts must be rewritten around a different visual vocabulary, because a backscatter image does not read like a true-colour composite. A production system would run both: Sentinel-2 as primary, Sentinel-1 as the fallback when cloud cover exceeds the threshold.

Customer-specific anomaly types

The current system detects generic disruption categories (weather, congestion, labour, incident). A real deployment would define custom anomaly types per customer: an automotive manufacturer cares about RoRo terminal congestion and parts-container dwell times, a retailer cares about yard fill at intermodal rail yards, an electronics manufacturer cares about air freight terminal throughput. Each custom type maps to specific observable features in the imagery and specific external data sources, and the Gemini prompt is parameterised per customer.

Email and Slack briefing delivery

The dashboard is useful for exploration, but most supply chain operators need briefings pushed to them. A delivery layer would: (1) maintain per-user port watchlists, (2) generate a daily summary email with the top 3-5 disrupted ports and a link to the full dashboard, (3) send Slack/Teams notifications for severity-4-and-above events within 30 minutes of detection, and (4) support configurable quiet hours and severity thresholds per user.
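The routing rules in that list could be sketched as follows. This is a hypothetical design, not existing SupplyWatch code: the class and function names are invented, the default quiet hours are arbitrary, and time zones are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class UserPrefs:
    watchlist: frozenset[str]  # port identifiers the user follows
    min_severity: int = 4      # push alerts at this severity and above
    quiet_start: time = time(22, 0)
    quiet_end: time = time(7, 0)


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls in the quiet window, including windows that span midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_push(prefs: UserPrefs, port_id: str, severity: int, now: time) -> bool:
    """Decide whether an immediate Slack/Teams alert should be sent."""
    return (
        port_id in prefs.watchlist
        and severity >= prefs.min_severity
        and not in_quiet_hours(now, prefs.quiet_start, prefs.quiet_end)
    )


def daily_digest(briefings: dict[str, int], prefs: UserPrefs, limit: int = 5) -> list[str]:
    """Top disrupted watched ports for the summary email, most severe first."""
    watched = [(sev, port) for port, sev in briefings.items() if port in prefs.watchlist]
    return [port for sev, port in sorted(watched, key=lambda p: (-p[0], p[1]))[:limit]]
```

Alerts suppressed during quiet hours would still need to surface in the next digest, which is one reason to keep the digest and the push path reading from the same briefing data.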

Shipment impact estimation

The highest-value feature for a logistics operator: if a customer provides a CSV of active shipments (container IDs, ETD, ETA, origin port, destination port, carrier), the system cross-references disruption data to estimate: (1) which shipments are likely delayed and by how many days, (2) the probability distribution of the delay (not just a point estimate — "80% probability of 3-5 day delay" is actionable, "your shipment might be late" is not), and (3) alternative routing options with cost/time trade-offs. This is the feature that moves SupplyWatch from "this is neat" to "we need to pay for this."
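The first step, finding which shipments touch a disrupted port, is a simple join and can be sketched now. The column names and the severity threshold below are assumptions, and the delay distribution and rerouting steps are deliberately left out because they depend on historical transit data this write-up does not describe.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ShipmentRisk:
    container_id: str
    port: str
    severity: int


def at_risk_shipments(csv_text: str, port_severity: dict[str, int],
                      threshold: int = 4) -> list[ShipmentRisk]:
    """Flag shipments whose origin or destination port is at or above the threshold.

    Column names (container_id, origin_port, destination_port) are assumed;
    a real customer file would need a mapping step.
    """
    risks: list[ShipmentRisk] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in ("origin_port", "destination_port"):
            port = row[column]
            severity = port_severity.get(port, 0)
            if severity >= threshold:
                risks.append(ShipmentRisk(row["container_id"], port, severity))
    return risks
```

Ports missing from the severity map are treated as unaffected, which keeps the function usable when the customer ships through ports outside the watchlist.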

7. Source code

SupplyWatch is open source. The repository contains the full backend pipeline (Python), frontend dashboard (Next.js/TypeScript), Terraform infrastructure definitions, and this documentation.

GitHub repository: (link coming soon — contact Tom Oliveri for access)