The problem.
Global surf forecasts run on the same physics. A wave-height number from a buoy-scale ocean model gets multiplied by a generic shoaling coefficient and presented as a forecast. The number is right, on average, across thousands of breaks. But waves don't break on averages. They break on specific reefs, in specific coves, in specific wind funnels. Three days with identical 1.0m / 12s / 198° offshore signatures can produce wildly different sessions at Saladita: one peeling cleanly, one fast and short, one a closeout. The offshore data can't see the difference. Local knowledge can.
Our approach: instead of writing the local knowledge as rules (which we also do), we let the model learn it from someone who's seen thousands of sessions and writes them down.
The training data.
The labels come from a local surfer who posts to social media almost daily, frequently captioning posts with how the wave actually was that day. Phrases like "fast and short," "lots of water," "amazing afternoon," "a morning of slow waves," "small but fun," "very long waves this morning" are not casual filler. They are distinct surf-condition descriptors, used consistently across years of posts. Each one labels a day.
We scraped his public posts (978 total, going back to December 2018), filtered to those with quality-descriptive captions, and assigned each to a category:
| Category | n | Representative captions |
|---|---|---|
| love | 44 | "one of the best days of surf I've had recently. Offshore wind, big waves" · "Dream waves" · "amazing afternoon" |
| good | 19 | "yesterday's waves were very good" · "very long waves this morning" · "Enjoying a long wave" |
| fun | 13 | "The waves were a lot of fun this morning" · "fun in straight waves" · "fun morning sunrise session" |
| soft_good | 9 | "Smooth" · "A lot of calm in a wave" · "Una mañana suave" |
| small_fun | 5 | "small waves but fun" · "small but very fun" · "A little one, fun for December" |
| fast_short | 10 | "the truth is that it is a very short and fast wave" · "Fast and short" · "Shorter, faster, smaller when it works this time of year" |
| slow | 3 | "Slow wave" · "a morning of slow waves" · "Slow and small but fun" |
The five top categories are merged into a single positive class (90 days) and the two bottom categories into a single negative class (13 days). The classifier learns to distinguish between them.
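The merge can be sketched in a few lines. This is a minimal illustration rather than the pipeline's actual code: the counts are copied from the table, while the set names and helper functions are hypothetical.

```python
from collections import Counter

# Per-category day counts, copied from the table above.
CATEGORY_COUNTS = {
    "love": 44, "good": 19, "fun": 13, "soft_good": 9, "small_fun": 5,
    "fast_short": 10, "slow": 3,
}

# Hypothetical names for the two merged classes.
POSITIVE = {"love", "good", "fun", "soft_good", "small_fun"}
NEGATIVE = {"fast_short", "slow"}


def to_binary_label(category: str) -> int:
    """Map a caption category to 1 (positive) or 0 (negative)."""
    if category in POSITIVE:
        return 1
    if category in NEGATIVE:
        return 0
    raise ValueError(f"unknown category: {category!r}")


def class_totals(counts: dict) -> Counter:
    """Sum per-category day counts into the two merged classes."""
    totals = Counter()
    for category, n in counts.items():
        totals[to_binary_label(category)] += n
    return totals


totals = class_totals(CATEGORY_COUNTS)
print(totals[1], totals[0])  # 90 13
```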
The features.
For each labeled day, we pull the marine and wind archive at Saladita's coordinates (17.5897°N, 101.4317°W) from Open-Meteo's ERA5-backed reanalysis. We aggregate hourly readings into two daily blocks:
AM block (6 AM – 10 AM local): the morning session, when dawn-patrol conditions hold.
PM block (2 PM – 6 PM local): the afternoon, when the thermal sea breeze typically degrades the wave.
For each block, we record wave height (peak), wave period (mean), wave direction (mean), swell-only direction and period, wind direction, wind speed, and wind-wave height (a proxy for local chop), plus a derived "offshore-ness" score: the cosine of the angle between the wind direction and 45° (NE), the direction of the morning land breeze at Saladita.
Eighteen features in total. The model is given no information about what each feature means — only the numbers and the label. It finds the patterns.
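As a rough sketch of the aggregation step, the code below reduces hourly readings to one row per block and computes the offshore-ness score. It covers only three of the nine per-block features. The dictionary keys, the half-open hour windows, and the choice to average the hourly offshore-ness scores are assumptions for illustration, not details taken from the production code.

```python
import math
from statistics import mean

# Block windows in local hours, from the text. Treating each window as
# [start, end) is an assumption.
BLOCKS = {"am": (6, 10), "pm": (14, 18)}

LAND_BREEZE_DEG = 45.0  # NE, the morning land-breeze direction at Saladita


def offshoreness(wind_dir_deg: float) -> float:
    """Cosine of the angle between the wind direction and 45 degrees (NE).

    Returns 1.0 at 45 degrees, about 0 at 135 or 315, and -1.0 at 225.
    """
    return math.cos(math.radians(wind_dir_deg - LAND_BREEZE_DEG))


def aggregate_block(hourly: list, block: str) -> dict:
    """Reduce one day's hourly readings to a single row for one block.

    Each reading is a dict with hypothetical keys: "hour" (0-23, local),
    "wave_height" (m), "wave_period" (s), "wind_dir" (degrees).
    """
    start, end = BLOCKS[block]
    rows = [r for r in hourly if start <= r["hour"] < end]
    if not rows:
        raise ValueError(f"no hourly readings in the {block} block")
    return {
        f"{block}_wave_height": max(r["wave_height"] for r in rows),   # peak
        f"{block}_wave_period": mean(r["wave_period"] for r in rows),  # mean
        # Averaging the hourly scores is an assumption; the text does not
        # say how the score is reduced over a block.
        f"{block}_offshoreness": mean(offshoreness(r["wind_dir"]) for r in rows),
    }


# Made-up readings for illustration only.
hourly = [
    {"hour": 6, "wave_height": 1.0, "wave_period": 12.0, "wind_dir": 45.0},
    {"hour": 8, "wave_height": 1.2, "wave_period": 14.0, "wind_dir": 45.0},
    {"hour": 15, "wave_height": 1.1, "wave_period": 13.0, "wind_dir": 225.0},
]
am = aggregate_block(hourly, "am")
pm = aggregate_block(hourly, "pm")
```

Mean direction features would need more care than this: a plain arithmetic mean of compass bearings breaks near the 0°/360° wrap, though that is unlikely to matter for swell arriving around 200°.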
The model.
A logistic regression with L2 regularization, balanced class weights (because there are 7× more positive than negative labels), and 5-fold stratified cross-validation. We chose logistic regression deliberately: it's simple, the coefficients are interpretable, and with only 13 negative examples a more flexible model would overfit immediately. The model learns a weighted sum of standardized features and passes it through a sigmoid to produce a probability between 0 and 1.
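To make the scoring step concrete, here is a minimal sketch of the two pieces described above: the class weights and the standardize, sum, sigmoid pipeline. The class-weight formula is the common "balanced" heuristic, n_samples / (n_classes × class_count), which scikit-learn documents for `class_weight="balanced"`; whether the training code uses that library is an assumption. The scaler values and intercept in the toy example are made up, and only the two coefficients come from the published weights.

```python
import math


def balanced_class_weights(n_pos: int, n_neg: int) -> dict:
    """Per-class weights: n_samples / (n_classes * class_count)."""
    n = n_pos + n_neg
    return {1: n / (2 * n_pos), 0: n / (2 * n_neg)}


def predict_proba(x, mean, scale, coef, intercept):
    """Standardize each feature, take the weighted sum, apply the sigmoid."""
    z = intercept + sum(
        w * (xi - m) / s for xi, m, s, w in zip(x, mean, scale, coef)
    )
    return 1.0 / (1.0 + math.exp(-z))


weights = balanced_class_weights(n_pos=90, n_neg=13)
# Each negative day counts about 6.9 times as much as a positive day.
ratio = weights[0] / weights[1]

# Toy two-feature example: PM wind-wave height (m) and AM wave period (s).
MEAN = [0.30, 12.0]    # hypothetical scaler means
SCALE = [0.10, 2.0]    # hypothetical scaler standard deviations
COEF = [-0.74, 0.48]   # two of the published feature weights
INTERCEPT = 0.0        # hypothetical

p_average = predict_proba([0.30, 12.0], MEAN, SCALE, COEF, INTERCEPT)
p_choppy = predict_proba([0.40, 12.0], MEAN, SCALE, COEF, INTERCEPT)
```

With these made-up scaler values, an exactly average day scores 0.5, and one extra standard deviation of afternoon chop pulls the score down to roughly 0.32.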
The cross-validated AUC is 0.60 ± 0.12. This is honest: better than random (which would be 0.50), but not strong. The reason is partly physical: the offshore wave model can't fully resolve the local windswell chop that drives "fast and short" days, so two days can look identical at buoy scale and feel different at the break. It is also partly statistical: 90 positive labels and 13 negative are not enough data to learn a sharp boundary, and the base rate of "good" days in his caption history is 87% (he tends to post on days he liked surfing).
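For readers unfamiliar with the metric, AUC has a simple pairwise reading: it is the chance that a randomly chosen positive day gets a higher score than a randomly chosen negative day. The sketch below computes it that way. It is an illustration of the metric, not the evaluation code used for the numbers above, and the example scores are invented.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This pairwise form equals the area
    under the ROC curve.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented scores: 3 of the 6 pairs are ranked correctly.
example = auc([0.9, 0.4, 0.7], [0.5, 0.8])
```

On that reading, an AUC of 0.60 means the model ranks a good day above a bad one about six times in ten.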
What the model learned (top feature weights)
| Feature | Weight | What it means |
|---|---|---|
| PM wind-wave height (chop) | −0.74 | Strongest negative signal. Afternoon chop predicts "fast/short" descriptors — confirming that local windswell contaminating the groundswell is what breaks the wave at Saladita. |
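| AM swell direction | +0.58 | Where the swell is coming from in the morning. The positive weight rewards higher (more westerly) swell angles within the range seen in training. The 195–215° band reads as the sweet spot, though a single linear weight can only express a one-sided trend, not a penalty for deviating either way. |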
| PM wind speed | −0.48 | Above ~5 knots in the afternoon, the wave loses shape. Below that, even an onshore-leaning direction can still produce surfable conditions. |
| AM wave period | +0.48 | Longer-period swells in the morning are more organized. Below ~11s the model gets less confident. |
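| AM swell period | −0.45 | An interesting negative, possibly capturing that very long-period swells (15+ s) overpower Saladita's shallow bathymetry into closeouts on the inside. The training set is small enough that this could also be sample noise, and opposite-signed weights on two closely related features (wave period and swell period) are a common collinearity artifact. |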
| PM wind direction | +0.44 | Direction matters nearly as much as speed in the afternoon (compare −0.48 for PM wind speed). Westerly cross-shore wind can still be glassy; pure-onshore SW kills it. |
| PM wave height | +0.43 | The model marginally prefers bigger afternoons — but the coefficient is smaller than the wind features, confirming what locals already say: cleanliness matters more than size. |
Eleven other features carry smaller weights. The full coefficient vector is in the source code at functions/api/_model_artifacts.js.
The energy and percentile layer.
Separately from the classifier, we compute the wave energy flux per unit crest length using the deep-water formula:
P = (ρg² / 64π) · H² · T ≈ 0.49 · H² · T kW/m
Here H is significant wave height in meters and T is wave period in seconds. This number, in kilowatts per meter of wave crest, is a better summary of "how much wave is hitting the beach" than wave height alone. A 1.5m wave at 16s carries 78% more power than a 1.5m wave at 9s (the ratio is simply 16/9), and surfers feel that difference even though the height number is the same.
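The formula is easy to check numerically. The sketch below assumes a typical seawater density of 1025 kg/m³ and g = 9.81 m/s², which reproduce the 0.49 coefficient; the function name is hypothetical.

```python
import math

RHO = 1025.0  # seawater density in kg/m^3 (assumed typical value)
G = 9.81      # gravitational acceleration in m/s^2


def wave_power_kw_per_m(height_m: float, period_s: float) -> float:
    """Deep-water wave energy flux per meter of crest, in kW/m."""
    watts_per_m = RHO * G**2 / (64.0 * math.pi) * height_m**2 * period_s
    return watts_per_m / 1000.0


coefficient = wave_power_kw_per_m(1.0, 1.0)  # about 0.49
p_16s = wave_power_kw_per_m(1.5, 16.0)       # about 17.7 kW/m
p_9s = wave_power_kw_per_m(1.5, 9.0)         # about 9.9 kW/m
ratio = p_16s / p_9s                         # 16/9, about 1.78
```

Because power is linear in period at fixed height, the ratio between the two swells depends only on the periods.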
We also rank each day's peak wave power against the full 5-year historical distribution at Saladita — 1,700 days of marine archive going back to October 2021. The percentile tells you not just how big today is, but how big it is relative to what this break sees. A 30 kW/m day is unusual in May; the same number is unremarkable in September.
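One simple way to compute such a rank is sketched below. The "at or below" definition of percentile is an assumption, since the text does not say which convention the site uses, and the sample history is invented.

```python
from bisect import bisect_right


def percentile_rank(value: float, history: list) -> float:
    """Percent of historical days whose peak power is at or below `value`."""
    if not history:
        raise ValueError("history is empty")
    ordered = sorted(history)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Invented five-day history of peak wave power in kW/m.
rank = percentile_rank(30.0, [5.0, 40.0, 10.0, 30.0, 20.0])  # 80.0
```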
The extended outlook.
Past the 10-day direct forecast window, we forecast the storm source regions rather than the destination. Southern Hemisphere winter storms at 35–50°S take 6–8 days to send groundswell to Mexico. By pulling 16-day wave forecasts at the 50°S 130°W and 35°S 130°W grid points and applying great-circle travel-time math (deep-water group velocity in m/s ≈ 0.78 × period in seconds), we can flag major storms 5–7 days before they'd appear in the direct Saladita forecast. The result is a "what's brewing" section showing arrivals projected 10–14 days out.
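The travel-time arithmetic can be sketched as follows, assuming a spherical Earth (haversine distance) and the deep-water group velocity gT/(4π). The function names are hypothetical, and the real pipeline may handle swell dispersion and decay in more detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
G = 9.81                  # gravitational acceleration in m/s^2

SALADITA = (17.5897, -101.4317)  # latitude, longitude from the text
SOURCE_50S = (-50.0, -130.0)     # southern grid point from the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def group_velocity_ms(period_s):
    """Deep-water group velocity, g*T/(4*pi), about 0.78 * T in m/s."""
    return G * period_s / (4.0 * math.pi)


def travel_days(distance_km, period_s):
    """Days for swell energy of a given period to cover the distance."""
    seconds = distance_km * 1000.0 / group_velocity_ms(period_s)
    return seconds / 86400.0


distance = great_circle_km(*SOURCE_50S, *SALADITA)  # roughly 8,000 km
days_16s = travel_days(distance, 16.0)              # roughly 7.4 days
```

Longer periods arrive sooner: under the same assumptions a 20-second swell from the same point takes about six days.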
This is uncertain. Storms forecast 10+ days out have wide ensemble spread, directional aim can shift, and not every 6m+ wave-height in the storm belt makes it to Mexico cleanly. We surface only major (≥9m Hs) and solid (≥6m Hs) storms, and tag them so readers know to treat them as possibilities rather than confirmations.
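The tagging rule itself is just two thresholds on significant wave height. A minimal sketch, with a hypothetical function name:

```python
def storm_tier(hs_m: float):
    """Tag a storm by its significant wave height (Hs) in meters.

    Returns "major" at 9 m or more, "solid" at 6 m or more,
    and None for storms that are not surfaced.
    """
    if hs_m >= 9.0:
        return "major"
    if hs_m >= 6.0:
        return "solid"
    return None
```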
What this can't do.
The list is long enough that it is worth spelling out.
Calibrated probabilities. A 95% P(good) from this model doesn't mean 95% certainty that today will be amazing. It means today's features sit far on the "good" side of the boundary the model drew through his labeled days. The balanced class weights also shift the scores away from the true base rate. Calibration would require a held-out test set with confirmed outcomes, and would still be limited by the base rate skew.
Distinguishing levels of good. The classifier doesn't separate "love" from "good" from "fun". All collapse into the positive class. A high probability could mean a once-a-month session or just a normal fun morning.
Surprise events. Overnight sandbar drift, a sudden 6 AM squall, an unforecasted swell pulse arriving on a 17-second period — all invisible to the inputs.
Local intel the data can't see. Whether the tide is dropping or rising at the moment the swell arrives; whether the cobblestones on the inside have shifted from a big winter storm; whether a surf-camp van just unloaded twelve students at the takeoff. The model has marine and wind features only.
Generalization to other breaks. This model is trained on Saladita captions specifically. The exact feature weights — what makes "good" — would be different at Troncones, La Boca, Puerto Escondido. Same architecture, different training data.
The honesty axis.
We publish the AUC. We publish the training set size. We publish the feature weights. We publish what the model can't do. This is not because we don't believe in the model — we do, otherwise we wouldn't ship it. It's because surf forecasting is full of confident-sounding tools whose actual skill is closer to "marginally better than coin-flip" than to "the wave will be 1.4m/12s/198° at 7:00 AM." A reader who sees a 95% P(good) badge should know what that number is and isn't claiming. That knowledge changes how they use the page.
Data sources and code.
Marine data: Open-Meteo Marine Weather API (wave height, period, direction, swell components, wind-wave height). Free, public, no key required. Backed by ECMWF and NOAA wave models, with ERA5 reanalysis for historical data.
Atmospheric data: Open-Meteo Weather API (temperature, wind speed and direction, precipitation) at the Saladita grid point. Updated hourly.
Training labels: Anonymized aggregation of public surf-condition descriptors written by surfers at this break. Caption text and post timestamps only — no images or video, and no individual contributor named.
Code: The forecast function lives at /functions/api/forecast.js on this site (Cloudflare Pages Function, ES module). The model artifacts — feature names, scaler parameters, coefficients, historical distribution — are at /functions/api/_model_artifacts.js. Both are visible to anyone who fetches them.
Versioning.
This is v2 of the forecast (the v1 rules engine still runs alongside, contributing to the editorial quality score). Trained on 82 labeled days from 2019-07-16 to 2026-05-19. Cross-validated AUC 0.598 ± 0.120. Last retrain: 2026-05-27. Future versions will incorporate tide-phase data, additional labeled days, and possibly higher-resolution wave-spectrum models if we can access them.
Suggestions, corrections, or labeled-day contributions: please send a note. The model gets better the more honest labels it sees.