Measuring What We Claim
Every NEXUS prediction is immutably logged and scored against outcomes. This page details how we measure accuracy, where we perform well, and where the system still struggles. No prediction is retroactively modified or deleted.
Predictions Tracked: 1,247
Overall Hit Rate: 68.3%
Avg Brier Score: 0.187
Calibration Error: 4.2%
01 / Brier Score
The Brier score is the primary metric NEXUS uses to evaluate prediction quality. It measures the mean squared difference between predicted probabilities and actual outcomes. A score of 0.0 represents perfect prediction, while 1.0 represents the worst possible score.
Each resolved prediction contributes (forecast probability - outcome)^2, where outcome is 1 if the event occurred and 0 if it did not. The Brier score is the mean of these contributions across all resolved predictions. A model that assigns 0.9 probability to an event that occurs receives a Brier contribution of 0.01, while assigning 0.9 to an event that does not occur yields 0.81.
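The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the NEXUS implementation; the function names `brier_contribution` and `brier_score` are hypothetical.

```python
from typing import Sequence


def brier_contribution(forecast: float, outcome: int) -> float:
    """Squared error for one resolved prediction (outcome is 1 or 0)."""
    if not 0.0 <= forecast <= 1.0:
        raise ValueError("forecast must be a probability in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (forecast - outcome) ** 2


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of the per-prediction contributions."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum(brier_contribution(f, o) for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# The two cases from the text: 0.9 on an event that occurs, and 0.9 on one that does not.
hit = brier_contribution(0.9, 1)   # approximately 0.01
miss = brier_contribution(0.9, 0)  # approximately 0.81
```

Averaging those two contributions gives roughly 0.41, which shows how heavily a single confident miss weighs on the aggregate.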
NEXUS maintains a rolling 90-day Brier score alongside a lifetime aggregate. The rolling window allows detection of calibration drift, where the model begins systematically over- or under-estimating probabilities in a particular domain.
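As a rough sketch of how a rolling window and a lifetime aggregate can sit side by side, the Python below filters resolved predictions by date before averaging. It assumes the window is keyed on the resolution date, which the page does not specify; the `Resolved` tuple layout and both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical record layout: (resolution date, forecast probability, outcome 0/1)
Resolved = Tuple[date, float, int]


def mean_brier(records: Iterable[Resolved]) -> Optional[float]:
    """Lifetime-style aggregate: mean squared error over every record given."""
    items = list(records)
    if not items:
        return None  # no resolved predictions, so no score
    return sum((p - o) ** 2 for _, p, o in items) / len(items)


def rolling_brier(
    records: Iterable[Resolved], as_of: date, window_days: int = 90
) -> Optional[float]:
    """Brier score over predictions resolved in the window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    return mean_brier(r for r in records if start < r[0] <= as_of)
```

A widening gap between the rolling value and the lifetime value is one simple signal that recent forecasts are behaving differently from the historical record.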
02 / Calibration
A well-calibrated prediction system produces forecasts whose stated probabilities match observed frequencies. When NEXUS assigns a 70% probability to a class of events, those events should occur approximately 70% of the time across a sufficient sample.
Calibration is assessed by bucketing predictions into probability bins (0-10%, 10-20%, etc.) and comparing the average predicted probability in each bin to the actual hit rate. The calibration error is the mean absolute deviation between predicted and observed frequencies across all bins.
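The binning procedure described above might look like the following Python sketch. It follows the text literally (an unweighted mean of per-bin gaps) and assumes that empty bins are skipped and that a forecast of exactly 100% falls in the top bin; neither detail is stated on this page. The function name `calibration_error` is hypothetical.

```python
from typing import Sequence


def calibration_error(
    forecasts: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Mean absolute gap between predicted and observed frequency across bins."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")

    prob_sums = [0.0] * n_bins  # sum of predicted probabilities per bin
    hits = [0] * n_bins         # number of events that occurred per bin
    counts = [0] * n_bins       # number of predictions per bin

    for p, o in zip(forecasts, outcomes):
        # A value on a bin edge goes to the upper bin; 1.0 is clamped to the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        prob_sums[idx] += p
        hits[idx] += o
        counts[idx] += 1

    gaps = [
        abs(prob_sums[i] / counts[i] - hits[i] / counts[i])
        for i in range(n_bins)
        if counts[i] > 0  # assumption: empty bins do not contribute
    ]
    if not gaps:
        raise ValueError("no predictions to evaluate")
    return sum(gaps) / len(gaps)
```

For example, ten forecasts of 0.7 of which seven occur produce a gap near zero for that bin, while ten forecasts of 0.7 of which only five occur produce a gap of about 0.2.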
NEXUS currently exhibits slight overconfidence in the 60-80% range, a known bias that the self-calibration feedback loop is actively correcting. Predictions below 30% and above 90% show strong calibration alignment.
03 / Category Breakdown
Prediction accuracy varies by domain. Market signals benefit from higher data density and faster feedback loops. Geopolitical predictions operate on longer timelines with more confounding variables. Celestial correlations remain in an experimental validation phase with insufficient sample size for definitive conclusions.
04 / Self-Calibration Feedback
Every resolved prediction feeds back into the NEXUS calibration engine. When a prediction resolves (the event either occurs or the prediction window expires), the system records the outcome against the original forecast and updates category-level calibration weights.
The feedback loop operates on three timescales. Immediate adjustment applies a Bayesian update to the confidence modifier for the specific signal type. Weekly recalibration recomputes bin-level calibration curves across all categories. Monthly review triggers a full model audit that can adjust base rates, feature weights, and confidence thresholds.
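The page does not say which Bayesian model backs the immediate adjustment. One common choice for hit-or-miss outcomes is a Beta-Bernoulli update, sketched below in Python purely as an illustration. The `BetaBelief` class, its uniform prior, and the idea of tracking one belief per signal type are assumptions, not the documented NEXUS method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical per-signal-type belief about the hit rate, as a Beta(alpha, beta)."""

    alpha: float = 1.0  # prior pseudo-count of hits (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of misses

    def update(self, hit: bool) -> "BetaBelief":
        """Return the posterior after observing one resolved prediction."""
        if hit:
            return BetaBelief(self.alpha + 1.0, self.beta)
        return BetaBelief(self.alpha, self.beta + 1.0)

    @property
    def mean(self) -> float:
        """Posterior mean hit rate."""
        return self.alpha / (self.alpha + self.beta)


# Two hits and one miss for a single signal type.
belief = BetaBelief().update(True).update(True).update(False)
```

Each resolved prediction shifts the posterior slightly, so the adjustment can run immediately on resolution, while the weekly and monthly steps work on larger batches.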
This process has reduced the 90-day rolling Brier score from 0.241 at system launch to the current 0.187, a 22.4% improvement over 14 months of operation.
Immediate: Bayesian update to signal-type confidence modifier
Weekly: Bin-level calibration curve recomputation
Monthly: Full model audit of base rates, weights, and thresholds
05 / Transparency
Immutable Logging
Every prediction is logged with timestamp, stated confidence, reasoning chain, contributing signal IDs, and eventual outcome. No record can be retroactively modified or deleted.
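One way to picture such a record is as a pair of frozen Python dataclasses, with the outcome stored as a separate record so the original forecast is never rewritten. This is a sketch under assumptions: the class names, field names, and the two-record split are hypothetical, and a frozen dataclass only blocks in-process mutation, so real immutability would have to be enforced at the storage layer.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class PredictionRecord:
    """Hypothetical shape of a logged prediction at creation time."""

    prediction_id: str
    created_at: datetime
    stated_confidence: float            # probability assigned at creation
    reasoning_chain: Tuple[str, ...]    # ordered reasoning steps
    signal_ids: Tuple[str, ...]         # contributing signal IDs


@dataclass(frozen=True)
class OutcomeRecord:
    """Hypothetical resolution entry, appended later and linked by ID."""

    prediction_id: str
    resolved_at: datetime
    occurred: bool


record = PredictionRecord(
    prediction_id="example-001",
    created_at=datetime(2026, 2, 1, tzinfo=timezone.utc),
    stated_confidence=0.7,
    reasoning_chain=("illustrative step",),
    signal_ids=("signal-a", "signal-b"),
)

try:
    record.stated_confidence = 0.9  # attempted retroactive edit
    edit_blocked = False
except FrozenInstanceError:
    edit_blocked = True
```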
No Survivorship Bias
The confidence level assigned at creation time is the value used in all accuracy calculations, and every prediction is scored whether it hit or missed. This prevents cherry-picking of results and ensures reported metrics reflect true performance.
Full Queryability
All prediction records are queryable through the NEXUS chat interface and Predictions page. Filter by category, confidence range, outcome, and time period for independent analysis.
06 / Sample Monthly Scorecard
February 2026