Prediction Accuracy

Measuring What We Claim

Every NEXUS prediction is immutably logged and scored against outcomes. This page details how we measure accuracy, where we perform well, and where the system still struggles. No prediction is retroactively modified or deleted.

Predictions Tracked: 1,247
Overall Hit Rate: 68.3%
Avg Brier Score: 0.187
Calibration Error: 4.2%

01 / Brier Score

The Brier score is the primary metric NEXUS uses to evaluate prediction quality. It measures the mean squared difference between predicted probabilities and actual outcomes. A score of 0.0 represents perfect prediction, while 1.0 represents the worst possible score.

For each resolved prediction, the Brier score is calculated as (forecast probability - outcome)^2, where outcome is 1 if the event occurred and 0 if it did not. A model that assigns 0.9 probability to an event that occurs receives a Brier contribution of 0.01, while assigning 0.9 to an event that does not occur yields 0.81.
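As a minimal sketch of the calculation described above (the function names `brier_contribution` and `brier_score` are illustrative, not part of any NEXUS API), the per-prediction contribution and its mean could be computed like this:

```python
from typing import Sequence


def brier_contribution(forecast: float, outcome: int) -> float:
    """Squared error of one forecast; outcome is 1 if the event occurred, else 0."""
    if not 0.0 <= forecast <= 1.0:
        raise ValueError("forecast must be a probability in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (forecast - outcome) ** 2


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of the per-prediction contributions over all resolved predictions."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(brier_contribution(f, o) for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# The two cases from the text: 0.9 on an event that occurs, and on one that does not.
print(round(brier_contribution(0.9, 1), 2))  # 0.01
print(round(brier_contribution(0.9, 0), 2))  # 0.81
```

Because the penalty is squared, a confident miss costs far more than a confident hit earns, which is why the two cases above differ by a factor of 81.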

NEXUS maintains a rolling 90-day Brier score alongside a lifetime aggregate. The rolling window allows detection of calibration drift, where the model begins systematically over- or under-estimating probabilities in a particular domain.
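The sketch below illustrates how a rolling window can be compared with a lifetime aggregate. The `Resolved` record, its field names, and the sample data are assumptions made for illustration; the actual NEXUS storage format is not described on this page.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Resolved:
    resolved_on: date  # hypothetical field: the day the prediction resolved
    forecast: float    # stated probability at creation time
    outcome: int       # 1 if the event occurred, 0 otherwise


def mean_brier(records: Iterable[Resolved]) -> Optional[float]:
    """Mean squared error over the given records, or None if there are none."""
    scores = [(r.forecast - r.outcome) ** 2 for r in records]
    return sum(scores) / len(scores) if scores else None


def rolling_brier(records: List[Resolved], as_of: date,
                  window_days: int = 90) -> Optional[float]:
    """Brier score over predictions resolved in the window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    return mean_brier(r for r in records if start < r.resolved_on <= as_of)


# Made-up history: two early hits followed by two recent confident misses.
history = [
    Resolved(date(2025, 6, 1), 0.8, 1),
    Resolved(date(2025, 7, 1), 0.7, 1),
    Resolved(date(2026, 1, 15), 0.8, 0),
    Resolved(date(2026, 2, 10), 0.7, 0),
]
print(f"{rolling_brier(history, date(2026, 2, 28)):.3f}")  # 0.565
print(f"{mean_brier(history):.3f}")                        # 0.315
```

In this made-up history the rolling score is much worse than the lifetime score, which is the kind of gap that would flag recent drift while the lifetime figure still looks healthy.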

Brier Scale Reference
0.00 - 0.10Excellent
0.10 - 0.20Good
0.20 - 0.30Fair
0.30+Poor
NEXUS Current
0.187

02 / Calibration

A well-calibrated prediction system produces forecasts whose stated probabilities match observed frequencies. When NEXUS assigns a 70% probability to a class of events, those events should occur approximately 70% of the time across a sufficient sample.

Calibration is assessed by bucketing predictions into probability bins (0-10%, 10-20%, etc.) and comparing the average predicted probability in each bin to the actual hit rate. The calibration error is the mean absolute deviation between predicted and observed frequencies across all bins.
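The binning procedure above can be sketched as follows. This is an illustration under stated assumptions: ten equal-width bins, empty bins skipped, and an unweighted mean across bins as the text describes. The function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_table(forecasts: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed hit rate, count) per non-empty bin."""
    sums = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of forecasts, hits, count]
    for f, o in zip(forecasts, outcomes):
        idx = min(int(f * n_bins), n_bins - 1)  # a forecast of 1.0 goes in the top bin
        sums[idx][0] += f
        sums[idx][1] += o
        sums[idx][2] += 1
    return [(s / n, h / n, n) for s, h, n in sums if n > 0]


def calibration_error(forecasts: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> float:
    """Mean absolute gap between predicted and observed frequency across bins."""
    rows = calibration_table(forecasts, outcomes, n_bins)
    if not rows:
        raise ValueError("no predictions to evaluate")
    return sum(abs(predicted - observed) for predicted, observed, _ in rows) / len(rows)


# Made-up data: ten 70% forecasts with 6 hits, five 20% forecasts with 1 hit.
forecasts = [0.7] * 10 + [0.2] * 5
outcomes = [1] * 6 + [0] * 4 + [1] + [0] * 4
print(f"{calibration_error(forecasts, outcomes):.3f}")  # 0.050
```

A common variant weights each bin by its prediction count so that sparsely populated bins do not dominate; the unweighted mean is used here only because it matches the definition given above.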

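NEXUS currently exhibits slight overconfidence in the 60-80% range, where observed hit rates (58% in the 60-70% bin and 67% in the 70-80% bin) fall below the stated probabilities. This is a known bias that the self-calibration feedback loop is actively correcting. Predictions below 30% and above 90% show strong calibration alignment.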

Calibration by Probability Bin

| Probability bin | Observed hit rate |
| --- | --- |
| 0-10% | 4% |
| 10-20% | 14% |
| 20-30% | 23% |
| 30-40% | 33% |
| 40-50% | 42% |
| 50-60% | 52% |
| 60-70% | 58% |
| 70-80% | 67% |
| 80-90% | 83% |
| 90-100% | 94% |

03 / Category Breakdown

Prediction accuracy varies by domain. Market signals benefit from higher data density and faster feedback loops. Geopolitical predictions operate on longer timelines with more confounding variables. Celestial correlations remain in an experimental validation phase: with only 146 resolved predictions, the sample is too small for definitive conclusions.

| Category | Accuracy | Brier Score | Sample Size | Status |
| --- | --- | --- | --- | --- |
| Market Signals | 74.1% | 0.152 | 612 | Operational |
| Geopolitical Events | 63.8% | 0.214 | 489 | Operational |
| Celestial Correlations | 47.2% | 0.309 | 146 | Validating |

04 / Self-Calibration Feedback

Every resolved prediction feeds back into the NEXUS calibration engine. When a prediction resolves (the event either occurs or the prediction window expires), the system records the outcome against the original forecast and updates category-level calibration weights.

The feedback loop operates on three timescales. Immediate adjustment applies a Bayesian update to the confidence modifier for the specific signal type. Weekly recalibration recomputes bin-level calibration curves across all categories. Monthly review triggers a full model audit that can adjust base rates, feature weights, and confidence thresholds.
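This page does not specify the form of the immediate Bayesian update. As one common possibility, and purely as an assumption for illustration, a Beta-Bernoulli model keeps a running belief about how often a signal type's predictions hit. The class name `BetaBelief` and the uniform prior are hypothetical and not a description of the NEXUS engine.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about the hit rate of one signal type."""
    alpha: float = 1.0  # prior pseudo-count of hits (uniform prior, an assumption)
    beta: float = 1.0   # prior pseudo-count of misses

    def update(self, hit: bool) -> None:
        """Conjugate update: each resolved prediction adds one pseudo-count."""
        if hit:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean hit rate."""
        return self.alpha / (self.alpha + self.beta)


belief = BetaBelief()
for hit in (True, True, False):
    belief.update(hit)
print(belief.mean)  # 0.6
```

Each resolution shifts the estimate immediately, while the prior pseudo-counts keep a single outcome from swinging it too far. How such an estimate would map onto a confidence modifier is left open here.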

This process has reduced the 90-day rolling Brier score from 0.241 at system launch to the current 0.187, a 22.4% improvement over 14 months of operation.
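The improvement figure is a relative reduction in Brier score, where lower is better. A short check of the arithmetic, using a hypothetical helper named `relative_improvement`:

```python
def relative_improvement(baseline: float, current: float) -> float:
    """Fractional reduction from baseline; positive means the score got lower (better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


# Launch score 0.241 versus current score 0.187, both taken from the text above.
print(f"{relative_improvement(0.241, 0.187):.1%}")  # 22.4%
```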

Feedback Timescales

Immediate: Bayesian update to signal-type confidence modifier
Weekly: Bin-level calibration curve recomputation
Monthly: Full model audit (base rates, weights, thresholds)

Launch Brier: 0.241
Current: 0.187
22.4% improvement over 14 months

05 / Transparency

Immutable Logging

Every prediction is logged with timestamp, stated confidence, reasoning chain, contributing signal IDs, and eventual outcome. No record can be retroactively modified or deleted.
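How NEXUS enforces immutability is not described here. One way to make an append-only log tamper-evident is to chain each entry's hash to the previous one, sketched below. The names `PredictionRecord` and `AppendOnlyLog`, the field layout, and the use of SHA-256 are all assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class PredictionRecord:
    prediction_id: str
    timestamp: str               # ISO 8601 string
    confidence: float            # stated confidence at creation time
    reasoning: str
    signal_ids: Tuple[str, ...]  # contributing signal IDs


class AppendOnlyLog:
    """Stores records as (JSON payload, chained SHA-256 digest) pairs."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []

    @staticmethod
    def _digest(prev: str, payload: str) -> str:
        return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()

    def append(self, record: PredictionRecord) -> str:
        prev = self._entries[-1][1] if self._entries else ""
        payload = json.dumps(asdict(record), sort_keys=True)
        digest = self._digest(prev, payload)
        self._entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = ""
        for payload, digest in self._entries:
            if self._digest(prev, payload) != digest:
                return False
            prev = digest
        return True


log = AppendOnlyLog()
log.append(PredictionRecord("p-001", "2026-02-03T09:00:00Z", 0.82, "example", ("s-17",)))
print(log.verify())  # True
```

Under this design an outcome would be recorded as a new entry that references the prediction ID, rather than by editing the original record.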

No Survivorship Bias

The confidence level assigned at creation time is the value used in all accuracy calculations; it is never revised after the outcome is known. This prevents cherry-picking of results and ensures reported metrics reflect true performance.

Full Queryability

All prediction records are queryable through the NEXUS chat interface and Predictions page. Filter by category, confidence range, outcome, and time period for independent analysis.

06 / Sample Monthly Scorecard

February 2026

Total: 8
Hit Rate: 62.5%
Brier Score: 0.209
Hits: 5
Misses: 3

| Prediction | Category | Confidence | Result | Brier |
| --- | --- | --- | --- | --- |
| USD/JPY breaks below 148 support | Market | 82% | | 0.032 |
| OPEC+ extends production cuts through Q2 | Geopolitical | 71% | | 0.084 |
| Escalation on Korean peninsula within 30 days | Geopolitical | 35% | | 0.122 |
| BTC correlation spike with equities | Market | 76% | | 0.578 |
| EU sanctions package triggers RUB volatility | Geopolitical | 64% | | 0.130 |
| Mercury retrograde period correlates with VIX spike | Celestial | 41% | | 0.348 |
| Gold tests $2,400 resistance on CPI print | Market | 79% | | 0.044 |
| Taiwan Strait naval activity increase | Geopolitical | 58% | | 0.336 |

Brier score change: -0.018 month-over-month (7.9% improvement vs January 2026)
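The monthly Brier score can be recomputed from the scorecard rows. In the sketch below the confidences come from the scorecard, while the 0/1 outcomes are inferred by us from the listed Brier contributions (for example, 0.032 at 82% confidence implies the event occurred, since (0.82 - 1)^2 = 0.0324). Treat the outcome column as an assumption, not as logged data.

```python
# (prediction, stated confidence, inferred outcome: 1 = occurred, 0 = did not)
rows = [
    ("USD/JPY breaks below 148 support", 0.82, 1),
    ("OPEC+ extends production cuts through Q2", 0.71, 1),
    ("Escalation on Korean peninsula within 30 days", 0.35, 0),
    ("BTC correlation spike with equities", 0.76, 0),
    ("EU sanctions package triggers RUB volatility", 0.64, 1),
    ("Mercury retrograde period correlates with VIX spike", 0.41, 1),
    ("Gold tests $2,400 resistance on CPI print", 0.79, 1),
    ("Taiwan Strait naval activity increase", 0.58, 0),
]

# Per-prediction contribution: (confidence - outcome)^2, then the monthly mean.
contributions = [(confidence - outcome) ** 2 for _, confidence, outcome in rows]
monthly_brier = sum(contributions) / len(contributions)
print(f"{monthly_brier:.3f}")  # 0.209

# A change of -0.018 implies a January score of 0.227 at three decimals.
january_brier = round(round(monthly_brier, 3) + 0.018, 3)
print(f"{0.018 / january_brier:.1%}")  # 7.9%
```

The recomputed mean matches the reported 0.209, and the implied January score of 0.227 is consistent with the stated 7.9% month-over-month improvement.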
