Kwiz Quants

African Retail Forex: The Quant Gap Data Science Can Fix

Kwiz Computing Technologies — Thu, 23 Apr 2026 00:00:00 GMT

A Broken Bet Most Traders Don’t Know They’re Making

A Nairobi trader opens their MT5 terminal at 8 a.m. and stares at a GBP/USD chart, reading candlestick patterns the way a witch doctor reads bones. On the other side of that same trade sits a systematic desk in London running a strategy that was validated on fifteen years of tick data, stress-tested across 500 Monte Carlo paths, and deployed with automated risk controls. The Nairobi trader is not in a market. They are in a statistics problem they have not realised they are losing.

The good news: the tools that desk in London uses are not secret. Most of them are open-source. The barrier is not technology anymore. It is knowing where to start.

What Happened to Kenya’s Forex Market After 2023

Kenya’s Central Bank tightened its forex dealer licensing framework in 2023, raising capital requirements and compliance obligations that pushed several local brokers out of the market. The direct effect was predictable: retail traders migrated toward offshore brokers, particularly those regulated by the FCA, FSCA, and CySEC. Platforms like Exness, Pepperstone, and IC Markets saw significant uptake across East Africa.

What did not migrate with those traders was any improvement in method. According to industry data from broker disclosure reports, roughly 70-80% of retail forex accounts lose money in any given quarter. That number holds across regions and has held for years. The offshore move changed the regulatory wrapper but not the underlying problem.

The problem is structure. Institutional desks trade with rules. Most retail traders in Kenya trade with feelings dressed up as rules.

The Institutional Edge Is Not What You Think

People assume the institutional edge comes from proprietary data or faster execution. Both matter, but neither is the primary driver of systematic profitability. The real edge is the discipline to test hypotheses rigorously before risking capital on them, then execute without emotional override.

A systematic strategy answers three questions before it goes live. First: does the signal actually exist in historical data, or did I find it by testing enough variations that something was bound to look good? Second: does the strategy work on data it was never trained on? Third: what is the realistic worst-case drawdown, and can I stay solvent through it?

Retail traders skip all three questions. They see a strategy work on a demo account for three weeks and call it validated. A demo account run over three weeks is not evidence of anything. It is a coin flip that landed in your favour.

The Tooling That Closes the Gap

Here is the specific stack that makes systematic trading possible without an institutional budget.

Backtesting and signal validation. The quantstrat package in R provides a full backtesting framework. More importantly, R’s statistical ecosystem allows you to apply proper validation methods. At Kwiz Computing, we build every strategy against the Deflated Sharpe Ratio framework before any live testing begins. The DSR adjusts your observed Sharpe Ratio for the number of strategies you tested to find it. If you tested 200 variations of a moving average crossover and picked the best one, your backtest result is almost certainly a false discovery. The DSR tells you whether it is.

Walk-forward validation. A single in-sample/out-of-sample split is not enough for currency strategies, because forex regimes shift. We use combinatorial purged cross-validation, a technique from Marcos Lopez de Prado’s work, to test strategies across many non-overlapping time windows without introducing lookahead bias. This is the difference between a strategy that looks good on paper and one that has actually been stress-tested.

Automated execution. MetaTrader 5 supports algorithmic execution through Expert Advisors (EAs). At Kwiz Quants, our kwizmt5 R package and our own KwizStrategyTester EA bridge the two sides: R handles signal generation and risk sizing, and MT5 handles order routing. This removes the moment-to-moment discretion that kills most retail accounts.

Risk management as code. Position sizing based on Kelly fractions or fixed-fractional rules, automatic stop placement, and daily drawdown limits can all be implemented as functions that run before any order is submitted. When risk management is code, it does not flinch. It does not convince itself that “this trade is different.”
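As a rough sketch of what "risk management as code" can look like in R, the function below sizes a position with a fixed-fractional rule and refuses to trade once a daily loss limit has been hit. The function name, its arguments, and the 1% risk and 3% daily-limit defaults are illustrative assumptions, not the Kwiz Quants implementation; pip values depend on the pair, the lot size, and the account currency.

```r
# Hypothetical pre-trade risk check: fixed-fractional sizing plus a daily
# loss limit. All names and default values are illustrative assumptions.
size_position <- function(equity, stop_pips, pip_value_per_lot,
                          risk_fraction = 0.01,
                          day_start_equity = equity,
                          max_daily_loss = 0.03) {
  stopifnot(equity > 0, stop_pips > 0, pip_value_per_lot > 0)

  # Block new orders once today's loss reaches the daily limit
  daily_loss <- (day_start_equity - equity) / day_start_equity
  if (daily_loss >= max_daily_loss) {
    return(list(allowed = FALSE, lots = 0, risk_amount = 0))
  }

  # Amount lost if the stop is hit: a fixed fraction of current equity
  risk_amount <- equity * risk_fraction
  lots <- risk_amount / (stop_pips * pip_value_per_lot)

  # Round down to 0.01 lots (a common minimum step; check your broker).
  # The small epsilon guards against floating-point error before floor().
  lots <- floor(lots * 100 + 1e-9) / 100

  list(allowed = lots > 0, lots = lots, risk_amount = risk_amount)
}

# $500 account, 25-pip stop, $10 per pip per standard lot (assumed value)
size_position(equity = 500, stop_pips = 25, pip_value_per_lot = 10)

# Same account after falling from $500 to $480 within the day
size_position(equity = 480, stop_pips = 25, pip_value_per_lot = 10,
              day_start_equity = 500)
```

Because the check runs before any order is built, the second call returns allowed = FALSE no matter how attractive the signal looks.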

Why This Matters Specifically for African Practitioners

The argument sometimes made is that systematic trading is irrelevant to African markets because our capital bases are smaller and our access to institutional data is limited. This argument is wrong on both counts.

Systematic trading helps small accounts more than large ones, not less. A discretionary trader with a $500 account who blows up on three bad weeks of impulsive trading loses everything. A systematic trader with the same account running a strategy with defined stops and position sizing loses a controlled amount, learns something specific from the drawdown, and adjusts. The discipline compounds.

On data access: forex data is among the most democratised financial data in the world. Tick data for major and minor pairs going back ten or more years is available from brokers, from Dukascopy, and from aggregators. A Nairobi-based quant analyst with an internet connection has access to essentially the same raw price data as a desk in Zurich.

The gap is not access. It is the knowledge that proper validation exists, and the willingness to apply it before going live.

Where Most People Get Stuck (and What to Do)

The typical journey for a data-literate practitioner who wants to build systematic trading infrastructure goes like this. They read about backtesting, implement something in Python or R, see impressive backtest results, and try it live. It fails. They conclude that systematic trading does not work.

The conclusion is wrong. The workflow was wrong. Specifically, the backtest had one or more of the following problems: it was fit on the same data used to evaluate it, it did not account for transaction costs and slippage, or it was selected from many strategies tested on the same dataset, making the result a statistical artifact rather than a real signal.

Fixing these problems is not complicated. It requires applying the right statistical framework in the right order. Validate the signal with DSR before selection. Use purged cross-validation to test generalisation. Paper-trade with realistic costs before going live. The framework is documented, the tools are in R and Python, and the process is repeatable.
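To make the cost point concrete, here is a minimal sketch that charges a cost every time the position changes. The function name, the position encoding (+1 long, 0 flat, -1 short), and the cost figure are assumptions for illustration; real costs depend on the broker, the pair, and the time of day.

```r
# Hypothetical helper: subtract trading costs from per-bar strategy returns.
# position:      +1 long, 0 flat, -1 short, held over each bar
# asset_returns: simple return of the instrument over each bar
# cost_per_unit: spread plus slippage per unit of position change,
#                as a fraction of price (assumed value)
net_strategy_returns <- function(position, asset_returns,
                                 cost_per_unit = 0.0001) {
  stopifnot(length(position) == length(asset_returns))

  gross <- position * asset_returns

  # Turnover: absolute change in position, starting from flat
  turnover <- abs(diff(c(0, position)))

  gross - turnover * cost_per_unit
}

position      <- c(1, 1, 0, -1, -1)
asset_returns <- c(0.002, -0.001, 0.0005, -0.003, 0.001)

gross <- sum(position * asset_returns)
net   <- sum(net_strategy_returns(position, asset_returns))
c(gross = gross, net = net)
```

In this toy series three units of turnover remove a tenth of the gross return; a strategy that trades more often gives up proportionally more.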

The practical starting point is simpler than most people expect. Pick one currency pair you trade. Define one rule-based entry signal. Define exact exit rules. Backtest it on five years of hourly data. Apply the DSR. If it passes, validate with walk-forward testing. If it still passes, size it conservatively and run it on demo for sixty days with automated execution. That process is within reach for anyone with working knowledge of R or Python.

The Structural Opportunity

Retail forex trading in Kenya and across East Africa is growing. The post-2023 shift toward offshore regulated brokers has, if anything, accelerated participation, because traders now have access to tighter spreads, more instruments, and better execution than was possible through local dealers.

Almost none of that participation is systematic. That is not a permanent condition. It is a skills gap, and skills gaps close.

The practitioners who close this gap first will have a durable edge, not because systematic trading is guaranteed to win, but because trading without a tested, rule-based framework is almost guaranteed to lose over any meaningful time horizon. The statistics on retail trading outcomes are not ambiguous. They have been consistent for decades across every market that has disclosed them.

The question is not whether data science applies to African retail forex. It clearly does. The question is whether you will be among the practitioners who apply it, or among the ones who continue to hand their capital to the algorithms on the other side of the order book.

Kwiz Quants is launching soon. The infrastructure described in this article is not illustrative. It is what we built and what we run ourselves. We are opening early access to a limited group of practitioners before the public launch so we can gather real feedback from people using it on live markets.

If you want to see the platform in action rather than read about it, register now on the Kwiz Quants page. Spots in the soft launch are limited.

Backtesting Without Lookahead Bias: Combinatorial Purged Cross-Validation

Kwizera Jean — Sun, 01 Mar 2026 00:00:00 GMT

Why Standard Cross-Validation Fails in Finance

Cross-validation is the gold standard for model evaluation in machine learning. Split your data into folds, train on some, test on others, and you get an unbiased estimate of out-of-sample performance. It works beautifully for i.i.d. data — images, text, tabular datasets where observations are independent.

Financial time series violate this assumption fundamentally. Stock prices, forex rates, and other market data exhibit:

Serial correlation — today’s price depends on yesterday’s
Regime changes — the statistical properties of returns shift over time
Label leakage — if your target variable is a forward return, adjacent observations share information

When you apply standard k-fold cross-validation to financial data, training folds contain information about test folds. The model “sees” the future through correlated observations near the fold boundaries. The result: backtest performance that looks better than what you’ll achieve in live trading.

The Purging and Embargo Solution

Marcos Lopez de Prado’s Combinatorial Purged Cross-Validation (CPCV) addresses these issues through two mechanisms:

Purging

Purging removes observations from the training set that overlap temporally with the test set’s label window. If your strategy predicts 5-day returns, then observations within 5 days of any test-set boundary are excluded from training.

Timeline:  |---Train---|xxxPURGEDxxx|---Test---|xxxPURGEDxxx|---Train---|

This eliminates the most direct form of information leakage: training on data whose label period overlaps with test observations.

Embargo

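An embargo period extends the exclusion beyond the strict label window. Even after purging label overlap, serial correlation means that observations just outside the purge zone still carry information about the test period. The embargo adds a buffer (typically 1-2% of the dataset length) to reduce that residual dependence. Lopez de Prado’s original formulation embargoes only the training observations that follow a test set; applying it on both sides, as in the timeline and code here, is the more conservative choice.

Timeline:  |---Train---|--EMBARGO--|xxPURGExx|---Test---|xxPURGExx|--EMBARGO--|---Train---|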

The Combinatorial Approach

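A single chronological split uses the data once: train on the first 80%, test on the last 20%. Rolling walk-forward testing repeats this over successive windows, but it still traces only one path through history. Either way you get a single estimate of performance, dominated by whichever market regime falls in the test period.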

CPCV generates all possible combinations of contiguous training and test groups, subject to purging and embargo constraints. For a dataset split into N groups with k test groups, CPCV produces C(N, k) train/test splits (15 for N = 6, k = 2), which recombine into (k/N) × C(N, k) unique backtest paths (5 in that example).

This gives you:

Multiple independent performance estimates rather than a single point estimate
A distribution of backtest results that reveals strategy robustness
More efficient use of limited data — every observation appears in test sets
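A quick way to see how the number of splits and paths grows with the grouping is to compute the counts directly. This sketch assumes the standard CPCV counting (choose(N, k) splits, with each group appearing in the test set of choose(N - 1, k - 1) of them); the helper name is hypothetical.

```r
# Hypothetical helper: number of CPCV splits and backtest paths
cpcv_counts <- function(n_groups, n_test) {
  stopifnot(n_test >= 1, n_test < n_groups)

  n_splits <- choose(n_groups, n_test)

  # Each group is tested in choose(n_groups - 1, n_test - 1) splits, which
  # equals (n_test / n_groups) * n_splits: the number of complete paths
  n_paths <- choose(n_groups - 1, n_test - 1)

  c(splits = n_splits, paths = n_paths)
}

cpcv_counts(6, 2)   # 15 splits, 5 paths
cpcv_counts(8, 2)   # 28 splits, 7 paths
cpcv_counts(10, 3)  # 120 splits, 36 paths
```

The number of model fits grows quickly with the grouping, so it is worth checking these counts before committing to a parameter choice.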

R Implementation

Here is a simplified implementation of the CPCV framework:

#' Generate CPCV train/test splits with purging and embargo
#'
#' @param n_obs Number of observations
#' @param n_groups Number of groups to split into
#' @param n_test Number of groups to use as test in each split
#' @param purge_length Number of observations to purge at boundaries
#' @param embargo_pct Embargo as a fraction of dataset length
#' @return List of train/test index pairs
generate_cpcv_splits <- function(n_obs, n_groups = 6, n_test = 2,
                                  purge_length = 5, embargo_pct = 0.01) {

  embargo_length <- ceiling(n_obs * embargo_pct)
  # Purge and embargo are applied back to back on each side of a test group
  buffer <- purge_length + embargo_length
  group_size <- floor(n_obs / n_groups)

  # Generate group boundaries; the last group absorbs any remainder so that
  # no observation is silently dropped
  groups <- lapply(seq_len(n_groups), function(g) {
    start <- (g - 1) * group_size + 1
    end <- if (g == n_groups) n_obs else g * group_size
    start:end
  })

  # Generate all combinations of test groups
  test_combos <- combn(n_groups, n_test, simplify = FALSE)

  splits <- lapply(test_combos, function(test_groups) {
    test_idx <- unlist(groups[test_groups])
    train_idx <- setdiff(seq_len(n_obs), test_idx)

    # Apply purging and embargo around EACH test group rather than around
    # range(test_idx): test groups need not be adjacent, and training data
    # that sits between two test groups must be trimmed at both of its edges
    exclusion_zone <- unlist(lapply(groups[test_groups], function(idx) {
      (min(idx) - buffer):(max(idx) + buffer)
    }))

    # Out-of-range indices are harmless here: setdiff() ignores them
    train_idx <- setdiff(train_idx, exclusion_zone)

    list(train = train_idx, test = test_idx)
  })

  splits
}

Applying CPCV to a Strategy

library(dplyr)
library(purrr)

# market_data, fit_strategy(), predict_signals(), compute_strategy_returns()
# and max_drawdown() are placeholders for your own data and strategy code

# Generate splits
splits <- generate_cpcv_splits(
  n_obs = nrow(market_data),
  n_groups = 6,
  n_test = 2,
  purge_length = 10,
  embargo_pct = 0.02
)

# Evaluate strategy on each split
results <- map_dfr(splits, function(split) {
  train_data <- market_data[split$train, ]
  test_data <- market_data[split$test, ]

  # Fit strategy on training data
  model <- fit_strategy(train_data)

  # Evaluate on test data
  signals <- predict_signals(model, test_data)
  returns <- compute_strategy_returns(signals, test_data)

  tibble(
    # The annualisation factor of 252 assumes daily returns
    sharpe = mean(returns) / sd(returns) * sqrt(252),
    max_drawdown = max_drawdown(cumsum(returns)),
    # Number of bars on which the signal changed
    n_trades = sum(abs(diff(signals)) > 0)
  )
})

# Summary: distribution of out-of-sample performance
summary(results$sharpe)
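The snippet above calls helpers that the article does not define. As one possible reading of max_drawdown(), assuming its input is an additive cumulative return curve such as cumsum(returns), a minimal version could look like this:

```r
# One possible definition of the max_drawdown() helper (an assumption, not
# the article's own code).
# Input:  additive equity curve, e.g. cumsum(returns)
# Output: largest peak-to-trough decline, as a non-negative number in the
#         same units as the curve
max_drawdown <- function(equity_curve) {
  if (length(equity_curve) == 0) return(NA_real_)

  # Running peak, counting the starting level of 0 as the first peak
  running_peak <- cummax(c(0, equity_curve))[-1]

  max(running_peak - equity_curve)
}

returns <- c(0.01, -0.02, 0.005, -0.01, 0.03)
max_drawdown(cumsum(returns))
```

For compounded rather than additive returns, the same idea applies to cumprod(1 + returns), with the drawdown expressed relative to the running peak.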

CPCV vs Other Methods

Method	Lookahead Bias	Data Efficiency	Multiple Estimates
Walk-Forward	Low	Low	No
Standard k-Fold CV	High	High	Yes
Time-Series Split	Low	Low	Limited
CPCV	None	High	Yes

Walk-forward testing avoids lookahead bias but gives you a single estimate from one market regime. Standard CV is efficient but leaks information. CPCV combines the two strengths: purging and embargo control the leakage, and every split yields its own out-of-sample estimate.

Practical Considerations

Choosing Parameters

n_groups: More groups = more combinations but smaller test sets. 5-8 groups is typical for multi-year datasets.
n_test: 2 test groups is the most common choice, providing a good balance between the number of combinations and test set size.
purge_length: Should match or exceed your strategy’s maximum lookahead window (e.g., if you predict 5-day returns, purge at least 5 observations).
embargo_pct: 1-2% is typical. Higher for strategies that are more sensitive to serial correlation.

Interpreting Results

The distribution of Sharpe Ratios across CPCV splits tells you more than any single backtest number:

Consistently positive across splits → Robust strategy with genuine edge
High variance across splits → Strategy is regime-dependent; proceed with caution
Negative in some splits → Strategy may not generalise; investigate which market conditions cause failure
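These readings can be turned into a small numerical summary. The function below is an illustrative sketch: its name and the choice of statistics are assumptions, the input values are made up, and what counts as "high variance" is left to the reader's judgment.

```r
# Hypothetical summary of per-split Sharpe Ratios from a CPCV run
summarise_cpcv_sharpe <- function(sharpe) {
  sharpe <- sharpe[is.finite(sharpe)]
  stopifnot(length(sharpe) > 1)

  c(
    n_splits       = length(sharpe),
    median         = median(sharpe),
    sd             = sd(sharpe),
    worst          = min(sharpe),
    share_positive = mean(sharpe > 0)
  )
}

# Made-up Sharpe Ratios for 15 splits, for illustration only
example_sharpe <- c(1.2, 0.8, 0.4, -0.3, 1.5, 0.9, 0.1, 0.7,
                    1.1, -0.1, 0.6, 1.3, 0.5, 0.2, 1.0)

round(summarise_cpcv_sharpe(example_sharpe), 2)
```

Looking at the worst split and the share of positive splits alongside the median makes it harder for a few strong periods to hide the weak ones.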

Integration in the Kwiz Quants Pipeline

CPCV is the second validation gate in our pipeline, applied after the Deflated Sharpe Ratio screening. Strategies that pass DSR are subjected to CPCV to verify that their performance generalises across different time periods — not just the specific window that happened to produce the best-looking backtest.

Only strategies that show consistently positive risk-adjusted returns across all CPCV splits proceed to MT5 online backtesting. This ensures that when we deploy a strategy to live trading, we have evidence of robustness, not just a single favourable backtest.

The Deflated Sharpe Ratio: Why Most Backtests Lie

Kwizera Jean — Sun, 15 Feb 2026 00:00:00 GMT

The Multiple Testing Problem in Quant Finance

Here is a thought experiment. Generate 1,000 random trading strategies — strategies with no actual predictive power, just noise. Backtest all of them on the same historical data. How many will show a Sharpe Ratio above 1.0?

The answer depends mainly on the length of the backtest (volatility cancels out of the Sharpe Ratio), and for a couple of years of daily data it is typically dozens. Some of these random strategies will look genuinely impressive: strong returns, reasonable drawdowns, plausible-looking equity curves. If you picked the best one and presented it to investors, it would look like a real strategy.

This is the multiple testing problem, and it is the single most common reason that backtested strategies fail in live trading. When you test many hypotheses on the same dataset, some will appear significant by chance alone. The more strategies you test, the more false discoveries you produce.

Why the Standard Sharpe Ratio Fails

The Sharpe Ratio is the most widely used performance metric in quantitative finance. It measures risk-adjusted returns: the excess return per unit of volatility. A Sharpe Ratio of 1.0 is considered good; 2.0 is excellent.

But the standard Sharpe Ratio has no mechanism to account for how many strategies were tested to find the one being presented. If you tested 500 strategies and are showing the best one, the reported Sharpe Ratio is biased upward — sometimes dramatically so.

This is not a theoretical concern. It is the central challenge in quantitative strategy development, and the primary reason that “signal sellers” and retail strategy vendors consistently fail to deliver in live trading what they promised in backtests.

The Deflated Sharpe Ratio Framework

Marcos Lopez de Prado introduced the Deflated Sharpe Ratio (DSR) to address this problem directly. The DSR adjusts the observed Sharpe Ratio for:

The number of trials — how many strategies were tested before selecting this one
Skewness of the return distribution — asymmetry changes the significance threshold
Kurtosis of the return distribution — fat tails inflate the apparent Sharpe Ratio
The length of the backtest — shorter backtests are more susceptible to noise

The DSR answers a precise question: given how many strategies I tested and the statistical properties of the returns, what is the probability that this observed Sharpe Ratio is a false discovery?
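For reference, the definition published by Bailey and Lopez de Prado can be written as below. Here \widehat{SR} is the non-annualised Sharpe Ratio of the selected strategy, T the number of return observations, \hat{\gamma}_3 and \hat{\gamma}_4 the skewness and (non-excess) kurtosis of the returns, N the number of independent trials, V[\{\widehat{SR}_n\}] the variance of the Sharpe Ratio estimates across those trials, \gamma the Euler-Mascheroni constant, and Z the standard normal CDF.

```latex
\widehat{DSR} = Z\!\left[
  \frac{\left(\widehat{SR} - SR_0\right)\sqrt{T - 1}}
       {\sqrt{1 - \hat{\gamma}_3\,\widehat{SR}
              + \frac{\hat{\gamma}_4 - 1}{4}\,\widehat{SR}^{2}}}
\right],
\qquad
SR_0 = \sqrt{V\!\left[\{\widehat{SR}_n\}\right]}
  \left( (1 - \gamma)\, Z^{-1}\!\left[1 - \frac{1}{N}\right]
       + \gamma\, Z^{-1}\!\left[1 - \frac{1}{N e}\right] \right)
```

A DSR close to 1 means the observed Sharpe Ratio is unlikely to be explained by selection among N trials, and 1 - DSR plays the role of a one-sided p-value. All Sharpe Ratios in the formula are per observation period, so an annualised figure has to be converted back before it is plugged in.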

Implementation in R

The DSR computation requires the observed Sharpe Ratio, the number of independent trials, and the higher moments of the return distribution:

#' Compute the Deflated Sharpe Ratio test
#'
#' @param observed_sr Observed Sharpe Ratio of the selected strategy,
#'   per observation period (NOT annualised)
#' @param n_trials Number of strategies tested (at least 2)
#' @param n_obs Number of return observations
#' @param skew Skewness of the return series
#' @param kurt Kurtosis of the return series (non-excess; 3 for a normal)
#' @param sr_sd Standard deviation of the per-period Sharpe Ratios across
#'   all trials; defaults to the value expected for pure noise
#' @return List: observed SR, expected maximum SR under the null, the DSR,
#'   the one-sided p-value (1 - DSR), and a significance flag
deflated_sharpe_ratio <- function(observed_sr, n_trials, n_obs,
                                  skew = 0, kurt = 3,
                                  sr_sd = sqrt(1 / (n_obs - 1))) {

  # Expected maximum SR under the null hypothesis
  # (i.e., what you'd expect the best SR to be from n_trials of pure noise),
  # scaled by the dispersion of SR estimates across trials
  euler_mascheroni <- 0.5772156649
  expected_max_sr <- sr_sd * (
    (1 - euler_mascheroni) * qnorm(1 - 1 / n_trials) +
      euler_mascheroni * qnorm(1 - 1 / (n_trials * exp(1)))
  )

  # Standard error of the SR estimate, adjusted for higher moments
  sr_se <- sqrt(
    (1 - skew * observed_sr + ((kurt - 1) / 4) * observed_sr^2) / (n_obs - 1)
  )

  # Test statistic: how many SE above the expected maximum?
  test_stat <- (observed_sr - expected_max_sr) / sr_se

  # One-sided p-value (equal to 1 - DSR)
  p_value <- pnorm(test_stat, lower.tail = FALSE)

  list(
    observed_sr = observed_sr,
    expected_max_sr = expected_max_sr,
    dsr = 1 - p_value,
    p_value = p_value,
    is_significant = p_value < 0.05
  )
}

A Simulated Demonstration

To illustrate the DSR’s power, let’s generate 200 random strategies and see how many survive:

# Requires the moments package for skewness() and kurtosis()

set.seed(42)
n_strategies <- 200
n_days <- 500

# Generate random return series (no actual signal)
random_returns <- matrix(
  rnorm(n_strategies * n_days, mean = 0, sd = 0.01),
  nrow = n_days, ncol = n_strategies
)

# Daily (non-annualised) Sharpe Ratios, one per strategy
daily_sr <- apply(random_returns, 2, function(r) mean(r) / sd(r))

# Annualised, for the conventional "is it above 1.0?" reading
sharpe_ratios <- daily_sr * sqrt(252)

# How many look "good" by naive SR?
sum(sharpe_ratios > 1.0)  # Roughly 8% of pure-noise strategies are expected to pass

# Apply DSR to the best strategy
best_idx <- which.max(sharpe_ratios)
best_returns <- random_returns[, best_idx]

dsr_result <- deflated_sharpe_ratio(
  observed_sr = daily_sr[best_idx],        # daily SR, matching n_obs in days
  n_trials = n_strategies,
  n_obs = n_days,
  skew = moments::skewness(best_returns),
  kurt = moments::kurtosis(best_returns),  # Pearson kurtosis (normal = 3)
  sr_sd = sd(daily_sr)
)

# The DSR should flag this as NOT significant,
# because the high SR is explained by the number of trials

In typical runs, the best random strategy achieves a Sharpe Ratio of 1.5-2.5 — impressive by conventional standards. But the DSR correctly identifies it as a false discovery, because the expected maximum SR from 200 random trials explains the observed value entirely.

How Kwiz Quants Uses DSR

In the Kwiz Quants validation pipeline, every strategy must pass the DSR test before it proceeds to any further testing. This is the first gate in our multi-layer validation process:

DSR screening — Does the strategy’s Sharpe Ratio survive adjustment for the number of strategies tested? If not, it is discarded regardless of how good the backtest looks.
Combinatorial Purged Cross-Validation — Does the strategy generalise across non-overlapping time periods without lookahead bias?
MT5 online backtesting — Does the strategy perform with realistic spreads and slippage?
Demo live trading — Does the strategy work under real market conditions?

The DSR is the cheapest and most powerful filter. It eliminates the majority of false discoveries before they consume expensive testing resources downstream.

Implications

The DSR has a simple but profound implication: the number of strategies you tried matters as much as the performance of the one you selected. Any performance report that doesn’t disclose the number of trials is, at best, incomplete and, at worst, misleading.

For retail traders evaluating signal providers or strategy vendors, ask one question: how many strategies did you test before finding this one? If the answer is vague or unavailable, the reported performance is almost certainly inflated by selection bias.

For quantitative researchers, the DSR should be a standard part of every strategy development workflow. It costs almost nothing to compute and prevents the most common source of live trading disappointment.

Why We Build Systematic Trading Infrastructure in R

Kwiz Computing Technologies — Sat, 10 Jan 2026 00:00:00 GMT

The Case for R in Quant Finance

Python dominates the quant finance conversation, and for good reason — it has excellent libraries, a large community, and strong integration with machine learning frameworks. So why did we build Kwiz Quants primarily in R?

Statistical Depth

R’s statistical ecosystem is unmatched. Its range of packages for time series analysis, financial econometrics, and statistical testing is wider than that of any other language. When you’re implementing combinatorial purged cross-validation or computing Deflated Sharpe Ratios, R’s statistical foundations make the work cleaner and more reliable.

Production-Ready R

The perception that R is “just for analysis” is outdated. Modern R infrastructure makes production deployment viable: Plumber for REST APIs, Docker for containerisation, renv for reproducible environments, and Rhino for application architecture.

The key is engineering discipline. We apply the same practices used in any production software stack: modular code with box, 95%+ test coverage with testthat, CI/CD pipelines, and structured logging.

The Kwiz Quants Stack

Our trading infrastructure connects R-based strategy engines to MetaTrader 5 execution through our own kwizmt5 R package — a dual-protocol bridge (TCP and HTTP) — and the KwizStrategyTester EA, with DuckDB and Parquet for logging and Shiny for monitoring. Each component is containerised, tested, and designed for resilience — atomic writes, hot standby replicas, and versioned snapshots.
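"Atomic writes" usually means that a reader never sees a half-written file. One common way to get that behaviour, sketched below in base R, is to write to a temporary file in the same directory and then rename it over the target. The function name and the CSV format are assumptions for illustration (the stack described here uses Parquet), and a rename is only atomic when both paths sit on the same filesystem.

```r
# Hypothetical atomic-write helper: write to a temp file, then rename.
write_csv_atomic <- function(data, path) {
  # Temp file lives next to the target so the rename stays on one filesystem
  tmp <- tempfile(pattern = ".tmp-", tmpdir = dirname(path), fileext = ".csv")

  # Clean up the temp file if anything fails before the rename
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)

  write.csv(data, tmp, row.names = FALSE)

  # On POSIX filesystems the rename replaces the target in a single step
  if (!file.rename(tmp, path)) {
    stop("Could not move temporary file into place: ", path)
  }
  invisible(path)
}

log_path <- file.path(tempdir(), "trades.csv")
trades <- data.frame(ticket = 1:2,
                     symbol = c("GBPUSD", "EURUSD"),
                     lots = c(0.02, 0.01))

write_csv_atomic(trades, log_path)
read.csv(log_path)
```

If the process dies mid-write, the worst case is a stray temporary file; the previous version of the log stays intact.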

R isn’t the easy choice for trading infrastructure. But it’s the right choice for a system where statistical rigour is the core differentiator.