Profit Factor (PF) is probably the most quoted metric in the EA (Expert Advisor) marketplace. A PF above 1.5 is 'good.' Above 2.0 is 'excellent.' Anything above 3.0 is 'exceptional.' Taken on their own, these benchmarks are close to meaningless, and understanding why will change how you evaluate strategies.
What Profit Factor actually measures
Profit Factor is the ratio of gross profit to gross loss: total money made on winning trades divided by total money lost on losing trades. A PF of 1.5 means for every $1.00 lost, the strategy makes $1.50. It must be above 1.0 to be profitable.
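As a minimal sketch of that definition, assuming each trade's result is available as a signed profit/loss number (the function name and the sample trades here are illustrative, not taken from any trading platform):

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss for a list of per-trade P&L values."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)  # expressed as a positive number
    if gross_loss == 0:
        # No losing trades: the ratio is undefined, so report infinity (or NaN if no wins either).
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


# Hypothetical trades: $150 won in total, $100 lost in total.
example = [100.0, 50.0, -60.0, -40.0]
print(profit_factor(example))  # 1.5
```

Note that break-even trades (exactly $0) fall out of both sums, so they affect the trade count but not PF.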
On the surface, higher is better. But PF is a composite number, and what's inside it matters enormously. The same PF of 2.0 can describe two radically different strategies — one robust, one fragile.
The single-trade problem
Imagine a backtest with 50 trades: 49 small losses averaging -$20 each, and one enormous winner of $1,100. Total gross loss: $980. Total gross profit: $1,100. Profit Factor: 1.12. Not impressive.
Now imagine the same backtest where that single winner is $5,000 instead. PF becomes 5.1. Exceptional by any benchmark. But the strategy hasn't changed — the system still loses on 49 out of 50 trades. The entire edge depends on that one outlier trade occurring in every forward-testing window. It usually doesn't.
A very high PF (above 3.0) on a relatively small sample of trades very often means the metric is being inflated by a handful of large winners. Remove the three best trades from the backtest and recalculate. If PF collapses toward or below 1.0, the backtest shows no edge beyond those few trades.
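The remove-the-best-trades check is easy to script. The sketch below assumes trades are a plain list of signed P&L values and reuses the 50-trade backtest described above; the helper names are hypothetical.

```python
def profit_factor(trade_pnls):
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


def profit_factor_without_best(trade_pnls, n_removed=3):
    """Recompute PF after dropping the n_removed largest trades by P&L."""
    ordered = sorted(trade_pnls)
    kept = ordered[:-n_removed] if n_removed > 0 else ordered
    return profit_factor(kept)


# 49 losses of $20 each plus one $5,000 winner.
trades = [-20.0] * 49 + [5000.0]
print(round(profit_factor(trades), 2))                  # 5.1
print(round(profit_factor_without_best(trades, 1), 2))  # 0.0
```

Here a single removal is enough to take PF from 5.1 to zero, which is the signature of an outlier-driven result.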
Sample size and statistical significance
A PF of 2.0 over 20 trades means almost nothing statistically. The same PF over 500 trades is meaningful. This seems obvious, but marketplaces regularly feature EAs with 3-month live results of 30-40 trades and PF listed as the headline metric.
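One way to see how little 20 trades tell you is a percentile bootstrap: resample the trade list with replacement many times and look at the spread of the resulting PF values. This is a common resampling technique rather than something the benchmarks above prescribe, and the trade data, function names, and the 90% level below are illustrative assumptions. It also treats trades as independent, which real trade sequences may not be.

```python
import random


def profit_factor(trade_pnls):
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


def bootstrap_pf_interval(trade_pnls, n_resamples=1000, level=0.90, seed=0):
    """Percentile bootstrap interval for PF (resampling trades with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(trade_pnls)
    stats = []
    for _ in range(n_resamples):
        pf = profit_factor(rng.choices(trade_pnls, k=n))
        if pf == pf:  # drop NaN results (resamples with no wins and no losses)
            stats.append(pf)
    if not stats:
        raise ValueError("no usable resamples")
    stats.sort()
    lo_idx = int((1 - level) / 2 * (len(stats) - 1))
    hi_idx = int((1 + level) / 2 * (len(stats) - 1))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical samples with the same PF of 2.0: equal numbers of $40 wins and $20 losses.
small_sample = [40.0] * 10 + [-20.0] * 10    # 20 trades
large_sample = [40.0] * 250 + [-20.0] * 250  # 500 trades
print("20 trades: ", bootstrap_pf_interval(small_sample))
print("500 trades:", bootstrap_pf_interval(large_sample))
```

The interval for the 20-trade sample comes out several times wider than the one for the 500-trade sample, even though both report the same headline PF.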
As a rough rule: for PF to be a reliable signal rather than noise, you want at least 200 trades in a consistent market environment. For shorter-term validations, look at rolling windows — does the PF stay relatively stable across different 60-day periods, or does it swing wildly?
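A simple version of the window check can be sketched as follows, assuming trades are available as (close date, P&L) pairs. For brevity it uses consecutive, non-overlapping 60-day windows counted from the first trade rather than a sliding window; the function names and sample trades are hypothetical.

```python
from datetime import date, timedelta


def profit_factor(trade_pnls):
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


def windowed_profit_factors(trades, window_days=60):
    """Return (window_start, PF, trade_count) for each consecutive window that has trades."""
    if not trades:
        return []
    ordered = sorted(trades)
    start = ordered[0][0]
    buckets = {}
    for day, pnl in ordered:
        index = (day - start).days // window_days  # which window this trade falls in
        buckets.setdefault(index, []).append(pnl)
    return [
        (start + timedelta(days=i * window_days), profit_factor(pnls), len(pnls))
        for i, pnls in sorted(buckets.items())
    ]


# Hypothetical trades spanning two windows.
trades = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), -50.0),
    (date(2024, 3, 10), 30.0),
    (date(2024, 3, 20), -60.0),
]
for window_start, pf, count in windowed_profit_factors(trades):
    print(window_start, pf, count)
```

Reporting the trade count next to each window's PF matters: a window with only a few trades will swing wildly for sample-size reasons alone.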
What PF hides: the distribution of wins and losses
Two strategies can have identical PF with completely different risk profiles. Strategy A: 70% win rate, small wins, occasional large loss (typical of high-frequency mean reversion). Strategy B: 30% win rate, large wins, small losses (typical of trend following). Same PF, but Strategy A concentrates its risk in rare large losses that can erase many small wins at once, while Strategy B has a much higher probability of long losing streaks and long flat periods between its winners.
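To put numbers on the two profiles, here is a small worked example. The average win and loss sizes are made up so that both strategies land on a PF of 2.0, and the streak calculation assumes trades are independent, which is a simplification.

```python
def profit_factor_from_stats(win_rate, avg_win, avg_loss):
    """Expected PF from win rate and average win/loss size (avg_loss as a positive number)."""
    return (win_rate * avg_win) / ((1 - win_rate) * avg_loss)


def losing_streak_probability(win_rate, streak_length):
    """Chance that a given run of streak_length trades are all losers (independent trades)."""
    return (1 - win_rate) ** streak_length


# Hypothetical Strategy A: 70% winners, $30 average win, $35 average loss.
# Hypothetical Strategy B: 30% winners, $140 average win, $30 average loss.
pf_a = profit_factor_from_stats(0.70, 30.0, 35.0)
pf_b = profit_factor_from_stats(0.30, 140.0, 30.0)
print(round(pf_a, 2), round(pf_b, 2))  # 2.0 2.0

# Probability that the next five trades all lose.
print(f"{losing_streak_probability(0.70, 5):.1%}")  # 0.2%
print(f"{losing_streak_probability(0.30, 5):.1%}")  # 16.8%
```

Identical PF, but a five-trade losing run is roughly seventy times more likely for the low-win-rate profile.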
For portfolio construction, this distribution matters more than the headline PF. A portfolio of Strategy A clones will be tightly correlated during sharp mean-reversion failures, so the large losses tend to arrive together. A portfolio of Strategy B clones will all be flat during ranging markets.
Better metrics to use alongside PF
- Expectancy per trade (average profit/loss per trade in dollar or pip terms)
- Maximum consecutive losses (tests psychological and account viability)
- Recovery Factor (net profit divided by max drawdown)
- Stability Score (consistency of equity curve growth — how linear is it?)
- Trade count per rolling 30 days (catch silent decay early)
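Three of the metrics above have simple, widely used definitions and can be sketched directly from a list of per-trade P&L values. The function names and sample data are illustrative, and drawdown here is measured on cumulative closed-trade P&L starting from zero, which is one convention among several. Stability Score is left out because its exact formula is not given here.

```python
def expectancy(trade_pnls):
    """Average profit or loss per trade."""
    return sum(trade_pnls) / len(trade_pnls)


def max_consecutive_losses(trade_pnls):
    """Longest run of losing trades in sequence."""
    longest = current = 0
    for pnl in trade_pnls:
        current = current + 1 if pnl < 0 else 0
        longest = max(longest, current)
    return longest


def max_drawdown(trade_pnls):
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def recovery_factor(trade_pnls):
    """Net profit divided by maximum drawdown."""
    drawdown = max_drawdown(trade_pnls)
    return sum(trade_pnls) / drawdown if drawdown > 0 else float("inf")


# Hypothetical trade sequence, in chronological order.
pnls = [50.0, -20.0, -30.0, -10.0, 80.0, -15.0, 40.0]
print(expectancy(pnls))
print(max_consecutive_losses(pnls))  # 3
print(max_drawdown(pnls))            # 60.0
print(recovery_factor(pnls))
```

Unlike PF, the streak and drawdown figures depend on the order of trades, so they need the chronological sequence and not just the totals.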
Use PF as a filter, not a ranking. Anything below 1.2 is almost certainly not worth running in live conditions. Above 1.2, look at sample size, rolling stability, and what's driving the number. PF is a starting point for analysis, not an endpoint.