Speed versus Accuracy

In version 3.3 we introduced a speed option facility. The speed option can be 0 (the default), 1, 2 or 3, ordered from slowest (0) to fastest (3), and it tells TOCR how exhaustively it should search for improvements. Faster speed options cost a small amount of accuracy.

Our testing on a large database shows the following changes for each speed option. All percentage changes are relative to speed option 0.

Speed option  Time change  Score accuracy change
1             -10.6%       -0.0075%
2             -17.0%       -0.0177%
3             -22.1%       -0.0483%
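A time change can be easier to interpret as a throughput gain. The short Python sketch below converts the average time changes above into a throughput multiplier. It assumes (our assumption, not a measured result) that the time change applies uniformly across the workload, so that a -22.1% time change means each page takes 77.9% of its option 0 time. The function name `throughput_gain` is illustrative only and is not part of TOCR.

```python
# Average time changes relative to speed option 0, taken from the table above.
TIME_CHANGE_PCT = {1: -10.6, 2: -17.0, 3: -22.1}


def throughput_gain(time_change_pct: float) -> float:
    """Return the throughput multiplier relative to speed option 0.

    Assumes the time change applies uniformly: a -22.1% change means each
    page takes 77.9% of the option 0 time, so throughput rises by 1 / 0.779.
    """
    return 1.0 / (1.0 + time_change_pct / 100.0)


for option, change in TIME_CHANGE_PCT.items():
    print(f"speed option {option}: {throughput_gain(change):.3f}x throughput")
```

Under this assumption, speed option 3 processes roughly 1.28 times as many pages in the same time as option 0.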

The time changes (speedups) are fairly consistent: a higher speed option rarely slows processing down, although it can happen for the occasional file. The accuracy changes vary much more, since they are simply the effect of less exhaustive processing. The figures above are averages and should be treated as a guide to what to expect.

The following table shows how accuracy and speedup vary across a range of datasets (A to J). For simplicity only speed options 0 and 3 are shown, since they give the widest range of values. Dataset maximum scores range from 951k to 11236k. The greatest speedups appear to us to come from the most difficult datasets (noisy, joined and broken characters, etc.).

Dataset  Option 0 Err %  Option 3 Err %  Err difference  Err % increase  % Time change
A        0.4946          0.4948          0.0003          0.0530          -11.153
B        6.0570          6.0684          0.0114          0.1875          -53.375
C        0.1513          0.1637          0.0125          8.2329          -11.436
D        0.0822          0.0955          0.0133          16.1309         -13.873
E        0.0915          0.1093          0.0178          19.5039         -12.354
F        0.4809          0.5016          0.0206          4.2896          -22.727
G        0.8344          0.8683          0.0339          4.0682          -17.107
H        1.0005          1.0376          0.0371          3.7111          -16.336
I        1.0177          1.0695          0.0519          5.0967          -24.022
J        2.5081          2.6676          0.1595          6.3600          -39.376
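One way to check the observation that the largest speedups come from the most difficult datasets is a rank correlation between the Option 0 error rate (a proxy for difficulty) and the size of the speedup at option 3. The sketch below uses Spearman's rank correlation, which is our choice of statistic and not part of the original testing, computed from the ten rows of the table. The helper names `ranks` and `spearman_rho` are illustrative.

```python
# Rows A to J from the table above: Option 0 Err % and % Time change.
OPTION0_ERR = [0.4946, 6.0570, 0.1513, 0.0822, 0.0915,
               0.4809, 0.8344, 1.0005, 1.0177, 2.5081]
TIME_CHANGE = [-11.153, -53.375, -11.436, -13.873, -12.354,
               -22.727, -17.107, -16.336, -24.022, -39.376]


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Speedup is the size of the time reduction, so negate the time change.
speedups = [-change for change in TIME_CHANGE]
rho = spearman_rho(OPTION0_ERR, speedups)
print(f"Spearman rho (error rate vs speedup): {rho:.3f}")
```

This gives a rho of about 0.76, a fairly strong positive association between difficulty and speedup. With only ten datasets it is indicative rather than conclusive.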

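Err difference is the Option 3 error rate minus the Option 0 error rate, in percentage points; Err % increase expresses that difference as a percentage of the Option 0 error rate. Note that although the Err % increase looks very high in some cases (D and E), these datasets also have very high accuracy, so their error difference is small. Conversely, where the error difference is high (J), the dataset has low accuracy, and its Err % increase is much more modest.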
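The two error columns can be recomputed from the Option 0 and Option 3 error rates. The Python sketch below does this for datasets D, E and J to show why the same change can look small in one column and large in the other. Recomputing from the rounded table entries differs slightly in the last digits (for example about 16.18 rather than 16.1309 for D), presumably because the table was computed from unrounded error rates. The function names are illustrative only.

```python
def err_difference(err0: float, err3: float) -> float:
    """Absolute change in error rate, in percentage points (Option 3 minus Option 0)."""
    return err3 - err0


def err_pct_increase(err0: float, err3: float) -> float:
    """Change in error rate relative to the Option 0 error rate, as a percentage."""
    return 100.0 * (err3 - err0) / err0


# Option 0 and Option 3 Err % for datasets D, E and J, from the table above.
for name, err0, err3 in [("D", 0.0822, 0.0955), ("E", 0.0915, 0.1093), ("J", 2.5081, 2.6676)]:
    print(f"{name}: difference {err_difference(err0, err3):.4f} points, "
          f"increase {err_pct_increase(err0, err3):.2f}%")
```

D and E show small differences but large relative increases because their Option 0 error rates are tiny; J shows the opposite.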

The table underestimates true TOCR accuracy because each entry mixes results from different processing options (Lexon and Lexoff, for example).