3 More Contributing Factors, and Important Thoughts on Accuracy Levels

Do you follow good practice? When it comes to sound business decision-making, it’s critically important that you can have total confidence in the quality of important data captured manually by your business. In my previous blog, I looked at three factors contributing to good quality in manually captured data:

  1. training on handwriting;
  2. being precise with business rules; and
  3. double keying for increased accuracy.

I noted that these factors are important over and above the software controls of a purpose-built data entry application, and that such applications support, but do not on their own assure, data quality that is fit for purpose for a business. In this blog, I introduce three more such factors, and then talk a little more about accuracy and sampling.

4. Feedback is vital

A data sampling team reviews captured data in the context of the original document images, and the data capture instruction manual from which the operators worked. Both are critical for accurate and meaningful error recording. Whether an operator’s work is of passing or failing accuracy, the sampling team should share the recorded errors with the operator, to support training and improvement. The most critical feedback comes generally when an operator displays systematic misunderstanding or misapplication of an instruction in the context of the document, for example where a field has been regularly populated with the wrong piece of source data, or left blank in error.

5. Cross-team review

Cross-team review serves to bring agreement throughout the whole team on what a problematic instruction is intended to mean, especially in less common or potentially ambiguous cases, and with rarer document types when they arise. The value of this is in consolidating each individual operator’s understanding of the capture requirement, and in raising consistency of quality across the whole team, i.e., across the whole data set.

6. Post-capture validation

Having captured data, and sampled its accuracy to determine whether or not it is of passing quality against a pre-agreed level, there may be value in running all of it through a validation program (purpose-written software routine), to trap certain types of logical error that may be present. This cleanses the data, adding quality, and is not a substitute for quality through manual entry. One reason for not doing this during manual capture may be that it would take up computer memory and slow operators down, for example when the validation program uses machine-heavy algorithms, reads from a large database to check for unlikely data values, does spell-checking, or checks related fields against each other (e.g. dates of birth, marriage and death being chronologically sensible). If the document images have come from a sequence or are logically related, then post-capture validation enables checks to be made through the series, looking for gaps, duplications or missing continuations. Anomalies identified by the program are then subjected to manual review against the original documents, leading to acceptance of the data as keyed, or corrective editing.
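As an illustration of the kind of checks described above, here is a minimal sketch in Python of two post-capture validation routines: one that tests related date fields for chronological sense, and one that looks for gaps and duplications in a numbered series. The field names (`birth`, `marriage`, `death`), the idea of a page number per document, and both function names are assumptions made for this example; a real validation program would be written around the actual capture specification.

```python
from datetime import date

# Hypothetical field names, listed in the order the events should occur.
DATE_ORDER = ["birth", "marriage", "death"]


def chronology_anomalies(record):
    """Return a message for each pair of dates that is out of order.

    `record` is a dict mapping field names to datetime.date values;
    missing or None fields are skipped rather than flagged.
    """
    present = [(name, record[name]) for name in DATE_ORDER
               if record.get(name) is not None]
    problems = []
    # Compare each date with the next one that was actually captured.
    for (early_name, early), (late_name, late) in zip(present, present[1:]):
        if late < early:
            problems.append(f"{late_name} ({late}) precedes {early_name} ({early})")
    return problems


def sequence_anomalies(page_numbers):
    """Report gaps and duplicates in a series of captured page numbers."""
    seen = set()
    duplicates = set()
    for number in page_numbers:
        if number in seen:
            duplicates.add(number)
        seen.add(number)
    if not seen:
        return {"gaps": [], "duplicates": []}
    gaps = [n for n in range(min(seen), max(seen) + 1) if n not in seen]
    return {"gaps": gaps, "duplicates": sorted(duplicates)}


if __name__ == "__main__":
    suspect = {"birth": date(1900, 1, 1),
               "marriage": date(1890, 5, 1),
               "death": date(1950, 1, 1)}
    print(chronology_anomalies(suspect))
    print(sequence_anomalies([1, 2, 2, 4]))
```

Neither routine corrects anything. Each only produces a list of anomalies, which would then go to manual review against the original documents, as described above.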

Be Clear About "Percentage Accuracy"

When a target accuracy is declared as, say, “99%”, we need to know whether this requires 99% of all characters, or 99% of all data fields, or even 99% of all data records to be error-free. The difference between each of these levels is potentially great, indeed great enough to influence how many times a document is visited within the capture process, which in turn determines capture cost.

Ninety-nine percent accuracy on a per character basis would be a sensible target for data manually input once. But if a data record comprises a number of fields, each for the sake of argument 10 characters long, then 99% accuracy on a per field basis may be pushing things for one capture pass. With single-pass keying at 99% per character, the chance that all 10 characters in a field will be accurate (meaning the field is error-free) is only about 90.5%. A per field accuracy of 99% in this case would require, for instance, double keying (more cost), which promises 99.95% accuracy on a per character basis, and 99.5% per field of 10 characters. Meanwhile, 99% on a per record basis is more demanding again, and more costly to achieve. I use the Poisson function (which is available in Excel) to help understand these relationships.

I noted in my first blog that ISO 2859 is a good reference for data accuracy sampling. Whatever sampling scheme is used, it is critical that the work of each and every operator within a team is sampled. This enables the quality of each operator to be brought up to the same target level, and allows operators who are struggling with quality to be subjected to higher rates of sampling until their quality stabilizes, at which point regular sampling levels would resume.

In my next blog in this series, I shall focus on business rules.
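For readers who would like to check the accuracy arithmetic in this section without a spreadsheet, the following is a minimal sketch in Python. It assumes character errors are independent of one another, which is a simplification, and the function names are invented for this example. It shows both the exact calculation and the Poisson approximation (the probability of zero errors when the expected number of errors is the error rate times the number of characters), which is the value quoted above.

```python
import math


def field_accuracy_exact(char_accuracy, chars_per_field):
    """Chance that every character in a field is right, assuming independence."""
    return char_accuracy ** chars_per_field


def field_accuracy_poisson(char_accuracy, chars_per_field):
    """Poisson approximation: probability of zero errors in the field."""
    expected_errors = (1.0 - char_accuracy) * chars_per_field
    return math.exp(-expected_errors)


if __name__ == "__main__":
    # Single-pass keying at 99% per character, 10-character fields.
    print(f"single pass, exact:   {field_accuracy_exact(0.99, 10):.3f}")
    print(f"single pass, Poisson: {field_accuracy_poisson(0.99, 10):.3f}")
    # Double keying at the 99.95% per character figure quoted in the text.
    print(f"double keyed, exact:   {field_accuracy_exact(0.9995, 10):.3f}")
    print(f"double keyed, Poisson: {field_accuracy_poisson(0.9995, 10):.3f}")
```

At 99% per character the exact figure is roughly 90.4% and the Poisson approximation roughly 90.5%, so the two agree closely at these error rates. The same functions can be applied a second time, treating fields as the units within a record, to see how much harder a per record target is.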