Table 16-2. Outcomes of Content Detection

          Positive                                    Negative
True      Sensitive content correctly identified      Non-sensitive content correctly
          as sensitive.                               identified as non-sensitive.
False     Non-sensitive content mistakenly            Sensitive content mistakenly identified
          identified as sensitive.                    as non-sensitive.
Recall measures the fraction of all sensitive documents that are actually retrieved by the content blade.

■   High recall casts a wide net and gathers all potentially sensitive documents. Too high a recall can result
    in more false positives. (False positive = a document judged sensitive by the content blade that is not,
    in fact, sensitive.)

■   Low recall is more selective in the documents returned as sensitive. Too low a recall can result in more
    false negatives. (False negative = a document judged not to be sensitive by the content blade that is,
    in fact, sensitive.)
Precision is the percentage of retrieved documents that are relevant to the search.

■   High precision can reduce the number of false positives returned.

■   Low precision can increase the number of false positives returned.
Precision refers to the relevancy of the results returned. For example, did all of the documents that triggered
the Payment Card Industry Data Security Standard (PCI DSS) policy contain actual credit card numbers, or
did some contain UPC or EAN numbers that were incorrectly identified as sensitive PCI data? High precision
can be achieved with a narrow, focused search to make sure that every piece of content that is caught is truly
sensitive.
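The two measures can be computed directly from the detection outcomes in Table 16-2. The following is a minimal sketch; the functions and example counts are hypothetical illustrations, not part of any product API.

```python
# Sketch: computing precision and recall from content-detection outcomes.
# TP = sensitive documents correctly flagged, FP = non-sensitive documents
# mistakenly flagged, FN = sensitive documents that were missed.

def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged documents that are actually sensitive."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of all sensitive documents that were flagged."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical scan: 80 documents flagged, of which 60 are truly sensitive
# (TP) and 20 are not (FP); 40 sensitive documents were missed (FN).
print(precision(60, 20))  # 0.75 -> 25% of hits are false positives
print(recall(60, 40))     # 0.6  -> 40% of sensitive documents were missed
```

Note how the two measures pull in opposite directions: widening the match criteria raises recall but tends to admit more false positives, while narrowing them raises precision but tends to miss more sensitive documents.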
Table 16-3. Precision and Recall

Accuracy Factor   Measurement                                       Problem if Value is Low
Precision         The percentage of retrieved documents that        Increased false positives
                  are actually relevant.
Recall            The percentage of all of the sensitive            Increased false negatives
                  documents that are actually retrieved.
Chapter 16 Troubleshooting
VMware, Inc. 227