
What's the difference between EDM, IDM, and OCR classification techniques in Zscaler?

A walkthrough of various DLP techniques.

Mike Berggren

03 Nov 2025 — 2 min read


This is part of an ongoing series on cybersecurity foundations. Check the cyber 101 article tag index from time to time for more content.

Zscaler offers several techniques for data loss prevention (DLP) matching. In today's article I'd like to briefly step through them and explain how each approach differs.

Exact Data Match (EDM)
- This technique is intended to examine and protect structured data, which typically has a fixed format or schema (e.g., databases, spreadsheets, and forms).
- EDM takes a "fingerprint" (hash) of individual fields (columns) in an organization's specific, highly sensitive data records. The DLP engine then inspects network traffic for exact matches of this fingerprinted data.
- The goal is to identify exact occurrences of a sensitive record.
- Example use case: ensuring that a specific set of active employee records, patient data, or a customer list isn't exfiltrated.
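
To make the idea concrete, here is a minimal conceptual sketch in Python of field-level fingerprinting and exact matching. It is not Zscaler's implementation: the record layout, the field names (`employee_id`, `email`, `ssn`), SHA-256 hashing, the naive tokenizer, and the hypothetical `min_fields` rule (require at least two fields from the same record) are all illustrative assumptions. Only hashes go into the index, so the scanner never needs the plaintext records.

```python
import hashlib
import re


def fingerprint(value: str) -> str:
    """Normalize a field value (trim, lowercase) and return its SHA-256 hex digest."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(records: list[dict], fields: list[str]) -> dict[str, set[int]]:
    """Map each hashed field value to the IDs of the records containing it.

    Only digests are stored as keys; the plaintext values are discarded.
    """
    index: dict[str, set[int]] = {}
    for record_id, record in enumerate(records):
        for field in fields:
            index.setdefault(fingerprint(record[field]), set()).add(record_id)
    return index


def scan(text: str, index: dict[str, set[int]], min_fields: int = 2) -> set[int]:
    """Return IDs of records with at least `min_fields` exact field matches in `text`."""
    # Naive tokenizer: runs of word chars, '@', '.', '-' that end in a word char,
    # so trailing sentence punctuation is not part of the token.
    tokens = set(re.findall(r"[\w@.-]*\w", text))
    hits: dict[int, int] = {}
    for token in tokens:
        for record_id in index.get(fingerprint(token), ()):
            hits[record_id] = hits.get(record_id, 0) + 1
    return {rid for rid, count in hits.items() if count >= min_fields}


# Usage with made-up sample records.
records = [
    {"employee_id": "E1001", "email": "jdoe@example.com", "ssn": "123-45-6789"},
    {"employee_id": "E1002", "email": "asmith@example.com", "ssn": "987-65-4321"},
]
index = build_index(records, ["employee_id", "email", "ssn"])
print(scan("Sending E1001 / SSN 123-45-6789 to a personal account.", index))  # {0}
print(scan("Only E1002 is mentioned here.", index))  # set()
```

Requiring several fields from the same record, rather than any single value, is one way to keep a lone common value from triggering on its own.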
Indexed Document Matching (IDM)
- This technique is intended to examine and protect unstructured data.
- It works by creating a unique index (fingerprint) of an entire sensitive document or set of documents, then comparing the content of data in motion against the stored index, looking for a full or partial match.
- For example, the DLP engine might trigger if 75% of a document's content matches the indexed version.
- Because it tolerates partial matches, this approach is more flexible than EDM but also more prone to false positives.
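
The partial-match behavior can be illustrated with word shingling, a common near-duplicate-detection technique swapped in here purely for illustration; this article does not describe how Zscaler builds its IDM index. The sketch hashes every run of `k` consecutive words in the protected document and reports what fraction of those hashes reappear in outgoing text. The shingle size `k=3` and the `THRESHOLD` of 0.75 (echoing the 75% example above) are assumptions.

```python
import hashlib
import re

THRESHOLD = 0.75  # hypothetical policy: trigger on >= 75% overlap


def shingles(text: str, k: int = 3) -> set[str]:
    """Hash every run of k consecutive lowercase words (a 'shingle') in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short text: treat all of it as a single shingle (or none if empty).
        return {hashlib.sha256(" ".join(words).encode()).hexdigest()} if words else set()
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(len(words) - k + 1)
    }


def containment(indexed: set[str], outgoing_text: str, k: int = 3) -> float:
    """Fraction of the indexed document's shingles that also occur in the outgoing text."""
    if not indexed:
        return 0.0
    return len(indexed & shingles(outgoing_text, k)) / len(indexed)


# Usage: index a (made-up) sensitive document, then score a partial copy of it.
index = shingles("Project Falcon launches in March with a two million dollar budget.", k=3)
leak = "FYI: project Falcon launches in March with a two million"
score = containment(index, leak, k=3)
print(round(score, 2), score >= THRESHOLD)  # 0.78 True
```

Because the score counts shared word sequences, documents that reuse common boilerplate (legal footers, templates) can score high against each other, which is one way false positives arise.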
Optical Character Recognition (OCR)
- This technique extracts text from images (e.g., screenshots, scanned documents, JPEGs, PNGs, or images embedded in other file types such as Word documents).
- The extracted text is then passed through the standard DLP classification mechanisms, such as keyword matching, regular expressions, or EDM/IDM.
- Here again, the likelihood of false positives is higher, because the accuracy of the extracted text depends on image quality.
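
A rough sketch of this flow in Python: text is extracted from the image first, then handed to the same classifiers used for ordinary text. The OCR step is stubbed out with two hypothetical functions (`clean_ocr`, `blurry_ocr`) so the example runs without an OCR engine; a real pipeline might call an engine such as Tesseract (for example via `pytesseract.image_to_string`). The keyword list and SSN regex are illustrative assumptions, not Zscaler's dictionaries.

```python
import re
from typing import Callable

# Illustrative detectors only; real DLP dictionaries are far richer.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
KEYWORDS = ("confidential", "internal use only")


def classify_text(text: str) -> list[str]:
    """Run keyword and regex checks on plain text; return the names of rules that fired."""
    lowered = text.lower()
    hits = [f"keyword:{kw}" for kw in KEYWORDS if kw in lowered]
    if SSN_PATTERN.search(text):
        hits.append("regex:ssn")
    return hits


def scan_image(image_bytes: bytes, ocr: Callable[[bytes], str]) -> list[str]:
    """OCR the image first, then pass the extracted text to the normal text classifiers."""
    return classify_text(ocr(image_bytes))


# Hypothetical stand-ins for an OCR engine reading a sharp vs. a blurry image.
def clean_ocr(_: bytes) -> str:
    return "CONFIDENTIAL - Employee SSN: 123-45-6789"


def blurry_ocr(_: bytes) -> str:
    return "C0NFIDENTIAL - Emp1oyee SSN: 123-45-67B9"


print(scan_image(b"fake-image-bytes", clean_ocr))   # ['keyword:confidential', 'regex:ssn']
print(scan_image(b"fake-image-bytes", blurry_ocr))  # []
```

The blurry case shows how misread characters (`0` for `O`, `1` for `l`, `B` for `8`) change what the classifiers see; depending on the misreads, the result can be a missed match, as here, or a spurious one.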


One other key point: these techniques can be combined if desired. Depending on the product or module, Zscaler can also apply them both to data in motion (network traffic) and to data at rest (in cloud storage).

For more information on this topic, check out the following resources:

https://help.zscaler.com/unified/about-exact-data-match

https://help.zscaler.com/unified/about-indexed-document-match

https://help.zscaler.com/unified/configuring-dlp-advanced-settings
