What techniques does Zscaler use to identify a file type?

How do we know that a file is REALLY the type it claims to be?

💡 This is part of an ongoing series on cybersecurity foundations. Check the cyber 101 article tag index from time to time for more content.

Continuing our recent theme of Zscaler security measures, I thought it might be helpful to elaborate on the techniques ZIA uses to identify file types. Each of these looks at a different aspect of a file in motion to help identify a threat. Let's unpack them one by one.

  • MIME Type (Media Type)
    • This technique looks at the Multipurpose Internet Mail Extensions (MIME) type that is declared ahead of a file's delivery. It's a standardized label that describes the content's category and format.
    • Think of this as metadata that exists outside the file's content, typically in a communication protocol (like an HTTP Content-Type header).
    • Examples include labels like: image/jpeg, application/pdf, and text/html
    • The original intent of this mechanism was to tell applications (e.g. email clients, web browsers, etc.) how to handle and process a file. For legitimate sources it can be a decent indicator, but it's easy to spoof because the sender chooses the label.
  • Magic Bytes
    • This is an intrinsic (and arguably fairly reliable) ID method that's embedded in the file itself.
    • It's basically a sequence of bytes (a kind of digital fingerprint) that's almost always located at the very beginning of a file's binary data (at offset 0).
    • Magic bytes are usually written in hexadecimal. For example, a JPEG would typically start with: FF D8 FF E0
    • They're more difficult to spoof because they're part of the core binary content of the file.
  • Deep Packet Inspection
    • Whereas magic bytes are a signature, you can think of Deep Packet Inspection as a method for identifying file type.
    • The technique here involves looking at the first few packets of a file transmission to identify clues. These packets contain things like the aforementioned Magic Bytes, along with additional details about a given file (e.g. document headers).
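
To make the difference between a declared MIME type and magic bytes more concrete, here is a minimal Python sketch. It is not how ZIA implements this internally (that isn't public in this article); it's just an illustration under a few assumptions: the signature table, the function names sniff_type and declared_matches_content, and the sample inputs are all hypothetical and cover only a handful of well-known formats.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table: (magic byte prefix, MIME label).
# Real products track far more formats; these few prefixes are well known.
MAGIC_SIGNATURES = [
    (b"\xFF\xD8\xFF", "image/jpeg"),        # JPEG (FF D8 FF E0 is the common JFIF variant)
    (b"\x89PNG\r\n\x1a\n", "image/png"),    # PNG
    (b"%PDF-", "application/pdf"),          # PDF
    (b"PK\x03\x04", "application/zip"),     # ZIP container
]


def sniff_type(data: bytes) -> Optional[str]:
    """Return a MIME label based on the bytes at offset 0, or None if unknown."""
    for magic, mime in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def declared_matches_content(content_type_header: str, data: bytes) -> bool:
    """Compare the declared Content-Type value with what the magic bytes suggest."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    declared = content_type_header.split(";", 1)[0].strip().lower()
    sniffed = sniff_type(data)
    return sniffed is not None and sniffed == declared


if __name__ == "__main__":
    jpeg_start = bytes.fromhex("FFD8FFE0")
    print(declared_matches_content("image/jpeg", jpeg_start))
    # A payload labeled as a JPEG whose first bytes are not a JPEG signature.
    print(declared_matches_content("image/jpeg", b"MZ\x90\x00"))
```

The point of the comparison is that the Content-Type label is supplied by the sender, while the leading bytes come from the file itself, so a mismatch between the two is a useful signal that something is being mislabeled.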

For more information on this topic, check out the following resources:

https://help.zscaler.com/zia/configuring-file-type-control-policy

https://help.zscaler.com/zia/about-file-type-control