How to detect text in attachments
Sublime uses recursive binary explosion and OCR to extract text from files. For more information on how this works, see the file.explode documentation.
To find any string extracted from any file, such as PDFs, images, and Office documents, use a string searching function like strings.icontains or regex.icontains on the output of the following property: file.explode[].scan.strings.strings.
The example below detects the word "norton" in any PDF file:
any(attachments, .file_extension == "pdf" and
any(file.explode(.),
any(.scan.strings.strings, strings.icontains(., "norton"))
)
)
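The same pattern works with regex.icontains, which is handy when you want to match several terms in a single check. The rule below is a hypothetical variant of the example above, assuming you also want to catch the word "mcafee"; the alternation pattern is an illustrative choice, so evaluate it in the rule editor before relying on it:

```
any(attachments, .file_extension == "pdf" and
  any(file.explode(.),
    any(.scan.strings.strings, regex.icontains(., "norton|mcafee"))
  )
)
```

Using a single alternation pattern keeps the rule to one pass over the extracted strings instead of chaining several strings.icontains calls with or.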
To find text that was extracted from a specific type of file or scanner, use the same string searching functions on the desired scanner output. Examples include .docx, .html, .javascript, .ocr, .vba, and .xml. The example below searches the OCR output of Office documents and ZIP archives for the phrase "enable macros":
any(attachments,
.file_extension in~ ("doc", "docm", "docx", "dot", "dotm", "pptm", "ppsm", "xlm", "xls", "xlsb", "xlsm", "xlt", "xltm", "zip")
and any(file.explode(.),
strings.icontains(.scan.ocr.raw, "enable macros")
)
)
A note on performance
The file.explode function is a relatively expensive operation, so you typically don't want to run it on every file attachment. Instead, include a pre-filter that limits it to specific file types, such as archives, PDFs, and HTML files. One way to do this, as shown in the examples above, is to check .file_extension before calling file.explode.
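For instance, a rule that only needs OCR text from images could pre-filter on common image extensions. This is a hypothetical sketch: the extension list and the phrase "verify your account" are illustrative assumptions, not values taken from this article. The extension check comes first so that file.explode is only reached for attachments that pass it:

```
any(attachments,
  .file_extension in~ ("png", "jpg", "jpeg", "gif", "bmp")
  and any(file.explode(.),
    strings.icontains(.scan.ocr.raw, "verify your account")
  )
)
```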
Testing your rule against a sample EML file
The MQL rule editor can display the output of the file.explode function, which makes it easy to build detection rules around the extracted values. First write your MQL snippet, then click the file.explode function, and then click the Evaluate play button.