Searching watermarks

Use Watermarker.search() to scan a document for objects that can be treated as watermarks — including watermarks added by third-party tools. Without criteria, search() returns a subset such as backgrounds and floating objects; pass a criteria object to narrow the results.

Search for possible watermarks

The example below searches a watermarked PDF and prints the text, page, and size of each possible watermark.

from groupdocs.watermark import Watermarker

def search_watermarks():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search()
        print(f"Found {len(possible)} possible watermark(s).")
        for wm in possible:
            text = (wm.text or "").strip()
            print(f"- page={wm.page_number} text={text!r} size={round(wm.width)}x{round(wm.height)}")

if __name__ == "__main__":
    search_watermarks()

document.pdf is the sample file used in this example. Click here to download it.

Found 8 possible watermark(s).
- page=1 text='CONFIDENTIAL' size=268x36
- page=1 text='' size=230x69
- page=1 text='' size=460x287
- page=2 text='CONFIDENTIAL' size=268x36
- page=3 text='CONFIDENTIAL' size=268x36
- page=None text='https://auroravisuals.example/legal/msa' size=128x14
- page=None text='https://auroravisuals.example/legal/licensing' size=173x14
- page=None text='https://auroravisuals.example/portfolio' size=76x14

Download full output

Each possible watermark exposes text, image_data, x, y, width, height, rotate_angle, and page_number.

Search criteria

Large documents may contain many candidates. Use dedicated criteria to find exactly what you need.

Text search criteria

Find watermarks by exact text.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria

def search_by_text():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search(TextSearchCriteria("CONFIDENTIAL"))
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_text()

document.pdf is the sample file used in this example. Click here to download it.

Found 6 possible watermark(s)

Download full output

Regular expression search criteria

Pass a compiled regular expression to TextSearchCriteria.

import re
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria

def search_by_regex():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search(TextSearchCriteria(re.compile(r"^CONFIDENTIAL$")))
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_regex()

document.pdf is the sample file used in this example. Click here to download it.

Found 6 possible watermark(s)

Download full output

When a TextSearchCriteria is provided, the API also scans the main document text along with shapes, XObjects, annotations, and other objects.

Image search criteria

Find image watermarks that are visually similar to a sample image using perceptual hashing (DCT hash). Control sensitivity with max_difference (0–1).

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import ImageDctHashSearchCriteria

def search_by_image():
    with Watermarker("./document.pdf") as watermarker:
        criteria = ImageDctHashSearchCriteria("./logo.png")
        criteria.max_difference = 0.9
        possible = watermarker.search(criteria)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_image()

document.pdf and logo.png are the sample files used in this example. Download document.pdf and logo.png.

Found 2 possible watermark(s)

Download full output

Other image criteria:

Combined search criteria

Combine criteria with and_(), or_(), and not_().

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import (
    ImageDctHashSearchCriteria, TextSearchCriteria, RotateAngleSearchCriteria,
)

def search_combined():
    with Watermarker("./document.pdf") as watermarker:
        image_criteria = ImageDctHashSearchCriteria("./logo.png")
        image_criteria.max_difference = 0.9
        text_criteria = TextSearchCriteria("CONFIDENTIAL")
        angle_criteria = RotateAngleSearchCriteria(30, 60)

        combined = image_criteria.or_(text_criteria).and_(angle_criteria)
        possible = watermarker.search(combined)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_combined()

document.pdf and logo.png are the sample files used in this example. Download document.pdf and logo.png.

Found 3 possible watermark(s)

Download full output

Text formatting search criteria

Find watermarks by text formatting such as font, size, and color ranges.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextFormattingSearchCriteria, ColorRange

def search_by_formatting():
    with Watermarker("./document.pdf") as watermarker:
        criteria = TextFormattingSearchCriteria()
        criteria.foreground_color_range = ColorRange()
        criteria.foreground_color_range.min_hue = -15
        criteria.foreground_color_range.max_hue = 15
        criteria.foreground_color_range.min_brightness = 0.01
        criteria.foreground_color_range.max_brightness = 0.99
        criteria.min_font_size = 19
        criteria.max_font_size = 42

        possible = watermarker.search(criteria)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_formatting()

document.pdf is the sample file used in this example. Click here to download it.

Found 3 possible watermark(s)

Download full output

Searching by color and size is the most robust way to match a text watermark. You can also constrain font_name and font_bold, but bold fonts are often embedded under a fused subset name (for example ArialBold) rather than Arial with a separate bold flag, so a font_name = "Arial" filter may miss them.

Search watermarks in particular objects

Limit the search to specific object types to improve performance — either globally via WatermarkerSettings.searchable_objects, or per instance via Watermarker.searchable_objects. The flags live in groupdocs.watermark.search.objects.

from groupdocs.watermark import Watermarker, WatermarkerSettings
from groupdocs.watermark.search.objects import (
    SearchableObjects, WordProcessingSearchableObjects, PdfSearchableObjects,
)

def search_in_objects():
    settings = WatermarkerSettings()
    settings.searchable_objects = SearchableObjects(
        word_processing_searchable_objects=WordProcessingSearchableObjects.HYPERLINKS | WordProcessingSearchableObjects.TEXT,
        pdf_searchable_objects=PdfSearchableObjects.ALL,
    )

    with Watermarker("./document.pdf", settings) as watermarker:
        possible = watermarker.search()
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_in_objects()

document.pdf is the sample file used in this example. Click here to download it.

Found 8 possible watermark(s)

Download full output

Restrict the search to hyperlinks for a single Watermarker instance:

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.objects import PdfSearchableObjects

def search_hyperlinks():
    with Watermarker("./document.pdf") as watermarker:
        watermarker.searchable_objects.pdf_searchable_objects = PdfSearchableObjects.HYPERLINKS
        possible = watermarker.search()
        print("Found", len(possible), "hyperlink watermark(s)")

if __name__ == "__main__":
    search_hyperlinks()

document.pdf is the sample file used in this example. Click here to download it.

Found 3 hyperlink watermark(s)

Download full output

Search text while skipping unreadable characters

Enable tolerant matching when text contains unreadable characters between letters.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria

def search_skip_unreadable():
    with Watermarker("./document.pdf") as watermarker:
        criterion = TextSearchCriteria("CONFIDENTIAL")
        criterion.skip_unreadable_characters = True
        possible = watermarker.search(criterion)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_skip_unreadable()

document.pdf is the sample file used in this example. Click here to download it.

Found 6 possible watermark(s)

Download full output