Searching watermarks

Search for possible watermarks

The example below searches a watermarked PDF and prints the text, page, and size of each possible watermark.

from groupdocs.watermark import Watermarker

def search_watermarks():
    with Watermarker("./document.pdf") as watermarker:
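        # With no criteria, search() returns every candidate it finds.
        possible = watermarker.search()
        print(f"Found {len(possible)} possible watermark(s).")
        for wm in possible:
            # Some candidates have no text, so fall back to an empty string.
            text = (wm.text or "").strip()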
            print(f"- page={wm.page_number} text={text!r} size={round(wm.width)}x{round(wm.height)}")

if __name__ == "__main__":
    search_watermarks()

document.pdf is the sample file used in this example.

Found 8 possible watermark(s).
- page=1 text='CONFIDENTIAL' size=268x36
- page=1 text='' size=230x69
- page=1 text='' size=460x287
- page=2 text='CONFIDENTIAL' size=268x36
- page=3 text='CONFIDENTIAL' size=268x36
- page=None text='https://auroravisuals.example/legal/msa' size=128x14
- page=None text='https://auroravisuals.example/legal/licensing' size=173x14
- page=None text='https://auroravisuals.example/portfolio' size=76x14

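Each possible watermark exposes text, image_data, x, y, width, height, rotate_angle, and page_number. In the output above, text is empty for two candidates and page_number is None for the three URL entries, so handle both cases when processing results.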
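
The candidates can be post-processed with plain Python once they are returned. The sketch below groups them by page. The Candidate dataclass is a hypothetical stand-in that borrows only attribute names listed above, so the snippet runs without the library; treating None as "no page" is an assumption based on the sample output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a search result; not a library class."""
    text: Optional[str]
    page_number: Optional[int]
    width: float
    height: float


def group_by_page(candidates):
    """Map each page number (or None) to the candidates found there."""
    pages = defaultdict(list)
    for wm in candidates:
        pages[wm.page_number].append(wm)
    return dict(pages)


sample = [
    Candidate("CONFIDENTIAL", 1, 268.0, 36.0),
    Candidate(None, 1, 230.0, 69.0),
    Candidate("CONFIDENTIAL", 2, 268.0, 36.0),
    Candidate("https://auroravisuals.example/portfolio", None, 76.0, 14.0),
]

grouped = group_by_page(sample)
for page, items in grouped.items():
    label = "no page" if page is None else f"page {page}"
    print(f"{label}: {len(items)} candidate(s)")
```

With real results, the same function should work on the collection returned by search(), provided each item exposes page_number as described.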

Search criteria

Large documents may contain many candidates. Use dedicated criteria to find exactly what you need.

Text search criteria

Find watermarks by exact text.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria

def search_by_text():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search(TextSearchCriteria("CONFIDENTIAL"))
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_text()

document.pdf is the sample file used in this example.

Found 6 possible watermark(s)

Regular expression search criteria

Pass a compiled regular expression to TextSearchCriteria.

import re
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria

def search_by_regex():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search(TextSearchCriteria(re.compile(r"^CONFIDENTIAL$")))
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_regex()

document.pdf is the sample file used in this example.

Found 6 possible watermark(s)

When a TextSearchCriteria is provided, the API also scans the main document text along with shapes, XObjects, annotations, and other objects.
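
The anchors in the pattern above matter. The sketch below uses only Python's re module to compare an anchored pattern with an unanchored one; it assumes the library applies the compiled pattern to each candidate's text with the usual re semantics, which this page does not state explicitly.

```python
import re

# Anchored: the whole text must be exactly the word.
anchored = re.compile(r"^CONFIDENTIAL$")
# Unanchored: the word may appear anywhere inside the text.
unanchored = re.compile(r"CONFIDENTIAL")
# Flags travel with the compiled pattern.
ignore_case = re.compile(r"^CONFIDENTIAL$", re.IGNORECASE)

samples = ["CONFIDENTIAL", "STRICTLY CONFIDENTIAL", "confidential"]

for text in samples:
    print(
        f"{text!r}: anchored={bool(anchored.search(text))} "
        f"unanchored={bool(unanchored.search(text))} "
        f"ignore_case={bool(ignore_case.search(text))}"
    )
```

Compiling the pattern up front is also the natural place to set flags such as re.IGNORECASE.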

Image search criteria

Find image watermarks that are visually similar to a sample image using perceptual hashing (DCT hash). Control sensitivity with max_difference, a value from 0 to 1: lower values require a closer match to the sample.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import ImageDctHashSearchCriteria

def search_by_image():
    with Watermarker("./document.pdf") as watermarker:
        criteria = ImageDctHashSearchCriteria("./logo.png")
        criteria.max_difference = 0.9
        possible = watermarker.search(criteria)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_image()

document.pdf and logo.png are the sample files used in this example.

Found 2 possible watermark(s)

Other image criteria:

ImageColorHistogramSearchCriteria — robust to rotation, scaling, and translation.
ImageThumbnailSearchCriteria — robust to rotation, scaling, and minor color changes.
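
To give a feel for what a DCT-based perceptual hash and a 0 to 1 difference score can look like, here is a minimal pure-Python sketch. It is not the library's implementation: the 8x8 grid, the number of kept coefficients, and the median threshold are all illustrative assumptions.

```python
import math
from statistics import median


def dct_2d(block):
    """Unnormalized 2-D DCT-II of a square grid of grayscale values."""
    n = len(block)
    # Precompute the cosine basis: basis[k][i] = cos((2i + 1) * k * pi / (2n)).
    basis = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
             for k in range(n)]
    return [[sum(block[x][y] * basis[u][x] * basis[v][y]
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def dct_hash(pixels, keep=4):
    """One bit per low-frequency coefficient: is it above the median?"""
    coeffs = dct_2d(pixels)
    # Skip the (0, 0) term, which only encodes overall brightness.
    low = [coeffs[u][v] for u in range(keep) for v in range(keep) if (u, v) != (0, 0)]
    mid = median(low)
    return [c > mid for c in low]


def difference(hash_a, hash_b):
    """Normalized Hamming distance: 0.0 for identical hashes, 1.0 if every bit differs."""
    return sum(a != b for a, b in zip(hash_a, hash_b)) / len(hash_a)


# Deterministic 8x8 test pattern and its photographic negative.
image = [[(7 * x * x + 13 * y + 5 * x * y) % 256 for y in range(8)] for x in range(8)]
negative = [[255 - p for p in row] for row in image]

print(difference(dct_hash(image), dct_hash(image)))
print(difference(dct_hash(image), dct_hash(negative)))
```

Under this scheme a threshold close to 1 accepts almost any image, while a threshold close to 0 accepts only near-identical ones, which is the kind of trade-off max_difference controls.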

Combined search criteria

Combine criteria with and_(), or_(), and not_().

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import (
    ImageDctHashSearchCriteria, TextSearchCriteria, RotateAngleSearchCriteria,
)

def search_combined():
    with Watermarker("./document.pdf") as watermarker:
        image_criteria = ImageDctHashSearchCriteria("./logo.png")
        image_criteria.max_difference = 0.9
        text_criteria = TextSearchCriteria("CONFIDENTIAL")
        # Rotation angle between 30 and 60 degrees.
        angle_criteria = RotateAngleSearchCriteria(30, 60)

        # Calls chain left to right: (image OR text) AND angle.
        combined = image_criteria.or_(text_criteria).and_(angle_criteria)
        possible = watermarker.search(combined)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_combined()

document.pdf and logo.png are the sample files used in this example.

Found 3 possible watermark(s)
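
The order of chained calls determines how criteria are grouped. The toy classes below imitate the and_(), or_(), and not_() combinator style so that the grouping can be checked without the library. Criteria and Candidate here are hypothetical stand-ins, and the three predicates are simplified assumptions rather than the library's matching logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a possible watermark."""
    text: Optional[str]
    rotate_angle: float


class Criteria:
    """Toy predicate wrapper with and_/or_/not_ combinators (not the library class)."""

    def __init__(self, predicate: Callable[[Candidate], bool]):
        self._predicate = predicate

    def matches(self, candidate: Candidate) -> bool:
        return self._predicate(candidate)

    def and_(self, other: "Criteria") -> "Criteria":
        return Criteria(lambda c: self.matches(c) and other.matches(c))

    def or_(self, other: "Criteria") -> "Criteria":
        return Criteria(lambda c: self.matches(c) or other.matches(c))

    def not_(self) -> "Criteria":
        return Criteria(lambda c: not self.matches(c))


is_image = Criteria(lambda c: c.text is None)
is_confidential = Criteria(lambda c: c.text == "CONFIDENTIAL")
is_diagonal = Criteria(lambda c: 30 <= c.rotate_angle <= 60)

# Left-to-right chaining groups as (image OR text) AND angle.
combined = is_image.or_(is_confidential).and_(is_diagonal)

candidates = [
    Candidate("CONFIDENTIAL", 45.0),
    Candidate("CONFIDENTIAL", 0.0),
    Candidate(None, 40.0),
    Candidate("DRAFT", 45.0),
]
matched = [c for c in candidates if combined.matches(c)]
print(len(matched))
```

To group differently, for example image OR (text AND angle), build the inner combination first and pass it to or_().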

Text formatting search criteria

Find watermarks by text formatting such as font, size, and color ranges.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextFormattingSearchCriteria, ColorRange

def search_by_formatting():
    with Watermarker("./document.pdf") as watermarker:
        criteria = TextFormattingSearchCriteria()
        # Hue window centered on 0 degrees, which corresponds to red tones.
        criteria.foreground_color_range = ColorRange()
        criteria.foreground_color_range.min_hue = -15
        criteria.foreground_color_range.max_hue = 15
        # Skip the extremes of the brightness scale.
        criteria.foreground_color_range.min_brightness = 0.01
        criteria.foreground_color_range.max_brightness = 0.99
        # Accept font sizes from 19 to 42.
        criteria.min_font_size = 19
        criteria.max_font_size = 42

        possible = watermarker.search(criteria)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_by_formatting()

document.pdf is the sample file used in this example.

Found 3 possible watermark(s)

Searching by color and size is the most robust way to match a text watermark. You can also constrain font_name and font_bold, but bold fonts are often embedded under a fused subset name (for example ArialBold) rather than Arial with a separate bold flag, so a font_name = "Arial" filter may miss them.
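
The negative min_hue in the example suggests a hue window that wraps around 0 degrees (red). The sketch below shows that interpretation using the standard library's colorsys module. The helper names are hypothetical, and how the library itself derives hue from a color is an assumption here.

```python
import colorsys


def signed_hue_degrees(r, g, b):
    """Hue of an 8-bit RGB color in degrees, mapped to the range [-180, 180)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    degrees = h * 360.0
    # Hues past 180 wrap to negative values, so 350 becomes -10.
    return degrees - 360.0 if degrees >= 180.0 else degrees


def hue_in_range(rgb, min_hue=-15.0, max_hue=15.0):
    """True when the color's signed hue lies inside [min_hue, max_hue]."""
    return min_hue <= signed_hue_degrees(*rgb) <= max_hue


colors = [
    ("red", (255, 0, 0)),
    ("crimson-like", (255, 0, 40)),
    ("orange", (255, 128, 0)),
    ("blue", (0, 0, 255)),
]
for name, rgb in colors:
    print(name, round(signed_hue_degrees(*rgb), 1), hue_in_range(rgb))
```

A symmetric window around 0 catches reds that lean slightly toward magenta as well as those that lean toward orange, which a 0 to 15 range alone would miss.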

Search watermarks in particular objects

Limit the search to specific object types to improve performance — either globally via WatermarkerSettings.searchable_objects, or per instance via Watermarker.searchable_objects. The flags live in groupdocs.watermark.search.objects.

from groupdocs.watermark import Watermarker, WatermarkerSettings
from groupdocs.watermark.search.objects import (
    SearchableObjects, WordProcessingSearchableObjects, PdfSearchableObjects,
)

def search_in_objects():
    settings = WatermarkerSettings()
    # Each format family has its own flag set; combine flags with |.
    settings.searchable_objects = SearchableObjects(
        word_processing_searchable_objects=(
            WordProcessingSearchableObjects.HYPERLINKS | WordProcessingSearchableObjects.TEXT
        ),
        pdf_searchable_objects=PdfSearchableObjects.ALL,
    )

    with Watermarker("./document.pdf", settings) as watermarker:
        possible = watermarker.search()
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_in_objects()

document.pdf is the sample file used in this example.

Found 8 possible watermark(s)
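
The | operator in the example combines flag values. The sketch below reproduces that behavior with Python's enum.Flag. DemoSearchableObjects is a hypothetical enum written for illustration; apart from the names that appear in the example above, its members are assumptions and do not mirror the library's full flag list.

```python
from enum import Flag, auto


class DemoSearchableObjects(Flag):
    """Hypothetical flag set; not the library's enum."""
    TEXT = auto()
    HYPERLINKS = auto()
    SHAPES = auto()
    ALL = TEXT | HYPERLINKS | SHAPES


# Combine two flags, as the settings example does.
selected = DemoSearchableObjects.HYPERLINKS | DemoSearchableObjects.TEXT

# Membership tests show which object types are enabled.
print(DemoSearchableObjects.TEXT in selected)
print(DemoSearchableObjects.SHAPES in selected)
```

Selecting fewer flags narrows the set of objects inspected, which is where the performance gain mentioned above would come from.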

Search for hyperlink watermarks

Restrict the search to hyperlinks for a single Watermarker instance:

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.objects import PdfSearchableObjects

def search_hyperlinks():
    with Watermarker("./document.pdf") as watermarker:
        # Per-instance setting: affects only this Watermarker.
        watermarker.searchable_objects.pdf_searchable_objects = PdfSearchableObjects.HYPERLINKS
        possible = watermarker.search()
        print("Found", len(possible), "hyperlink watermark(s)")

if __name__ == "__main__":
    search_hyperlinks()

document.pdf is the sample file used in this example.

Found 3 hyperlink watermark(s)

Search text while skipping unreadable characters

Enable tolerant matching when text contains unreadable characters between letters.

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria

def search_skip_unreadable():
    with Watermarker("./document.pdf") as watermarker:
        criterion = TextSearchCriteria("CONFIDENTIAL")
        # Tolerate unreadable characters between the letters of the search text.
        criterion.skip_unreadable_characters = True
        possible = watermarker.search(criterion)
        print("Found", len(possible), "possible watermark(s)")

if __name__ == "__main__":
    search_skip_unreadable()

document.pdf is the sample file used in this example.

Found 6 possible watermark(s)
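
To illustrate the idea of tolerant matching, the sketch below removes characters that Python considers non-printable before comparing. This is only one possible definition of "unreadable"; the helper names are hypothetical and the library's own rule may differ.

```python
def strip_unreadable(text: str) -> str:
    """Drop characters that Python reports as non-printable (controls, zero-width marks)."""
    return "".join(ch for ch in text if ch.isprintable())


def tolerant_contains(haystack: str, needle: str) -> bool:
    """Substring test that ignores unreadable characters in the haystack."""
    return needle in strip_unreadable(haystack)


# A zero-width space, a NUL, and a BEL hidden between the letters.
noisy = "C\u200bONFI\x00DEN\x07TIAL"

print("CONFIDENTIAL" in noisy)
print(tolerant_contains(noisy, "CONFIDENTIAL"))
```

A plain substring test fails on such text because the hidden characters break the letter sequence, which is the situation skip_unreadable_characters is meant to handle.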
