Searching watermarks
Use Watermarker.search() to scan a document for objects that can be treated as watermarks — including watermarks added by third-party tools. Without criteria, search() returns a subset such as backgrounds and floating objects; pass a criteria object to narrow the results.
The example below searches a watermarked PDF and prints the text, page, and size of each possible watermark.
from groupdocs.watermark import Watermarker


def search_watermarks():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search()
        print(f"Found {len(possible)} possible watermark(s).")
        for wm in possible:
            text = (wm.text or "").strip()
            print(f"- page={wm.page_number} text={text!r} size={round(wm.width)}x{round(wm.height)}")


if __name__ == "__main__":
    search_watermarks()
document.pdf is the sample file used in this example.
Found 8 possible watermark(s).
- page=1 text='CONFIDENTIAL' size=268x36
- page=1 text='' size=230x69
- page=1 text='' size=460x287
- page=2 text='CONFIDENTIAL' size=268x36
- page=3 text='CONFIDENTIAL' size=268x36
- page=None text='https://auroravisuals.example/legal/msa' size=128x14
- page=None text='https://auroravisuals.example/legal/licensing' size=173x14
- page=None text='https://auroravisuals.example/portfolio' size=76x14
Each possible watermark exposes text, image_data, x, y, width, height, rotate_angle, and page_number.
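Because every candidate exposes the same attributes, plain Python is enough to summarize a result set. The sketch below groups candidate texts by page_number. It runs on SimpleNamespace stand-ins rather than real library objects, and the helper name group_by_page is hypothetical, so treat it as an illustration of the idea rather than part of the API.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_page(candidates):
    """Group candidate texts by page_number (None means no page was reported)."""
    pages = defaultdict(list)
    for wm in candidates:
        pages[wm.page_number].append((wm.text or "").strip())
    return dict(pages)


# Stand-in objects that mimic a few rows of the sample output above.
# They are NOT library classes; only the attribute names are borrowed.
sample = [
    SimpleNamespace(page_number=1, text="CONFIDENTIAL"),
    SimpleNamespace(page_number=1, text=None),
    SimpleNamespace(page_number=2, text="CONFIDENTIAL"),
    SimpleNamespace(page_number=None, text="https://auroravisuals.example/portfolio"),
]

grouped = group_by_page(sample)
print(grouped)
```

The same helper should accept the list returned by watermarker.search(), assuming the result can be iterated as in the example above.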
Large documents may contain many candidates. Use dedicated criteria to find exactly what you need.
Find watermarks by exact text.
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria


def search_by_text():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search(TextSearchCriteria("CONFIDENTIAL"))
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_by_text()
document.pdf is the sample file used in this example.
Found 6 possible watermark(s)
To match by pattern rather than by exact text, pass a compiled regular expression (re.compile(...)) to TextSearchCriteria.
import re

from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria


def search_by_regex():
    with Watermarker("./document.pdf") as watermarker:
        possible = watermarker.search(TextSearchCriteria(re.compile(r"^CONFIDENTIAL$")))
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_by_regex()
document.pdf is the sample file used in this example.
Found 6 possible watermark(s)
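It can help to try a pattern on known strings before handing it to TextSearchCriteria. The sketch below uses only the standard re module on text values copied from the sample output of the first example. How the library applies the pattern internally (search or full match) is not stated on this page, so the sketch uses re.search and anchors the patterns explicitly.

```python
import re

# Text values copied from the sample output of the first example.
texts = [
    "CONFIDENTIAL",
    "",
    "https://auroravisuals.example/legal/msa",
    "https://auroravisuals.example/portfolio",
]

anchored = re.compile(r"^CONFIDENTIAL$")  # the whole string must be CONFIDENTIAL
links = re.compile(r"^https://")          # any text that starts like a URL

exact_hits = [t for t in texts if anchored.search(t)]
link_hits = [t for t in texts if links.search(t)]
print(exact_hits)
print(link_hits)
```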
When a TextSearchCriteria is provided, the API also scans the main document text along with shapes, XObjects, annotations, and other objects.
Find image watermarks that are visually similar to a sample image using perceptual hashing (DCT hash). Control sensitivity with max_difference (0–1).
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import ImageDctHashSearchCriteria


def search_by_image():
    with Watermarker("./document.pdf") as watermarker:
        criteria = ImageDctHashSearchCriteria("./logo.png")
        criteria.max_difference = 0.9
        possible = watermarker.search(criteria)
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_by_image()
document.pdf and logo.png are the sample files used in this example.
Found 2 possible watermark(s)
Other image criteria:
ImageColorHistogramSearchCriteria: robust to rotation, scaling, and translation.
ImageThumbnailSearchCriteria: robust to rotation, scaling, and minor color changes.
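To give a feel for what a DCT-based perceptual hash and a 0 to 1 difference score mean, here is a minimal pure-Python sketch. It is not the GroupDocs implementation: the function names dct_hash and difference are hypothetical, the DCT is left unnormalized, and how the library actually computes its hash or scales max_difference is not described on this page. The sketch hashes a synthetic image, then compares it with a brighter copy and with its negative.

```python
import math


def dct_hash(pixels, keep=8):
    """Hash a square grayscale grid into keep*keep bits.

    Computes the low-frequency corner of an (unnormalized) 2-D DCT-II and
    sets a bit for every coefficient above the median of the AC terms.
    """
    n = len(pixels)
    # 1-D cosine basis for the first `keep` frequencies.
    basis = [
        [math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        for k in range(keep)
    ]
    coeffs = []
    for u in range(keep):          # vertical frequency
        for v in range(keep):      # horizontal frequency
            total = 0.0
            for y in range(n):
                weight_y = basis[u][y]
                for x in range(n):
                    total += pixels[y][x] * weight_y * basis[v][x]
            coeffs.append(total)
    # coeffs[0] is the DC term (overall brightness); leave it out of the median.
    ac_terms = sorted(coeffs[1:])
    median = ac_terms[len(ac_terms) // 2]
    return [1 if c > median else 0 for c in coeffs]


def difference(hash_a, hash_b):
    """Fraction of differing bits: 0.0 means identical, 1.0 means all differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b)) / len(hash_a)


# Synthetic 32x32 test image plus a brighter copy and a negative of it.
N = 32
original = [
    [128 + 60 * math.sin(x / 5) * math.cos(y / 7) + 2 * ((x * y) % 17)
     for x in range(N)]
    for y in range(N)
]
brighter = [[p + 20 for p in row] for row in original]
negative = [[255 - p for p in row] for row in original]

base = dct_hash(original)
print(difference(base, dct_hash(brighter)))  # small: only brightness changed
print(difference(base, dct_hash(negative)))  # large: the pattern is inverted
```

In this sketch a uniform brightness shift only moves the DC term, so the hash barely changes, while inverting the image flips almost every bit. A threshold on the difference score therefore separates "same picture, slightly altered" from "different picture".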
Combine criteria with and_(), or_(), and not_().
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import (
    ImageDctHashSearchCriteria, TextSearchCriteria, RotateAngleSearchCriteria,
)


def search_combined():
    with Watermarker("./document.pdf") as watermarker:
        image_criteria = ImageDctHashSearchCriteria("./logo.png")
        image_criteria.max_difference = 0.9
        text_criteria = TextSearchCriteria("CONFIDENTIAL")
        angle_criteria = RotateAngleSearchCriteria(30, 60)
        combined = image_criteria.or_(text_criteria).and_(angle_criteria)
        possible = watermarker.search(combined)
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_combined()
document.pdf and logo.png are the sample files used in this example.
Found 3 possible watermark(s)
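Chained calls group from left to right, so image_criteria.or_(text_criteria).and_(angle_criteria) reads as (image OR text) AND angle. The sketch below demonstrates that grouping with a hypothetical Criterion stand-in class built on plain predicates. It is not the library's criteria hierarchy and only mirrors the and_(), or_(), and not_() method names.

```python
from types import SimpleNamespace


class Criterion:
    """Hypothetical stand-in for a search criterion (not a library class)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def matches(self, candidate):
        return self.predicate(candidate)

    def and_(self, other):
        return Criterion(lambda c: self.matches(c) and other.matches(c))

    def or_(self, other):
        return Criterion(lambda c: self.matches(c) or other.matches(c))

    def not_(self):
        return Criterion(lambda c: not self.matches(c))


is_logo = Criterion(lambda c: c.is_image)
has_text = Criterion(lambda c: c.text == "CONFIDENTIAL")
tilted = Criterion(lambda c: 30 <= c.rotate_angle <= 60)

# Made-up candidates: tilted text, an unrotated logo, unrotated text.
candidates = [
    SimpleNamespace(text="CONFIDENTIAL", is_image=False, rotate_angle=45),
    SimpleNamespace(text="", is_image=True, rotate_angle=0),
    SimpleNamespace(text="CONFIDENTIAL", is_image=False, rotate_angle=0),
]

chained = is_logo.or_(has_text).and_(tilted)   # (logo OR text) AND tilted
nested = is_logo.or_(has_text.and_(tilted))    # logo OR (text AND tilted)

chained_result = [chained.matches(c) for c in candidates]
nested_result = [nested.matches(c) for c in candidates]
print(chained_result)  # [True, False, False]
print(nested_result)   # [True, True, False]
```

The unrotated logo is the candidate that tells the two groupings apart: it fails the chained form because the angle test applies to everything, but passes the nested form. Nest the calls explicitly when a different grouping is intended.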
Find watermarks by text formatting such as font, size, and color ranges.
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextFormattingSearchCriteria, ColorRange


def search_by_formatting():
    with Watermarker("./document.pdf") as watermarker:
        criteria = TextFormattingSearchCriteria()
        criteria.foreground_color_range = ColorRange()
        criteria.foreground_color_range.min_hue = -15
        criteria.foreground_color_range.max_hue = 15
        criteria.foreground_color_range.min_brightness = 0.01
        criteria.foreground_color_range.max_brightness = 0.99
        criteria.min_font_size = 19
        criteria.max_font_size = 42
        possible = watermarker.search(criteria)
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_by_formatting()
document.pdf is the sample file used in this example.
Found 3 possible watermark(s)
Searching by color and size is the most robust way to match a text watermark. You can
also constrain font_name and font_bold, but bold fonts are often embedded under a fused
subset name (for example ArialBold) rather than Arial with a separate bold flag, so a
font_name = "Arial" filter may miss them.
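The hue bounds of -15 and 15 in the example above describe a window centered on red. The sketch below shows one way to reason about such a window with the standard colorsys module. It rests on an assumption that this page does not confirm: that a negative bound wraps around the color wheel, so -15 to 15 covers 345 to 360 degrees plus 0 to 15 degrees. The helper name in_hue_window is hypothetical.

```python
import colorsys


def in_hue_window(rgb, min_hue, max_hue):
    """True if the color's hue, in degrees, falls inside [min_hue, max_hue].

    Assumption: negative bounds wrap around the color wheel, so the hue is
    tested both as-is (0..360) and shifted down by a full turn (-360..0).
    """
    r, g, b = (channel / 255 for channel in rgb)
    hue_fraction, _lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    hue = hue_fraction * 360
    return any(min_hue <= h <= max_hue for h in (hue, hue - 360))


print(in_hue_window((255, 0, 0), -15, 15))    # pure red
print(in_hue_window((255, 0, 40), -15, 15))   # red leaning toward magenta
print(in_hue_window((255, 128, 0), -15, 15))  # orange
print(in_hue_window((0, 0, 255), -15, 15))    # blue
```

Under that assumption the two reds fall inside the window, while orange (about 30 degrees) and blue (240 degrees) fall outside.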
Limit the search to specific object types to improve performance — either globally via WatermarkerSettings.searchable_objects, or per instance via Watermarker.searchable_objects. The flags live in groupdocs.watermark.search.objects.
from groupdocs.watermark import Watermarker, WatermarkerSettings
from groupdocs.watermark.search.objects import (
    SearchableObjects, WordProcessingSearchableObjects, PdfSearchableObjects,
)


def search_in_objects():
    settings = WatermarkerSettings()
    settings.searchable_objects = SearchableObjects(
        word_processing_searchable_objects=WordProcessingSearchableObjects.HYPERLINKS | WordProcessingSearchableObjects.TEXT,
        pdf_searchable_objects=PdfSearchableObjects.ALL,
    )
    with Watermarker("./document.pdf", settings) as watermarker:
        possible = watermarker.search()
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_in_objects()
document.pdf is the sample file used in this example.
Found 8 possible watermark(s)
Restrict the search to hyperlinks for a single Watermarker instance:
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.objects import PdfSearchableObjects


def search_hyperlinks():
    with Watermarker("./document.pdf") as watermarker:
        watermarker.searchable_objects.pdf_searchable_objects = PdfSearchableObjects.HYPERLINKS
        possible = watermarker.search()
        print("Found", len(possible), "hyperlink watermark(s)")


if __name__ == "__main__":
    search_hyperlinks()
document.pdf is the sample file used in this example.
Found 3 hyperlink watermark(s)
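The searchable-object values are flags: the | operator merges several of them into one value, and a single flag selects one object type only. The sketch below illustrates that behavior with a hypothetical enum.Flag stand-in. DemoSearchableObjects and its members are invented for the demonstration, and whether the library's own flag types are built on enum.Flag is an assumption.

```python
from enum import Flag, auto


class DemoSearchableObjects(Flag):
    """Hypothetical flag set; names are illustrative, not the library's."""
    HYPERLINKS = auto()
    TEXT = auto()
    SHAPES = auto()


# Combine two flags, as the settings example does with HYPERLINKS | TEXT.
selected = DemoSearchableObjects.HYPERLINKS | DemoSearchableObjects.TEXT

text_enabled = DemoSearchableObjects.TEXT in selected
shapes_enabled = DemoSearchableObjects.SHAPES in selected
print(text_enabled)    # True
print(shapes_enabled)  # False
```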
Set skip_unreadable_characters to True on a TextSearchCriteria to enable tolerant matching when the text contains unreadable characters between letters.
from groupdocs.watermark import Watermarker
from groupdocs.watermark.search.search_criteria import TextSearchCriteria


def search_skip_unreadable():
    with Watermarker("./document.pdf") as watermarker:
        criterion = TextSearchCriteria("CONFIDENTIAL")
        criterion.skip_unreadable_characters = True
        possible = watermarker.search(criterion)
        print("Found", len(possible), "possible watermark(s)")


if __name__ == "__main__":
    search_skip_unreadable()
document.pdf is the sample file used in this example.
Found 6 possible watermark(s)
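The idea behind tolerant matching can be sketched in a few lines of plain Python. This is an illustration only, not the library's algorithm: which characters the library counts as unreadable is not specified on this page, so the sketch assumes "unreadable" means non-printable (for example, control characters). The helper name tolerant_contains is hypothetical.

```python
def tolerant_contains(haystack, needle):
    """True if `needle` occurs in `haystack` once non-printable characters
    (the assumed meaning of "unreadable" in this sketch) are dropped."""
    cleaned = "".join(ch for ch in haystack if ch.isprintable())
    return needle in cleaned


# A watermark string with control characters injected between letters.
noisy = "CON\x00FID\x07ENTIAL"

strict_hit = "CONFIDENTIAL" in noisy
tolerant_hit = tolerant_contains(noisy, "CONFIDENTIAL")
print(strict_hit)    # False
print(tolerant_hit)  # True
```

A strict substring test misses the noisy string, while the tolerant version finds it, which mirrors the effect the flag is meant to have.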