GroupDocs.Parser API References

This page serves as the entry point to the GroupDocs.Parser API references.

GroupDocs.Parser is a set of APIs for parsing documents and extracting text, images, metadata, and structured data from popular formats such as PDF, Word, Excel, and PowerPoint. It is available for multiple platforms.

Available Products

Select your target platform below to access its detailed API reference documentation.

GroupDocs.Parser for .NET

Access full API references for .NET developers

GroupDocs.Parser for Java

Access full API references for Java developers

GroupDocs.Parser for Python via .NET

Access full API references for Python developers

Additional Resources

Product Overview

GroupDocs.Parser is a document parsing and extraction SDK that gives developers APIs to extract data from documents without external dependencies or additional software installations. The library supports parsing and extraction from over 50 document formats, including PDF, Microsoft Word, Excel, PowerPoint, OneNote, Outlook, and many more.

Key Features

Text Extraction: Extract raw or formatted text from entire documents or specific pages
Image Extraction: Extract images from documents with support for various image formats
Metadata Extraction: Retrieve document properties, creation dates, author information, and more
Structured Data Parsing: Extract tables, forms, and structured data using template-based parsing
Container Extraction: Extract attachments and embedded documents from container formats
Cross-Platform Support: Available for .NET, Java, and Python (via .NET)
No External Dependencies: Parse documents without requiring Microsoft Office, Adobe Acrobat, or other third-party software

Supported File Formats

GroupDocs.Parser supports a wide range of document formats:

Word Processing: DOC, DOCX, DOT, DOTX, RTF, ODT, OTT
Spreadsheets: XLS, XLSX, XLSM, XLSB, CSV, ODS, OTS
Presentations: PPT, PPTX, PPS, PPSX, ODP, OTP
PDF Documents: PDF, PDF/A
Email: MSG, EML, EMLX, PST, OST
Archives: ZIP, TAR, RAR
Other Formats: OneNote, Markdown, EPUB, and more
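As a rough illustration of how the categories above might be used to route files before parsing, the sketch below maps a file extension to its category. It is plain Python and does not call GroupDocs.Parser; the names FORMAT_CATEGORIES, EXTENSION_TO_CATEGORY, and categorize are hypothetical. It assumes each listed format name corresponds to a lowercase file extension, and that PDF/A files use the .pdf extension. The "Other Formats" entry is left out because the list above does not spell out its extensions.

```python
from pathlib import Path
from typing import Optional

# Extension lists copied from the categories above. This lookup table is an
# illustration only and is not part of the GroupDocs.Parser API.
FORMAT_CATEGORIES = {
    "Word Processing": ["doc", "docx", "dot", "dotx", "rtf", "odt", "ott"],
    "Spreadsheets": ["xls", "xlsx", "xlsm", "xlsb", "csv", "ods", "ots"],
    "Presentations": ["ppt", "pptx", "pps", "ppsx", "odp", "otp"],
    "PDF Documents": ["pdf"],  # assumption: PDF/A files also end in .pdf
    "Email": ["msg", "eml", "emlx", "pst", "ost"],
    "Archives": ["zip", "tar", "rar"],
}

# Invert the table so a single extension lookup returns its category.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in FORMAT_CATEGORIES.items()
    for ext in extensions
}


def categorize(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if the extension is not listed."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return EXTENSION_TO_CATEGORY.get(ext)


if __name__ == "__main__":
    for name in ["report.DOCX", "inbox.pst", "notes.txt"]:
        print(name, "->", categorize(name))
```

A check like this only inspects the file name. It says nothing about whether a particular file can actually be parsed, so the platform API references remain the authority on format support.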

Common Use Cases

Document indexing and search engine integration
Content management systems (CMS)
Data migration and conversion projects
Document analysis and reporting
Automated document processing workflows
Text mining and content extraction
Metadata cataloging and organization

Documentation and Downloads

Product Overview - Learn about features, supported formats, and use cases
Developer Documentation - Comprehensive guides, tutorials, and code examples
Blog - Latest updates and tutorials
Live Demos - Interactive online demo applications
Releases & Downloads - Download the latest versions and release notes