GroupDocs.Parser API References
This page serves as the entry point to the GroupDocs.Parser API references.
GroupDocs.Parser is a set of APIs for parsing documents and extracting text, images, metadata, and structured data from popular formats such as PDF, Word, Excel, and PowerPoint. It is available on multiple platforms.
Available Products
Select your target platform below to access its detailed API reference documentation.
GroupDocs.Parser for .NET
Access full API references for .NET developers
GroupDocs.Parser for Java
Access full API references for Java developers
GroupDocs.Parser for Python via .NET
Access full API references for Python developers
Additional Resources
Product Overview
GroupDocs.Parser is a document parsing and extraction SDK that gives developers APIs to extract data from documents without external dependencies or additional software installations. The library supports parsing and extraction from over 50 document formats, including PDF, Microsoft Word, Excel, PowerPoint, OneNote, Outlook, and many more.
Key Features
- Text Extraction: Extract raw or formatted text from entire documents or specific pages
- Image Extraction: Extract images from documents with support for various image formats
- Metadata Extraction: Retrieve document properties, creation dates, author information, and more
- Structured Data Parsing: Extract tables, forms, and structured data using template-based parsing
- Container Extraction: Extract attachments and embedded documents from container formats
- Cross-Platform Support: Available for .NET, Java, and Python platforms
- No External Dependencies: Parse documents without requiring Microsoft Office, Adobe Acrobat, or other third-party software
Supported File Formats
GroupDocs.Parser supports a wide range of document formats:
- Word Processing: DOC, DOCX, DOT, DOTX, RTF, ODT, OTT
- Spreadsheets: XLS, XLSX, XLSM, XLSB, CSV, ODS, OTS
- Presentations: PPT, PPTX, PPS, PPSX, ODP, OTP
- PDF Documents: PDF, PDF/A
- Email: MSG, EML, EMLX, PST, OST
- Archives: ZIP, TAR, RAR
- Other Formats: OneNote, Markdown, EPUB, and more
Common Use Cases
- Document indexing and search engine integration
- Content management systems (CMS)
- Data migration and conversion projects
- Document analysis and reporting
- Automated document processing workflows
- Text mining and content extraction
- Metadata cataloging and organization
Documentation and Downloads
- Product Overview - Learn about features, supported formats, and use cases
- Developer Documentation - Comprehensive guides, tutorials, and code examples
- Blog - Latest updates and tutorials
- Live Demos - Interactive online demo applications
- Releases & Downloads - Download the latest versions and release notes