GroupDocs.Parser API References

This page serves as the entry point to the GroupDocs.Parser API references.

GroupDocs.Parser is a set of APIs for parsing documents and extracting text, images, metadata, and structured data from popular formats such as PDF, Word, Excel, and PowerPoint, on multiple platforms.

Available Products

Select your target platform below to access detailed API code documentation.

Additional Resources

Product Overview

GroupDocs.Parser is a comprehensive document parsing and data extraction SDK that gives developers APIs for extracting data from documents without external dependencies or additional software installations. The library supports parsing and extraction from over 50 document formats, including PDF, Microsoft Word, Excel, PowerPoint, OneNote, Outlook, and many more.

Key Features

  • Text Extraction: Extract raw or formatted text from entire documents or specific pages
  • Image Extraction: Extract images from documents with support for various image formats
  • Metadata Extraction: Retrieve document properties, creation dates, author information, and more
  • Structured Data Parsing: Extract tables, forms, and structured data using template-based parsing
  • Container Extraction: Extract attachments and embedded documents from container formats
  • Cross-Platform Support: Available for .NET, Java, and Python platforms
  • No External Dependencies: Parse documents without requiring Microsoft Office, Adobe Acrobat, or other third-party software
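Text and metadata extraction are easiest to picture on a concrete format. The Python sketch below is not the GroupDocs.Parser API and does not describe how the library works internally. It uses only the standard library to show what the two operations amount to for a DOCX file, which is a ZIP package whose body text sits in `word/document.xml` and whose core properties sit in `docProps/core.xml`. The helper names `docx_text` and `docx_metadata` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by WordprocessingML and the OPC core-properties part.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
DC = "{http://purl.org/dc/elements/1.1/}"
DCTERMS = "{http://purl.org/dc/terms/}"


def docx_text(data: bytes) -> str:
    """Hypothetical helper: plain text of a DOCX, one line per paragraph.

    Simplified sketch: tabs, line breaks, and fields are not handled.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{W}p"):
        # Join the text of every <w:t> run inside this paragraph.
        lines.append("".join(t.text or "" for t in paragraph.iter(f"{W}t")))
    return "\n".join(lines)


def docx_metadata(data: bytes) -> dict:
    """Hypothetical helper: title, author, and creation date, when present."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    tags = {"title": f"{DC}title", "author": f"{DC}creator", "created": f"{DCTERMS}created"}
    metadata = {}
    for name, tag in tags.items():
        element = root.find(tag)  # direct child of <cp:coreProperties>
        if element is not None and element.text:
            metadata[name] = element.text
    return metadata
```

A production parser also has to handle tables, fields, line breaks, encrypted files, and many other formats, which is the kind of work an SDK such as GroupDocs.Parser is meant to absorb. For the actual calls, consult the API reference for your platform.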

Supported File Formats

GroupDocs.Parser supports a wide range of document formats:

  • Word Processing: DOC, DOCX, DOT, DOTX, RTF, ODT, OTT
  • Spreadsheets: XLS, XLSX, XLSM, XLSB, CSV, ODS, OTS
  • Presentations: PPT, PPTX, PPS, PPSX, ODP, OTP
  • PDF Documents: PDF, PDF/A
  • Email: MSG, EML, EMLX, PST, OST
  • Archives: ZIP, TAR, RAR
  • Other Formats: OneNote, Markdown, EPUB, and more
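If you route incoming files before parsing them, the format list above can be turned into a lookup table. The Python sketch below is a hypothetical helper (`format_category` and `FORMAT_CATEGORIES` are illustrative names, not part of GroupDocs.Parser) that classifies a file by its extension. The "Other Formats" entry assumes the usual extensions `.one`, `.md`, and `.epub`. Because the list is not exhaustive, a `None` result means "not in this table", not "unsupported".

```python
from pathlib import Path
from typing import Optional

# Extension table built from the supported-formats list (not exhaustive).
FORMAT_CATEGORIES = {
    "Word Processing": {"doc", "docx", "dot", "dotx", "rtf", "odt", "ott"},
    "Spreadsheets": {"xls", "xlsx", "xlsm", "xlsb", "csv", "ods", "ots"},
    "Presentations": {"ppt", "pptx", "pps", "ppsx", "odp", "otp"},
    "PDF Documents": {"pdf"},  # PDF/A files also use the .pdf extension
    "Email": {"msg", "eml", "emlx", "pst", "ost"},
    "Archives": {"zip", "tar", "rar"},
    "Other Formats": {"one", "md", "epub"},  # assumed extensions
}


def format_category(path: str) -> Optional[str]:
    """Hypothetical helper: return the category for a file, or None if unknown."""
    # Only the last suffix is inspected, compared case-insensitively.
    ext = Path(path).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

Because only the last suffix is inspected, `backup.tar.gz` is not classified, and an extension says nothing about whether the file content is valid. Treat the result as a routing hint rather than a guarantee.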

Common Use Cases

  • Document indexing and search engine integration
  • Content management systems (CMS)
  • Data migration and conversion projects
  • Document analysis and reporting
  • Automated document processing workflows
  • Text mining and content extraction
  • Metadata cataloging and organization

Documentation and Downloads