Extract Text from Page in PDF in Raw Mode

Introduction

In this tutorial, we’ll explore how to use GroupDocs.Parser for .NET to extract text from pages in PDF documents using raw mode. GroupDocs.Parser is a powerful tool that enables developers to work with various document formats programmatically.

Prerequisites

Before starting this tutorial, ensure you have the following:

  • Visual Studio installed on your machine.
  • Basic knowledge of C# programming.
  • GroupDocs.Parser for .NET library, which you can download here.
  • A sample PDF file for testing purposes.

Import Namespaces

First, make sure to import the necessary namespaces in your C# project:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

Step 1: Create an Instance of Parser Class

To begin, instantiate the Parser class by providing the path to your sample PDF file.

using (Parser parser = new Parser("YourSampleFile.pdf"))
{
    // Your code goes here
}

Step 2: Get Document Info and Iterate Over Pages

Next, retrieve the document information and iterate over each page to extract text.

IDocumentInfo documentInfo = parser.GetDocumentInfo();
for (int p = 0; p < documentInfo.RawPageCount; p++)
{
    Console.WriteLine($"Page {p + 1}/{documentInfo.RawPageCount}");
    // Your code for text extraction goes here
}

Step 3: Extract Text from Each Page

Within the loop, use the GetText method to extract text from each page and print it.

using (TextReader reader = parser.GetText(p, new TextOptions(true)))
{
    Console.WriteLine(reader.ReadToEnd());
}

Conclusion

In this tutorial, we’ve learned how to extract text from PDF pages in raw mode using GroupDocs.Parser for .NET. This process involves creating a Parser instance, obtaining document information, iterating over each page, and extracting text using the GetText method.

FAQ’s

What is GroupDocs.Parser for .NET?

GroupDocs.Parser for .NET is a document parsing API that allows developers to extract text, metadata, and other information from various file formats programmatically.

How do I download GroupDocs.Parser for .NET?

You can download the library from the GroupDocs website.

Is there a free trial available?

Yes, you can access a free trial of GroupDocs.Parser for .NET from here.

Where can I find support for GroupDocs.Parser for .NET?

For technical assistance and community support, visit the GroupDocs forum.

How can I purchase a license for GroupDocs.Parser for .NET?

You can purchase a license from the purchase page or acquire a temporary license here.