Recognizing Text

Introduction

GroupDocs.Parser for .NET extracts text from a wide range of document formats. This tutorial walks step by step through using GroupDocs.Parser with an OCR connector to recognize and extract text from documents.

Prerequisites

Before you start, make sure you have the following:

  • Basic understanding of C# programming
  • Visual Studio installed on your machine
  • Access to the internet for package downloads and documentation references

Import Namespaces

Begin by importing the necessary namespaces to leverage GroupDocs.Parser functionalities:

using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using Aspose.OCR;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

Step 1: Install GroupDocs.Parser

First, download and install the GroupDocs.Parser library. You can get it from the GroupDocs download page or add the GroupDocs.Parser package to your project through NuGet.

Step 2: Get a Temporary License

To use GroupDocs.Parser without evaluation restrictions, obtain a temporary license from the GroupDocs website.
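Once you have a license file, it is usually applied once at application startup, before any Parser instance is created. The following is a minimal sketch, assuming the License class in the GroupDocs.Parser namespace and its SetLicense(string) method. The LicenseSetup class, the TryApplyLicense method, and the license file path are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using GroupDocs.Parser; // assumption: License is exposed from this namespace

public static class LicenseSetup
{
    // Applies the license file at licensePath if it exists.
    // Returns false when the file is missing, in which case the library
    // is left unlicensed (evaluation restrictions would still apply).
    public static bool TryApplyLicense(string licensePath)
    {
        if (!File.Exists(licensePath))
        {
            Console.WriteLine("License file not found: " + licensePath);
            return false;
        }

        License license = new License();
        license.SetLicense(licensePath);
        return true;
    }
}
```

A call such as LicenseSetup.TryApplyLicense("GroupDocs.Parser.lic") at the start of Main keeps the licensing concern out of the extraction code. The file name here is a placeholder.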

Step 3: Initialize ParserSettings

Create an instance of the ParserSettings class and pass it the OCR connector that GroupDocs.Parser should use for recognition. This example uses the AsposeOcrOnPremise connector.

ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());

Step 4: Use Parser to Extract Text

Now create an instance of the Parser class with the configured settings and extract the text.

// Pass null for the load options so the default loading behavior is used
using (Parser parser = new Parser("YourSampleFile.docx", null, settings))
{
    // First argument: do not use raw mode; second argument: use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // The reader is null if text extraction isn't supported for the document
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

In this snippet:

  • Replace "YourSampleFile.docx" with the path to your target document.
  • TextOptions(false, true) turns raw mode off (first argument) and turns OCR on (second argument).
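The snippet above prints either the extracted text or a fallback message from the same expression, so a caller cannot tell "not supported" apart from an empty document. One possible way to separate the two cases is sketched below. It relies only on the behavior shown above (GetText returning a null reader when extraction isn't supported). The ExtractionResult class and TryReadAll method are hypothetical helpers, not part of GroupDocs.Parser.

```csharp
using System;
using System.IO;

public static class ExtractionResult
{
    // Reads everything from the reader returned by parser.GetText(options).
    // Returns false and sets text to null when the reader is null,
    // which the snippet above treats as "text extraction isn't supported".
    public static bool TryReadAll(TextReader reader, out string text)
    {
        if (reader == null)
        {
            text = null;
            return false;
        }

        text = reader.ReadToEnd();
        return true;
    }
}
```

Inside the inner using block, the Console.WriteLine call could then be replaced with a check such as if (ExtractionResult.TryReadAll(reader, out string text)) { ... }, which lets the application log, store, or post-process the text instead of only printing it.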

Conclusion

You have now seen how to set up GroupDocs.Parser for .NET with an OCR connector and use it to extract text from a document. See the GroupDocs.Parser documentation for advanced features and optimizations.

FAQs

Is GroupDocs.Parser suitable for extracting text from PDF files?

Yes, GroupDocs.Parser supports text extraction from various formats, including PDF.

Can I integrate GroupDocs.Parser into my ASP.NET application?

Yes, GroupDocs.Parser can be used in ASP.NET applications.

Does GroupDocs.Parser require a license for commercial use?

Yes, a license is required for commercial use. A temporary license is available from the GroupDocs website.

What document formats are supported by GroupDocs.Parser?

GroupDocs.Parser supports a wide range of formats, including DOCX, PDF, XLSX, and more.

Visit the GroupDocs.Parser forum for support and discussions.