Extract Text from Excel Document as HTML

Introduction

In this tutorial, we’ll explore how to use the GroupDocs.Parser for .NET to extract text from an Excel document and convert it into HTML format. GroupDocs.Parser is a powerful library that allows developers to work with various document formats, extracting text and metadata efficiently.

Prerequisites

Before we begin, ensure you have the following set up:

  • Visual Studio installed on your system.
  • Basic understanding of C# programming.
  • GroupDocs.Parser library for .NET. You can download it from here.

Import Namespaces

Start by including the necessary namespaces in your C# project to access the GroupDocs.Parser functionalities.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

Step 1: Create an Instance of Parser Class

First, instantiate the Parser class by providing the path to your Excel document.

using (Parser parser = new Parser("YourSampleFile.xlsx"))
{
    // Further code will go here
}

Replace "YourSampleFile.xlsx" with the path to your Excel file.

Step 2: Extract Text as HTML

Within the using block of the Parser instance, use the GetFormattedText method to extract formatted text in HTML mode.

using (Parser parser = new Parser("YourSampleFile.xlsx"))
{
    using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
    {
        // Further code will go here
    }
}

Step 3: Read and Print Extracted HTML Text

Next, read the extracted HTML text from the TextReader and print it to the console.

using (Parser parser = new Parser("YourSampleFile.xlsx"))
{
    using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
    {
        Console.WriteLine(reader.ReadToEnd());
    }
}

Once executed, this code will extract the text from the Excel document and display it as HTML format in the console.

Conclusion

In this tutorial, we learned how to use GroupDocs.Parser for .NET to extract text from an Excel document and convert it into HTML format. This library provides a straightforward way to work with various document formats, enabling developers to efficiently handle text extraction tasks in their applications.

FAQ’s

Can GroupDocs.Parser handle other document formats besides Excel?

Yes, GroupDocs.Parser supports a wide range of file formats including PDF, Word, PowerPoint, and more.

Is GroupDocs.Parser compatible with .NET Core?

Yes, GroupDocs.Parser is compatible with both .NET Framework and .NET Core.

Does GroupDocs.Parser preserve formatting during text extraction?

Yes, GroupDocs.Parser can preserve formatting such as fonts, styles, and layout during text extraction.

Can I extract metadata from documents using GroupDocs.Parser?

Yes, GroupDocs.Parser allows extracting metadata like author, creation date, and more from supported document types.

Is there a free trial available for GroupDocs.Parser?

Yes, you can download a free trial from here.