Working with Table Layout in Templates

Introduction

In this tutorial, we’ll explore how to extract a table from a known position on a page by describing its layout in a template, using GroupDocs.Parser for .NET. GroupDocs.Parser is a document parsing API for extracting text, metadata, and structured data from formats such as PDF and Microsoft Office documents.

Prerequisites

Before we begin, ensure you have the following prerequisites:

  • Basic knowledge of C# and .NET development.
  • Visual Studio installed on your machine.
  • GroupDocs.Parser for .NET added to your project, for example by installing the GroupDocs.Parser NuGet package.

Import Namespaces

First, make sure to import the necessary namespaces into your project:

using System;
using GroupDocs.Parser;            // Parser
using GroupDocs.Parser.Data;       // DocumentData, PageTableArea, PageTextArea
using GroupDocs.Parser.Templates;  // Template, TemplateTable, TemplateTableLayout

Step 1: Create a Table Template with Layout

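To work with table layouts in templates, describe the table's grid with TemplateTableLayout. Its constructor takes two arrays of separator coordinates, not sizes: the x-coordinates of the vertical lines that bound and divide the columns, and the y-coordinates of the horizontal lines that bound and divide the rows. The six vertical separators below therefore define five columns, and the three horizontal separators define two rows.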

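TemplateTableLayout layout = new TemplateTableLayout(
    new double[] { 30, 100, 320, 400, 480, 550 },   // x-coordinates of the vertical separators (5 columns)
    new double[] { 320, 345, 375 }                  // y-coordinates of the horizontal separators (2 rows)
);
// Create a table template named "Details"; a null page index means
// the table isn't bound to a particular page
TemplateTable table = new TemplateTable(layout, "Details", null);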
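
If your measurements are column widths and row heights (for example, read off the page in a PDF viewer), you first need to accumulate them into separator positions. The helper below is a hypothetical sketch in plain C#, not part of GroupDocs.Parser; the name SeparatorMath and its methods are assumptions for illustration. It also makes explicit that n separators delimit n - 1 columns or rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of GroupDocs.Parser): converts cell sizes
// into the separator coordinates that TemplateTableLayout expects.
public static class SeparatorMath
{
    // Returns start, start + sizes[0], start + sizes[0] + sizes[1], ...
    // so k sizes produce k + 1 separator coordinates.
    public static double[] ToSeparators(double start, IReadOnlyList<double> sizes)
    {
        if (sizes == null) throw new ArgumentNullException(nameof(sizes));

        var separators = new double[sizes.Count + 1];
        separators[0] = start;
        for (int i = 0; i < sizes.Count; i++)
        {
            if (sizes[i] <= 0)
                throw new ArgumentOutOfRangeException(nameof(sizes), "Sizes must be positive.");
            separators[i + 1] = separators[i] + sizes[i];
        }
        return separators;
    }

    // Number of columns (or rows) delimited by a list of separators.
    public static int CellCount(IReadOnlyCollection<double> separators) =>
        Math.Max(0, separators.Count - 1);
}
```

For example, ToSeparators(30, new double[] { 70, 220, 80, 80, 70 }) yields { 30, 100, 320, 400, 480, 550 }, the column separators used above, and CellCount on that array returns 5. Likewise, ToSeparators(320, new double[] { 25, 30 }) yields the row separators { 320, 345, 375 }.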

Step 2: Create a Template

Now, create a template using the defined table.

Template template = new Template(new TemplateItem[] { table });

Step 3: Parse a Document Using the Template

Next, instantiate the Parser class and parse a document using the created template.

using (Parser parser = new Parser("YourSampleFile.pdf"))
{
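    // Parse the document by the template
    DocumentData data = parser.ParseByTemplate(template);
    // Assumption: null means this document format doesn't support parsing by template
    if (data == null)
    {
        Console.WriteLine("Parsing by template isn't supported for this document.");
        return;
    }
    // Iterate over extracted data
    for (int i = 0; i < data.Count; i++)
    {
        // Print the field name on its own line, above the table rows
        Console.WriteLine(data[i].Name + ":");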
        // Check if the field is a table
        PageTableArea area = data[i].PageArea as PageTableArea;
        if (area == null)
        {
            continue;
        }
        // Iterate through table rows
        for (int row = 0; row < area.RowCount; row++)
        {
            // Iterate through table columns
            for (int column = 0; column < area.ColumnCount; column++)
            {
                // Get the cell value
                PageTextArea cellValue = area[row, column].PageArea as PageTextArea;
                // Print the cell value
                Console.Write(cellValue == null ? "" : cellValue.Text);
                // Separate columns with a tab
                Console.Write("\t");
            }
            // Move to the next line after each row
            Console.WriteLine();
        }
    }
}
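
The loop above writes each cell's text straight to the console, so a cell whose text spans several lines, or contains a tab, will break the tab-separated layout. One option is to copy the texts into a string[,] first and format the whole table at once. The helper below is a hypothetical sketch in plain C# (TableFormatter is an assumed name, not a GroupDocs.Parser type), and replacing line breaks and tabs with single spaces is an assumption about how you want such cells rendered.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of GroupDocs.Parser): formats a grid of
// cell strings as tab-separated lines, one line per table row.
public static class TableFormatter
{
    public static string ToTabSeparated(string[,] cells)
    {
        if (cells == null) throw new ArgumentNullException(nameof(cells));

        var sb = new StringBuilder();
        int rows = cells.GetLength(0);
        int columns = cells.GetLength(1);
        for (int row = 0; row < rows; row++)
        {
            for (int column = 0; column < columns; column++)
            {
                if (column > 0) sb.Append('\t');   // tab between columns only
                sb.Append(Clean(cells[row, column]));
            }
            sb.Append('\n');                       // one line per row
        }
        return sb.ToString();
    }

    // Replaces line breaks and tabs with spaces so a cell cannot split a row
    // or a column; null cells become empty strings.
    private static string Clean(string text) =>
        (text ?? string.Empty)
            .Replace("\r\n", " ")
            .Replace('\r', ' ')
            .Replace('\n', ' ')
            .Replace('\t', ' ')
            .Trim();
}
```

Inside the loop in Step 3 you could allocate var cells = new string[area.RowCount, area.ColumnCount]; per table, assign cells[row, column] = cellValue == null ? "" : cellValue.Text; instead of writing to the console, and then print TableFormatter.ToTabSeparated(cells) once the table is complete.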

Conclusion

In this tutorial, we’ve learned how to use GroupDocs.Parser for .NET to extract a table from a document by describing its layout in a template. Because the separator coordinates are defined once, the same template can pull the table out of every document that shares that layout.

FAQs

Can I parse tables from PDF documents using GroupDocs.Parser for .NET?

Yes, GroupDocs.Parser supports parsing tables from PDF documents along with other popular formats.

Is GroupDocs.Parser suitable for extracting specific data fields from documents?

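Yes. Besides tables, a template can contain fields (TemplateField) whose position is fixed on the page (TemplateFixedPosition), found with a regular expression (TemplateRegexPosition), or defined relative to another field (TemplateLinkedPosition).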

How can I handle different table layouts within a document?

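Define one TemplateTable per table, each with its own TemplateTableLayout and a distinct name, and pass all of them in the TemplateItem array given to the Template constructor. Each table is then returned as a separate item in the DocumentData produced by ParseByTemplate.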

Does GroupDocs.Parser support processing large documents?

Yes. As in Step 3, wrap each Parser instance in a using block so the file and other resources are released as soon as parsing finishes; actual processing time depends on the document's format and size.

Can I integrate GroupDocs.Parser with other .NET libraries?

Yes. GroupDocs.Parser is a regular .NET library, so the DocumentData it returns can be passed to any other .NET code, for example to serialize the extracted tables or store them in a database.