Working with Fields at Linked Positions in Templates
Introduction
GroupDocs.Parser for .NET is a robust library designed to facilitate document parsing and data extraction tasks. It supports a wide range of file formats, including PDF, DOCX, XLSX, and more. One of its key features is template-based data extraction, which allows you to define fields within a document and extract specific data based on these predefined templates.
Prerequisites
Before we begin, ensure you have the following:
- Basic understanding of C# programming
- Visual Studio installed on your system
- GroupDocs.Parser for .NET library (download from here)
- Sample document files to work with
Importing Namespaces
Start by including the necessary namespaces in your C# project:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Templates;
Step 1: Define Template Fields
First, define the template fields using regular expressions and linked positions:
// Define a field with a regular expression
TemplateField field = new TemplateField(
new TemplateRegexPosition("Tax"),
"Tax");
// Define a linked field with specific position settings
TemplateField linkedField = new TemplateField(
new TemplateLinkedPosition(
"Tax",
new Size(100, 20),
new TemplateLinkedPositionEdges(false, false, true, false)),
"TaxValue");
Step 2: Create a Template
Next, create a template containing the defined fields:
// Create a template with the defined fields
Template template = new Template(new TemplateItem[] { field, linkedField });
Step 3: Parse Document with Template
Now, initialize the Parser
class and parse the document using the template:
using (Parser parser = new Parser("YourSampleFile.pdf"))
{
// Parse the document by the template
DocumentData data = parser.ParseByTemplate(template);
// Iterate through extracted data and print results
for (int i = 0; i < data.Count; i++)
{
Console.Write(data[i].Name + ": ");
PageTextArea area = data[i].PageArea as PageTextArea;
Console.WriteLine(area == null ? "Not a template field" : area.Text);
}
}
Conclusion
GroupDocs.Parser for .NET simplifies the process of extracting structured data from documents using templates. By defining fields and applying templates, you can efficiently extract relevant information, enhancing automation and productivity in document processing tasks.
FAQ’s
Can GroupDocs.Parser extract data from encrypted PDF files?
Yes, GroupDocs.Parser supports parsing encrypted PDF files by providing the password during parsing.
Which file formats are supported for template-based extraction?
GroupDocs.Parser supports a wide range of file formats including PDF, DOCX, XLSX, PPTX, TXT, and more.
Is there a trial version available for GroupDocs.Parser?
Yes, you can download a free trial version from here.
Can I use GroupDocs.Parser for batch processing of documents?
Yes, GroupDocs.Parser allows batch processing to parse multiple documents concurrently.
Where can I get technical support for GroupDocs.Parser?
You can seek technical support and engage with the community at GroupDocs forum.