GetTextAreas

GetTextAreas()

Extracts text areas from the document.

public IEnumerable<PageTextArea> GetTextAreas()

Return Value

A collection of PageTextArea objects; null if text areas extraction isn’t supported.

Examples

The following example shows how to extract all text areas from the whole document:

// Create an instance of Parser class
using(Parser parser = new Parser(filePath))
{
    // Extract text areas
    IEnumerable<PageTextArea> areas = parser.GetTextAreas();
    // Check if text areas extraction is supported
    if(areas == null)
    {
        Console.WriteLine("Page text areas extraction isn't supported");
        return;
    }
 
    // Iterate over page text areas
    foreach(PageTextArea a in areas)
    {
        // Print a page index, rectangle and text area value:
        Console.WriteLine(string.Format("Page: {0}, R: {1}, Text: {2}", a.Page.Index, a.Rectangle, a.Text));
    }
}


GetTextAreas(PageTextAreaOptions)

Extracts text areas from the document using customization options (regular expression, match case, etc.).

public IEnumerable<PageTextArea> GetTextAreas(PageTextAreaOptions options)

Parameter  Type                 Description
options    PageTextAreaOptions  The options for text area extraction.

Return Value

A collection of PageTextArea objects; null if text areas extraction isn’t supported.

Examples

The following example shows how to extract only the text areas that contain digits from the upper-left corner of each page:

// Create an instance of Parser class
using(Parser parser = new Parser(filePath))
{
    // Create the options which are used for text area extraction
    PageTextAreaOptions options = new PageTextAreaOptions("[0-9]+", new Rectangle(new Point(0, 0), new Size(300, 100)));

    // Extract text areas that match the digit pattern within the upper-left corner of a page:
    IEnumerable<PageTextArea> areas = parser.GetTextAreas(options);
    // Check if text areas extraction is supported
    if(areas == null)
    {
        Console.WriteLine("Page text areas extraction isn't supported");
        return;
    }
 
    // Iterate over page text areas
    foreach(PageTextArea a in areas)
    {
        // Print a page index, rectangle and text area value:
        Console.WriteLine(string.Format("Page: {0}, R: {1}, Text: {2}", a.Page.Index, a.Rectangle, a.Text));
    }
}


GetTextAreas(int)

Extracts text areas from the document page.

public IEnumerable<PageTextArea> GetTextAreas(int pageIndex)

Parameter  Type   Description
pageIndex  Int32  The zero-based page index.

Return Value

A collection of PageTextArea objects; null if text areas extraction isn’t supported.

Examples

The following example shows how to extract text areas from each page of the document, one page at a time:

// Create an instance of Parser class
using(Parser parser = new Parser(filePath))
{
    // Check if the document supports text areas extraction
    if(!parser.Features.TextAreas)
    {
        Console.WriteLine("Document doesn't support text areas extraction.");
        return;
    }

    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Check if the document has pages
    if(documentInfo.PageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }
 
    // Iterate over pages
    for(int pageIndex = 0; pageIndex < documentInfo.PageCount; pageIndex++)
    {
        // Print a page number 
        Console.WriteLine(string.Format("Page {0}/{1}", pageIndex + 1, documentInfo.PageCount));
 
        // Iterate over page text areas
        // The null check is skipped here because text areas extraction support was verified above
        foreach(PageTextArea a in parser.GetTextAreas(pageIndex))
        {
            // Print a rectangle and text area value:
            Console.WriteLine(string.Format("R: {0}, Text: {1}", a.Rectangle, a.Text));
        }
    }
}


GetTextAreas(int, PageTextAreaOptions)

Extracts text areas from the document page using customization options (regular expression, match case, etc.).

public IEnumerable<PageTextArea> GetTextAreas(int pageIndex, PageTextAreaOptions options)

Parameter  Type                 Description
pageIndex  Int32                The zero-based page index.
options    PageTextAreaOptions  The options for text area extraction.

Return Value

A collection of PageTextArea objects; null if text areas extraction isn’t supported.
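
Examples

This overload has no sample of its own, so the following is a minimal sketch that combines the per-page loop and the options object from the other overloads on this page. The wrapper class TextAreaExample, the method PrintDigitAreas and the filePath parameter are hypothetical names introduced for illustration, and the using directives assume the usual GroupDocs.Parser namespace layout. Every other call appears in the examples for the other overloads.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;     // assumed namespace for PageTextArea, Rectangle, Point, Size
using GroupDocs.Parser.Options;  // assumed namespace for PageTextAreaOptions, IDocumentInfo

// Hypothetical wrapper class, used only to make the sketch self-contained
public static class TextAreaExample
{
    public static void PrintDigitAreas(string filePath)
    {
        // Create an instance of the Parser class
        using (Parser parser = new Parser(filePath))
        {
            // Check if the document supports text areas extraction
            if (!parser.Features.TextAreas)
            {
                Console.WriteLine("Document doesn't support text areas extraction.");
                return;
            }

            // Get the document info to find out how many pages there are
            IDocumentInfo documentInfo = parser.GetDocumentInfo();

            // Same options as in the GetTextAreas(PageTextAreaOptions) example:
            // a digit pattern restricted to the upper-left corner of a page
            PageTextAreaOptions options = new PageTextAreaOptions(
                "[0-9]+",
                new Rectangle(new Point(0, 0), new Size(300, 100)));

            // Iterate over pages
            for (int pageIndex = 0; pageIndex < documentInfo.PageCount; pageIndex++)
            {
                // Extract the matching text areas from the current page only
                IEnumerable<PageTextArea> areas = parser.GetTextAreas(pageIndex, options);
                if (areas == null)
                {
                    // Defensive check; null means extraction isn't supported
                    continue;
                }

                foreach (PageTextArea a in areas)
                {
                    // Print a page number, rectangle and text area value
                    Console.WriteLine(string.Format("Page {0}, R: {1}, Text: {2}", pageIndex + 1, a.Rectangle, a.Text));
                }
            }
        }
    }
}
```

Building the options object once and reusing it for every page keeps the loop body small. The extra null check is optional once Features.TextAreas has been verified.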
