GetFormattedText

GetFormattedText(FormattedTextOptions)

Extracts formatted text from the document.

public TextReader GetFormattedText(FormattedTextOptions options)
| Parameter | Type | Description |
| --- | --- | --- |
| options | FormattedTextOptions | The formatted text extraction options. |

Return Value

An instance of the TextReader class with the extracted text; null if formatted text extraction isn’t supported.
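
Because the method returns null instead of throwing when formatted text extraction isn't supported, callers may want to keep the null check in one place. The following is a minimal sketch, assuming a hypothetical helper named `ReadAllOrDefault` (it is not part of GroupDocs.Parser) that accepts any `TextReader`:

```csharp
using System;
using System.IO;

public static class ReaderHelpers
{
    // Hypothetical helper, not part of the GroupDocs.Parser API.
    // Returns the reader's full content, or the fallback when the reader is null.
    // A non-null reader is disposed before the method returns.
    public static string ReadAllOrDefault(TextReader reader, string fallback)
    {
        if (reader == null)
        {
            return fallback;
        }

        using (reader)
        {
            return reader.ReadToEnd();
        }
    }
}
```

With such a helper, a call could look like `ReaderHelpers.ReadAllOrDefault(parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)), string.Empty)`.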

Examples

The following example shows how to extract the document text as HTML:

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract a formatted text into the reader
    using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
    {
        // Print the formatted text from the document
        // If formatted text extraction isn't supported, the reader is null
        Console.WriteLine(reader == null ? "Formatted text extraction isn't supported" : reader.ReadToEnd());
    }
}


GetFormattedText(int, FormattedTextOptions)

Extracts formatted text from a document page.

public TextReader GetFormattedText(int pageIndex, FormattedTextOptions options)
| Parameter | Type | Description |
| --- | --- | --- |
| pageIndex | Int32 | The zero-based page index. |
| options | FormattedTextOptions | The formatted text extraction options. |

Return Value

An instance of the TextReader class with the extracted text; null if formatted text extraction from pages isn’t supported.
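
Since the page index is zero-based and each call may return null, collecting the text of every page involves a loop with a per-page null check. The following is a minimal sketch, assuming a hypothetical helper named `ReadPages` (it is not part of GroupDocs.Parser) that takes the page count and a delegate producing a `TextReader` for a given index:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PageTextHelpers
{
    // Hypothetical helper, not part of the GroupDocs.Parser API.
    // Calls extractPage for each zero-based index in [0, pageCount) and collects the text.
    // A page whose reader is null is stored as a null entry.
    public static List<string> ReadPages(int pageCount, Func<int, TextReader> extractPage)
    {
        if (extractPage == null)
        {
            throw new ArgumentNullException(nameof(extractPage));
        }

        var pages = new List<string>(Math.Max(pageCount, 0));
        for (int p = 0; p < pageCount; p++)
        {
            // A using statement accepts a null resource, so no extra guard is needed here
            using (TextReader reader = extractPage(p))
            {
                pages.Add(reader == null ? null : reader.ReadToEnd());
            }
        }
        return pages;
    }
}
```

The delegate could wrap this overload, for example `p => parser.GetFormattedText(p, new FormattedTextOptions(FormattedTextMode.Markdown))`, with `documentInfo.PageCount` passed as the page count.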

Examples

The following example shows how to extract the text of each document page as Markdown:

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Check if the document supports formatted text extraction
    if (!parser.Features.FormattedText)
    {
        Console.WriteLine("Document doesn't support formatted text extraction.");
        return;
    }
    
    // Get the document info
    IDocumentInfo documentInfo = parser.GetDocumentInfo();
    // Check if the document has pages
    if (documentInfo.PageCount == 0)
    {
        Console.WriteLine("Document has no pages.");
        return;
    }
    
    // Iterate over pages
    for (int p = 0; p < documentInfo.PageCount; p++)
    {
        // Print the one-based page number and the total page count
        Console.WriteLine(string.Format("Page {0}/{1}", p + 1, documentInfo.PageCount));
        // Extract a formatted text into the reader
        using (TextReader reader = parser.GetFormattedText(p, new FormattedTextOptions(FormattedTextMode.Markdown)))
        {
            // Print the formatted text from the page
            // The null check is omitted because formatted text support was verified above
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
