ExtractionOptions
Inheritance: java.lang.Object
public class ExtractionOptions
Provides options for extracting data from documents.
Constructors
Constructor | Description |
---|---|
ExtractionOptions() | Initializes a new instance of the ExtractionOptions class. |
ExtractionOptions(Object data) | Initializes a new instance of the ExtractionOptions class. |
ExtractionOptions(IndexingOptions options, IFieldExtractor customExtractor, IOcrConnector ocrConnector) | Initializes a new instance of the ExtractionOptions class. |
Methods
Method | Description |
---|---|
getCustomExtractor() | Gets the custom text extractor. |
setCustomExtractor(IFieldExtractor value) | Sets the custom text extractor. |
getAutoDetectEncoding() | Gets a value indicating whether to detect encoding automatically or not. |
setAutoDetectEncoding(boolean value) | Sets a value indicating whether to detect encoding automatically or not. |
getEncoding() | Gets the encoding used to extract text from text documents. |
setEncoding(String value) | Sets the encoding used to extract text from text documents. |
getUseRawTextExtraction() | Gets a value indicating whether the raw mode is used for text extraction if possible. |
setUseRawTextExtraction(boolean value) | Sets a value indicating whether the raw mode is used for text extraction if possible. |
getMetadataIndexingOptions() | Gets the options for indexing metadata fields. |
getOcrIndexingOptions() | Gets the options for OCR processing and indexing recognized text. |
getImageIndexingOptions() | Gets the image indexing options for reverse image search. |
getCore() | |
ExtractionOptions()
public ExtractionOptions()
Initializes a new instance of the ExtractionOptions class.
ExtractionOptions(Object data)
public ExtractionOptions(Object data)
Initializes a new instance of the ExtractionOptions class.
Parameters:
Parameter | Type | Description |
---|---|---|
data | java.lang.Object | The serialized data. |
ExtractionOptions(IndexingOptions options, IFieldExtractor customExtractor, IOcrConnector ocrConnector)
public ExtractionOptions(IndexingOptions options, IFieldExtractor customExtractor, IOcrConnector ocrConnector)
Initializes a new instance of the ExtractionOptions class.
Parameters:
Parameter | Type | Description |
---|---|---|
options | IndexingOptions | The indexing options. |
customExtractor | IFieldExtractor | The custom extractor. |
ocrConnector | IOcrConnector | The OCR connector. |
getCustomExtractor()
public IFieldExtractor getCustomExtractor()
Gets the custom text extractor. The default value is null.
Returns: IFieldExtractor - The custom text extractor.
setCustomExtractor(IFieldExtractor value)
public void setCustomExtractor(IFieldExtractor value)
Sets the custom text extractor. The default value is null.
Parameters:
Parameter | Type | Description |
---|---|---|
value | IFieldExtractor | The custom text extractor. |
getAutoDetectEncoding()
public boolean getAutoDetectEncoding()
Gets a value indicating whether to detect encoding automatically or not. The default value is false.
Returns: boolean - A value indicating whether to detect encoding automatically or not.
setAutoDetectEncoding(boolean value)
public void setAutoDetectEncoding(boolean value)
Sets a value indicating whether to detect encoding automatically or not. The default value is false.
Parameters:
Parameter | Type | Description |
---|---|---|
value | boolean | A value indicating whether to detect encoding automatically or not. |
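This page does not say how automatic encoding detection works internally. To make the idea of "detect, otherwise fall back to a default" concrete, the sketch below uses a swapped-in, deliberately simple technique: byte order mark (BOM) sniffing with a caller-supplied fallback charset. The class `BomSniffer` and its `detect` method are hypothetical and are not part of this API; the real detector is likely more elaborate.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: BOM sniffing with a fallback charset.
// This is NOT the detection algorithm used behind setAutoDetectEncoding.
// Simplified: UTF-32 BOMs are not distinguished from UTF-16.
public final class BomSniffer {

    private BomSniffer() {}

    /** Returns the charset indicated by a BOM, or {@code fallback} if none is found. */
    public static Charset detect(byte[] data, Charset fallback) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        // No BOM found: use the default encoding supplied by the caller.
        return fallback;
    }

    // Compares the leading bytes of data (as unsigned values) with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0};
        byte[] plain = {'h', 'i'};
        System.out.println(detect(utf16le, StandardCharsets.UTF_8));    // UTF-16LE
        System.out.println(detect(plain, StandardCharsets.ISO_8859_1)); // ISO-8859-1
    }
}
```

The fallback parameter plays the role described for the encoding property: when detection finds nothing conclusive, the configured default is used.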
getEncoding()
public String getEncoding()
Gets the encoding used to extract text from text documents. The default value is null, which means that the default encoding, UTF-8, is used. If AutoDetectEncoding is true, this value is used as the default encoding.
Returns: java.lang.String - The encoding used to extract text from text documents.
setEncoding(String value)
public void setEncoding(String value)
Sets the encoding used to extract text from text documents. The default value is null, which means that the default encoding, UTF-8, is used. If AutoDetectEncoding is true, this value is used as the default encoding.
Parameters:
Parameter | Type | Description |
---|---|---|
value | java.lang.String | The encoding used to extract text from text documents. |
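Because setEncoding takes a plain String, it can help to validate the name before passing it in. The sketch below mirrors the documented rule (null means UTF-8) using only java.nio.charset. It assumes, without confirmation from this page, that the library accepts standard Java charset names. `EncodingResolver` is a hypothetical helper, not part of this API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the documented rule for the encoding property:
// null means UTF-8; otherwise the name must be a charset the JVM supports.
// Assumes the library uses standard Java charset names (not confirmed here).
public final class EncodingResolver {

    private EncodingResolver() {}

    /**
     * Resolves an encoding name of the kind passed to setEncoding(String).
     *
     * @throws IllegalArgumentException if the name is illegal or unsupported
     */
    public static Charset resolve(String encoding) {
        if (encoding == null) {
            return StandardCharsets.UTF_8; // documented default
        }
        // isSupported throws IllegalCharsetNameException (an
        // IllegalArgumentException) for syntactically illegal names.
        if (!Charset.isSupported(encoding)) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }
        return Charset.forName(encoding);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));         // UTF-8
        System.out.println(resolve("ISO-8859-1")); // ISO-8859-1
    }
}
```

Validating up front surfaces a misspelled encoding name at configuration time rather than during extraction.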
getUseRawTextExtraction()
public boolean getUseRawTextExtraction()
Gets a value indicating whether the raw mode is used for text extraction if possible. The default value is true. The raw mode can significantly increase the indexing speed, whereas the normal mode improves the formatting of the extracted text.
Returns: boolean - A value indicating whether the raw mode is used for text extraction if possible.
setUseRawTextExtraction(boolean value)
public void setUseRawTextExtraction(boolean value)
Sets a value indicating whether the raw mode is used for text extraction if possible. The default value is true. The raw mode can significantly increase the indexing speed, whereas the normal mode improves the formatting of the extracted text.
Parameters:
Parameter | Type | Description |
---|---|---|
value | boolean | A value indicating whether the raw mode is used for text extraction if possible. |
getMetadataIndexingOptions()
public MetadataIndexingOptions getMetadataIndexingOptions()
Gets the options for indexing metadata fields.
Returns: MetadataIndexingOptions - The options for indexing metadata fields.
getOcrIndexingOptions()
public OcrIndexingOptions getOcrIndexingOptions()
Gets the options for OCR processing and indexing recognized text.
Returns: OcrIndexingOptions - The options for OCR processing and indexing recognized text.
getImageIndexingOptions()
public ImageIndexingOptions getImageIndexingOptions()
Gets the image indexing options for reverse image search.
Returns: ImageIndexingOptions - The image indexing options for reverse image search.
getCore()
public Object getCore()
Returns: java.lang.Object