IFieldExtractor
public interface IFieldExtractor
Provides methods for extracting fields from a document.
Learn more
The example demonstrates how to implement the interface.
public class LogExtractor implements IFieldExtractor {
private final String[] extensions = new String[] { ".log" };
public final String[] getExtensions() { return extensions; }
public final DocumentField[] getFields(String filePath) {
File file = new File(filePath);
DocumentField[] fields = new DocumentField[] {
new DocumentField("FileName", file.getAbsolutePath()),
new DocumentField("Content", extractContent(filePath)),
};
return fields;
}
private String extractContent(String filePath) {
StringBuilder result = new StringBuilder();
try {
List lines = Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8);
for (int i = 0; i < lines.size(); i++) {
String line = lines.get(i);
String processedLine = line.substring(12);
result.append(processedLine);
}
} catch (IOException ex) {
throw new RuntimeException(ex);
}
return result.toString();
}
}
The example demonstrates how to use the custorm extractor for indexing.
String indexFolder = "c:\\MyIndex\\"; // Specify path to the index folder
String documentsFolder = "c:\\MyDocuments\\"; // Specify path to a folder containing documents to search
Index index = new Index(indexFolder); // Creating or loading an index
index.getIndexSettings().getCustomExtractors().addItem(new LogExtractor()); // Adding custom text extractor to index settings
index.add(documentsFolder); // Indexing documents from the specified folder
Methods
Method | Description |
---|---|
getExtensions() | Gets the supported extensions. |
getFields(String filePath) | Extracts all fields from the specified document. |
getFields(InputStream stream) | Extracts all fields from the specified document. |
getExtensions()
public abstract String[] getExtensions()
Gets the supported extensions.
Returns: java.lang.String[] - The supported extensions.
getFields(String filePath)
public abstract DocumentField[] getFields(String filePath)
Extracts all fields from the specified document.
Parameters:
Parameter | Type | Description |
---|---|---|
filePath | java.lang.String | The document file path. |
Returns: com.groupdocs.search.common.DocumentField[] - The extracted fields.
getFields(InputStream stream)
public abstract DocumentField[] getFields(InputStream stream)
Extracts all fields from the specified document.
Parameters:
Parameter | Type | Description |
---|---|---|
stream | java.io.InputStream | The document stream. |
Returns: com.groupdocs.search.common.DocumentField[] - The extracted fields.