public class TikaDocumentParser extends AbstractDocumentParser
DocumentParser that uses Apache Tika.

| Modifier and Type | Field and Description |
|---|---|
| protected int | charLimit: The maximum number of characters to parse from the document. |
| protected FileTypeMap | fileTypeMap |
| protected List<MetadataExtractor<org.apache.tika.metadata.Metadata>> | metadataExtractors: List of metadata extractors to apply after parsing documents. |
| protected com.fasterxml.jackson.databind.ObjectMapper | objectMapper: Jackson ObjectMapper instance. |
| protected org.apache.tika.Tika | tika: Apache Tika instance. |
Fields inherited from class AbstractDocumentParser: fieldNameAuthor, fieldNameContent, fieldNameContentType, fieldNameCreated, fieldNameDescription, fieldNameKeywords, fieldNameModified, fieldNAmeTitle

| Constructor and Description |
|---|
| TikaDocumentParser() |
| Modifier and Type | Method and Description |
|---|---|
| protected String | extractMetadata(String filename, org.springframework.core.io.Resource resource, String parsedContent, org.apache.tika.metadata.Metadata metadata, Map<String,Object> additionalFields): Prepares the document to be indexed. |
| String | parseToXml(String filename, org.springframework.core.io.Resource resource, Map<String,Object> additionalFields): Parses the given document and generates its XML representation. |
| void | setCharLimit(int charLimit) |
| void | setMetadataExtractors(List<MetadataExtractor<org.apache.tika.metadata.Metadata>> metadataExtractors) |
| void | setObjectMapper(com.fasterxml.jackson.databind.ObjectMapper objectMapper) |
| void | setTika(org.apache.tika.Tika tika) |
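This page does not show what parseToXml actually emits. As a rough, stdlib-only sketch of the idea behind parseToXml and extractMetadata (gathering the file name, the extracted text and any additionalFields into one indexable record), the hypothetical helper below serializes a flat field map to XML using the JDK's DOM and Transformer APIs. The class name XmlSketch, the root element doc and the field names are assumptions for illustration, not the real output format of TikaDocumentParser.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XmlSketch {

    // Hypothetical: serializes {name -> value} as <doc><name>value</name>...</doc>.
    // Each key must be a valid XML element name.
    static String toXml(Map<String, Object> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("doc");
            doc.appendChild(root);
            for (Map.Entry<String, Object> entry : fields.entrySet()) {
                Element field = doc.createElement(entry.getKey());
                // Text is escaped (& becomes &amp;) when the tree is serialized.
                field.setTextContent(String.valueOf(entry.getValue()));
                root.appendChild(field);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("fileName", "report.pdf");          // the filename argument
        fields.put("content", "Q3 results & outlook"); // text extracted by Tika
        fields.put("author", "Jane Doe");              // e.g. an entry from additionalFields
        System.out.println(toXml(fields));
    }
}
```

Using a DOM tree rather than string concatenation means special characters in the extracted text are escaped automatically.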
Methods inherited from class AbstractDocumentParser: setFieldNameAuthor, setFieldNameContent, setFieldNameContentType, setFieldNameCreated, setFieldNameDescription, setFieldNameKeywords, setFieldNameModified, setFieldNAmeTitle

protected int charLimit

protected com.fasterxml.jackson.databind.ObjectMapper objectMapper
Jackson ObjectMapper instance.

protected List<MetadataExtractor<org.apache.tika.metadata.Metadata>> metadataExtractors

protected org.apache.tika.Tika tika
Apache Tika instance.

protected FileTypeMap fileTypeMap
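metadataExtractors holds the extractors applied after a document is parsed. The sketch below shows that pattern with hypothetical types: FieldExtractor stands in for MetadataExtractor<org.apache.tika.metadata.Metadata>, and a plain Map<String, String> stands in for Tika's Metadata object. The real MetadataExtractor interface may have a different method signature, and the keys dc:creator and xmpTPg:NPages are only sample input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for MetadataExtractor<org.apache.tika.metadata.Metadata>:
// reads parsed metadata (here a plain Map) and adds fields to the document to index.
interface FieldExtractor {
    void extract(Map<String, String> metadata, Map<String, Object> fields);
}

final class ExtractorPipeline {
    private List<FieldExtractor> extractors = new ArrayList<>();

    // Same shape as setMetadataExtractors(List<...>), but with the hypothetical type.
    void setExtractors(List<FieldExtractor> extractors) {
        this.extractors = extractors;
    }

    // Runs every extractor, in list order, once parsing has produced the metadata.
    Map<String, Object> apply(Map<String, String> metadata) {
        Map<String, Object> fields = new HashMap<>();
        for (FieldExtractor extractor : extractors) {
            extractor.extract(metadata, fields);
        }
        return fields;
    }

    public static void main(String[] args) {
        ExtractorPipeline pipeline = new ExtractorPipeline();
        pipeline.setExtractors(List.of(
            // Copy the author, if one was reported.
            (md, out) -> { if (md.containsKey("dc:creator")) out.put("author", md.get("dc:creator")); },
            // Turn a numeric metadata entry into an integer field.
            (md, out) -> { String n = md.get("xmpTPg:NPages"); if (n != null) out.put("pages", Integer.parseInt(n)); }
        ));
        Map<String, String> metadata = Map.of("dc:creator", "Jane Doe", "xmpTPg:NPages", "12");
        System.out.println(pipeline.apply(metadata));
    }
}
```

Because the extractors share one output map and run in order, a later extractor can overwrite a field set by an earlier one; whether the real class behaves this way is not stated on this page.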
public void setCharLimit(int charLimit)
public void setObjectMapper(com.fasterxml.jackson.databind.ObjectMapper objectMapper)
public void setMetadataExtractors(List<MetadataExtractor<org.apache.tika.metadata.Metadata>> metadataExtractors)
public void setTika(org.apache.tika.Tika tika)
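setCharLimit sets the maximum number of characters to parse from a document. How the class enforces that limit is not shown here (Tika itself offers Tika.setMaxStringLength for a similar purpose). The hypothetical helper applyCharLimit below only illustrates the effect on extracted text; treating a negative limit as "unlimited" is an assumption of this sketch.

```java
final class CharLimitSketch {

    // Hypothetical helper: keeps at most charLimit UTF-16 chars of the extracted text.
    // A negative limit means "no limit" (assumption for this sketch).
    static String applyCharLimit(String text, int charLimit) {
        if (charLimit < 0 || text.length() <= charLimit) {
            return text;
        }
        int end = charLimit;
        // Avoid cutting a surrogate pair in half (e.g. an emoji at the boundary).
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        String extracted = "Quarterly report: revenue grew in every region.";
        System.out.println(applyCharLimit(extracted, 17)); // prints "Quarterly report:"
    }
}
```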
public String parseToXml(String filename, org.springframework.core.io.Resource resource, Map<String,Object> additionalFields)
Parameters:
filename - the name of the file
resource - the document to parse
additionalFields - additional fields to add

protected String extractMetadata(String filename, org.springframework.core.io.Resource resource, String parsedContent, org.apache.tika.metadata.Metadata metadata, Map<String,Object> additionalFields)
Parameters:
resource - the content of the parsed file
metadata - the metadata of the parsed file
additionalFields - additional fields to be added

Copyright © 2022 CrafterCMS. All rights reserved.