PDF reader - SmartPlant Foundation - IM Update 48 - Help - Hexagon

SmartPlant Foundation Help

SmartPlant Foundation / SDx Version: 10
SmartPlant Markup Plus Version: 10.0 (2019)
Smart Review Version: 2020 (15.0)

The PDF reader extracts information from a PDF file image using the SPFNPDF application. SPFNPDF processes the PDF files and extracts the tag and other link information from the scanned file using a set of rules and patterns defined in Data Capture. For PDF files and Microsoft Office files, the PDF reader is selected by default as the base reader in the Data Capture Central Settings module in the Desktop Client. For file types other than PDF whose base reader is set to the PDF reader, the PDF reader generates Markup renditions during content extraction, and the software uses these renditions to retrieve the tag details. For more information, see Manage file types and prioritize them for content extraction.

For preprocessed content files that have been extracted from third-party applications, the PDF reader looks in the PreProcessedContentFiles folder. For Microsoft Office files, you must rename the file, the content file, and the graphics map file as follows: <File Name with extension>.pdf, <File Name with extension>.pdf_ContentFile.xml (content file), and <File Name with extension>.pdf_GraphicsMapFile.xml (graphics map file). We recommend that you do the same for any other non-PDF file types whose base reader is set to the PDF reader.
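The naming convention above can be sketched in a few lines of Python. This is only an illustration and is not part of SmartPlant Foundation: the function name preprocessed_names and the sample file name Datasheet.docx are hypothetical, and the sketch simply applies the three documented patterns to a file name that already includes its extension.

```python
from pathlib import PurePath


def preprocessed_names(source_file: str) -> dict:
    """Build the three preprocessed file names for one source file.

    Hypothetical helper: it only applies the documented patterns
    <File Name with extension>.pdf, ...pdf_ContentFile.xml and
    ...pdf_GraphicsMapFile.xml to the given file name.
    """
    # Keep the original extension, for example "Datasheet.docx".
    name_with_ext = PurePath(source_file).name
    pdf_name = f"{name_with_ext}.pdf"
    return {
        "pdf": pdf_name,
        "content_file": f"{pdf_name}_ContentFile.xml",
        "graphics_map_file": f"{pdf_name}_GraphicsMapFile.xml",
    }


if __name__ == "__main__":
    # "Datasheet.docx" is an invented example file name.
    for role, file_name in preprocessed_names("Datasheet.docx").items():
        print(f"{role}: {file_name}")
```

Note that the original extension stays in the name, so a Word file named Datasheet.docx yields Datasheet.docx.pdf rather than Datasheet.pdf.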

The PDF reader checks for the <File Name with extension>.pdf, <File Name with extension>.pdf_GraphicsMapFile.xml (graphics map file), and <File Name with extension>.pdf_ContentFile.xml (content file) files. If they do not exist, it looks for a <Filename with extension>_ContentFile.txt file to use for tag extraction. If the PDF reader does not find any preprocessed content files, it uses the SPFNPDF application to process the files.
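The lookup order described above can be modeled as a small decision function. This is a sketch of the documented fallback order only, not the product's actual implementation: the function name resolve_preprocessed_input, the returned labels, and the sample file name Spec.docx are hypothetical, and the sketch assumes that all three files (PDF, graphics map file, and content file) must be present before the XML route is taken, which the text does not state explicitly.

```python
import tempfile
from pathlib import Path


def resolve_preprocessed_input(folder: Path, file_name: str) -> str:
    """Return which input the documented lookup order would select.

    file_name is the file name with its extension. The returned labels
    are invented for this sketch.
    """
    pdf_name = f"{file_name}.pdf"
    xml_set = [
        folder / pdf_name,
        folder / f"{pdf_name}_GraphicsMapFile.xml",
        folder / f"{pdf_name}_ContentFile.xml",
    ]
    # Assumption: the XML route needs all three files to exist.
    if all(path.is_file() for path in xml_set):
        return "preprocessed_xml"
    # Fallback 1: plain-text content file.
    if (folder / f"{file_name}_ContentFile.txt").is_file():
        return "preprocessed_txt"
    # Fallback 2: nothing preprocessed was found, so SPFNPDF processes the file.
    return "spfnpdf"


if __name__ == "__main__":
    # A temporary directory stands in for the PreProcessedContentFiles folder.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        print(resolve_preprocessed_input(folder, "Spec.docx"))
        (folder / "Spec.docx_ContentFile.txt").touch()
        print(resolve_preprocessed_input(folder, "Spec.docx"))
```

Returning a label rather than opening the files keeps the sketch focused on the order of the checks, which is the only behavior the text specifies.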

The Extract Content workflow step reads the files available in the PreProcessedContentFiles folder, and attaches the <File Name with extension>.pdf_GraphicsMapFile.xml (graphics map file) and <File Name with extension>.pdf_ContentFile.xml (content file) to the file. The Extract Data workflow step then extracts the tags from the <File Name with extension>.pdf_ContentFile.xml file.