Image reader - HxGN SDx - Update 63 - Administration & Configuration

Administration and Configuration of HxGN SDx

Language: English | Product: HxGN SDx | Category: Administration & Configuration | SmartPlant Foundation / SDx Version: 10

The image reader extracts the information contained within an image using an Optical Character Recognition (OCR) engine. You can use a third-party OCR engine, such as ABBYY FlexiCapture or Adlib, to extract tags and other link information from a scanned file based on a defined set of rules and patterns. The third-party software processes the image files, and Data Capture then processes the resulting files.

Processing image files through ABBYY produces two files: <Filename with extension>.OCR.pdf and <Filename with extension>_ContentFile.xml. Processing image files through Adlib also produces two files: <Filename with extension>.OCR.pdf and <Filename with extension>_ContentFile.txt.
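The naming rule can be expressed as a small function. This is a hypothetical Python helper (expected_ocr_outputs is not part of HxGN SDx, ABBYY, or Adlib); it assumes, as the patterns above state, that the full file name including its extension prefixes both output files.

```python
from pathlib import Path

# Content-file suffix per OCR engine, as described above.
# The keys "abbyy" and "adlib" are illustrative labels, not SDx settings.
CONTENT_FILE_SUFFIX = {
    "abbyy": "_ContentFile.xml",
    "adlib": "_ContentFile.txt",
}


def expected_ocr_outputs(native_file: str, engine: str) -> tuple[str, str]:
    """Return (OCR PDF name, content file name) for a native image file.

    The whole file name, extension included, is kept as the prefix,
    so "P-101.tif" yields "P-101.tif.OCR.pdf", not "P-101.OCR.pdf".
    """
    name = Path(native_file).name
    try:
        suffix = CONTENT_FILE_SUFFIX[engine.lower()]
    except KeyError:
        raise ValueError(f"unknown OCR engine: {engine!r}") from None
    return f"{name}.OCR.pdf", f"{name}{suffix}"


if __name__ == "__main__":
    print(expected_ocr_outputs("PID-1409-221.tif", "ABBYY"))
    # ('PID-1409-221.tif.OCR.pdf', 'PID-1409-221.tif_ContentFile.xml')
```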

Before running the content discovery task, place the output files from ABBYY or Adlib in the PreProcessedContentFiles folder, which is located in the same location as the native files.

The Extract Content workflow step attaches the ABBYY or Adlib output files in the PreProcessedContentFiles folder to the same document version as the original file. The Extract Data workflow step then extracts the tags from the <Filename with extension>_ContentFile.xml or <Filename with extension>_ContentFile.txt file.
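Staging the files can be scripted. The following Python sketch copies OCR output files into a PreProcessedContentFiles folder beside the native file and then checks which content file is present for that native file. The functions stage_ocr_outputs and find_content_file are hypothetical helpers, not part of HxGN SDx; the sketch assumes the outputs follow the naming convention described above, and it does not perform the attaching or extraction, which the Extract Content and Extract Data workflow steps handle.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Folder name from the text above; it sits next to the native files.
PREPROCESSED_FOLDER = "PreProcessedContentFiles"


def stage_ocr_outputs(native_file: Path, output_files: list[Path]) -> Path:
    """Copy OCR output files into the PreProcessedContentFiles folder
    beside the native file, creating the folder if it does not exist."""
    target = native_file.parent / PREPROCESSED_FOLDER
    target.mkdir(exist_ok=True)
    for output in output_files:
        shutil.copy2(output, target / output.name)
    return target


def find_content_file(native_file: Path) -> Optional[Path]:
    """Return the staged content file for a native file, or None.

    Checks for the ABBYY (.xml) file first, then the Adlib (.txt) file;
    this order is an arbitrary choice for the sketch.
    """
    folder = native_file.parent / PREPROCESSED_FOLDER
    for suffix in ("_ContentFile.xml", "_ContentFile.txt"):
        candidate = folder / f"{native_file.name}{suffix}"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Demo in a throwaway directory with placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        native = root / "PID-1409.tif"
        native.write_bytes(b"")
        ocr_dir = root / "ocr-output"
        ocr_dir.mkdir()
        outputs = [ocr_dir / "PID-1409.tif.OCR.pdf", ocr_dir / "PID-1409.tif_ContentFile.xml"]
        for output in outputs:
            output.write_text("placeholder")
        stage_ocr_outputs(native, outputs)
        print(find_content_file(native).name)  # PID-1409.tif_ContentFile.xml
```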

The following is an example of tag information in a preprocessed file:

<_Tag_EquipmentSuffixes3>
  <_Tag_Name>P-1409-221A</_Tag_Name>
  <_Tag_Description>Pump</_Tag_Description>
</_Tag_EquipmentSuffixes3>
<_Tag_EquipmentItems3>
  <_Tag_Name>P-1409-222</_Tag_Name>
</_Tag_EquipmentItems3>
<_Tag_InstrumentsXX>
  <_Tag_Seq_1>PI</_Tag_Seq_1>
  <_Tag_Seq_2>-</_Tag_Seq_2>
  <_Tag_Seq_3>100</_Tag_Seq_3>
  <_Tag_Description_Instr>Pressure Instrument</_Tag_Description_Instr>
</_Tag_InstrumentsXX>
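To illustrate how the nodes in this example map to tags, the following Python sketch parses it with the standard xml.etree.ElementTree module. It is not the Data Capture implementation. It assumes that only nodes whose names start with _Tag describe tags, that _Tag_Name holds a single-line tag, that the _Tag_Seq_<Number> parts are concatenated in sequence order to form a multi-line tag name, and that a child whose name starts with _Tag_Description holds the description. The wrapping <Root> element and the read_tags function are hypothetical.

```python
import xml.etree.ElementTree as ET

# The tag example above, wrapped in a hypothetical <Root> element so it
# parses as a single XML document (the fragment has several top-level nodes).
SAMPLE = """<Root>
  <_Tag_EquipmentSuffixes3>
    <_Tag_Name>P-1409-221A</_Tag_Name>
    <_Tag_Description>Pump</_Tag_Description>
  </_Tag_EquipmentSuffixes3>
  <_Tag_EquipmentItems3>
    <_Tag_Name>P-1409-222</_Tag_Name>
  </_Tag_EquipmentItems3>
  <_Tag_InstrumentsXX>
    <_Tag_Seq_1>PI</_Tag_Seq_1>
    <_Tag_Seq_2>-</_Tag_Seq_2>
    <_Tag_Seq_3>100</_Tag_Seq_3>
    <_Tag_Description_Instr>Pressure Instrument</_Tag_Description_Instr>
  </_Tag_InstrumentsXX>
</Root>"""


def read_tags(xml_text: str) -> list[dict]:
    """Return one dict per tag node: name, description, and other children."""
    tags = []
    for node in ET.fromstring(xml_text):
        # Only nodes whose element name starts with "_Tag" describe tags.
        if not node.tag.startswith("_Tag"):
            continue
        name = None
        description = None
        seq_parts = []  # (sequence number, text) pairs of a multi-line tag
        properties = {}
        for child in node:
            text = (child.text or "").strip()
            if child.tag == "_Tag_Name":
                name = text
            elif child.tag.startswith("_Tag_Seq_"):
                seq_parts.append((int(child.tag.rsplit("_", 1)[1]), text))
            elif child.tag.startswith("_Tag_Description"):
                description = text
            else:
                properties[child.tag] = text
        if name is None and seq_parts:
            # Assumption: multi-line tag parts are joined in sequence order.
            name = "".join(part for _, part in sorted(seq_parts))
        tags.append({"name": name, "description": description, "properties": properties})
    return tags


if __name__ == "__main__":
    for tag in read_tags(SAMPLE):
        print(tag["name"], "|", tag["description"])
    # P-1409-221A | Pump
    # P-1409-222 | None
    # PI-100 | Pressure Instrument
```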

  • The XML tags in the content file are represented as follows:

    • The node containing the tag information must start with "<_Tag" for a tag to be created or updated.

    • _Tag_Name represents a single-line tag.

    • _Tag_Description represents the description of a tag.

    • _Tag_Seq_<Number> represents the parts of a multi-line tag.

    • You can add other properties; the node name must exactly match the property name defined in the schema (no underscore is required).

  • For details on title block information, see Preprocessed content XML file format.

  • The properties of a property list can be set in the preprocessed content XML file using the Tag Attribute Node Name of the property. For example:

    <_Asset_EquipmentSuffixes3>
      <_Asset_Name>61-AC-4501-01S</_Asset_Name>
      <_SPFPrimaryClassification_21>Anode</_SPFPrimaryClassification_21>
      <_AssetValue>25</_AssetValue>
    </_Asset_EquipmentSuffixes3>

    • AssetValue is the tag attribute node name for Asset.

    • Asset properties support units of measure (UoM).

  • Use the relationship definition name to configure a relationship between two business objects. For example:

    <_Tag_EquipmentSuffixes3>
      <_Tag_Name>V-222</_Tag_Name>
      <_SPFNPumpList_Height>10 m</_SPFNPumpList_Height>
      <_SPFNTagAsset_12>61-AC-4501-01S</_SPFNTagAsset_12>
    </_Tag_EquipmentSuffixes3>

    • SPFNTagAsset_12 is used to configure a relationship between the tag and the asset; SPFNTagAsset is the relationship definition UID and 12 is the direction.

    • To find the relationship definition name, click Find > Schema > Relationship Definition Name in the SmartPlant Foundation Desktop Client.

    • If the related business object exists in neither the preprocessed file nor the database, the relationship is not created.

  • If a document is involved in the relationship, the relationship is created with the document to which the preprocessed file is attached. For example:

    <_Originator_EquipmentSuffixes3>
      <_Originator_Name>O-101</_Originator_Name>
      <_SPFNDocumentOriginator_21 />
    </_Originator_EquipmentSuffixes3>

  • When you run the content discovery task on a document that has the following content file, an SPFNDocumentDocument relationship is created between that document and each off-page document listed in the content file, but only if the off-page document already exists in the database.

    <_Document>
      <_Document_Name>1408-10-005</_Document_Name>
    </_Document>
    <_Document>
      <_Document_Name>1409-10-003</_Document_Name>
    </_Document>
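The relationship rules in the list above can be sketched in Python. This is an illustration under stated assumptions, not the Data Capture implementation: it treats a child node named _<RelDefUID>_12 or _<RelDefUID>_21 as a relationship (other children are properties), takes each object's name from its child ending in _Name, relates an empty relationship node to the host document, creates other relationships only when the target is defined in the file or already in the database (one reading of the existence rule), and relates each _Document entry to the host document only when that document is in the database. The Relationship type, the object_name and read_relationships functions, the <Root> wrapper, and the host document name DOC-HOST are hypothetical; the direction of SPFNDocumentDocument is left unset because the text does not state it.

```python
import re
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

# Assumption: a child named _<RelDefUID>_12 or _<RelDefUID>_21 is a relationship.
REL_NODE = re.compile(r"^_(?P<uid>\w+?)_(?P<direction>12|21)$")


class Relationship(NamedTuple):
    rel_def_uid: str          # e.g. "SPFNTagAsset"
    direction: Optional[str]  # "12"/"21" from the node name; None if not given
    source: str               # object (or host document) holding the reference
    target: str               # related object (or the host document)


def object_name(node: ET.Element) -> Optional[str]:
    """Return the text of the first child whose name ends in _Name, if any."""
    for child in node:
        if child.tag.endswith("_Name") and child.text:
            return child.text.strip()
    return None


def read_relationships(xml_text: str, host_document: str,
                       database_names: set[str]) -> list[Relationship]:
    root = ET.fromstring(xml_text)
    # Objects defined in this file; _Document entries are references only.
    defined = {object_name(n) for n in root if n.tag != "_Document"} - {None}
    known = defined | database_names  # assumed reading of the existence rule
    rels: list[Relationship] = []
    for node in root:
        if node.tag == "_Document":
            # Off-page document: related only if it already exists in the database.
            doc = object_name(node)
            if doc in database_names:
                rels.append(Relationship("SPFNDocumentDocument", None, host_document, doc))
            continue
        source = object_name(node)
        if source is None:
            continue
        for child in node:
            match = REL_NODE.match(child.tag)
            if match is None:
                continue  # an ordinary property such as _SPFNPumpList_Height
            # An empty node such as <_SPFNDocumentOriginator_21 /> relates the
            # object to the document the preprocessed file is attached to.
            target = (child.text or "").strip() or host_document
            if target == host_document or target in known:
                rels.append(Relationship(match["uid"], match["direction"], source, target))
    return rels


# Examples from this topic combined into one hypothetical file.
SAMPLE = """<Root>
  <_Tag_EquipmentSuffixes3>
    <_Tag_Name>V-222</_Tag_Name>
    <_SPFNPumpList_Height>10 m</_SPFNPumpList_Height>
    <_SPFNTagAsset_12>61-AC-4501-01S</_SPFNTagAsset_12>
  </_Tag_EquipmentSuffixes3>
  <_Asset_EquipmentSuffixes3>
    <_Asset_Name>61-AC-4501-01S</_Asset_Name>
    <_AssetValue>25</_AssetValue>
  </_Asset_EquipmentSuffixes3>
  <_Originator_EquipmentSuffixes3>
    <_Originator_Name>O-101</_Originator_Name>
    <_SPFNDocumentOriginator_21 />
  </_Originator_EquipmentSuffixes3>
  <_Document>
    <_Document_Name>1408-10-005</_Document_Name>
  </_Document>
  <_Document>
    <_Document_Name>1409-10-003</_Document_Name>
  </_Document>
</Root>"""

if __name__ == "__main__":
    # Only 1408-10-005 is assumed to exist in the database.
    for rel in read_relationships(SAMPLE, "DOC-HOST", {"1408-10-005"}):
        print(rel)
```

With only 1408-10-005 in the database, the sketch yields SPFNTagAsset (V-222 to the asset defined in the same file), SPFNDocumentOriginator (O-101 to DOC-HOST), and one SPFNDocumentDocument relationship; 1409-10-003 is skipped.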