Extract data from multiple documents - HxGN SDx - Update 63 - Administration & Configuration

Administration and Configuration of HxGN SDx

SmartPlant Foundation / SDx Version: 10

This functionality was modified in an update. For more information, see Extract data from multiple documents (modified in an update).

The following procedure extracts content from multiple documents that are in a working or current revision state. It extracts tags and their relationships from those documents.

To extract content using pre-processed content files, ensure that the corresponding file is attached to the document along with ContentFile.xml. You must also attach GraphicsMapFile.xml if the file type supports graphical navigation.

How can I configure DCOM to allow content extraction from Microsoft Office files?

You must enable DCOM permissions before HxGN SDx can access Microsoft Office applications.

This is mandatory if you want to extract content from:

  • a Microsoft Excel file using the Data Capture Datasheet Reader Pre-Processor.

  • any Microsoft Office file (97-2003) using the Data Capture Office Reader Pre-Processor.

To set the DCOM configuration for the respective file type application, complete the following steps:

  1. Click Start > Administrative Tools > Component Services.

  2. In the tree view, expand Component Services > Computers > My Computer > DCOM Config.

  3. Based on the Microsoft Office file type, locate and right-click the respective DCOM Config component service:

    • Microsoft Excel Application

    • Microsoft Word 97-2003

    • Microsoft PowerPoint Slide (97-2003)

  4. On the shortcut menu, click Properties.

  5. In the General tab, set the Authentication Level to None.

  6. In the Identity tab, select The Launching User option.

  7. In the Security tab, set the Launch and Activation Permissions to Customize, and click Edit.

    1. Add the Administrators created by Server Manager.

    2. Select the Allow check box for the following items:

      • Local Launch

      • Remote Launch

      • Local Activation

      • Remote Activation

      • Read

      • Special permissions

What happens when content extraction from multiple documents fails?

When content extraction from multiple documents fails, a content discovery task is automatically created to find the problem. It does this by re-processing the large document set in progressively smaller batches: first batches of 100, then batches of 10, and finally batches of 1. For each batch, a child content discovery task is created under the master content discovery task.

For example, to re-process 1000 documents, 10 content discovery tasks are created, each with 100 documents. Each batch of 100 documents that fails is then re-processed as 10 child content discovery tasks, each with a batch size of 10 documents. Finally, each batch of 10 documents that fails is re-processed as 10 child content discovery tasks, each with 1 document, to find the failed document.
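The drill-down in this example can be simulated with a short Python sketch. It is illustrative only and does not call any HxGN SDx API: the function name find_failed_documents, the batch_fails callback (a stand-in for running extraction on a batch), and the document names are all hypothetical. Only the batch sizes 100, 10, and 1 come from the description above.

```python
BATCH_SIZES = (100, 10, 1)  # batch sizes described in the text


def find_failed_documents(documents, batch_fails, level=0, task_sizes=None):
    """Isolate failing documents by re-processing in batches of 100, 10, then 1.

    batch_fails is a hypothetical stand-in for running extraction on a batch.
    task_sizes, if given, records the size of every child task that is created.
    """
    size = BATCH_SIZES[level]
    failed = []
    for start in range(0, len(documents), size):
        batch = documents[start:start + size]
        if task_sizes is not None:
            task_sizes.append(len(batch))
        if not batch_fails(batch):
            continue  # the whole batch succeeded, so there is nothing to drill into
        if size == 1:
            failed.extend(batch)  # a single failing document has been isolated
        else:
            failed.extend(
                find_failed_documents(batch, batch_fails, level + 1, task_sizes)
            )
    return failed


# Simulated run: 1000 documents, two of which cannot be processed.
docs = [f"DOC-{i:04d}" for i in range(1000)]
bad = {"DOC-0042", "DOC-0777"}
sizes = []
result = find_failed_documents(
    docs, lambda batch: any(d in bad for d in batch), task_sizes=sizes
)
print(result)  # ['DOC-0042', 'DOC-0777']
print(sizes.count(100), sizes.count(10), sizes.count(1))  # 10 20 20
```

In this simulated run, two failing documents produce 10 tasks of 100 documents, 20 tasks of 10 documents, and 20 tasks of 1 document, which matches the arithmetic in the example.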

Click Notifications to view the issue in the failed document that corresponds to the failed child content discovery task.

How is the content extracted from multiple documents when multiple files are attached to one or more documents?

When processing multiple documents, where one or more documents have more than one file attached, the following scenarios are considered:

  • If different file types are attached, the software first checks for a file with the ISPFNMasterFile interface, and content is extracted from that file.

  • If a file with the ISPFNMasterFile interface is not found, content is extracted from the file with the highest priority. By default, files with the .dwg file extension have the highest priority. However, the file priority can be changed in the Data Capture Central Settings module in the Desktop Client. For more information, see Manage file types and prioritize them for content extraction.

  • If more than one file of the highest priority is attached, the software fails to extract the content because it cannot determine which file to use.
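The selection rules above can be summarized in a small Python sketch. It is an illustration under stated assumptions, not the product's implementation: the AttachedFile class, the is_master flag (standing in for the ISPFNMasterFile interface), the select_file function, and the priority order after .dwg are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AttachedFile:
    name: str
    is_master: bool = False  # stands in for the ISPFNMasterFile interface


# Hypothetical priority order; the text only states that .dwg is highest by default.
DEFAULT_PRIORITY = (".dwg", ".pdf", ".xlsx")


def select_file(
    files: Sequence[AttachedFile],
    priority: Sequence[str] = DEFAULT_PRIORITY,
) -> Optional[AttachedFile]:
    """Pick the file to extract content from, following the three rules above."""
    # Rule 1: a master file wins regardless of file type.
    # Assumption: at most one master file is attached; the first one is used.
    masters = [f for f in files if f.is_master]
    if masters:
        return masters[0]
    # Rule 2: otherwise use the attached file type with the highest priority.
    for ext in priority:
        matches = [f for f in files if f.name.lower().endswith(ext)]
        if len(matches) == 1:
            return matches[0]
        # Rule 3: several files share the highest priority, so selection fails.
        if len(matches) > 1:
            raise ValueError(f"cannot choose between {len(matches)} {ext} files")
    return None  # no attached file has a known file type


chosen = select_file([AttachedFile("P-101.pdf"), AttachedFile("P-101.dwg")])
print(chosen.name)
```

Raising an error for the ambiguous case mirrors the documented behavior, where extraction fails instead of picking one of the files arbitrarily.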

  1. Click Documents > All Documents.

  2. To extract content from multiple documents, select two or more documents from the All Documents list, and click Actions > Extract Content.

What is the purpose of a default template group?

When you use the Data Capture Content Discovery Task in the Desktop Client or Extract Content in the Web Client to extract content from multiple documents, the software automatically uses the templates and rules defined for the default template group. The default template group is considered only when a PDF file or a drawing file is attached to the document. To extract content successfully, ensure that the templates and rules are configured for the template group. If you have not chosen a default template group, the software automatically uses the DefaultDrawingTemplateGroup template group, which is provided with the software.

In the Web Client, when you extract content from a single document, the software automatically uses the templates and rules defined for the automatically selected default template group. The default template group is considered only when a PDF file or a drawing file is attached to the document. However, you can select and apply any other template group instead of the default template group. For more information, see Extract data from a document.

For the automatically selected default template group, the Match Tag Patterns option is pre-selected.
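The fallback behavior described in this section can be expressed as a minimal Python sketch. The function resolve_template_group and the set of file extensions are hypothetical (in particular, ".dwg" is assumed here to represent a drawing file); only the name DefaultDrawingTemplateGroup comes from the text.

```python
import os

FALLBACK_GROUP = "DefaultDrawingTemplateGroup"  # name taken from the text
# Assumption: ".dwg" stands in for "drawing file"; the real list may be longer.
TEMPLATE_FILE_TYPES = {".pdf", ".dwg"}


def resolve_template_group(file_name, configured_default=None):
    """Return the template group to apply, or None if template groups do not apply."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in TEMPLATE_FILE_TYPES:
        return None  # template groups are considered only for PDF and drawing files
    # Use the group chosen as default, otherwise the group delivered with the software.
    return configured_default or FALLBACK_GROUP


print(resolve_template_group("P-101.dwg"))
print(resolve_template_group("P-101.pdf", configured_default="SiteTemplateGroup"))
```

The group name SiteTemplateGroup in the usage lines is a made-up example of a group that an administrator has set as default.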

  • By default, the property Is Data Capture Rel is set to True on the document-to-tag relationships SPFNDocRevMasterTag, SPFNDocRevAliasTag, FDWDocRevTag, and SPFNFDWDocRevChildTag for Data Capture tags.

  • To extract content from drawing and PDF files, the software applies the templates and rules from the template group that is set as default. For more information, see Manage drawing reader pre-processor templates and template groups and Manage PDF reader pre-processor templates.

  • When complete, click Notifications to view the status of content extraction.

  • FDW tags are created without applying the ENS definition.

  • To process documents to which 3D models are attached, ensure that the corresponding pre-processed content files are available. If pre-processed content files are not available, you must process one document at a time instead of multiple documents. For more information on how to extract content from a single document, see Extract data from a document.

  • When processing multiple documents, content cannot be extracted from a document to which multiple files are attached. In such a scenario, process one document at a time instead of multiple documents, which allows you to select the file from which content is extracted. For more information, see Extract data from a document.

  • Based on the attached file type, the default reader is automatically selected to process the file and extract content. The reader is assigned based on the default settings configured in the Data Capture Central Settings module in the Desktop Client. For more information on the file types and the supported readers, see Manage reader and application relationship and Manage file types and prioritize them for content extraction.

  • To view the status of content extraction from a selected document:

    • Select the Actions menu, and click Show the detail form > Extract Content.

    For more information about the status of a document processed using Data Capture, see Data Capture Document Status.