Classification Module 6.1.3 - User Guide

Validating the Text Extractor

When you create or edit a text extractor, ensure that it is properly formatted for the classification system. A text extractor that is not valid is not available for use with the classification system.

To test the text extractor format in the web portal

When you create or edit a text extractor, select Validate to ensure the text extractor can be processed by the classification system.
This tests whether the text extractor is formatted properly, not whether it matches the intended data.

To test the text extractor format with PowerShell

Run the Confirm-QTextExtractor command with the following mandatory parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port, in the form computername:port. The default port is 8723.
2. Id
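
As a rough sketch, the validation call might look like the following. The host name dgserver01 and the ID CreditCardNumber are hypothetical placeholders; only the ServerAddress and Id parameters come from the steps above. The Get-Command guard is an addition that avoids an error if the product's cmdlets are not loaded in the current session.

```powershell
# Hypothetical values: replace with your own server and text extractor ID.
$confirmParams = @{
    ServerAddress = 'dgserver01:8723'   # computername:port (8723 is the default port)
    Id            = 'CreditCardNumber'  # ID of the text extractor to validate
}

if (Get-Command Confirm-QTextExtractor -ErrorAction SilentlyContinue) {
    # Splat the mandatory parameters onto the cmdlet.
    Confirm-QTextExtractor @confirmParams
} else {
    Write-Warning 'Confirm-QTextExtractor is not available in this session.'
}
```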

Editing Text Extractors

If your rules are not finding the results you require, the text extractors you are using may need refining. Editing text extractors can help pinpoint the patterns you want to match, and reduce false positives.

Remember that rules and text extractors can be shared across multiple taxonomies, so you should not change them in isolation without understanding where your changes may have an effect. You can view all associated rules by selecting the Associated tab when editing the text extractor.

For more details, see:

To edit the name and description of a text extractor with PowerShell

Run the Set-QTextExtractor command with the following mandatory parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port, in the form computername:port. The default port is 8723.
2. ExtractorId
Modify any of the following optional parameters:
1. Name
2. Description
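
A minimal sketch of such a call is shown here. The server, ID, name, and description values are hypothetical; the parameter names are the ones listed above. Pass only the optional parameters you actually want to change.

```powershell
# Hypothetical values: replace with your own server, ID, and text.
$setParams = @{
    ServerAddress = 'dgserver01:8723'          # computername:port
    ExtractorId   = 'CreditCardNumber'         # ID of the text extractor to edit
    Name          = 'Credit card numbers'      # optional: new display name
    Description   = 'Matches 16-digit card numbers; used by the PCI rules.'  # optional
}

if (Get-Command Set-QTextExtractor -ErrorAction SilentlyContinue) {
    Set-QTextExtractor @setParams
} else {
    Write-Warning 'Set-QTextExtractor is not available in this session.'
}
```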

Removing Text Extractors

Because a single text extractor can be associated with numerous rules, remove text extractors with care. Before deleting a text extractor, consider the following:

If a text extractor is associated with one or more rules, you must break all of those associations before deleting it. Doing so may affect your current classifications.
If a text extractor is not associated with any rule, deleting it will not affect the classifications.

Before removing a text extractor, you can quickly see which rules are associated with it by selecting the Associated Rules tab in the web portal.

To remove a text extractor using the web portal

Select Governed Data | Categorization Manager | Extractors.
Select the required text extractor and click Delete.

To remove a text extractor from the classification system with PowerShell

Make sure you know the text extractor ID.
You can use the Get-QTextExtractors command for a full listing of all text extractors in the system.
Run the Remove-QTextExtractor command with the following mandatory parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port, in the form computername:port. The default port is 8723.
2. Id
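
The removal steps above could be sketched as follows. The server name and ID are hypothetical, and passing ServerAddress to Get-QTextExtractors is an assumption, since its parameters are not listed here. In practice you would review the listing and the associated rules before running the removal.

```powershell
$server      = 'dgserver01:8723'     # hypothetical host; 8723 is the default port
$extractorId = 'ObsoleteExtractor'   # hypothetical ID of the text extractor to remove

if (Get-Command Remove-QTextExtractor -ErrorAction SilentlyContinue) {
    # Assumption: Get-QTextExtractors accepts the same ServerAddress parameter.
    Get-QTextExtractors -ServerAddress $server

    # Remove the text extractor once you have confirmed its ID.
    Remove-QTextExtractor -ServerAddress $server -Id $extractorId
} else {
    Write-Warning 'Remove-QTextExtractor is not available in this session.'
}
```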

Working with Regular Expression Text Extractors

You can use regular expression syntax to develop the patterns to locate in the extracted text. You can see examples of regular expressions used in the text extractors included with Quest One Identity Manager. For more information, see Sample Advanced Text Extractors Details.

To create a regular expression text extractor using the web portal

Select Governed Data | Categorization Manager | Extractors.
Select Regex to create a new text extractor.
Enter a unique identifier.
The identifier is used by the classification system. Once created, you cannot change this value. It is recommended you use a naming convention that reflects the purpose of the text extractor.
Enter a name and description.
The name and description are useful when building rules to ensure the proper text extractors are being included, so provide all necessary information.
You are ready to define the settings.
Specify the patterns to match by entering the required regular expression.
You can select the Check Syntax button to ensure the regular expression is properly formatted.

Enable the following settings as required:


Setting	Description
Ignore case	Enable if you want your criteria to match regardless of the case.
Multiline	Enable if you want ^ and $ to match the beginning and end of each line, rather than only the beginning and end of the entire input.
Use ECMA script compliance mode	Enable if you want to use ECMAScript-compliant behavior. This can only be used with the Ignore case and Multiline settings. Usage of this option with other Regex settings will result in an error.
Singleline	Enable if you want . to match every character including \n.
Explicit capture	Enable if you only want to capture groups that are explicitly named or numbered, in the format (?<name>subexpression).
Ignore whitespace	Enable if you want to ignore unescaped white space in a string and enable comments after #.
Right to left	Enable if you want to search from right to left instead of from left to right.
Culture invariant	Enable if you want to ignore cultural differences in language when matching.

Click Check Syntax to ensure the text extractor has been properly formatted before you associate it with any of your rules.
Carefully review your settings and save your changes.
Click Validate to ensure the text extractor can be processed by the classification system.
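
The setting names mirror the .NET RegexOptions flags. Assuming the classification system uses the .NET regular expression engine (an assumption, not something this guide states), you can try the effect of a few settings locally before saving the text extractor. The sample text and patterns below are made up for illustration.

```powershell
$text = "Account: ab-1234`nACCOUNT: cd-5678"

# No settings: matching is case-sensitive and ^ anchors only at the start of the input.
$plainCount = [regex]::Matches($text, '^account').Count            # 0

# Ignore case + Multiline: ^ anchors at the start of every line.
$both = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Multiline'
$flaggedCount = [regex]::Matches($text, '^account', $both).Count   # 2

# Singleline: . also matches the newline character.
$single = [System.Text.RegularExpressions.RegexOptions]'Singleline'
$dotDefault    = [regex]::IsMatch($text, '1234.ACCOUNT')           # False
$dotSingleline = [regex]::IsMatch($text, '1234.ACCOUNT', $single)  # True

# ECMAScript mode combined with Singleline is rejected by the .NET engine.
$ecmaError = $null
try {
    $bad = [System.Text.RegularExpressions.RegexOptions]'ECMAScript, Singleline'
    [regex]::IsMatch($text, 'a.b', $bad) | Out-Null
} catch {
    $ecmaError = $_.Exception
}
```

If $ecmaError is populated, the combination is invalid, which matches the restriction described for the ECMA script compliance setting.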

To add a regular expression text extractor with PowerShell

Run the Add-QRegexTextExtractor command with the following mandatory parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port, in the form computername:port. The default port is 8723.
2. Id
  Provide an ID for this text extractor. The identifier is used by the classification system and once created, it cannot be changed. It is recommended you use a naming convention that reflects the purpose of the text extractor.
3. Name
  The name should reflect the purpose of the text extractor.
4. Expression
  Enter the regular expression that will specify the patterns to match.
If desired, use the following optional parameters:
1. Description
  Provide a description for the text extractor. This is useful when building rules to ensure the proper text extractors are being included, so provide all necessary information.
  
  The default value for the following parameters is False.
2. IgnoreCase
  Set this to $true if you want your criteria to match regardless of the case.
3. UseMultilineMode
  Set this to $true if you want ^ and $ to match the beginning and end of each line, rather than only the beginning and end of the entire input.
4. UseSinglelineMode
  Set this to $true if you want . to match every character including \n.
5. UseExplicitCaptureMode
  Set this to $true if you only want to capture groups that are explicitly named or numbered, in the format (?<name>subexpression).
6. IgnoreWhitespace
  Set this to $true to ignore unescaped white space in a string and enable comments after #.
7. UseRightToLeftMode
  Set this to $true to search from right to left instead of from left to right.
8. UseECMAScriptCompliantMode
  Set this to $true to enable ECMAScript-compliant behavior. This can only be combined with the IgnoreCase and UseMultilineMode parameters. Using this option with any other regular expression behavior parameter results in an error.
9. CultureInvariant
  Set this to $true to ignore cultural differences in language.
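
For illustration, a call using the parameters above might look like this sketch. The server, ID, name, description, and expression are all hypothetical. The local [regex] check assumes the classification system uses the .NET regular expression engine, which this guide does not state.

```powershell
# Hypothetical values: replace with your own server, ID, and pattern.
$addParams = @{
    ServerAddress    = 'dgserver01:8723'              # computername:port
    Id               = 'UsSocialSecurityNumber'       # cannot be changed once created
    Name             = 'US Social Security numbers'
    Expression       = '\b\d{3}-\d{2}-\d{4}\b'        # pattern to match
    Description      = 'Matches values such as 123-45-6789.'  # optional
    IgnoreCase       = $true                          # optional, default is False
    UseMultilineMode = $true                          # optional, default is False
}

# Quick local check of the pattern against a sample value.
$sampleMatches = [regex]::IsMatch('ID 123-45-6789 on file', $addParams.Expression)

if (Get-Command Add-QRegexTextExtractor -ErrorAction SilentlyContinue) {
    Add-QRegexTextExtractor @addParams
} else {
    Write-Warning 'Add-QRegexTextExtractor is not available in this session.'
}
```

Splatting keeps the mandatory and optional parameters in one place, which makes it easier to review the settings before the text extractor is created.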

To edit a regular expression text extractor using the web portal

Select Governed Data | Categorization Manager | Extractors.
Locate the required Regex text extractor and click Edit.
From the General tab, you can edit the name and description.
The name and description are visible to all users who are building rules. Including detailed information helps to ensure the proper text extractors are being included.
Select the Definition tab and update the expression as required.
You can select the Check Syntax button to ensure the regular expression is properly formatted.
If required, modify the available settings. (Ignore case, Multiline, Singleline, Explicit Capture, Ignore whitespace, Right to left, Use ECMA script compliance mode, and Culture invariant.)
Click Check Syntax to ensure the text extractor has been properly formatted before you associate it with any of your rules.
Carefully review your settings and save your changes.
Click Validate to ensure the text extractor can be processed by the classification system.

To edit a regular expression text extractor with PowerShell

Make sure you know the ID of the desired text extractor. For more information, see Finding a Taxonomy, Category, or Extractor ID using PowerShell.
Run the Set-QRegExTextExtractor command with the following mandatory parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port, in the form computername:port. The default port is 8723.
2. Id
Modify any of the following parameters: Name, Description, and Expression.
Enable or disable the following options: IgnoreCase, UseMultilineMode, UseSinglelineMode, UseExplicitCaptureMode, IgnoreWhitespace, UseRightToLeftMode, UseECMAScriptCompliantMode, and CultureInvariant.
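
An edit could be sketched along these lines. The server, ID, and new expression are hypothetical; the parameter names come from the steps above. Only the parameters you pass are intended to change here, but confirm that behavior in your environment.

```powershell
# Hypothetical values: replace with your own server, ID, and pattern.
$editParams = @{
    ServerAddress     = 'dgserver01:8723'                 # computername:port
    Id                = 'UsSocialSecurityNumber'          # ID of the text extractor to edit
    Expression        = '\b\d{3}[- ]\d{2}[- ]\d{4}\b'     # now also accepts spaces as separators
    UseSinglelineMode = $false                            # explicitly disable an option
}

if (Get-Command Set-QRegExTextExtractor -ErrorAction SilentlyContinue) {
    Set-QRegExTextExtractor @editParams
} else {
    Write-Warning 'Set-QRegExTextExtractor is not available in this session.'
}
```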
