Classification Module 6.1.3 - User Guide

Testing a Rule against a Resource

After you add a rule, check that it produces the results you expect. To do this, set up a test file or SharePoint document containing data that lets you evaluate the rule. For example, if the rule looks for credit card numbers, make sure the test resource contains credit card numbers. Use the Get-QRuleResults command to perform the test. For information on testing all rules at once, see Testing all Rules Against a Resource.
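A useful test resource contains both data the rule should match and data it should not. The following Python sketch writes such a file for a credit card rule. The helper name and sample text are hypothetical: 4111 1111 1111 1111 passes the Luhn checksum and is widely used as a test card number, while 1234 5678 9012 3456 has the same shape but fails the checksum. Whether your rule treats the second number as a non-match depends on how the rule is written.

```python
from pathlib import Path

# Hypothetical sample content: one positive sample, one look-alike negative
# sample, and a control line with no numbers at all.
TEST_CONTENT = (
    "Positive sample: card 4111 1111 1111 1111 was charged.\n"
    "Negative sample: reference 1234 5678 9012 3456 is not a card.\n"
    "Control line with no numbers.\n"
)


def write_test_resource(path):
    """Write the sample content to path (creating folders as needed) and return the Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(TEST_CONTENT, encoding="utf-8")
    return target
```

For example, write_test_resource(r"c:\test files\creditcard.txt") creates the file used as the ResourcePath in the procedures below.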

The command returns XML that details the results of your test. The output is divided into seven sections:

Test Results

Log Messages: The messages that the rules or text extractors wrote to the log, with timestamps.
AutomaticClassification: The categories added (in the 'Adds' subsection) and removed (in the 'Removes' subsection), and any other operations (in the 'Others' subsection). The 'Key' node specifies the topic ID that corresponds to each category.
EntityCache: Text that the text extractors found and that the rules had hits on, along with the offset (number of characters from the beginning) and length of each item. For example, a 'Cities in California' rule could have a hit on 'Los Angeles', so that text is stored in the entity cache.
ExtractorEvents: Which rules requested which text extractors to perform extraction on the content, and what the results were. Each event is either an ExtractorResult, if the text extractor was run on the content for the first time, or an ExtractorCacheHit, if the extractor's result for that content was already cached. Each event also has a timestamp.
FinalRuleStates: The data contained in any rule states that had a match.
LastExtractorTime: The timestamp at which each text extractor was last invoked.
Properties: Any properties that the rules or text extractors set during processing.
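Because the output can be long, it can help to check which sections actually contain entries before reading it in detail. The Python sketch below assumes that the seven sections appear as direct children of the root element under the names in the table above; "Log Messages" is assumed to appear as LogMessages, since an XML element name cannot contain a space. Verify these names against your own output.

```python
import xml.etree.ElementTree as ET

# Section names from the Test Results table. "LogMessages" is an assumption:
# the guide writes "Log Messages", but an XML element name cannot contain a space.
EXPECTED_SECTIONS = [
    "LogMessages",
    "AutomaticClassification",
    "EntityCache",
    "ExtractorEvents",
    "FinalRuleStates",
    "LastExtractorTime",
    "Properties",
]


def summarize_sections(xml_text):
    """Map each expected section to its number of child elements (None if absent)."""
    root = ET.fromstring(xml_text)
    summary = {}
    for name in EXPECTED_SECTIONS:
        section = root.find(name)  # searches direct children of the root only
        summary[name] = None if section is None else len(section)
    return summary
```

A section reported as None is missing under the assumed name; a count of 0 means the section is present but empty.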

To test a rule against a resource using PowerShell

  1. If you do not know the ID of the rule you want to test, run the Get-QXmlRules command with the mandatory ServerAddress parameter, and note or copy the rule ID.
  2. Run the Get-QRuleResults command with the following mandatory parameters:
    1. ServerAddress
      The name of the computer hosting the Data Governance server and the port, in the form computername:port. The default port is 8723.
    2. ResourcePath
      The full path to the test resource. For example, c:\test files\creditcard.txt.
    3. RuleID
      The full ID of the rule you want to test.
    You may want to send the results to an output file using the PowerShell redirection operator, for example > filename.xml. This makes the results much easier to interpret.
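If you run this test from a script, quoting matters: the example path c:\test files\creditcard.txt contains a space. The following Python sketch assembles a Get-QRuleResults command line from the three mandatory parameters, using the default port 8723 and the > redirection described above. The helper names are hypothetical and only build the command text; they are not part of the Classification module.

```python
def ps_quote(value):
    """Wrap a value in PowerShell single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_rule_test_command(computer, rule_id, resource_path, output_file, port=8723):
    """Return a Get-QRuleResults command line (hypothetical helper)."""
    server_address = f"{computer}:{port}"  # computername:port
    return (
        "Get-QRuleResults"
        f" -ServerAddress {ps_quote(server_address)}"
        f" -ResourcePath {ps_quote(resource_path)}"
        f" -RuleID {ps_quote(rule_id)}"
        f" > {ps_quote(output_file)}"
    )
```

For example, build_rule_test_command("dgserver", "example-rule-id", r"c:\test files\creditcard.txt", "results.xml") yields a line you can paste into a PowerShell session; replace the placeholder computer name and rule ID with your own values.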

Testing all Rules Against a Resource

Testing all rules shows you the results of each rule when run on a test resource. This helps you understand how each rule works and how the rules interact on a single resource. Use the Get-QAllRuleResults command to perform the test. For more information, see Managing Rules in the Classification System.

The command returns XML that details the results of your test. You can use this output to see the effect of your rules and to infer categorization. To determine whether a category would be applied, you also need to know the category's threshold and its settings. For more information, see How Categories Work Together: Mutual Exclusivity, Strict Ordering and Inheritance and How Rules Affect Categorization.

For an explanation of the resulting XML file, see Testing a Rule against a Resource.

Depending on the number of rules in your system, you may find it helpful to test a single rule. For more information, see Testing a Rule against a Resource.

To test all rules against a resource using PowerShell

  1. Run the Get-QAllRuleResults command with the following mandatory parameters:
    1. ServerAddress
      The name of the computer hosting the Data Governance server and the port, in the form computername:port. The default port is 8723.
    2. ResourcePath
      The full path to the test resource, including the computer name if applicable. For example, c:\test files\creditcard.txt.
    You may want to send the results to an output file using the PowerShell redirection operator, for example > filename.xml. This makes the results much easier to interpret.
  2. Examine the output.
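When several rules rely on the same text extractor, you would expect the ExtractorEvents section to show one ExtractorResult for the first request and ExtractorCacheHit events for later requests on the same content. The following Python sketch counts both event types per extractor. It assumes a hypothetical layout in which the events are child elements of ExtractorEvents named ExtractorResult and ExtractorCacheHit, each with an Extractor attribute naming the text extractor; adjust these names to match your output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_extractor_events(xml_text):
    """Count ExtractorResult and ExtractorCacheHit events per text extractor.

    Hypothetical layout: <ExtractorEvents> holds <ExtractorResult> and
    <ExtractorCacheHit> children, each with an Extractor attribute.
    """
    root = ET.fromstring(xml_text)
    events = root.find("ExtractorEvents")
    tally = Counter()
    if events is None:
        return tally  # section not found under the assumed name
    for event in events:
        if event.tag in ("ExtractorResult", "ExtractorCacheHit"):
            tally[(event.get("Extractor", "unknown"), event.tag)] += 1
    return tally
```

An extractor with several cache hits was most likely requested by more than one rule, which is the kind of interaction this test helps reveal.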

Viewing the Text from a Resource

A rule is run against the text that is extracted from a resource. You may be unsure what content in the resource caused the results of a rule. For example, you may wonder why a rule identifies a credit card number in your resource. Using this diagnostic, you can see exactly what text is extracted from a resource. Use the Get-QResourceTextExtracted command to perform this test.

To examine the text extracted from a resource

  1. Run the Get-QResourceTextExtracted command with the following mandatory parameters:
    1. ServerAddress
      The name of the computer hosting the Data Governance server and the port, in the form computername:port. The default port is 8723.
    2. ResourcePath
      The full file path to the resource, including the computer name if applicable.
    You may want to send the results to an output file using the PowerShell redirection operator, for example > filename.xml. This makes the results much easier to interpret.
  2. Examine the output.
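If a credit card rule matches unexpectedly, you can scan the extracted text yourself to see which strings look like card numbers. This Python sketch is not the module's own rule logic: it is a hypothetical diagnostic that finds runs of 13 to 19 digits, optionally separated by single spaces or hyphens, and keeps only those that pass the Luhn checksum. It reports each hit with its offset and length, similar in spirit to the entity cache; the offsets here are Python character positions and may not line up exactly with the offsets the module reports.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_numbers(text):
    """Return (matched text, offset, length) for each Luhn-valid candidate."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append((m.group(), m.start(), len(m.group())))
    return hits
```

For example, find_card_like_numbers("Order ref 4111 1111 1111 1111 paid") returns [("4111 1111 1111 1111", 10, 19)].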

Determining Why Categorizations Were Applied to a Resource

A number of factors determine a resource’s categorization. The text in the resource is evaluated against all the rules in the system. When a rule matches, its match strength is determined. The match strength, combined with the weight of the rule's association with a category and the category's threshold, determines whether the category is applied. When evaluating why a categorization was applied, it is important to understand these concepts and the relationships between them. For more information, see How Rules Affect Categorization.
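Stated as a formula, based on the threshold description given for the Get-QClassificationAnalysis output and ignoring category settings such as mutual exclusivity and strict ordering: let M be the set of rules associated with a category that matched the resource, s_r the match strength of rule r, w_r the weight of its association with the category, and T the category threshold. The category is applied when

```latex
\sum_{r \in M} s_r \, w_r \;\ge\; T
```

Each product s_r w_r is the value reported as Strength_x_Weight for that rule.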

You may want to examine a particular categorization for many reasons, including:

  • You want to see why a category was applied or not applied to a particular resource.
  • You may be working with category thresholds and rule strengths to ensure your system is working as desired.
  • You may be testing a new category.
  • You may be introducing new rules into the system and need to ensure that your system is still working as desired.

The Get-QClassificationAnalysis command returns results in a structured format for you to analyze. The following information is included:

Information available for examining a resource’s categorization

<Taxonomy>: The name of the taxonomy that contains the categories.
<Category Name>: The category being analyzed.
<Category ID>: The ID of the category, so you can easily run more commands on it if needed, for example to modify the threshold or change a rule weight.
<Applied>: Whether the category was applied after all rules were taken into account. Only categories with at least one rule match are listed in the results.
<Threshold>: The threshold value currently set on the category. The sum of (rule weight x match strength) over all matching rules must equal or exceed this number for the category to be applied. For further explanation, see How Rules Affect Categorization.
<Rule>: The rule being analyzed. All rules associated with the category are listed, regardless of whether they match.
<Name>: The name of the rule.
<Match>: Whether the rule matched.
<Strength>: The strength of the rule match. Together with the rule weight, it determines how much this rule contributes toward meeting the threshold. Match strength is set in the rule XML. For more information, see Writing XML Rules.
<Weight>: The weight is part of the association between a rule and a category. Together with the match strength, it determines how much this rule contributes toward meeting the threshold. You can change the rule weight using the Add-QRuleToCategory command. For more information, see Associating Rules to Categories and Applying Rule Weights.
<Strength_x_Weight>: The value of this rule, that is, its match strength multiplied by its weight, which counts toward the threshold. If the sum of these values for all rules associated with the category equals or exceeds the threshold, the category is applied.
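To check a result by hand, you can recompute each category's total from its rules and compare it with the threshold and the reported <Applied> value. This Python sketch assumes a hypothetical layout: each category is wrapped in a <Category> element, the tags written above as <Category Name> and <Category ID> appear as CategoryName and CategoryID (XML element names cannot contain spaces), and <Match> and <Applied> contain True or False. Adjust these names to match your actual output.

```python
import xml.etree.ElementTree as ET


def is_true(element, tag):
    """Read a True/False child element (assumed format), defaulting to False."""
    return (element.findtext(tag) or "").strip().lower() == "true"


def recheck_categories(xml_text):
    """Return (name, total, threshold, computed_applied, reported_applied) per category."""
    root = ET.fromstring(xml_text)
    report = []
    for category in root.iter("Category"):
        threshold = float(category.findtext("Threshold"))
        total = 0.0
        for rule in category.iter("Rule"):
            if is_true(rule, "Match"):
                # Only matching rules contribute strength x weight.
                total += float(rule.findtext("Strength")) * float(rule.findtext("Weight"))
        report.append((
            category.findtext("CategoryName"),
            total,
            threshold,
            total >= threshold,  # equal to or above the threshold
            is_true(category, "Applied"),
        ))
    return report
```

A row where the computed and reported values differ suggests that something other than the rule totals affected the outcome, such as the category settings described in How Categories Work Together: Mutual Exclusivity, Strict Ordering and Inheritance.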

To examine the categorization of a resource

  1. Run the Get-QClassificationAnalysis command with the following mandatory parameters:
    1. ServerAddress
      The name of the computer hosting the Data Governance server and the port, in the form computername:port. The default port is 8723.
    2. ResourcePath
      The full file path to the resource, including the computer name.
  2. Examine the output.
    If your results are lengthy, you may find it helpful to store the output in a file using the PowerShell redirection operator, for example > C:\path\filename.txt.