Classification Module 6.1.1 - User Guide

Using Match Strength to Reduce the Number of Rules

A simple rule uses logic to determine whether there is a match, and a match strength that is used to determine categorization. You can build more complex rules in which variations of the logic result in different match strengths. For example, a rule that locates identity numbers such as Social Security numbers could be written so that:

If there are 10 instances, the match strength is 1.0
If there are 5 instances, the match strength is 0.5
If there is 1 instance, the match strength is 0.25

In this case, it is important to consider how <if> blocks are processed. Once a match is found, no more conditions are processed. The condition with the strongest match strength should be the first <if> block, followed by subsequent match strengths in decreasing order.
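
The ordering requirement can be illustrated outside the rule XML. The following Python sketch is not the product's rule syntax; it only models first-match-wins evaluation. It assumes each condition means "at least this many instances", and the names CONDITIONS and match_strength are hypothetical.

```python
# Hypothetical thresholds from the example above: (minimum instances, match strength).
# They are listed strongest first, mirroring the recommended order of <if> blocks.
# Assumption: each condition is read as "at least N instances".
CONDITIONS = [(10, 1.0), (5, 0.5), (1, 0.25)]


def match_strength(instance_count, conditions=CONDITIONS):
    """Return the strength of the first condition that matches, or 0.0 if none do."""
    for minimum, strength in conditions:
        if instance_count >= minimum:
            return strength  # evaluation stops at the first match
    return 0.0


if __name__ == "__main__":
    for count in (0, 1, 7, 12):
        print(count, match_strength(count))
    # With the order reversed, the weakest condition matches first and always wins.
    print(match_strength(12, list(reversed(CONDITIONS))))
```

With the weakest condition listed first, a resource containing 12 instances would stop at the 1-instance condition and never reach the stronger ones, which is why the strongest condition belongs at the top.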

The rule XML might look like this:

Working With Extractors

Categorization starts by comparing the text in a resource to a defined list of match criteria. This defined list is known as an extractor. You can build rules up using extractors. For example, if you want to locate credit card numbers, you may need to look for the following:

Numbers that have 13 to 16 digits
The number may have a space after every fourth digit
The number may have a dash after every fourth digit
The number may have no spaces or dashes

An extractor is where the requirements for a “credit card number” are defined. You can use multiple extractors together in a single rule. In the above example, some numerical sequences that match your extractor may not actually be credit card numbers. To increase the accuracy of credit card identification, you could use a second extractor that looks for credit card providers such as Visa, MasterCard or American Express. Extractors are built separately, and then referenced in rules. This allows you to reuse your extractors.
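
As a rough illustration of the credit card criteria above, the following sketch uses Python's re module rather than the product's extractor XML. The pattern is an assumption about how the listed requirements could be written, not the pattern shipped with the product, and the names CARD_NUMBER, CARD_PROVIDER, find_card_numbers and mentions_card are hypothetical. The constructs used (\b, \d, quantifiers and a backreference) are also available in .NET regular expressions.

```python
import re

# 13 to 16 digits in groups of four, with an optional space or dash after
# every fourth digit. The backreference \1 requires the same delimiter
# (space, dash, or nothing) throughout the number.
CARD_NUMBER = re.compile(r"\b\d{4}([ -]?)\d{4}\1\d{4}\1\d{1,4}\b")

# A second, independent check for the provider names mentioned above.
CARD_PROVIDER = re.compile(r"\b(Visa|MasterCard|American Express)\b", re.IGNORECASE)


def find_card_numbers(text):
    """Return every substring that looks like a card number."""
    return [m.group(0) for m in CARD_NUMBER.finditer(text)]


def mentions_card(text):
    """True only when both a number pattern and a provider name are present."""
    return bool(find_card_numbers(text)) and CARD_PROVIDER.search(text) is not None


if __name__ == "__main__":
    print(find_card_numbers("Paid by Visa 4111-1111-1111-1111"))
    print(mentions_card("Order 1234567890123 shipped"))
```

The second call shows why combining two checks helps: a 13-digit order number satisfies the number pattern on its own, but without a provider name it is not reported as a card.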

The following diagram shows a second example based on a name. In this case four different entities are used to build an extractor. The extractor is used in a rule, which, when applied to this resource, results in a match.

The following diagram shows how the elements of a rule work together to produce a rule match. For more information see Working With Extractors. In this rule, the name extractor looks for matches

Categorization calculation

There are two types of extractors:

Dictionaries (indicated as Eduction in the XML)
Dictionaries consist of pre-defined patterns. Sample dictionaries include numbers, names and so on. You can reference these dictionaries in an entity, and then refine them to match just the content in which you are interested.
Regular expressions (indicated as RegEx in the XML)
You can use .NET regular expression syntax to develop the patterns that you want to locate in the extracted text. You can see examples of regular expressions used in the extractors included with Quest One Identity Manager. For more information, see Sample Text Extractors Details.

The following diagram shows the building blocks of a taxonomy, and how you can reuse extractors, grammars, entities and pattern matches:

Building blocks of a taxonomy with shading to show reused blocks

Extractors are included in rules. An extractor can be as simple as a reference to a library, or can be refined using grammars, entities and pattern matches:


Element	Description
Grammar	A grammar is a collection of entities that may compose the item of interest. For example, you can build a grammar specifically for numbers. This grammar can then be used in any entity that requires numbers, such as credit cards, bank accounts, license plates and so on. Grammars are indicated in the XML with the <grammar> tag, and can either be written inline, or referenced externally, as long as the <grammar> structure is followed in the external file.
Entities	An entity is a group of patterns. It can be either a pattern specified by referencing a library, or entities that you can create yourself in a custom grammar.
Pattern Matches	A pattern match describes the exact details of your match, if required. It refines the entity further, and can be useful to reduce the number of matches found by increasing the accuracy of the extractor. For example, the delimited credit card pattern indicates the exact pattern of numbers and dashes you are looking to match. This allows you to refine the extractor to eliminate matches that are not appropriate. Patterns can consist of one or more regular expressions, pre-defined entities (either included in Quest One Identity Manager or custom built), or combinations of the two. If your regular expression is complex, and does not require a library file or reference to custom grammar, consider writing a regular expression extractor. Patterns are indicated in the XML with the <pattern> tag. Each <pattern> tag can contain more than one element. Within one <pattern> tag, all elements must match. If there is more than one <pattern> tag, only one of them needs to match. See Sample Text Extractors Details for a list of the available patterns.
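
The matching rule for <pattern> tags (all elements within one tag must match, while any one of several tags is sufficient) can be sketched as follows. This is a simplified Python model, not the product's matching engine: it treats each element as a regular expression that must be found somewhere in the text and ignores element order and position. The function names are hypothetical.

```python
import re


def pattern_matches(elements, text):
    """Model of one <pattern> tag: every element must match."""
    return all(re.search(element, text) for element in elements)


def entity_matches(patterns, text):
    """Model of several <pattern> tags: one matching pattern is enough."""
    return any(pattern_matches(elements, text) for elements in patterns)


if __name__ == "__main__":
    # Two hypothetical patterns: the first has two elements, the second has one.
    patterns = [
        [r"\d{4}-\d{4}", r"\bVisa\b"],
        [r"\bAmerican Express\b"],
    ]
    print(entity_matches(patterns, "Visa 1234-5678"))
    print(entity_matches(patterns, "1234-5678 only"))
```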

To view the extractors used in a taxonomy

Determine the ID of the taxonomy you want to export.
See Finding a Taxonomy or Category ID using PowerShell for details.
Run the Export-QTaxonomy cmdlet with the following parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
2. TaxonomyId
3. IncludeEntityExtractors
  Set this to $true.
4. OutputFile
  Provide the path to a file to store the template XML.
  If you omit this parameter, the taxonomy is output to the screen.
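
Once the taxonomy has been exported to a file, the extractors it contains can be listed with a short script. The following Python sketch assumes only that extractors appear as <TextExtractor> elements, as described in this guide. The sample XML, its root element and its attribute names are hypothetical placeholders, so the script reports whatever attributes it finds rather than relying on specific ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an exported file; a real export will differ.
SAMPLE_EXPORT = """
<Template>
  <TextExtractor id="cc" type="RegEx" />
  <TextExtractor id="names" type="Eduction" />
  <Taxonomy />
</Template>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}TextExtractor'."""
    return tag.rsplit("}", 1)[-1]


def list_extractors(xml_text):
    """Return the attributes of every <TextExtractor> element in the document."""
    root = ET.fromstring(xml_text)
    return [
        dict(element.attrib)
        for element in root.iter()
        if local_name(element.tag) == "TextExtractor"
    ]


if __name__ == "__main__":
    for attributes in list_extractors(SAMPLE_EXPORT):
        print(attributes)
```

To run this against a real export, read the file named in the OutputFile parameter and pass its contents to list_extractors.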

Editing Extractors

If your rules are not finding the results you require, the extractors you are using may need refining. Refining extractors can help pinpoint the patterns you want to match, and reduce false positives. Extractors may be shared across multiple rules and categories, so when editing them, you should understand where they are in use. If you require similar versions of an extractor, copy the <TextExtractor> element within the XML file, provide a new name and unique ID for it, and then make your adjustments.

In order to export and edit an extractor, it must be referenced in a rule which is associated with a category. If needed, you can create a dummy rule and category for testing a new extractor you are introducing into your environment. Another way to do this is to work with the original sample extractors included in Quest One Identity Manager, and re-import them. Use caution doing this however, as you may accidentally overwrite changes that have been made to extractors in use in your environment.

A good starting point for understanding extractors is to view the extractors included in Quest One Identity Manager Data Governance Edition. You can see a variety of implementations in the XML. Typically, the file is located in the C:\Program Files\Quest Software\QCS\Templates folder. Note that for ease of import, the file uses a similar XML structure to a template, but the taxonomy and category tags are just placeholders. You can also export any taxonomy that references these extractors. When you export a taxonomy, the extractor XML is self-contained, and does not use the placeholder tags.

The following table outlines the basic XML structure used to define an extractor:


Element	Description
<TextExtractor>	Sets the type of extractor (Eduction or RegEx) and the ID. The ID is referenced in any rule that uses the extractor.
<Property id>	There are a number of different properties you can set on an extractor. Generally, these do not need to be edited. For an Eduction extractor, you can only match one entity (the eduction#match property id), and may need to use other entities as building blocks to get the desired results. This is done using a custom grammar (the eduction#grammarxml property id). For a RegEx extractor, the regular expression is defined using the regex#regular-expression property. There are no other elements required for a RegEx extractor.
<grammars>	The grammars element allows you to define custom grammars. You can combine grammars in an extractor by adding multiple <grammar> tags. Only Eduction extractors use this tag.
<grammar name>	Names the grammar you are using. This allows you to share entities between extractors. Entities have full path names that include the grammar name. For example, a grammar named “number” may contain a custom entity called “cc/delim”, which describes a delimited credit card. The full name of the entity, if referenced from another extractor, is “number/cc/delim”. Within the same extractor, the full path is not necessary.
<entity name>	Names the entity that is being built. You can use entities as building blocks for the entity that is referenced in the eduction#match property id. You can reference entities from other grammars by using the full path. You may want to edit the entities included in an extractor.
<pattern>	Identifies the exact patterns that make up the entity. You can use multiple patterns within an entity. You can use patterns that are included with Quest One Identity Manager, regular expressions, or a combination. You can also use other entities: this is how you combine patterns and grammars into the single entity that you reference from your eduction#match property.
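
The naming scheme for entities (grammar name plus entity name) can be modeled in a few lines. The following Python sketch is an assumption about how such references could be resolved and does not describe the product's internals. The function names and the set of known entities are hypothetical.

```python
def full_entity_name(grammar, entity):
    """Join a grammar name and an entity name into a full path."""
    return f"{grammar}/{entity}"


def resolve(reference, local_grammars, known_entities):
    """Resolve a reference against the extractor's own grammars first,
    then treat it as a full path to an entity defined elsewhere."""
    for grammar in local_grammars:
        candidate = full_entity_name(grammar, reference)
        if candidate in known_entities:
            return candidate
    if reference in known_entities:
        return reference
    raise LookupError(f"unknown entity: {reference}")


if __name__ == "__main__":
    known = {"number/cc/delim", "names/first"}
    # Inside the extractor that defines the "number" grammar, the short name is enough.
    print(resolve("cc/delim", ["number"], known))
    # From another extractor, the full path is required.
    print(resolve("number/cc/delim", ["names"], known))
```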

To edit an extractor used in a taxonomy

Determine the ID of a taxonomy that has a rule that references the extractor you want to edit.
See Finding a Taxonomy or Category ID using PowerShell for details.
If you are introducing a new extractor to your environment but want to edit it before using it in your production environment, you can create an unpublished dummy category or taxonomy, create a rule referencing the new extractor and associate it with the category. Export the dummy taxonomy.
Run the Export-QTaxonomy cmdlet with the following parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
2. TaxonomyId
3. IncludeEntityExtractors
  Set this to $true.
4. OutputFile
  Provide the path to a file to store the template XML.
  If you omit this parameter, the taxonomy is output to the screen.
Locate the desired extractor in the XML output.
Extractors are located at the top of the XML file.
Edit the XML as desired.
To copy an extractor to make a new version, copy the entire <TextExtractor> element, and paste it below the original extractor. Make sure you provide a new, unique ID for the copy.
Save the file.
To implement the change, run the Import-QTaxonomy cmdlet with the following mandatory parameters:
1. ServerAddress
  Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
2. TemplateXmlFile
  Provide the full path and name of the file containing the template.
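
The copy step in this procedure can also be scripted. The following Python sketch duplicates a <TextExtractor> element and places the copy directly below the original. The sample XML and the attribute name id are hypothetical; check an exported file for the attribute that actually holds the extractor ID and pass it as id_attr. The sketch also assumes that extractors are child elements and not the document root.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an exported file; a real export will differ.
SAMPLE_EXPORT = """
<Template>
  <TextExtractor id="cc" type="RegEx" />
  <TextExtractor id="names" type="Eduction" />
</Template>
"""


def duplicate_extractor(root, source_id, new_id, id_attr="id"):
    """Copy the extractor with source_id and insert the copy, with new_id, below it."""
    extractors = list(root.iter("TextExtractor"))
    if any(element.get(id_attr) == new_id for element in extractors):
        raise ValueError(f"ID already in use: {new_id}")
    # ElementTree elements do not store their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in extractors:
        if element.get(id_attr) == source_id:
            clone = copy.deepcopy(element)
            clone.set(id_attr, new_id)
            parent = parent_of[element]
            parent.insert(list(parent).index(element) + 1, clone)
            return clone
    raise LookupError(f"no extractor with ID: {source_id}")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_EXPORT)
    duplicate_extractor(root, "cc", "cc-v2")
    print(ET.tostring(root, encoding="unicode"))
```

After adjusting the copy, save the XML to a file and import it with the Import-QTaxonomy cmdlet as described above.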
