A simple rule uses logic to determine whether there is a match, and a match strength that determines categorization. You can build more complex rules in which variations of the logic result in different match strengths. For example, a rule that locates identity numbers such as Social Security numbers could be written so that:
In this case, it is important to consider how <if> blocks are processed. They are evaluated in order, and once a match is found, no more conditions are processed. The condition with the strongest match strength should therefore be the first <if> block, followed by the remaining conditions in decreasing order of match strength.
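The first-match-wins behavior can be illustrated with a minimal Python sketch. It is not the product's rule engine: the condition functions, the sample pattern, and the strength values 90 and 50 are all hypothetical, chosen only to show why ordering matters.

```python
import re
from typing import Callable, List, Optional, Tuple

# Illustrative pattern only; not the extractor shipped with the product.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def has_number(text: str) -> bool:
    return SSN.search(text) is not None


def has_keyword(text: str) -> bool:
    return "social security" in text.lower()


# Hypothetical conditions, ordered from strongest to weakest match strength.
CONDITIONS: List[Tuple[Callable[[str], bool], int]] = [
    (lambda t: has_number(t) and has_keyword(t), 90),  # number plus keyword
    (has_number, 50),                                  # number alone
]


def match_strength(text: str, conditions=CONDITIONS) -> Optional[int]:
    """Return the strength of the first condition that matches, else None."""
    for predicate, strength in conditions:
        if predicate(text):
            return strength  # stop here: later conditions are never evaluated
    return None
```

If the weaker condition were listed first, text containing both the number and the keyword would stop at the weaker condition and never reach the stronger one, which is why the strongest condition goes first.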
The rule XML might look like this:
Categorization starts by comparing the text in a resource to a defined list of match criteria. This defined list is known as an extractor. You can build rules up using extractors. For example, if you want to locate credit card numbers, you may need to look for the following:
An extractor is where the requirements for a “credit card number” are defined. You can use multiple extractors together in a single rule. In the example above, some numerical sequences that match your extractor may not actually be credit card numbers. To increase the accuracy of credit card identification, you could use a second extractor that looks for credit card providers such as Visa, MasterCard, or American Express. Extractors are built separately, and then referenced in rules. This allows you to reuse your extractors.
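As a rough illustration of combining two extractors, the following Python sketch requires both a number-like sequence and a provider name before reporting a match. The regular expressions and function name are assumptions made for this example; they cover only 16-digit numbers and are much simpler than a production extractor.

```python
import re

# Hypothetical "number" extractor: 16 digits, optionally delimited by a
# consistent dash or space (\1 repeats whichever delimiter was used first).
NUMBER_EXTRACTOR = re.compile(r"\b\d{4}([- ]?)\d{4}\1\d{4}\1\d{4}\b")

# Hypothetical "provider" extractor: a few well-known provider names.
PROVIDER_EXTRACTOR = re.compile(
    r"\b(visa|mastercard|american express)\b", re.IGNORECASE
)


def looks_like_credit_card(text: str) -> bool:
    """Match only when both extractors find something in the text."""
    has_number = NUMBER_EXTRACTOR.search(text) is not None
    has_provider = PROVIDER_EXTRACTOR.search(text) is not None
    return has_number and has_provider
```

A 16-digit tracking number on its own would satisfy the number extractor but not the combined check, which is the kind of false positive the second extractor is meant to reduce.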
The following diagram shows a second example based on a name. In this case four different entities are used to build an extractor. The extractor is used in a rule, which, when applied to this resource, results in a match.
The following diagram shows how the elements of a rule work together to produce a rule match. For more information, see Working With Extractors. In this rule, the name extractor looks for matches
There are two types of extractors:
The following diagram shows the building blocks of a taxonomy, and how you can reuse extractors, grammars, entities and pattern matches:
Extractors are included in rules. An extractor can be as simple as a reference to a library, or can be refined using grammars, entities and pattern matches:
Element | Description |
Grammar | A grammar is a collection of entities that may compose the item of interest. For example, you can build a grammar specifically for numbers. This grammar can then be used in any entity that requires numbers, such as credit cards, bank accounts, license plates and so on. Grammars are indicated in the XML with the <grammar> tag, and can either be written inline, or referenced externally, as long as the <grammar> structure is followed in the external file. |
Entities | An entity is a group of patterns. It can be either an entity specified by referencing a library, or an entity that you create yourself in a custom grammar. |
Pattern Matches | A pattern match describes the exact details of your match, if required. It refines the entity further, and can be useful to reduce the number of matches found by increasing the accuracy of the extractor. For example, the delimited credit card pattern indicates the exact pattern of numbers and dashes you are looking to match. This allows you to refine the extractor to eliminate matches that are not appropriate. Patterns can consist of one or more regular expressions, predefined entities (either included in Quest One Identity Manager or custom built), or combinations of the two. If your regular expression is complex, and does not require a library file or a reference to a custom grammar, consider writing a regular expression extractor. Patterns are indicated in the XML with the <pattern> tag. Each <pattern> tag can contain more than one element. Within one <pattern> tag, all elements must match. If there is more than one <pattern> tag, only one of them needs to match. See Sample Text Extractors Details for a list of the available patterns. |
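The matching rule for patterns described in the table (all elements within one <pattern> must match, and any one <pattern> is enough) can be sketched in Python as an "any of all" check. This is a simplified model under stated assumptions: the entity and its regular expressions are hypothetical, and elements are tested for presence anywhere in the text rather than in the sequence the product may require.

```python
import re
from typing import List

# A pattern is modeled as a list of regex elements; an entity as a list of patterns.
Pattern = List[str]


def pattern_matches(elements: Pattern, text: str) -> bool:
    """Within one pattern, every element must match."""
    return all(re.search(e, text, re.IGNORECASE) for e in elements)


def entity_matches(patterns: List[Pattern], text: str) -> bool:
    """Across patterns, a single matching pattern is sufficient."""
    return any(pattern_matches(p, text) for p in patterns)


# Hypothetical entity with two alternative patterns.
CARD_ENTITY: List[Pattern] = [
    [r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"],  # a dash-delimited number on its own
    [r"\b\d{16}\b", r"\bcard\b"],      # an undelimited number AND the word "card"
]
```

With this entity, a dash-delimited number matches through the first pattern, while an undelimited number matches only when the word "card" is also present.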
To view the extractors used in a taxonomy
If your rules are not finding the results you require, the extractors you are using may need refining. Refining extractors can help pinpoint the patterns you want to match, and reduce false positives. Extractors may be shared across multiple rules and categories, so when editing them, you should understand where they are in use. If you require similar versions of an extractor, copy the <TextExtractor> element within the XML file, provide a new name and unique ID for it, and then make your adjustments.
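Copying an extractor can also be scripted. The sketch below uses Python's standard xml.etree.ElementTree module on a made-up XML fragment. The wrapper element, the attribute names (id, name, type), and the use of a UUID as the new ID are assumptions for illustration only; check an exported taxonomy for the structure your installation actually uses.

```python
import copy
import uuid
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real exported XML may be structured differently.
SAMPLE = r"""<Extractors>
  <TextExtractor id="regex-cc" name="Credit Card" type="RegEx">
    <Property id="regex#regular-expression">\d{4}-\d{4}-\d{4}-\d{4}</Property>
  </TextExtractor>
</Extractors>"""


def clone_extractor(root: ET.Element, source_id: str, new_name: str) -> ET.Element:
    """Copy a <TextExtractor> that is a direct child of root, with a new name and ID."""
    original = next(
        e for e in root.findall("TextExtractor") if e.get("id") == source_id
    )
    clone = copy.deepcopy(original)       # the original stays untouched
    clone.set("id", str(uuid.uuid4()))    # assumed ID format: any unique string
    clone.set("name", new_name)
    root.append(clone)
    return clone


root = ET.fromstring(SAMPLE)
variant = clone_extractor(root, "regex-cc", "Credit Card (strict)")
```

Because the copy is independent of the original, rules that reference the original extractor are unaffected by changes made to the variant.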
To export and edit an extractor, it must be referenced in a rule that is associated with a category. If needed, you can create a dummy rule and category for testing a new extractor you are introducing into your environment. Another way to do this is to work with the original sample extractors included in Quest One Identity Manager, and re-import them. Use caution when doing this, however, as you may accidentally overwrite changes that have been made to extractors in use in your environment.
A good starting point for understanding extractors is to view the extractors included in Quest One Identity Manager Data Governance Edition. You can see a variety of implementations in the XML. Typically, the file is located in the C:\Program Files\Quest Software\QCS\Templates folder. Note that for ease of import, the file uses a similar XML structure to a template, but the taxonomy and category tags are just placeholders. You can also export any taxonomy that references these extractors. When you export a taxonomy, the extractor XML is self-contained, and does not use the placeholder tags.
The following table outlines the basic XML structure used to define an extractor:
Element | Description |
<TextExtractor> | Sets the type of extractor (Eduction or RegEx) and the ID. The ID will be referenced in any rule that uses the extractor. |
<Property id> | There are a number of different properties you can set on an extractor. Generally, these do not need to be edited. For an Eduction extractor, you can only match one entity (the eduction#match property id), and may need to use other entities as building blocks to get the desired results. This is done using a custom grammar (the eduction#grammarxml property id). For a RegEx extractor, the regular expression is defined using the regex#regular-expression property. There are no other elements required for a RegEx extractor. |
<grammars> | The grammars element allows you to define custom grammars. You can combine grammars in an extractor by adding multiple <grammar> tags. Only Eduction extractors use this tag. |
<grammar name> | Names the grammar you are using. This allows you to share entities between extractors. Entities have full path names that include the grammar name. For example, a grammar named “number” may contain a custom entity called “cc/delim”, which details a delimited credit card. The full name of the entity, if referenced from another extractor, is “number/cc/delim”. Within the same extractor, the full path is not necessary. |
<entity name> | Names the entity that is being built. You can use entities as building blocks for the entity that is referenced in the eduction#match property id. You can reference entities from other grammars by using the full path. You may want to edit the entities included in an extractor. |
<pattern> | Identifies the exact patterns that make up the entity. You can use multiple patterns within an entity. You can use patterns that are included with Quest One Identity Manager, regular expressions, or a combination. You can also use other entities: this is how you combine patterns and grammars into the single entity that you reference from your eduction#match property. |
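The entity naming scheme in the table (a short name such as “cc/delim” locally, and a full path such as “number/cc/delim” from elsewhere) can be sketched as a small lookup in Python. The registry, the second grammar, and the function are hypothetical, and the sketch simplifies "within the same extractor" to "within the same grammar".

```python
from typing import Dict

# Hypothetical registry: grammar name -> {entity name -> pattern}.
GRAMMARS: Dict[str, Dict[str, str]] = {
    "number": {"cc/delim": r"\d{4}-\d{4}-\d{4}-\d{4}"},
    "provider": {"card": r"visa|mastercard"},
}


def resolve_entity(reference: str, current_grammar: str,
                   grammars: Dict[str, Dict[str, str]] = GRAMMARS) -> str:
    """Resolve a short name locally first, then treat it as grammar/entity."""
    local = grammars.get(current_grammar, {})
    if reference in local:
        return local[reference]  # short name, same grammar
    # Otherwise the first path segment is taken to be the grammar name.
    grammar_name, _, entity_name = reference.partition("/")
    try:
        return grammars[grammar_name][entity_name]
    except KeyError:
        raise KeyError(f"Unknown entity reference: {reference!r}") from None
```

Here `resolve_entity("cc/delim", "number")` and `resolve_entity("number/cc/delim", "provider")` return the same pattern, while the short name used from the "provider" grammar raises a KeyError because it is not defined there.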
To edit an extractor used in a taxonomy
© 2025 One Identity LLC. ALL RIGHTS RESERVED.