A simple rule uses logic to determine whether there is a match, and a match strength to determine categorization. You can build more complex rules with variations of the logic, resulting in different match strengths. For example, a rule that locates identity numbers such as Social Security numbers could be written so that:
In this case, it is important to consider how <if> blocks are processed. Once a condition matches, no further conditions are processed. The condition with the strongest match strength should therefore be the first <if> block, followed by the remaining conditions in decreasing order of match strength.
The rule XML might look like this:
<if>
<find id="Extractors.National.Identity.All" mincount="10"/>
<match strength="1"/>
</if>
<if>
<find id="Extractors.National.Identity.All" mincount="5"/>
<match strength="0.5"/>
</if>
<if>
<find id="Extractors.National.Identity.All" mincount="2"/>
<match strength="0.25"/>
</if>
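The ordering requirement can be illustrated with a minimal Python sketch. It is not the product's implementation: it only models the three <if> blocks above as (mincount, strength) pairs, and it assumes that mincount means "at least this many occurrences found". The names RULE_BLOCKS, MISORDERED, and match_strength are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical model of the three <if> blocks: (mincount, strength),
# listed strongest first, as recommended for the rule XML.
RULE_BLOCKS: List[Tuple[int, float]] = [(10, 1.0), (5, 0.5), (2, 0.25)]

# The same blocks in the wrong order (weakest first), for comparison.
MISORDERED: List[Tuple[int, float]] = list(reversed(RULE_BLOCKS))


def match_strength(hit_count: int,
                   blocks: List[Tuple[int, float]] = RULE_BLOCKS) -> Optional[float]:
    """Return the strength of the first block whose mincount is met.

    Assumption: mincount is treated as "at least N occurrences".
    """
    for mincount, strength in blocks:
        if hit_count >= mincount:
            # First match wins; the remaining blocks are never evaluated.
            return strength
    return None  # no block matched, so the rule produces no match
```

With the blocks ordered strongest first, 12 occurrences yield a strength of 1.0. With the weakest block first, the same 12 occurrences stop at the mincount of 2 and yield only 0.25, which is why the strongest condition belongs at the top.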
Categorization starts by comparing the text in a resource to a defined list of match criteria. This defined list is known as a text extractor. You can build rules using text extractors. For example, if you want to locate credit card numbers, you may need to look for the following:
A text extractor is where the requirements for a “credit card number” are defined. You can use multiple text extractors together in a single rule. In the example above, some numerical sequences that match your text extractor may not actually be credit card numbers. To increase the accuracy of credit card identification, you could use a second text extractor that looks for credit card providers such as Visa, MasterCard, or American Express. Text extractors are built separately, and then referenced in rules. This allows you to reuse your text extractors.
The text extractors are used in a rule, which, when applied to a resource, may result in a match.
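The idea of combining two text extractors can be sketched in Python as follows. This is only a conceptual model, not the product's extractor format: the regular expression is a simplified 16-digit pattern (real card formats vary, and some providers use other lengths), the provider list is a stand-in for a dictionary-style extractor, and all names are hypothetical.

```python
import re
from typing import List

# Assumed stand-in for a regular expression text extractor:
# 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_NUMBER = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")

# Assumed stand-in for a dictionary text extractor of provider names.
PROVIDERS = ("visa", "mastercard", "american express")


def find_card_numbers(text: str) -> List[str]:
    """Return every substring that looks like a 16-digit card number."""
    return CARD_NUMBER.findall(text)


def mentions_provider(text: str) -> bool:
    """Return True if any provider name appears, ignoring case."""
    lowered = text.lower()
    return any(name in lowered for name in PROVIDERS)


def looks_like_card_data(text: str) -> bool:
    """Match only when both extractors find something in the text."""
    return bool(find_card_numbers(text)) and mentions_provider(text)
```

Under these assumptions, a 16-digit tracking number on its own does not match, while the same kind of sequence next to a provider name does. Requiring both extractors trades some recall for fewer false positives.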
The following diagram shows the building blocks of a taxonomy, and how you can reuse text extractors, grammars, entities, and pattern matches:
A good starting point for understanding text extractors is to view the text extractors included in Quest One Identity Manager Data Governance Edition. You can see a variety of XML implementations through the Categorization Manager.
You can work with text extractors using the following methods:
For details on the various types of available text extractors see:
Through the web portal, you can quickly view the text extractors that are included in the system and available for use within rules. At a glance you can see the text extractor ID, name, description, type, and associated rules.
Before you can remove a text extractor from the system, you must remove any associations to rules. You can see which rules are associated with each text extractor and remove them through the web portal or through PowerShell.
To view a list of text extractors in the classification system using the web portal
To view a list of rules that have been associated with a given text extractor
To view a list of all text extractors with PowerShell
To view a regular expression text extractor with PowerShell
To view a dictionary text extractor with PowerShell
To view an advanced text extractor with PowerShell
To view the text extractors used in a specific taxonomy with PowerShell
To view the list of rules that have been associated with a given text extractor with PowerShell
Creating new text extractors for the classification system is a multi-level process that allows you to refine the match criteria at each step.
When creating a text extractor:
If you are introducing a new text extractor to your environment and want to test it before using it in production, you can create an unpublished test category or taxonomy, create a rule that references the new text extractor, and associate the rule with the category. Once the rule is associated with a category, you can test it using the Get-QAllRuleResults command, which shows the results of all rules in the system when run against a test file.
For specific details see: