The following example demonstrates how to implement a new taxonomy that will categorize a document based on a custom attribute.
To implement a new taxonomy:
- Gather information about categorization and classification requirements. For example, we need to create a taxonomy that categorizes files based on a “Confidentiality” property of a Word document. The following tags are used: Secret, Confidential, Private, and Sensitive.
- Plan the taxonomy structure. For this example, the taxonomy will include the following levels of confidentiality: Secret, Private, Confidential, and Sensitive.
For this example, a document can have only one category applied (the highest level of confidentiality); therefore, the taxonomy should be configured as strictly ordered.
- Plan the text extraction patterns and rules.
- Create a text extractor that finds this text using a regular expression.
- Select Governed Data | Categorization Manager | Categorization.
- Select Regex to create a new text extractor.
- Enter a unique identifier. In this example, we will use Tag-secret.
- Enter a name and description. In this example, we will use Tag secret.
- Specify the patterns to match by entering the required regular expression. For this example, we will use: \[\[@Confidentiality="(?i:secret)"\]\].
- Create and enable a rule to define the criteria for categorization.
- Select the Rules tab.
- Select Create new rule.
- Enter a unique identifier for the rule. In this example, we will use Rule-Secret-match.
- Enter a descriptive name. In this example, we will use Rule: Secret match.
- Select the required text extractor from the list. In this example, we will select the previously created text extractor: Tag-secret.
- Click Save.
- Create and configure the new taxonomy.
- Select the Taxonomies tab.
- Click Create new taxonomy.
- For this example, we will name it Confidentiality.
- Modify any of the category parameters.
- Choose Allow this category to be used by the automated system, Mutually Exclusive, and Strictly Ordered.
- Click Save.
- Create and configure the required categories.
- Locate the Confidentiality taxonomy, and click Edit.
- Select the parent category of the new category.
- Click Add and add the required categories.
- For each category, enable the Allow this category to be used by the automated system option.
Ensure the categories are in the desired order when you create the taxonomy, as reordering categories within a taxonomy is not supported.
- Associate required rules with each category.
- Select the category to edit.
- Click Modify Rule Associations.
- Click Associate new rule, select the rule to associate, and set the required weight.
- Save your changes.
- Test the rules and categories to ensure the desired results. Run the following PowerShell command to verify that the rule works as expected on a file:
Get-QRuleResults -ServerAddress kdge:8723 -ResourcePath "\\kdge\C$\Word Document Tags\Secret.docx" -RuleId Rule-Secret-match >C:\Rule.xml
- Make categories available for automated categorization by publishing the taxonomy and all its categories.
- Select Governed Data | Categorization Manager | Taxonomies.
- Locate the row containing the taxonomy with the desired category, and click Edit.
- Select the required category, and click Publish All.
- Test the automated system to ensure the desired results. Choose a folder with sample files and enable categorization of the folder.
We recommend performing an initial test on a small number of files to ensure that the automated system works as expected. If you send a large number of files for classification, you must wait until the classification system has processed all of them. Once started, classification cannot be stopped until it completes.
- To review the results of the categorization, select View Resources | Categorized resources, and then select a category or taxonomy to see the results.
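Before creating the text extractor and rules in the product, it can help to sanity-check the regular expression and the "highest level wins" behavior offline. The following Python sketch is only an illustration, not part of the product. It assumes that the extracted text of a file contains the tag in the literal form the pattern expects, that the patterns for Private, Confidential, and Sensitive follow the same shape as the Secret pattern, and that the levels are ordered as listed in the planning step. The names LEVELS, build_pattern, and highest_level are hypothetical.

```python
import re

# Pattern from the text extractor step. The scoped flag (?i:...) makes only
# the tag value case-insensitive; the attribute name stays case-sensitive.
SECRET_PATTERN = re.compile(r'\[\[@Confidentiality="(?i:secret)"\]\]')

# Assumption: levels ordered from highest to lowest confidentiality,
# as listed in the planning step of this example.
LEVELS = ["Secret", "Private", "Confidential", "Sensitive"]


def build_pattern(level):
    """Build a pattern for one level by analogy with the Secret pattern.

    Assumption: the other tags use the same [[@Confidentiality="..."]] form.
    """
    return re.compile(
        r'\[\[@Confidentiality="(?i:%s)"\]\]' % re.escape(level.lower())
    )


def highest_level(text):
    """Return the highest level whose tag occurs in the text, or None.

    This mimics the intent of a strictly ordered taxonomy in which a
    document receives only one category.
    """
    for level in LEVELS:
        if build_pattern(level).search(text):
            return level
    return None


if __name__ == "__main__":
    sample = 'Header [[@Confidentiality="SECRET"]] body text'
    print(bool(SECRET_PATTERN.search(sample)))
    print(highest_level(sample))
```

If the pattern does not match a sample string that you expect to match, check the case of the attribute name first, because only the value inside (?i:...) is matched case-insensitively.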