Classification Module 6.1.3

Pattern Operators
Operator	Matched Pattern
\	Quote the next metacharacter.
^	Match the beginning of a line.
$	Match the end of a line.
.	Match any character (except newline).
|	Alternation.
()	Used for grouping to force operator precedence.
[xy]	The character x or y.
[x-z]	The range of characters between x and z.
[^z]	Any character except z.
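
The operators above can be tried out in any Perl-style regular expression engine. The following is a minimal sketch that uses Python's re module as a stand-in; this is an assumption, and the engine used by the classification system may differ in details. The helper name matches and the sample strings are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# ^ and $ anchor the match to the beginning and end of a line.
print(matches(r"^Secret", "Secret plans"))   # True
print(matches(r"plans$", "Secret plans"))    # True

# A bare . matches any character; \. matches only a literal dot.
print(matches(r"a.b", "axb"))                # True
print(matches(r"a\.b", "axb"))               # False

# | is alternation; () groups the alternatives so the anchors apply to both.
print(matches(r"^Top|Very Secret$", "Top hat"))    # True (matches "^Top")
print(matches(r"^(Top|Very) Secret$", "Top hat"))  # False

# Character classes: a set, a range, and a negated set.
print(matches(r"^[xy]$", "y"))               # True
print(matches(r"^[x-z]$", "w"))              # False
print(matches(r"^[^z]$", "z"))               # False
```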

Pattern Quantifiers
Operator	Matched Pattern
*	Match 0 or more times.
+	Match 1 or more times.
?	Match 0 or 1 times.
{n}	Match exactly n times.
{n,}	Match at least n times.
{n,m}	Match at least n times, but no more than m times.
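
As a rough illustration of the quantifiers above, the sketch below checks whole strings against a few patterns. It assumes Python's re module as a stand-in for the product's engine; the helper name full_match and the sample strings are hypothetical.

```python
import re

def full_match(pattern, text):
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(full_match(r"ab*", "a"))         # True:  * allows zero b's
print(full_match(r"ab+", "a"))         # False: + needs at least one b
print(full_match(r"ab?", "abb"))       # False: ? allows at most one b
print(full_match(r"\d{4}", "2024"))    # True:  exactly four digits
print(full_match(r"\d{2,}", "7"))      # False: at least two digits required
print(full_match(r"\d{2,3}", "1234"))  # False: no more than three digits
```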

Pattern Metacharacters
Operator	Matched Pattern
\t	Match tab.
\n	Match newline.
\r	Match return.
\f	Match formfeed.
\a	Match alarm (bell, beep, and so on).
\e	Match escape.
\v	Match vertical tab.
\021	Match octal character (in this example, 21 octal).
\xF0	Match hex character (in this example, F0 hex).
\x{263a}	Match wide hex character (Unicode).
\w	Match word character (alphanum plus '_').
\W	Match non-word character.
\s	Match whitespace character. This metacharacter also includes 
\S	Match non-whitespace character.
\d	Match digit character.
\D	Match non-digit character.
\b	Match word boundary.
\B	Match non-word boundary.
\A	Match start of string (never match at line breaks).
\Z	Match end of string (Never match at line breaks. Only match at the end of the final buffer of text submitted for matching.)
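
The sketch below exercises a few of the metacharacters above. It assumes Python's re module as a stand-in, which does not support every entry in the table (for example, \e and \x{263a} are not accepted by Python), so only \w, \s, \b, and \A are shown. The function names and sample strings are hypothetical.

```python
import re

def words(text):
    """Return runs of word characters (alphanumerics plus '_')."""
    return re.findall(r"\w+", text)

def split_on_whitespace(text):
    """Split on runs of whitespace, including tabs and newlines."""
    return re.split(r"\s+", text)

def has_word(word, text):
    """Return True if word appears as a whole word, using \\b boundaries."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

def line_starts(text):
    """In multi-line mode, ^ matches at the start of every line."""
    return re.findall(r"^\w+", text, re.MULTILINE)

def string_start(text):
    """\\A matches only at the start of the string, never at line breaks."""
    return re.findall(r"\A\w+", text, re.MULTILINE)

print(words("Tag_secret 42"))                 # ['Tag_secret', '42']
print(split_on_whitespace("a\tb\nc d"))       # ['a', 'b', 'c', 'd']
print(has_word("secret", "top secret file"))  # True
print(has_word("secret", "secretary"))        # False
print(line_starts("one\ntwo"))                # ['one', 'two']
print(string_start("one\ntwo"))               # ['one']
```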

Pattern Extensions
Operator	Matched Pattern
(?A^entity)	Match a previously defined entity, which is then referenced by the new entity. Referencing an entity minimizes the size and memory usage of the grammar, but decreases performance. The performance impact can vary from unnoticeable to significant depending on the size and structure of the grammar.

Appendix D: Creating a Taxonomy to Classify Data

Creating a Custom Taxonomy for Automatic Classification

The following example demonstrates how to implement a new taxonomy that will categorize a document based on a custom attribute.

To implement a new taxonomy

Gather information about categorization and classification requirements. For example, we need to create a taxonomy that categorizes files based on a “Confidentiality” property of a Word document. The following tags are used: Secret, Confidential, Private, and Sensitive.
Plan the taxonomy structure. For this example, the taxonomy will include the following levels of confidentiality: Secret, Private, Confidential, and Sensitive.

For this example, a document can have only one category applied (the highest level of confidentiality); therefore, the taxonomy should be configured as strictly ordered.
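
The effect of a strictly ordered taxonomy in which only the highest level applies can be pictured with a small conceptual model. This is only a sketch of the idea, not the classification system's implementation; the list order is taken from the planned taxonomy levels and is assumed to run from highest to lowest, and the names LEVELS and pick_category are hypothetical.

```python
# Assumed order, highest confidentiality first (taken from the planned levels).
LEVELS = ["Secret", "Private", "Confidential", "Sensitive"]

def pick_category(matched):
    """Return the highest-ranked level among the matched ones, or None."""
    for level in LEVELS:
        if level in matched:
            return level
    return None

# A document whose rules matched two levels receives only the higher one.
print(pick_category({"Confidential", "Secret"}))  # Secret
print(pick_category({"Sensitive", "Private"}))    # Private
print(pick_category(set()))                       # None
```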
Plan the text extraction patterns and rules.
To see how the properties appear in the extracted plain text, run the “Get-QResourceTextExtracted” PowerShell command against a test file that contains the property.

Create a text extractor to find this text using a regular expression.
1. Select Governed Data | Categorization Manager | Categorization.
2. Select Regex to create a new text extractor.
3. Enter a unique identifier. In this example, we will use Tag-secret.
  The identifier is used by the classification system. Once created, you cannot change this value. It is recommended you use a naming convention that reflects the purpose of the text extractor.
4. Enter a name and description. In this example, we will use Tag secret.
  The name and description help you select the proper text extractors when building rules, so provide all necessary information.
5. You are ready to define the settings.
6. Specify the patterns to match by entering the required regular expression. For this example, we will use: \[\[@Confidentiality="(?i:secret)"\]\].
  You can select the Check Syntax button to ensure the regular expression is properly formatted.
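
Before entering the pattern, it can help to try it against sample text outside the product. The sketch below assumes Python's re module as a stand-in for the product's engine, and the sample strings are hypothetical; the real extracted text should be taken from the Get-QResourceTextExtracted output. The names TAG_SECRET and has_secret_tag are also hypothetical.

```python
import re

# The pattern from the example: literal [[ and ]], with a
# case-insensitive match on the value only.
TAG_SECRET = re.compile(r'\[\[@Confidentiality="(?i:secret)"\]\]')

def has_secret_tag(extracted_text):
    """Return True if the extracted text contains the Secret tag."""
    return TAG_SECRET.search(extracted_text) is not None

print(has_secret_tag('Title [[@Confidentiality="Secret"]] body'))  # True
print(has_secret_tag('[[@Confidentiality="SECRET"]]'))             # True
# The property name itself is still matched case-sensitively.
print(has_secret_tag('[[@confidentiality="secret"]]'))             # False
print(has_secret_tag('[[@Confidentiality="Private"]]'))            # False
```

Because (?i:...) scopes case-insensitivity to the value, variations such as "SECRET" are matched without loosening the match on the property name.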
Create and enable a rule to define the criteria for categorization.
1. Select the Rules tab.
2. Select Create new rule.
3. Enter a unique identifier for the rule. In this example, we will use Rule-Secret-match.
  The identifier is used by the classification system. Once created, you cannot change this value. It is recommended you use a naming convention that reflects the purpose of the rule.
4. Enter a descriptive name. In this example, we will use Rule: Secret match.
5. You are ready to define the settings.
6. Select the required text extractor from the list. In this example, we will select the previously created text extractor: Tag-secret.
7. Click Save.
Create and configure the new taxonomy.
1. Select the Taxonomies tab.
2. Click Create new taxonomy.
3. For this example, we will name it Confidentiality.
  The name will appear anywhere the taxonomy is shown.
4. Modify any of the category parameters.
5. Choose Allow this category to be used by the automated system, Mutually Exclusive, and Strictly Ordered.
6. Click Save.
Create and configure the required categories.
1. Locate the Confidentiality taxonomy, and click Edit.
2. Select the parent category of the new category.
3. Click Add and add the required categories.
4. For each category, enable the Allow this category to be used by the automated system option.
  
  Ensure the categories are in the desired order when you create the taxonomy, as reordering categories within a taxonomy is not supported.
Associate required rules with each category.
1. Select the category to edit.
2. Click Modify Rule Associations.
3. Click Associate new rule, select the rule to associate, and set the required weight.
4. Save your changes.
Test rules and categories to ensure the desired results. Run the following PowerShell command to make sure the rule works as expected on a file:
  Get-QRuleResults -ServerAddress kdge:8723 -ResourcePath "\\kdge\C$\Word Document Tags\Secret.docx" -RuleId Rule-Secret-match >C:\Rule.xml
In the resulting XML file, you can see which text extractor found matching text in the document and which category the automated system would assign to the resource.
Make categories available for automated categorization by publishing the taxonomy and all its categories.
1. Select Governed Data | Categorization Manager | Taxonomies.
2. Locate the row containing the taxonomy with the desired category, and click Edit.
3. Select the required category, and click Publish All.

Test the automated system to ensure the desired results. Choose a folder with sample files and enable categorization of the folder.

It is recommended that you perform an initial test on a small number of files to ensure the automated system works as expected. If you send a large number of files for classification, you must wait for all of them to be processed by the classification system. Once started, the process cannot be stopped until it completes.

To review the results of the categorization, select View Resources | Categorized resources, and then select a category or taxonomy.
