Advanced text extractors enable you to define the patterns that you want to locate in the extracted text based on matches to entries in grammar\library (.ecr) files.
To create a text extractor using the web portal
Ensure that you have entered the file name correctly. A formatting issue may cause the Classification workers to become unresponsive.
To add an advanced text extractor with PowerShell
To edit an advanced text extractor using the web portal
To edit an advanced text extractor with PowerShell
You can write grammar XML in any XML editor that supports UTF-8 encoding, using the format described here. Once you have written your grammar XML and entered at least one match criterion, you can associate it with a rule, add it to the system, and test it. When the rule performs as desired, you can associate it with a category. If you plan to reuse rules or text extractors across more than one category, take this into account while developing them: avoid refining them in a way that meets the needs of one situation but not the others.
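Because grammar XML must be saved as UTF-8, it can help to check a file locally before adding it to the system. The following Python sketch is not part of the product, and the function names `check_grammar_bytes` and `check_grammar_file` are hypothetical. It checks only that the content decodes as UTF-8, is well-formed XML, and follows the basic <grammars>/<grammar>/<entity>/<pattern> nesting described on this page; it assumes edk.dtd is not available locally and does not validate against it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_grammar_bytes(raw: bytes) -> list[str]:
    """Return problems found in grammar XML content; an empty list means none.

    Hypothetical helper: checks UTF-8 decoding, well-formedness, and the basic
    element nesting only. It does NOT validate against edk.dtd.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems: list[str] = []
    if root.tag != "grammars":
        problems.append(f"root element is <{root.tag}>, expected <grammars>")
    for grammar in root.iter("grammar"):
        # The grammar name is mandatory.
        if not grammar.get("name"):
            problems.append("a <grammar> element has no name attribute")
        for entity in grammar.iter("entity"):
            # Assumption: an entity without any <pattern> is treated as an error.
            if not entity.findall("pattern"):
                problems.append(f"entity {entity.get('name')!r} has no <pattern>")
    return problems


def check_grammar_file(path: str) -> list[str]:
    # Read raw bytes so that encoding problems are reported rather than hidden.
    return check_grammar_bytes(Path(path).read_bytes())
```

For example, `check_grammar_file("names_all.xml")` (a hypothetical file name) returns an empty list when none of these checks fail. Reading bytes rather than text lets the UTF-8 check run before the XML parser sees the content.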
By understanding the grammar elements and examining the sample text extractors included in Quest One Identity Manager Data Governance Edition, you can write your own text extractors or edit existing ones. The grammar XML uses the following elements.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "edk.dtd">
<grammars>
  <grammar name="...">
    <entity name="...">
      <pattern>...</pattern>
      <pattern>...</pattern>
    </entity>
  </grammar>
</grammars>
<!DOCTYPE grammars SYSTEM "edk.dtd"> specifies that this is an advanced text extractor that will base its matches on entries in grammar\library (.ecr) files.
<grammars> Represents the patterns to be evaluated against content. Within it, each <grammar> element defines a custom grammar, and you can combine grammars in a text extractor by adding multiple <grammar> tags. Only advanced text extractors use this tag.
For a list of grammars included in Quest One Identity Manager, see Sample Advanced Text Extractors Details. The <grammar> tag can either be written inline or referenced externally, as long as the external file follows the <grammar> structure.
<grammar name> This mandatory attribute names the grammar you are using, which allows you to share entities between text extractors. Entities have full path names that include the grammar name. For example, a grammar named “number” may define a custom entity called “cc/delim”, which describes a delimited credit card number. The full name of the entity, when referenced from another text extractor, is “number/cc/delim”. Within the same text extractor, the full path is not necessary.
Note: The grammar name cannot begin with a number.
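The naming rules above can be expressed as a small helper. This Python sketch is illustrative only: `is_valid_grammar_name` and `full_entity_name` are hypothetical names, and the sketch assumes the full path is simply the grammar name and the entity name joined with "/", as in the “number/cc/delim” example.

```python
def is_valid_grammar_name(name: str) -> bool:
    # The name attribute is mandatory and cannot begin with a number.
    return bool(name) and not name[0].isdigit()


def full_entity_name(grammar_name: str, entity_name: str) -> str:
    """Return the path another text extractor would use to reference an entity.

    Hypothetical helper; assumes the path is "<grammar name>/<entity name>".
    """
    if not is_valid_grammar_name(grammar_name):
        raise ValueError(f"invalid grammar name: {grammar_name!r}")
    return f"{grammar_name}/{entity_name}"
```

With these definitions, `full_entity_name("number", "cc/delim")` returns `"number/cc/delim"`, while a grammar name such as `"9digits"` raises `ValueError`. Within the same text extractor you would use the short entity name instead.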
<entity> Names the entity that is being built. You can reference entities from other grammars by using the full path.
<pattern> Identifies the exact patterns that make up the entity. You can use multiple patterns within an entity. You can use patterns that are included with Quest One Identity Manager, regular expressions, or a combination.
Each <pattern> tag can contain more than one element. Within one <pattern> tag, all elements must match. If there is more than one <pattern> tag, only one of them needs to match.
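The matching rule for <pattern> tags amounts to AND within a pattern and OR across patterns. The Python sketch below models only that boolean logic, under the assumption that each element can be treated as an independent yes/no test on the text. It uses Python regular expressions as stand-ins for pattern elements; it does not reproduce the product's pattern syntax or any ordering or adjacency constraints the real engine may apply. All names are hypothetical.

```python
import re
from typing import Callable

# Hypothetical model: each element of a <pattern> is a yes/no test on the text.
Element = Callable[[str], bool]
Pattern = list[Element]


def pattern_matches(pattern: Pattern, text: str) -> bool:
    # Within one <pattern>, every element must match.
    return all(element(text) for element in pattern)


def entity_matches(patterns: list[Pattern], text: str) -> bool:
    # Across several <pattern> tags, one matching pattern is enough.
    return any(pattern_matches(p, text) for p in patterns)


def regex_element(expression: str) -> Element:
    # Stand-in element: true if the regular expression occurs anywhere in the text.
    compiled = re.compile(expression)
    return lambda text: compiled.search(text) is not None


# Illustrative entity with two patterns.
company = [
    [regex_element(r"\bInc\b"), regex_element(r"\bDelaware\b")],  # both required
    [regex_element(r"\bLtd\b")],                                   # alone is enough
]
```

Here `entity_matches(company, "Acme Inc, a Delaware corporation")` and `entity_matches(company, "Widgets Ltd")` are true, but `entity_matches(company, "Acme Inc")` is false because the first pattern also requires "Delaware" and the second pattern does not match.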
The following example illustrates how to write an Advanced text extractor in the web portal using XML, grammar files, and specific match criteria. In this case, the extractor is designed to find instances of US and Canadian companies.
To create this advanced text extractor
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "edk.dtd">
<grammars>
  <grammar name="names_all">
    <entity name="company/all" type="public">
      <pattern>(?A^company/all/engca)</pattern>
      <pattern>(?A^company/major_company/engus)</pattern>
      <pattern>(?A^company/fortune_500_2011/engus)</pattern>
      <pattern>(?A^company/forbes_largest_private_companies_2010/engus)</pattern>
    </entity>
  </grammar>
</grammars>
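If you maintain several similar extractors, you might generate this XML rather than type it. The following Python sketch is a hypothetical helper (`build_grammar` is not a product API) that rebuilds the names_all example from a list of library entity paths, writing one <pattern> per entry so that a match on any one entry is enough. The `(?A^...)` wrapper and `type="public"` are copied verbatim from the example; this page does not document their meaning, so check the product documentation before changing them.

```python
import xml.etree.ElementTree as ET

# ElementTree cannot emit a DOCTYPE, so the header is prepended as text.
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE grammars SYSTEM "edk.dtd">\n'


def build_grammar(grammar_name: str, entity_name: str, library_refs: list[str]) -> str:
    """Build grammar XML whose entity matches any one of the library entries."""
    grammars = ET.Element("grammars")
    grammar = ET.SubElement(grammars, "grammar", name=grammar_name)
    entity = ET.SubElement(grammar, "entity", name=entity_name, type="public")
    for ref in library_refs:
        # One <pattern> per library entry: patterns are alternatives.
        ET.SubElement(entity, "pattern").text = f"(?A^{ref})"
    ET.indent(grammars)  # requires Python 3.9 or later
    return HEADER + ET.tostring(grammars, encoding="unicode")


names_all_xml = build_grammar("names_all", "company/all", [
    "company/all/engca",
    "company/major_company/engus",
    "company/fortune_500_2011/engus",
    "company/forbes_largest_private_companies_2010/engus",
])
```

Printing `names_all_xml` shows the same structure as the example above. Building the tree with ElementTree, rather than concatenating strings, means any special characters in the values are escaped automatically.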
Before you make your category available to the classification system, you should test that the rules and category are behaving as desired. You can use the following diagnostics:
© 2025 One Identity LLC. ALL RIGHTS RESERVED.