Personally Identifiable Information (PII)

Overview

Because data security requirements apply to any data element that qualifies as personally identifiable information (PII), it is common to want to identify, catalog, manage and report on PII metadata within a data cataloging and governance system like this one. This product has a number of features designed to support this requirement. To address PII requirements, this scenario walks you through the use of:

Data class definition, profiling and auto-tagging
The glossary for PII element terminology, standardization/normalization, and ontology or semantic architecture
Semantic usage and definition reporting

PII Data classes

MetaKarta is delivered with a fairly robust set of predefined data classes, many of which may be considered PII. The first thing we will do is identify which ones should be considered PII.

Create a Compound PII Data class

Compound data classes may be defined as the union of other data-detected and/or metadata-detected data classes. In this case, PII will be a compound data class consisting of several data-detected and metadata-detected data classes that are categorized as personally identifiable information and whose data should be hidden by default.

Go to MANAGE > Data classes.

Looking at the list, we have at least five data classes which could be PII:

Address Line
Gender
Last Name
US Postal Code
US Social Security Number

There may be others, but we will start with these for this exercise.

One of the powerful features of data classes is the ability to enable auto-tagging and auto-hiding of elements at the same time. To take advantage of these features, we will create a new PII data class that is a compound of the PII-related data classes listed above.
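As a conceptual illustration only, the union behavior of a compound data class can be sketched in Python. This is not MetaKarta code or its API; the `DataClass` type and the `effective_tags` function are hypothetical names introduced for this sketch, and the member list is the one from this exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClass:
    name: str
    # Names of the member data classes; empty for a simple (non-compound) class.
    compound_of: frozenset = frozenset()


def effective_tags(detected, data_classes):
    """Return the detected class names plus every compound class containing one."""
    tags = set(detected)
    for dc in data_classes:
        if dc.compound_of & tags:
            tags.add(dc.name)
    return tags


# Hypothetical model of the compound PII data class built in this exercise.
PII = DataClass(
    "PII",
    frozenset({
        "Address Line",
        "Gender",
        "Last Name",
        "US Postal Code",
        "US Social Security Number",
    }),
)

# A field detected as a US Social Security Number also receives the PII tag.
print(sorted(effective_tags({"US Social Security Number"}, [PII])))
```

In this sketch, a field matching any one member class picks up the compound tag as well, which is the property the rest of the exercise relies on.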

Click +Add and enter the following:

The data class is a Compound data class consisting of the five types in the earlier list.

Specify Confidential as the DEFAULT SENSITIVITY and associate the PII-type data classes in the COMPOUND TYPES selection box.

Add all the data classes identified above.

Click SAVE.

The Hide Data setting is assigned to this sensitivity label. This way, when a data element is tagged with the PII data class, it receives the data class's default sensitivity label, and thus its data is also hidden from casual users who do not have the Data Management capability object role assignment.
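The hiding rule described here can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not the product's implementation: `SensitivityLabel`, `Field` and `visible_samples` are hypothetical names, the label name is the one chosen as DEFAULT SENSITIVITY above, and the sample value is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SensitivityLabel:
    name: str
    hide_data: bool  # mirrors the Hide Data setting on the label


@dataclass
class Field:
    name: str
    sample_values: List[str]
    label: Optional[SensitivityLabel] = None


def visible_samples(field, capabilities):
    """Return the sample values, or an empty list when the data must be hidden."""
    hidden = field.label is not None and field.label.hide_data
    if hidden and "Data Management" not in capabilities:
        return []
    return list(field.sample_values)


# Assumed setup: the PII data class assigns a label that has Hide Data enabled.
confidential = SensitivityLabel("Confidential", hide_data=True)
ssn = Field("SSN", ["000-00-0000"], label=confidential)

print(visible_samples(ssn, {"Data Viewer"}))      # []
print(visible_samples(ssn, {"Data Management"}))  # ['000-00-0000']
```

The sketch makes visibility depend on the label rather than on the field itself, so changing the label's Hide Data setting would affect every element tagged with it.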

Harvest with the Data class

Go to MANAGE > Configuration, select the Data Lake model and go to the Import Options tab. Note that this model is defined for Data Profiling and Sampling.

Click Import and be sure to check FULL SOURCE IMPORT INSTEAD OF INCREMENTAL to ensure that the cached copy is not simply reused.

Once the import has completed and the data has been profiled and sampled (check the Logs tab), we will see what was profiled and auto-tagged (and thus auto-hidden).
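To give a rough idea of what data-detected and metadata-detected tagging can mean, here is a minimal Python sketch. The regular expressions, the 0.8 threshold and the function names are all assumptions made for illustration; they are not the rules the product applies during profiling.

```python
import re

# Hypothetical patterns: one for sampled values, one for field names.
SSN_VALUE = re.compile(r"\d{3}-\d{2}-\d{4}")
SSN_NAME = re.compile(r"ssn|social.?security", re.IGNORECASE)


def data_detected(samples, pattern=SSN_VALUE, threshold=0.8):
    """True when at least `threshold` of the non-empty samples match the pattern."""
    values = [s for s in samples if s]
    if not values:
        return False
    hits = sum(1 for v in values if pattern.fullmatch(v))
    return hits / len(values) >= threshold


def metadata_detected(field_name, pattern=SSN_NAME):
    """True when the field name itself suggests the data class."""
    return pattern.search(field_name) is not None


# Placeholder sample values; empty strings are ignored.
print(data_detected(["123-45-6789", "987-65-4321", ""]))  # True
print(metadata_detected("SSN"))                            # True
```

Data detection looks at the sampled values, while metadata detection looks only at names, which is why a model must be set up for profiling and sampling before value-based tagging can occur.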

Analyze the Auto Tagging and Hiding Results

Go to WORKSHEETS > File > Fields.

Add the Data Classifications column to the Grid view and filter on Data Classifications = PII:

The result is a list of all the auto-tagged PII fields.

Click SSN to go to its object page.

This field was tagged as both US Social Security Number and PII, because PII is a compound of several types, including US Social Security Number.

If we sign in as a casual user, or even as a user with the Data Viewer capability object role assignment (e.g., Dan), we cannot see the profiling information, which demonstrates the auto-hiding feature.

Data classes in the Glossary

Defining a compound data class like PII, composed of different subtypes, is very important to support auto-tagging and auto-hiding. In addition, it can be important to build up an ontology in one or more business glossaries with terminology that reflects (and is linked to) the significant PII data classes. In this way, you will be able to use the semantic usage and definition reporting on terminology and data elements that are already auto-tagged, as well as those you manually tag, classify or semantically map.

Define PII Terminology

Here we will create the specific terms in the glossary from which we will build an ontology.

In the Finance glossary, create a new term named GDPR for the General Data Protection Regulation:

And create a Personally Identifiable Information (PII) term:

Define PII Ontology

Then create an association of More General to More Specific between GDPR and PII:

Then, you may create terms for all the other concepts associated with PII, such as Email Address, Address, US Social Security Number, etc.:

Create the associations of More General to More Specific between them. In the Semantic Mapping tab of the object page for GDPR we then have:

Link PII Terminology to Data classes

Go to MANAGE > Data classes again and click on US Social Security Number. Then, assign the US Social Security Number term to this data class.

Now, return to the Semantic Mapping tab of the object page for GDPR, where we then have:

Linking all the other terminology to data classes, we have:

With this many links, the diagram becomes less useful.

Click the List tab on the left to get a list of the semantic usage elements:

One may now also see Business Names and Definitions (based upon the terminology), along with anything that is downstream in terms of pass-through lineage.
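The way semantic usage follows the ontology can be sketched as a simple graph traversal in Python. This is an illustrative model only: the dictionaries and the `semantic_usage` function are hypothetical, placing the specific terms directly under PII is an assumption, and only the one term-to-data-class link named in this walkthrough is included.

```python
from collections import deque

# More General -> More Specific associations between glossary terms (assumed shape).
MORE_SPECIFIC = {
    "GDPR": ["PII"],
    "PII": ["Email Address", "Address", "US Social Security Number"],
}

# Term -> data class links; only the link named in the walkthrough is modeled.
TERM_TO_DATA_CLASS = {
    "US Social Security Number": "US Social Security Number",
}

# Data class -> auto-tagged fields; SSN is the field inspected earlier.
DATA_CLASS_TO_FIELDS = {
    "US Social Security Number": ["SSN"],
}


def semantic_usage(term):
    """Collect fields reachable from `term` through its more specific terms."""
    seen, queue, fields = {term}, deque([term]), []
    while queue:
        current = queue.popleft()
        data_class = TERM_TO_DATA_CLASS.get(current)
        if data_class:
            fields.extend(DATA_CLASS_TO_FIELDS.get(data_class, []))
        for child in MORE_SPECIFIC.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return fields


print(semantic_usage("GDPR"))  # ['SSN']
```

Starting from the most general term and walking down to the linked data classes is what lets a single term such as GDPR report on every field tagged under it.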