Manage reference data sets
Reference data sets contain relatively static data values that are commonly used across an organization. In Pentaho Data Catalog, you can create reference data sets that contain valid data values for your organization to reference.
Some examples of common reference data sets include:
Branch numbers
Country codes
Currencies
Exchange codes
Language codes
Measurement units
Postal codes
Product codes
Regions
Transaction codes
Add a category for reference data
Add categories to organize reference data sets into groups, enhancing the management and navigation of reference data sets.
Note: You must create at least one category before you can create a reference data set.
Perform the following steps to create a new category for reference data sets:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, click Actions > Add New Category.
In the Create Category dialog box, enter a Category Name and click Create.
The new category is added to the Reference Data menu.
Add one or more reference data sets to the new category.
Add a reference data set
Add reference data sets to Data Catalog to facilitate the classification and categorization of data across the organization.
To add a reference data set to Data Catalog, complete the following tasks, in order:
Create a reference data set
Create a reference data set to categorize enterprise data and maintain organizational consistency.
If you need a new category to contain the reference data set, you must create the category before creating the reference data set. For instructions about creating a new category, see Add a category for reference data.
Perform the following steps to create a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, click Actions > Add New DataSet.
The Create DataSet dialog box opens.
In the DataSet Name box, enter a name for the reference data set.
In the Parent list, select the category or reference data set that you want to be the parent of the new reference data set.
Note: Select a reference data set as a parent only for organizational purposes. Reference data sets do not inherit any properties or information from parent reference data sets.
Click Create.
A new, empty reference data set is created and the Summary tab for the new reference data set opens.
In the Description box, enter a description for the reference data set.
In the Purpose box, enter an explanation of the purpose for the reference data set.
(Optional) In the Properties box, update one or more of the following properties:
Sensitivity
Unknown (default)
Low
Medium
High
Status
Info (default)
Valid
Warning
Expired
Version
1.0 (default)
Note: The version number can only be increased.
Click Save.
The reference data set is created.
Add a schema to define and control the type of information contained in the reference data set.
Add schema to a reference data set
Add schema for a reference data set so that you can maintain data quality by standardizing and controlling what data values can be entered in the reference data set.
For example, you can add schema to specify that the value for a type of information is selected from a pre-defined list, and then specify the list of valid values.
CAUTION: You can add a schema that has the same values in all columns as an existing schema but a different unique identifier assigned to it in the system. If the duplicate schemas are used in different parts of an organization and one of them is updated, the reference data values that the schemas are meant to control might no longer be consistent across the organization. Before adding a new schema, verify that a schema with the same values does not already exist.
Tip: You can also import reference data schema and values in a CSV file or from a Data Catalog table by clicking Import to open the Import Reference Data wizard.
Perform the following steps to add schema to a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Schema tab.
In the Reference Data Schema table, click + Add Row.
In the new table row, update the following fields:
Column Name
A column name that represents the type of data that the schema controls.
Data Type
The type of data that can be entered as a value. Data Type options include:
Text
String
Integer
Float
Binary
Length
The number of characters that can be entered for the value.
Input Type
The input method that can be used to enter a value. Input Type options include:
Pre-defined
Free text
Valid Value
A comma-separated list of values that are valid as input. You must update the Valid Value field when the schema Input Type is Pre-defined. For example, to create a list of colors that a user can select from, you might enter the following list of valid values: red, yellow, blue.
Editable
A switch that can be toggled to specify whether the schema can be edited. Editable options are no and yes. You must have the Admin user role to specify whether a schema can be edited.
On the right side of the new table row, click Save.
The new schema is saved to the Reference Data Schema table and is added as a column to the Reference Data Values table on the Data Values tab.
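The way a schema row constrains values can be illustrated outside the product. The following Python sketch is not a Data Catalog API: the SchemaColumn class and the check_value function are hypothetical names, and the sketch assumes that Length is a maximum character count and that a Pre-defined column accepts only the entries listed in Valid Value.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaColumn:
    """Hypothetical model of one row in the Reference Data Schema table."""
    column_name: str
    data_type: str      # "Text", "String", "Integer", "Float", or "Binary"
    length: int         # assumed to be the maximum number of characters
    input_type: str     # "Pre-defined" or "Free text"
    valid_values: list[str] = field(default_factory=list)


def check_value(column: SchemaColumn, value: str) -> list[str]:
    """Return the problems found for a value; an empty list means it passes."""
    problems = []
    if len(value) > column.length:
        problems.append(f"{column.column_name}: longer than {column.length} characters")
    # Only numeric types get a parse check here; Text, String, and Binary
    # are accepted as-is in this sketch.
    if column.data_type == "Integer":
        try:
            int(value)
        except ValueError:
            problems.append(f"{column.column_name}: not an integer")
    elif column.data_type == "Float":
        try:
            float(value)
        except ValueError:
            problems.append(f"{column.column_name}: not a number")
    if column.input_type == "Pre-defined" and value not in column.valid_values:
        problems.append(f"{column.column_name}: not in the list of valid values")
    return problems


# Mirrors the color example from the Valid Value field description.
color = SchemaColumn("Color", "String", 10, "Pre-defined", ["red", "yellow", "blue"])
print(check_value(color, "red"))    # []
print(check_value(color, "green"))  # one problem: not in the list of valid values
```

A check like this can be useful for screening values in a spreadsheet before you enter or import them, but the validation that Data Catalog itself applies may differ.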
Add values to a reference data set
Populate a reference data set with values to serve as authoritative lookup references for fields that are governed by the reference data set.
CAUTION: You can add a reference data value that has the same values in all columns as an existing reference data value but a different unique identifier assigned to it in the system. If the duplicate values are used in different parts of an organization and one of them is updated, the reference data is no longer consistent across the organization. Before adding a new reference data value, verify that a reference data value with the same values does not already exist.
Tip: You can also import reference data schema and values in a CSV file or from a Data Catalog table by clicking Import to open the Import Reference Data wizard.
Perform the following steps to add values to a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Data Values tab.
Click + Add Row.
Note: If the value already exists in a row that is disabled, you can re-enable that row by toggling the Status switch to the Enabled position.
A row is added to the Reference Data Values table. Columns in the table correspond to the schemas that are defined on the Schema tab.
Update the new table row with values that adhere to the schema that controls each column.
On the right side of the new table row, click Save.
The new values are saved to the Reference Data Values table.
If you made multiple modifications to the Reference Data Values table, consider committing a new version of the reference data set.
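If you prepare values in a CSV file before importing them, a quick check for rows that match in every column can help you avoid the duplicates described in the caution for this task. The following Python sketch is an illustration only: the find_duplicate_rows function is a hypothetical helper, the sample data is invented, and the sketch assumes the file has a single header row. It does not reflect the exact CSV layout that the Import Reference Data wizard expects.

```python
import csv
import io
from collections import Counter


def find_duplicate_rows(csv_text: str) -> list[tuple[str, ...]]:
    """Return each data row that appears more than once, ignoring the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (assumed to be present)
    counts = Counter(
        tuple(cell.strip() for cell in row)
        for row in reader
        if row  # ignore blank lines
    )
    return [row for row, count in counts.items() if count > 1]


# Invented sample data with one row repeated.
sample = """code,name
USD,US Dollar
EUR,Euro
USD,US Dollar
"""

duplicates = find_duplicate_rows(sample)
print(duplicates)  # [('USD', 'US Dollar')]
```

This check compares only the rows within one file. It does not detect rows that duplicate values already stored in the reference data set.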
Update a reference data set
Update a reference data set to add or remove information, modify properties, or commit a new version of the reference data set.
You can update a reference data set by completing one or more of the following tasks:
Add a business term to a reference data set
Add a business term to a reference data set to clarify the context for using the data and to enhance organizational understanding of the data.
Perform the following steps to add a business term to a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Business Terms tab.
In the Business Terms tab, click Add Terms.
The Add Business Terms dialog box opens.
Navigate to the business term that you want to add to the reference data set and select it.
Click Add.
The business term is added to the reference data set and appears in the Business Terms table.
Update properties of a reference data set
Update the properties of a reference data set to reflect changes in data sensitivity and data status or to increment the version number.
Perform the following steps to update the properties of a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Update one or more of the following properties by clicking the pencil icon next to the property:
Sensitivity
Unknown (default)
Low
Medium
High
Status
Info (default)
Valid
Warning
Expired
Version
1.0 (default)
Note: The version number can only be increased.
Click Save.
The updates to the properties of the reference data set are saved.
Commit a new version of a reference data set
Consider committing a new version of a reference data set after you make multiple modifications to the reference data set.
Note: You can also increase the version number in the Properties pane of the Summary tab.
Perform the following steps to commit a new version of a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Data Values tab.
Click Commit.
The Please Confirm dialog box opens.
Confirm that you want to commit a new version of the reference data set by completing one of the following actions:
Keep the version number that is automatically generated by Data Catalog. The automatically generated version number increments the minor version number by 1. For example, for the version number 1.0, the automatically generated number is 1.1.
Enter a new version number.
Click Confirm.
The new version of the reference data set is committed.
You can verify that the version number changed in the Data Set pane at the top of the page.
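The versioning rules in this task (the automatically generated number increments the minor version by 1, and a version number can only be increased) can be sketched in Python as follows. This is an illustration and not Data Catalog code: the function names are hypothetical, and the sketch assumes that version numbers always have the form major.minor with whole-number parts, which the product might not require.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a 'major.minor' string into a pair of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def next_minor_version(current: str) -> str:
    """Increment the minor version number by 1, for example 1.0 to 1.1."""
    major, minor = parse_version(current)
    return f"{major}.{minor + 1}"


def is_increase(current: str, proposed: str) -> bool:
    """Check that a manually entered version is higher than the current one."""
    return parse_version(proposed) > parse_version(current)


print(next_minor_version("1.0"))  # 1.1
print(is_increase("1.1", "2.0"))  # True
print(is_increase("1.1", "1.0"))  # False
```

Comparing the parts as integers rather than as text means that, under this sketch's assumptions, 1.10 is treated as higher than 1.9.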
Delete a schema from a reference data set
Delete a schema that you do not want your organization to use in the reference data set.
Important: Deleted schema cannot be restored.
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that contains the schema that you want to delete, and then select the reference data set.
Click the Schema tab.
Navigate to the schema that you want to delete from the Reference Data Schema table.
On the right side of the table row with the schema, click the trash can icon.
The Confirm Deletion dialog box opens.
Click Confirm.
The schema is deleted from the reference data set and cannot be restored.
Remove values from a reference data set
Remove values from a reference data set that you do not want your organization to use for the type of data defined in the reference data set.
Important: Deleted values cannot be restored.
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Data Values tab.
Navigate to the value that you want to remove from the Reference Data Values table.
Remove the value by completing one of the following actions on the right side of the table row with the value:
To disable the value, click the pencil icon, and then toggle the Status switch to the Disabled position.
To delete the value, click the trash can icon.
The value is removed from the reference data set.
Comment on a reference data set
Comment on a reference data set so that you can collaborate on the contents or structure of the reference data set with other members of your organization.
Perform the following steps to comment on a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Comments tab.
(Optional) If the comment box is not visible, scroll to the bottom of the page.
In the comment box, enter a comment.
(Optional) Edit the comment by using one or more of the following options in the comment box toolbar:
Format the comment text by using text formatting options.
Attach a picture, video, or link by clicking the associated icon.
Apply a code tag to text by selecting the text and clicking the code icon.
Click Submit.
The comment is posted to the top of the Comments tab.
(Optional) To create a Jira Issue related to the comment, click + Create Jira Issue.
(Optional) To edit the comment, click the pencil icon.
(Optional) To delete the comment, click the trash can icon.
View activity for a reference data set
View activity to monitor changes to a reference data set, including the timing of the updates, the details of the modifications, and the systems or individuals involved in making the changes.
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to review, and then select the reference data set.
Click the Activity tab.
In the Activity Status table, navigate to the activity that you want to review.
Tip: You can use the search and filter functions to find a specific activity.
On the right side of the table row with the activity, click View.
The Details window opens with information about modifications to the reference data set.
Export a reference data set
Export a reference data set for external use, analysis, or archiving.
For example, you might import the reference data set into other systems for downstream comparisons. You can also edit the reference data in a spreadsheet and import the edited data back into Data Catalog.
Perform the following steps to export a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to export, and then select the reference data set.
In the Data Set pane, click Actions > Export.
The reference data set is exported as a CSV file and downloaded to your default download directory.
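As one example of a downstream comparison, the following Python sketch reads an exported CSV file and reports codes from another system that do not appear in the reference data set. The sketch makes several assumptions: the column names country_code and country_name, the sample rows, and the load_column helper are all hypothetical, and the actual layout of the exported file might differ.

```python
import csv
import io


def load_column(csv_text: str, column: str) -> set[str]:
    """Collect the distinct values of one column from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader}


# Invented stand-in for the contents of an exported reference data set.
exported = """country_code,country_name
US,United States
DE,Germany
"""

# Invented codes found in a downstream system.
downstream_codes = {"US", "DE", "XX"}

reference_codes = load_column(exported, "country_code")
unknown = downstream_codes - reference_codes
print(unknown)  # {'XX'}
```

To run the same comparison on a real export, read the downloaded file with open() and pass its contents to load_column with the column name that appears in your file.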
Delete a reference data set
Delete a reference data set to permanently remove it from Data Catalog.
Important: Deleted reference data sets cannot be restored.
You must have the Admin user role to delete a reference data set.
Perform the following steps to delete a reference data set:
Click Reference Data in the left navigation menu.
The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to delete.
Click the trash can icon next to the reference data set.
The Confirm Deletion dialog box opens.
Click Confirm.
The reference data set is deleted and cannot be restored.