Examples

Your Pentaho distribution includes several sample transformations and datasets in the design-tools/data-integration/samples/transformations/discover-metadata-from-textfile directory.

The following code is a portion of the Sample1.txt file found in the directory:

policyID,county,eq_site_limit,eq_site_deductible,point_longitude
710400,CLAY COUNTY,0,0,-81.71624
703001,CLAY COUNTY,0,0,-81.706865
352792,CLAY COUNTY,0,0,-81.718452
717603,CLAY COUNTY,0,0,-81.718452
937659,SUWANNEE COUNTY,0,0,-82.926659
294022,SUWANNEE COUNTY,0,0,-82.926659
410500,SUWANNEE COUNTY,0,0,-82.926659
524433,SUWANNEE COUNTY,218475,0,-82.926155
972562,SUWANNEE COUNTY,0,0,-82.933777

When the step is run, the file is scanned to determine a consistent number of fields using the tab character, then the semi-colon, then the comma (default). When the Header column name detection strategy is set to First possible line containing only strings, the step identifies the first row as the header row. The following table shows the column names and data types.

column name

data type

policylD2

Integer

county

String

eq_site_limit

BigNumber

eq_site_deductible

Integer

point_longitude

BigNumber

If any of the fields in the first row are numbers or dates, the row is considered data, which means there is no header row in this example.

PreviousDelimiter and data type detection rules NextData lineage

Last updated 23 days ago

Was this helpful?