10.2 Data Integration
  • Pentaho Documentation
  • Pentaho Data Integration
    • Starting the PDI client
    • Use the PDI client perspectives
    • Customize the PDI client
  • Use a Pentaho Repository in PDI
    • Create a connection in the PDI client
    • Connect to a Pentaho Repository
    • Manage repositories in the PDI client
    • Unsupported repositories
    • Use the Repository Explorer
      • Access the Repository Explorer window
      • Create a new folder in the repository
      • Open a folder, job, or transformation
      • Rename a folder, job, or transformation
      • Delete a folder, job, or transformation
      • Move objects
      • Restore objects
      • Use Pentaho Repository access control
      • Use version history
  • Scheduler perspective in the PDI client
    • Schedule a transformation or job
    • Edit a scheduled run of a transformation or job
    • Stop a schedule from running
    • Enable or disable a schedule from running
    • Delete a scheduled run of a transformation or job
    • Refresh the schedule list
  • Streaming analytics
    • Get started with streaming analytics in PDI
    • Data ingestion
    • Data processing
  • Data Integration perspective in the PDI client
    • Basic concepts of PDI
    • Work with transformations
      • Create a transformation
      • Open a transformation
      • Rename a folder
      • Save a transformation
      • Run your transformation
        • Run configurations
          • Select an Engine
        • Options
        • Parameters and Variables
        • Analyze your transformation results
      • Stop your transformation
      • Configure transformation properties
      • Use the Transformation menu
    • Work with jobs
      • Create a job
      • Open a job
      • Rename a folder
      • Save a job
      • Run your job
        • Run configurations
          • Pentaho engine
        • Options
        • Parameters and variables
      • Stop your job
      • Configure job properties
      • Use the Job menu
    • Add notes to transformations and jobs
      • Create a note
      • Edit a note
      • Reposition a note
      • Delete a note
    • Connecting to Virtual File Systems
      • Before you begin
        • Access to Google Cloud
        • Access to HCP REST
        • Access to Microsoft Azure
      • Create a VFS connection
      • Edit a VFS connection
      • Delete a VFS connection
      • Access files with a VFS connection
      • Pentaho address to a VFS connection
      • Create a VFS metastore
        • Enable a VFS metastore
        • Metastore configuration
      • Steps and entries supporting VFS connections
      • VFS browser
        • Before you begin
          • Access to a Google Drive
        • Access files with the VFS browser
        • Supported steps and entries
        • Configure VFS options
    • Logging and performance monitoring
      • Set up transformation logging
      • Set up job logging
      • Logging levels
      • Monitor performance
        • Sniff Test tool
        • Monitoring tab
        • Use performance graphs
      • PDI performance tuning tips
      • Logging best practices
    • Advanced topics
      • Understanding PDI data types and field metadata
        • Data type mappings
          • Using the correct data type for math operations
        • Using the fields table properties
          • Applying formatting
          • Applying calculations and rounding
        • Output type examples
      • PDI run modifiers
        • Arguments
        • Parameters
          • VFS properties
        • Variables
          • Environment variables
          • Kettle Variables
            • Set Kettle variables in the PDI client
            • Set Kettle variables manually
            • Set Kettle or Java environment variables in the Pentaho MapReduce job entry
            • Set the LAZY_REPOSITORY variable in the PDI client
          • Internal variables
      • Use checkpoints to restart jobs
      • Use the SQL Editor
      • Use the Database Explorer
      • Transactional databases and job rollback
        • Make a transformation database transactional
        • Make a job database transactional
      • Web services steps
  • Advanced Pentaho Data Integration topics
    • PDI and Hitachi Content Platform (HCP)
    • Hierarchical data
      • Hierarchical data path specifications
    • PDI and Snowflake
      • Snowflake job entries in PDI
    • Copybook steps in PDI
      • Copybook transformation steps in PDI
      • Metadata discovery
    • Work with the Streamlined Data Refinery
      • How does SDR work?
        • App Builder, CDE, and CTools
          • Get started with App Builder
          • Community Dashboard Editor and CTools
      • Install and configure the Streamlined Data Refinery
        • Installing and configuring the SDR sample
          • Install Pentaho software
          • Download and install the SDR sample
        • Configure KTR files for your environment
        • Clean up the All Requests Processed list
        • Install the Vertica JDBC driver
        • Use Hadoop with the SDR
        • App endpoints for SDR forms
        • App Builder and Community Dashboard Editor
          • Get started with App Builder
          • Community Dashboard Editor and CTools
      • Use the Streamlined Data Refinery
        • How to use the SDR sample form
          • Edit the Movie Ratings - SDR Sample form
        • Building blocks for the SDR
          • Use the Build Model job entry for SDR
            • Create a Build Model job entry
            • Select existing model options
            • Variables for Build Model job entry
          • Using the Annotate Stream step
            • Use the Annotate Stream step
              • Creating measures on stream fields
                • Create a measure on a stream field
              • Creating attributes
                • Create an attribute on a field
              • Creating link dimensions
                • Create a link dimension
                • Create a dimension key
            • Creating annotation groups
              • Create an annotation group for sharing with other users
              • Create an annotation group locally
            • Metadata injection support
          • Using the Shared Dimension step for SDR
            • Create a shared dimension
            • Create a dimension key in Shared Dimension step
            • Metadata injection support
          • Using the Publish Model job entry for SDR
            • Use the Publish Model job entry
    • Use Command Line Tools to Run Transformations and Jobs
      • Startup script options
      • Pan Options and Syntax
      • Pan Status Codes
      • Kitchen Options and Syntax
      • Kitchen Status Codes
      • Import KJB or KTR Files From a Zip Archive
      • Connect to a Repository with Command-Line Tools
      • Export Content from Repositories with Command-Line Tools
    • Using Pan and Kitchen with a Hadoop cluster
      • Using the PDI client
      • Using the Pentaho Server
    • Use Carte Clusters
      • About Carte Clusters
      • Set up a Carte cluster
        • Carte cluster configuration
          • Configure a static Carte cluster
          • Configure a Dynamic Carte Cluster
            • Configure a Carte Master Server
            • Configure Carte slave servers
            • Tuning Options
          • Configure Carte servers for SSL
          • Configure Carte servers for JAAS
          • Change Jetty Server Parameters
            • In the Carte Configuration file
            • In the Kettle Configuration file
        • Initialize Slave Servers
        • Create a cluster schema
        • Run transformations in a cluster
      • Schedule Jobs to Run on a Remote Carte Server
      • Stop Carte from the Command Line Interface or URL
      • Run Transformations and Jobs from the Repository on the Carte Server
    • Connecting to a Hadoop cluster with the PDI client
      • Audience and prerequisites
      • Using the pre-installed Apache Hadoop driver
      • Using the Apache Vanilla Hadoop driver
      • Install a driver for the PDI client
        • Configure CDP Public Cloud cluster with the PDI client
      • Adding a cluster connection
        • Import a cluster connection
        • Manually add a cluster connection
        • Add security to cluster connections
          • Specify Kerberos security
        • Test a cluster connection
      • Managing Hadoop cluster connections
        • Edit Hadoop cluster connections
        • Duplicate a Hadoop cluster connection
        • Delete a Hadoop cluster connection
      • Connect other Pentaho components to a cluster
    • Partitioning data
      • Get started
        • Partitioning during data processing
        • Understand repartitioning logic
        • Partitioning data over tables
      • Use partitioning
        • Use data swimlanes
        • Rules for partitioning
      • Partitioning clustered transformations
      • Learn more
    • Pentaho Data Services
      • Creating a regular or streaming Pentaho Data Service
        • Data service badge
      • Open or edit a Pentaho Data Service
      • Delete a Pentaho Data Service
      • Test a Pentaho Data Service
        • Run a basic test
        • Run a streaming optimization test
        • Run an optimization test
        • Examine test results
          • Pentaho Data Service SQL support reference and other development considerations
            • Supported SQL literals
            • Supported SQL clauses
            • Other development considerations
      • Optimize a Pentaho Data Service
        • Apply the service cache optimization
          • How the service cache optimization technique works
          • Adjust the cache duration
          • Disable the cache
          • Clear the cache
        • Apply a query pushdown optimization
          • How the query pushdown optimization technique works
          • Add the query pushdown parameter to the Table Input or MongoDB Input steps
          • Set up query pushdown parameter optimization
          • Disable the query pushdown optimization
        • Apply a parameter pushdown optimization
          • How the parameter pushdown optimization technique works
          • Add the parameter pushdown parameter to the step
          • Set up parameter pushdown optimization
        • Apply streaming optimization
          • How the streaming optimization technique works
          • Adjust the row or time limits
      • Publish a Pentaho Data Service
      • Share a Pentaho Data Service with others
        • Share a Pentaho Data Service with others
        • Connect to the Pentaho Data Service from a Pentaho tool
        • Connect to the Pentaho Data Service from a Non-Pentaho tool
          • Step 1: Download the Pentaho Data Service JDBC driver
            • Download using the PDI client
            • Download manually
          • Step 2: Install the Pentaho Data Service JDBC driver
          • Step 3: Create a connection from a non-Pentaho tool
        • Query a Pentaho Data Service
          • Example
      • Monitor a Pentaho Data Service
    • Data lineage
      • Sample use cases
      • Architecture
      • Setup
      • API
      • Steps and entries with custom data lineage analyzers
      • Contribute additional step and job entry analyzers to the Pentaho Metaverse
        • Examples
          • Create a new Maven project
          • Add dependencies
          • Create a class which implements IStepAnalyzer
          • Create the Blueprint configuration
          • Build and test your bundle
          • See it in action
        • Different types of step analyzers
          • Field manipulation
          • External resource
          • Connection-based external resource
          • Adding analyzers from existing PDI plug-ins (non-OSGi)
    • Use the Pentaho Marketplace to manage plugins
      • View installed plugins and versions
      • Install plugins
  • Troubleshooting possible data integration issues
    • Troubleshooting transformation steps and job entries
      • 'Missing plugins' error when a transformation or job is opened
      • Cannot execute or modify a transformation or job
      • Step is already on canvas error
    • Troubleshooting database connections
      • Unsupported databases
      • Database locks when reading and updating from a single table
      • Force PDI to use DATE instead of TIMESTAMP in Parameterized SQL queries
      • PDI does not recognize changes made to a table
    • Jobs scheduled on Pentaho Server cannot execute transformation on remote Carte server
    • Cannot run a job in a repository on a Carte instance from another job
    • Troubleshoot Pentaho data service issues
    • Kitchen and Pan cannot read files from a ZIP export
    • Using ODBC
    • Improving performance when writing multiple files
    • Snowflake timeout errors
    • Log table data is not deleted
  • PDI transformation steps
    • Abort
      • General
      • Options
      • Logging
    • Add a Checksum
      • Options
      • Example
      • Metadata injection support
    • Add sequence
      • General
      • Database generated sequence
      • PDI transformation counter generated sequence
    • AMQP Consumer
      • Before you begin
      • General
        • Create and save a new child transformation
      • Options
        • Setup tab
        • Create a new AMQP Message Queue
        • Use an existing AMQP Message Queue
          • Specify Routing Keys
          • Specify Headers
        • Security tab
        • Batch tab
        • Fields tab
        • Result Fields tab
      • Metadata injection support
      • See also
    • AMQP Producer
      • Before you begin
      • General
      • Options
        • Setup tab
        • Security tab
      • Metadata injection support
      • See also
    • Avro Input
      • General
      • Options
        • Source tab
          • Embedded schema
          • Separate schema
        • Avro Fields tab
        • Lookup Fields tab
          • Sample transformation walkthrough using the Lookup field
      • Metadata injection support
    • Avro Output
      • General
      • Options
        • Fields tab
        • Schema tab
        • Options tab
      • Metadata injection support
    • Calculator
      • General
      • Options
        • Calculator functions list
      • Troubleshooting the Calculator step
        • Length and precision
        • Data Types
        • Rounding method for the Round (A, B) function
    • Catalog Input
    • Catalog Output
    • Common Formats
      • Date formats
      • Number formats
    • Copybook Input
      • Before you begin
      • General
      • Options
        • Input tab
        • Output tab
        • Options tab
      • Use Error Handling
      • Metadata injection support
    • CouchDB Input
      • Options
      • Metadata injection support
    • CSV File Input
      • Options
      • Fields
      • Metadata injection support
    • Data types
    • Delete
      • General
      • The key(s) to look up the value(s) table
      • Metadata injection support
    • Discover metadata from a text file
      • General
      • Options
        • Input tab
        • Delimiter candidates tab
        • Enclosure candidates tab
        • Escape candidates tab
        • Delimiter and data type detection rules
      • Examples
      • Data lineage
      • Metadata injection support
    • Elasticsearch REST bulk insert
      • Before you begin
      • General
      • Options
        • General tab
        • Document tab
          • Creating a document to index with stream field data
          • Using an existing JSON document from a field
        • Output tab
    • ETL metadata injection
      • General
      • Options
        • Inject Metadata tab
          • Specify the source field
          • Injecting metadata into the ETL Metadata Injection step
        • Options tab
      • Example
        • Input data
        • Transformations
        • Results
      • Reference links
        • Articles
        • Video
      • Steps supporting metadata injection
    • Execute Row SQL Script
      • General
        • Output fields
      • Metadata injection support
    • Execute SQL Script
      • Notes
      • General
      • Options
        • Optional statistic fields
      • Example
      • Metadata injection support
    • Extract to Rows
      • Options
      • Fields
      • Example
    • File exists (Step)
    • Generate rows
      • Options
      • Fields table
    • Get records from stream
      • General
      • Options
      • Metadata injection support
      • See also
    • Get rows from result
      • General
      • Options
      • Metadata injection support
    • Get System Info
      • General
        • Data types
      • Metadata injection support
    • Google Analytics v4
      • General
      • Before you begin
      • Options
        • Connection tab
        • Date ranges tab
        • Fields tab
        • Filters tab
        • Options tab
    • Group By
      • General
        • The fields that make up the group table
        • Aggregates table
      • Examples
      • Metadata injection support
    • Hadoop File Input
      • General
      • Options
        • File tab
          • Accepting file names from a previous step
          • Show action buttons
          • Selecting a file using regular expressions
        • Open file
        • Content tab
        • Error Handling tab
        • Filters tab
        • Fields tab
          • Number formats
          • Scientific notation
          • Date formats
      • Metadata injection support
    • Hadoop File Output
      • General
      • Options
        • File tab
        • Content tab
        • Fields tab
      • Metadata injection support
    • HBase Input
      • General
      • Options
        • Configure query tab
          • Key fields table
        • Create/Edit mappings tab
          • Fields
          • Additional notes on data types
        • Filter result set tab
          • Fields
      • Namespaces
      • Performance considerations
      • Metadata injection support
    • HBase Output
      • General
      • Options
        • Configure connection tab
        • Create/Edit mappings tab
      • Performance considerations
      • Metadata injection support
    • HBase row decoder
      • General
      • Options
        • Configure fields tab
        • Create/Edit mappings tab
          • Key fields table
            • Additional notes on data types
      • Using HBase Row Decoder with Pentaho MapReduce
      • Metadata injection support
    • Hierarchical JSON Input
      • General
      • Options
      • Examples
    • Hierarchical JSON Output
      • General
      • Fields
    • Java filter
      • General
      • Options
      • Filter expression examples
    • JMS Consumer
      • Before you begin
      • General
        • Create and save a new child transformation
      • Options
        • Setup tab
        • Security tab
        • Batch tab
        • Fields tab
        • Result fields tab
      • Metadata injection support
      • See also
    • JMS Producer
      • Before you begin
      • General
      • JMS connection information
        • Setup tab
        • Security tab
        • Options tab
        • Properties tab
      • Metadata injection support
      • See also
    • Job Executor
      • Samples
      • General
      • Options
        • Parameters tab
        • Execution results tab
        • Row grouping tab
        • Results rows tab
        • Result files tab
    • JSON Input
      • General
      • Options
        • File tab
          • Selected files table
        • Content tab
        • Fields tab
          • Select fields
        • Additional output fields tab
      • Examples
      • Metadata injection support
    • Kafka consumer
      • General
        • Create and save a new child transformation
      • Options
        • Setup tab
        • Batch tab
        • Fields tab
        • Result fields tab
        • Options tab
        • Offset Settings tab
          • Modes
      • Security
        • Using SSL
        • Using SASL
        • Using SASL SSL
      • Metadata injection support
      • See also
    • Kafka Producer
      • General
      • Options
        • Setup tab
        • Options tab
      • Security
        • Using SSL
        • Using SASL
        • Using SASL SSL
      • Metadata injection support
      • See also
    • Kinesis consumer
      • General
        • Create and save a new child transformation
      • Options
        • Setup tab
        • Batch tab
        • Fields tab
        • Result fields tab
        • Options tab
      • Metadata injection support
      • See also
    • Kinesis Producer
      • General
      • Options
        • Setup tab
        • Options tab
      • Metadata injection support
      • See also
    • Mapping
      • General
        • Log lines in Kettle
      • Options
        • Parameters tab
        • Input tab
        • Add inputs to table
        • Output tab
      • Mapping Input Specification
        • Options
      • Mapping Output Specification
        • Options
      • Samples
    • MapReduce Input
      • Options
      • Metadata injection support
    • MapReduce Output
      • Options
      • Metadata injection support
    • Memory Group By
      • General
        • The fields that make up the Group Table
        • Aggregates table
      • Metadata injection support
    • Merge rows (diff)
      • General
      • Options
      • Examples
      • Metadata injection support
    • Microsoft Access input
      • Microsoft Access input
      • Options
        • File tab
        • Content tab
        • Fields tab
        • Additional output fields tab
      • Metadata injection support
    • Microsoft Excel Input
      • General
      • Options
        • Files tab
          • Selected files table
        • Sheets tab
        • Content tab
        • Error Handling tab
        • Fields tab
        • Additional output fields tab
      • Metadata injection support
    • Microsoft Excel Output
    • Microsoft Excel writer
      • General
      • Options
        • File & Sheet tab
          • File panel
          • Sheet panel
          • Template panel
        • Content tab
          • Content options panel
          • When writing to existing sheet panel
          • Fields panel
      • Metadata injection support
    • Modified Java Script Value
      • General
      • Java script functions pane
      • Java Script pane
        • Script types
        • Fields table
        • Modify values
      • JavaScript Internal API Objects
      • Examples
        • Check for the existence of fields in a row
        • Add a new field in a row
        • Use NVL in JavaScript
        • Split fields
        • Comparing values
        • String values
        • Numeric values
        • Filter rows
      • Sample transformations
    • Modify values from a single row
      • General
      • Targets
      • Example
    • Modify values from grouped rows
      • General
      • Grouping fields
      • Modifications
      • Example
    • Mondrian Input
      • General
    • MongoDB Execute
      • General
      • Options
        • Main tab
        • Step tab
      • Execute commands
        • Database commands
        • Collection commands
      • Example of Execute step
      • Metadata injection support
    • MongoDB Input
      • General
      • Options
        • Configure connection tab
        • Input options tab
          • Tag set specification table
        • Query tab
        • Fields tab
      • Examples
        • Query expression
        • Aggregate pipeline
      • Metadata injection support
    • MongoDB Output
      • General
      • Options
        • Configure connection tab
        • Output options tab
        • Mongo document fields tab
          • Example
            • Input data
            • Document field definitions
            • Document structure
        • Create/drop indexes tab
          • Create/drop indexes example
      • Metadata injection support
    • MQTT Consumer
      • General
        • Create and save a new child transformation
      • Options
        • Setup tab
        • Security tab
        • Batch tab
        • Fields tab
        • Result fields tab
        • Options tab
      • Metadata injection support
      • See also
    • MQTT Producer
      • General
      • Options
        • Setup tab
        • Security tab
        • Options tab
      • Metadata injection support
      • See also
    • ORC Input
      • Options
        • Fields
          • ORC types
      • Metadata injection support
    • ORC Output
      • General
      • Options
        • Fields tab
          • ORC types
        • Options tab
      • Metadata injection support
    • Parquet Input
      • General
        • Fields
          • PDI types
      • Metadata injection support
    • Parquet Output
      • General
      • Options
        • Fields tab
        • Options tab
      • Metadata injection support
    • Pentaho Reporting Output
      • General
      • Metadata injection support
    • Python Executor
      • Before you begin
      • General
      • Options
        • Script tab
          • Source panel
        • Input tab
          • Row by row processing
          • All rows processing
          • Mapping data types from PDI to Python
        • Output tab
          • Variable to fields processing
          • Frames to fields processing
          • Mapping data types from Python to PDI
    • Query HCP
      • Before you begin
      • General
      • Options
        • Query tab
        • Output tab
      • See also
    • Query metadata from a database
      • General
      • Options
        • Connection tab
        • Input tab
        • Fields tab
      • Metadata injection support
    • Read Metadata
    • Read metadata from Copybook
      • General
      • Example
      • Metadata injection support
    • Read metadata from HCP
      • General
      • Options
      • See also
    • Regex Evaluation
      • General
        • Capture Group Fields table
      • Options
        • Settings tab
          • Regular expression evaluation window
        • Content tab
      • Examples
    • Replace in String
      • General
      • Fields string table
      • Example: Using regular expression group references
      • Metadata injection support
      • See also
    • REST client step
      • General
      • Options
        • General tab
        • Authentication tab
        • SSL tab
        • Headers tab
        • Parameters tab
        • Matrix Parameters tab
    • Row Denormaliser
      • General
        • Group field table
        • Target fields table
      • Examples
      • Metadata injection support
    • Row Flattener
      • General
      • Example
    • Row Normaliser
      • General
        • Fields table
      • Examples
      • Metadata injection support
    • S3 CSV Input
      • Options
        • Fields
      • AWS credentials
      • Metadata injection support
      • See also
    • S3 File Output
      • Big Data warning
      • General
      • Options
        • File tab
        • Content tab
        • Fields tab
      • AWS credentials
      • Metadata injection support
      • See also
    • Salesforce bulk operation
      • General
      • Options
        • Connection tab
        • Operation tab
        • Fields tab
        • Advanced tab
      • Metadata injection support
    • Salesforce Delete
      • General
      • Options
        • Connection
        • Settings
    • Salesforce Input
      • General
      • Options
        • Settings tab
          • Connection
          • Settings
        • Content tab
          • Advanced
          • Additional fields
          • Other Fields
        • Fields tab
      • Metadata injection support
    • Salesforce Insert
      • General
      • Options
        • Connection
        • Settings
        • Output Fields
        • Fields
    • Salesforce Update
      • General
      • Options
        • Connection
        • Settings
        • Fields
    • Salesforce Upsert
      • General
      • Options
        • Connection
        • Settings
        • Output Fields
        • Fields
    • Select Values
      • General
      • Options
        • Select & Alter tab
        • Edit Mapping
        • Remove tab
        • Meta-data tab
      • Examples
      • Metadata injection support
    • Set Field Value
      • General
      • Options
      • Metadata injection support
    • Set Field Value to a Constant
      • General
      • Options
      • Metadata injection support
    • Simple Mapping (sub-transformation)
      • General
        • Log lines in Kettle
      • Options
        • Parameters tab
        • Input tab
        • Output tab
    • Single Threader
      • General
      • Options
        • Options tab
        • Parameters tab
    • Sort rows
      • General
        • Options
        • Fields column settings
      • Metadata injection support
    • Split Fields
      • General
      • Fields table
      • Example
      • Metadata injection support
    • Splunk Input
      • Prerequisites
      • General
      • Options
        • Connection tab
        • Fields tab
      • Raw field parsing
      • Date handling
      • Metadata injection support
    • Splunk Output
      • Prerequisites
      • General
      • Options
        • Connection tab
        • Event tab
      • Metadata injection support
    • String Operations
      • General
      • The fields to process
      • Metadata injection support
    • Strings cut
      • General
      • The fields to cut
      • Example
      • Metadata injection support
    • Switch-Case
      • Options
      • Example
      • Metadata injection support
    • Table Input
      • General
      • Options
      • Example
      • Metadata injection support
    • Table Output
      • General
      • Options
        • Main options tab
        • Database fields tab
        • Enter Mapping window
      • Metadata injection support
    • Text File Input
      • General
      • Options
        • File tab
          • Regular expressions
          • Selected files table
          • Accept file names
          • Show action buttons
        • Content tab
        • Error Handling tab
        • Filters tab
        • Fields tab
        • Additional output fields tab
      • Metadata injection support
    • Text File Output
      • General
      • Options
        • File tab
        • Content tab
        • Fields tab
      • Metadata injection support
      • See also
    • Transformation Executor
      • Error handling and parent transformation logging notes
      • Samples
      • General
      • Options
        • Parameters tab
          • Order of processing
        • Execution results tab
        • Row grouping tab
        • Result rows tab
        • Result files tab
    • Unique Rows
      • Prerequisites
      • General
        • Settings
      • See also
    • Unique Rows (HashSet)
      • General
        • Settings
      • See also
    • User Defined Java Class
      • Not complete Java
      • General (User Defined Java Class)
        • Class Code (User Defined Java Class)
          • Process rows
          • Error handling
          • Logging
        • Class and code fragments
      • Options
        • Fields tab
        • Parameters tab
        • Info steps tab
        • Target steps tab
      • Examples
      • Metadata injection support
    • Write Metadata
    • Write metadata to HCP
      • General
      • Options
      • See also
    • XML Input Stream (StAX)
      • Samples
      • Options
      • Element blocks example
    • XML Output
      • General
      • Options
        • File tab
        • Content tab
        • Fields tab
      • Metadata injection support
  • PDI job entries
    • Amazon EMR Job Executor
      • Before you begin
      • General
      • Options
        • EMR settings tab
          • AWS connection
          • Cluster
        • Job settings tab
    • Amazon Hive Job Executor
      • Before you begin
      • General
      • Options
        • Hive settings tab
          • AWS connection
          • Cluster
        • Job settings tab
    • Bulk load into Amazon Redshift
      • Before you begin
      • General
      • Options
        • Input tab
        • Output tab
        • Options tab
        • Parameters tab
    • Bulk load into Azure SQL DB
      • Before you begin
      • General
      • Options
        • Input tab
        • Output tab
        • Options tab
        • Advanced options tab
    • Bulk load into Databricks
      • General
      • Options
        • Input tab
        • Output tab
    • Bulk load into Snowflake
      • Before you begin
      • General
      • Options
        • Input tab
        • Output tab
        • Options tab
        • Advanced options tab
    • Create Snowflake warehouse
      • General
      • Options
        • Database connection and warehouse
        • Warehouse settings
        • Cluster settings
        • Activity settings
    • Delete Snowflake warehouse
      • General
      • Options
    • File Exists (Job Entry)
    • Google BigQuery loader
      • Before you begin
      • General
      • Options
        • Setup tab
        • File tab
    • Hadoop Copy Files
      • General
      • Options
        • Files/Folders tab
        • Settings tab
    • Job (job entry)
      • General
      • Options
        • Options tab
        • Logging tab
        • Argument tab
        • Parameters tab
    • Kafka Offset
      • General
      • Options
        • Setup tab
        • Options tab
        • Offset Settings tab
      • Examples
    • Modify Snowflake warehouse
      • General
      • Options
        • Database connection and warehouse
        • Warehouse settings
        • Cluster settings
        • Activity settings
    • Pentaho MapReduce
      • General
      • Options
        • Mapper tab
        • Combiner tab
        • Reducer tab
        • Job Setup tab
        • Cluster tab
          • Hadoop cluster configuration
        • User Defined tab
      • Use PDI outside and inside the Hadoop cluster
        • Pentaho MapReduce workflow
          • PDI Transformation
          • PDI Job
        • PDI Hadoop job workflow
        • Hadoop to PDI data type conversion
        • Hadoop Hive-specific SQL limitations
        • Big data tutorials
    • Spark Submit
      • Before you begin
      • Install and configure Spark client for PDI use
        • Spark version 2.x.x
      • General
      • Options
        • Files tab
          • Java or Scala
          • Python
        • Arguments tab
        • Options tab
      • Troubleshooting your configuration
        • Running a Spark job from a Windows machine
    • Sqoop Export
      • General
        • Quick Setup mode
        • Advanced Options mode
    • Sqoop Import
      • General (Sqoop Import)
        • Quick Setup mode
        • Advanced Options mode
    • Start Snowflake warehouse
      • General
      • Options
    • Stop Snowflake warehouse
      • General
      • Options
    • Transformation (job entry)
      • General
      • Options
        • Options tab
        • Logging tab
        • Arguments tab
        • Parameters tab