Recommended PDI steps to use with Spark on AEL

The following PDI transformation steps are coded to run natively on Spark, so they are recommended when you want to access, process, and analyze large datasets with big data technologies on AEL.

  • Abort
  • AMQP Consumer
  • Avro Input
  • Avro Output
  • Copy rows to result
  • Dummy (do nothing)
  • ETL Metadata Injection
  • Filter Rows
  • Get records from stream
  • Get rows from result
  • Group By
  • Hadoop File Input
  • Hadoop File Output
  • HBase Input
  • HBase Output
  • Java filter
  • Join Rows (Cartesian product)
  • Kafka Consumer
  • Mapping (Sub-transformation)
  • Mapping Input Specification
  • Mapping Output Specification
  • Memory Group By
  • Merge Join
  • Merge Rows (diff)
  • MQTT Consumer
  • ORC Input
  • ORC Output
  • Parquet Input
  • Parquet Output
  • Simple Mapping
  • Sort rows
  • Stream Lookup
  • Switch / Case
  • Table Input
  • Table Output
  • Text File Input
  • Text File Output
  • Transformation Executor
  • Unique Rows
  • Unique Rows (HashSet)
  • Write to Log
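These steps translate well to Spark because their row-level semantics (filter, sort, group, join) map directly onto data-parallel operations. As an illustration only (this is plain Python, not Pentaho or Spark code, and the field names are hypothetical), a sketch of the dataflow expressed by chaining Filter Rows, Sort rows, and Group By:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows, as a Table Input step might produce them.
rows = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 250},
    {"region": "east", "sales": 75},
    {"region": "west", "sales": 50},
]

# Filter Rows: keep only rows matching a condition.
filtered = [r for r in rows if r["sales"] >= 60]

# Sort rows: streaming Group By requires input sorted on the group key.
filtered.sort(key=itemgetter("region"))

# Group By: aggregate sales per region.
totals = {
    region: sum(r["sales"] for r in group)
    for region, group in groupby(filtered, key=itemgetter("region"))
}

print(totals)  # {'east': 175, 'west': 250}
```

Each stage consumes and emits independent rows, which is what lets AEL distribute the same logic across Spark partitions.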


© 2025 Hitachi Vantara LLC