Recommended PDI steps to use with Spark on AEL

When you want to access, process, and analyze large datasets, the following PDI transformation steps have Spark implementations and work well with big data technologies.

  • Abort

  • AMQP Consumer

  • Avro Input

  • Avro Output

  • Copy rows to result

  • Dummy (do nothing)

  • ETL Metadata Injection

  • Filter Rows

  • Get records from stream

  • Get rows from result

  • Group By

  • Hadoop File Input

  • Hadoop File Output

  • HBase Input

  • HBase Output

  • Java filter

  • Join Rows (Cartesian product)

  • Kafka Consumer

  • Mapping (Sub-transformation)

  • Mapping Input Specification

  • Mapping Output Specification

  • Memory Group By

  • Merge Join

  • Merge Rows (diff)

  • MQTT Consumer

  • ORC Input

  • ORC Output

  • Parquet Input

  • Parquet Output

  • Simple Mapping

  • Sort rows

  • Stream Lookup

  • Switch / Case

  • Table Input

  • Table Output

  • Text File Input

  • Text File Output

  • Transformation Executor

  • Unique Rows

  • Unique Rows (HashSet)

  • Write to Log

Last updated 8 months ago

© 2025 Hitachi Vantara LLC