Steps using Dataset tuning options

As part of Spark tuning, you can use the Dataset tuning options with the following steps. The steps are grouped by step category, with the step names listed under each category.

Agile

  • MonetDB Agile Mart

  • Table Agile Mart

Big Data

  • Avro output

  • Cassandra Input

  • Cassandra Output

  • CouchDB Input

  • Hadoop file output

  • HBase output

  • HBase row decoder

  • MapReduce Input

  • MapReduce Output

  • MongoDB Input

  • MongoDB Output

  • Orc output

  • Parquet output

  • SSTable Output

Bulk loading

  • ElasticSearch Bulk Insert

  • Greenplum Load

  • Infobright Loader

  • Ingres VectorWise Bulk Loader

  • MonetDB Bulk Loader

  • MySQL Bulk Loader

  • Oracle Bulk Loader

  • PostgreSQL Bulk Loader

  • SAP HANA Bulk Loader

  • Teradata Fastload Bulk Loader

  • Teradata TPT Insert Upsert Bulk Loader

  • Vertica Bulk Loader

Cryptography

  • Decrypt files with PGP

  • Encrypt Files with PGP

  • Secret Key Generator

  • Symmetric Cryptography

Data Mining

  • ARFF Output

  • Knowledge Flow

  • Weka Forecasting

  • Weka Scoring

Data Warehouse

  • Combination lookup/update

  • Dimension lookup/update

Deprecated

  • Aggregate Rows

  • Example plugin

  • Get previous row fields

  • Greenplum Bulk loader

  • IBM WebSphere MQ Consumer

  • IBM WebSphere MQ Producer

  • JMS Consumer (deprecated)

  • JMS Producer (deprecated)

  • LucidDB Bulk Loader

  • LucidDB Streaming Loader

  • OpenERP Object Delete

  • OpenERP Object Input

  • OpenERP Object Output

  • Palo Cell Input

  • Palo Cell Output

  • Palo Dimension Input

  • SAP Input

  • Text file output (deprecated)

Experimental

  • Script

  • SFTP Put

Flow

  • Abort

  • Annotate stream

  • Blocking step

  • Detect empty stream

  • Dummy

  • ETL metadata injection

  • Filter rows

  • Identify last row in a stream

  • Java filter

  • Shared Dimension

  • Switch / Case

  • Transformation executor

Inline

  • Injector

  • Socket reader

  • Socket writer

Input

  • CSV file input

  • Data Grid

  • De-serialize from file

  • Email messages input

  • ESRI Shapefile Reader

  • Fixed file input

  • Generate random credit card numbers

  • Generate random value

  • Generate rows

  • Get data from XML

  • Get File Names

  • Get File Rows Count

  • Get repository names

  • Get SubFolder names

  • Get System Info

  • Get table names

  • Google Analytics

  • Google Docs Input

  • GZIP CSV Input

  • HL7 Input

  • JMS Consumer

  • JSON Input

  • LDAP Input

  • LDIF Input

  • Load file content in memory

  • Microsoft Access Input

  • Microsoft Excel Input

  • Mondrian Input

  • OLAP Input

  • Property Input

  • RSS Input

  • Salesforce Input

  • SAS Input

  • XBase Input

  • XML Input Stream (StAX)

  • Yaml Input

Job

  • Copy rows to result

  • Get files from result

  • Get rows from result

  • Get Session Variables

  • Set files in result

  • Set Session Variables

Joins

  • Join rows

  • Merge join

  • Merge rows (diff)

  • Multiway Merge Join

  • Sorted Merge

  • XML Join

Lookup

  • Call DB Procedure

  • Check if a column exists

  • Check if file is locked

  • Check if webservice is available

  • Database join

  • Database lookup

  • Dynamic SQL row

  • File exists

  • Fuzzy match

  • HTTP client

  • HTTP Post

  • MaxMind GeoIP Lookup

  • REST Client

  • Stream lookup

  • Table exists

  • Web services lookup

Mapping

  • Mapping

  • Mapping input specification

  • Mapping output specification

  • Simple mapping

N/A

  • Spark Special - FileInputResolver

  • Spark Special - GenericSparkOperation

  • Spark Special - RecordsFromStreamSparkOperation

Output

  • Automatic Documentation Output

  • Delete

  • Insert / Update

  • JSON output

  • LDAP Output

  • Microsoft Access Output

  • Microsoft Excel Output

  • Microsoft Excel Writer

  • Pentaho Reporting Output

  • Properties Output

  • RSS Output

  • Salesforce Delete

  • Salesforce Insert

  • Salesforce Update

  • Salesforce Upsert

  • Serialize to file

  • SQL File Output

  • Synchronize after merge

  • Table output

  • Text file output

  • Update

  • XML Output

Pentaho Server

  • Call Endpoint

  • Get Session Variables

  • Set Session Variables

Scripting

  • Execute row SQL script

  • Execute SQL script

  • Formula

  • Modified Java Script Value

  • Python Executor

  • Regex Evaluation

  • Rule Accumulator

  • Rule Executor

  • User Defined Java Class

  • User Defined Java Expression

Statistics

  • Analytic Query

  • Group by

  • Memory group by

  • Output steps metrics

  • R script executor

  • Reservoir Sampling

  • Sample rows

  • Univariate Statistics

Streaming

  • AMQP Producer

  • JMS Producer

  • Kafka Producer

  • Get records from stream

  • MQTT producer

Transform

  • Add a Checksum

  • Add constants

  • Add sequence

  • Add value fields changing sequence

  • Add XML

  • Calculator

  • Closure Generator

  • Concat Fields

  • Get ID from slave server

  • Number range

  • Replace in string

  • Row denormaliser

  • Row flattener

  • Row Normaliser

  • Select values

  • Set field value

  • Set field value to a constant

  • Sort rows

  • Split field to rows

  • Split Fields

  • Splunk Input

  • Splunk Output

  • String operations

  • Strings cut

  • Unique rows

  • Unique rows (Hashset)

  • Value Mapper

  • XSL Transformation

Utility

  • Change file encoding

  • Clone row

  • Delay row

  • Edi to XML

  • Execute a process

  • If field value is null

  • Mail

  • Metadata structure of stream

  • Null if...

  • Process files

  • Run SSH commands

  • Send message to Syslog

  • Table Compare

  • Write to log

  • Zip File

Validation

  • Credit card validator

  • Data Validator

  • Mail Validator

  • XSD Validator