Steps using Dataset tuning options
As part of Spark tuning, you can apply the Dataset tuning options to the following steps, grouped by step category.
Step category
    Step name

Agile
    MonetDB Agile Mart
    Table Agile Mart

Big Data
    Avro output
    Cassandra Input
    Cassandra Output
    CouchDB Input
    Hadoop file output
    HBase output
    HBase row decoder
    MapReduce Input
    MapReduce Output
    MongoDB Input
    MongoDB Output
    Orc output
    Parquet output
    SSTable Output

Bulk loading
    ElasticSearch Bulk Insert
    Greenplum Load
    Infobright Loader
    Ingres VectorWise Bulk Loader
    MonetDB Bulk Loader
    MySQL Bulk Loader
    Oracle Bulk Loader
    PostgreSQL Bulk Loader
    SAP HANA Bulk Loader
    Teradata Fastload Bulk Loader
    Teradata TPT Insert Upsert Bulk Loader
    Vertica Bulk Loader

Cryptography
    Decrypt files with PGP
    Encrypt Files with PGP
    Secret Key Generator
    Symmetric Cryptography

Data Mining
    ARFF Output
    Knowledge Flow
    Weka Forecasting
    Weka Scoring

Data Warehouse
    Combination lookup/update
    Dimension lookup/update

Deprecated
    Aggregate Rows
    Example plugin
    Get previous row fields
    Greenplum Bulk loader
    IBM WebSphere MQ Consumer
    IBM WebSphere MQ Producer
    JMS Consumer (deprecated)
    JMS Producer (deprecated)
    LucidDB Bulk Loader
    LucidDB Streaming Loader
    OpenERP Object Delete
    OpenERP Object Input
    OpenERP Object Output
    Palo Cell Input
    Palo Cell Output
    Palo Dimension Input
    SAP Input
    Text file output (deprecated)

Experimental
    Script
    SFTP Put

Flow
    Abort
    Annotate stream
    Blocking step
    Detect empty stream
    Dummy
    ETL metadata injection
    Filter rows
    Identify last row in a stream
    Java filter
    Shared Dimension
    Switch / Case
    Transformation executor

Inline
    Injector
    Socket reader
    Socket writer

Input
    CSV file input
    Data Grid
    De-serialize from file
    Email messages input
    ESRI Shapefile Reader
    Fixed file input
    Generate random credit card numbers
    Generate random value
    Generate rows
    Get data from XML
    Get File Names
    Get File Rows Count
    Get repository names
    Get SubFolder names
    Get System Info
    Get table names
    Google Analytics
    Google Docs Input
    GZIP CSV Input
    HL7 Input
    JMS Consumer
    JSON Input
    LDAP Input
    LDIF Input
    Load file content in memory
    Microsoft Access Input
    Microsoft Excel Input
    Mondrian Input
    OLAP Input
    Property Input
    RSS Input
    Salesforce Input
    SAS Input
    XBase Input
    XML Input Stream (StAX)
    Yaml Input

Job
    Copy rows to result
    Get files from result
    Get rows from result
    Get Session Variables
    Set files in result
    Set Session Variables

Joins
    Join rows
    Merge join
    Merge rows (diff)
    Multiway Merge Join
    Sorted Merge
    XML Join

Lookup
    Call DB Procedure
    Check if a column exists
    Check if file is locked
    Check if webservice is available
    Database join
    Database lookup
    Dynamic SQL row
    File exists
    Fuzzy match
    HTTP client
    HTTP Post
    MaxMind GeoIP Lookup
    REST Client
    Stream lookup
    Table exists
    Web services lookup

Mapping
    Mapping
    Mapping input specification
    Mapping output specification
    Simple mapping

N/A
    Spark Special - FileInputResolver
    Spark Special - GenericSparkOperation
    Spark Special - RecordsFromStreamSparkOperation

Output
    Automatic Documentation Output
    Delete
    Insert / Update
    JSON output
    LDAP Output
    Microsoft Access Output
    Microsoft Excel Output
    Microsoft Excel Writer
    Pentaho Reporting Output
    Properties Output
    RSS Output
    Salesforce Delete
    Salesforce Insert
    Salesforce Update
    Salesforce Upsert
    Serialize to file
    SQL File Output
    Synchronize after merge
    Table output
    Text file output
    Update
    XML Output

Pentaho Server
    Call Endpoint
    Get Session Variables
    Set Session Variables

Scripting
    Execute row SQL script
    Execute SQL script
    Formula
    Modified Java Script Value
    Python Executor
    Regex Evaluation
    Rule Accumulator
    Rule Executor
    User Defined Java Class
    User Defined Java Expression

Statistics
    Analytic Query
    Group by
    Memory group by
    Output steps metrics
    R script executor
    Reservoir Sampling
    Sample rows
    Univariate Statistics

Streaming
    AMQP Producer
    JMS Producer
    Kafka Producer
    Get records from stream
    MQTT producer

Transform
    Add a Checksum
    Add constants
    Add sequence
    Add value fields changing sequence
    Add XML
    Calculator
    Closure Generator
    Concat Fields
    Get ID from slave server
    Number range
    Replace in string
    Row denormaliser
    Row flattener
    Row Normaliser
    Select values
    Set field value
    Set field value to a constant
    Sort rows
    Split field to rows
    Split Fields
    Splunk Input
    Splunk Output
    String operations
    Strings cut
    Unique rows
    Unique rows (Hashset)
    Value Mapper
    XSL Transformation

Utility
    Change file encoding
    Clone row
    Delay row
    Edi to XML
    Execute a process
    If field value is null
    Mail
    Metadata structure of stream
    Null if...
    Process files
    Run SSH commands
    Send message to Syslog
    Table Compare
    Write to log
    Zip File

Validation
    Credit card validator
    Data Validator
    Mail Validator
    XSD Validator
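If you check step names against this list in a script (for example, to flag which steps in a transformation can take Dataset tuning options), one option is to copy the table into a lookup structure. The following Python sketch assumes exactly that. `DATASET_TUNING_STEPS`, `categories_of`, and `supports_dataset_tuning` are hypothetical names, not part of Pentaho or any Pentaho API, and only an excerpt of four categories from the table is included.

```python
# Hypothetical helper, not part of any Pentaho API: an excerpt of the
# "Steps using Dataset tuning options" table, keyed by step category.
# Only four categories are copied here; add the rest from the table as needed.
DATASET_TUNING_STEPS: dict[str, frozenset[str]] = {
    "Job": frozenset({
        "Copy rows to result",
        "Get files from result",
        "Get rows from result",
        "Get Session Variables",
        "Set files in result",
        "Set Session Variables",
    }),
    "Joins": frozenset({
        "Join rows",
        "Merge join",
        "Merge rows (diff)",
        "Multiway Merge Join",
        "Sorted Merge",
        "XML Join",
    }),
    "Pentaho Server": frozenset({
        "Call Endpoint",
        "Get Session Variables",
        "Set Session Variables",
    }),
    "Validation": frozenset({
        "Credit card validator",
        "Data Validator",
        "Mail Validator",
        "XSD Validator",
    }),
}


def categories_of(step_name: str) -> list[str]:
    """Return every category (sorted) whose step list contains step_name.

    Matching is exact and case-sensitive, so "Merge join" matches but
    "merge join" does not. Some names, such as "Get Session Variables",
    appear under more than one category, which is why a list is returned.
    """
    return sorted(
        category
        for category, steps in DATASET_TUNING_STEPS.items()
        if step_name in steps
    )


def supports_dataset_tuning(step_name: str) -> bool:
    """True if step_name appears anywhere in the excerpt above."""
    return bool(categories_of(step_name))


if __name__ == "__main__":
    print(categories_of("Get Session Variables"))  # ['Job', 'Pentaho Server']
    print(supports_dataset_tuning("XML Join"))     # True
```

Returning a list rather than a single category means that names listed under more than one category, such as Get Session Variables (under both Job and Pentaho Server), are not silently reduced to one.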