

Steps using Dataset tuning options

As part of Spark tuning, you can use the Dataset tuning options with the following steps.

Step category

Step name

Agile

MonetDB Agile Mart
Table Agile Mart

Big Data

Avro output
Cassandra Input
Cassandra Output
CouchDB Input
Hadoop file output
HBase output
HBase row decoder
MapReduce Input
MapReduce Output
MongoDB Input
MongoDB Output
Orc output
Parquet output
SSTable Output

Bulk loading

ElasticSearch Bulk Insert
Greenplum Load
Infobright Loader
Ingres VectorWise Bulk Loader
MonetDB Bulk Loader
MySQL Bulk Loader
Oracle Bulk Loader
PostgreSQL Bulk Loader
SAP HANA Bulk Loader
Teradata Fastload Bulk Loader
Teradata TPT Insert Upsert Bulk Loader
Vertica Bulk Loader

Cryptography

Decrypt files with PGP
Encrypt Files with PGP
Secret Key Generator
Symmetric Cryptography

Data Mining

ARFF Output
Knowledge Flow
Weka Forecasting
Weka Scoring

Data Warehouse

Combination lookup/update
Dimension lookup/update

Deprecated

Aggregate Rows
Example plugin
Get previous row fields
Greenplum Bulk loader
IBM WebSphere MQ Consumer
IBM WebSphere MQ Producer
JMS Consumer (deprecated)
JMS Producer (deprecated)
LucidDB Bulk Loader
LucidDB Streaming Loader
OpenERP Object Delete
OpenERP Object Input
OpenERP Object Output
Palo Cell Input
Palo Cell Output
Palo Dimension Input
SAP Input
Text file output (deprecated)

Experimental

Script
SFTP Put

Flow

Abort
Annotate stream
Blocking step
Detect empty stream
Dummy
ETL metadata injection
Filter rows
Identify last row in a stream
Java filter
Shared Dimension
Switch / Case
Transformation executor

Inline

Injector
Socket reader
Socket writer

Input

Job

Copy rows to result
Get files from result
Get rows from result
Get Session Variables
Set files in result
Set Session Variables

Joins

Join rows
Merge join
Merge rows (diff)
Multiway Merge Join
Sorted Merge
XML Join

Lookup

Call DB Procedure
Check if a column exists
Check if file is locked
Check if webservice is available
Database join
Database lookup
Dynamic SQL row
File exists
Fuzzy match
HTTP client
HTTP Post
MaxMind GeoIP Lookup
REST Client
Stream lookup
Table exists
Web services lookup

Mapping

Mapping
Mapping input specification
Mapping output specification
Simple mapping

N/A

Spark Special - FileInputResolver
Spark Special - GenericSparkOperation
Spark Special - RecordsFromStreamSparkOperation

Output

Automatic Documentation Output
Delete
Insert / Update
JSON output
LDAP Output
Microsoft Access Output
Microsoft Excel Output
Microsoft Excel Writer
Pentaho Reporting Output
Properties Output
RSS Output
Salesforce Delete
Salesforce Insert
Salesforce Update
Salesforce Upsert
Serialize to file
SQL File Output
Synchronize after merge
Table output
Text file output
Update
XML Output

Pentaho Server

Call Endpoint
Get Session Variables
Set Session Variables

Scripting

Execute row SQL script
Execute SQL script
Formula
Modified Java Script Value
Python Executor
Regex Evaluation
Rule Accumulator
Rule Executor
User Defined Java Class
User Defined Java Expression

Statistics

Analytic Query
Group by
Memory group by
Output steps metrics
R script executor
Reservoir Sampling
Sample rows
Univariate Statistics

Streaming

AMQP Producer
JMS Producer
Kafka Producer
Get records from stream
MQTT producer

Transform

Add a Checksum
Add constants
Add sequence
Add value fields changing sequence
Add XML
Calculator
Closure Generator
Concat Fields
Get ID from slave server
Number range
Replace in string
Row denormaliser
Row flattener
Row Normaliser
Select values
Set field value
Set field value to a constant
Sort rows
Split field to rows
Split Fields
Splunk Input
Splunk Output
String operations
Strings cut
Unique rows
Unique rows (Hashset)
Value Mapper
XSL Transformation

Utility

Change file encoding
Clone row
Delay row
Edi to XML
Execute a process
If field value is null
Mail
Metadata structure of stream
Null if...
Process files
Run SSH commands
Send message to Syslog
Table Compare
Write to log
Zip File

Validation

Credit card validator
Data Validator
Mail Validator
XSD Validator


