Optimize a Pentaho Data Service
As you test a Pentaho Data Service, you might notice bottlenecks or parts of the transformation that could run more efficiently. To improve the performance of your data service, apply one of the optimization techniques listed here. Some techniques are designed specifically for Pentaho Data Services. See the Administer Pentaho Data Integration and Analytics document for general design and optimization techniques that can also improve the performance of your transformation.
Optimization Technique
When to Use
For a regular data service only, adjust how long data results are cached. Consider using this technique if either of the following situations applies:
Your result set is modest in size.
You query Big Data sources. Increasing the cache duration can help follow-on queries run more quickly.
Note: This optimization technique is not available for a streaming data service. It will not appear as an optimization tab if Data Service Type is set to Streaming.
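To illustrate why a longer cache duration helps follow-on queries, here is a minimal sketch of a time-based result cache. It is not Pentaho's implementation; the `ResultCache` class, the `ttl_seconds` setting, and the `run_query` callback are hypothetical names used only for this example.

```python
import time


class ResultCache:
    """Toy time-based cache: an entry is reused until ttl_seconds have passed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so the behavior can be tested
        self._entries = {}           # query text -> (stored_at, rows)

    def get(self, query, run_query):
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]            # still fresh: skip the expensive source query
        rows = run_query(query)      # missing or expired: query the source again
        self._entries[query] = (now, rows)
        return rows
```

With a longer `ttl_seconds`, more follow-on queries are answered from memory instead of reaching the slow source, at the cost of holding results longer and possibly serving older data.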
Handle input step queries at the source. Consider using this technique if both of the following situations apply:
Your transformation contains the Table Input or MongoDB Input steps.
You are using simple or complex WHERE clauses that include AND, IN, or other specific operators in your query. Limits for the WHERE clause construction appear in Pentaho Data Service SQL support reference and other development considerations.
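The benefit of handling a query at the source can be sketched with Python's built-in `sqlite3` module standing in for the database behind a Table Input step. The `sales` table, its columns, and both helper functions are assumptions made for this illustration; they are not part of Pentaho.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, code TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("South", "Yellow", 10), ("North", "Yellow", 20),
     ("South", "Red", 30), ("East", "Blue", 40)],
)


def without_pushdown(conn):
    """Fetch every row, then filter in the client."""
    rows = conn.execute("SELECT region, code, amount FROM sales").fetchall()
    kept = [r for r in rows if r[0] in ("South", "North") and r[1] == "Yellow"]
    return len(rows), kept           # (rows transferred, rows kept)


def with_pushdown(conn):
    """Let the source apply the WHERE clause, so only matches are transferred."""
    rows = conn.execute(
        "SELECT region, code, amount FROM sales "
        "WHERE region IN (?, ?) AND code = ?",
        ("South", "North", "Yellow"),
    ).fetchall()
    return len(rows), rows
```

Both functions keep the same rows, but the second transfers only the matching rows from the source, which is where the savings come from on large tables.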
Handle step queries at the source. Consider using this technique if both of the following situations apply:
Your transformation contains any step that should be optimized, including input steps like REST where a parameter in the URL could limit the results returned by a web service.
You do not use more complex WHERE clauses in your query that might contain IN or OR keywords, such as WHERE REGION = "South" OR Code = "Yellow". Limits for the WHERE clause construction appear in Pentaho Data Service SQL support reference and other development considerations.
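As a rough sketch of the idea, the following code turns simple equality conditions into URL parameters for a REST-style source and declines anything containing IN, OR, or other operators. The function names and the `https://example.com/api/sales` URL are hypothetical, and the code assumes single-quoted string literals. The authoritative rules are in the SQL support reference named above.

```python
import re
from urllib.parse import urlencode

# Matches one condition of the form: FIELD = 'value'
_EQUALITY = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def where_to_params(where_clause):
    """Return {field: value} for equality conditions joined by AND.

    Returns None when any condition is not a plain equality
    (for example, when the clause uses IN or OR).
    """
    params = {}
    for condition in re.split(r"\s+AND\s+", where_clause, flags=re.IGNORECASE):
        match = _EQUALITY.match(condition)
        if match is None:
            return None
        params[match.group(1)] = match.group(2)
    return params


def build_url(base_url, where_clause):
    """Append the conditions as URL parameters when they qualify."""
    params = where_to_params(where_clause)
    if params is None:
        return base_url              # not eligible: request unfiltered results
    return base_url + "?" + urlencode(params)
```

For example, `build_url("https://example.com/api/sales", "REGION = 'South'")` adds `REGION=South` to the request so the web service limits the results, while a clause containing OR leaves the URL unchanged.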
For a streaming data service only, adjust the maximum number of rows and elapsed time to produce a new streaming window for processing. Consider using this technique if you are creating a data service from one of the following streaming data steps:
Note: This optimization technique is not available for a regular data service. It will not appear as an optimization tab if Data Service Type is set to Regular.
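The interplay between a row limit and a time limit can be sketched as follows. This is an illustration under assumed semantics, not a description of how Pentaho builds its windows; the `windows` function and its `max_rows` and `max_seconds` parameters are hypothetical names.

```python
def windows(events, max_rows, max_seconds):
    """Group (timestamp, row) events into windows.

    A window closes when it reaches max_rows, or when a row arrives
    max_seconds or more after the window's first row.
    """
    current, started_at = [], None
    for timestamp, row in events:
        if current and timestamp - started_at >= max_seconds:
            yield current            # time limit reached: emit and start over
            current, started_at = [], None
        if not current:
            started_at = timestamp   # first row of a new window
        current.append(row)
        if len(current) >= max_rows:
            yield current            # row limit reached: emit and start over
            current, started_at = [], None
    if current:
        yield current                # emit whatever remains at end of input
```

Smaller limits produce more frequent, smaller windows with lower latency. Larger limits produce fewer, bigger windows with less per-window overhead.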