Data Source Wizard

Before you can create reports and analysis, Pentaho Server must know your data. It must know where it lives and how to connect. It must know what data to use and how it is structured.

Use the Data Source Wizard to define a data source. You can create relationalarrow-up-right and multidimensionalarrow-up-right models for reporting and analysis.

After you define a data source, you can share it with report authors. One data source can support many reports and analyses.

Before using the Data Source Wizard

circle-exclamation

The Data Source Wizard is intended for prototyping. It can also serve as a starting point for Pentaho Analysis and Pentaho Metadata models.

As a best practice, re-create or edit these models with Schema Workbench or Metadata Editor. This improves query performance and enables advanced features in production.

Data sources created with the Data Source Wizard are not supported in production for these reasons:

  • Data sources created with the Data Source Wizard cannot be secured using Pentaho roles, as per the security configuration.

  • They are not optimized for performance.

  • They may not conform to design best practices.

Get started with the Data Source Wizard

When you configured Pentaho Server, you defined connection information. This includes where the data is stored. It also includes what protocol or driver to use.

The Install Pentaho Data Integration and Analytics document shows you how to change or add connection information.

To create a data source, pick an existing connection. Then use the wizard to build an initial model. You can use the Data Source Model Editor to refine the model further.

The Data Source Wizard guides you through setting up CSV files. It also guides you through creating relational and multidimensional models. These models are data sources for interactive reports and analysis reports.

The initial models let you see results immediately. Interactive Reports and Analyzer update as you change the model structure.

When you need more security, localization, or advanced changes, export the initial model. Then import it into one of these tools:

  • Pentaho Metadata Editor for relational models

  • Pentaho Schema Workbench for multidimensional models

circle-info

Relational and multidimensional models edited in Metadata Editor and Schema Workbench can no longer be edited in the Data Source Model Editor.

Create New menu in the Pentaho User Console

If you are not logged in with permissions to create, edit, and delete data sources, you are limited to view-only permissions. You do not see icons for adding, editing, or deleting data sources.

Manage Data Sources dialog box

You can access the Data Source Wizard in several places. You do not need to backtrack to create a data source.

If you are in
Then follow these steps

Home perspective

Click Create New > Data Source.

Home perspective

Click Manage Data Sources and then New Data Source.

Home perspective

Click Create New > Analysis Report and then the New icon.

Home perspective

Click Create New > Interactive Report and then the New icon.

Opened perspective

Click the New icon and then select Data Source Wizard.

Any perspective, from the menu bar

Click File > New > Data Source.

Any perspective, from the menu bar

Click File > Manage Data Sources and then click New Data Source.

Dashboard pane, creating a chart or data table

Click the Add icon in the Select Data Source dialog box.

Data Source Wizard dialog box

After accessing the Data Source Wizard, you are ready to create your first data source.

Choose a data source type

The Data Source Wizard supports several source types. Pick the one that matches where your data lives.

Pick a data source type

Use these summaries to choose quickly. Then follow the matching procedure below.

CSV file

A CSV file stores data in rows and columns. You typically separate values with commas.

  • Use it to stage a database table in the Pentaho Repository.

  • Re-upload the file if the data changes.

  • Upload CSV files without a data connection.

  • Creates a multidimensional model for Analyzer.

  • Creates a relational model for Interactive Reports and Dashboard Designer.

Expertise: Basic database concepts. Time: About 10 minutes. Best practice: Use for evaluation and prototypes.

SQL query

An SQL query defines the data you want to expose. It runs against a relational database.

  • You can control column names and aggregation options.

  • Creates a multidimensional model for Analyzer.

  • Creates a relational model for Interactive Reports and Dashboard Designer.

Expertise: Basic database concepts and SQL. Time: About 20 minutes. Best practice: Use for evaluation and prototypes.

Database table

Database table sources use one or more database tables. They support both relational reporting and multidimensional analysis.

  • Works well for reporting against an operational data store (ODS).

  • Works well for analysis against a star schema.

Expertise: Relational and/or multidimensional database design. Time: About 30–60 minutes. Best practice: For prototypes, use representative sample data.

Create a CSV data source

Before you begin, review these terms.

  • Delimiter: Character used to separate values in a row.

  • Enclosure: Character used to wrap values, like quotes.

  • Length: Maximum characters allowed in a field.

  • Precision: Digits after the decimal point.

  1. Sign in to the User Console.

  2. Select Create New, then select Data Source.

  3. Select New Data Source.

    The Data Source Wizard opens.

  4. Enter a name in Data Source Name.

    These characters are not allowed:

    Data Source Wizard
  5. From Source Type, select CSV File.

  6. Select Import and choose your CSV file.

  7. Choose a delimiter and enclosure type.

    To use the first row as data, clear First row is header.

    Data Source Wizard CSV First row is header
  8. Review File Preview.

    Ensure the columns align. Then confirm the delimiter and enclosure settings.

    To use the first row as headings, keep First row is header selected.

  9. Select Next.

    The Staging Settings page lists columns from the CSV file.

    Data Source Wizard Staging Settings
  10. Select the columns to include.

    Use Select All or Deselect All as needed.

  11. Update Name and Type, if needed.

    1. Choose date and number formats from the drop-down list.

    2. Enter a value in Source Format when the list is unavailable.

      Boolean values render as true or false.

  12. Optional: Select Show File Contents to preview a sample.

    Select Close to return to Staging Settings.

    Data Source Wizard File Preview
  13. Continue configuring the CSV settings.

    Or select Finish.

    The Data Source Created window opens.

  14. Select Keep default model or Customize model now.

    Select OK to finish.

Your new data source is available in Analyzer, Interactive Reports, and Dashboard Designer. If you selected Customize model now, the Data Source Model Editor opens.

Create a SQL query data source

Once you create the data source, it is available to users of Interactive Reports, Analyzer, and Dashboard Designer.

  1. Sign in to the User Console.

  2. Select Create New, then select Data Source.

  3. Select New Data Source.

    The Data Source Wizard opens.

  4. Enter a name in Data Source Name.

    These characters are not allowed:

    SQL Query Data Source Wizard dialog
  5. From Source Type, select SQL Query.

    Select Next.

  6. Under Data Connection, select a database connection.

  7. Enter your query in SQL Query.

    Select Data Preview to validate the results.

    SQL Query Data Preview
  8. Select Close to exit the preview.

  9. Select Finish.

  10. In Data Source Created, select Keep default model or Customize model now.

    Select OK.

    SQL Query Data Source Created

A relational model is generated from the SQL query. If you selected Customize model now, the Data Source Model Editor opens.

Create a database table source

Database table sources support two build modes. Choose based on your table design and reporting needs.

  • Reporting Only: Best for detailed relational reports.

  • Reporting and Analysis: Best for star schemas and analysis.

Before you begin, review these terms.

  • Fact table: Table that stores measurements or metrics.

  • Star schema: Fact table plus dimension tables.

  • Snowflake schema: Fact table plus normalized dimensions.

Reporting Only vs Reporting and Analysis

Reporting Only

  • Use for operational or transactional databases.

  • Use when you need row-level detail with sort and filter.

  • Generates a relational model.

  • You can refine the model in the Data Source Model Editor.

  • You can extend the model in Pentaho Metadata Editor.

Reporting and Analysis

  • Use for multidimensional designs like star schemas.

  • Use when you need summaries and detail.

  • Generates a multidimensional model.

  • You can refine the model in the Data Source Model Editor.

  • You can extend the model in Pentaho Schema Workbench.

Create a database table data source (Reporting Only)

circle-info

Start your database before you begin.

  1. Sign in to the User Console.

  2. Select Create New, then select Data Source.

  3. Select New Data Source.

    The Data Source Wizard opens.

  4. Enter a name in Data Source Name.

    These characters are not allowed:

    Table Data Source Reporting Only
  5. From Source Type, select Database Table(s).

  6. Under Connection, select a database connection.

  7. Select Reporting Only, then select Next.

  8. From Schema, select a schema.

    The Available Tables list appears.

    Table Data Source Reporting Only Available Tables
  9. From Available Tables, select one or more tables.

  10. Use the right arrow to move tables to Selected Tables.

    To remove a table, select it in Selected Tables, then use the left arrow.

    To select multiple tables, hold Ctrl while selecting.

  11. Select Next.

    The Define Joins page appears.

  12. From Left Table, select the fact table.

  13. From Right Table, select the table to join.

  14. Select Create Join.

  15. Create join conditions for each join.

    The join relationship appears in the lower section.

    To delete a join, select the join, then select Delete Join.

  16. Select Finish.

    The Data Source Created window opens.

  17. Select Keep default model or Customize model now.

    Select OK.

    Table Data Source Reporting Only Data Source Created

Create a database table data source (Reporting and Analysis)

circle-info

Start your database before you begin.

  1. Sign in to the User Console.

  2. Select Create New, then select Data Source.

  3. Select New Data Source.

    The Data Source Wizard opens.

  4. Enter a name in Data Source Name.

    These characters are not allowed:

    Table Data Source Reporting and Analysis
  5. From Source Type, select Database Table(s).

  6. Under Connection, select a database connection.

  7. Select Reporting and Analysis, then select Next.

  8. From Schema, select a schema.

    The Available Tables list appears.

    Table Data Source Reporting and Analysis Available Tables
  9. From Available Tables, select one or more tables.

  10. Use the right arrow to move the tables to Selected Tables.

    To remove a table, select it in Selected Tables, then use the left arrow.

    To select multiple tables, hold Ctrl while selecting.

  11. From Fact Table, select a fact table.

    Then select Next.

    The joins page appears.

    Table Data Source Reporting and Analysis Define Joins
  12. From Left Table, select the fact table.

  13. From Right Table, select the table to join.

  14. Select Create Join.

    To delete a join, select the join, then select Delete Join.

  15. Create join conditions for each join.

  16. Select Finish.

    The Data Source Created window opens.

  17. Select Keep default model or Customize model now.

    Select OK.

    Table Data Source Reporting and Analysis Data Source Created

A multidimensional model is generated for Analyzer, Interactive Reports, and Dashboard Designer. If you selected Customize model now, the Data Source Model Editor opens.

circle-info

Before you access the new data source in the PDI client (Spoon), clear shared objects:

  1. Select Tools > Repository > Clear Shared Object.

Last updated

Was this helpful?