Join tuning options
Use the broadcast join tuning option instead of the hash join to optimize join queries when one side of the join is smaller than a specific threshold. A broadcast join can efficiently join a large table with small tables, such as a fact table with dimension tables, because only the small side is copied to the nodes that hold the large table. This can reduce the amount of data sent over the network.
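To illustrate the idea only, the following is a minimal plain-Python sketch of how a broadcast join works. It is not the product's or Spark's implementation. The function `broadcast_join`, the sample tables, and the column names are all hypothetical, and the partitions are simulated with nested lists rather than a real cluster.

```python
from collections import defaultdict


def broadcast_join(large_partitions, small_rows, key):
    """Inner-join every partition of the large side against a full copy of the small side."""
    # Build the lookup table once from the small side. In a cluster, this is
    # the object that would be copied ("broadcast") to every worker.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:
        # Each partition is joined locally, so rows of the large side never move.
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


# Hypothetical data: a fact table split into two partitions and a small dimension table.
sales_partitions = [
    [{"product_id": 1, "amount": 10}, {"product_id": 2, "amount": 5}],
    [{"product_id": 1, "amount": 7}, {"product_id": 3, "amount": 2}],
]
products = [{"product_id": 1, "name": "pen"}, {"product_id": 2, "name": "ink"}]

result = broadcast_join(sales_partitions, products, "product_id")
```

Because only `products` is copied, the cost of the join on the network depends on the size of the small side rather than the large one. In Spark itself, the comparable hint is the `broadcast` function in the SQL functions API.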
| Option | Description | Value type | Example value |
| --- | --- | --- | --- |
| join.broadcast.stepName | Marks a DataFrame as small enough for use in broadcast joins. See the Spark API documentation for more information. | String | stepName |