
Pipelines using the SCD processor fail with “Cannot broadcast the table that is larger than 8GB”

  • January 27, 2022
  • 0 replies
  • 411 views

AkshayJadhav
StreamSets Employee

What is Broadcast Join?

Broadcast join is an important part of Spark SQL’s execution engine. When used, it performs a join on two relations by first broadcasting the smaller one to all Spark executors, then evaluating the join criteria against each executor’s partitions of the other relation. When the broadcast relation is small enough, broadcast joins are fast, as they require minimal data shuffling. Above a certain threshold, however, broadcast joins tend to be less reliable and less performant than shuffle-based join algorithms, due to bottlenecks in network and memory usage.
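The build-and-probe pattern described above can be sketched in plain Python (illustration only, not Spark code; the names `small_table` and `partitions` are hypothetical): every executor gets a full copy of the small relation, builds a hash map from it, and probes that map with its local partition of the large relation, so the large relation never moves across the network.

```python
from collections import defaultdict

def broadcast_hash_join(small_table, partitions, key_small, key_large):
    """Toy broadcast hash join: small_table is a list of dict rows,
    partitions is a list of partitions (lists of dict rows) of the
    large relation."""
    # Build phase: index the broadcast (small) relation by its join key.
    index = defaultdict(list)
    for row in small_table:
        index[row[key_small]].append(row)

    # Probe phase: each partition of the large relation is joined locally
    # against the broadcast copy, with no shuffle of the large relation.
    joined = []
    for partition in partitions:
        for row in partition:
            for match in index.get(row[key_large], []):
                joined.append({**match, **row})
    return joined

small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
parts = [[{"id": 1, "v": 10}], [{"id": 2, "v": 20}, {"id": 3, "v": 30}]]
result = broadcast_hash_join(small, parts, "id", "id")
# Rows with id 1 and 2 match; id 3 has no partner in the small relation.
```

This also shows why the technique degrades for large relations: the build phase materializes the entire small table in every executor’s memory.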

Spark estimates the size of each table before deciding whether to use a broadcast join. Sometimes the estimate is wrong, and Spark attempts a broadcast join with a table larger than the hard-coded maximum size of 8 GB.

To work around this, you can apply the following Spark configuration, which tells Spark to disable automatic broadcast joins:

spark.sql.autoBroadcastJoinThreshold = -1
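One way to apply this setting is through PySpark, either when building the session or on an already-running one (a sketch, assuming a PySpark deployment; the app name is hypothetical):

```python
from pyspark.sql import SparkSession

# Set the threshold to -1 at session creation to disable broadcast joins.
spark = (
    SparkSession.builder
    .appName("scd-pipeline")  # hypothetical app name
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")
    .getOrCreate()
)

# spark.sql.autoBroadcastJoinThreshold is a runtime SQL conf, so it can
# also be changed on an existing session:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```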


However, if your “master” and “changes” table schemas are identical, Spark might choose a broadcast join anyway. In that scenario, make a schema change to the master table (e.g., add one of the SCD tracking columns), and Spark should no longer choose a broadcast join.
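The schema-change workaround might look like the following (a sketch, assuming PySpark; the table names and the column name `scd_active_flag` are hypothetical, and it requires a running Spark session, so it is not runnable standalone):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

master_df = spark.table("master")    # hypothetical table name
changes_df = spark.table("changes")  # hypothetical table name

# Add one SCD tracking column to the master table so its schema is no
# longer identical to the changes table, steering the planner away from
# a broadcast join between the two.
master_df = master_df.withColumn("scd_active_flag", F.lit(True))
```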

