
JDBC Multi-Table Consumer can only read from one database.

  • February 17, 2022
  • 1 reply
  • 232 views

AkshayJadhav
StreamSets Employee

The JDBC Multi Table Consumer can only read from one database. In some cases there is a requirement to get data from multiple databases - perhaps the table structures are similar, or the databases contain related information.

There are three options for reading from two databases in one pipeline.

Option 1: Use database-level synonyms.
Example:

The following creates a synonym for the base table "Product" in the AdventureWorks2012 database:
CREATE SYNONYM MyProduct
FOR AdventureWorks2012.Production.Product;

For SQL Server synonyms, more info can be found in the following links:
https://docs.microsoft.com/en-us/sql/relational-databases/synonyms/synonyms-database-engine?view=sql-server-2017
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-synonym-transact-sql?view=sql-server-2017
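
As a sketch of how this extends to a second database (the SalesArchive database and its Orders table below are hypothetical names used only for illustration), you create one synonym per remote table in the database the origin connects to. Assuming the origin's Table Name Pattern is set to something like My%, a single JDBC Multi Table Consumer should then pick up tables that physically live in two different databases:

-- Synonym pointing at a table in a second, hypothetical database
CREATE SYNONYM MyArchivedOrders
FOR SalesArchive.dbo.Orders;

-- Together with MyProduct above, a Table Name Pattern of My% in the
-- JDBC Multi Table Consumer would match both synonyms.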

 

Option 2: Use two pipelines by creating a pipeline for each database, each with its own JDBC Multi Table Consumer origin. If the pipeline is complex, you can use pipeline fragments to better manage the stages in the rest of the pipeline.

Documentation for fragments (Control Hub 3.2.0 and up): https://streamsets.com/documentation/controlhub/3.2.0/help/pdesigner/#datacollector/UserGuide/Pipeline_Fragments/PipelineFragments.html%23concept_msg_4hf_ndb

Option 3: Create one pipeline, with a JDBC Multi Table Consumer origin, for each database and feed both into a 'combined' database or a queueing system such as Kafka. Then create a third pipeline that reads from the 'combined' database.
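
As a hedged sketch of the 'combined' database approach (the table and column names below are illustrative only), both pipelines could write into one staging table that records which source database each row came from, and the third pipeline reads from that single table:

-- Run this in the 'combined' / staging database that both pipelines write to
CREATE TABLE dbo.Product_Staging (
    source_db     VARCHAR(128) NOT NULL,  -- which source database the row came from
    product_id    INT          NOT NULL,
    product_name  VARCHAR(255) NULL,
    loaded_at     DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()  -- load time, useful as an offset column
);

-- The third pipeline then reads everything from this one table,
-- for example using loaded_at as its incremental offset.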


1 reply


Hi @AkshayJadhav , 

 

Nice topic and relevant to the work that I am doing. I guess the context needs to be set for each option. 

 

Database synonyms are a good thought process. I haven't tried that option - thanks for highlighting it.

 

Regarding Option 2

If the table structures and the ingest process are the same (i.e. no specific transformations are required):

 

  • Data Collector (Open Source Version) - Yes, you need to create two pipelines.
  • Control Hub - Since the table structures are the same and ONLY the source host name and/or the database name differ, you could create one pipeline, leverage a Job Template, and run separate instances of that pipeline as Job Instances, each pointing to a different host / database.

Since you are creating a single pipeline, you may not require pipeline fragments unless you want to call a pipeline fragment from every pipeline (some sort of standard process in your organization).

 

Regarding Option 3

Yes, irrespective of whether you are working on the open source version or a Control Hub version, you will need a separate pipeline to read the data that was ingested from the two sources in the previous step.

 

Well, if you are using Transformer I guess you don’t need to worry about having a separate pipeline. 

 

Cheers,

Srini

 

