Cloud Architecture


Hi Team,


I have a few questions about architecture and feature support:


  1. When using StreamSets Cloud (SaaS), can I deploy the control plane in our network? Or does the control plane reside within StreamSets' boundaries, while the processing takes place in the client's AWS account?

            For the latter, I have a follow-up question: will the client need to install an agent to communicate with the control plane, or does the control plane require direct access, using some kind of cross-account role, to spin up, manage, and spin down resources such as EMR?


  1. Does any data (in preview or debug mode) go back to the control plane or the StreamSets cloud infrastructure?
  1. Is CDC supported for MongoDB, DynamoDB, PostgreSQL, and AuroraDB?


  1. With Kafka, does it support Kerberos-based authentication and authorisation?
  1. Can I replay the data at any point in the pipeline?
  1. Does it offer connectivity to on-premises databases over the TCPS protocol?
  1. Does it offer push-based processing for sources like Oracle, SQL Server, and Snowflake?


  1. Finally, does StreamSets support AD authentication, LDAP-based authorisation, GitHub integration, Control-M based scheduling, dbt, etc.?

2 replies

Userlevel 5


Yes, StreamSets supports most of the scenarios mentioned above.

Kindly play around with the tool; I believe you will like it.

Userlevel 2
  1. StreamSets Cloud Control Hub / DataOps Platform is currently offered only as a cloud service hosted by StreamSets.
  2. We have two communication options for the DataOps Platform: WebSocket tunneling enabled or disabled. If it is disabled, preview data is shown via a jQuery call made directly to the engines, so although the URL appears to be our platform, the actual data routes over your local/corporate network. With WebSocket tunneling enabled, the data routes across the SaaS platform but is not retained by StreamSets.
  3. CDC is supported for MongoDB and PostgreSQL; I would need to check the other two.
  4. Kafka: yes.
  5. Replay is an interesting term; I'd need to understand more to answer that question.
  6. Most database connections use JDBC, but I'm happy to understand this further and give a better answer.
  7. I would need to understand this requirement better; it's probably a linguistic difference from what I'm used to.
  8. We can do AD authentication. I've often seen Azure AD integrated, and we have a range of SAML/SSO integrations available. We have API layers that allow other tooling such as Control-M to connect, and we have had customers connect this to GitHub.

    But I think it would be great if you dropped me a line. Let's try to hook you up with an account lead to see how we can support you in your StreamSets journey and get you the answers you need.