Hi, I’ve started working with StreamSets recently and I want to use the Transformer Engine to process data from IBM Cloud Object Storage (COS) and write it to a Hive table managed by Watsonx.data. However, I’m having trouble connecting StreamSets to the Spark engine in Watsonx.data.
Watsonx.data uses an optimized Spark engine and, from what I understand, it behaves as a standalone cluster. The issue is that I don’t have access to the Spark Master URL (spark://<master-host>:<port>), which StreamSets requires in order to submit jobs from the Transformer Engine to the Spark cluster. Without it, I haven’t been able to configure the pipeline to run jobs directly on Watsonx.data’s Spark engine.
I’ve also tried the Data Collector engine, which I understand is not the best fit but is easier to connect. That didn’t work either; I’ve included the logs for that attempt at the bottom.
Is it possible to connect StreamSets to the Spark engine in Watsonx.data without explicitly specifying a Master URL? Has anyone successfully integrated StreamSets with Watsonx.data or faced a similar situation? Any insights / courses / documentation / alternative approaches would be greatly appreciated.
Thank you in advance!
DATA COLLECTOR logs:
Error: [Hive Query 1 - JDBC URL] Cannot make connection with default hive database starting with URL: jdbc:hive2://689b962a-b945-135e-e0ff-cvbipzb8rpre.cp28rdll06kbl63pbgc0.lakehouse.appdomain.cloud:31375/. Reason:null (HIVE_22)
Technical Details: Cannot make connection with default hive database starting with URL: jdbc:hive2://xxxx:yyyyy/. Reason:null
Extended from logs:
Caused by: org.apache.thrift.transport.sasl.TSaslNegotiationException: Invalid status 21
    at org.apache.thrift.transport.sasl.NegotiationStatus.byValue(NegotiationStatus.java:57) ~[hive-exec-3.1.3000.7.1.9.0-387.jar:3.1.3000.7.1.9.0-387]
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:155) ~[hive-exec-3.1.3000.7.1.9.0-387.jar:3.1.3000.7.1.9.0-387]
Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://689b962a-b945-135e-e0ff-cvbipzb8rpre.cp28rdll06kbl63pbgc0.lakehouse.appdomain.cloud:31375/: Invalid status 21
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:367) ~[hive-jdbc-3.1.3000.7.1.9.0-387.jar:3.1.3000.7.1.9.0-387]
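One observation on the "Invalid status 21" message, offered as a guess rather than a confirmed diagnosis: Thrift's SASL negotiation only defines status bytes 1 to 5, while 21 (0x15) happens to be the content type of a TLS alert record. That could mean the endpoint is answering in TLS while the client is speaking plain SASL, in which case the JDBC URL might need SSL settings (for example the Hive JDBC `ssl=true` parameter). I have not verified what Watsonx.data actually expects. The small sketch below only illustrates the byte comparison; `describe_status_byte` is a hypothetical helper, not part of StreamSets, Hive, or Thrift.

```python
# Status bytes defined by Thrift's SASL NegotiationStatus enum.
THRIFT_SASL_STATUS = {1: "START", 2: "OK", 3: "BAD", 4: "ERROR", 5: "COMPLETE"}

# Content types carried in the first byte of a TLS record.
TLS_CONTENT_TYPE = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def describe_status_byte(value: int) -> str:
    """Hypothetical helper: interpret the byte a SASL client read as a status."""
    if value in THRIFT_SASL_STATUS:
        return f"valid SASL status {THRIFT_SASL_STATUS[value]}"
    if value in TLS_CONTENT_TYPE:
        # Assumption: a TLS-looking byte hints at a TLS/plain-text mismatch.
        return f"not a SASL status; matches TLS record type '{TLS_CONTENT_TYPE[value]}'"
    return "not a SASL status; unknown protocol"


if __name__ == "__main__":
    print(describe_status_byte(21))
```

If this reading is right, the next thing to check would be whether the connection succeeds once the client is configured for TLS; if it is wrong, the byte comparison is just a coincidence and the cause lies elsewhere.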