Connect Engines with HIVE and Watsonx

December 2, 2024

BugalloF

Hi, I’ve started working with StreamSets recently and I want a Transformer Engine to process data from IBM Cloud Object Storage (COS) and write it to a Hive table managed by Watsonx.data. However, I’m facing challenges connecting StreamSets to the Spark engine in Watsonx.data.

Watsonx.data uses an optimized Spark engine, and from what I understand, it behaves as a Standalone Cluster. The issue is that I don’t have access to the Spark Master URL (spark://<master-host>:<port>), which StreamSets requires in order to submit jobs from the Transformer Engine to the Spark cluster. Without it, I haven’t been able to configure the pipeline to run jobs directly on Watsonx.data’s Spark engine.
I’ve also tried the Data Collector engine. I understand it’s not the best fit, but its connections are easier to set up. That didn’t work either; I’ve included the logs for that attempt at the bottom.

Is it possible to connect StreamSets to the Spark engine in Watsonx.data without explicitly specifying a Master URL? Has anyone successfully integrated StreamSets with Watsonx.data or faced a similar situation? Any insights / courses / documentation / alternative approaches would be greatly appreciated.

Thank you in advance!

DATA COLLECTOR logs:

Error: [Hive Query 1 - JDBC URL] Cannot make connection with default hive database starting with URL: jdbc:hive2://689b962a-b945-135e-e0ff-cvbipzb8rpre.cp28rdll06kbl63pbgc0.lakehouse.appdomain.cloud:31375/. Reason:null (HIVE_22)
Technical Details: Cannot make connection with default hive database starting with URL: jdbc:hive2://xxxx:yyyyy/. Reason:null

Extended from logs:
Caused by: org.apache.thrift.transport.sasl.TSaslNegotiationException: Invalid status 21
    at org.apache.thrift.transport.sasl.NegotiationStatus.byValue(NegotiationStatus.java:57) ~[hive-exec-3.1.3000.7.1.9.0-387.jar:3.1.3000.7.1.9.0-387]
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:155) ~[hive-exec-3.1.3000.7.1.9.0-387.jar:3.1.3000.7.1.9.0-387]

Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://689b962a-b945-135e-e0ff-cvbipzb8rpre.cp28rdll06kbl63pbgc0.lakehouse.appdomain.cloud:31375/: Invalid status 21
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:367) ~[hive-jdbc-3.1.3000.7.1.9.0-387.jar:3.1.3000.7.1.9.0-387]
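
One possible reading of the "Invalid status 21" message, offered as a sketch and not a confirmed diagnosis: the Thrift SASL client expects the first byte of the server's reply to be a negotiation status between 1 and 5, while 21 (0x15) happens to be the TLS record type for an alert. That pattern often shows up when a plain (non-TLS) client talks to a port that expects TLS. The helper names below (interpret_status_byte, with_ssl) are hypothetical and only illustrate the idea. `ssl=true` is a standard Hive JDBC session parameter, but whether this Watsonx.data endpoint accepts HiveServer2 JDBC connections at all, and which truststore settings it would need, are assumptions the logs do not confirm.

```python
# Thrift SASL negotiation status codes, as defined in
# org.apache.thrift.transport.sasl.NegotiationStatus.
SASL_STATUS = {1: "START", 2: "OK", 3: "BAD", 4: "ERROR", 5: "COMPLETE"}

# TLS record content types (RFC 5246 / RFC 8446).
TLS_CONTENT_TYPE = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def interpret_status_byte(value: int) -> str:
    """Describe what a byte reported as 'Invalid status N' might be (hypothetical helper)."""
    if value in SASL_STATUS:
        return f"valid SASL status {SASL_STATUS[value]}"
    if value in TLS_CONTENT_TYPE:
        return f"not a SASL status; matches TLS record type '{TLS_CONTENT_TYPE[value]}'"
    return "not a SASL status; unknown origin"


def with_ssl(jdbc_url: str) -> str:
    """Append the Hive JDBC session parameter ssl=true if absent (hypothetical helper)."""
    if "ssl=true" in jdbc_url:
        return jdbc_url
    return jdbc_url.rstrip(";") + ";ssl=true"


if __name__ == "__main__":
    # 21 is the value reported in the stack trace above.
    print(interpret_status_byte(21))
    # Placeholder host and port, as in the redacted log line.
    print(with_ssl("jdbc:hive2://xxxx:yyyyy/"))
```

If the port does expect TLS, the Hive JDBC URL configured in the Data Collector stage would likely need `;ssl=true` plus truststore parameters appropriate to the environment; those values are not given in the post.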
