Solved

Launching a pipeline using the SDK

3 years ago
December 28, 2021
5 replies
126 views

ashok verma
Discovered Fame
13 replies

From the Python shell, I am unable to launch Data Collector.

I have tried the options below.

1. from streamsets.sdk import DataCollector

dc = DataCollector('https://localhost:18630')

error: None object has no attribute use_websocket_tunneling

2. from streamsets.sdk import DataCollector, ControlHub

sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)
pipeline_builder = sch.get_pipeline_builder(engine_type='data_collector', engine_url=<SDC URL>)

For engine_url in the step above, I logged in to StreamSets and used the URL of the active engine shown under the Engines tab. I get the error below.

error: instance is not in list

dima
StreamSets Employee
83 replies
3 years ago
December 28, 2021

Do you have existing SDC engines running and registered with the DataOps Platform or are you trying to actually launch new ones via the SDK?

ashok verma
Author
Discovered Fame
13 replies
3 years ago
December 28, 2021

Yes, the SDC engine is running, and I think it is registered with the DataOps Platform. We are not launching engines using the SDK.

Could you please let me know how to check whether it is registered with the DataOps Platform?

Kirti
StreamSets Employee
29 replies
Answer
3 years ago
December 28, 2021

@ashok verma ,

One way to check whether an execution engine is registered or deployed in the StreamSets DataOps Platform is to look at the Engines UI: in the left-hand navigation menu, click `Set Up` and then `Engines`. The expected SDC should be listed there. If you are working with Transformer, click `Transformers` in the right-hand pane and it will be listed there.

I would also try creating a pipeline in the UI with that SDC/Transformer, just to make sure access rights etc. are correct.

The next step is to create a pipeline using the StreamSets SDK for Python. Note that the major change in version 4.0.0beta (which allows you to interact with the StreamSets DataOps Platform) is that the DataCollector and Transformer classes are no longer public, since the engines are headless in the UI as well.

This means that rather than using

from streamsets.sdk import DataCollector

dc = DataCollector('https://localhost:18630')

the recommended way to build a pipeline is the one shown in the following example in the SDK documentation:

https://docs.streamsets.com/platform-sdk/learn/examples/sdk_examples_job_pipeline.html

I believe this is what you tried in the second part of your question.

If you still get that error, check whether the expected SDC is listed when you call `sch.data_collectors`. It will show you the URL of each SDC.

# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub
# Connect to the Control Hub instance you want to interact with.
sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)

sch.data_collectors
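
As a rough sketch of that check, the snippet below compares the engine_url you plan to pass to get_pipeline_builder against the URLs of the registered engines. It assumes each object returned by `sch.data_collectors` exposes a `url` attribute. The helper names `registered_engine_urls` and `require_registered` are hypothetical, and the example.com URLs are placeholders. Stand-in objects are used so that it runs without a Control Hub connection. An engine_url that is not among the registered URLs is one plausible cause of the "instance is not in list" error.

```python
from types import SimpleNamespace


def registered_engine_urls(engines):
    """Return the URL of every engine object in `engines`."""
    return [engine.url for engine in engines]


def require_registered(engines, engine_url):
    """Return engine_url if it is registered, else raise with the known URLs."""
    urls = registered_engine_urls(engines)
    if engine_url not in urls:
        raise ValueError(
            f"{engine_url!r} is not registered; known engine URLs: {urls}"
        )
    return engine_url


# Stand-ins for the objects returned by sch.data_collectors
# (assumption: each real object exposes a `url` attribute).
engines = [
    SimpleNamespace(url="https://sdc-a.example.com:18630"),
    SimpleNamespace(url="https://sdc-b.example.com:18630"),
]

# With a live connection you would pass sch.data_collectors instead of `engines`.
engine_url = require_registered(engines, "https://sdc-a.example.com:18630")
print(engine_url)
```

With a real connection, the returned value is what you would pass as engine_url to sch.get_pipeline_builder.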

Kirti
StreamSets Employee
29 replies
3 years ago
December 28, 2021

Also, please make sure there is no trailing `/` on SCH_URL.
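
A minimal sketch of guarding against that, assuming the URL is held in a plain string before it is passed to ControlHub. The helper name `clean_sch_url` and the example.com URL are hypothetical.

```python
def clean_sch_url(sch_url):
    """Remove surrounding whitespace and any trailing slashes from the URL."""
    return sch_url.strip().rstrip("/")


# Placeholder URL; substitute your own Control Hub URL.
sch_url = clean_sch_url("https://cloud.example.com/")
print(sch_url)
```

The cleaned value can then be used as the first argument to ControlHub.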

ashok verma
Author
Discovered Fame
13 replies
3 years ago
December 29, 2021

Thanks for the help. I am able to use Data Collector now; calling sch.data_collectors gave me the URL.
