Solved

Launching a pipeline using the SDK

  • December 28, 2021
  • 5 replies
  • 126 views

ashok verma
Discovered Fame

From the Python shell, I am unable to launch Data Collector.

I have tried the options below.

  1. from streamsets.sdk import DataCollector

dc = DataCollector('https://localhost:18630')

Error: None object has no attribute use_websocket_tunning

  2. from streamsets.sdk import DataCollector, ControlHub

sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)
pipeline_builder = sch.get_pipeline_builder(engine_type='data_collector', engine_url=<SDC URL>)

For engine_url in the step above, I logged in to StreamSets and used the URL of the active engine shown under the Engines tab, and I am getting the error below.

Error: instance is not in list


5 replies

dima
StreamSets Employee
  • 83 replies
  • December 28, 2021

Do you have existing SDC engines running and registered with the DataOps Platform or are you trying to actually launch new ones via the SDK?


ashok verma
Discovered Fame
  • Author
  • Discovered Fame
  • 13 replies
  • December 28, 2021

Yes, an SDC engine is running and I think it is registered with the DataOps Platform. We are not launching it using the SDK.

Could you please let me know how to check whether it is registered with the DataOps Platform?


Kirti
StreamSets Employee
  • 29 replies
  • Answer
  • December 28, 2021

@ashok verma ,

  1. One way to check whether an execution engine is registered/deployed in StreamSets DataOps Platform is to look at the Engines UI: in the left-hand navigation menu, click `Set Up` and then `Engines`. The expected SDC should be listed there. If you are dealing with Transformer, click `Transformers` in the right-hand pane and it will be listed there.
  2. I would also try creating a pipeline in the UI with that SDC/Transformer, just to make sure access rights etc. are correct.
  3. The next step would be to create a pipeline using StreamSets SDK for Python. Note that the major change in version 4.0.0beta (which allows one to interact with StreamSets DataOps Platform) is that the classes DataCollector and Transformer are no longer public, since even in the UI they are headless engines.

This means that rather than using

from streamsets.sdk import DataCollector

dc = DataCollector('https://localhost:18630')

the recommended way to build a pipeline is the one shown in the following example in the SDK documentation:

https://docs.streamsets.com/platform-sdk/learn/examples/sdk_examples_job_pipeline.html

which, I believe, is what you tried in the second part of your question.

  4. If you still get that error, is the expected SDC listed when you call `sch.data_collectors`? It will show you the URL for the SDC.

# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub
# Connect to the Control Hub instance you want to interact with.
sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)

sch.data_collectors
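
One plausible cause of the "instance is not in list" error is an engine_url that does not match any registered engine exactly. The sketch below shows one way to compare a candidate URL against the URLs reported by `sch.data_collectors` before passing it to `sch.get_pipeline_builder`. It does not import the SDK: the `registered` list stands in for URLs copied from that output, and the helper names and host names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Lowercase the scheme and host, and drop any trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    port = f":{parts.port}" if parts.port else ""
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{host}{port}{path}"


def pick_registered_url(registered_urls, candidate):
    """Return the registered URL that matches candidate, or None if none does."""
    wanted = normalize_url(candidate)
    for url in registered_urls:
        if normalize_url(url) == wanted:
            return url
    return None


# Hypothetical values; in practice, copy the URLs shown by sch.data_collectors.
registered = ["http://sdc-host.example.com:18630"]

# Same engine, written with different case and a trailing slash.
print(pick_registered_url(registered, "http://SDC-Host.example.com:18630/"))

# Not a registered engine, so there is nothing safe to pass as engine_url.
print(pick_registered_url(registered, "https://localhost:18630"))
```

If the helper returns None, the URL is probably not the one Control Hub knows the engine by, and the value listed by `sch.data_collectors` is the one to use.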

 


Kirti
StreamSets Employee
  • 29 replies
  • December 28, 2021

Also, please make sure there is no trailing `/` on SCH_URL.
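
A small guard for the trailing-slash advice, as a minimal sketch: clean the Control Hub URL before handing it to `ControlHub(...)`. The helper name and the example URL are hypothetical, and the SDK itself is not called here.

```python
def clean_sch_url(sch_url):
    """Strip surrounding whitespace and any trailing slashes from a Control Hub URL."""
    return sch_url.strip().rstrip("/")


# Hypothetical Control Hub URL with an accidental trailing slash.
sch_url = clean_sch_url("https://controlhub.example.com/ ")
print(sch_url)
```

The cleaned value can then be used as the first argument in the `ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)` call.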


ashok verma
Discovered Fame
  • Author
  • Discovered Fame
  • 13 replies
  • December 29, 2021
Thanks for the help. I am able to use Data Collector now; by using sch.data_collectors I got to know the URL.

