
From the Python shell, I am unable to launch Data Collector.

These are the options I have tried:

  1. from streamsets.sdk import DataCollector

dc = DataCollector('https://localhost:18630')

error: None object has no attribute use_websocket_tunneling

 

  2. from streamsets.sdk import DataCollector, ControlHub

sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)
pipeline_builder = sch.get_pipeline_builder(engine_type='data_collector', engine_url=<SDC URL>)

In the step above, for engine_url I used the URL of the active engine listed under the Engines tab after logging in to StreamSets, and I get the error below.

error: instance is not in list

Do you have existing SDC engines running and registered with the DataOps Platform or are you trying to actually launch new ones via the SDK?


Yes, the SDC engine is running, and I think it is registered with the DataOps Platform. We are not launching it using the SDK.

Could you please let me know how to check whether it is registered with the DataOps Platform?


@ashok verma,

  1. One way to see whether an execution engine is registered/deployed in StreamSets DataOps Platform is to look at the Engines UI (from the left-hand navigation menu, click `Set Up` and then `Engines`). The expected SDC should be listed there. If you are dealing with Transformer, click `Transformers` in the right-hand pane and it will be listed there.
  2. I would also try creating a pipeline in the UI with that SDC/Transformer, just to make sure access rights etc. are correct.
  3. The next step would be to create a pipeline using StreamSets SDK for Python. Note: the major change in version 4.0.0beta (which allows one to interact with StreamSets DataOps Platform) is that the classes DataCollector and Transformer are no longer public, since even in the UI they are headless engines.

This means that, rather than using

from streamsets.sdk import DataCollector

dc = DataCollector('https://localhost:18630')

the recommended way to build a pipeline is the one shown in the following example in the SDK documentation:

https://docs.streamsets.com/platform-sdk/learn/examples/sdk_examples_job_pipeline.html

which, I believe, is what you tried in the second attempt in your question.

  4. If you still get that error, is the expected SDC listed when you call `sch.data_collectors`? It will show you the URL of the SDC.
# Import the ControlHub class from the SDK.
from streamsets.sdk import ControlHub
# Connect to the Control Hub instance you want to interact with.
sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)

sch.data_collectors

 


Also, please make sure there is no trailing `/` on SCH_URL.
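
To make the URL check above concrete, here is a minimal sketch in plain Python. It does not call the SDK: `normalize_url` and `pick_engine_url` are hypothetical helper names, and the `registered` list stands in for URLs you would copy by hand from the output of `sch.data_collectors`. The idea is only to strip a trailing `/` and confirm that the URL you plan to pass as `engine_url` matches one of the registered engines.

```python
def normalize_url(url: str) -> str:
    """Strip surrounding whitespace and any trailing slashes from a URL."""
    return url.strip().rstrip("/")


def pick_engine_url(registered_urls, wanted_url):
    """Return the registered URL that matches wanted_url, or None if absent.

    The comparison ignores trailing slashes and surrounding whitespace only.
    """
    wanted = normalize_url(wanted_url)
    for url in registered_urls:
        if normalize_url(url) == wanted:
            return url
    return None


# Hypothetical values: replace with the URLs shown by sch.data_collectors.
registered = ["https://sdc.example.com:18630"]

# A trailing slash no longer prevents a match.
print(pick_engine_url(registered, "https://sdc.example.com:18630/"))

# An engine that is not registered yields None.
print(pick_engine_url(registered, "https://localhost:18630"))
```

If the helper returns a URL, that exact string is the one to pass as engine_url to sch.get_pipeline_builder; if it returns None, the engine you have in mind is probably not the one registered with Control Hub.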


Thanks for the help. I am able to use Data Collector now; calling sch.data_collectors gave me the URL.

