Question

Error while creating sdc instance in Python SDK

  • 25 April 2023
  • 9 replies
  • 83 views

sch = ControlHub(API_CREDENTIAL_ID, API_TOKEN)
sdc = DataCollector(server_url='http://023aee033abf:18630', control_hub=sch)

Creating the sdc instance takes a very long time. When I force-stop the run, it shows the following error:

-------------------------------------------------------------------------------------

Traceback (most recent call last):
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\sdc_instance.py", line 15, in <module>
    sdc=DataCollector(server_url='http://023aee033abf:18630',control_hub=sch)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\streamsets\sdk\sdc.py", line 90, in __init__
    self.api_client = sdc_api.ApiClient(server_url=current_server_url,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\streamsets\sdk\sdc_api.py", line 97, in __init__
    result = self._fetch_tunneling_instance_id()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\streamsets\sdk\sdc_api.py", line 140, in _fetch_tunneling_instance_id
    wait_for_condition(_get_tunneling_instance_id, [self], timeout=300000)
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\streamsets\sdk\utils.py", line 366, in wait_for_condition
    outcome = condition(*condition_args or [], **condition_kwargs or {})
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\streamsets\sdk\sdc_api.py", line 124, in _get_tunneling_instance_id
    response = api_client._get(end_point, absolute_endpoint=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\streamsets\sdk\sdc_api.py", line 1268, in _get
    response = self.session.get(url, params=params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


  File "C:\Users\mihir_kale\PycharmProjects\Informatica_to_SS\venv\Lib\site-packages\requests\sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)


 

I want to validate multiple pipelines at the same time. To do that, I need to create multiple sdc instances for multiple engines and then validate the pipelines belonging to those engines simultaneously using the validate method.

I tried using the sch instance, but found that there is no engine configuration attribute linked to the sch object, i.e. I can’t select a particular authoring engine for a particular pipeline using any sch instance method.

Is there a way to solve this?
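
The goal described above, validating several pipelines at once, could be sketched with a thread pool from the Python standard library. This is only a minimal sketch under assumptions: `validate_all` and `fake_validate` are hypothetical names, not part of the StreamSets SDK, and the real validation call would have to be supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_all(jobs, validate, max_workers=4):
    """Call validate(engine, pipeline) for every pair in `jobs` concurrently.

    `validate` is a caller-supplied function (hypothetical here) that is
    expected to raise an exception when validation fails.
    Returns a list of (pipeline, error) tuples in submission order, where
    error is None on success.
    """
    jobs = list(jobs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(validate, engine, pipeline)
                   for engine, pipeline in jobs]
    # Leaving the `with` block waits until every submitted call has finished.
    return [(pipeline, future.exception())
            for (_, pipeline), future in zip(jobs, futures)]


def fake_validate(engine, pipeline):
    """Stand-in for a real validation call; fails for one pipeline name."""
    if pipeline == "broken":
        raise ValueError("invalid configuration")


if __name__ == "__main__":
    jobs = [("engine-a", "p1"), ("engine-b", "broken")]
    for pipeline, error in validate_all(jobs, fake_validate):
        print(pipeline, "OK" if error is None else f"FAILED: {error}")
```

In practice `validate` would wrap whichever SDK validation call is being used. Whether the SDK's objects are safe to use from several threads is not confirmed anywhere in this thread, so that would need to be checked first.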
 


9 replies


@Mihir Kale 

 

Can you please confirm which version of the SDK you have installed?

 

from streamsets.sdk import ControlHub
sch = ControlHub(credential_id=credential_id, token=token)
sdc = sch.data_collectors.get(url='<data_collector_address>')

Then use sch and sdc accordingly.

 

Depending on the version, you might have to pass a different parameter to get the sdc object. I will try to find an example and share it.

Hi @saleempothiwala 

I have installed SDK version 5.0.


@Mihir Kale 

 

Use the code below.

Substitute your credential ID, token, and engine URL, and it should hopefully work.

from streamsets.sdk import ControlHub
sch = ControlHub(credential_id='<credential_id>', token='<token>')
sdc = sch.data_collectors.get(engine_url='<data_collector_address>')

 

Hi @saleempothiwala 

The above code works, but the sdc object is of the Engine class. Is there a way to create an object of the streamsets.sdk.sdc.DataCollector class so that I can use the validate_pipeline method?


@Mihir Kale 

 

Can you please try the snippet below? It creates a pipeline builder for the Data Collector engine.

 

python3

from streamsets.sdk import ControlHub
sch = ControlHub(credential_id='your_credential_id', token='your_token_id')
sdc = sch.data_collectors.get(engine_url='http://your_data_collector_hostname:18630')

builder = sch.get_pipeline_builder(engine_type='data_collector', engine_id=sdc.id)

dev_raw_data_source = builder.add_stage('Dev Raw Data Source')
trash = builder.add_stage('Trash')
dev_raw_data_source >> trash

pipeline = builder.build('My first pipeline')

sch.publish_pipeline(pipeline, commit_message='First commit of my first pipeline')

 

Thanks Bikram, but I want to use the validate_pipeline method of the DataCollector class instead of the one on the ControlHub class. Is there a way to do this?


@Mihir Kale can you please let me know why you want to use the DataCollector class to validate?

I just want to check whether the validate method of the DataCollector class takes less time than the validate method of the ControlHub class.
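
A comparison like this could be measured with a small timing helper. The following is a minimal sketch using only the standard library; `time_call`, `slow_validate`, and `fast_validate` are hypothetical names introduced for illustration, and the two stand-in functions simply sleep instead of calling the SDK.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def slow_validate():
    """Stand-in for one validation path (hypothetical)."""
    time.sleep(0.05)


def fast_validate():
    """Stand-in for the other validation path (hypothetical)."""
    time.sleep(0.0)


if __name__ == "__main__":
    print(f"slow: {time_call(slow_validate):.3f} s")
    print(f"fast: {time_call(fast_validate):.3f} s")
```

Replacing the two stand-ins with the actual validation calls would give a rough comparison. Taking the best of several runs reduces the effect of one-off network delays.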


@Mihir Kale 

Validation is quick both through the SDK and in Control Hub. It only checks the configuration details: if everything is in sync, it reports the pipeline as validated; otherwise it throws an error.

Thanks & Regards

Bikram_
