
Hi 

I am getting an error while trying to launch a pipeline using the SDK for Python.

 

1. Installed Python on my system.

2. Tried to install the SDK with pip3 install streamsets~=3.0, but it fails with the error below.

bikramrout@Bikrams-MacBook-Air ~ % pip3 install streamsets~=3.0
Collecting streamsets~=3.0
  Using cached streamsets-3.12.1.tar.gz (3.7 MB)
  Preparing metadata (setup.py) ... done
Collecting dpath==1.4.2
  Using cached dpath-1.4.2.tar.gz (14 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/kv/z10gtcs561n998_5h4sbh1sc0000gn/T/pip-install-h1cm6id7/dpath_c4c52c6cb62e45f3b376d4216846f80f/setup.py", line 2, in <module>
          import dpath.version
        File "/private/var/folders/kv/z10gtcs561n998_5h4sbh1sc0000gn/T/pip-install-h1cm6id7/dpath_c4c52c6cb62e45f3b376d4216846f80f/dpath/__init__.py", line 13, in <module>
          from .util import *
        File "/private/var/folders/kv/z10gtcs561n998_5h4sbh1sc0000gn/T/pip-install-h1cm6id7/dpath_c4c52c6cb62e45f3b376d4216846f80f/dpath/util.py", line 1, in <module>
          import dpath.path
        File "/private/var/folders/kv/z10gtcs561n998_5h4sbh1sc0000gn/T/pip-install-h1cm6id7/dpath_c4c52c6cb62e45f3b376d4216846f80f/dpath/path.py", line 9, in <module>
          from collections import MutableSequence, MutableMapping
      ImportError: cannot import name 'MutableSequence' from 'collections' (/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

 

3. I then ran pip3 install streamsets (without the version pin), and that installed fine. After that I ran the following script:

  

from streamsets.sdk import DataCollector

data_collector = DataCollector('https://na01.hub.streamsets.com')

 

I am getting an error while trying to connect to my StreamSets Data Collector.

 

Error details:

Traceback (most recent call last):
  File "/Users/bikramrout/a.py", line 2, in <module>
    data_collector = DataCollector('https://na01.hub.streamsets.com')
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/streamsets/sdk/sdc.py", line 83, in __init__
    if self.control_hub.use_websocket_tunneling:
AttributeError: 'NoneType' object has no attribute 'use_websocket_tunneling'

 

Kindly advise how I should proceed.

 

I am following the documentation provided by StreamSets but am still getting these errors.

https://docs.streamsets.com/sdk/latest/installation.html

 

@Bikram 

pip3 install streamsets will install the latest version. If you are using SCH 3.x then 

pip3 install streamsets~=3.0 is what you would need.

Let us have a call later today and we can go through the steps together. We will update this thread for others to benefit.
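For anyone else hitting the ImportError in the original post: the traceback shows dpath 1.4.2 running "from collections import MutableSequence, MutableMapping" under Python 3.10. Those aliases were removed from the collections module in Python 3.10; the classes now have to be imported from collections.abc. The small check below is only a sketch to confirm that this is what is happening on your interpreter. The helper name legacy_import_works is made up for illustration and is not part of the SDK or of dpath.

```python
import sys

# The abstract base classes live in collections.abc on every Python 3 version.
from collections.abc import MutableMapping, MutableSequence


def legacy_import_works() -> bool:
    """Return True if the old-style import used by dpath 1.4.2 still works.

    Hypothetical helper for illustration only.
    """
    try:
        from collections import MutableSequence  # noqa: F401  (old alias)
        return True
    except ImportError:
        return False


print("Python version:", ".".join(map(str, sys.version_info[:3])))
print("Old-style import works:", legacy_import_works())
print("list is a MutableSequence:", issubclass(list, MutableSequence))
print("dict is a MutableMapping:", issubclass(dict, MutableMapping))
```

If the old-style import does not work, the pinned dpath 1.4.2 cannot be built on that interpreter. One possible workaround, which this thread does not confirm, is to install the 3.x SDK in a virtual environment that uses a Python version older than 3.10; please check the SDK documentation for the versions it supports.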


@Bikram 

For DataOps Platform, follow these steps:

  1. Install the SDK
    pip3 install streamsets~=4.0
  2. In your Control Hub, create API credentials. Save your credential_id and token
  3. Execute the following commands
python3

from streamsets.sdk import ControlHub
sch = ControlHub(credential_id='your_credential_id', token='your_token_id')
sdc = sch.data_collectors.get(url='http://your_data_collector_hostname:18630')

builder = sch.get_pipeline_builder(engine_type='data_collector', engine_id=sdc.id)

dev_raw_data_source = builder.add_stage('Dev Raw Data Source')
trash = builder.add_stage('Trash')
dev_raw_data_source >> trash

pipeline = builder.build('My first pipeline')

sch.publish_pipeline(pipeline, commit_message='First commit of my first pipeline')

 


Hi Saleem 

I will give it a try and let you know.

 

Thanks & regards

Bikram_


@saleempothiwala 

 

I am getting an error while trying to connect to the Data Collector.

 

sdc = sch.data_collectors.get(url ='https://na01.hub.streamsets.com:18630')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bikramrout/Library/Python/3.9/lib/python/site-packages/streamsets/sdk/utils.py", line 389, in get
    return next(i for i in self if all(getattr(i, k) == v for k, v in kwargs.items()))
  File "/Users/bikramrout/Library/Python/3.9/lib/python/site-packages/streamsets/sdk/utils.py", line 389, in <genexpr>
    return next(i for i in self if all(getattr(i, k) == v for k, v in kwargs.items()))
  File "/Users/bikramrout/Library/Python/3.9/lib/python/site-packages/streamsets/sdk/utils.py", line 389, in <genexpr>
    return next(i for i in self if all(getattr(i, k) == v for k, v in kwargs.items()))
  File "/Users/bikramrout/Library/Python/3.9/lib/python/site-packages/streamsets/sdk/sch_models.py", line 166, in __getattr__
    raise AttributeError('Could not find attribute {}.'.format(name_))
AttributeError: Could not find attribute url.


@Bikram 

You need to provide the hostname of your Data Collector, not the Control Hub (SCH) URL.


@Bikram :

For DataOps, you need to use engine_url instead of url:

sdc = sch.data_collectors.get(engine_url='http://sdc.cluster:18630')
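To combine the two points above (pass the Data Collector's own URL, and look it up by engine_url), here is a rough sketch of a guard you could put around the lookup. It assumes that Control Hub hosts end in hub.streamsets.com, which is only inferred from the na01.hub.streamsets.com address in this thread. The function names looks_like_control_hub and find_data_collector are made up for illustration and are not part of the SDK. The only SDK call used is sch.data_collectors.get(engine_url=...), as shown above.

```python
from urllib.parse import urlparse

# Assumption: Control Hub hosts end with this suffix (inferred from the
# na01.hub.streamsets.com address in this thread, not from the SDK docs).
CONTROL_HUB_SUFFIX = "hub.streamsets.com"


def looks_like_control_hub(url: str) -> bool:
    """Return True if the URL's host looks like a Control Hub address."""
    host = urlparse(url).hostname or ""
    return host.endswith(CONTROL_HUB_SUFFIX)


def find_data_collector(sch, engine_url: str):
    """Look up a registered Data Collector by its own URL.

    `sch` is expected to be an already authenticated ControlHub object.
    Hypothetical helper for illustration only.
    """
    if looks_like_control_hub(engine_url):
        raise ValueError(
            f"{engine_url} looks like a Control Hub URL; "
            "pass the Data Collector URL instead."
        )
    # On DataOps Platform the lookup attribute is engine_url, not url.
    return sch.data_collectors.get(engine_url=engine_url)
```

With this in place, find_data_collector(sch, 'https://na01.hub.streamsets.com:18630') fails early with a readable message instead of the AttributeError above, while find_data_collector(sch, 'http://sdc.cluster:18630') performs the normal lookup.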


@wilsonshamim 

Thanks a lot, it works for me.


@saleempothiwala 

 

Thanks a lot for your help. I managed to connect to SDC using the Python SDK.

 

Thanks & regards

Bikram_


@Bikram, glad to know that the suggestion worked for you.

 

