Question

Fetching pipeline names using the Python SDK takes longer than expected?

  • 14 July 2023

I am trying to get all pipeline names with the following code:

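from streamsets.sdk import ControlHub

# Authenticate against Control Hub with an API credential
sch = ControlHub(credential_id="my_id", token="Mytoken", use_websocket_tunneling=False)

# Iterating sch.pipelines retrieves every pipeline from Control Hub
pipeline_names = [pipeline.name for pipeline in sch.pipelines]

# Keep only the names that start with 'abc'
filtered_names = [name for name in pipeline_names if name.startswith('abc')]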

 

i observed its taking more than an minute to fetch all pipeline names in to the list

 

Is the righ approach to filter pipeline names from control hub or is there any issue on streamset sdk?

1 reply

Hey Hari,

Pipeline names are neither unique nor indexed, so `ControlHub.pipelines.get(name=<name>)` ends up being an O(n) operation: every pipeline has to be checked for a matching name. You'll get much better performance if you query by pipeline ID, for example by passing `pipeline_id` into your `get` call.
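To make the difference concrete, here is a toy model in plain Python. It is not the StreamSets SDK, and `ToyPipelineStore`, `get_by_name`, and `get_by_id` are hypothetical names used only for illustration. It contrasts a scan over non-unique names with a lookup on a unique ID held in a dict index, and counts how many records each lookup touches. With the real SDK, the suggestion above corresponds to passing `pipeline_id` to `sch.pipelines.get(...)`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Pipeline:
    pipeline_id: str
    name: str


class ToyPipelineStore:
    """Hypothetical in-memory stand-in for a pipeline collection (not the SDK)."""

    def __init__(self, pipelines: List[Pipeline]) -> None:
        self._all = list(pipelines)
        # IDs are unique, so they can back a dict index.
        self._by_id: Dict[str, Pipeline] = {p.pipeline_id: p for p in pipelines}
        self.items_examined = 0  # how many records lookups have touched

    def get_by_name(self, name: str) -> List[Pipeline]:
        # Names are neither unique nor indexed: every record is checked (O(n)).
        matches = []
        for p in self._all:
            self.items_examined += 1
            if p.name == name:
                matches.append(p)
        return matches

    def get_by_id(self, pipeline_id: str) -> Optional[Pipeline]:
        # Unique, indexed key: one hash lookup (O(1) on average).
        self.items_examined += 1
        return self._by_id.get(pipeline_id)


# 1000 pipelines whose names repeat every 10 entries
store = ToyPipelineStore([Pipeline(f"id-{i}", f"abc_{i % 10}") for i in range(1000)])

store.items_examined = 0
by_name = store.get_by_name("abc_3")
print(len(by_name), store.items_examined)  # 100 1000

store.items_examined = 0
by_id = store.get_by_id("id-3")
print(by_id.name, store.items_examined)  # abc_3 1
```

In this model, a prefix filter such as `startswith('abc')` cannot use the ID index either, so it remains a full scan over every name.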