Solved

compare the average transform time

  • 29 March 2022
  • 5 replies
  • 99 views

In a StreamSets pipeline, I see that the "run history" durations are created each time I start, stop, or even retry the job. But how do I isolate the time over a specific period, and how do I compare the average transform time for last month against the previous month?


Best answer by Anonymous 30 March 2022, 18:45



Hello!
 

You can take a look at the History tab in the job properties or monitor panel to view past run times. Here is some more information about that.

https://docs.streamsets.com/portal/controlhub/latest/onpremhelp/controlhub/UserGuide/Jobs/Jobs-Monitoring.html#concept_ghf_tn4_ylb

Thank you. I saw the information in the “History” tab, but it only gives the duration of how long the pipeline ran.
I wonder how to calculate the average transformation duration for a given transformation in our pipelines over a desired period of time. That is, subtract the datetime an event entered the pipeline from the datetime it exited, and average the result over a requested time range, for incoming and outgoing data per pipeline. Then subtract the average outgoing events per second (EPS) from the average incoming EPS to get the comparison value. Do we need to pull each pipeline's history and calculate this manually?
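The calculation described above can be sketched in a few lines of Python. This is only an illustration, assuming each event has already been exported as a record with `entered` and `exited` timestamps; those field names, the helper functions, and the sample data are all hypothetical and are not something StreamSets produces in this shape.

```python
from datetime import datetime
from statistics import mean


def average_transform_seconds(events, start, end):
    """Mean of (exited - entered) for events that entered within [start, end)."""
    durations = [
        (e["exited"] - e["entered"]).total_seconds()
        for e in events
        if start <= e["entered"] < end
    ]
    return mean(durations) if durations else None


def events_per_second(timestamps, start, end):
    """Number of timestamps inside [start, end) divided by the window length."""
    count = sum(1 for t in timestamps if start <= t < end)
    return count / (end - start).total_seconds()


# Hypothetical sample records; real data would come from your own export.
events = [
    {"entered": datetime(2022, 2, 10, 12, 0, 0), "exited": datetime(2022, 2, 10, 12, 0, 4)},
    {"entered": datetime(2022, 2, 20, 8, 0, 0), "exited": datetime(2022, 2, 20, 8, 0, 6)},
    {"entered": datetime(2022, 3, 5, 9, 30, 0), "exited": datetime(2022, 3, 5, 9, 30, 2)},
    {"entered": datetime(2022, 3, 15, 9, 30, 0), "exited": datetime(2022, 3, 15, 9, 30, 4)},
]

feb_start, mar_start, apr_start = datetime(2022, 2, 1), datetime(2022, 3, 1), datetime(2022, 4, 1)

feb_avg = average_transform_seconds(events, feb_start, mar_start)
mar_avg = average_transform_seconds(events, mar_start, apr_start)
avg_change = mar_avg - feb_avg  # negative means March was faster on average

# Incoming EPS minus outgoing EPS for March, as described in the question.
incoming_eps = events_per_second([e["entered"] for e in events], mar_start, apr_start)
outgoing_eps = events_per_second([e["exited"] for e in events], mar_start, apr_start)
eps_delta = incoming_eps - outgoing_eps

print(feb_avg, mar_avg, avg_change, eps_delta)
```

The windows are half-open (`start <= t < end`), so an event that enters exactly at midnight on the first of a month is counted in that month only.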

Hello,

Thanks for clarifying! I think you might be looking for Time Series Analysis. Here is a post on how to check if it’s enabled and how to request it if it’s not.

 

 

 


@maytim00, the answer @Brenna gave you actually depends on whether you are on Control Hub 3.x or DataOps.
If the latter, we no longer have a time series database; if you want to perform time-series analysis of executions, we’d recommend that you download the metrics (e.g. using the Control Hub REST APIs) and load them into your time series database of choice.

You could indeed use a Data Collector pipeline to perform that operation at regular intervals, as needed.
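Once the run history has been downloaded, the per-month roll-up could look roughly like the sketch below. It assumes each run has been reduced to a record with epoch-millisecond `start_ms` and `finish_ms` fields; those names, the `ms` helper, and the sample runs are hypothetical, so check the actual Control Hub REST API response for the real field names before adapting it.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_run_stats(runs):
    """Group runs by the UTC month they started in; return count, total and average seconds."""
    buckets = defaultdict(list)
    for run in runs:
        started = datetime.fromtimestamp(run["start_ms"] / 1000, tz=timezone.utc)
        duration_s = (run["finish_ms"] - run["start_ms"]) / 1000
        buckets[started.strftime("%Y-%m")].append(duration_s)
    return {
        month: {"runs": len(d), "total_s": sum(d), "avg_s": sum(d) / len(d)}
        for month, d in sorted(buckets.items())
    }


def ms(dt):
    """Helper for the sample data: timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# Hypothetical run records standing in for downloaded job run history.
runs = [
    {"start_ms": ms(datetime(2022, 2, 3, 10, 0, tzinfo=timezone.utc)),
     "finish_ms": ms(datetime(2022, 2, 3, 10, 10, tzinfo=timezone.utc))},
    {"start_ms": ms(datetime(2022, 2, 17, 9, 0, tzinfo=timezone.utc)),
     "finish_ms": ms(datetime(2022, 2, 17, 9, 20, tzinfo=timezone.utc))},
    {"start_ms": ms(datetime(2022, 3, 9, 14, 0, tzinfo=timezone.utc)),
     "finish_ms": ms(datetime(2022, 3, 9, 14, 5, tzinfo=timezone.utc))},
]

stats = monthly_run_stats(runs)
for month, row in stats.items():
    print(month, row)
```

Changing the `strftime` format to `"%Y"` would give yearly totals instead. The same aggregation could be run on a schedule, with the results written to whichever time series database you choose.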

 

@Giuseppe Mura - thank you. That’s what I thought: @Brenna's answer is not exactly what we want to know. Is there a way to see how long the job ran or was stopped over a month, a year, etc.? Do we need to download the metrics and load them into a time series database? Does the time series DB not capture this each time we run? And how can I use a Data Collector pipeline to perform that operation at regular intervals, as needed, over a month, a year, etc.?

 

thank you,

 
