Solved

compare the average transform time


maytim00
Fan

In a StreamSets pipeline, I see that the "run history" shows a duration entry each time I start, stop, or retry the job. How can I isolate the run time over a specific period, and how can I compare the average transform time for last month against the previous month?


5 replies

  • 0 replies
  • March 29, 2022

Hello!
 

You can take a look at the History tab in the job properties or monitor panel to view past run times. Here is some more information about that.

https://docs.streamsets.com/portal/controlhub/latest/onpremhelp/controlhub/UserGuide/Jobs/Jobs-Monitoring.html#concept_ghf_tn4_ylb


maytim00
Fan
  • Author
  • Fan
  • 4 replies
  • March 30, 2022

Thank you. I saw the information on the “History” tab, but it only gives the duration of each pipeline run.
I wonder how to calculate the average transformation duration for a given transformation of ours over a desired period of time. That is: subtract the datetime an event entered the pipeline from the datetime it exited the pipeline, and average the result over a requested time range, for incoming and outgoing data per pipeline. Then subtract the average outgoing events per second (EPS) from the average incoming EPS to get the comparison value. Do we need to pull each pipeline's history and calculate this manually?
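The calculation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each event already carries an entry and an exit timestamp, and the function names and input shapes are hypothetical, not part of StreamSets.

```python
from datetime import datetime
from statistics import mean


def average_transform_seconds(events, start, end):
    """Mean of (exited - entered), in seconds, for events that entered in [start, end).

    `events` is a hypothetical list of (entered, exited) datetime pairs.
    Returns None when no event falls inside the requested range.
    """
    durations = [
        (exited - entered).total_seconds()
        for entered, exited in events
        if start <= entered < end
    ]
    return mean(durations) if durations else None


def events_per_second(event_count, start, end):
    """Average events per second (EPS) over the range [start, end)."""
    return event_count / (end - start).total_seconds()


def eps_difference(incoming_count, outgoing_count, start, end):
    """Incoming EPS minus outgoing EPS over the same range."""
    return (events_per_second(incoming_count, start, end)
            - events_per_second(outgoing_count, start, end))
```

Calling `average_transform_seconds` once per month with different `start` and `end` values gives the two numbers to compare.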


  • 0 replies
  • Answer
  • March 30, 2022

Hello,

Thanks for clarifying! I think you might be looking for Time Series Analysis. Here is a post on how to check whether it’s enabled and how to request it if it’s not.

 

 

 


Giuseppe Mura
StreamSets Employee
  • StreamSets Employee
  • 37 replies
  • March 31, 2022

@maytim00, the answer @Brenna gave you actually depends on whether you are on Control Hub 3.x or DataOps.
If the latter, we no longer have a time series database; if you want to perform time-series analysis of executions, we’d recommend that you download the metrics (e.g. using the Control Hub REST APIs) and load them into your time series database of choice.

You could indeed use a Data Collector pipeline to perform that operation at regular intervals, as needed.
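Once the run records have been downloaded, the month-over-month comparison itself is a small aggregation. Below is a minimal sketch, assuming each record has been reduced to a start and finish time in epoch milliseconds. The field names `start_ms` and `finish_ms` are hypothetical placeholders, not the actual Control Hub response format, so map them from whatever the API returns.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_average_run_seconds(runs):
    """Average run duration in seconds, grouped by the UTC month the run started.

    `runs` is a list of dicts with hypothetical keys "start_ms" and "finish_ms"
    (epoch milliseconds). Returns a dict such as {"2022-02": 90.0}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # month -> [total seconds, run count]
    for run in runs:
        started = datetime.fromtimestamp(run["start_ms"] / 1000, tz=timezone.utc)
        month = started.strftime("%Y-%m")
        totals[month][0] += (run["finish_ms"] - run["start_ms"]) / 1000
        totals[month][1] += 1
    return {
        month: total / count
        for month, (total, count) in sorted(totals.items())
    }
```

Subtracting one month's value from another's in the returned dict gives the change in average run time between those two months.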


maytim00
Fan
  • Author
  • Fan
  • 4 replies
  • March 31, 2022

 

@Giuseppe Mura - thank you. That’s what I thought: @Brenna’s answer is not exactly what we want to know. Is there a way to see how long the job ran or was stopped over a month, a year, etc.? Do we need to download the metrics and load them into a time series database? Does the time series database not capture this each time we run? How can I use a Data Collector pipeline to perform that operation at regular intervals, as needed, over a month, a year, etc.?

 

thank you,

 

