
I want to extract multiple fields from JSON/XML using the XML Parser, etc.


ashok verma
Discovered Fame

I want to extract multiple fields from JSON/XML using the XML Parser or a similar processor. I can already extract them with Groovy, but I want to do the following:

  1. Read a file from S3 with the data format set to XML.
  2. Extract multiple fields from the XML.

<body>
  <head>1</head>
  <m>3</m>
  <tail>2</tail>
</body>

In step 2, I want both values in my output without using Groovy or any other scripting.

I want to achieve this with the XML Parser, Field Mapper, or a similar processor. Today I can extract only one value, for example /body/head, but I want to extract both:

/body/head

/body/tail
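For comparison, the desired step-2 output can be sketched in plain Python with the standard library (this is not StreamSets code). It parses the sample document above and pulls both /body/head and /body/tail into one output record. The helper name `extract_fields` and the flat `{path: text}` output are illustrative assumptions, not how Data Collector represents records.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
SAMPLE = """<body>
  <head>1</head>
  <m>3</m>
  <tail>2</tail>
</body>"""


def extract_fields(xml_text, paths):
    """Return {path: text} for absolute paths such as '/body/head'.

    Hypothetical helper for illustration only. ElementTree paths are
    relative to the root element, so the root tag is checked and stripped.
    """
    root = ET.fromstring(xml_text)
    record = {}
    for path in paths:
        parts = path.strip("/").split("/")
        if parts[0] != root.tag or len(parts) < 2:
            raise ValueError(f"path {path!r} must start at root <{root.tag}>")
        relative = "/".join(parts[1:])
        record[path] = root.findtext(relative)  # None if the element is missing
    return record


print(extract_fields(SAMPLE, ["/body/head", "/body/tail"]))
# {'/body/head': '1', '/body/tail': '2'}
```

Both values end up in a single record, which is the behavior the question is after; the thread below discusses how to get the same result with built-in processors.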


Best answer by dima

A Field Remover processor in the mode that keeps only the fields you specify should work.


7 replies

dima
StreamSets Employee
  • 83 replies
  • January 5, 2022

If I understand your use case correctly, I don’t think you need the XML Parser processor at all. It’s designed to parse XML stored in string fields, whereas it sounds like you have complete XML files in S3. If that’s the case, an Amazon S3 origin with Data Format set to XML will give you every field in the XML, and you can then decide how to handle them.


ashok verma
Discovered Fame
  • Author
  • 13 replies
  • January 6, 2022

Hi Dima,

With the approach you describe, I will get the whole XML document, but I need only specific fields. How can I achieve that without any Groovy or Jython component?


dima
StreamSets Employee
  • 83 replies
  • Answer
  • January 6, 2022

A Field Remover processor in the mode that keeps only the fields you specify should work.
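The "keep only listed fields" idea can be sketched in plain Python as a filter that copies the listed field paths into a new record and drops everything else. This is an illustration of the concept, not the Field Remover's implementation: the simplified `/a/b` path syntax over nested dicts, the record shape, and the helper name `keep_listed_fields` are all assumptions. Real Data Collector XML records and field paths are structured differently.

```python
import copy


def keep_listed_fields(record, keep_paths):
    """Return a new record containing only the fields at `keep_paths`.

    Hypothetical helper. Paths use a simplified '/a/b' syntax over nested
    dicts; parent maps of a kept field are recreated, and paths that do not
    exist in the record are skipped. The input record is not modified.
    """
    result = {}
    for path in keep_paths:
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue  # ignore empty paths such as '/'
        src, found = record, True
        for part in parts:
            if isinstance(src, dict) and part in src:
                src = src[part]
            else:
                found = False
                break
        if not found:
            continue
        dst = result
        for part in parts[:-1]:
            dst = dst.setdefault(part, {})
        dst[parts[-1]] = copy.deepcopy(src)
    return result


# Assumed, simplified record for the sample XML in the question.
record = {"body": {"head": "1", "m": "3", "tail": "2"}}
print(keep_listed_fields(record, ["/body/head", "/body/tail"]))
# {'body': {'head': '1', 'tail': '2'}}
```

Listing both paths in one filter is what lets a single stage produce both values, instead of one value per extraction.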


ashok verma
Discovered Fame
  • Author
  • 13 replies
  • January 6, 2022

Got it, thanks. But in the SDK, when I set the action as shown below, I get an exception:

field_remover.action='Keep Listed Fields' 

Error:

CREATION_010 - Configuration value 'Keep Listed Fields' is not a valid 'FilterOperation' enum value:

Could you please tell me how to find the valid values for action in the SDK?
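The error message suggests the SDK passes the value through as a `FilterOperation` enum constant, so it expects the constant name rather than the label shown in the UI drop-down. A minimal sketch of that label-to-constant lookup follows. The constant names in the table (`REMOVE`, `KEEP`) are an assumption based on the Data Collector source and should be verified against your version; the helper `to_enum_constant` is hypothetical and not part of the StreamSets SDK.

```python
# ASSUMPTION: label -> enum constant pairs for the Field Remover's
# FilterOperation; verify these against your Data Collector version.
FILTER_OPERATION = {
    "Remove Listed Fields": "REMOVE",
    "Keep Listed Fields": "KEEP",
}


def to_enum_constant(label_or_constant):
    """Hypothetical helper: accept a UI label or a constant name and
    return the constant name, or raise ValueError listing valid values."""
    if label_or_constant in FILTER_OPERATION.values():
        return label_or_constant
    try:
        return FILTER_OPERATION[label_or_constant]
    except KeyError:
        valid = ", ".join(sorted(FILTER_OPERATION.values()))
        raise ValueError(
            f"{label_or_constant!r} is not a valid FilterOperation value; "
            f"expected one of: {valid}"
        ) from None


action = to_enum_constant("Keep Listed Fields")
print(action)  # KEEP
# With the StreamSets SDK you would then (assumption, untested here) set:
#   field_remover.action = action
```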



dima
StreamSets Employee
  • 83 replies
  • January 6, 2022

After making a choice, click the icon next to the drop-down. It shows the name of the underlying argument value that can be set via the SDK.


ashok verma
Discovered Fame
  • Author
  • 13 replies
  • January 7, 2022

I can select it in the UI, but I don't want to log in to the UI; I want to find this out from the SDK alone. How can I do that?


dima
StreamSets Employee
  • 83 replies
  • January 7, 2022

You can't. The SDK is designed to supplement the typical UI-driven workflow, not to replace it entirely. 

