Question

Reading XML

  • 27 June 2023
  • 6 replies
  • 46 views

I am reading XML from S3 using the XML data format and loading it into Snowflake, but everything lands in a single column.

I need to split it into multiple columns, one per available node.


6 replies

@Akambe21 

 

Can you please parse the XML file using the XML Parser processor and let me know if it helps?

Please provide a sample XML file; I can try parsing it and loading the data into different Snowflake tables.

 

I tried, but it fails with an error at offset 0.

Below is the XML. The expected output is 2 rows.

<Main>
  <Version>
    <MajorVer>2</MajorVer>
    <MinorVer>0</MinorVer>
  </Version>
  <Tests>
    <Test1>
      <A>27</A>
      <Blocks>
        <Block>
          <Number>1</Number>
          <Batches>
            <Batch>
              <BNO>2</BNO>
              <BData>
                <BID>4</BID>
              </BData>
              <CTYPE>MOBILE</CTYPE>
              <Items>
                <Item>
                  <N>340</N>
                  <O></O>
                  <Images>
                    <Image>
                      <ImageView>Front</ImageView>
                    </Image>
                    <Image>
                      <ImageView>Rear</ImageView>
                    </Image>
                  </Images>
                  <UInfo>
                    <Field1></Field1>
                    <Field3></Field3>
                    <Field5>12375</Field5>
                  </UInfo>
                  <Ex>0</Ex>
                  <DCode></DCode>
                </Item>
                <Item>
                  <N>30</N>
                  <O></O>
                  <Images>
                    <Image>
                      <ImageView>Front</ImageView>
                    </Image>
                    <Image>
                      <ImageView>Rear</ImageView>
                    </Image>
                  </Images>
                  <UInfo>
                    <Field1></Field1>
                    <Field3></Field3>
                    <Field5>273</Field5>
                  </UInfo>
                  <Ex>0</Ex>
                  <DCode></DCode>
                </Item>
              </Items>
            </Batch>
          </Batches>
        </Block>
      </Blocks>
      <BCount>1</BCount>
    </Test1>
  </Tests>
</Main>
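
To make the expected two-row output concrete, here is a minimal sketch in plain Python (run outside StreamSets, standard library only). It uses a trimmed, well-formed version of the sample; the choice of columns and the helper names `text_or_empty` and `flatten_items` are assumptions for illustration, not part of any StreamSets or Snowflake API. Empty nodes such as `<Field1></Field1>` are mapped to an empty string so that every row has the same columns.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the sample (assumed tag casing).
SAMPLE = """<Main>
  <Version><MajorVer>2</MajorVer><MinorVer>0</MinorVer></Version>
  <Tests><Test1><Blocks><Block><Batches><Batch>
    <BNO>2</BNO>
    <Items>
      <Item>
        <N>340</N>
        <UInfo><Field1></Field1><Field5>12375</Field5></UInfo>
        <DCode></DCode>
      </Item>
      <Item>
        <N>30</N>
        <UInfo><Field1></Field1><Field5>273</Field5></UInfo>
        <DCode></DCode>
      </Item>
    </Items>
  </Batch></Batches></Block></Blocks></Test1></Tests>
</Main>"""


def text_or_empty(parent, path):
    """Return the node's text, or '' when the node is missing or empty."""
    node = parent.find(path)
    if node is None:
        return ""
    return (node.text or "").strip()


def flatten_items(xml_text):
    """Build one flat row (dict) per <Item>, repeating the parent values."""
    root = ET.fromstring(xml_text)
    major = text_or_empty(root, "Version/MajorVer")
    rows = []
    for batch in root.iter("Batch"):
        bno = text_or_empty(batch, "BNO")
        for item in batch.findall("Items/Item"):
            rows.append({
                "MajorVer": major,
                "BNO": bno,
                "N": text_or_empty(item, "N"),
                "Field1": text_or_empty(item, "UInfo/Field1"),
                "Field5": text_or_empty(item, "UInfo/Field5"),
                "DCode": text_or_empty(item, "DCode"),
            })
    return rows


rows = flatten_items(SAMPLE)
for row in rows:
    print(row)
```

Each `<Item>` becomes one row, and the parent values (`MajorVer`, `BNO`) are repeated on every row, which is the shape a multi-column Snowflake table would need.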

@Akambe21 

Can you please try the code snippet below in a Jython processor and check whether it helps?

 

   

from java.io import BufferedReader
from java.io import InputStreamReader

for record in records:
    try:
        # Read the file line by line and store the document's
        # text in the variable "text"
        reader = BufferedReader(InputStreamReader(record.value['fileRef'].getInputStream()))
        text = ''

        try:
            while True:
                # readLine() returns None at end of file and
                # drops the line terminator
                line = reader.readLine()
                if line is None:
                    break
                else:
                    text += line
        finally:
            # Always release the underlying stream
            reader.close()

        # Attach the full document text to the record and pass it on
        record.value['text'] = text
        sdc.output.write(record)

    except Exception as e:
        # Route records that fail to the error stream
        sdc.error.write(record, str(e))


 

Empty nodes are causing a problem where no value can be found. We need to achieve this without using scripting.

@Akambe21 

If your main XML carries all the other XML records as a list, then you should use the Field Pivoter with ‘/’ or the relevant field. This will split each list into separate records.
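
As a rough illustration of what pivoting a list field means, here is a minimal plain-Python sketch. It is a conceptual stand-in only, not how the Field Pivoter is implemented; the function name `pivot_list_field` and the record layout are hypothetical.

```python
def pivot_list_field(record, field):
    """Split one record into one record per element of record[field].

    All other fields are copied onto every output record.
    """
    items = record.get(field) or []
    parent = {k: v for k, v in record.items() if k != field}
    return [dict(parent, **{field: item}) for item in items]


# One incoming record whose "Item" field holds a list of two entries.
record = {"BNO": "2", "Item": [{"N": "340"}, {"N": "30"}]}

pivoted = pivot_list_field(record, "Item")
for rec in pivoted:
    print(rec)
```

The single input record yields two output records, one per list element, each still carrying the shared `BNO` value.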
