Question

Need help getting data from an API and writing it in Parquet format


Hi, I am trying to get some data from a SOAP API. I get the data in the form of an XML response. The requirement is to convert the data into Parquet format and store it in ADLS Gen2 storage.

As far as I understand, I can use Data Collector to write files in Avro and then convert the Avro files to Parquet using the Whole File Transformer processor.

I know that Transformer can write Parquet directly, so is there any way for me to skip the intermediate Avro file creation?


2 replies

Hi @Paawan, in SDC we don’t have any destination that can create Parquet files directly.

https://docs.streamsets.com/portal/platform-datacollector/latest/datacollector/UserGuide/Apx-DataFormats/DataFormat_Title.html#concept_jn1_nzb_kv

So I don’t think you can skip the intermediate Avro file creation.
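
Whichever route is used, the XML response first has to be flattened into records with a fixed set of fields, because both Avro and Parquet are schema-based formats. As a rough illustration of that step outside SDC, here is a minimal Python sketch using only the standard library. The response payload, the element names (`GetOrdersResponse`, `Order`, `Id`, `Amount`) and the helper `soap_body_to_records` are all hypothetical; a real service will have its own structure and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP response; the element names are invented for illustration.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <GetOrdersResponse>
      <Order><Id>1</Id><Amount>9.50</Amount></Order>
      <Order><Id>2</Id><Amount>12.00</Amount></Order>
    </GetOrdersResponse>
  </soapenv:Body>
</soapenv:Envelope>"""


def soap_body_to_records(xml_text, row_tag):
    """Return one dict per <row_tag> element, keyed by child element name.

    Assumes each row element contains only simple (non-nested) children
    and that the row elements are not namespace-qualified.
    """
    root = ET.fromstring(xml_text)
    records = []
    for row in root.iter(row_tag):
        records.append({child.tag: child.text for child in row})
    return records


records = soap_body_to_records(SAMPLE_RESPONSE, "Order")
for record in records:
    print(record)
```

All values come out as strings here; types such as integers or decimals would still need to be assigned when the Avro or Parquet schema is defined.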

Hi,

I am trying to get some data from a SOAP API, which returns the data in the form of an XML response. I am facing an issue while connecting via the HTTP Client. Please help me resolve it.

Error message: 415 Unsupported Media Type

API config details:

Auth type: Basic authorization

Method: POST

URL: WSDL details

Request body: SOAP request

Content type: application/xml

Authentication: Basic Auth

Data format: Text/XML
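
A 415 response means the server rejected the `Content-Type` of the request. SOAP 1.1 services normally expect `text/xml` together with a `SOAPAction` header, while SOAP 1.2 services expect `application/soap+xml`, so `application/xml` may be what is being refused here, although that cannot be confirmed without the service's WSDL. The Python sketch below only builds a request with those headers so they can be compared with the HTTP Client configuration; it does not send anything. The endpoint URL, the SOAP action and the helper `build_soap_request` are hypothetical placeholders.

```python
import urllib.request

# Hypothetical endpoint and SOAP action, used only for illustration.
ENDPOINT = "https://example.com/service"
SOAP_ACTION = "urn:example:GetData"

# Minimal SOAP 1.1 envelope. A SOAP 1.2 service would use the namespace
# http://www.w3.org/2003/05/soap-envelope instead.
ENVELOPE = """<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body/>
</soapenv:Envelope>"""


def build_soap_request(url, body, action, soap_version="1.1"):
    """Build (but do not send) a POST request with SOAP-appropriate headers."""
    if soap_version == "1.1":
        headers = {
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{action}"',
        }
    elif soap_version == "1.2":
        # SOAP 1.2 carries the action as a parameter of the content type.
        headers = {
            "Content-Type": f'application/soap+xml; charset=utf-8; action="{action}"',
        }
    else:
        raise ValueError(f"unsupported SOAP version: {soap_version}")
    return urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="POST"
    )


request = build_soap_request(ENDPOINT, ENVELOPE, SOAP_ACTION)
# urllib normalizes header names, so "Content-Type" is stored as "Content-type".
print(request.get_method(), request.get_header("Content-type"))
```

Which of the two versions applies can usually be read from the binding section of the WSDL; the Basic authentication settings stay the same in either case.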
