Scenario:
When you try to ingest into HBase via the StreamSets Data Collector (SDC) HBase destination, you're faced with the following error:
```
UNKNOWN com.streamsets.pipeline.api.StageException: HBASE_26 - Error while writing to HBase: 'java.lang.IllegalArgumentException: KeyValue size too large'
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.throwStageException(HBaseTarget.java:384)
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.write(HBaseTarget.java:373)
    at com.streamsets.pipeline.configurablestage.DTarget.write(DTarget.java:34)
    at com.streamsets.datacollector.runner.StageRuntime$2.call(StageRuntime.java:238)
    at com.streamsets.datacollector.runner.StageRuntime$2.call(StageRuntime.java:222)
    at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:180)
    at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:249)
    at com.streamsets.datacollector.runner.StagePipe.process(StagePipe.java:231)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.processPipe(ProductionPipelineRunner.java:718)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.lambda$runSourceLessBatch$2(ProductionPipelineRunner.java:746)
    at com.streamsets.datacollector.runner.PipeRunner.forEach(PipeRunner.java:78)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.runSourceLessBatch(ProductionPipelineRunner.java:745)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.runPollSource(ProductionPipelineRunner.java:532)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.run(ProductionPipelineRunner.java:361)
    at com.streamsets.datacollector.runner.Pipeline.run(Pipeline.java:500)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipeline.run(ProductionPipeline.java:109)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunnable.run(ProductionPipelineRunnable.java:74)
    at com.streamsets.datacollector.execution.runner.standalone.StandaloneRunner.start(StandaloneRunner.java:740)
    at com.streamsets.datacollector.execution.runner.slave.SlaveStandaloneRunner.start(SlaveStandaloneRunner.java:157)
    at com.streamsets.datacollector.execution.runner.common.AsyncRunner.lambda$start$3(AsyncRunner.java:151)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:249)
    at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:245)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: KeyValue size too large
    at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1570)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:152)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:127)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:113)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1080)
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.performPut(HBaseTarget.java:432)
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.doPut(HBaseTarget.java:420)
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.writeBatch(HBaseTarget.java:396)
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.lambda$write$1(HBaseTarget.java:369)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at com.streamsets.pipeline.stage.destination.hbase.HBaseTarget.write(HBaseTarget.java:368)
    ... 27 more
```
Goal:
To be able to ingest records whose cells (KeyValues) are larger than the current client-side limit allows.
Solution:
The property that controls the maximum KeyValue (cell) size on the client side is `hbase.client.keyvalue.maxsize` (default 10485760 bytes, i.e. 10 MB). Setting it to 0 disables the check and allows puts of any size. Bear in mind, though, that very large cells, say anything approaching 1-2 GB, can have performance implications depending on your use case.
This parameter can be set in either of the following ways (examples follow the list):
=> In `hbase-site.xml` on the client node.
=> Programmatically, in your HBase client `Configuration` object.
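For instance, a minimal `hbase-site.xml` entry might look like this (the value is in bytes):

```xml
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <!-- 0 (or a negative value) disables the client-side KeyValue size check. -->
  <value>0</value>
</property>
```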
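And here is a minimal Java sketch of the programmatic route. The table, column family, and qualifier names (`my_table`, `cf`, `q`) are placeholders for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LargeCellPut {
  public static void main(String[] args) throws Exception {
    // Loads hbase-site.xml from the classpath, if present.
    Configuration conf = HBaseConfiguration.create();

    // 0 (or less) disables the client-side check that throws
    // "java.lang.IllegalArgumentException: KeyValue size too large".
    conf.setInt("hbase.client.keyvalue.maxsize", 0);

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"))) {
      Put put = new Put(Bytes.toBytes("row1"));
      // A 32 MB value, well above the 10 MB default limit.
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
          new byte[32 * 1024 * 1024]);
      table.put(put); // Succeeds because the size check is disabled.
    }
  }
}
```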
To the best of my knowledge, this does not require an HBase service restart, since the check is enforced on the client side (in `HTable.validatePut`, as seen in the stack trace above).
Alternatively, `hbase.client.keyvalue.maxsize` can also be applied in the HBase extra configs via the SDC UI. That way, you needn't modify `hbase-site.xml` on all the HBase client nodes.
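In that case, you'd add a name/value pair under the HBase destination's extra configuration section (the exact field label can vary by SDC version), for example:

```
hbase.client.keyvalue.maxsize = 0
```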