如何提高python写hbase的性能

答案:2 悬赏:60 手机版

解决时间 2021-02-28 05:14

提问者网友：临风不自傲
2021-02-27 21:35

如何提高python写hbase的性能

最佳答案

五星知识达人网友：第四晚心情
2021-02-27 22:58

HBase Client会在数据累积到设置的阈值后才提交Region Server。这样做的好处在于可以减少RPC连接次数。同时，我们得计算一下服务端因此而消耗的内存：hbase.client.write.buffer * hbase.regionserver.handler.count。在减少PRC次数和增加服务器端内存之间找到平衡点。

全部回答

1楼网友：舍身薄凉客
2021-02-28 00:07

configuration conf = hbaseconfiguration.create(); string tablename = "testtable"; scan scan = new scan(); scan.setcaching(10000); scan.setcacheblocks(false); conf.set(tableinputformat.input_table, tablename); clientprotos.scan proto = protobufutil.toscan(scan); string scantostring = base64.encodebytes(proto.tobytearray()); conf.set(tableinputformat.scan, scantostring); javapairrdd myrdd = sc .newapihadooprdd(conf, tableinputformat.class, immutablebyteswritable.class, result.class); 在spark使用如上hadoop提供的标准接口读取hbase表数据（全表读），读取5亿左右数据，要20m+，而同样的数据保存在hive中，读取却只需要1m以内，性能差别非常大。转载，仅供参考。

我要举报

如以上问答信息为低俗、色情、不良、暴力、侵权、涉及违法等信息，可以点下面链接进行举报！