Our business logic currently computes mainly over data stored in HBase, but we had no tool to make small, targeted changes to HBase data in order to verify that logic. The previous approach was either to export all of the HBase data, modify it, and reload it, or to change it through the hbase shell (which has the limitation that Chinese characters in the data it reads out are unreadable). Both approaches are inefficient and hard to automate for regression, so testing was very passive.
So, at a senior colleague's suggestion, we set out to build a tool for manipulating HBase data directly, to improve our efficiency and help validate big-data results.
II. Tool introduction:
The tool is a Java jar; ihbase.sh is a small wrapper script that invokes the jar. Its main functions are adding, deleting, modifying, and querying HBase data, with support for GBK and UTF-8 encodings. Exports can be driven by an XML configuration file (optional for the other operations).
The tool consists of three files:
1. properties.sh: configures hbase, hadoop, and other environment variables; the current defaults match our test cluster and can serve as a reference. Note that the basic jar dependencies must be present.
2. config: an XML file describing the HBase data to export; used when exporting data in bulk or exporting by rowkey.
3. ihbase.sh: the command-line interface of the tool.
III. Brief usage guide:
All operations are run from the bin directory.
I. Querying data
1. ./ihbase.sh -t table_name -rowkey rowkey -enc encoding -s
The -enc parameter specifies the encoding used when reading data out of HBase; utf8 and gbk are currently supported. Without it, data is read out as utf8 by default.
This displays the row of table_name whose rowkey is rowkey.
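To see why the encoding flag matters (and why the hbase shell mangles Chinese text), here is a small illustrative sketch, separate from ihbase itself: HBase stores raw bytes, and decoding them with a charset other than the one they were written with garbles the text.

```python
# Illustrative only: HBase cells hold raw bytes; the charset used to
# decode them must match the one used to write them.
text = "中文测试"

gbk_bytes = text.encode("gbk")    # bytes as a GBK producer would write them
utf8_bytes = text.encode("utf8")  # bytes as a UTF-8 producer would write them

# Decoding GBK bytes as UTF-8 (the wrong charset) garbles the text:
garbled = gbk_bytes.decode("utf8", errors="replace")
assert garbled != text

# Decoding with the matching charset recovers it:
assert gbk_bytes.decode("gbk") == text
assert utf8_bytes.decode("utf8") == text
```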
2. ./ihbase.sh -t table_name -k CF:COLUMN=value -k CF:COLUMN=value -w column -s
./ihbase.sh -t "test" -k "a:name=jkfs" -k "a:id=111" -w "nid" -s
Queries for rows satisfying a:name=jkfs AND a:id=111. If -w is given, only the columns named by -w are displayed; without -w, all columns are displayed.
(This command is rarely used: filtering on equality this way requires a full scan of the HBase table. Because it is rarely used, we have not yet tried to optimize it.)
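The scan-and-filter behavior can be pictured with a small in-memory sketch (a hypothetical model, not ihbase's actual Java code): every -k condition must hold for a row to match, and -w projects the output down to the named columns.

```python
# Hypothetical in-memory model of the -k / -w scan; ihbase itself does
# this with an HBase full-table scan in Java.
rows = {
    "row1": {"a:name": "jkfs", "a:id": "111", "a:nid": "7"},
    "row2": {"a:name": "jkfs", "a:id": "222", "a:nid": "8"},
}

def scan(rows, conditions, display=None):
    """Return {rowkey: columns} for rows matching ALL conditions (AND).

    If display is given (like -w), only those columns are kept."""
    hits = {}
    for rk, cols in rows.items():
        if all(cols.get(k) == v for k, v in conditions.items()):
            hits[rk] = {c: cols[c] for c in display} if display else dict(cols)
    return hits

result = scan(rows, {"a:name": "jkfs", "a:id": "111"}, display=["a:nid"])
print(result)  # {'row1': {'a:nid': '7'}}
```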
II. Deleting data
1. ./ihbase.sh -t table_name -rowkey rowkey -delete
Deletes the row with the given rowkey.
2. ./ihbase.sh -t table_name -k CF:COLUMN=value -k CF:COLUMN=value -delete
./ihbase.sh -t test -k "a:id=jaks" -k "a:name=jkasf" -delete
Deletes the rows satisfying a:id=jaks AND a:name=jkasf. (Deletion currently supports AND conditions only, not OR; the mechanism is the same full-table scan as for querying.)
III. Adding data
./ihbase.sh -t table_name -rowkey rowkey -v CF:COLUMN=value -v CF:COLUMN=value -a
./ihbase.sh -t "test" -rowkey "111" -v "a:name=jkfs" -v "a:id=111" -a
Adds a row with rowkey 111; the key=value pairs that follow can be anything. If the rowkey already exists, an error is reported and nothing is added.
The rowkey must always be specified when adding data.
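The add semantics described above can be sketched with a toy in-memory table (hypothetical; the real tool performs the existence check against HBase):

```python
# Toy model of -a: refuse to add if the rowkey already exists.
def add_row(table, rowkey, values):
    if rowkey in table:
        raise ValueError(f"rowkey {rowkey!r} already exists; not adding")
    table[rowkey] = dict(values)

table = {}
add_row(table, "111", {"a:name": "jkfs", "a:id": "111"})

try:
    add_row(table, "111", {"a:name": "other"})
except ValueError as e:
    print(e)  # rowkey '111' already exists; not adding
```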
IV. Modifying data
1. ./ihbase.sh -t table_name -rowkey rowkey -v CF:COLUMN=value -v CF:COLUMN=value -u
./ihbase.sh -t "test" -rowkey "1111" -v "a:name=jkasjd" -u
Modifies a row's column data based on its rowkey.
2. ./ihbase.sh -t table_name -k CF:COLUMN=value -k CF:COLUMN=value -v CF:COLUMN=value -u
./ihbase.sh -t test -k "a:name=jksdj" -k "a:mge=kjdk" -v "a:name=huanyu" -u
Modifies rows matched by conditions other than the rowkey: the -k options give the conditions a row must satisfy, and the -v options give the new column values. (Same mechanism as querying: a full-table scan.)
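The non-rowkey update can likewise be sketched as scan-then-update over a toy in-memory table (hypothetical model, not the tool's Java code):

```python
# Toy model of -k/-v with -u: scan the whole table, and for every row
# whose columns satisfy ALL -k conditions, overwrite the -v columns.
rows = {
    "r1": {"a:name": "jksdj", "a:mge": "kjdk"},
    "r2": {"a:name": "jksdj", "a:mge": "zzz"},
}

def update_matching(rows, conditions, new_values):
    updated = 0
    for cols in rows.values():
        if all(cols.get(k) == v for k, v in conditions.items()):
            cols.update(new_values)
            updated += 1
    return updated

n = update_matching(rows, {"a:name": "jksdj", "a:mge": "kjdk"},
                    {"a:name": "huanyu"})
print(n, rows["r1"]["a:name"])  # 1 huanyu
```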
V. Exporting specified columns of an HBase table (all rows)
./ihbase.sh -f config
One restriction here: the configuration file for the exported table must be placed in the bin directory. If you don't like this, you can modify the ihbase.sh script to change it.
config is the XML file describing the table to export:
<?xml version="1.0"?>
<configuration>
<table>
<in_enc>gbk</in_enc>
<out_enc>utf8</out_enc>
<tablename>test</tablename>
<field_separator>\001</field_separator>
<record_separator>\002</record_separator>
<column>
bmw_shops:title
</column>
<outpath>/test/huanyu/hbase</outpath>
</table>
</configuration>
in_enc: the encoding used to parse the data stored in hbase.
out_enc: the encoding used for the output written to the hdfs path.
tablename: the name of the table to operate on.
field_separator: the separator placed between fields when exporting multiple fields.
record_separator: the row separator for exported data. (Anything except \n, since rows are already newline-separated by default.)
column: the field to export. If the field does not exist in a row, an empty string '' is exported.
outpath: the path the data is exported to. (Ideally the code would delete this path first, but for fear that a user who forgot to change the path would lose data by mistake, it does not.)
The export starts as many maps as the table has regions.
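A quick way to sanity-check a config file before running an export is to parse it yourself. The sketch below reads the sample config above with Python's standard xml parser (illustrative only; ihbase's own parsing is done in its Java code).

```python
import xml.etree.ElementTree as ET

# The sample export config from above (separators written literally
# here as \001 / \002, matching how they appear in the file on disk).
config_xml = r"""<?xml version="1.0"?>
<configuration>
<table>
<in_enc>gbk</in_enc>
<out_enc>utf8</out_enc>
<tablename>test</tablename>
<field_separator>\001</field_separator>
<record_separator>\002</record_separator>
<column>bmw_shops:title</column>
<outpath>/test/huanyu/hbase</outpath>
</table>
</configuration>"""

table = ET.fromstring(config_xml).find("table")
print(table.findtext("tablename"), table.findtext("in_enc"),
      table.findtext("out_enc"))  # test gbk utf8
```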
VI. Exporting specified columns for specified rows of HBase
./ihbase.sh -f config -rf rfile
config holds the columns to export, the character encodings, the table name, and other information, as above.
rfile lists which rowkeys to export, one rowkey per line.
Otherwise this works the same as the full export above.
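The rfile is just a plain text file with one rowkey per line. A sketch of writing one and reading it back (the filename and rowkeys here are arbitrary examples):

```python
import os
import tempfile

# Write an rfile: one rowkey per line.
rowkeys = ["row_001", "row_002", "row_003"]
path = os.path.join(tempfile.gettempdir(), "rfile_example")
with open(path, "w") as f:
    f.write("\n".join(rowkeys) + "\n")

# Read it back the way an exporter would: strip whitespace, skip blank lines.
with open(path) as f:
    loaded = [line.strip() for line in f if line.strip()]
print(loaded)  # ['row_001', 'row_002', 'row_003']
```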
VII. Help Information
./ihbase.sh -h
Displays help information.