Useful code to work with HBase

While working with HBase at CreditVidya, we ran into several requirements for which answers were hard to find on the internet. This repo shares that experience in the hope that it helps developers who run into the same use cases.

Issue: Multiple regions starting with the same key

The HBase .meta region went offline, and instead of bringing it back online we ended up corrupting it. We then ran an offline meta repair using the org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair utility. It did recreate .meta, but afterwards we started seeing region inconsistencies, overlapping regions, and multiple regions starting with the same key. Most of these issues can be resolved with the hbase hbck -fixAssignments -fixMeta tableName command, but the issue of multiple regions starting with the same key was not resolved by it. So we came up with a solution based on merging regions. Doing this through the hbase shell is difficult unless you are comfortable writing Ruby code against the shell's Ruby wrappers. Instead, we used the Java client to identify the affected regions and merged them iteratively.
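The identification step described above can be sketched in plain Java. This is a minimal illustration, not the repo's actual code: RegionRef and SameStartKeyFinder are hypothetical names, and the region list is assumed to have already been fetched through the HBase Java client. The sketch only finds regions that share a start key and picks the next pair to merge. The merge request itself, and re-reading the region list after each merge, are left to the real client.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SameStartKeyFinder {

    // Hypothetical stand-in for region metadata: only the name and start key are needed here.
    static final class RegionRef {
        final String name;
        final byte[] startKey;

        RegionRef(String name, byte[] startKey) {
            this.name = name;
            this.startKey = startKey;
        }
    }

    // Groups regions by start key and keeps only the groups with more than one member.
    static Map<ByteBuffer, List<RegionRef>> duplicates(List<RegionRef> regions) {
        Map<ByteBuffer, List<RegionRef>> byStart = new LinkedHashMap<>();
        for (RegionRef r : regions) {
            // ByteBuffer.wrap gives content-based equals/hashCode for byte[] keys.
            ByteBuffer key = ByteBuffer.wrap(r.startKey);
            byStart.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        byStart.values().removeIf(group -> group.size() < 2);
        return byStart;
    }

    // Returns the first two regions that share a start key, or null when there are none.
    static RegionRef[] nextMergePair(List<RegionRef> regions) {
        for (List<RegionRef> group : duplicates(regions).values()) {
            return new RegionRef[] { group.get(0), group.get(1) };
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample data: two regions start at "k100".
        List<RegionRef> regions = Arrays.asList(
            new RegionRef("region-a", new byte[0]),
            new RegionRef("region-b", "k100".getBytes(StandardCharsets.UTF_8)),
            new RegionRef("region-c", "k100".getBytes(StandardCharsets.UTF_8)),
            new RegionRef("region-d", "k200".getBytes(StandardCharsets.UTF_8)));

        RegionRef[] pair = nextMergePair(regions);
        // A real client would issue the merge here, wait for it to finish,
        // re-read the region list, and repeat until nextMergePair returns null.
        System.out.println(pair == null
            ? "no regions share a start key"
            : "merge " + pair[0].name + " with " + pair[1].name);
    }
}
```

Only one pair is picked per pass because a merge produces a new region with a new name, so the list has to be re-read before the next pair can be chosen.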

Resolution in detail

Java code for reading HBase exported sequence files

We use the HBase Export utility to take a daily dump/backup of HBase tables. The dump is then ingested into S3 to make it available to the data science (DS) team. The DS team works with a subset of the data, and that subset is not defined by a time range, so we need to read the dump and segregate it into a form the DS team can use. Segregation could be done with Spark, but as a startup we can't afford to run a continuous Spark job just for data ingestion. We need something standalone that can run on spot instances and does not depend on Spark.
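The segregation step can be illustrated with a small standalone sketch. It assumes the rows have already been decoded from the export files. The Row type, the ExportSegregator class, and the row-key-prefix rule are all hypothetical, since the actual subset rule used for the DS team is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ExportSegregator {

    // Hypothetical decoded export record: a row key plus an opaque payload.
    static final class Row {
        final String rowKey;
        final String payload;

        Row(String rowKey, String payload) {
            this.rowKey = rowKey;
            this.payload = payload;
        }
    }

    // Routes each row to the bucket chosen by the caller's rule.
    // Rows for which the rule returns null are skipped.
    static Map<String, List<Row>> segregate(Iterable<Row> rows, Function<Row, String> bucketOf) {
        Map<String, List<Row>> buckets = new LinkedHashMap<>();
        for (Row row : rows) {
            String bucket = bucketOf.apply(row);
            if (bucket == null) {
                continue;
            }
            buckets.computeIfAbsent(bucket, b -> new ArrayList<>()).add(row);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the "prefix:id" row-key layout is an assumption for illustration.
        List<Row> rows = Arrays.asList(
            new Row("loan:001", "..."),
            new Row("card:002", "..."),
            new Row("loan:003", "..."));

        // Assumed rule: the subset is the part of the row key before the first ':'.
        Function<Row, String> byPrefix = row -> {
            int idx = row.rowKey.indexOf(':');
            return idx < 0 ? null : row.rowKey.substring(0, idx);
        };

        for (Map.Entry<String, List<Row>> e : segregate(rows, byPrefix).entrySet()) {
            System.out.println(e.getKey() + " has " + e.getValue().size() + " rows");
        }
    }
}
```

Because the rule is passed in as a function, the same loop can serve subsets that are defined by something other than a time range. In a standalone job each bucket would be written out as it fills instead of being held in memory.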

How to read an HBase sequence file through Java


ZhiwenDeng-dev/HBaseRecoveryTools
