By default, a Scan object retrieves every row in the table. But what if you only want the rows where a given column has a specific value? That is what Filters are for: they play a role similar to the WHERE clause in SQL.
Before using filters, note that if you know how your row keys are laid out, you can set a start row and a stop row on your Scan, which lets HBase skip everything outside that range.
In HBase, row keys are stored in lexicographic order. You can use salting to change how keys are distributed, but I will not explain salting in this topic: it would take too long and is beside the point.
Back to our row bounds: two methods set the start row and the stop row:
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("row_10"));
scan.setStopRow(Bytes.toBytes("row_42"));
This makes the scanner fetch only the rows whose keys fall between "row_10" and "row_42" in lexicographic order (so "row_100" is in range, but "row_5" is not).
NB: As with most "sub" methods (substring, for example), the startRow is inclusive and the stopRow is exclusive.
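To make the lexicographic ordering and the inclusive/exclusive bounds concrete, here is a small plain-Java sketch. It does not use HBase at all: a TreeMap stands in for the table, and the class name, helper names and sample keys are made up for illustration. It also assumes ASCII row keys, for which Java String ordering agrees with byte-wise ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Plain-Java illustration only: a TreeMap stands in for an HBase table
// whose row keys are kept in lexicographic order (hypothetical sample data).
class RowRangeSketch {

    // Returns the keys that a scan bounded by [startRow, stopRow) would visit.
    static List<String> scanKeys(TreeMap<String, String> table, String startRow, String stopRow) {
        // subMap is start-inclusive and end-exclusive, like the Scan bounds.
        return new ArrayList<>(table.subMap(startRow, stopRow).keySet());
    }

    // Builds a tiny table with keys that sort differently as text than as numbers.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row_1", "row_10", "row_100", "row_2", "row_42", "row_5"}) {
            table.put(key, "value of " + key);
        }
        return table;
    }

    public static void main(String[] args) {
        // Prints [row_10, row_100, row_2]
        System.out.println(scanKeys(sampleTable(), "row_10", "row_42"));
    }
}
```

Notice that "row_42" itself is left out (the stop row is exclusive) and that "row_5" is skipped because it sorts after "row_42" as text. Zero-padding numeric parts of a key (row_005, row_042, ...) is a common way to make text order match numeric order.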
Now that we can bound our Scan, let's add some filters to it. There are many of them; we will look at the most important ones here.
Use a row prefix filter (Scan.setRowPrefixFilter):
Scan scan = new Scan();
scan.setRowPrefixFilter(Bytes.toBytes("hello"));
With this code, your scan will only retrieve the rows whose row key starts with "hello".
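As far as I understand, a prefix scan is really just a bounded scan: the prefix is the start row, and the stop row is the smallest key that sorts after every key sharing that prefix. The sketch below shows one way to compute such a stop row in plain Java. It is an illustration of the idea, not HBase's own code; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: turns a row-key prefix into an exclusive stop row.
class PrefixRangeSketch {

    // Returns the smallest key that sorts after every key starting with prefix.
    // An empty array means "no upper bound" (the prefix was empty or all 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        // Copy the remaining bytes and bump the last one by one.
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("hello".getBytes(StandardCharsets.UTF_8));
        // Prints hellp: every key starting with "hello" sorts before "hellp".
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

Seen this way, a prefix scan on "hello" covers the same range as a start row of "hello" and a stop row of "hellp", which is why it is cheap: rows outside the range are never read.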
Use the SingleColumnValueFilter:
Scan scan = new Scan();
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("myFamily"),Bytes.toBytes("myColumn"), CompareOp.EQUAL, Bytes.toBytes("42"));
scan.setFilter(filter);
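With this code, you will get all the rows where the value of the column myColumn is equal to the string "42" (the comparison is done on the stored bytes). Be aware that, by default, rows that do not contain myColumn at all are also returned; call filter.setFilterIfMissing(true) to exclude them. CompareOp has several other values, which are explained in the Parameters section.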
-Good, but what if I want to use regular expressions?
Use a RegexStringComparator with the filter:
Scan scan = new Scan();
RegexStringComparator comparator = new RegexStringComparator(".*hello.*");
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("myFamily"),Bytes.toBytes("myColumn"), CompareOp.EQUAL, comparator);
scan.setFilter(filter);
And you will get all the rows where the column myColumn contains hello.
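If you are unsure what a given pattern will match, you can try it with java.util.regex before putting it in a comparator. The sketch below assumes that RegexStringComparator, with its default Java regex engine, looks for the pattern anywhere in the value (a "find", not a full match); that is my understanding, so check it against your HBase version. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustration only: checks patterns with java.util.regex, outside of HBase.
class RegexContainsSketch {

    // True when the pattern is found anywhere in the value (substring search).
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(found(".*hello.*", "say hello world")); // true
        System.out.println(found("hello", "hello"));               // true
        System.out.println(found(".hello.", "hello"));             // false
    }
}
```

The last line shows a common pitfall: a pattern such as ".hello." (without the "*") requires at least one character on each side of "hello", so a value that is exactly "hello" does not match.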
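Please also note that Scan.setFilter() accepts a single Filter, but that Filter can be a FilterList, which combines several filters with FilterList.Operator.MUST_PASS_ALL (AND) or FilterList.Operator.MUST_PASS_ONE (OR).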