|-h||hostname. Defaults to "localhost." If you do not specify a host, repair runs on the host where the command is executed.|
|-p||JMX port. The default is 7199.|
|-u||username. Only required if JMX security is enabled.|
|-pw||password. Only required if JMX security is enabled.|
|-local||Only compare and stream data from nodes in the "local" data center.|
|-pr||"Partitioner Range" repair. Only repairs the primary token range for a replica. Faster than repairing all ranges of your replicas, as it prevents repairing the same data multiple times. Note that if you use this option when repairing one node, you must also use it on every other node in the cluster so that every token range is covered.|
|-par||Run repairs in parallel. Gets repairs done faster, but significantly restricts the cluster's ability to handle requests.|
|-hosts||Allows you to specify a comma-delimited list of nodes to stream your data from. Useful if you have nodes that are known to be "good." While it is documented as a valid option for Cassandra 2.1+, it also works with Cassandra 2.0.|
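As a rough illustration of how these options combine, the sketch below assembles a repair command line and prints it rather than running it. The function name build_repair_cmd, the keyspace my_keyspace, and the host address are hypothetical. It assumes that the connection options (-h, -p) are placed before the repair subcommand and the repair options (such as -pr) after it, which should be checked against your Cassandra version.

```shell
# Dry-run sketch: print a nodetool repair command instead of executing it.
# build_repair_cmd, my_keyspace, and the address below are illustrative
# assumptions, not names taken from Cassandra itself.
build_repair_cmd() {
  host="$1"
  keyspace="$2"
  shift 2
  # "$@" now holds only the repair options (for example -pr or -par).
  # Rebuild the argument list as the full command, then print it.
  set -- nodetool -h "$host" -p 7199 repair "$@" "$keyspace"
  printf '%s\n' "$*"
}

# Primary-range repair of one keyspace on one node.
build_repair_cmd 192.168.0.1 my_keyspace -pr
```

Printing the command first makes it easy to review the flags before running it on every node in turn.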
Cassandra Anti-Entropy Repairs:
Anti-entropy repair in Cassandra has two distinct phases. To run successful, performant repairs, it is important to understand both of them.
Merkle Tree calculations: This computes the differences between the nodes and their replicas.
Data streaming: Based on the outcome of the Merkle Tree calculations, data is scheduled to be streamed from one node to another. This is an attempt to synchronize the data between replicas.
Stopping a Repair:
You can stop a repair by issuing a STOP VALIDATION command from nodetool:
$ nodetool stop validation
How do I know when repair is completed?
You can check for the first phase of repair (Merkle Tree calculations) by checking
You can check for repair streams using nodetool netstats. Repair streams will also be visible in your logs. You can grep for them in your system logs like this:
$ grep Entropy system.log
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,077 RepairSession.java (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /192.168.14.3
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,081 RepairSession.java (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /192.168.16.5
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,091 RepairSession.java (line 221) [repair #70c35af0-526e-11e6-8646-8102d8573519] test_users is fully synced
INFO [AntiEntropySessions:4] 2016-07-25 07:32:47,091 RepairSession.java (line 282) [repair #70c35af0-526e-11e6-8646-8102d8573519] session completed successfully
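If you only want to know which sessions finished, the log output above can be filtered further. The following is a minimal sketch that lists the IDs of repair sessions that logged "session completed successfully". The function name completed_repair_sessions is hypothetical, and the sketch assumes the log line format shown above, which may differ between Cassandra versions.

```shell
# Sketch: print the unique IDs of repair sessions that completed successfully.
# completed_repair_sessions is an illustrative name; the pattern assumes the
# "[repair #<id>] session completed successfully" format shown above.
completed_repair_sessions() {
  log_file="$1"
  grep 'session completed successfully' "$log_file" \
    | sed -n 's/.*\[repair #\([0-9a-f-]*\)\].*/\1/p' \
    | sort -u
}

# Usage (pass the path to your own system.log):
#   completed_repair_sessions system.log
```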
Active repair streams can also be monitored with this (Bash) command:
$ while true; do date; diff <(nodetool -h 192.168.0.1 netstats) <(sleep 5 && nodetool -h 192.168.0.1 netstats); done
How do I check for stuck or orphaned repair streams?
On each node, you can monitor this with nodetool tpstats. Check for non-zero values in the "Blocked" and "All time blocked" columns on the "AntiEntropy" lines.
$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
...
AntiEntropyStage                  0         0         854866         0                 0
...
AntiEntropySessions               0         0           2576         0                 0
...
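Scanning this output by eye gets tedious across many nodes. As a small sketch, the filter below reads tpstats output on standard input and prints only the AntiEntropy pools with a non-zero blocked count. The function name blocked_antientropy_pools is hypothetical, and the sketch assumes the column order shown above (Active, Pending, Completed, Blocked, All time blocked).

```shell
# Sketch: read `nodetool tpstats` output on stdin and print the name,
# "Blocked", and "All time blocked" values of any AntiEntropy pool where
# either value is non-zero. Assumes the column order shown above.
# blocked_antientropy_pools is an illustrative name.
blocked_antientropy_pools() {
  awk '$1 ~ /^AntiEntropy/ && NF >= 6 && ($5 > 0 || $6 > 0) { print $1, $5, $6 }'
}

# Usage:
#   nodetool tpstats | blocked_antientropy_pools
```

An empty result means neither AntiEntropy pool reports blocked tasks on that node.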