Pythian Blog: Technical Track

How to Perform User-Defined Compactions (UDC) in Cassandra

User-defined compactions (UDC) let us manually select which SSTable files are compacted together. This makes it possible to reclaim space while keeping the compaction small enough to fit into the remaining disk space. These compactions are relevant only for SizeTieredCompactionStrategy (STCS) and are most useful in specific situations:

  • There is one big SSTable (Sorted Strings Table) as a result of previous major compactions, and a number of smaller files.
  • The node is running out of space because of overstreaming, and minor compactions will fail due to lack of disk space.
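To judge whether a selection fits into the remaining space, one rough approach is to add up the sizes of the chosen Data files and compare the total with the free space on the data volume. The sketch below assumes the worst case, where the output is as large as the sum of the inputs, and looks only at the files passed to it. The function names sstables_total_bytes and fits_in_free_space are hypothetical helpers, not part of Cassandra.

```shell
# Hypothetical helpers (not part of Cassandra): estimate whether a
# user-defined compaction of the given SSTables fits on the data volume.

# Print the combined size, in bytes, of the files given as arguments.
sstables_total_bytes() {
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Succeed if the worst-case output (the sum of the inputs) fits into the
# free space of the filesystem that holds directory $1.
# Usage: fits_in_free_space DATA_DIR FILE...
fits_in_free_space() {
    dir=$1; shift
    needed=$(sstables_total_bytes "$@") || return 1
    # df -Pk reports available space in 1024-byte blocks in column 4.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$needed" -le $((free_kb * 1024)) ]
}
```

Called as fits_in_free_space followed by the table directory and the candidate Data.db files, it returns success only when the estimate fits, so it can guard the compaction command in a script.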

Prior to Cassandra 3.4, you had to use JMX (Java Management Extensions) directly. From 3.4 onward, you can use either JMX or nodetool. I show both methods here.

A step-by-step guide to performing UDC on Cassandra version 3.4 and later

1. First, determine which files to compact. For demonstration purposes, assume that the smaller files created in April contain updates and tombstones for rows in the file from October, so compacting them together would be beneficial.

cassandra@hostname-1 /data/cassandra/data/demo_keyspace/demo_table $ ls -l *Data*
-rw-rw-r-- 1 cassandra cassandra 9919563003 Oct 5 2020 demo_keyspace-demo_table-md-1290-Data.db
-rw-rw-r-- 1 cassandra cassandra 13181126491 Dec 5 22:50 demo_keyspace-demo_table-md-2615-Data.db
-rw-rw-r-- 1 cassandra cassandra 12490613138 Apr 2 04:35 demo_keyspace-demo_table-md-4027-Data.db
-rw-rw-r-- 1 cassandra cassandra 785650141 Apr 10 14:50 demo_keyspace-demo_table-md-4121-Data.db
-rw-rw-r-- 1 cassandra cassandra 784706108 Apr 18 00:27 demo_keyspace-demo_table-md-4206-Data.db
-rw-rw-r-- 1 cassandra cassandra 831251133 Apr 25 03:02 demo_keyspace-demo_table-md-4290-Data.db
-rw-rw-r-- 1 cassandra cassandra 17660000 Apr 25 06:43 demo_keyspace-demo_table-md-4291-Data.db
-rw-rw-r-- 1 cassandra cassandra 17403073 Apr 25 13:32 demo_keyspace-demo_table-md-4292-Data.db

2. Run nodetool compact:

nodetool compact --user-defined /data/cassandra/data/demo_keyspace/demo_table/demo_keyspace-demo_table-md-1290-Data.db,/data/cassandra/data/demo_keyspace/demo_table/demo_keyspace-demo_table-md-4121-Data.db,/data/cassandra/data/demo_keyspace/demo_table/demo_keyspace-demo_table-md-4206-Data.db,/data/cassandra/data/demo_keyspace/demo_table/demo_keyspace-demo_table-md-4290-Data.db,/data/cassandra/data/demo_keyspace/demo_table/demo_keyspace-demo_table-md-4291-Data.db,/data/cassandra/data/demo_keyspace/demo_table/demo_keyspace-demo_table-md-4292-Data.db
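Typing a long comma-separated list of paths by hand is error-prone, so it can help to build it from the SSTable generation numbers instead. The helper below is a minimal sketch: udc_file_list is a hypothetical name, and the glob assumes file names that end in -GENERATION-Data.db, as in the listing above.

```shell
# Hypothetical helper (not part of nodetool): print a comma-separated list
# of Data.db paths for the given SSTable generation numbers.
# Usage: udc_file_list TABLE_DIR GENERATION...
udc_file_list() {
    dir=$1; shift
    list=
    for gen in "$@"; do
        found=
        # Match names such as demo_keyspace-demo_table-md-1290-Data.db
        for f in "$dir"/*-"$gen"-Data.db; do
            if [ -f "$f" ]; then
                found=$f
            fi
        done
        if [ -z "$found" ]; then
            echo "no SSTable with generation $gen in $dir" >&2
            return 1
        fi
        # Append with a comma unless this is the first entry.
        list=${list:+$list,}$found
    done
    echo "$list"
}
```

With this in place, the command could be written as nodetool compact --user-defined "$(udc_file_list /data/cassandra/data/demo_keyspace/demo_table 1290 4121 4206 4290 4291 4292)". A missing generation makes the helper fail before nodetool is called.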

3. After a successful run, you can see that the selected SSTables have been compacted into a single file:

cassandra@hostname-1 /data/cassandra/data/demo_keyspace/demo_table $ ls -l *Data*
-rw-rw-r-- 1 cassandra cassandra 13181126491 Dec 5 22:50 demo_keyspace-demo_table-md-2615-Data.db
-rw-rw-r-- 1 cassandra cassandra 12490613138 Apr 2 04:35 demo_keyspace-demo_table-md-4027-Data.db
-rw-rw-r-- 1 cassandra cassandra 12178742872 Apr 25 15:22 demo_keyspace-demo_table-md-4293-Data.db
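To see how much space this particular run reclaimed, we can subtract the size of the new file from the combined size of the six input files. The numbers below are copied from the two listings above and cover only the Data.db components.

```shell
# Sizes in bytes of the six input Data.db files (generations 1290, 4121,
# 4206, 4290, 4291 and 4292) and of the resulting file (generation 4293).
before=$((9919563003 + 785650141 + 784706108 + 831251133 + 17660000 + 17403073))
after=12178742872

echo "before:    $before bytes"
echo "after:     $after bytes"
echo "reclaimed: $((before - after)) bytes"
```

This reports 177490586 bytes reclaimed, roughly 169 MiB.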

Note: UDC can behave differently if you have incremental repair enabled, or if it was enabled in the past and SSTables still have the Repaired value set to more than 0.
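One way to check the repaired status before picking files is the sstablemetadata tool that ships with Cassandra. The sketch below assumes its output contains a line of the form "Repaired at: <value>", which may vary between versions. The helper name is_unrepaired is hypothetical.

```shell
# Hypothetical helper: read sstablemetadata output on stdin and succeed
# only if it reports the SSTable as unrepaired.
# Assumes a line of the form "Repaired at: <milliseconds>".
is_unrepaired() {
    awk -F': *' '
        /^Repaired at/ { seen = 1; ok = ($2 + 0 == 0) }
        END { exit !(seen && ok) }'
}
```

For example, sstablemetadata demo_keyspace-demo_table-md-1290-Data.db | is_unrepaired succeeds only when the reported value is 0.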

A step-by-step guide to performing UDC on Cassandra versions earlier than 3.4

1. Flush the memtables to the disk:

nodetool flush

2. Prepare the JMX terminal program. If you don’t already have it, you can download it here: CYCLOPSGROUP DOCS – Jmxterm. You need to copy one jar file into a convenient directory.

$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.

3. Connect to the Cassandra process on the JMX port:

$>open 7199
#Connection to 7199 is opened
$>

4. Set the relevant domain and beans:

$>domain org.apache.cassandra.db
#domain is set to org.apache.cassandra.db
$>bean org.apache.cassandra.db:type=CompactionManager
#bean is set to org.apache.cassandra.db:type=CompactionManager

5. Determine which files to compact. As above, assume that the smaller files created in April contain updates and tombstones for rows in the file from October, so compacting them together would be beneficial.

cassandra@hostname-1 /data/cassandra/data/demo_keyspace/demo_table $ ls -l *Data*
-rw-rw-r-- 1 cassandra cassandra 9919563003 Oct 5 2020 demo_keyspace-demo_table-md-1290-Data.db
-rw-rw-r-- 1 cassandra cassandra 13181126491 Dec 5 22:50 demo_keyspace-demo_table-md-2615-Data.db
-rw-rw-r-- 1 cassandra cassandra 12490613138 Apr 2 04:35 demo_keyspace-demo_table-md-4027-Data.db
-rw-rw-r-- 1 cassandra cassandra 785650141 Apr 10 14:50 demo_keyspace-demo_table-md-4121-Data.db
-rw-rw-r-- 1 cassandra cassandra 784706108 Apr 18 00:27 demo_keyspace-demo_table-md-4206-Data.db
-rw-rw-r-- 1 cassandra cassandra 831251133 Apr 25 03:02 demo_keyspace-demo_table-md-4290-Data.db
-rw-rw-r-- 1 cassandra cassandra 17660000 Apr 25 06:43 demo_keyspace-demo_table-md-4291-Data.db
-rw-rw-r-- 1 cassandra cassandra 17403073 Apr 25 13:32 demo_keyspace-demo_table-md-4292-Data.db

6. Run the user-defined compaction for the selected files only. The JMX session will block until the compaction completes.

$>run forceUserDefinedCompaction demo_keyspace-demo_table-md-1290-Data.db,demo_keyspace-demo_table-md-4121-Data.db,demo_keyspace-demo_table-md-4206-Data.db,demo_keyspace-demo_table-md-4290-Data.db,demo_keyspace-demo_table-md-4291-Data.db,demo_keyspace-demo_table-md-4292-Data.db

#calling operation forceUserDefinedCompaction of mbean org.apache.cassandra.db:type=CompactionManager
......
#operation returns:
null
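The interactive session from steps 3, 4 and 6 can also be scripted by feeding the same commands to jmxterm on standard input. The sketch below only generates the command text. The helper name udc_jmx_script is hypothetical, and it uses the host:port form of open (localhost is an assumption).

```shell
# Hypothetical helper: print the jmxterm commands for a user-defined
# compaction so they can be piped into jmxterm instead of typed by hand.
# Usage: udc_jmx_script JMX_PORT COMMA_SEPARATED_FILES
udc_jmx_script() {
    port=$1
    files=$2
    # Connect to the local Cassandra JMX port (host:port form).
    printf 'open localhost:%s\n' "$port"
    # Select the CompactionManager bean, then invoke the operation.
    printf 'bean org.apache.cassandra.db:type=CompactionManager\n'
    printf 'run forceUserDefinedCompaction %s\n' "$files"
}
```

Assuming the jar from step 2 and jmxterm's non-interactive option, the output could be piped as udc_jmx_script 7199 "$files" | java -jar jmxterm-1.0-alpha-4-uber.jar -n, where $files holds the comma-separated list from step 6.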

7. We can monitor progress from a second session using the standard nodetool compactionstats command:

cassandra@hostname-1 ~ $ nodetool compactionstats
pending tasks: 1
compaction type   keyspace        table        completed    total         unit    progress
Compaction        demo_keyspace   demo_table   1628935106   40715608912   bytes   4.00%
Active compaction remaining time : 0h09m42s
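Rather than rerunning the command by hand, the second session can poll until the table no longer appears in the list. This is a minimal sketch with hypothetical helper names. It assumes the keyspace and table appear as adjacent columns, as in the output above, and it also stops if nodetool itself fails.

```shell
# Hypothetical helper: succeed if the compactionstats output on stdin
# still lists a task for keyspace $1 and table $2.
compaction_active() {
    awk -v ks="$1" -v tbl="$2" '
        { for (i = 1; i < NF; i++) if ($i == ks && $(i + 1) == tbl) found = 1 }
        END { exit !found }'
}

# Block until no compaction is listed for the given table.
# Usage: wait_for_compaction KEYSPACE TABLE [INTERVAL_SECONDS]
wait_for_compaction() {
    while nodetool compactionstats | compaction_active "$1" "$2"; do
        sleep "${3:-30}"
    done
}
```

For the example above, wait_for_compaction demo_keyspace demo_table 60 would check once a minute and return when the compaction is no longer listed.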

8. Once compaction is completed, we can see the effect on files:

cassandra@hostname-1 /data/cassandra/data/demo_keyspace/demo_table $ ls -l *Data*
-rw-rw-r-- 1 cassandra cassandra 13181126491 Dec 5 22:50 demo_keyspace-demo_table-md-2615-Data.db
-rw-rw-r-- 1 cassandra cassandra 12490613138 Apr 2 04:35 demo_keyspace-demo_table-md-4027-Data.db
-rw-rw-r-- 1 cassandra cassandra 12178742872 Apr 25 15:22 demo_keyspace-demo_table-md-4293-Data.db

I hope this was helpful. If you have any thoughts or questions, please leave them in the comments.
