
Let’s Deal with High Read Latencies in Cassandra

High latency values may indicate a cluster at the edge of its processing capacity, issues with the data model—such as poor choice of partition key or high levels of tombstones—or issues with the underlying infrastructure. Below are some major reasons behind read latency in Cassandra and the best practices to deal with them.

Secondary indexes

Secondary indexes in Cassandra can be useful and are tempting when your data model has changed and you need to query based on a new column. However, secondary indexes are NOT part of a partition key, and the partition key is what Cassandra uses to know where your data lives. When you run a query that uses this kind of index, Cassandra has to check each node in your ring to try to satisfy your query unless you also provide the partition key. So the best practice is to avoid them.
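
To illustrate, here is a hypothetical users table: querying by a non-key column either forces a secondary index (and the fan-out described above) or, preferably, a denormalized lookup table keyed by the column you actually query. Table and column names are made up for the example.

    -- Hypothetical base table, partitioned by user_id
    CREATE TABLE users (
        user_id uuid PRIMARY KEY,
        email   text,
        name    text
    );

    -- Secondary index: a query by email has to be sent to every node
    CREATE INDEX ON users (email);
    SELECT * FROM users WHERE email = 'a@example.com';

    -- Preferred: a second table keyed by the column you query
    CREATE TABLE users_by_email (
        email   text PRIMARY KEY,
        user_id uuid,
        name    text
    );
    SELECT * FROM users_by_email WHERE email = 'a@example.com';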

Consistency level

Use LOCAL_QUORUM. With it, Cassandra avoids the latency of validating operations across multiple data centers. If a keyspace used QUORUM as its consistency level, read/write operations would have to be acknowledged by replicas across all data centers; LOCAL_QUORUM only waits for replicas in the local data center.
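
For example, in cqlsh you can check and change the consistency level for the session (drivers expose an equivalent per-session or per-query setting); the output should look roughly like this:

    cqlsh> CONSISTENCY
    Current consistency level is ONE.
    cqlsh> CONSISTENCY LOCAL_QUORUM
    Consistency level set to LOCAL_QUORUM.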

Not using recommended OS settings

Here are the recommended settings:

  • Tune TCP settings.
  • Disable zone_reclaim_mode on NUMA (non-uniform memory access) systems.
  • Set user resource limits.
  • Disable swap.
  • Optimize SSDs (solid-state drives), e.g. set the readahead value for the block device to 8 KB.
  • Set the IO scheduler to either deadline or noop.
  • Ensure that the SysFS rotational flag is set to false (zero).
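
As a rough sketch, a few of these settings on a Linux host look like the following; the device name /dev/sda is only an example and the available scheduler names depend on your kernel.

    # Disable zone_reclaim_mode on NUMA systems
    echo 0 > /proc/sys/vm/zone_reclaim_mode

    # Disable swap
    swapoff --all

    # Set readahead for the data disk to 8 KB (16 x 512-byte sectors)
    blockdev --setra 16 /dev/sda

    # Use the deadline (or noop) IO scheduler and mark the SSD as non-rotational
    echo deadline > /sys/block/sda/queue/scheduler
    echo 0 > /sys/block/sda/queue/rotational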

Network

Whenever you face read latencies, always check whether the network is causing any delays. At the OS level you can check using ping, traceroute or MTR, and check network throughput using the iftop command. For Cassandra, you can use nodetool proxyhistograms.
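
For example (host name, interface and keyspace/table below are placeholders):

    # Coordinator-level request latency, which includes network hops between nodes
    nodetool proxyhistograms

    # Local read latency on the replica itself, for comparison
    nodetool tablehistograms my_keyspace my_table

    # OS-level checks
    mtr cassandra-node-2
    iftop -i eth0

If proxyhistograms looks much worse than the local read latency in tablehistograms, the time is likely being lost between nodes rather than on disk.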

Tombstones

Serious performance problems can occur if reads encounter large numbers of tombstones. You can find the number of tombstones returned by a particular query by running it in cqlsh with tracing enabled, and nodetool tablestats shows per-table statistics for the number of tombstones encountered recently. For more information on tombstones, read Examining the Lifecycle of Tombstones in Apache Cassandra on the Pythian blog.
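
For example (keyspace and table are placeholders), the per-table statistics include recent tombstones-per-read counters:

    # Check "Average tombstones per slice (last five minutes)" and
    # "Maximum tombstones per slice (last five minutes)" in the output
    nodetool tablestats my_keyspace.my_table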

Read repair

Non-zero read_repair_chance and dclocal_read_repair_chance settings put extra internal read load on your cluster that serves little purpose and provides no guarantees. Disable this feature (set both options to zero) for all tables to see some improvement in read latencies.
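
A minimal example for one table (the name is a placeholder; these table options exist in Cassandra 3.x and were removed entirely in 4.0):

    ALTER TABLE my_keyspace.my_table
        WITH read_repair_chance = 0.0
        AND dclocal_read_repair_chance = 0.0;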

High number of SSTables (Sorted Strings Tables)

If a high number of SSTables has to be accessed to serve a single read, that alone can cause high read latency. You can check this using nodetool tablehistograms. As a fix, make sure your compactions aren’t lagging behind, and try tuning the compaction strategy, or the concurrent_compactors or compaction_throughput options.
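
For example (placeholder names; 64 MB/s is just an illustrative throughput value):

    # "SSTables" column: how many SSTables recent reads had to touch
    nodetool tablehistograms my_keyspace my_table

    # Are compactions keeping up?
    nodetool compactionstats

    # Raise the compaction throughput cap if compactions are falling behind
    nodetool setcompactionthroughput 64

concurrent_compactors and compaction_throughput_mb_per_sec can also be set permanently in cassandra.yaml.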

Wide partition

Wide partitions in Cassandra can put tremendous pressure on the Java heap and garbage collector, impact read latencies, and cause issues ranging from load shedding and dropped messages to crashed and downed nodes. The “Partition Size” output of nodetool tablehistograms can help you assess the size of your partitions.
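
For example (placeholder names):

    # "Partition Size" percentiles for the table
    nodetool tablehistograms my_keyspace my_table

    # "Compacted partition maximum bytes" flags the worst-case partition
    nodetool tablestats my_keyspace.my_table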

Bad query pattern

Queries using count(*), ALLOW FILTERING or large IN clauses are a big no-no.
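
A hypothetical orders table (partitioned by order_id) makes the difference concrete:

    -- Anti-patterns: cluster-wide scans and multi-partition fan-out
    SELECT count(*) FROM orders;
    SELECT * FROM orders WHERE status = 'OPEN' ALLOW FILTERING;
    SELECT * FROM orders WHERE order_id IN (1, 2, 3);  -- large IN lists hit many partitions

    -- Better: read one partition at a time by its partition key
    SELECT * FROM orders WHERE order_id = 1;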

Bloom filter false positive

Since a high Bloom filter false-positive count can cause read latency, you can tune bloom_filter_fp_chance to reduce it.
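
For example (placeholder table; lower values mean fewer false positives at the cost of extra memory per SSTable):

    -- Check "Bloom filter false ratio" in nodetool tablestats first, then adjust
    ALTER TABLE my_keyspace.my_table WITH bloom_filter_fp_chance = 0.01;

The new value only applies to SSTables written after the change; nodetool upgradesstables -a can rewrite the existing ones.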

Additional considerations

In addition to the above culprits, which can potentially cause slowdowns and delays, here are a few more things to consider when dealing with high read latencies:

Compression

Run nodetool tablestats and check the value of the compression ratio. For example, a value of 0.8949133028999068 indicates that the data hardly compresses at all: it still occupies almost 90 percent of its original space. In that case you are paying the decompression cost on every read for very little space saving, so disabling compression will use some extra disk space but can improve read latencies.

Note: After disabling compression, you need to rewrite the sstables on all the nodes with the help of nodetool upgradesstables.
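
A sketch of both steps for a single table (names are placeholders; the -a flag forces a rewrite even for SSTables that are already on the current format):

    -- In cqlsh: disable compression for the table
    ALTER TABLE my_keyspace.my_table WITH compression = {'enabled': 'false'};

    # Then on each node: rewrite the existing SSTables so the change applies on disk
    nodetool upgradesstables -a my_keyspace my_table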

trickle_fsync

Set this to true in cassandra.yaml if you’re using SSDs.
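
In cassandra.yaml that looks like this (10240 KB is the interval shipped in the default configuration; tune it for your hardware):

    trickle_fsync: true
    trickle_fsync_interval_in_kb: 10240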

High GC (garbage collection) pauses

These stop-the-world events can slow down your reads or even cause them to time out. There are many reasons you might experience long GC pauses, including a bad data model, an insufficient max heap size or other untuned GC parameters.
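
A couple of quick ways to confirm GC is the culprit (the log path varies by installation):

    # GC statistics accumulated since the last time this command was run
    nodetool gcstats

    # GCInspector logs a line for every notably long pause
    grep -i "GCInspector" /var/log/cassandra/system.log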

Check Tracing

If you find that a specific query is taking a long time, set TRACING ON in cqlsh and run the problematic query, or enable probabilistic tracing with nodetool settraceprobability 0.001, and then check the trace to see what’s going on.
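
For example, in cqlsh (the query itself is a placeholder):

    cqlsh> TRACING ON
    cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 1234;
    cqlsh> TRACING OFF

Or, to sample a small fraction of all requests on a node (the traces end up in the system_traces keyspace):

    nodetool settraceprobability 0.001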

I hope this information is helpful! If you have any thoughts or questions, please leave them in the comments.
