InnoDB Adaptive Hash Index can be a performance bottleneck on high-end MySQL 5.6 installations - which wasn’t the case with 5.5. Turning it off gave us much better (consistent) read performance by utilizing more CPU+IO resources and also reduced occurences of replication lag. Docs say:
The architectural changes in MySQL 5.6 and higher make more workloads suitable for disabling the adaptive hash index than in earlier releases, although it is still enabled by default.
Here at Base CRM we’re relying heavily on MySQL. A little bit late to the party we started to upgrade our MySQL 5.5 installations to Percona Server 5.6. Main reasons for that include well known perks like increased visibility, scalability, replication features and of course - better performance.
We started with our mobile sync service database. It’s a write-intensive, high-end MySQL installation running on AWS i2.4xlarge instances - making good use of those SSD’s that come with it. There’s also read-intensive workload that’s running on MySQL slaves - and it was the main place where we have noticed performance degradation after upgrading to MySQL 5.6.
Apart from longer response times from services workers we have noticed issues with
- constant replica lag under read workload, forcing workers to fallback to Master
- worse buffer pool hit rate on read replicas
- underutilized IO
- unredutilized CPU
First blind guess was the Performance Schema. It is known (FIXME: link) to provide some overhead - so we tried to disable it. Improvement was noticable, but still far from old setup. Notice that with every new MySQL version the overhead is getting less and less noticable and it is probably best to keep it turned on - visibility matters.
IO performance and spare resources
Having tons of metrics in our Graphtie cluster (measure all the things!) from old and new setup I have started to look for differences and hints how to fix the issue.
First thing that I noticed were differences in IO operations per second on MySQL data volume (mdraid from four SSD disks). Main reason for that was different stripe chunk size on old(4kb) and new(16kb) array, undetected before the upgrade (due to shortcomings of our provisioning process).
While provisioning the new instance with different mdraid settings and seeing underutilized resources I have started to implement recomendations from great presentation by Pinterest enginners - [All your IOPS are belong to us]. Those guys bencharked the same instance types as we were using which gave a very good starting point for further optimizations.
64kb stripe chunk size provided best replication performance.
Spare resources in terms of CPU and IOPS made me do some additional tweaks:
- increased innodb_io_read_threads to 16
- decreased innodb_Read_thead_threshold to 32 (default: 56)