Tuesday, August 4, 2009

ZFS filesystems with compression enabled made my system unresponsive

At home, I'm running some services onto a Solaris Express Community Edition. The last time I live-upgraded it was to build 113. Before running SXCE, I was running Solaris 10 on it. This workstation also manages a RAIDZ-1 pool composed of four USB drives. I was perfectly aware about the performance penalty I was paying for running USB instead of eSATA (or even internal SATA), but there was no choice, then. By the way, that pool is just used to offline some snapshot backup scheduled at night, so I wasn't usually hit by the slowness of this solution.

The most annoying bug affecting this setup was this:

6586537: async zio taskqs can block out userland commands

The effect was the unresponsiveness of the userland commands I was using, including Xorg itself, when doing some big I/O on a compressed file system of that pool. Originally, the compressed file systems were lzjb and moved to gzip-based compression as soon as it was released, but before knowing I was affected by that bug. The bug really made the system unusable: sometimes even the keyboard echo could be delayed more than 5 seconds. I haven't any other machine running compressed ZFS file systems and I couldn't test the bug status so often. Moreover, that's the typical machine you'd better not touch, unless you've got a lot of spare time to recover from a disaster (just in case). With time, at the end, I just learned to live with it.

The good news is that I just realized that the bug was fixed and committed in SXCE build 115 and scheduled for Solaris 10 Update 9. I live upgraded my old SXCE to build 116, the same I'm happily running on my laptop, and I must recognize that the system is really, really much more responsive when ZFS compressed file systems are being stressed. Maybe there also were improvements on the USB side, who knows.

If you're running older build affected by this bug, upgrade now.

No comments: