On Fri, Jun 03, 2011 at 05:49:17PM -0400, Micah Anderson wrote:
> However, ever since I made that change, I've noticed an increase in
> I/O wait on the CPUs. I've been trying to determine why, and whether
> there are things I should tune on this ext4 filesystem.
How are you measuring this?
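(If you want a spot check outside of munin, something like

    iostat -x 5
    vmstat 5

will show per-device utilization and the "wa" column respectively;
iostat comes from the sysstat package and vmstat from procps.)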
- Ted
On Mon, Jun 06, 2011 at 12:43:10PM -0400, micah anderson wrote:
>
> Through munin graphs. Unfortunately we don't have a lot of data from
> before the change, but you can see the jump early on in this graph,
> at the point where the change was made:
>
> http://lackof.org/~taggart/tmp/willet-cpu-year.png
>
> I'll note that we also moved from lenny to squeeze at this
> time. Basically we decided to move to squeeze first and then convert
> to ext4, so that throws some other variables in here too.
>
> As I mentioned before, this is a high-traffic mailing list system,
> which does a lot of I/O. We're also seeing lots of rescheduling
> interrupts after the upgrade to the squeeze kernel:
>
> http://lackof.org/~taggart/tmp/willet-irqstats-year.png
Oh, I bet I know what's going on. Ext3 defaults to barriers off;
ext4 defaults to barriers on, which is safer if you suffer sudden
power drops. If you have a UPS and are confident that the UPS
monitoring software is properly set up so the system will go through
a controlled, clean shutdown when UPS power runs low, then you could
consider disabling barriers on ext4 without committing professional
sysadmin malpractice. :-)
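For reference, barriers are controlled by a mount option; the device
and mount point below are only placeholders for wherever your spool
actually lives:

    # /etc/fstab -- barrier=0 (synonym: nobarrier) disables write
    # barriers on ext4; barrier=1 is the default
    /dev/sda3  /var/spool  ext4  defaults,barrier=0  0  2

or, to change it on a live filesystem without a reboot:

    mount -o remount,barrier=0 /var/spool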
Since mail systems tend to be very fsync()-happy, and fsync() calls
translate into barrier operations, that's probably the explanation
for what's going on here.
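To make that concrete, here is a minimal sketch (not from this
thread, and the file path is made up) of the append-then-fsync
pattern an MTA uses so a message is on stable storage before it
acknowledges delivery; with barriers on, every fsync() below implies
a journal commit plus a device cache flush:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Append one message durably.  Returns 0 on success, -1 on
     * error.  The fsync() is where ext4 issues the cache flush; on
     * ext3 with barriers off the same call returns much sooner. */
    static int deliver(const char *path, const char *msg)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (fd < 0)
            return -1;
        if (write(fd, msg, strlen(msg)) != (ssize_t) strlen(msg) ||
            fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    int main(void)
    {
        /* hypothetical spool file, purely for illustration */
        return deliver("/tmp/testmsg", "test message\n") ? 1 : 0;
    }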
- Ted