From: Ted Ts'o
Subject: Re: [PATCH, RFC] Don't do page stablization if !CONFIG_BLKDEV_INTEGRITY
Date: Thu, 8 Mar 2012 10:54:19 -0500
Message-ID: <20120308155419.GB6777@thunk.org>
References: <4F57F523.3020703@redhat.com> <4F581BF6.8000305@zabbo.net>
In-Reply-To: <4F581BF6.8000305@zabbo.net>
To: Zach Brown
Cc: Eric Sandeen, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org

On Wed, Mar 07, 2012 at 09:39:50PM -0500, Zach Brown wrote:
>
> >Can you devise a non-secret testcase that demonstrates this?
>
> Hmm. I bet you could get fio to do it. Giant file, random mmap()
> writes, spin until the CPU overwhelms writeback?

Kick off a bunch of fio processes, each in separate I/O cgroups set up
so that each of the processes gets a "fair" amount of the I/O
bandwidth. (This is quite common in cloud deployments where you are
packing a huge number of tasks onto a single box; whether the tasks
are inside virtual machines or containers doesn't really matter for
the purpose of this exercise. We basically need to simulate a system
where the disks are busy.)

Then in one of those cgroups, create a process which is constantly
appending to a file using buffered I/O; this could be a log file, or
an application-level journal file; and measure the latency of that
write system call.

Every so often, writeback will push the dirty pages corresponding to
the log/journal file to disk. When that happens, and page
stabilization is enabled, the latency of that write system call will
spike.
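
A minimal sketch of the appender half of that test, in Python. It covers only the measuring process; the competing fio jobs and the cgroup setup are assumed to be running already and are not shown. The function names, the 4 KiB record size, and the pause between writes are illustrative choices, not anything taken from the thread.

```python
import os
import time


def append_and_time(path, n_writes=1000, record_size=4096, pause=0.001):
    """Append fixed-size records with buffered I/O and time each write().

    Returns a list of per-call latencies in seconds. No fsync() is issued,
    so each write normally only dirties the page cache; a stall shows up
    as an unusually slow write() call.
    """
    buf = b"x" * record_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for _ in range(n_writes):
            start = time.perf_counter()
            os.write(fd, buf)
            latencies.append(time.perf_counter() - start)
            if pause:
                time.sleep(pause)  # mimic a steadily appended log or journal
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return median, 99th percentile, and maximum latency in seconds."""
    ordered = sorted(latencies)

    def pct(p):
        return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

    return {"p50": pct(0.50), "p99": pct(0.99), "max": ordered[-1]}


if __name__ == "__main__":
    # "appender.log" is a placeholder path; put it on the filesystem under test.
    print(summarize(append_and_time("appender.log")))
```

The number to watch is the gap between the median and the maximum: if writes block behind pages under writeback, it should show up as a long tail rather than as a shift in the median.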

And any time you have a distributed system where you are depending on
a large number of RPC/SOAP/Service Oriented Architecture Enterprise
Service Bus calls (I don't really care which buzzword you use, but IBM
and Oracle really like the last one :-), long-tail latencies are what
kill your responsiveness and predictability. Especially when a thread
goes away for a second or more...

					- Ted