Subject: 2.6.32+: ext4 direct-io kernel thread invasion
From: Nix
To: linux-kernel@vger.kernel.org
Cc: linux-ext4@vger.kernel.org, Mingming Cao
Date: Mon, 18 Jan 2010 23:40:23 +0000

So I upgraded one of my servers to 2.6.32 recently. It's got twelve ext4
filesystems on it right now, and has direct I/O enabled because I have one
program that wants to do direct I/O to one of those filesystems on rare
occasions, and because you never know when someone might install something
else that wants to do direct I/O.

But what did I find in 2.6.32 but a new set of per-CPU, per-sb workqueues
whose raison d'etre appears to be something related to direct I/O. Per-CPU,
per-sb: that's a lot. Their full name is apparently 'ext4-dio-unwritten',
presumably something to do with converting unwritten extents for blocks
written via direct I/O. But, boyoboy, are there a lot of them (see the
mount-path sketch at the end of this mail for where they come from):

nix@spindle 9 /home/nix% ps -fC ext4-dio-unwrit | wc -l
97

Now kernel threads are all very well --- it's obvious that e.g. a per-sb
journal-flushing thread is worthwhile --- but *ninety-seven* kernel threads
for something like direct I/O, which in all but very unusual high-end Oracle
workloads is going to be a small proportion of total I/O, is just *insane*.
And as core counts go up it's just going to get more insane. Even my
readonly-mounted filesystems seem to have some.

Do these threads really have to be per-CPU? Can't they default to something
much less aggressive, and then people with massive beefy Oracle installations
can up the number of these threads? (Dynamically tuning their numbers
according to the workload would be ideal: the slow-work infrastructure can do
this, IIRC, spawning new threads when needed.)
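
P.S. For reference, a sketch (from memory, not checked against the tree, so
field and label names may be slightly off) of where all these threads come
from: the 2.6.32 ext4 mount path creates one workqueue per superblock with
create_workqueue(), and in the pre-cmwq workqueue code that spawns one worker
thread per online CPU. With twelve mounted filesystems on what is presumably
an eight-way box, that's 12 x 8 = 96 worker threads (plus the ps header line
in the count above).

    /* fs/ext4/super.c, 2.6.32-ish mount path (sketch, names from memory) */
    sbi->dio_unwritten_wq = create_workqueue("ext4-dio-unwritten");
    if (!sbi->dio_unwritten_wq) {
            printk(KERN_ERR "EXT4-fs: failed to create DIO workqueue\n");
            goto failed_mount;      /* exact label is a guess */
    }

A less aggressive default, until something dynamic along slow-work lines is
wired up, might be a single worker per superblock rather than one per
superblock per CPU, e.g. (untested, just to illustrate the suggestion above):

    sbi->dio_unwritten_wq =
            create_singlethread_workqueue("ext4-dio-unwritten");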