2004-02-27 06:16:36

by Lever, Charles

Subject: global kernel lock contention

we've been wondering how bad the BKL contention problem
is in the Linux NFS/RPC client, so we've done some kernel
execution profiling during a TPC-C benchmark on a 2.5GHz
Linux client with four physical (eight logical) processors
and 8GB of memory. the database files are stored on a
filer of recent vintage attached via GbE and accessed
over NFS.

the database is using NFS O_DIRECT to access its files,
so the only data copy involved is from the socket layer
into the user's buffer.
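
for reference, the access pattern from user space is just an
O_DIRECT open plus aligned reads. a minimal sketch, assuming a
made-up path of /mnt/nfs/datafile (this is not the database's
actual code):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		void *buf;
		int fd;

		/* O_DIRECT requires an aligned buffer and transfer size */
		if (posix_memalign(&buf, 4096, 32768))
			return 1;

		fd = open("/mnt/nfs/datafile", O_RDONLY | O_DIRECT);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* with O_DIRECT the kernel copies from the socket layer
		 * straight into buf, skipping the page cache */
		if (pread(fd, buf, 32768, 0) < 0)
			perror("pread");

		close(fd);
		free(buf);
		return 0;
	}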

three spin locks were hot in our runs -- the BKL, the
xprt->sock_lock, and the rpc_queue_lock. all three were
in the top 20. (it turns out the BKL is hot even on
single-threaded workloads running on dual-processor
systems.)

so i set out to discover why the BKL is so contended.

after playing around with oprofile for a while, i found
that several RPC scheduler functions were fairly hot.
to find out more, i instrumented these with counters to
see how often they were called per RPC request. i
discovered that rpc_sleep_on, rpc_add_wait_queue,
and rpc_remove_wait_queue are called several times per
RPC request. but the real surprise was that the
rpc_queue_lock is taken on average between 15 and 18
times per RPC request, independent of request size or
network protocol (udp or tcp).
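
the instrumentation needed for this kind of counting is trivial;
a sketch of one way to do it (the counter names are made up here,
this is not the actual patch):

	#include <asm/atomic.h>

	static atomic_t sleep_on_calls   = ATOMIC_INIT(0);
	static atomic_t queue_lock_takes = ATOMIC_INIT(0);
	static atomic_t rpc_requests     = ATOMIC_INIT(0);

	/* bump these at the top of rpc_sleep_on, wherever
	 * rpc_queue_lock is acquired, and once per completed RPC
	 * request; the ratios give the calls-per-request numbers
	 * quoted above, e.g.
	 *
	 *   printk("rpc: %d lock takes / %d requests\n",
	 *          atomic_read(&queue_lock_takes),
	 *          atomic_read(&rpc_requests));
	 */
	#define COUNT(ctr)	atomic_inc(&(ctr))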

this particular spin lock also locks out bottom halves.
so when it is released, bottom halves run again. in
the case of the RPC client, the bottom half appears to
do the data copy from socket buffer to page cache or
to the user's buffer. this means we're single-threading
data copies from our transport sockets, and on heavy NFS
workloads, other processors could be spinning waiting to
do work while one processor is busy copying data.
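
to make that concrete: rpc_queue_lock is taken with the _bh spin
lock variants, so while a CPU holds it, bottom half processing
(and with it the socket data copy) is held off on that CPU. the
pattern is simply this (a bare illustration, not the actual RPC
code):

	#include <linux/spinlock.h>

	static spinlock_t rpc_queue_lock = SPIN_LOCK_UNLOCKED;

	/* process-context path that touches the RPC wait queues */
	static void touch_wait_queue(void)
	{
		/* bottom halves disabled on this cpu */
		spin_lock_bh(&rpc_queue_lock);

		/* ... add or remove a task from a wait queue ... */

		/* bottom halves (and the data copy) run again */
		spin_unlock_bh(&rpc_queue_lock);
	}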

as a fun experiment, i took the BKL out of the direct I/O
path in the NFS client (professional driver on a closed
course; do not attempt). i tried an eight-thread random
32KB read workload against sparse files on my filer, and got
almost full wire speed (about 95MB/s over GbE, which is
about as fast as my NIC will go; the filer and the client
both had CPU to spare). i think this is because the data
copies could run concurrently on the client.
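
the workload was just eight threads doing random 32KB O_DIRECT
preads against a sparse file. a stand-in that captures the shape
of it (file name, file size and thread count are illustrative;
build with -pthread and point it at a file on the NFS mount):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <pthread.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define NTHREADS	8
	#define IOSIZE		32768
	#define FILESIZE	(1024LL * 1024 * 1024)	/* 1GB sparse file */

	static void *reader(void *arg)
	{
		unsigned int seed = (unsigned int)(unsigned long)arg;
		void *buf;
		int fd;

		if (posix_memalign(&buf, 4096, IOSIZE))
			return NULL;
		fd = open("/mnt/nfs/testfile", O_RDONLY | O_DIRECT);
		if (fd < 0)
			return NULL;

		for (;;) {
			/* random IOSIZE-aligned offset within the file */
			off_t off = (off_t)(rand_r(&seed) % (FILESIZE / IOSIZE))
					* IOSIZE;
			pread(fd, buf, IOSIZE, off);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t tid[NTHREADS];
		long i;

		for (i = 0; i < NTHREADS; i++)
			pthread_create(&tid[i], NULL, reader, (void *)i);
		for (i = 0; i < NTHREADS; i++)
			pthread_join(tid[i], NULL);
		return 0;
	}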

so basically we have the same problem in today's RPC client
that the network layer had in early 2.3.

unfortunately, when i kicked the thread count up to 16, i was
able to deadlock my client with the random read workload.
we kind of expected this, and i still need to find out what
is causing it.

