Date: Fri, 31 May 2013 16:15:40 +0200
From: jens kusch
To: linux-nfs@vger.kernel.org
Subject: strange nfsd scheduling in 2.6.32
Message-ID: <51A8B08C.3090309@oracle.com>

Hi all,

we have a problem with nfsd performance on 2.6.32: the nfsd threads do not
seem to be able to cope with the load. The same setup behaves differently
on 2.6.18. Has anybody seen this before?

On Linux 2.6.32:

- I/Os are often processed by the nfsd processes in a delayed fashion, as
  if they had been queued first (seen in application traces).
- The NFS pool statistics show that only a small fraction of the packets
  (10-20%) is processed immediately; the rest is queued or delayed.
- At the same time, lots of nfsd processes are sitting idle!
- CPU usage is distributed very unevenly among the nfsd threads; many of
  them are never used.

I'd just like to emphasize one detail: note the output from
/proc/fs/nfsd/pool_stats below:

  # pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout
  0      7740103         1837083          885771        1837081           480

The overloads-avoided counter gets incremented constantly in our runs.
Here is a brief description of that counter:

  Counts how many times the sunrpc server layer chose not to wake an
  nfsd thread, despite the presence of idle nfsd threads, because too
  many nfsd threads had been recently woken but could not get enough
  CPU time to actually run.

In our runs, CPU utilization never gets close to 100%, so I wonder why the
sunrpc layer decides not to wake up one of the idle threads we see (some
arithmetic on the counters and a sketch of the wake-up logic follow after
my signature).

On Linux 2.6.18:

- Performance via NFS is better.
- CPU usage is distributed more evenly among the nfsd processes; all of
  them are actually used.

We would appreciate any hint about what could be wrong in 2.6.32.

Best regards,
Jens
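
P.S.: Doing the arithmetic on the pool line above (assuming I am reading
the counters correctly):

  threads-woken     / packets-arrived = 885771  / 7740103 ~= 11.4%
  sockets-enqueued  / packets-arrived = 1837083 / 7740103 ~= 23.7%
  overloads-avoided / packets-arrived = 1837081 / 7740103 ~= 23.7%

So only about 11% of the arriving packets woke a thread immediately, and
overloads-avoided is almost exactly equal to sockets-enqueued. That
suggests that nearly every time a request was queued instead of waking a
thread, it was this overload heuristic that made the decision, not an
actual shortage of idle threads.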
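
P.P.S.: For context, here is my paraphrase, from reading the 2.6.32
source, of the wake-or-queue decision in svc_xprt_enqueue()
(net/sunrpc/svc_xprt.c). It is a simplified sketch, not a compilable
excerpt: locking, XPT_BUSY handling and error paths are omitted, and the
identifiers (sp_nwaking, rq_waking, SVC_MAX_WAKING) should be
double-checked against the actual tree:

  #define SVC_MAX_WAKING 5    /* hard-coded cap in the source, if I
                               * read it correctly */

          pool->sp_stats.packets++;

          /* Is there an idle nfsd thread parked on this pool? */
          thread_avail = !list_empty(&pool->sp_threads);

          /* The heuristic behind "overloads-avoided": even when idle
           * threads exist, refuse to wake another one while too many
           * recently woken threads have not started running yet. */
          if (pool->sp_nwaking >= SVC_MAX_WAKING) {
                  thread_avail = 0;
                  pool->sp_stats.overloads_avoided++;
          }

          if (thread_avail) {
                  /* Wake the first idle thread on the pool. */
                  rqstp = list_entry(pool->sp_threads.next,
                                     struct svc_rqst, rq_list);
                  rqstp->rq_waking = 1;
                  pool->sp_nwaking++;     /* decremented again once the
                                           * woken thread actually runs */
                  pool->sp_stats.threads_woken++;
                  wake_up(&rqstp->rq_wait);
          } else {
                  /* No wake-up: queue the transport for later. */
                  list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
                  pool->sp_stats.sockets_queued++;
          }

If this reading is correct, the cap of SVC_MAX_WAKING woken-but-not-yet-
running threads per pool would keep our idle threads asleep regardless of
the actual CPU utilization. As far as I can tell this heuristic does not
exist in 2.6.18, which would match the difference in behaviour we see.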