2008-01-18 17:02:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: Understanding iostat-ms output


On Fri, 2008-01-18 at 10:37 +0100, Fredrik Lindgren wrote:
> Hello,
>
> I have been messing around with the excellent "iostat-ms" tool in
> an effort to understand what is going on with our machines that are
> showing lots of IOwait.
>
> Below are two 15 second samples, the first one is taken during peak
> and the second one during the night. The obvious difference is in the
> "exe" field. Please excuse my ignorance, but what does this field mean?
> Where is this additional time spent? On the filer or in the NFS stack
> on the host?
>
> stbnas1:/foo mounted on /misc/foo:
>
> op/s rpc bklog
> 2364.60 0.96
> read: ops/s Kb/s Kb/op retrans avg RTT (ms) avg exe (ms)
> 43.267 659.624 15.246 0 (0.0%) 17.066 293.880
> write: ops/s Kb/s Kb/op retrans avg RTT (ms) avg exe (ms)
> 16.133 60.314 3.738 0 (0.0%) 3.868 255.074
>
> stbnas1:/foo mounted on /misc/foo:
>
> op/s rpc bklog
> 1446.07 0.26
> read: ops/s Kb/s Kb/op retrans avg RTT (ms) avg exe (ms)
> 105.200 189.155 1.798 0 (0.0%) 8.264 8.702
> write: ops/s Kb/s Kb/op retrans avg RTT (ms) avg exe (ms)
> 105.067 297.774 2.834 0 (0.0%) 1.048 1.310

AFAICS the RTT (i.e. round trip time) is being reported as being of the
order of milliseconds in both cases, so the filer would not appear to be
responsible. The discrepancy has to be on the client side.

It would be interesting to see the 'rpc bklog' parameter in order to
confirm if the requests are being held up in the backlog queue.
Unfortunately, there appears to be a bug in iostat-ms: it appears to be
reading the 'number of requests in flight' where it should be reading
the 'length of backlog queue' parameter.
See if the attached patch doesn't help... (it also fixes a minor issue
of storing unsigned long long counters in the wrong type)

Cheers
Trond


Attachments:
iostat-ms.dif (1.13 kB)