2008-01-18 17:37:40

by Chuck Lever

[permalink] [raw]
Subject: Re: Understanding iostat-ms output

On Jan 18, 2008, at 12:02 PM, Trond Myklebust wrote:
> On Fri, 2008-01-18 at 10:37 +0100, Fredrik Lindgren wrote:
>> Hello,
>>
>> I have been messing around with the excellent "iostat-ms" tool in
>> an effort to understand what is going on with our machines that are
>> showing lots of IOwait.
>>
>> Below are two 15 second samples, the first one is taken during peak
>> and the second one during the night. The obvious difference is in the
>> "exe" field. Please excuse my ignorance, but what does this field
>> mean?
>> Where is this additional time spent? On the filer or in the NFS stack
>> on the host?
>>
>> stbnas1:/foo mounted on /misc/foo:
>>
>> op/s rpc bklog
>> 2364.60 0.96
>> read: ops/s Kb/s Kb/op
>> retrans avg RTT (ms) avg exe (ms)
>> 43.267 659.624 15.246 0
>> (0.0%) 17.066 293.880
>> write: ops/s Kb/s Kb/op
>> retrans avg RTT (ms) avg exe (ms)
>> 16.133 60.314 3.738 0
>> (0.0%) 3.868 255.074
>>
>> stbnas1:/foo mounted on /misc/foo:
>>
>> op/s rpc bklog
>> 1446.07 0.26
>> read: ops/s Kb/s Kb/op
>> retrans avg RTT (ms) avg exe (ms)
>> 105.200 189.155 1.798 0
>> (0.0%) 8.264 8.702
>> write: ops/s Kb/s Kb/op
>> retrans avg RTT (ms) avg exe (ms)
>> 105.067 297.774 2.834 0
>> (0.0%) 1.048 1.310
>
> AFAICS the RTT (i.e. round trip time) is being reported as being of
> the
> order of milliseconds in both cases, so the filer would not appear
> to be
> responsible. The discrepancy has to be on the client side.
>
> It would be interesting to see the 'rpc bklog' parameter in order to
> confirm if the requests are being held up in the backlog queue.
> Unfortunately, there appears to be a bug in iostat-ms: it appears
> to be
> reading the 'number of requests in flight' where it should be reading
> the 'length of backlog queue' parameter.
> See if the attached patch doesn't help... (it also fixes a minor issue
> of storing unsigned long long counters in the wrong type)


I've applied Trond's patch, and patches from NetApp to support RDMA
statistics to these two scripts, and posted fresh versions here:

http://oss.oracle.com/~cel/linux-2.6/2.6.24/

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2008-01-18 18:06:41

by Trond Myklebust

[permalink] [raw]
Subject: Re: Understanding iostat-ms output


On Fri, 2008-01-18 at 12:37 -0500, Chuck Lever wrote:

> I've applied Trond's patch, and patches from NetApp to support RDMA
> statistics to these two scripts, and posted fresh versions here:
>
> http://oss.oracle.com/~cel/linux-2.6/2.6.24/

Any chance you could put this in a git repository on git.linux-nfs.org
so that we can track changes?

Trond