2002-03-20 18:49:58

by Jeff V. Merkey

Subject: Putrid Elevator Behavior 2.4.18/19




Jens/Linux,

The elevator code is malfunctioning in 2.4.18/19-pre when we start
reaching the upward limits with multiple 3Ware adapters
running together. We started seeing the problem when we went to
64 K aligned writes with sustained > 200 MB/S writes to
multiple 3Ware adapters.

We have verified that the 3Ware adapters are not holding the request
off, but that one of the requests is getting severely starved and
does not get posted to the 3Ware adapters until thousands of IOs
have gone before it.

The basic symptom is that a lower-offset 4K write gets hung in the elevator
as it traverses a very long list of requests being written linearly
to a disk device. Both Darren and I have seen this problem in NetWare
with remirroring, which is why we went to the A/B alternating
list to prevent this type of starvation. A very small number
of reads are posted during this test to update metadata.
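
For reference, here is a minimal, self-contained sketch of the A/B
(alternating) list idea mentioned above. The names, the structure, and
the userspace framing are all illustrative; this is not the NetWare
code or the patch being prepared. New requests go only onto the staging
list while the device drains the active list, and the lists swap when
the active one empties, so any request is bypassed by at most one batch.

/* Hypothetical sketch of an A/B (double-buffered) request list.
 * New requests are appended only to the staging ("B") list while the
 * device drains the active ("A") list; when A is empty the two lists
 * swap.  A request can therefore be bypassed by at most one full
 * batch, which bounds the kind of starvation described in this report.
 * Illustrative userspace C, not the NetWare or kernel code.
 */
#include <stdio.h>
#include <stddef.h>

struct request {
    unsigned long sector;
    struct request *next;
};

struct req_list {
    struct request *head;
    struct request *tail;
};

struct ab_queue {
    struct req_list active;   /* "A": being drained by the device */
    struct req_list staging;  /* "B": accepting new requests      */
};

static void list_append(struct req_list *l, struct request *rq)
{
    rq->next = NULL;
    if (l->tail)
        l->tail->next = rq;
    else
        l->head = rq;
    l->tail = rq;
}

/* All new I/O lands on the staging list, never on the active one. */
static void ab_submit(struct ab_queue *q, struct request *rq)
{
    list_append(&q->staging, rq);
}

/* Next request for the device.  When the active list is empty the
 * staging list is promoted wholesale -- the "swap". */
static struct request *ab_next(struct ab_queue *q)
{
    if (!q->active.head) {
        q->active = q->staging;
        q->staging.head = q->staging.tail = NULL;
    }
    struct request *rq = q->active.head;
    if (!rq)
        return NULL;
    q->active.head = rq->next;
    if (!q->active.head)
        q->active.tail = NULL;
    return rq;
}

int main(void)
{
    struct ab_queue q = { { NULL, NULL }, { NULL, NULL } };
    struct request bulk = { 1000000, NULL };  /* part of a long linear run */
    struct request meta = { 8, NULL };        /* low-offset metadata write */
    struct request *rq;

    ab_submit(&q, &bulk);
    ab_submit(&q, &meta);
    while ((rq = ab_next(&q)))
        printf("dispatch sector %lu\n", rq->sector);
    return 0;
}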

The data being held off is metadata that occupies a lower sector
offset on the device. This starvation is very troublesome and
results in certain sectors not being freed up as anticipated, which
is a fatal error for our system. The elevator ends up getting
very far behind.

By way of example, this delayed write is held off for several **MINUTES**.
This is severely broken.

Please let me know what other information you would like Darren and me
to run down and provide. We are at present coding some changes into
your elevator to implement an A and B list so that this starvation problem
is completely avoided.

Please advise,

Jeff





2002-03-20 22:01:56

by Mike Fedyk

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Wed, Mar 20, 2002 at 12:04:55PM -0700, Jeff V. Merkey wrote:
>
>
>
> Jens/Linux,
>
> The elevator code is malfunctioning in 2.4.18/19-pre when we start
> reaching the upward limits with multiple 3Ware adapters
> running together. We started seeing the problem when we went to
> 64 K aligned writes with sustained > 200 MB/S writes to
> multiple 3Ware adapters.
>
> We have verified that the 3Ware adapters are not holding the request
> off, but that one of the requests is getting severely starved and
> does not get posted to the 3Ware adapters until thousands of IOs
> have gone before it.
>

This elevator starvation problem has been identified and a patch already
merged into 2.4.19-pre2.

Can you verify the effects it produces for your workload?

2002-03-20 22:05:16

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Wed, Mar 20, 2002 at 02:02:41PM -0800, Mike Fedyk wrote:
> On Wed, Mar 20, 2002 at 12:04:55PM -0700, Jeff V. Merkey wrote:
> >
> >
> >
> > Jens/Linux,
> >
> > The elevator code is malfunctioning in 2.4.18/19-pre when we start
> > reaching the upward limits with multiple 3Ware adapters
> > running together. We started seeing the problem when we went to
> > 64 K aligned writes with sustained > 200 MB/S writes to
> > multiple 3Ware adapters.
> >
> > We have verified that the 3Ware adapters are not holding the request
> > off, but that one of the requests is getting severely starved and
> > does not get posted to the 3Ware adapters until thousands of IOs
> > have gone before it.
> >
>
> This elevator starvation problem has been identified and a patch already
> merged into 2.4.19-pre2.
>
> Can you verify the affects it produces for your workload?

I will comply. I tested with pre-3 patches and still saw this problem??
Let me go and check the patches I applied to verify, I may not have
applied the correct patch.

Jeff





2002-03-20 22:10:06

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

> > > Jens/Linux,
> > >
> > > The elevator code is malfunctioning in 2.4.18/19-pre when we start
> > > reaching the upward limits with multiple 3Ware adapters
> > > running together. We started seeing the problem when we went to
> > > 64 K aligned writes with sustained > 200 MB/S writes to
> > > multiple 3Ware adapters.
> > >
> > > We have verified that the 3Ware adapters are not holding the request
> > > off, but that one of the requests is getting severely starved and
> > > does not get posted to the 3Ware adapters until thousands of IOs
> > > have gone before it.
> > >
> >
> > This elevator starvation problem has been identified and a patch already
> > merged into 2.4.19-pre2.
> >
> > Can you verify the affects it produces for your workload?
>
> I will comply. I tested with pre-3 patches and still saw this problem??
> Let me go and check the patches I applied to verify, I may not have
> applied the correct patch.
>
> Jeff

I verified we were using a stock 2.4.18 kernel on the specific system
without the pre-3 patches installed. We have been testing with the
latest patches but not on this system. We will apply and retest and
I will verify.

Jeff


2002-03-21 01:24:47

by Andrew Morton

Subject: Re: Putrid Elevator Behavior 2.4.18/19

"Jeff V. Merkey" wrote:
>
> ...
> > I will comply. I tested with pre-3 patches and still saw this problem??
> > Let me go and check the patches I applied to verify, I may not have
> > applied the correct patch.
> >
> > Jeff
>
> I verified we were using a stock 2.4.18 kernel on the specific system
> without the pre-3 patches installed. We have been testing with the
> latest patches but not on this system. We will apply and retest and
> I will verify.
>

The elevator starvation change went into 2.4.19-pre1 I think.
It shouldn't affect the problem which you've described - that
change improved the situation where tasks were sleeping for
long periods when they want to insert new requests. But the
problem which you're observing appears to affect already-inserted
requests.

"Several minutes" is downright odd. From your description
it seems that all the requests are writes, but some of the
writes (at a remote end of the disk) are being bypassed far
too many times.

The bypass count _is_ tunable. Although it sounds like the logic
has come unstuck in some manner, it would be interesting if
changing the elevator latency parameters for that queue affected
the situation.

Have you experimented with `elvtune -r NNN /dev/foo' and
`elvtune -w NNN /dev/foo'?
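
To make the tunable concrete, here is a simplified, self-contained
sketch of a bypass-count scheme of the kind described above. Each
queued request carries a counter seeded from a per-queue read or write
latency (the values that `elvtune -r' and `elvtune -w' adjust); every
newer request inserted ahead of it costs one credit, and a request with
no credits left may no longer be passed. The data structures and names
are illustrative only, not the actual 2.4 elevator code.

#include <stdio.h>
#include <stddef.h>

struct request {
    unsigned long sector;
    int bypass_left;          /* how many times this may still be passed */
    struct request *next;
};

/* Insert in ascending sector order, but never ahead of a request whose
 * bypass budget is exhausted.  `initial_latency' plays the role of the
 * per-queue read/write latency that elvtune adjusts. */
static void elevator_insert(struct request **queue, struct request *rq,
                            int initial_latency)
{
    struct request **p = queue;
    struct request *r;

    rq->bypass_left = initial_latency;
    while (*p) {
        if ((*p)->bypass_left <= 0) {
            /* Passed over too often already: we may not jump it. */
            p = &(*p)->next;
            continue;
        }
        if (rq->sector < (*p)->sector)
            break;            /* found the sorted position */
        p = &(*p)->next;
    }
    /* Every request we are inserting ahead of loses one credit. */
    for (r = *p; r; r = r->next)
        r->bypass_left--;
    rq->next = *p;
    *p = rq;
}

int main(void)
{
    struct request *queue = NULL;
    struct request bulk1 = { 900000, 0, NULL };
    struct request bulk2 = { 910000, 0, NULL };
    struct request meta  = { 4096, 0, NULL };   /* low-offset metadata write */
    struct request *r;

    elevator_insert(&queue, &bulk1, 2);
    elevator_insert(&queue, &bulk2, 2);
    elevator_insert(&queue, &meta, 2);  /* jumps ahead, costing each a credit */

    for (r = queue; r; r = r->next)
        printf("sector %7lu  credits left %d\n", r->sector, r->bypass_left);
    return 0;
}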

-

2002-03-21 06:30:56

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Wed, Mar 20, 2002 at 05:22:18PM -0800, Andrew Morton wrote:
> "Jeff V. Merkey" wrote:
> >
> > ...
> > > I will comply. I tested with pre-3 patches and still saw this problem??
> > > Let me go and check the patches I applied to verify, I may not have
> > > applied the correct patch.
> > >
> > > Jeff
> >
> > I verified we were using a stock 2.4.18 kernel on the specific system
> > without the pre-3 patches installed. We have been testing with the
> > latest patches but not on this system. We will apply and retest and
> > I will verify.
> >
>
> The elevator starvation change went into 2.4.19-pre1 I think.
> It shouldn't affect the problem which you've described - that
> change improved the situation where tasks were sleeping for
> long periods when they want to insert new requests. But the
> problem which you're observing appears to affect already-inserted
> requests.
>
> "Several minutes" is downright odd. From your description
> it seems that all the requests are writes, but some of the
> writes (at a remote end of the disk) are being bypassed far
> too many times.
>
> The bypass count _is_ tunable. Although it sounds like the logic
> has come unstuck in some manner, it would be interesting if
> changing the elevator latency parameters for that queue affected
> the situation.
>
> Have you experimented with `elvtune -r NNN /dev/foo' and
> `elvtune -w NNN /dev/foo'?

No, but I will test this tonight. I am in tonight working on
this problem until I run it down.

Jeff



2002-03-26 01:01:00

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

> > The elevator starvation change went into 2.4.19-pre1 I think.
> > It shouldn't affect the problem which you've described - that
> > change improved the situation where tasks were sleeping for
> > long periods when they want to insert new requests. But the
> > problem which you're observing appears to affect already-inserted
> > requests.
> >
> > "Several minutes" is downright odd. From your description
> > it seems that all the requests are writes, but some of the
> > writes (at a remote end of the disk) are being bypassed far
> > too many times.
> >
> > The bypass count _is_ tunable. Although it sounds like the logic
> > has come unstuck in some manner, it would be interesting if
> > changing the elevator latency parameters for that queue affected
> > the situation.
> >
> > Have you experimented with `elvtune -r NNN /dev/foo' and
> > `elvtune -w NNN /dev/foo'?
>
> No, but I will test this tonight. I am in tonight working on
> this problem until I run it down.
>
> Jeff
>


Andrew,

I have been running a test run against 2.4.19-pre4 (and later) for
over a week non-stop and the elevator problem appears to have been
corrected by this fix. I will update further if the problem
resurfaces.

:-)

Jeff


2002-03-26 01:41:21

by Mike Fedyk

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Mon, Mar 25, 2002 at 06:16:45PM -0700, Jeff V. Merkey wrote:
> > > The elevator starvation change went into 2.4.19-pre1 I think.
> > > It shouldn't affect the problem which you've described - that
> > > change improved the situation where tasks were sleeping for
> > > long periods when they want to insert new requests. But the
> > > problem which you're observing appears to affect already-inserted
> > > requests.
> > >
> > > "Several minutes" is downright odd. From your description
> > > it seems that all the requests are writes, but some of the
> > > writes (at a remote end of the disk) are being bypassed far
> > > too many times.
> > >
> > > The bypass count _is_ tunable. Although it sounds like the logic
> > > has come unstuck in some manner, it would be interesting if
> > > changing the elevator latency parameters for that queue affected
> > > the situation.
> > >
> > > Have you experimented with `elvtune -r NNN /dev/foo' and
> > > `elvtune -w NNN /dev/foo'?
> >
> > No, but I will test this tonight. I am in tonight working on
> > this problem until I run it down.
> >
> > Jeff
> >
>
>
> Andrew,
>
> I have been running a test run against 2.4.19-pre4 (and later) for
> over a week non-stop and the elevator problem appears to have been
> corrected by this fix. I will update further if the problem
> resurfaces.
>

That's good news.

Are you still working on the A/B list patch? I'd imagine that it could make
several problems easier to fix in the block layer.

> :-)
>

:)

2002-03-26 01:46:21

by David Rees

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Mon, Mar 25, 2002 at 06:16:45PM -0700, Jeff V. Merkey wrote:
> > > The elevator starvation change went into 2.4.19-pre1 I think.
> > > It shouldn't affect the problem which you've described - that
> > > change improved the situation where tasks were sleeping for
> > > long periods when they want to insert new requests. But the
> > > problem which you're observing appears to affect already-inserted
> > > requests.
> > >
> > > "Several minutes" is downright odd. From your description
> > > it seems that all the requests are writes, but some of the
> > > writes (at a remote end of the disk) are being bypassed far
> > > too many times.
> > >
> > > The bypass count _is_ tunable. Although it sounds like the logic
> > > has come unstuck in some manner, it would be interesting if
> > > changing the elevator latency parameters for that queue affected
> > > the situation.
> > >
> > > Have you experimented with `elvtune -r NNN /dev/foo' and
> > > `elvtune -w NNN /dev/foo'?
> >
> > No, but I will test this tonight. I am in tonight working on
> > this problem until I run it down.
>
> I have been running a test run against 2.4.19-pre4 (and later) for
> over a week non-stop and the elevator problem appears to have been
> corrected by this fix. I will update further if the problem
> resurfaces.

Jeff,

Did upgrading to 2.4.19-pre4 by itself fix your problems, or did you need to
tweak with elvtune as well? If so, what values did you find produced
optimal results?

-Dave

2002-03-26 01:56:23

by Mike Fedyk

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Mon, Mar 25, 2002 at 05:45:55PM -0800, David Rees wrote:
> On Mon, Mar 25, 2002 at 06:16:45PM -0700, Jeff V. Merkey wrote:
> > > > The elevator starvation change went into 2.4.19-pre1 I think.
> > > > It shouldn't affect the problem which you've described - that
> > > > change improved the situation where tasks were sleeping for
> > > > long periods when they want to insert new requests. But the
> > > > problem which you're observing appears to affect already-inserted
> > > > requests.
> > > >
> > > > "Several minutes" is downright odd. From your description
> > > > it seems that all the requests are writes, but some of the
> > > > writes (at a remote end of the disk) are being bypassed far
> > > > too many times.
> > > >
> > > > The bypass count _is_ tunable. Although it sounds like the logic
> > > > has come unstuck in some manner, it would be interesting if
> > > > changing the elevator latency parameters for that queue affected
> > > > the situation.
> > > >
> > > > Have you experimented with `elvtune -r NNN /dev/foo' and
> > > > `elvtune -w NNN /dev/foo'?
> > >
> > > No, but I will test this tonight. I am in tonight working on
> > > this problem until I run it down.
> >
> > I have been running a test run against 2.4.19-pre4 (and later) for
> > over a week non-stop and the elevator problem appears to have been
> > corrected by this fix. I will update further if the problem
> > resurfaces.
>
> Jeff,
>
> Did upgrading to 2.4.19-pre4 by itself fix your problems, or did you need to
> tweak with elvtune as well? If so, what values did you find produced
> optimal results?
>

I'd doubt that Jeff's optimal (magic) elvtune numbers would be much use to
other people, as elvtune should be set for each particular workload.

Now, if we had a small guide that said "these value ranges/combinations have
worked best for $this workload" that would be quite helpful...

2002-03-26 16:44:38

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

>
> Did upgrading to 2.4.19-pre4 by itself fix your problems, or did you need to
> tweak with elvtune as well? If so, what values did you find produced
> optimal results?
>
> -Dave


The defaults with the updated patch seem to work fine.

Jeff



2002-03-26 16:47:28

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

>
> That's good news.
>
> Are you still working on the A/B list patch? I'd imagine that it could make
> several problems easier to fix in the block layer.
>

Yes. I am asking Darren Major, who wrote the A/B implementation
in NetWare, to review the patch before we submit it. It may affect
some drivers. We are verifying that the change I instrumented
will not break anything.

Jeff


> > :-)
> >
>
> :)

2002-03-27 07:05:12

by Jens Axboe

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Tue, Mar 26 2002, Jeff V. Merkey wrote:
> >
> > That's good news.
> >
> > Are you still working on the A/B list patch? I'd imagine that it could make
> > several problems easier to fix in the block layer.
> >
>
> Yes. I am asking Darren Major, who wrote the A/B implementation
> in NetWare to review the patch before we submit it. It may affect
> some drivers. We are verifying that the change I instrumented
> will not break anything.

I'm curious how you are doing this cleanly in 2.4. There are lots of
places in the kernel that do direct list management on the queue_head.
Are you adding two separate hidden lists and splicing content to the
queue_head?

2.5 has this done much more cleanly (of course I'm very biased). See the
deadline I/O scheduler patch I've posted before, stuff like this can be
done a lot cleaner there. Internal I/O scheduler structures are
completely hidden from drivers.

--
Jens Axboe

2002-03-27 23:04:06

by Jeff V. Merkey

Subject: Re: Putrid Elevator Behavior 2.4.18/19

On Wed, Mar 27, 2002 at 08:03:25AM +0100, Jens Axboe wrote:
> On Tue, Mar 26 2002, Jeff V. Merkey wrote:
> > >
> > > That's good news.
> > >
> > > Are you still working on the A/B list patch? I'd imagine that it could make
> > > several problems easier to fix in the block layer.
> > >
> >
> > Yes. I am asking Darren Major, who wrote the A/B implementation
> > in NetWare to review the patch before we submit it. It may affect
> > some drivers. We are verifying that the change I instrumented
> > will not break anything.
>
> I'm curious how you are doing this cleanly in 2.4. There are lots of
> places in the kernel that do direct list management on the queue_head.
> Are you adding two separate hidden lists and splicing content to the
> queue_head?

Correct. I am still reviewing drivers and kernel code to make sure
I am not leaving any holes. I have spliced it in as a non-intrusive
implementation that preserves the existing code with no changes.

Some of the drivers may have problems if they cache the head address
of the current list.
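
A rough, self-contained userspace model of the arrangement described
here: the elevator keeps two hidden lists and, when the driver-visible
queue runs dry, splices the waiting batch onto the queue head that
drivers already walk, so existing driver code is untouched. The list
primitives mimic the kernel's <linux/list.h> but are defined locally so
the sketch stands alone; all names are illustrative, not the actual patch.

#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Move every entry on `from' to the tail of `to', leaving `from' empty. */
static void list_splice_tail_init(struct list_head *from, struct list_head *to)
{
    if (from->next == from)
        return;                          /* nothing to splice */
    from->next->prev = to->prev;
    to->prev->next = from->next;
    from->prev->next = to;
    to->prev = from->prev;
    INIT_LIST_HEAD(from);
}

struct ab_elevator {
    struct list_head queue_head;         /* what the driver walks today        */
    struct list_head list_a;             /* hidden: batch waiting to be served */
    struct list_head list_b;             /* hidden: batch accepting new I/O    */
};

/* New requests only ever land on the hidden staging list. */
static void ab_add_request(struct ab_elevator *e, struct list_head *rq)
{
    list_add_tail(rq, &e->list_b);
}

/* When the driver-visible queue runs dry, promote the staging list if
 * needed and splice the waiting batch onto the queue head.  The entries
 * are rethreaded onto e->queue_head here: a driver that cached the
 * address of the old first entry across this point (the concern above)
 * would be walking a stale list. */
static void ab_refill(struct ab_elevator *e)
{
    if (e->queue_head.next != &e->queue_head)
        return;                          /* driver still has work queued */
    if (e->list_a.next == &e->list_a)
        list_splice_tail_init(&e->list_b, &e->list_a);
    list_splice_tail_init(&e->list_a, &e->queue_head);
}

int main(void)
{
    struct ab_elevator e;
    struct list_head rq1, rq2;           /* stand-ins for real requests */
    struct list_head *p;
    int n = 0;

    INIT_LIST_HEAD(&e.queue_head);
    INIT_LIST_HEAD(&e.list_a);
    INIT_LIST_HEAD(&e.list_b);

    ab_add_request(&e, &rq1);
    ab_add_request(&e, &rq2);
    ab_refill(&e);

    for (p = e.queue_head.next; p != &e.queue_head; p = p->next)
        n++;
    printf("driver-visible queue now holds %d request(s)\n", n);
    return 0;
}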

>
> 2.5 has this done much more cleanly (of course I'm very biased). See the
> deadline I/O scheduler patch I've posted before, stuff like this can be
> done a lot cleaner there. Internal I/O scheduler structures are
> completely hidden from drivers.
>

2.5 would be nice, but 2.4.X needs it too, and this is the kernel we are
using for our development and testing, so we will need it there.


Jeff