2022-11-18 13:03:39

by Jinlong Chen

[permalink] [raw]
Subject: [RFC PATCH 0/2] elevator: restore old io scheduler on failure in elevator_switch

Hi!

These two patches bring back the fallback feature in elevator_switch if
switching to the new io scheduler failed.

elevator_switch contains the fallback logic in sq era, but it was removed
when moving to mq (commit: a1ce35fa49852db60fc6e268038530be533c5b15),
leaving the document mismatched with the behavior. As far as I can see,
restoring the old io scheduler is more reasonable than just leaving the
scheduler none, hence there is the series.

However, now it's hard to keep the old io scheduler untouched. We can only
re-initialize the old scheduler if we want to restore it, and the
statistics the old scheduler collected would be lost. Besides, the
restoration itself might fail too. I have no idea whether the two problems
matter. Any comments are welcomed.

Jinlong Chen (2):
elevator: add a helper for applying scheduler to request_queue
elevator: restore the old io scheduler if failed to switch to the new
one

block/elevator.c | 49 +++++++++++++++++++++++++++++++++++++++---------
1 file changed, 40 insertions(+), 9 deletions(-)

--
2.31.1



2022-11-18 13:04:57

by Jinlong Chen

[permalink] [raw]
Subject: [RFC PATCH 2/2] elevator: restore the old io scheduler if failed to switch to the new one

If we failed to switch to the new io scheduler, we should try to restore
the old one instead of just switching to none.

This also makes elevator_switch match its document.

Signed-off-by: Jinlong Chen <[email protected]>
---
block/elevator.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 517857a9a68f..b7bd0b8468bd 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -672,6 +672,7 @@ static int __elevator_apply(struct request_queue *q, struct elevator_type *e)
*/
int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
{
+ struct elevator_type *old_e = NULL;
int ret;

lockdep_assert_held(&q->sysfs_lock);
@@ -680,17 +681,37 @@ int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
blk_mq_quiesce_queue(q);

if (q->elevator) {
+ old_e = q->elevator->type;
+ /*
+ * Keep a reference so we can fallback on failure.
+ */
+ __elevator_get(old_e);
elv_unregister_queue(q);
elevator_exit(q);
}

ret = __elevator_apply(q, new_e);
- if (ret)
- goto out_unfreeze;
+ if (likely(!ret)) {
+ blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name);
+ } else if (old_e) {
+ int err;
+
+ err = __elevator_apply(q, old_e);
+ if (unlikely(err)) {
+ blk_add_trace_msg(q,
+ "elv switch failed: %s (%d), fallback failed: %s (%d)",
+ new_e->elevator_name, ret, old_e->elevator_name, err
+ );
+ }
+ }

- blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name);
+ if (old_e) {
+ /*
+ * Done, release the reference we kept.
+ */
+ elevator_put(old_e);
+ }

-out_unfreeze:
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
return ret;
--
2.31.1


2022-11-21 07:28:32

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] elevator: restore old io scheduler on failure in elevator_switch

On Fri, Nov 18, 2022 at 08:09:52PM +0800, Jinlong Chen wrote:
> elevator_switch contains the fallback logic in sq era, but it was removed
> when moving to mq (commit: a1ce35fa49852db60fc6e268038530be533c5b15),
> leaving the document mismatched with the behavior. As far as I can see,
> restoring the old io scheduler is more reasonable than just leaving the
> scheduler none, hence there is the series.

What failure scenariou can you think off where switching to the intended
schedule fails, but switching back to the previous one will succeed?

2022-11-22 12:41:49

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] elevator: restore old io scheduler on failure in elevator_switch

On Tue, Nov 22, 2022 at 08:14:30PM +0800, Jinlong Chen wrote:
> Mostly failures specific to the intended io scheduler, like consuming more
> resources than the old one that the system can not afford. But sure it's
> rare, so do you think I should just correct the outdated document?

I'd be tempted to just documented the behavior, because I think the
chances are high that if switching to one schedule will fail that
switching back to the old one will fail as well. I've done a quick
audit of all three schedulers, and unless I missed something there
are no other failure cases except for running out of memory.

Maybe a printk to document that switching the scheduler failed are
we aren't using any scheduler now might be useful, though.

2022-11-22 12:43:22

by Jinlong Chen

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] elevator: restore old io scheduler on failure in elevator_switch

> On Fri, Nov 18, 2022 at 08:09:52PM +0800, Jinlong Chen wrote:
> > elevator_switch contains the fallback logic in sq era, but it was removed
> > when moving to mq (commit: a1ce35fa49852db60fc6e268038530be533c5b15),
> > leaving the document mismatched with the behavior. As far as I can see,
> > restoring the old io scheduler is more reasonable than just leaving the
> > scheduler none, hence there is the series.
>
> What failure scenariou can you think off where switching to the intended
> schedule fails, but switching back to the previous one will succeed?

Mostly failures specific to the intended io scheduler, like consuming more
resources than the old one that the system can not afford. But sure it's
rare, so do you think I should just correct the outdated document?

Thanks!
Jinlong Chen

2022-11-22 13:26:10

by Jinlong Chen

[permalink] [raw]
Subject: Re: Re: [RFC PATCH 0/2] elevator: restore old io scheduler on failure in elevator_switch

> On Tue, Nov 22, 2022 at 08:14:30PM +0800, Jinlong Chen wrote:
> > Mostly failures specific to the intended io scheduler, like consuming more
> > resources than the old one that the system can not afford. But sure it's
> > rare, so do you think I should just correct the outdated document?
>
> I'd be tempted to just documented the behavior, because I think the
> chances are high that if switching to one schedule will fail that
> switching back to the old one will fail as well. I've done a quick
> audit of all three schedulers, and unless I missed something there
> are no other failure cases except for running out of memory.
>
> Maybe a printk to document that switching the scheduler failed are
> we aren't using any scheduler now might be useful, though.

Ok, then I'll send two patches with the document updated and the printk added.

Thanks!
Jinlong Chen