2019-08-09 08:42:00

by Pavel Tikhomirov

[permalink] [raw]
Subject: mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE

Hi, all.

Then porting patches from mainstream I've found some strange code:

> commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
> Author: Christoph Hellwig <[email protected]>
> Date: Tue Nov 1 07:40:09 2016 -0600
>
> block: replace REQ_NOIDLE with REQ_IDLE
>
> Noidle should be the default for writes as seen by all the compounds
> definitions in fs.h using it. In fact only direct I/O really should
> be using NODILE, so turn the whole flag around to get the defaults
> right, which will make our life much easier especially onces the
> WRITE_* defines go away.
>
> This assumes all the existing "raw" users of REQ_SYNC for writes
> want noidle behavior, which seems to be spot on from a quick audit.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ccedccb28ec8..46a74209917f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb,
loff_t offset,
> #define WRITE REQ_OP_WRITE
>
> #define READ_SYNC 0
> -#define WRITE_SYNC (REQ_SYNC | REQ_NOIDLE)
> -#define WRITE_ODIRECT REQ_SYNC
> -#define WRITE_FLUSH (REQ_NOIDLE | REQ_PREFLUSH)
> -#define WRITE_FUA (REQ_NOIDLE | REQ_FUA)
> -#define WRITE_FLUSH_FUA (REQ_NOIDLE | REQ_PREFLUSH |
REQ_FUA)
> +#define WRITE_SYNC REQ_SYNC
> +#define WRITE_ODIRECT (REQ_SYNC | REQ_IDLE)
> +#define WRITE_FLUSH REQ_PREFLUSH
> +#define WRITE_FUA REQ_FUA
> +#define WRITE_FLUSH_FUA (REQ_PREFLUSH | REQ_FUA)
>
> /*
> * Attribute flags. These should be or-ed together to figure out what

The above commit changes the meaning of the REQ_SYNC flag, before the
patch it was equal to WRITE_ODIRECT and after the patch it is equal to
WRITE_SYNC. And thus I think it became treated differently (I see only
one place left in wbt_should_throttle.).

But in __swap_writepage() both before and after the mentioned patch we
still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:

> [snorch@snorch linux]$ git blame
a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 319)
unlock_page(page);
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 320)
ret = -ENOMEM;
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 321)
goto out;
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 322) }
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 323)
if (wbc->sync_mode == WB_SYNC_ALL)
> ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 324)
bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
> ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 325)
else
> ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 326)
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> f8891e5e1f93a (Christoph Lameter 2006-06-30 01:55:45 -0700 327)
count_vm_event(PSWPOUT);
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 328)
set_page_writeback(page);
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 329)
unlock_page(page);
> [snorch@snorch linux]$ git blame
a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 319)
unlock_page(page);
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 320)
ret = -ENOMEM;
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 321)
goto out;
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 322) }
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 323)
if (wbc->sync_mode == WB_SYNC_ALL)
> ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 324)
bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
> ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 325)
else
> ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 326)
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> f8891e5e1f93a (Christoph Lameter 2006-06-30 01:55:45 -0700 327)
count_vm_event(PSWPOUT);
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 328)
set_page_writeback(page);
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 329)
unlock_page(page);

It looks like we've changed the way how we handle swap page writes from
"odirect" way to "regular" sync write way, these can be wrong. This may
also affect deprecated cfq io-scheduler on older kernels.

Thanks in advance for any advice on what to do with these, may be I miss
something.

--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.


2019-11-11 14:53:14

by Pavel Tikhomirov

[permalink] [raw]
Subject: Re: mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE

Christoph, a2b809672ee6 is your patch, maybe you have some ideas about
these problem?

ping

On 8/9/19 11:39 AM, Pavel Tikhomirov wrote:
> Hi, all.
>
> Then porting patches from mainstream I've found some strange code:
>
> > commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
> > Author: Christoph Hellwig <[email protected]>
> > Date: Tue Nov 1 07:40:09 2016 -0600
> >
> > block: replace REQ_NOIDLE with REQ_IDLE
> >
> > Noidle should be the default for writes as seen by all the compounds
> > definitions in fs.h using it. In fact only direct I/O really should
> > be using NODILE, so turn the whole flag around to get the defaults
> > right, which will make our life much easier especially onces the
> > WRITE_* defines go away.
> >
> > This assumes all the existing "raw" users of REQ_SYNC for writes
> > want noidle behavior, which seems to be spot on from a quick audit.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
> > Signed-off-by: Jens Axboe <[email protected]>
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index ccedccb28ec8..46a74209917f 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb,
> loff_t offset,
> > #define WRITE REQ_OP_WRITE
> >
> > #define READ_SYNC 0
> > -#define WRITE_SYNC (REQ_SYNC | REQ_NOIDLE)
> > -#define WRITE_ODIRECT REQ_SYNC
> > -#define WRITE_FLUSH (REQ_NOIDLE | REQ_PREFLUSH)
> > -#define WRITE_FUA (REQ_NOIDLE | REQ_FUA)
> > -#define WRITE_FLUSH_FUA (REQ_NOIDLE | REQ_PREFLUSH |
> REQ_FUA)
> > +#define WRITE_SYNC REQ_SYNC
> > +#define WRITE_ODIRECT (REQ_SYNC | REQ_IDLE)
> > +#define WRITE_FLUSH REQ_PREFLUSH
> > +#define WRITE_FUA REQ_FUA
> > +#define WRITE_FLUSH_FUA (REQ_PREFLUSH | REQ_FUA)
> >
> > /*
> > * Attribute flags. These should be or-ed together to figure out what
>
> The above commit changes the meaning of the REQ_SYNC flag, before the
> patch it was equal to WRITE_ODIRECT and after the patch it is equal to
> WRITE_SYNC. And thus I think it became treated differently (I see only
> one place left in wbt_should_throttle.).
>
> But in __swap_writepage() both before and after the mentioned patch we
> still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:
>
> > [snorch@snorch linux]$ git blame
> a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 319)
> unlock_page(page);
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 320)
> ret = -ENOMEM;
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 321)
> goto out;
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 322) }
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 323)
> if (wbc->sync_mode == WB_SYNC_ALL)
> > ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 324)
> bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
> > ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 325)
> else
> > ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 326)
> bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> > f8891e5e1f93a (Christoph Lameter 2006-06-30 01:55:45 -0700 327)
> count_vm_event(PSWPOUT);
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 328)
> set_page_writeback(page);
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 329)
> unlock_page(page);
> > [snorch@snorch linux]$ git blame
> a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 319)
> unlock_page(page);
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 320)
> ret = -ENOMEM;
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 321)
> goto out;
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 322) }
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 323)
> if (wbc->sync_mode == WB_SYNC_ALL)
> > ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 324)
> bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
> > ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 325)
> else
> > ba13e83ec334c (Jens Axboe 2016-08-01 09:38:44 -0600 326)
> bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> > f8891e5e1f93a (Christoph Lameter 2006-06-30 01:55:45 -0700 327)
> count_vm_event(PSWPOUT);
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 328)
> set_page_writeback(page);
> > ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 329)
> unlock_page(page);
>
> It looks like we've changed the way how we handle swap page writes from
> "odirect" way to "regular" sync write way, these can be wrong. This may
> also affect deprecated cfq io-scheduler on older kernels.
>
> Thanks in advance for any advice on what to do with these, may be I miss
> something.
>

--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.