2017-03-09 08:00:52

by Vlastimil Babka

[permalink] [raw]
Subject: Re: + compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch added to -mm tree

On 03/09/2017 12:55 AM, [email protected] wrote:
>
> The patch titled
> Subject: compaction: add def_blk_aops migrate function for memory compaction
> has been added to the -mm tree. Its filename is
> compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>
> This patch should soon appear at
> http://ozlabs.org/~akpm/mmots/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> and later at
> http://ozlabs.org/~akpm/mmotm/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>
> Before you just go and hit "reply", please:
> a) Consider who else should be cc'ed
> b) Prefer to cc a suitable mailing list as well
> c) Ideally: find the original patch on the mailing list and do a
> reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
>
> ------------------------------------------------------
> From: zhouxianrong <[email protected]>
> Subject: compaction: add def_blk_aops migrate function for memory compaction

That's not really a mm/compaction patch, but a block layer/migration patch. I
don't know internals of those so well, so I added some CC's.

> The reason for doing this is based on two factors.
>
> 1. larg file read/write operations with order 0 can fragmentize
> memory rapidly.
>
> 2. when a special filesystem does not supply migratepage callback,
> kernel would fallback to default function fallback_migrate_page.
> but fallback_migrate_page could not migrate diry page nicely;
> specially kcompactd with MIGRATE_SYNC_LIGHT could not migrate
> diry pages due to this until clear_page_dirty_for_io in some
> procedure. i think it is not suitable here in this scenario.
> for dirty pages we should migrate it rather than skip or writeout
> it in kcomapctd with MIGRATE_SYNC_LIGHT. i think this problem is
> for all filesystem without migratepage not only for block device fs.
>
> So for compaction under large file writing supply migratepage for
> def_blk_aops.

Is this really safe to do? buffer_migrate_page() has some assumptions listed in
its comment (and maybe more that are not listed). Do we know it's safe to use it
for all def_blk_aops users?

> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: zhouxianrong <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> fs/block_dev.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff -puN fs/block_dev.c~compaction-add-def_blk_aops-migrate-function-for-memory-compaction fs/block_dev.c
> --- a/fs/block_dev.c~compaction-add-def_blk_aops-migrate-function-for-memory-compaction
> +++ a/fs/block_dev.c
> @@ -2064,6 +2064,9 @@ static const struct address_space_operat
> .releasepage = blkdev_releasepage,
> .direct_IO = blkdev_direct_IO,
> .is_dirty_writeback = buffer_check_dirty_writeback,
> +#ifdef CONFIG_MIGRATION
> + .migratepage = buffer_migrate_page,
> +#endif
> };
>
> #define BLKDEV_FALLOC_FL_SUPPORTED \
> _
>
> Patches currently in -mm which might be from [email protected] are
>
> compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>


2017-03-09 11:52:04

by zhouxianrong

[permalink] [raw]
Subject: Re: + compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch added to -mm tree



On 2017/3/9 15:58, Vlastimil Babka wrote:
> On 03/09/2017 12:55 AM, [email protected] wrote:
>>
>> The patch titled
>> Subject: compaction: add def_blk_aops migrate function for memory compaction
>> has been added to the -mm tree. Its filename is
>> compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>>
>> This patch should soon appear at
>> http://ozlabs.org/~akpm/mmots/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>> and later at
>> http://ozlabs.org/~akpm/mmotm/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>>
>> Before you just go and hit "reply", please:
>> a) Consider who else should be cc'ed
>> b) Prefer to cc a suitable mailing list as well
>> c) Ideally: find the original patch on the mailing list and do a
>> reply-to-all to that, adding suitable additional cc's
>>
>> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>>
>> The -mm tree is included into linux-next and is updated
>> there every 3-4 working days
>>
>> ------------------------------------------------------
>> From: zhouxianrong <[email protected]>
>> Subject: compaction: add def_blk_aops migrate function for memory compaction
>
> That's not really a mm/compaction patch, but a block layer/migration patch. I
> don't know internals of those so well, so I added some CC's.
>
>> The reason for doing this is based on two factors.
>>
>> 1. larg file read/write operations with order 0 can fragmentize
>> memory rapidly.
>>
>> 2. when a special filesystem does not supply migratepage callback,
>> kernel would fallback to default function fallback_migrate_page.
>> but fallback_migrate_page could not migrate diry page nicely;
>> specially kcompactd with MIGRATE_SYNC_LIGHT could not migrate
>> diry pages due to this until clear_page_dirty_for_io in some
>> procedure. i think it is not suitable here in this scenario.
>> for dirty pages we should migrate it rather than skip or writeout
>> it in kcomapctd with MIGRATE_SYNC_LIGHT. i think this problem is
>> for all filesystem without migratepage not only for block device fs.
>>
>> So for compaction under large file writing supply migratepage for
>> def_blk_aops.
>
> Is this really safe to do? buffer_migrate_page() has some assumptions listed in
> its comment (and maybe more that are not listed). Do we know it's safe to use it
> for all def_blk_aops users?

I could not find out differences for different disks in block device filesystem;
they should behave consistently in block device filesystem layer.

for a page of file /dev/block/xxx, when we migrate it, i think it has no difference
just like ext4 file migration.

but i dare not to say yes. i hope more peoples give their suggestions.

>
>> Link: http://lkml.kernel.org/r/[email protected]
>> Signed-off-by: zhouxianrong <[email protected]>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Cc: Johannes Weiner <[email protected]>
>> Cc: Minchan Kim <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: Al Viro <[email protected]>
>> Cc: <[email protected]>
>> Cc: <[email protected]>
>> Cc: <[email protected]>
>> Cc: <[email protected]>
>> Cc: <[email protected]>
>> Cc: <[email protected]>
>> Signed-off-by: Andrew Morton <[email protected]>
>> ---
>>
>> fs/block_dev.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff -puN fs/block_dev.c~compaction-add-def_blk_aops-migrate-function-for-memory-compaction fs/block_dev.c
>> --- a/fs/block_dev.c~compaction-add-def_blk_aops-migrate-function-for-memory-compaction
>> +++ a/fs/block_dev.c
>> @@ -2064,6 +2064,9 @@ static const struct address_space_operat
>> .releasepage = blkdev_releasepage,
>> .direct_IO = blkdev_direct_IO,
>> .is_dirty_writeback = buffer_check_dirty_writeback,
>> +#ifdef CONFIG_MIGRATION
>> + .migratepage = buffer_migrate_page,
>> +#endif
>> };
>>
>> #define BLKDEV_FALLOC_FL_SUPPORTED \
>> _
>>
>> Patches currently in -mm which might be from [email protected] are
>>
>> compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
>>
>
>
> .
>

2017-03-09 12:15:15

by Jan Kara

[permalink] [raw]
Subject: Re: + compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch added to -mm tree

On Thu 09-03-17 08:58:45, Vlastimil Babka wrote:
> On 03/09/2017 12:55 AM, [email protected] wrote:
> >
> > The patch titled
> > Subject: compaction: add def_blk_aops migrate function for memory compaction
> > has been added to the -mm tree. Its filename is
> > compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> >
> > This patch should soon appear at
> > http://ozlabs.org/~akpm/mmots/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> > and later at
> > http://ozlabs.org/~akpm/mmotm/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> >
> > Before you just go and hit "reply", please:
> > a) Consider who else should be cc'ed
> > b) Prefer to cc a suitable mailing list as well
> > c) Ideally: find the original patch on the mailing list and do a
> > reply-to-all to that, adding suitable additional cc's
> >
> > *** Remember to use Documentation/SubmitChecklist when testing your code ***
> >
> > The -mm tree is included into linux-next and is updated
> > there every 3-4 working days
> >
> > ------------------------------------------------------
> > From: zhouxianrong <[email protected]>
> > Subject: compaction: add def_blk_aops migrate function for memory compaction
>
> That's not really a mm/compaction patch, but a block layer/migration patch. I
> don't know internals of those so well, so I added some CC's.

Thanks!

> > The reason for doing this is based on two factors.
> >
> > 1. larg file read/write operations with order 0 can fragmentize
> > memory rapidly.
> >
> > 2. when a special filesystem does not supply migratepage callback,
> > kernel would fallback to default function fallback_migrate_page.
> > but fallback_migrate_page could not migrate diry page nicely;
> > specially kcompactd with MIGRATE_SYNC_LIGHT could not migrate
> > diry pages due to this until clear_page_dirty_for_io in some
> > procedure. i think it is not suitable here in this scenario.
> > for dirty pages we should migrate it rather than skip or writeout
> > it in kcomapctd with MIGRATE_SYNC_LIGHT. i think this problem is
> > for all filesystem without migratepage not only for block device fs.
> >
> > So for compaction under large file writing supply migratepage for
> > def_blk_aops.
>
> Is this really safe to do? buffer_migrate_page() has some assumptions listed in
> its comment (and maybe more that are not listed). Do we know it's safe to use it
> for all def_blk_aops users?

So this is definitely *not* OK for ext4 which is using def_blk_aops. Ext4
will create temporary buffer heads that alias buffer heads attached to the
block device page when committing a transaction (see fs/jbd2/journal.c:
jbd2_journal_write_metadata_buffer()) and these will point to the original
page.

Now I'm not saying ext4 / jbd2 cannot be modified so that it somehow makes
it clear to the page migration code it should stay away from the page.
However it demostrates that this patch is not safe as is...

Honza

>
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: zhouxianrong <[email protected]>
> > Cc: Kirill A. Shutemov <[email protected]>
> > Cc: Johannes Weiner <[email protected]>
> > Cc: Minchan Kim <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Al Viro <[email protected]>
> > Cc: <[email protected]>
> > Cc: <[email protected]>
> > Cc: <[email protected]>
> > Cc: <[email protected]>
> > Cc: <[email protected]>
> > Cc: <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > ---
> >
> > fs/block_dev.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff -puN fs/block_dev.c~compaction-add-def_blk_aops-migrate-function-for-memory-compaction fs/block_dev.c
> > --- a/fs/block_dev.c~compaction-add-def_blk_aops-migrate-function-for-memory-compaction
> > +++ a/fs/block_dev.c
> > @@ -2064,6 +2064,9 @@ static const struct address_space_operat
> > .releasepage = blkdev_releasepage,
> > .direct_IO = blkdev_direct_IO,
> > .is_dirty_writeback = buffer_check_dirty_writeback,
> > +#ifdef CONFIG_MIGRATION
> > + .migratepage = buffer_migrate_page,
> > +#endif
> > };
> >
> > #define BLKDEV_FALLOC_FL_SUPPORTED \
> > _
> >
> > Patches currently in -mm which might be from [email protected] are
> >
> > compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> >
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-03-09 13:38:33

by Michal Hocko

[permalink] [raw]
Subject: Re: + compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch added to -mm tree

On Thu 09-03-17 13:15:07, Jan Kara wrote:
> On Thu 09-03-17 08:58:45, Vlastimil Babka wrote:
> > On 03/09/2017 12:55 AM, [email protected] wrote:
> > >
> > > The patch titled
> > > Subject: compaction: add def_blk_aops migrate function for memory compaction
> > > has been added to the -mm tree. Its filename is
> > > compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> > >
> > > This patch should soon appear at
> > > http://ozlabs.org/~akpm/mmots/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> > > and later at
> > > http://ozlabs.org/~akpm/mmotm/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch
> > >
> > > Before you just go and hit "reply", please:
> > > a) Consider who else should be cc'ed
> > > b) Prefer to cc a suitable mailing list as well
> > > c) Ideally: find the original patch on the mailing list and do a
> > > reply-to-all to that, adding suitable additional cc's
> > >
> > > *** Remember to use Documentation/SubmitChecklist when testing your code ***
> > >
> > > The -mm tree is included into linux-next and is updated
> > > there every 3-4 working days
> > >
> > > ------------------------------------------------------
> > > From: zhouxianrong <[email protected]>
> > > Subject: compaction: add def_blk_aops migrate function for memory compaction
> >
> > That's not really a mm/compaction patch, but a block layer/migration patch. I
> > don't know internals of those so well, so I added some CC's.
>
> Thanks!
>
> > > The reason for doing this is based on two factors.
> > >
> > > 1. larg file read/write operations with order 0 can fragmentize
> > > memory rapidly.
> > >
> > > 2. when a special filesystem does not supply migratepage callback,
> > > kernel would fallback to default function fallback_migrate_page.
> > > but fallback_migrate_page could not migrate diry page nicely;
> > > specially kcompactd with MIGRATE_SYNC_LIGHT could not migrate
> > > diry pages due to this until clear_page_dirty_for_io in some
> > > procedure. i think it is not suitable here in this scenario.
> > > for dirty pages we should migrate it rather than skip or writeout
> > > it in kcomapctd with MIGRATE_SYNC_LIGHT. i think this problem is
> > > for all filesystem without migratepage not only for block device fs.
> > >
> > > So for compaction under large file writing supply migratepage for
> > > def_blk_aops.
> >
> > Is this really safe to do? buffer_migrate_page() has some assumptions listed in
> > its comment (and maybe more that are not listed). Do we know it's safe to use it
> > for all def_blk_aops users?
>
> So this is definitely *not* OK for ext4 which is using def_blk_aops. Ext4
> will create temporary buffer heads that alias buffer heads attached to the
> block device page when committing a transaction (see fs/jbd2/journal.c:
> jbd2_journal_write_metadata_buffer()) and these will point to the original
> page.
>
> Now I'm not saying ext4 / jbd2 cannot be modified so that it somehow makes
> it clear to the page migration code it should stay away from the page.
> However it demostrates that this patch is not safe as is...

Which would suggest to enahnce the documentation of buffer_migrate_page.
--
Michal Hocko
SUSE Labs