2020-10-22 17:51:53

by Gao Xiang

[permalink] [raw]
Subject: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

From: Gao Xiang <[email protected]>

pcluster should be only set up for all managed pages instead of
temporary pages. Since it currently uses page->mapping to identify,
the impact is minor for now.

Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
Cc: <[email protected]> # 5.5+
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/zdata.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 50912a5420b4..86fd3bf62af6 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1078,8 +1078,11 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
cond_resched();
goto repeat;
}
- set_page_private(page, (unsigned long)pcl);
- SetPagePrivate(page);
+
+ if (tocache) {
+ set_page_private(page, (unsigned long)pcl);
+ SetPagePrivate(page);
+ }
out: /* the only exit (for tracing and debugging) */
return page;
}
--
2.24.0


2020-10-22 17:52:05

by Gao Xiang

[permalink] [raw]
Subject: [PATCH 4/4] erofs: complete a missing case for inplace I/O

From: Gao Xiang <[email protected]>

Add a missing case which could cause unnecessary page allocation
but not directly use inplace I/O instead, which increases runtime
extra memory footprint.

The detail is, considering a file-backed page, the right half of
the page is chosen to be cached (e.g. the end page) and some of
its data doesn't exist in managed cache, so the pcluster will be
kept in the submission chain. (In other words, it cannot be
decompressed in advance, for example, due to the bypass chain).

Currently, the pcluster for this case is downgraded as NOINPLACE,
and stop the left half of the page from doing inplace I/O (so an
extra page allocation is needed then.)

Let's avoid such unnecessary downgrade instead.

Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/zdata.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index edd7325570e1..ef12275f7fcc 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -160,7 +160,8 @@ static void preload_compressed_pages(struct z_erofs_collector *clt,
const unsigned int clusterpages = BIT(pcl->clusterbits);
struct page **pages = clt->compressedpages;
pgoff_t index = pcl->obj.index + (pages - pcl->compressed_pages);
- bool standalone = true;
+ bool io_cant_bypass = false;
+ bool updated = false;

if (clt->mode < COLLECT_PRIMARY_FOLLOWED)
return;
@@ -180,20 +181,31 @@ static void preload_compressed_pages(struct z_erofs_collector *clt,
} else if (type == DELAYEDALLOC) {
t = tagptr_init(compressed_page_t, PAGE_UNALLOCATED);
} else { /* DONTALLOC */
- if (standalone)
+ /* update to first inplace I/O page */
+ if (!updated) {
clt->compressedpages = pages;
- standalone = false;
+ updated = true;
+ }
+ io_cant_bypass = true;
continue;
}

- if (!cmpxchg_relaxed(pages, NULL, tagptr_cast_ptr(t)))
+ if (!cmpxchg_relaxed(pages, NULL, tagptr_cast_ptr(t))) {
+ if (type == DELAYEDALLOC)
+ io_cant_bypass = true;
continue;
+ }

if (page)
put_page(page);
}

- if (standalone) /* downgrade to PRIMARY_FOLLOWED_NOINPLACE */
+ /*
+ * for COLLECT_PRIMARY_FOLLOWED pcluster, if all managed pages have
+ * been found (which means it can be handled without submittng I/O)
+ * it's unsafe to do inplace I/O. let's downgrade to NOINPLACE instead.
+ */
+ if (!io_cant_bypass)
clt->mode = COLLECT_PRIMARY_FOLLOWED_NOINPLACE;
}

--
2.24.0

2020-10-30 12:42:05

by Vladimir Zapolskiy

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

Hello Gao Xiang,

On 10/22/20 5:57 PM, Gao Xiang via Linux-erofs wrote:
> From: Gao Xiang <[email protected]>
>
> pcluster should be only set up for all managed pages instead of
> temporary pages. Since it currently uses page->mapping to identify,
> the impact is minor for now.
>
> Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
> Cc: <[email protected]> # 5.5+
> Signed-off-by: Gao Xiang <[email protected]>

I was looking exactly at this problem recently, my change is one-to-one
to your fix, thus I can provide a tag:

Tested-by: Vladimir Zapolskiy <[email protected]>


The fixed problem is minor, but the kernel log becomes polluted, if
a page allocation debug option is enabled:

% md5sum ~/erofs/testfile
BUG: Bad page state in process kworker/u9:0 pfn:687de
page:0000000057b8bcb4 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x687de
flags: 0x4000000000002000(private)
raw: 4000000000002000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 ffff888066758690 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Modules linked in:
CPU: 1 PID: 602 Comm: kworker/u9:0 Not tainted 5.9.1 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Workqueue: erofs_unzipd z_erofs_decompressqueue_work
Call Trace:
dump_stack+0x84/0xba
bad_page.cold+0xac/0xb1
check_free_page_bad+0xb0/0xc0
free_pcp_prepare+0x2c8/0x2d0
free_unref_page+0x18/0xf0
put_pages_list+0x11a/0x120
z_erofs_decompressqueue_work+0xc9/0x110
? z_erofs_decompress_pcluster.isra.0+0xf10/0xf10
? read_word_at_a_time+0x12/0x20
? strscpy+0xc7/0x1a0
process_one_work+0x30c/0x730
worker_thread+0x91/0x640
? __kasan_check_read+0x11/0x20
? rescuer_thread+0x8a0/0x8a0
kthread+0x1dd/0x200
? kthread_unpark+0xa0/0xa0
ret_from_fork+0x1f/0x30
Disabling lock debugging due to kernel taint

--
Best wishes,
Vladimir

2020-10-30 12:52:12

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

Hi Vladimir,

On Fri, Oct 30, 2020 at 02:20:31PM +0200, Vladimir Zapolskiy wrote:
> Hello Gao Xiang,
>
> On 10/22/20 5:57 PM, Gao Xiang via Linux-erofs wrote:
> > From: Gao Xiang <[email protected]>
> >
> > pcluster should be only set up for all managed pages instead of
> > temporary pages. Since it currently uses page->mapping to identify,
> > the impact is minor for now.
> >
> > Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
> > Cc: <[email protected]> # 5.5+
> > Signed-off-by: Gao Xiang <[email protected]>
>
> I was looking exactly at this problem recently, my change is one-to-one
> to your fix, thus I can provide a tag:
>
> Tested-by: Vladimir Zapolskiy <[email protected]>

Many thanks for confirming this!
I found this when I was killing magical stagingpage page->mapping,
it's somewhat late :-)

>
>
> The fixed problem is minor, but the kernel log becomes polluted, if
> a page allocation debug option is enabled:
>
> % md5sum ~/erofs/testfile
> BUG: Bad page state in process kworker/u9:0 pfn:687de
> page:0000000057b8bcb4 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x687de
> flags: 0x4000000000002000(private)
> raw: 4000000000002000 dead000000000100 dead000000000122 0000000000000000
> raw: 0000000000000000 ffff888066758690 00000000ffffffff 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> Modules linked in:
> CPU: 1 PID: 602 Comm: kworker/u9:0 Not tainted 5.9.1 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> Workqueue: erofs_unzipd z_erofs_decompressqueue_work
> Call Trace:
> dump_stack+0x84/0xba
> bad_page.cold+0xac/0xb1
> check_free_page_bad+0xb0/0xc0
> free_pcp_prepare+0x2c8/0x2d0
> free_unref_page+0x18/0xf0
> put_pages_list+0x11a/0x120
> z_erofs_decompressqueue_work+0xc9/0x110
> ? z_erofs_decompress_pcluster.isra.0+0xf10/0xf10
> ? read_word_at_a_time+0x12/0x20
> ? strscpy+0xc7/0x1a0
> process_one_work+0x30c/0x730
> worker_thread+0x91/0x640
> ? __kasan_check_read+0x11/0x20
> ? rescuer_thread+0x8a0/0x8a0
> kthread+0x1dd/0x200
> ? kthread_unpark+0xa0/0xa0
> ret_from_fork+0x1f/0x30
> Disabling lock debugging due to kernel taint

Yeah, I can make a pull-request to Linus if you need this to be in master
now, or I can post it for v5.11-rc1 since 5.4 LTS isn't effected (and it
would be only a print problem with debugging option.)

Thanks,
Gao Xiang

>
> --
> Best wishes,
> Vladimir
>

2020-10-30 13:36:02

by Vladimir Zapolskiy

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

Hi Gao Xiang,

On 10/30/20 2:47 PM, Gao Xiang wrote:
> Hi Vladimir,
>
> On Fri, Oct 30, 2020 at 02:20:31PM +0200, Vladimir Zapolskiy wrote:
>> Hello Gao Xiang,
>>
>> On 10/22/20 5:57 PM, Gao Xiang via Linux-erofs wrote:
>>> From: Gao Xiang <[email protected]>
>>>
>>> pcluster should be only set up for all managed pages instead of
>>> temporary pages. Since it currently uses page->mapping to identify,
>>> the impact is minor for now.
>>>
>>> Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
>>> Cc: <[email protected]> # 5.5+
>>> Signed-off-by: Gao Xiang <[email protected]>
>>
>> I was looking exactly at this problem recently, my change is one-to-one
>> to your fix, thus I can provide a tag:
>>
>> Tested-by: Vladimir Zapolskiy <[email protected]>
>
> Many thanks for confirming this!
> I found this when I was killing magical stagingpage page->mapping,
> it's somewhat late :-)
>

sure, for me it was an exciting immersion into the filesystem code :)

>>
>>
>> The fixed problem is minor, but the kernel log becomes polluted, if
>> a page allocation debug option is enabled:
>>
>> % md5sum ~/erofs/testfile
>> BUG: Bad page state in process kworker/u9:0 pfn:687de
>> page:0000000057b8bcb4 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x687de
>> flags: 0x4000000000002000(private)
>> raw: 4000000000002000 dead000000000100 dead000000000122 0000000000000000
>> raw: 0000000000000000 ffff888066758690 00000000ffffffff 0000000000000000
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> Modules linked in:
>> CPU: 1 PID: 602 Comm: kworker/u9:0 Not tainted 5.9.1 #2
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
>> Workqueue: erofs_unzipd z_erofs_decompressqueue_work
>> Call Trace:
>> dump_stack+0x84/0xba
>> bad_page.cold+0xac/0xb1
>> check_free_page_bad+0xb0/0xc0
>> free_pcp_prepare+0x2c8/0x2d0
>> free_unref_page+0x18/0xf0
>> put_pages_list+0x11a/0x120
>> z_erofs_decompressqueue_work+0xc9/0x110
>> ? z_erofs_decompress_pcluster.isra.0+0xf10/0xf10
>> ? read_word_at_a_time+0x12/0x20
>> ? strscpy+0xc7/0x1a0
>> process_one_work+0x30c/0x730
>> worker_thread+0x91/0x640
>> ? __kasan_check_read+0x11/0x20
>> ? rescuer_thread+0x8a0/0x8a0
>> kthread+0x1dd/0x200
>> ? kthread_unpark+0xa0/0xa0
>> ret_from_fork+0x1f/0x30
>> Disabling lock debugging due to kernel taint
>
> Yeah, I can make a pull-request to Linus if you need this to be in master
> now, or I can post it for v5.11-rc1 since 5.4 LTS isn't effected (and it
> would be only a print problem with debugging option.)
>

As for myself I don't utterly need this fix on the master branch ASAP, however
it might be reasonable to get it included right into the next v5.10 release,
because I believe it'll be an LTS. Eventually it's up to you to make a decision,
from my side I won't urge you, the fixed issue is obviously a non-critical one.

Thank you for the original fix and taking my opinion into consideration :)

--
Best wishes,
Vladimir

2020-10-30 14:12:37

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

On Fri, Oct 30, 2020 at 03:32:55PM +0200, Vladimir Zapolskiy wrote:
> Hi Gao Xiang,
>
> On 10/30/20 2:47 PM, Gao Xiang wrote:
> > Hi Vladimir,
> >
> > On Fri, Oct 30, 2020 at 02:20:31PM +0200, Vladimir Zapolskiy wrote:
> > > Hello Gao Xiang,
> > >
> > > On 10/22/20 5:57 PM, Gao Xiang via Linux-erofs wrote:
> > > > From: Gao Xiang <[email protected]>
> > > >
> > > > pcluster should be only set up for all managed pages instead of
> > > > temporary pages. Since it currently uses page->mapping to identify,
> > > > the impact is minor for now.
> > > >
> > > > Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
> > > > Cc: <[email protected]> # 5.5+
> > > > Signed-off-by: Gao Xiang <[email protected]>
> > >
> > > I was looking exactly at this problem recently, my change is one-to-one
> > > to your fix, thus I can provide a tag:
> > >
> > > Tested-by: Vladimir Zapolskiy <[email protected]>
> >
> > Many thanks for confirming this!
> > I found this when I was killing magical stagingpage page->mapping,
> > it's somewhat late :-)
> >
>
> sure, for me it was an exciting immersion into the filesystem code :)

Thanks for your effort on this!

You could also post related kernel message in advance and
I will definitly look into that as well. :)

>
> > >
> > >
> > > The fixed problem is minor, but the kernel log becomes polluted, if
> > > a page allocation debug option is enabled:
> > >
> > > % md5sum ~/erofs/testfile
> > > BUG: Bad page state in process kworker/u9:0 pfn:687de
> > > page:0000000057b8bcb4 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x687de
> > > flags: 0x4000000000002000(private)
> > > raw: 4000000000002000 dead000000000100 dead000000000122 0000000000000000
> > > raw: 0000000000000000 ffff888066758690 00000000ffffffff 0000000000000000
> > > page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > > Modules linked in:
> > > CPU: 1 PID: 602 Comm: kworker/u9:0 Not tainted 5.9.1 #2
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > > Workqueue: erofs_unzipd z_erofs_decompressqueue_work
> > > Call Trace:
> > > dump_stack+0x84/0xba
> > > bad_page.cold+0xac/0xb1
> > > check_free_page_bad+0xb0/0xc0
> > > free_pcp_prepare+0x2c8/0x2d0
> > > free_unref_page+0x18/0xf0
> > > put_pages_list+0x11a/0x120
> > > z_erofs_decompressqueue_work+0xc9/0x110
> > > ? z_erofs_decompress_pcluster.isra.0+0xf10/0xf10
> > > ? read_word_at_a_time+0x12/0x20
> > > ? strscpy+0xc7/0x1a0
> > > process_one_work+0x30c/0x730
> > > worker_thread+0x91/0x640
> > > ? __kasan_check_read+0x11/0x20
> > > ? rescuer_thread+0x8a0/0x8a0
> > > kthread+0x1dd/0x200
> > > ? kthread_unpark+0xa0/0xa0
> > > ret_from_fork+0x1f/0x30
> > > Disabling lock debugging due to kernel taint
> >
> > Yeah, I can make a pull-request to Linus if you need this to be in master
> > now, or I can post it for v5.11-rc1 since 5.4 LTS isn't effected (and it
> > would be only a print problem with debugging option.)
> >
>
> As for myself I don't utterly need this fix on the master branch ASAP, however
> it might be reasonable to get it included right into the next v5.10 release,
> because I believe it'll be an LTS. Eventually it's up to you to make a decision,
> from my side I won't urge you, the fixed issue is obviously a non-critical one.
>
> Thank you for the original fix and taking my opinion into consideration :)

Yeah, v5.10 is a LTS version, and you are right, I will try to make a
pull-request after I get Chao's RVB.

Thanks,
Gao Xiang

>
> --
> Best wishes,
> Vladimir
>

2020-11-04 01:07:57

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

On 2020/10/22 22:57, Gao Xiang wrote:
> From: Gao Xiang <[email protected]>
>
> pcluster should be only set up for all managed pages instead of
> temporary pages. Since it currently uses page->mapping to identify,
> the impact is minor for now.
>
> Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
> Cc: <[email protected]> # 5.5+
> Signed-off-by: Gao Xiang <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2020-11-04 01:13:44

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

On Wed, Nov 04, 2020 at 09:05:56AM +0800, Chao Yu wrote:
> On 2020/10/22 22:57, Gao Xiang wrote:
> > From: Gao Xiang <[email protected]>
> >
> > pcluster should be only set up for all managed pages instead of
> > temporary pages. Since it currently uses page->mapping to identify,
> > the impact is minor for now.
> >
> > Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
> > Cc: <[email protected]> # 5.5+
> > Signed-off-by: Gao Xiang <[email protected]>
>
> Reviewed-by: Chao Yu <[email protected]>

Thanks, I've also added a note to the commit message like this,
"
[ Update: Vladimir reported the kernel log becomes polluted
because PAGE_FLAGS_CHECK_AT_FREE flag(s) set if the page
allocation debug option is enabled. ]
"
Will apply all of this to -fixes branch.

Thanks,
Gao Xiang

>
> Thanks,
>

2020-11-04 01:49:04

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH 1/4] erofs: fix setting up pcluster for temporary pages

On 2020/11/4 9:11, Gao Xiang wrote:
> On Wed, Nov 04, 2020 at 09:05:56AM +0800, Chao Yu wrote:
>> On 2020/10/22 22:57, Gao Xiang wrote:
>>> From: Gao Xiang <[email protected]>
>>>
>>> pcluster should be only set up for all managed pages instead of
>>> temporary pages. Since it currently uses page->mapping to identify,
>>> the impact is minor for now.
>>>
>>> Fixes: 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
>>> Cc: <[email protected]> # 5.5+
>>> Signed-off-by: Gao Xiang <[email protected]>
>>
>> Reviewed-by: Chao Yu <[email protected]>
>
> Thanks, I've also added a note to the commit message like this,
> "
> [ Update: Vladimir reported the kernel log becomes polluted
> because PAGE_FLAGS_CHECK_AT_FREE flag(s) set if the page
> allocation debug option is enabled. ]
> "
> Will apply all of this to -fixes branch.

Thanks for noticing that, looks fine to me.

Thanks,

>
> Thanks,
> Gao Xiang
>
>>
>> Thanks,
>>
>
> .
>