2023-11-30 15:22:20

by Dongyun Liu

Subject: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

We encountered a deadlock issue when using zram on kernel version 5.10.
The reason is that backing_dev_store() first takes zram->init_lock and
then allocates memory with kvmalloc_node(). At that moment the system
may be short of memory, so the allocation enters direct reclaim and
starts swapping pages out. In zram_bvec_write() we trigger our own
thread to move data from zram to the backing device to free up more
available memory; this is the work done in try_wakeup_zram_wbd(). That
function acquires zram->init_lock again to read zram->bd_wb_limit,
which causes the deadlock shown in the following call trace.

The latest version of Linux does not have this bug, but it carries a
potential risk for the future: if we try to acquire zram->init_lock
again on the zram I/O path (for example, to read zram->bd_wb_limit),
there is a risk of deadlock. The bug is quite elusive; it showed up
only with low probability after a week-long stress test on 50 devices.
Therefore, I suggest passing the GFP_ATOMIC flag for this allocation in
backing_dev_store(), to avoid issues like the one we encountered.
Please consider it, thank you.
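
The pattern, reduced to its core, is a single task taking
zram->init_lock for reading while it already holds it for writing.
Below is a minimal userspace sketch of that pattern; a pthread rwlock
stands in for the kernel rwsem, and every name in it is invented for
illustration (this is not zram code):

/*
 * The kernel rwsem would block forever on the inner read lock; glibc
 * detects the single-thread recursion and returns EDEADLK instead,
 * which is reported below.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_rwlock_t init_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Plays the role of try_wakeup_zram_wbd(): takes the lock for reading. */
static void wakeup_writeback(void)
{
	int err = pthread_rwlock_rdlock(&init_lock);

	if (err) {
		printf("rdlock: %s\n", strerror(err));
		return;
	}
	/* ... read bd_wb_limit ... */
	pthread_rwlock_unlock(&init_lock);
}

/* Plays the role of kvmalloc_node() entering direct reclaim, which
 * swaps out to zram and re-enters the zram write path. */
static void alloc_under_pressure(void)
{
	wakeup_writeback();
}

int main(void)
{
	pthread_rwlock_wrlock(&init_lock);	/* backing_dev_store() */
	alloc_under_pressure();			/* kvzalloc(GFP_KERNEL) */
	pthread_rwlock_unlock(&init_lock);
	return 0;
}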

INFO: task init:331 blocked for more than 120 seconds. "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000000
Call trace:
__switch_to+0x244/0x4e4
__schedule+0x5bc/0xc48
schedule+0x80/0x164
rwsem_down_read_slowpath+0x4fc/0xf9c
__down_read+0x140/0x188
down_read+0x14/0x24
try_wakeup_wbd_thread+0x78/0x1ec [zram]
__zram_bvec_write+0x720/0x878 [zram]
zram_bvec_rw+0xa8/0x234 [zram]
zram_submit_bio+0x16c/0x268 [zram]
submit_bio_noacct+0x128/0x3c8
submit_bio+0x1cc/0x3d0
__swap_writepage+0x5c4/0xd4c
swap_writepage+0x130/0x158
pageout+0x1f4/0x478
shrink_page_list+0x9b4/0x1eb8
shrink_inactive_list+0x2f4/0xaa8
shrink_lruvec+0x184/0x340
shrink_node_memcgs+0x84/0x3a0
shrink_node+0x2c4/0x6c4
shrink_zones+0x16c/0x29c
do_try_to_free_pages+0xe4/0x2b4
try_to_free_pages+0x388/0x7b4
__alloc_pages_direct_reclaim+0x88/0x278
__alloc_pages_slowpath+0x4ec/0xf6c
__alloc_pages_nodemask+0x1f4/0x3dc
kmalloc_order+0x54/0x338
kmalloc_order_trace+0x34/0x1bc
__kmalloc+0x5e8/0x9c0
kvmalloc_node+0xa8/0x264
backing_dev_store+0x1a4/0x818 [zram]
dev_attr_store+0x38/0x8c
sysfs_kf_write+0x64/0xc4
kernfs_fop_write_iter+0x168/0x2ac
vfs_write+0x300/0x37c
ksys_write+0x7c/0xf0
__arm64_sys_write+0x20/0x30
el0_svc_common+0xd4/0x270
el0_svc+0x28/0x98
el0_sync_handler+0x8c/0xf0
el0_sync+0x1b4/0x1c0

Signed-off-by: Dongyun Liu <[email protected]>
---
drivers/block/zram/zram_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d77d3664ca08..ee6c22c50e09 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -514,7 +514,7 @@ static ssize_t backing_dev_store(struct device *dev,

 	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
 	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
-	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
+	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
 	if (!bitmap) {
 		err = -ENOMEM;
 		goto out;
--
2.25.1


2023-11-30 15:37:57

by Jens Axboe

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

On 11/30/23 8:20 AM, Dongyun Liu wrote:
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index d77d3664ca08..ee6c22c50e09 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -514,7 +514,7 @@ static ssize_t backing_dev_store(struct device *dev,
>
>  	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
>  	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
> -	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
> +	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
>  	if (!bitmap) {
>  		err = -ENOMEM;
>  		goto out;

Setting aside that this moves from a zeroed alloc to one that does not
zero, the change looks woefully incomplete. Why does this allocation
need to be GFP_ATOMIC, and:

1) file_name = kmalloc(PATH_MAX, GFP_KERNEL); does not
2) filp_open() -> getname_kernel() -> __getname() does not
3) filp_open() -> getname_kernel() does not
4) bdev_open_by_dev() does not

IOW, you have a slew of GFP_KERNEL allocations in there, and you
probably just patched the largest one. But the core issue remains.

The whole handling of backing_dev_store() looks pretty broken.
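
For reference, the rough shape of the function (heavily abridged and
paraphrased, so it won't compile as-is); everything after down_write()
allocates while init_lock is write-held:

	file_name = kmalloc(PATH_MAX, GFP_KERNEL);	/* 1) before the lock */

	down_write(&zram->init_lock);
	...
	backing_dev = filp_open(file_name, O_RDWR | O_LARGEFILE, 0);
							/* 2) and 3) inside */
	...
	bdev = bdev_open_by_dev(inode->i_rdev, ...);	/* 4) */
	...
	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);	/* the patched one */
	...
	up_write(&zram->init_lock);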

--
Jens Axboe

2023-12-01 06:51:54

by Dongyun Liu

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store



On 2023/11/30 23:37, Jens Axboe wrote:
> On 11/30/23 8:20 AM, Dongyun Liu wrote:
>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> index d77d3664ca08..ee6c22c50e09 100644
>> --- a/drivers/block/zram/zram_drv.c
>> +++ b/drivers/block/zram/zram_drv.c
>> @@ -514,7 +514,7 @@ static ssize_t backing_dev_store(struct device *dev,
>>
>>  	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
>>  	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
>> -	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
>> +	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
>>  	if (!bitmap) {
>>  		err = -ENOMEM;
>>  		goto out;
>
> Setting aside that this moves from a zeroed alloc to one that does
> not zero, the change looks woefully incomplete. Why does this
> allocation need to be GFP_ATOMIC, and:

GFP_ATOMIC means the caller cannot enter reclaim or sleep, so it
avoids the risk of deadlock when zram->init_lock is acquired again in
zram_bvec_write().
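
To spell out the assumption (flag semantics per include/linux/gfp.h):
GFP_KERNEL includes __GFP_DIRECT_RECLAIM, which is exactly what
re-entered zram here, while GFP_ATOMIC does not; it may still wake
kswapd, but it either succeeds right away (possibly dipping into
reserves) or fails. A sketch of the resulting contract, not the exact
upstream hunk:

	/* Cannot recurse into the zram write path, but the caller
	 * must now tolerate failure under memory pressure. */
	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
	if (!bitmap) {
		err = -ENOMEM;
		goto out;
	}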

>
> 1) file_name = kmalloc(PATH_MAX, GFP_KERNEL); does not

There is no zram->init_lock held here, so there is no need to use
GFP_ATOMIC.

> 2) filp_open() -> getname_kernel() -> __getname() does not
> 3) filp_open() -> getname_kernel() does not
> 4) bdev_open_by_dev() does not

These are missing the GFP_ATOMIC treatment.

>
> IOW, you have a slew of GFP_KERNEL allocations in there, and you
> probably just patched the largest one. But the core issue remains.
>
> The whole handling of backing_dev_store() looks pretty broken.
>

Indeed, this patch only addresses the biggest problem; it does not fix
the issue fundamentally, because several paths in backing_dev_store()
allocate memory while holding zram->init_lock and would all need to be
modified, which I did not consider thoroughly. Obviously, a larger and
better patch is needed to eliminate this risk, but it does not seem
necessary at the moment.

Thank you for being kind and patient.

2023-12-01 14:19:32

by Jens Axboe

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

On 11/30/23 11:51 PM, Dongyun Liu wrote:
>
>
> On 2023/11/30 23:37, Jens Axboe wrote:
>> On 11/30/23 8:20 AM, Dongyun Liu wrote:
>>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>> index d77d3664ca08..ee6c22c50e09 100644
>>> --- a/drivers/block/zram/zram_drv.c
>>> +++ b/drivers/block/zram/zram_drv.c
>>> @@ -514,7 +514,7 @@ static ssize_t backing_dev_store(struct device *dev,
>>>  	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
>>>  	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
>>> -	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
>>> +	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
>>>  	if (!bitmap) {
>>>  		err = -ENOMEM;
>>>  		goto out;
>>
>> Setting aside that this moves from a zeroed alloc to one that does
>> not zero, the change looks woefully incomplete. Why does this
>> allocation need to be GFP_ATOMIC, and:
>
> GFP_ATOMIC means the caller cannot enter reclaim or sleep, so it
> avoids the risk of deadlock when zram->init_lock is acquired again
> in zram_bvec_write().

Yes, I am very much aware of how gfp allocation flags work and why
this is broken. It was a rhetorical question as to why you think you
could get away with just fixing one of them.

>> 1) file_name = kmalloc(PATH_MAX, GFP_KERNEL); does not
>
> There is no zram->init_lock held here, so there is no need to use
> GFP_ATOMIC.

True

>> 2) filp_open() -> getname_kernel() -> __getname() does not
>> 3) filp_open() -> getname_kernel() does not
>> 4) bdev_open_by_dev() does not
>
> These are missing the GFP_ATOMIC treatment.

Indeed!

>> IOW, you have a slew of GFP_KERNEL allocations in there, and you
>> probably just patched the largest one. But the core issue remains.
>>
>> The whole handling of backing_dev_store() looks pretty broken.
>>
>
> Indeed, this patch only addresses the biggest problem; it does not fix
> the issue fundamentally, because several paths in backing_dev_store()
> allocate memory while holding zram->init_lock and would all need to be
> modified, which I did not consider thoroughly. Obviously, a larger and
> better patch is needed to eliminate this risk, but it does not seem
> necessary at the moment.

You agree that it doesn't fix the issue; it just happens to fix the
one that you hit. And then you jump to the conclusion that this is all
that's needed to fix it. Ehm, confused?

--
Jens Axboe

2023-12-01 15:28:50

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

On (23/12/01 07:19), Jens Axboe wrote:
> >> IOW, you have a slew of GFP_KERNEL allocations in there, and you
> >> probably just patched the largest one. But the core issue remains.
> >>
> >> The whole handling of backing_dev_store() looks pretty broken.
> >>
> >
> > Indeed, this patch only addresses the biggest problem; it does not
> > fix the issue fundamentally, because several paths in
> > backing_dev_store() allocate memory while holding zram->init_lock
> > and would all need to be modified, which I did not consider
> > thoroughly. Obviously, a larger and better patch is needed to
> > eliminate this risk, but it does not seem necessary at the moment.
>
> You agree that it doesn't fix the issue; it just happens to fix the
> one that you hit. And then you jump to the conclusion that this is all
> that's needed to fix it. Ehm, confused?

Yeah.

zram is very sensitive to memory - zsmalloc pool needs physical pages
(allocated on demand) to keep data written to zram. zram probably simply
should not be used on systems under such heavy memory pressure. Throwing
GFP_ATOMICs at zram isn't going to fix anything.

Jens, you said that zram's backing device handling is broken. Got a minute
to elaborate?

2023-12-01 15:40:51

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

On (23/11/30 23:20), Dongyun Liu wrote:
> INFO: task init:331 blocked for more than 120 seconds. "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000000
> Call trace:
> __switch_to+0x244/0x4e4
> __schedule+0x5bc/0xc48
> schedule+0x80/0x164
> rwsem_down_read_slowpath+0x4fc/0xf9c
> __down_read+0x140/0x188
> down_read+0x14/0x24
> try_wakeup_wbd_thread+0x78/0x1ec [zram]
> __zram_bvec_write+0x720/0x878 [zram]
> zram_bvec_rw+0xa8/0x234 [zram]
> zram_submit_bio+0x16c/0x268 [zram]
> submit_bio_noacct+0x128/0x3c8
> submit_bio+0x1cc/0x3d0
> __swap_writepage+0x5c4/0xd4c
> swap_writepage+0x130/0x158
> pageout+0x1f4/0x478
> shrink_page_list+0x9b4/0x1eb8
> shrink_inactive_list+0x2f4/0xaa8
> shrink_lruvec+0x184/0x340
> shrink_node_memcgs+0x84/0x3a0
> shrink_node+0x2c4/0x6c4
> shrink_zones+0x16c/0x29c
> do_try_to_free_pages+0xe4/0x2b4
> try_to_free_pages+0x388/0x7b4
> __alloc_pages_direct_reclaim+0x88/0x278
> __alloc_pages_slowpath+0x4ec/0xf6c
> __alloc_pages_nodemask+0x1f4/0x3dc
> kmalloc_order+0x54/0x338
> kmalloc_order_trace+0x34/0x1bc
> __kmalloc+0x5e8/0x9c0
> kvmalloc_node+0xa8/0x264
> backing_dev_store+0x1a4/0x818 [zram]
> dev_attr_store+0x38/0x8c
> sysfs_kf_write+0x64/0xc4

Hmm, I'm not really following this backtrace. Backing device
configuration is only possible on an un-initialized zram device.
If it's uninitialized, then why is it being used for swapout
later in the call stack?

2023-12-01 18:41:54

by Minchan Kim

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

On Thu, Nov 30, 2023 at 11:20:47PM +0800, Dongyun Liu wrote:
> We encountered a deadlock issue when using zram on kernel version 5.10.
> The reason is that backing_dev_store() first takes zram->init_lock and
> then allocates memory with kvmalloc_node(). At that moment the system
> may be short of memory, so the allocation enters direct reclaim and
> starts swapping pages out. In zram_bvec_write() we trigger our own
> thread to move data from zram to the backing device to free up more
> available memory; this is the work done in try_wakeup_zram_wbd(). That
> function acquires zram->init_lock again to read zram->bd_wb_limit,
> which causes the deadlock shown in the following call trace.

Isn't this your custom feature rather than upstream code?

>
> The latest version of Linux does not have this bug, but it carries a
> potential risk for the future: if we try to acquire zram->init_lock
> again on the zram I/O path (for example, to read zram->bd_wb_limit),
> there is a risk of deadlock. The bug is quite elusive; it showed up
> only with low

It would be very helpful if you could show how upstream zram can cause
the deadlock, rather than your custom feature.

> probability after a week-long stress test on 50 devices. Therefore, I
> suggest passing the GFP_ATOMIC flag for this allocation in
> backing_dev_store(), to avoid issues like the one we encountered.
> Please consider it, thank you.
>
> INFO: task init:331 blocked for more than 120 seconds. "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000000
> Call trace:
> __switch_to+0x244/0x4e4
> __schedule+0x5bc/0xc48
> schedule+0x80/0x164
> rwsem_down_read_slowpath+0x4fc/0xf9c
> __down_read+0x140/0x188
> down_read+0x14/0x24
> try_wakeup_wbd_thread+0x78/0x1ec [zram]
> __zram_bvec_write+0x720/0x878 [zram]
> zram_bvec_rw+0xa8/0x234 [zram]
> zram_submit_bio+0x16c/0x268 [zram]
> submit_bio_noacct+0x128/0x3c8
> submit_bio+0x1cc/0x3d0
> __swap_writepage+0x5c4/0xd4c
> swap_writepage+0x130/0x158
> pageout+0x1f4/0x478
> shrink_page_list+0x9b4/0x1eb8
> shrink_inactive_list+0x2f4/0xaa8
> shrink_lruvec+0x184/0x340
> shrink_node_memcgs+0x84/0x3a0
> shrink_node+0x2c4/0x6c4
> shrink_zones+0x16c/0x29c
> do_try_to_free_pages+0xe4/0x2b4
> try_to_free_pages+0x388/0x7b4
> __alloc_pages_direct_reclaim+0x88/0x278
> __alloc_pages_slowpath+0x4ec/0xf6c
> __alloc_pages_nodemask+0x1f4/0x3dc
> kmalloc_order+0x54/0x338
> kmalloc_order_trace+0x34/0x1bc
> __kmalloc+0x5e8/0x9c0
> kvmalloc_node+0xa8/0x264
> backing_dev_store+0x1a4/0x818 [zram]
> dev_attr_store+0x38/0x8c
> sysfs_kf_write+0x64/0xc4
> kernfs_fop_write_iter+0x168/0x2ac
> vfs_write+0x300/0x37c
> ksys_write+0x7c/0xf0
> __arm64_sys_write+0x20/0x30
> el0_svc_common+0xd4/0x270
> el0_svc+0x28/0x98
> el0_sync_handler+0x8c/0xf0
> el0_sync+0x1b4/0x1c0
>
> Signed-off-by: Dongyun Liu <[email protected]>
> ---
> drivers/block/zram/zram_drv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index d77d3664ca08..ee6c22c50e09 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -514,7 +514,7 @@ static ssize_t backing_dev_store(struct device *dev,
>
>  	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
>  	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
> -	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
> +	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
>  	if (!bitmap) {
>  		err = -ENOMEM;
>  		goto out;
> --
> 2.25.1
>

2023-12-02 13:54:58

by Dongyun Liu

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store



On 2023/12/1 22:19, Jens Axboe wrote:
> On 11/30/23 11:51 PM, Dongyun Liu wrote:
>>
>>
>> On 2023/11/30 23:37, Jens Axboe wrote:
>>> On 11/30/23 8:20 AM, Dongyun Liu wrote:
>>>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>> index d77d3664ca08..ee6c22c50e09 100644
>>>> --- a/drivers/block/zram/zram_drv.c
>>>> +++ b/drivers/block/zram/zram_drv.c
>>>> @@ -514,7 +514,7 @@ static ssize_t backing_dev_store(struct device *dev,
>>>>  	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
>>>>  	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
>>>> -	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
>>>> +	bitmap = kmalloc(bitmap_sz, GFP_ATOMIC);
>>>>  	if (!bitmap) {
>>>>  		err = -ENOMEM;
>>>>  		goto out;
>>>
>>> Setting aside that this moves from a zeroed alloc to one that does
>>> not zero, the change looks woefully incomplete. Why does this
>>> allocation need to be GFP_ATOMIC, and:
>>
>> GFP_ATOMIC means the caller cannot enter reclaim or sleep, so it
>> avoids the risk of deadlock when zram->init_lock is acquired again
>> in zram_bvec_write().
>
> Yes, I am very much aware of how gfp allocation flags work and why
> this is broken. It was a rhetorical question as to why you think you
> could get away with just fixing one of them.
>
>>> 1) file_name = kmalloc(PATH_MAX, GFP_KERNEL); does not
>>
>> There is no zram->init_lock held here, so there is no need to use
>> GFP_ATOMIC.
>
> True
>
>>> 2) filp_open() -> getname_kernel() -> __getname() does not
>>> 3) filp_open() -> getname_kernel() does not
>>> 4) bdev_open_by_dev() does not
>>
>> These are missing the GFP_ATOMIC treatment.
>
> Indeed!
>
>>> IOW, you have a slew of GFP_KERNEL allocations in there, and you
>>> probably just patched the largest one. But the core issue remains.
>>>
>>> The whole handling of backing_dev_store() looks pretty broken.
>>>
>>
>> Indeed, this patch only addresses the biggest problem; it does not
>> fix the issue fundamentally, because several paths in
>> backing_dev_store() allocate memory while holding zram->init_lock
>> and would all need to be modified, which I did not consider
>> thoroughly. Obviously, a larger and better patch is needed to
>> eliminate this risk, but it does not seem necessary at the moment.
>
> You agree that it doesn't fix the issue; it just happens to fix the
> one that you hit. And then you jump to the conclusion that this is all
> that's needed to fix it. Ehm, confused?
>

Hi Jens, maybe there was something wrong with my wording. Put it this
way: I agree with you that it doesn't fix the issue.

2023-12-02 15:50:48

by Dongyun Liu

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store



On 2023/12/1 23:39, Sergey Senozhatsky wrote:
> On (23/11/30 23:20), Dongyun Liu wrote:
>> INFO: task init:331 blocked for more than 120 seconds. "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000000
>> Call trace:
>> __switch_to+0x244/0x4e4
>> __schedule+0x5bc/0xc48
>> schedule+0x80/0x164
>> rwsem_down_read_slowpath+0x4fc/0xf9c
>> __down_read+0x140/0x188
>> down_read+0x14/0x24
>> try_wakeup_wbd_thread+0x78/0x1ec [zram]
>> __zram_bvec_write+0x720/0x878 [zram]
>> zram_bvec_rw+0xa8/0x234 [zram]
>> zram_submit_bio+0x16c/0x268 [zram]
>> submit_bio_noacct+0x128/0x3c8
>> submit_bio+0x1cc/0x3d0
>> __swap_writepage+0x5c4/0xd4c
>> swap_writepage+0x130/0x158
>> pageout+0x1f4/0x478
>> shrink_page_list+0x9b4/0x1eb8
>> shrink_inactive_list+0x2f4/0xaa8
>> shrink_lruvec+0x184/0x340
>> shrink_node_memcgs+0x84/0x3a0
>> shrink_node+0x2c4/0x6c4
>> shrink_zones+0x16c/0x29c
>> do_try_to_free_pages+0xe4/0x2b4
>> try_to_free_pages+0x388/0x7b4
>> __alloc_pages_direct_reclaim+0x88/0x278
>> __alloc_pages_slowpath+0x4ec/0xf6c
>> __alloc_pages_nodemask+0x1f4/0x3dc
>> kmalloc_order+0x54/0x338
>> kmalloc_order_trace+0x34/0x1bc
>> __kmalloc+0x5e8/0x9c0
>> kvmalloc_node+0xa8/0x264
>> backing_dev_store+0x1a4/0x818 [zram]
>> dev_attr_store+0x38/0x8c
>> sysfs_kf_write+0x64/0xc4
>
> Hmm, I'm not really following this backtrace. Backing device
> configuration is only possible on an un-initialized zram device.
> If it's uninitialized, then why is it being used for swapout
> later in the call stack?

Uh, at this moment zram has finished initializing and is working. The
backing device is an optional feature of zram; I think it can be set
up later.

2023-12-03 01:23:37

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store

On (23/12/02 23:50), Dongyun Liu wrote:
> On 2023/12/1 23:39, Sergey Senozhatsky wrote:
> > On (23/11/30 23:20), Dongyun Liu wrote:
> > > INFO: task init:331 blocked for more than 120 seconds. "echo 0 >
> > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000000
> > > Call trace:
> > > __switch_to+0x244/0x4e4
> > > __schedule+0x5bc/0xc48
> > > schedule+0x80/0x164
> > > rwsem_down_read_slowpath+0x4fc/0xf9c
> > > __down_read+0x140/0x188
> > > down_read+0x14/0x24
> > > try_wakeup_wbd_thread+0x78/0x1ec [zram]
> > > __zram_bvec_write+0x720/0x878 [zram]
> > > zram_bvec_rw+0xa8/0x234 [zram]
> > > zram_submit_bio+0x16c/0x268 [zram]
> > > submit_bio_noacct+0x128/0x3c8
> > > submit_bio+0x1cc/0x3d0
> > > __swap_writepage+0x5c4/0xd4c
> > > swap_writepage+0x130/0x158
> > > pageout+0x1f4/0x478
> > > shrink_page_list+0x9b4/0x1eb8
> > > shrink_inactive_list+0x2f4/0xaa8
> > > shrink_lruvec+0x184/0x340
> > > shrink_node_memcgs+0x84/0x3a0
> > > shrink_node+0x2c4/0x6c4
> > > shrink_zones+0x16c/0x29c
> > > do_try_to_free_pages+0xe4/0x2b4
> > > try_to_free_pages+0x388/0x7b4
> > > __alloc_pages_direct_reclaim+0x88/0x278
> > > __alloc_pages_slowpath+0x4ec/0xf6c
> > > __alloc_pages_nodemask+0x1f4/0x3dc
> > > kmalloc_order+0x54/0x338
> > > kmalloc_order_trace+0x34/0x1bc
> > > __kmalloc+0x5e8/0x9c0
> > > kvmalloc_node+0xa8/0x264
> > > backing_dev_store+0x1a4/0x818 [zram]
> > > dev_attr_store+0x38/0x8c
> > > sysfs_kf_write+0x64/0xc4
> >
> > Hmm, I'm not really following this backtrace. Backing device
> > configuration is only possible on an un-initialized zram device.
> > If it's uninitialized, then why is it being used for swapout
> > later in the call stack?
>
> Uh, at this moment zram has finished initializing and is working. The
> backing device is an optional feature of zram; I think it can be set
> up later.

backing_dev_store() can't be called on an initialized device; that's
what the init_done() check at the very beginning of
backing_dev_store() is supposed to ensure:

...
	down_write(&zram->init_lock);
	if (init_done(zram)) {
		pr_info("Can't setup backing device for initialized device\n");
		err = -EBUSY;
		goto out;
	}
...

2023-12-03 15:06:07

by Dongyun Liu

Subject: Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store



On 2023/12/3 9:23, Sergey Senozhatsky wrote:
> On (23/12/02 23:50), Dongyun Liu wrote:
>> On 2023/12/1 23:39, Sergey Senozhatsky wrote:
>>> On (23/11/30 23:20), Dongyun Liu wrote:
>>>> INFO: task init:331 blocked for more than 120 seconds. "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000000
>>>> Call trace:
>>>> __switch_to+0x244/0x4e4
>>>> __schedule+0x5bc/0xc48
>>>> schedule+0x80/0x164
>>>> rwsem_down_read_slowpath+0x4fc/0xf9c
>>>> __down_read+0x140/0x188
>>>> down_read+0x14/0x24
>>>> try_wakeup_wbd_thread+0x78/0x1ec [zram]
>>>> __zram_bvec_write+0x720/0x878 [zram]
>>>> zram_bvec_rw+0xa8/0x234 [zram]
>>>> zram_submit_bio+0x16c/0x268 [zram]
>>>> submit_bio_noacct+0x128/0x3c8
>>>> submit_bio+0x1cc/0x3d0
>>>> __swap_writepage+0x5c4/0xd4c
>>>> swap_writepage+0x130/0x158
>>>> pageout+0x1f4/0x478
>>>> shrink_page_list+0x9b4/0x1eb8
>>>> shrink_inactive_list+0x2f4/0xaa8
>>>> shrink_lruvec+0x184/0x340
>>>> shrink_node_memcgs+0x84/0x3a0
>>>> shrink_node+0x2c4/0x6c4
>>>> shrink_zones+0x16c/0x29c
>>>> do_try_to_free_pages+0xe4/0x2b4
>>>> try_to_free_pages+0x388/0x7b4
>>>> __alloc_pages_direct_reclaim+0x88/0x278
>>>> __alloc_pages_slowpath+0x4ec/0xf6c
>>>> __alloc_pages_nodemask+0x1f4/0x3dc
>>>> kmalloc_order+0x54/0x338
>>>> kmalloc_order_trace+0x34/0x1bc
>>>> __kmalloc+0x5e8/0x9c0
>>>> kvmalloc_node+0xa8/0x264
>>>> backing_dev_store+0x1a4/0x818 [zram]
>>>> dev_attr_store+0x38/0x8c
>>>> sysfs_kf_write+0x64/0xc4
>>>
>>> Hmm, I'm not really following this backtrace. Backing device
>>> configuration is only possible on an un-initialized zram device.
>>> If it's uninitialized, then why is it being used for swapout
>>> later in the call stack?
>>
>> Uh, at this moment zram has finished initializing and is working. The
>> backing device is an optional feature of zram; I think it can be set
>> up later.
>
> backing_dev_store() can't be called on an initialized device; that's
> what the init_done() check at the very beginning of
> backing_dev_store() is supposed to ensure:
>

We have modified the logic here to allow backing_dev_store() to be
called multiple times; that is where the call trace comes from. On the
other hand, I realize I should show the upstream call stack instead of
my custom feature next time. Thank you for your patience.

> ...
> 	down_write(&zram->init_lock);
> 	if (init_done(zram)) {
> 		pr_info("Can't setup backing device for initialized device\n");
> 		err = -EBUSY;
> 		goto out;
> 	}
> ...
> ...