2012-11-21 08:37:31

by Minchan Kim

Subject: Lockdep complaint for zram

Hi all,

Today I saw the lockdep complaint below.
As a matter of fact, I noticed it a long time ago but then forgot about it.
The reason lockdep complains is that zram currently uses GFP_KERNEL
in the reclaim path (e.g. __zram_make_request) :(
I can fix that by replacing GFP_KERNEL with GFP_NOIO.
But the bigger problem is the vzalloc in zram_init_device, which allocates
with GFP_KERNEL.
Of course, I could change it to __vmalloc, which takes a gfp_t.
But we still have a problem: although __vmalloc accepts a gfp_t, it still
performs GFP_KERNEL allocations internally. That's why I sent this patch:
https://lkml.org/lkml/2012/4/23/77
Since then I forgot about it, saw the bug again today, and the question
popped up again.

Yes, the fundamental problem is that vmalloc is an utter crap API.
If we could fix it, everyone would be happy. But life isn't that simple,
as the thread of my patch shows.
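
For illustration, a minimal sketch of the substitution I mean (assuming the
table allocation in zram_init_device() looks roughly like the vzalloc() call
below; this is not a posted patch):

	/* today: vzalloc() always allocates with GFP_KERNEL */
	zram->table = vzalloc(num_pages * sizeof(*zram->table));

	/* __vmalloc() at least lets us pass a gfp mask for the pages... */
	zram->table = __vmalloc(num_pages * sizeof(*zram->table),
				GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO,
				PAGE_KERNEL);
	/* ...but the page-table allocations inside __vmalloc() are still
	 * hard-coded to GFP_KERNEL, which is what the patch linked above
	 * tried to address. */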

So the next option is to move zram_init_device back to disksize setting time.
But that wastes metadata memory until zram is actually used (which is why
Nitin moved zram_init_device from disksize setting time into make_request),
and it means the user must set the disksize before using the device, which
is a behavior change.

I would like to clean up this issue before promoting zram out of staging,
because it might change the usage behavior.

Do you have any idea?

============ 8< ==============


[ 335.772277] =================================
[ 335.772615] [ INFO: inconsistent lock state ]
[ 335.772955] 3.7.0-rc6 #162 Tainted: G C
[ 335.773320] ---------------------------------
[ 335.773663] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[ 335.774170] kswapd0/23 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 335.774564] (&zram->init_lock){+++++-}, at: [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
[ 335.775321] {RECLAIM_FS-ON-W} state was registered at:
[ 335.775716] [<ffffffff81099532>] mark_held_locks+0x82/0x130
[ 335.776009] [<ffffffff81099c47>] lockdep_trace_alloc+0x67/0xc0
[ 335.776009] [<ffffffff811189d4>] __alloc_pages_nodemask+0x94/0xa00
[ 335.776009] [<ffffffff81152806>] alloc_pages_current+0xb6/0x120
[ 335.776009] [<ffffffff811143d4>] __get_free_pages+0x14/0x50
[ 335.776009] [<ffffffff81157dff>] kmalloc_order_trace+0x3f/0xf0
[ 335.776009] [<ffffffffa0009b1b>] zram_init_device+0x7b/0x220 [zram]
[ 335.776009] [<ffffffffa0009f0a>] zram_make_request+0x24a/0x260 [zram]
[ 335.776009] [<ffffffff81299d8a>] generic_make_request+0xca/0x100
[ 335.776009] [<ffffffff81299e3b>] submit_bio+0x7b/0x160
[ 335.776009] [<ffffffff81197b92>] submit_bh+0xf2/0x120
[ 335.776009] [<ffffffff8119b855>] block_read_full_page+0x235/0x3a0
[ 335.776009] [<ffffffff8119efc8>] blkdev_readpage+0x18/0x20
[ 335.776009] [<ffffffff8111c107>] __do_page_cache_readahead+0x2c7/0x2d0
[ 335.776009] [<ffffffff8111c249>] force_page_cache_readahead+0x79/0xb0
[ 335.776009] [<ffffffff8111c683>] page_cache_sync_readahead+0x43/0x50
[ 335.776009] [<ffffffff81111490>] generic_file_aio_read+0x4f0/0x760
[ 335.776009] [<ffffffff8119ffbb>] blkdev_aio_read+0xbb/0xf0
[ 335.776009] [<ffffffff811661f3>] do_sync_read+0xa3/0xe0
[ 335.776009] [<ffffffff81166910>] vfs_read+0xb0/0x180
[ 335.776009] [<ffffffff81166a32>] sys_read+0x52/0xa0
[ 335.776009] [<ffffffff8155a982>] system_call_fastpath+0x16/0x1b
[ 335.776009] irq event stamp: 97589
[ 335.776009] hardirqs last enabled at (97589): [<ffffffff812b3374>] throtl_update_dispatch_stats+0x94/0xf0
[ 335.776009] hardirqs last disabled at (97588): [<ffffffff812b332d>] throtl_update_dispatch_stats+0x4d/0xf0
[ 335.776009] softirqs last enabled at (67416): [<ffffffff81046c59>] __do_softirq+0x139/0x280
[ 335.776009] softirqs last disabled at (67395): [<ffffffff8155bb0c>] call_softirq+0x1c/0x30
[ 335.776009]
[ 335.776009] other info that might help us debug this:
[ 335.776009] Possible unsafe locking scenario:
[ 335.776009]
[ 335.776009] CPU0
[ 335.776009] ----
[ 335.776009] lock(&zram->init_lock);
[ 335.776009] <Interrupt>
[ 335.776009] lock(&zram->init_lock);
[ 335.776009]
[ 335.776009] *** DEADLOCK ***
[ 335.776009]
[ 335.776009] no locks held by kswapd0/23.
[ 335.776009]
[ 335.776009] stack backtrace:
[ 335.776009] Pid: 23, comm: kswapd0 Tainted: G C 3.7.0-rc6 #162
[ 335.776009] Call Trace:
[ 335.776009] [<ffffffff81547ff7>] print_usage_bug+0x1f5/0x206
[ 335.776009] [<ffffffff8100f90f>] ? save_stack_trace+0x2f/0x50
[ 335.776009] [<ffffffff81096a65>] mark_lock+0x295/0x2f0
[ 335.776009] [<ffffffff81095e90>] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0
[ 335.776009] [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
[ 335.776009] [<ffffffff81097024>] __lock_acquire+0x564/0x1c00
[ 335.776009] [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 335.776009] [<ffffffff812b4e32>] ? blk_throtl_bio+0x3c2/0x630
[ 335.776009] [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
[ 335.776009] [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
[ 335.776009] [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
[ 335.776009] [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
[ 335.776009] [<ffffffff81098c85>] lock_acquire+0x85/0x130
[ 335.776009] [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
[ 335.776009] [<ffffffff8154f87c>] down_read+0x4c/0x61
[ 335.776009] [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
[ 335.776009] [<ffffffff81299ac2>] ? generic_make_request_checks+0x222/0x420
[ 335.776009] [<ffffffff8111a4fe>] ? test_set_page_writeback+0x6e/0x1a0
[ 335.776009] [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
[ 335.776009] [<ffffffff81299d8a>] generic_make_request+0xca/0x100
[ 335.776009] [<ffffffff81299e3b>] submit_bio+0x7b/0x160
[ 335.776009] [<ffffffff81119423>] ? account_page_writeback+0x13/0x20
[ 335.776009] [<ffffffff8111a585>] ? test_set_page_writeback+0xf5/0x1a0
[ 335.776009] [<ffffffff81148aa9>] swap_writepage+0x1b9/0x240
[ 335.776009] [<ffffffff81149bf5>] ? __swap_duplicate+0x65/0x170
[ 335.776009] [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
[ 335.776009] [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
[ 335.776009] [<ffffffff8154f411>] ? __mutex_unlock_slowpath+0xd1/0x160
[ 335.776009] [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 335.776009] [<ffffffff8109977d>] ? trace_hardirqs_on+0xd/0x10
[ 335.776009] [<ffffffff81127195>] shmem_writepage+0x1b5/0x2f0
[ 335.776009] [<ffffffff81122b96>] shrink_page_list+0x516/0x9a0
[ 335.776009] [<ffffffff8111ca20>] ? __activate_page+0x150/0x150
[ 335.776009] [<ffffffff811235c7>] shrink_inactive_list+0x1f7/0x3f0
[ 335.776009] [<ffffffff81123f05>] shrink_lruvec+0x435/0x540
[ 335.776009] [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
[ 335.776009] [<ffffffff81125093>] kswapd+0x703/0xc80
[ 335.776009] [<ffffffff81551dd0>] ? _raw_spin_unlock_irq+0x30/0x50
[ 335.776009] [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
[ 335.776009] [<ffffffff81124990>] ? try_to_free_pages+0x6f0/0x6f0
[ 335.776009] [<ffffffff810646ea>] kthread+0xea/0xf0
[ 335.776009] [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0
[ 335.776009] [<ffffffff8155a8dc>] ret_from_fork+0x7c/0xb0
[ 335.776009] [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0


--
Kind regards,
Minchan Kim


2012-11-21 18:06:39

by Nitin Gupta

Subject: Re: Lockdep complaint for zram

On 11/21/2012 12:37 AM, Minchan Kim wrote:
> Hi alls,
>
> Today, I saw below complain of lockdep.
> As a matter of fact, I knew it long time ago but forgot that.
> The reason lockdep complains is that now zram uses GFP_KERNEL
> in reclaim path(ex, __zram_make_request) :(
> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> Of course, I can change it with __vmalloc which can receive gfp_t.
> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> allocation of GFP_KERNEL. That's why I sent the patch.
> https://lkml.org/lkml/2012/4/23/77
> Since then, I forgot it, saw the bug today and poped the question again.
>
> Yes. Fundamental problem is utter crap API vmalloc.
> If we can fix it, everyone would be happy. But life isn't simple like seeing
> my thread of the patch.
>
> So next option is to move zram_init_device into setting disksize time.
> But it makes unnecessary metadata waste until zram is used really(That's why
> Nitin move zram_init_device from disksize setting time to make_request) and
> it makes user should set the disksize before using, which are behavior change.
>
> I would like to clean up this issue before promoting because it might change
> usage behavior.
>
> Do you have any idea?
>

Maybe we can alloc_vm_area() right at device creation in create_device(),
assuming the default disksize. If the user explicitly sets the disksize,
this VM area is deallocated and a new one is allocated based on the new
disksize. When the device is reset, we should free only the physical pages
allocated for the table, and the virtual area should be set back as if the
disksize were set to the default.

At device init time, all the pages can be allocated with GFP_NOIO |
__GFP_HIGHMEM, and since the VM area is preallocated, map_vm_area() will
not hit any of those page-table allocations with GFP_KERNEL.

Other allocations made directly from zram, for instance in the partial
I/O case, should also be changed to GFP_NOIO | __GFP_HIGHMEM.
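
A rough sketch of the idea (not a tested patch; table_bytes, table_pages,
nr_table_pages and zram->table_vm are names made up here for illustration):

	/* at device creation / disksize change: reserve virtual space only.
	 * alloc_vm_area() populates the kernel page tables for this range,
	 * possibly with GFP_KERNEL, but we are in process context here. */
	zram->table_vm = alloc_vm_area(table_bytes, NULL);

	/* later, at init time (which may be reached from reclaim):
	 * only physical pages are needed, so GFP_NOIO is enough */
	for (i = 0; i < nr_table_pages; i++)
		table_pages[i] = alloc_page(GFP_NOIO | __GFP_HIGHMEM |
					    __GFP_ZERO);

	tmp = table_pages;	/* map_vm_area() advances the pointer */
	err = map_vm_area(zram->table_vm, PAGE_KERNEL, &tmp);
	zram->table = zram->table_vm->addr;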

Thanks,
Nitin

> ============ 8< ==============
>
>
> [ 335.772277] =================================
> [ 335.772615] [ INFO: inconsistent lock state ]
> [ 335.772955] 3.7.0-rc6 #162 Tainted: G C
> [ 335.773320] ---------------------------------
> [ 335.773663] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> [ 335.774170] kswapd0/23 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 335.774564] (&zram->init_lock){+++++-}, at: [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
> [ 335.775321] {RECLAIM_FS-ON-W} state was registered at:
> [ 335.775716] [<ffffffff81099532>] mark_held_locks+0x82/0x130
> [ 335.776009] [<ffffffff81099c47>] lockdep_trace_alloc+0x67/0xc0
> [ 335.776009] [<ffffffff811189d4>] __alloc_pages_nodemask+0x94/0xa00
> [ 335.776009] [<ffffffff81152806>] alloc_pages_current+0xb6/0x120
> [ 335.776009] [<ffffffff811143d4>] __get_free_pages+0x14/0x50
> [ 335.776009] [<ffffffff81157dff>] kmalloc_order_trace+0x3f/0xf0
> [ 335.776009] [<ffffffffa0009b1b>] zram_init_device+0x7b/0x220 [zram]
> [ 335.776009] [<ffffffffa0009f0a>] zram_make_request+0x24a/0x260 [zram]
> [ 335.776009] [<ffffffff81299d8a>] generic_make_request+0xca/0x100
> [ 335.776009] [<ffffffff81299e3b>] submit_bio+0x7b/0x160
> [ 335.776009] [<ffffffff81197b92>] submit_bh+0xf2/0x120
> [ 335.776009] [<ffffffff8119b855>] block_read_full_page+0x235/0x3a0
> [ 335.776009] [<ffffffff8119efc8>] blkdev_readpage+0x18/0x20
> [ 335.776009] [<ffffffff8111c107>] __do_page_cache_readahead+0x2c7/0x2d0
> [ 335.776009] [<ffffffff8111c249>] force_page_cache_readahead+0x79/0xb0
> [ 335.776009] [<ffffffff8111c683>] page_cache_sync_readahead+0x43/0x50
> [ 335.776009] [<ffffffff81111490>] generic_file_aio_read+0x4f0/0x760
> [ 335.776009] [<ffffffff8119ffbb>] blkdev_aio_read+0xbb/0xf0
> [ 335.776009] [<ffffffff811661f3>] do_sync_read+0xa3/0xe0
> [ 335.776009] [<ffffffff81166910>] vfs_read+0xb0/0x180
> [ 335.776009] [<ffffffff81166a32>] sys_read+0x52/0xa0
> [ 335.776009] [<ffffffff8155a982>] system_call_fastpath+0x16/0x1b
> [ 335.776009] irq event stamp: 97589
> [ 335.776009] hardirqs last enabled at (97589): [<ffffffff812b3374>] throtl_update_dispatch_stats+0x94/0xf0
> [ 335.776009] hardirqs last disabled at (97588): [<ffffffff812b332d>] throtl_update_dispatch_stats+0x4d/0xf0
> [ 335.776009] softirqs last enabled at (67416): [<ffffffff81046c59>] __do_softirq+0x139/0x280
> [ 335.776009] softirqs last disabled at (67395): [<ffffffff8155bb0c>] call_softirq+0x1c/0x30
> [ 335.776009]
> [ 335.776009] other info that might help us debug this:
> [ 335.776009] Possible unsafe locking scenario:
> [ 335.776009]
> [ 335.776009] CPU0
> [ 335.776009] ----
> [ 335.776009] lock(&zram->init_lock);
> [ 335.776009] <Interrupt>
> [ 335.776009] lock(&zram->init_lock);
> [ 335.776009]
> [ 335.776009] *** DEADLOCK ***
> [ 335.776009]
> [ 335.776009] no locks held by kswapd0/23.
> [ 335.776009]
> [ 335.776009] stack backtrace:
> [ 335.776009] Pid: 23, comm: kswapd0 Tainted: G C 3.7.0-rc6 #162
> [ 335.776009] Call Trace:
> [ 335.776009] [<ffffffff81547ff7>] print_usage_bug+0x1f5/0x206
> [ 335.776009] [<ffffffff8100f90f>] ? save_stack_trace+0x2f/0x50
> [ 335.776009] [<ffffffff81096a65>] mark_lock+0x295/0x2f0
> [ 335.776009] [<ffffffff81095e90>] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0
> [ 335.776009] [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
> [ 335.776009] [<ffffffff81097024>] __lock_acquire+0x564/0x1c00
> [ 335.776009] [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
> [ 335.776009] [<ffffffff812b4e32>] ? blk_throtl_bio+0x3c2/0x630
> [ 335.776009] [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
> [ 335.776009] [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
> [ 335.776009] [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
> [ 335.776009] [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
> [ 335.776009] [<ffffffff81098c85>] lock_acquire+0x85/0x130
> [ 335.776009] [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
> [ 335.776009] [<ffffffff8154f87c>] down_read+0x4c/0x61
> [ 335.776009] [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
> [ 335.776009] [<ffffffff81299ac2>] ? generic_make_request_checks+0x222/0x420
> [ 335.776009] [<ffffffff8111a4fe>] ? test_set_page_writeback+0x6e/0x1a0
> [ 335.776009] [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
> [ 335.776009] [<ffffffff81299d8a>] generic_make_request+0xca/0x100
> [ 335.776009] [<ffffffff81299e3b>] submit_bio+0x7b/0x160
> [ 335.776009] [<ffffffff81119423>] ? account_page_writeback+0x13/0x20
> [ 335.776009] [<ffffffff8111a585>] ? test_set_page_writeback+0xf5/0x1a0
> [ 335.776009] [<ffffffff81148aa9>] swap_writepage+0x1b9/0x240
> [ 335.776009] [<ffffffff81149bf5>] ? __swap_duplicate+0x65/0x170
> [ 335.776009] [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
> [ 335.776009] [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
> [ 335.776009] [<ffffffff8154f411>] ? __mutex_unlock_slowpath+0xd1/0x160
> [ 335.776009] [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
> [ 335.776009] [<ffffffff8109977d>] ? trace_hardirqs_on+0xd/0x10
> [ 335.776009] [<ffffffff81127195>] shmem_writepage+0x1b5/0x2f0
> [ 335.776009] [<ffffffff81122b96>] shrink_page_list+0x516/0x9a0
> [ 335.776009] [<ffffffff8111ca20>] ? __activate_page+0x150/0x150
> [ 335.776009] [<ffffffff811235c7>] shrink_inactive_list+0x1f7/0x3f0
> [ 335.776009] [<ffffffff81123f05>] shrink_lruvec+0x435/0x540
> [ 335.776009] [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
> [ 335.776009] [<ffffffff81125093>] kswapd+0x703/0xc80
> [ 335.776009] [<ffffffff81551dd0>] ? _raw_spin_unlock_irq+0x30/0x50
> [ 335.776009] [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
> [ 335.776009] [<ffffffff81124990>] ? try_to_free_pages+0x6f0/0x6f0
> [ 335.776009] [<ffffffff810646ea>] kthread+0xea/0xf0
> [ 335.776009] [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0
> [ 335.776009] [<ffffffff8155a8dc>] ret_from_fork+0x7c/0xb0
> [ 335.776009] [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0
>
>

2012-11-22 19:16:36

by Nitin Gupta

Subject: Re: Lockdep complaint for zram

On 11/22/2012 12:31 AM, Minchan Kim wrote:
> Hi Nitin,
>
> On Wed, Nov 21, 2012 at 10:06:33AM -0800, Nitin Gupta wrote:
>> On 11/21/2012 12:37 AM, Minchan Kim wrote:
>>> Hi alls,
>>>
>>> Today, I saw below complain of lockdep.
>>> As a matter of fact, I knew it long time ago but forgot that.
>>> The reason lockdep complains is that now zram uses GFP_KERNEL
>>> in reclaim path(ex, __zram_make_request) :(
>>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
>>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
>>> Of course, I can change it with __vmalloc which can receive gfp_t.
>>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
>>> allocation of GFP_KERNEL. That's why I sent the patch.
>>> https://lkml.org/lkml/2012/4/23/77
>>> Since then, I forgot it, saw the bug today and poped the question again.
>>>
>>> Yes. Fundamental problem is utter crap API vmalloc.
>>> If we can fix it, everyone would be happy. But life isn't simple like seeing
>>> my thread of the patch.
>>>
>>> So next option is to move zram_init_device into setting disksize time.
>>> But it makes unnecessary metadata waste until zram is used really(That's why
>>> Nitin move zram_init_device from disksize setting time to make_request) and
>>> it makes user should set the disksize before using, which are behavior change.
>>>
>>> I would like to clean up this issue before promoting because it might change
>>> usage behavior.
>>>
>>> Do you have any idea?
>>>
>>
>> Maybe we can alloc_vm_area() right on device creation in
>> create_device() assuming the default disksize. If user explicitly
>> sets the disksize, this vm area is deallocated and a new one is
>> allocated based on the new disksize. When the device is reset, we
>> should only free physical pages allocated for the table and the
>> virtual area should be set back as if disksize is set to the
>> default.
>>
>> At the device init time, all the pages can be allocated with
>> GFP_NOIO | __GPF_HIGHMEM and since the VM area is preallocated,
>> map_vm_area() will not hit any of those page-table allocations with
>> GFP_KERNEL.
>>
>> Other allocations made directly from zram, for instance in the
>> partial I/O case, should also be changed to GFP_NOIO |
>> __GFP_HIGHMEM.
>>
>
> Yes, it's a good idea, and actually I had thought about it.
> My concerns about that approach are as follows.
>
> 1) Users of zram normally run mkfs.xxx or mkswap before using
> the zram block device (normally at boot time).
> That ends up allocating the zram metadata before any real usage, so
> the benefit of the lazy initialization would be mitigated.
>
> 2) Some users want to start using zram only when memory pressure is
> high (i.e. load zram dynamically, NOT at boot time). That makes sense
> because people don't want to waste memory until memory pressure is high,
> which is exactly when zram is really helpful. In this case the lazy
> initialization could easily fail, because we would be using GFP_NOIO
> instead of GFP_KERNEL due to the swap use case.
> So the benefit of the lazy initialization would be mitigated, too.
>
> 3) The current zram documentation is wrong.
> Setting the disksize isn't optional the first time zram is used.
> Once the user has set a disksize it could be optional, but it is NOT
> optional at first use. That is very odd behavior, so I think having the
> user set the disksize before use is safer and clearer.
>
> So my suggestion is as follows.
>
> * Let's make setting the disksize a MUST before use, for consistent behavior.
> * When the user sets the disksize, let's allocate all the metadata at once.
> 4K : 12 bytes (64-bit) -> 64G : 192M, so 0.3% isn't a big overhead.
> If an insane user creates up to 20 such big zram devices, it could
> consume 6% of RAM, but the efficiency of zram will cover the waste.
> * If someone is concerned about this, let's guide them to set the
> disksize right before using zram.
>
> What do you think about it?
>

I agree. This lazy initialization has been a problem since the
beginning. So let's just force the user to set the disksize first and
document the behavior as such.

I plan to reduce struct table to just an array of handles (getting
rid of the size, flags and count fields), but for now we can keep the
existing struct table and make that change later.
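
For reference, the per-page metadata being discussed looks roughly like this
(a sketch of the current staging struct; the exact layout may differ), which
is where the 12 bytes per 4K page in the estimate above come from:

	struct table {
		unsigned long handle;	/* 8 bytes on 64-bit */
		u16 size;		/* compressed object size */
		u8 count;		/* object refcount, not really used */
		u8 flags;
	} __aligned(4);			/* ~12 bytes per 4K page of disksize */

	/* 64G disksize -> 64G / 4K = 16M entries
	 *              -> 16M * 12 bytes = 192M, i.e. ~0.3% of the disksize.
	 * A bare array of unsigned long handles would cut that to 8 bytes
	 * per entry. */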

Thanks,
Nitin


2012-11-22 19:17:41

by Jerome Marchand

Subject: Re: Lockdep complaint for zram

On 11/21/2012 09:37 AM, Minchan Kim wrote:
> Hi alls,
>
> Today, I saw below complain of lockdep.
> As a matter of fact, I knew it long time ago but forgot that.
> The reason lockdep complains is that now zram uses GFP_KERNEL
> in reclaim path(ex, __zram_make_request) :(
> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> Of course, I can change it with __vmalloc which can receive gfp_t.
> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> allocation of GFP_KERNEL. That's why I sent the patch.
> https://lkml.org/lkml/2012/4/23/77
> Since then, I forgot it, saw the bug today and poped the question again.
>
> Yes. Fundamental problem is utter crap API vmalloc.
> If we can fix it, everyone would be happy. But life isn't simple like seeing
> my thread of the patch.
>
> So next option is to move zram_init_device into setting disksize time.
> But it makes unnecessary metadata waste until zram is used really(That's why
> Nitin move zram_init_device from disksize setting time to make_request) and
> it makes user should set the disksize before using, which are behavior change.
>
> I would like to clean up this issue before promoting because it might change
> usage behavior.
>
> Do you have any idea?

This is a false positive due to the memory allocation in
zram_init_device() called from zram_make_request(). To lockdep it appears
that the allocation might trigger a request on the device that
would try to take init_lock again, but in fact it can't: the device
is not initialized yet, let alone swapped on.

The following (quickly tested) patch should prevent the lockdep complaint.

Jerome

---
From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
From: Jerome Marchand <[email protected]>
Date: Thu, 22 Nov 2012 09:07:40 +0100
Subject: [PATCH] staging: zram: Avoid lockdep warning

zram triggers a lockdep warning. The cause is the call to
zram_init_device() from zram_make_request(). The memory allocation in
zram_init_device() could start memory reclaim, which in turn could
cause swapout and (as it appears to lockdep) a call back into
zram_make_request(). However, this is a false positive: an
uninitialized device can't be used as swap.
A solution is to split init_lock into two locks: a mutex that protects
init, reset and disksize setting, and a rw_semaphore that protects requests
against reset. Thus init and requests are protected by different locks
and lockdep is happy.

Signed-off-by: Jerome Marchand <[email protected]>
---
drivers/staging/zram/zram_drv.c | 41 +++++++++++++++++++-----------------
drivers/staging/zram/zram_drv.h | 16 ++++++++++---
drivers/staging/zram/zram_sysfs.c | 20 +++++++++---------
3 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index fb4a7c9..b3bc3c4 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
{
struct zram *zram = queue->queuedata;

- if (unlikely(!zram->init_done) && zram_init_device(zram))
+ if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
goto error;

- down_read(&zram->init_lock);
- if (unlikely(!zram->init_done))
+ down_read(&zram->req_lock);
+ if (unlikely(!is_initialized(zram)))
goto error_unlock;

if (!valid_io_request(zram, bio)) {
@@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
}

__zram_make_request(zram, bio, bio_data_dir(bio));
- up_read(&zram->init_lock);
+ up_read(&zram->req_lock);

return;

error_unlock:
- up_read(&zram->init_lock);
+ up_read(&zram->req_lock);
error:
bio_io_error(bio);
}
@@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
{
size_t index;

- zram->init_done = 0;
+ atomic_set(&zram->init_done, 0);

/* Free various per-device buffers */
kfree(zram->compress_workmem);
@@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)

void zram_reset_device(struct zram *zram)
{
- down_write(&zram->init_lock);
- __zram_reset_device(zram);
- up_write(&zram->init_lock);
+ mutex_lock(&zram->init_lock);
+ down_write(&zram->req_lock);
+ if (is_initialized(zram))
+ __zram_reset_device(zram);
+ up_write(&zram->req_lock);
+ mutex_unlock(&zram->init_lock);
}

int zram_init_device(struct zram *zram)
@@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
int ret;
size_t num_pages;

- down_write(&zram->init_lock);
+ mutex_lock(&zram->init_lock);

- if (zram->init_done) {
- up_write(&zram->init_lock);
+ if (is_initialized(zram)) {
+ mutex_unlock(&zram->init_lock);
return 0;
}

@@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
goto fail;
}

- zram->init_done = 1;
- up_write(&zram->init_lock);
+ atomic_set(&zram->init_done, 1);
+ mutex_unlock(&zram->init_lock);

pr_debug("Initialization done!\n");
return 0;
@@ -594,7 +597,7 @@ fail_no_table:
zram->disksize = 0;
fail:
__zram_reset_device(zram);
- up_write(&zram->init_lock);
+ mutex_unlock(&zram->init_lock);
pr_err("Initialization failed: err=%d\n", ret);
return ret;
}
@@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
int ret = 0;

init_rwsem(&zram->lock);
- init_rwsem(&zram->init_lock);
+ mutex_init(&zram->init_lock);
+ init_rwsem(&zram->req_lock);
spin_lock_init(&zram->stat64_lock);

zram->queue = blk_alloc_queue(GFP_KERNEL);
@@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
goto out;
}

- zram->init_done = 0;
+ atomic_set(&zram->init_done, 0);

out:
return ret;
@@ -755,8 +759,7 @@ static void __exit zram_exit(void)
zram = &zram_devices[i];

destroy_device(zram);
- if (zram->init_done)
- zram_reset_device(zram);
+ zram_reset_device(zram);
}

unregister_blkdev(zram_major, "zram");
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index df2eec4..f6bcead 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -96,9 +96,12 @@ struct zram {
* against concurrent read and writes */
struct request_queue *queue;
struct gendisk *disk;
- int init_done;
- /* Prevent concurrent execution of device init, reset and R/W request */
- struct rw_semaphore init_lock;
+ atomic_t init_done;
+ /* Prevent concurrent execution of device init, reset and
+ * disksize_store */
+ struct mutex init_lock;
+ /* Prevent concurrent execution of device reset and R/W requests */
+ struct rw_semaphore req_lock;
/*
* This is the limit on amount of *uncompressed* worth of data
* we can store in a disk.
@@ -108,6 +111,11 @@ struct zram {
struct zram_stats stats;
};

+static inline int is_initialized(struct zram *zram)
+{
+ return atomic_read(&zram->init_done);
+}
+
extern struct zram *zram_devices;
unsigned int zram_get_num_devices(void);
#ifdef CONFIG_SYSFS
@@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
#endif

extern int zram_init_device(struct zram *zram);
-extern void __zram_reset_device(struct zram *zram);
+extern void zram_reset_device(struct zram *zram);

#endif
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index de1eacf..b300881 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
if (!disksize)
return -EINVAL;

- down_write(&zram->init_lock);
- if (zram->init_done) {
- up_write(&zram->init_lock);
+ mutex_lock(&zram->init_lock);
+ down_write(&zram->req_lock);
+ if (is_initialized(zram)) {
+ up_write(&zram->req_lock);
+ mutex_unlock(&zram->init_lock);
pr_info("Cannot change disksize for initialized device\n");
return -EBUSY;
}

zram->disksize = PAGE_ALIGN(disksize);
set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
- up_write(&zram->init_lock);
+ up_write(&zram->req_lock);
+ mutex_unlock(&zram->init_lock);

return len;
}
@@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
{
struct zram *zram = dev_to_zram(dev);

- return sprintf(buf, "%u\n", zram->init_done);
+ return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
}

static ssize_t reset_store(struct device *dev,
@@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
if (bdev)
fsync_bdev(bdev);

- down_write(&zram->init_lock);
- if (zram->init_done)
- __zram_reset_device(zram);
- up_write(&zram->init_lock);
+ zram_reset_device(zram);

return len;
}
@@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
u64 val = 0;
struct zram *zram = dev_to_zram(dev);

- if (zram->init_done)
+ if (is_initialized(zram))
val = zs_get_total_size_bytes(zram->mem_pool);

return sprintf(buf, "%llu\n", val);
--
1.7.7.6

2012-11-22 23:19:59

by Minchan Kim

Subject: Re: Lockdep complaint for zram

On Thu, Nov 22, 2012 at 02:08:43AM -0800, Nitin Gupta wrote:
> On 11/22/2012 12:31 AM, Minchan Kim wrote:
> >Hi Nitin,
> >
> >On Wed, Nov 21, 2012 at 10:06:33AM -0800, Nitin Gupta wrote:
> >>On 11/21/2012 12:37 AM, Minchan Kim wrote:
> >>>Hi alls,
> >>>
> >>>Today, I saw below complain of lockdep.
> >>>As a matter of fact, I knew it long time ago but forgot that.
> >>>The reason lockdep complains is that now zram uses GFP_KERNEL
> >>>in reclaim path(ex, __zram_make_request) :(
> >>>I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> >>>But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> >>>Of course, I can change it with __vmalloc which can receive gfp_t.
> >>>But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> >>>allocation of GFP_KERNEL. That's why I sent the patch.
> >>>https://lkml.org/lkml/2012/4/23/77
> >>>Since then, I forgot it, saw the bug today and poped the question again.
> >>>
> >>>Yes. Fundamental problem is utter crap API vmalloc.
> >>>If we can fix it, everyone would be happy. But life isn't simple like seeing
> >>>my thread of the patch.
> >>>
> >>>So next option is to move zram_init_device into setting disksize time.
> >>>But it makes unnecessary metadata waste until zram is used really(That's why
> >>>Nitin move zram_init_device from disksize setting time to make_request) and
> >>>it makes user should set the disksize before using, which are behavior change.
> >>>
> >>>I would like to clean up this issue before promoting because it might change
> >>>usage behavior.
> >>>
> >>>Do you have any idea?
> >>>
> >>
> >>Maybe we can alloc_vm_area() right on device creation in
> >>create_device() assuming the default disksize. If user explicitly
> >>sets the disksize, this vm area is deallocated and a new one is
> >>allocated based on the new disksize. When the device is reset, we
> >>should only free physical pages allocated for the table and the
> >>virtual area should be set back as if disksize is set to the
> >>default.
> >>
> >>At the device init time, all the pages can be allocated with
> >>GFP_NOIO | __GPF_HIGHMEM and since the VM area is preallocated,
> >>map_vm_area() will not hit any of those page-table allocations with
> >>GFP_KERNEL.
> >>
> >>Other allocations made directly from zram, for instance in the
> >>partial I/O case, should also be changed to GFP_NOIO |
> >>__GFP_HIGHMEM.
> >>
> >
> >Yes. It's a good idea and actually I thought about it.
> >My concern about that approach is following as.
> >
> >1) User of zram normally do mkfs.xxx or mkswap before using
> > the zram block device(ex, normally, do it at booting time)
> > It ends up allocating such metadata of zram before real usage so
> > benefit of lazy initialzation would be mitigated.
> >
> >2) Some user want to use zram when memory pressure is high.(ie, load zram
> > dynamically, NOT booting time). It does make sense because people don't
> > want to waste memory until memory pressure is high(ie, where zram is really
> > helpful time). In this case, lazy initialzation could be failed easily
> > because we will use GFP_NOIO instead of GFP_KERNEL due to swap use-case.
> > So the benefit of lazy initialzation would be mitigated, too.
> >
> >3) Current zram's documenation is wrong.
> > Set Disksize isn't optional when we use zram firstly.
> > Once user set disksize, it could be optional, but NOT optional
> > at first usage time. It's very odd behavior. So I think user set to disksizes
> > before using is more safe and clear.
> >
> >So my suggestion is following as.
> >
> > * Let's change disksize setting to MUST before using for consistent behavior.
> > * When user set to disksize, let's allocate metadata all at once.
> > 4K : 12 byte(64bit) -> 64G : 192M so 0.3% isn't big overhead.
> > If insane user use such big zram device up to 20, it could consume 6% of ram
> > but efficieny of zram will cover the waste.
> > * If someone has a concern about this, let's guide for him set to disksize
> > right before zram using.
> >
> >What do you think about it?
> >
>
> I agree. This lazy initialization has been a problem since the
> beginning. So, lets just force user to first set the disksize and
> document this behavior as such.

Sure.

>
> I plan to reduce struct table to just an array of handles (so
> getting rid of size, flag, count fields), but for now we could just
> use existing struct table and do this change later.

Good to hear.

Thanks for the quick feedback.

--
Kind regards,
Minchan Kim

2012-11-22 23:34:29

by Minchan Kim

Subject: Re: Lockdep complaint for zram

On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
> On 11/21/2012 09:37 AM, Minchan Kim wrote:
> > Hi alls,
> >
> > Today, I saw below complain of lockdep.
> > As a matter of fact, I knew it long time ago but forgot that.
> > The reason lockdep complains is that now zram uses GFP_KERNEL
> > in reclaim path(ex, __zram_make_request) :(
> > I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> > But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> > Of course, I can change it with __vmalloc which can receive gfp_t.
> > But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> > allocation of GFP_KERNEL. That's why I sent the patch.
> > https://lkml.org/lkml/2012/4/23/77
> > Since then, I forgot it, saw the bug today and poped the question again.
> >
> > Yes. Fundamental problem is utter crap API vmalloc.
> > If we can fix it, everyone would be happy. But life isn't simple like seeing
> > my thread of the patch.
> >
> > So next option is to move zram_init_device into setting disksize time.
> > But it makes unnecessary metadata waste until zram is used really(That's why
> > Nitin move zram_init_device from disksize setting time to make_request) and
> > it makes user should set the disksize before using, which are behavior change.
> >
> > I would like to clean up this issue before promoting because it might change
> > usage behavior.
> >
> > Do you have any idea?
>
> This is a false positive due to the memory allocation in
> zram_init_device() called from zram_make_request(). It appears to
> lockdep that the allocation might trigger a request on the device that
> would try to take init_lock again, but in fact it doesn't. The device
> is not initialized yet, even less swapped on.

It's not only the swap case.
Consider the following use case:

1) Boot.
2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
4) The written 4K page (page-A) is still in the page cache and hasn't
been submitted to the zram block device yet.
5) Memory pressure arises from some memory hog.
6) The VM starts to reclaim and writes page-A to zram0.
7) zram_init_device is finally called.
8) It allocates with GFP_KERNEL.
9) That allocation enters the reclaim path again.
10) Deadlock.

So I don't think it's a false positive.
Even if it were, I don't think a lock split just to avoid the lockdep
warning is a good idea. It makes the code unnecessarily complicated and
more error-prone. Let's not add another lock without a performance problem
report caused by the existing one.

As I discussed with Nitin in this thread, the lazy initialization doesn't
have much point, and the optional disksize setting isn't consistent
behavior for the user. And I expect Nitin will send the "table diet" patch
soonish.

So just moving the initialization from reclaim context into process
context is the simple and clear solution, I believe.
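
To be concrete, a minimal sketch of that direction (not a posted patch;
zram_meta_alloc() is a made-up helper standing in for the allocations
zram_init_device() does today):

	static ssize_t disksize_store(struct device *dev,
			struct device_attribute *attr, const char *buf, size_t len)
	{
		int ret;
		u64 disksize;
		struct zram *zram = dev_to_zram(dev);

		disksize = memparse(buf, NULL);
		if (!disksize)
			return -EINVAL;

		down_write(&zram->init_lock);
		if (zram->init_done) {
			up_write(&zram->init_lock);
			pr_info("Cannot change disksize for initialized device\n");
			return -EBUSY;
		}

		zram->disksize = PAGE_ALIGN(disksize);
		set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);

		/* made-up helper: vzalloc() the table, create the zs pool,
		 * allocate the compression workmem -- all with GFP_KERNEL,
		 * which is fine here because we are in the writer's process
		 * context, never in reclaim. */
		ret = zram_meta_alloc(zram);
		if (ret) {
			up_write(&zram->init_lock);
			return ret;
		}
		zram->init_done = 1;
		up_write(&zram->init_lock);

		return len;
	}

zram_make_request() would then simply fail the bio on an uninitialized
device instead of calling zram_init_device() itself.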

>
> The following (quickly tested) patch should prevent lockdep complain.
>
> Jerome
>
> ---
> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
> From: Jerome Marchand <[email protected]>
> Date: Thu, 22 Nov 2012 09:07:40 +0100
> Subject: [PATCH] staging: zram: Avoid lockdep warning
>
> zram triggers a lockdep warning. The cause of it is the call to
> zram_init_device() from zram_make_request(). The memory allocation in
> zram_init_device() could start a memory reclaim which in turn could
> cause swapout and (as it appears to lockdep) a call to
> zram_make_request(). However this is a false positive: an
> unititialized device can't be used as swap.
> A solution is to split init_lock in two lock. One mutex that protects
> init, reset and size setting and a rw_semaphore that protects requests
> and reset. Thus init and request would be protected by different locks
> and lockdep will be happy.
>
> Signed-off-by: Jerome Marchand <[email protected]>
> ---
> drivers/staging/zram/zram_drv.c | 41 +++++++++++++++++++-----------------
> drivers/staging/zram/zram_drv.h | 16 ++++++++++---
> drivers/staging/zram/zram_sysfs.c | 20 +++++++++---------
> 3 files changed, 44 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index fb4a7c9..b3bc3c4 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> {
> struct zram *zram = queue->queuedata;
>
> - if (unlikely(!zram->init_done) && zram_init_device(zram))
> + if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
> goto error;
>
> - down_read(&zram->init_lock);
> - if (unlikely(!zram->init_done))
> + down_read(&zram->req_lock);
> + if (unlikely(!is_initialized(zram)))
> goto error_unlock;
>
> if (!valid_io_request(zram, bio)) {
> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> }
>
> __zram_make_request(zram, bio, bio_data_dir(bio));
> - up_read(&zram->init_lock);
> + up_read(&zram->req_lock);
>
> return;
>
> error_unlock:
> - up_read(&zram->init_lock);
> + up_read(&zram->req_lock);
> error:
> bio_io_error(bio);
> }
> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
> {
> size_t index;
>
> - zram->init_done = 0;
> + atomic_set(&zram->init_done, 0);
>
> /* Free various per-device buffers */
> kfree(zram->compress_workmem);
> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
>
> void zram_reset_device(struct zram *zram)
> {
> - down_write(&zram->init_lock);
> - __zram_reset_device(zram);
> - up_write(&zram->init_lock);
> + mutex_lock(&zram->init_lock);
> + down_write(&zram->req_lock);
> + if (is_initialized(zram))
> + __zram_reset_device(zram);
> + up_write(&zram->req_lock);
> + mutex_unlock(&zram->init_lock);
> }
>
> int zram_init_device(struct zram *zram)
> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
> int ret;
> size_t num_pages;
>
> - down_write(&zram->init_lock);
> + mutex_lock(&zram->init_lock);
>
> - if (zram->init_done) {
> - up_write(&zram->init_lock);
> + if (is_initialized(zram)) {
> + mutex_unlock(&zram->init_lock);
> return 0;
> }
>
> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
> goto fail;
> }
>
> - zram->init_done = 1;
> - up_write(&zram->init_lock);
> + atomic_set(&zram->init_done, 1);
> + mutex_unlock(&zram->init_lock);
>
> pr_debug("Initialization done!\n");
> return 0;
> @@ -594,7 +597,7 @@ fail_no_table:
> zram->disksize = 0;
> fail:
> __zram_reset_device(zram);
> - up_write(&zram->init_lock);
> + mutex_unlock(&zram->init_lock);
> pr_err("Initialization failed: err=%d\n", ret);
> return ret;
> }
> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
> int ret = 0;
>
> init_rwsem(&zram->lock);
> - init_rwsem(&zram->init_lock);
> + mutex_init(&zram->init_lock);
> + init_rwsem(&zram->req_lock);
> spin_lock_init(&zram->stat64_lock);
>
> zram->queue = blk_alloc_queue(GFP_KERNEL);
> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
> goto out;
> }
>
> - zram->init_done = 0;
> + atomic_set(&zram->init_done, 0);
>
> out:
> return ret;
> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
> zram = &zram_devices[i];
>
> destroy_device(zram);
> - if (zram->init_done)
> - zram_reset_device(zram);
> + zram_reset_device(zram);
> }
>
> unregister_blkdev(zram_major, "zram");
> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
> index df2eec4..f6bcead 100644
> --- a/drivers/staging/zram/zram_drv.h
> +++ b/drivers/staging/zram/zram_drv.h
> @@ -96,9 +96,12 @@ struct zram {
> * against concurrent read and writes */
> struct request_queue *queue;
> struct gendisk *disk;
> - int init_done;
> - /* Prevent concurrent execution of device init, reset and R/W request */
> - struct rw_semaphore init_lock;
> + atomic_t init_done;
> + /* Prevent concurrent execution of device init, reset and
> + * disksize_store */
> + struct mutex init_lock;
> + /* Prevent concurent execution device reset and R/W requests */
> + struct rw_semaphore req_lock;
> /*
> * This is the limit on amount of *uncompressed* worth of data
> * we can store in a disk.
> @@ -108,6 +111,11 @@ struct zram {
> struct zram_stats stats;
> };
>
> +static inline int is_initialized(struct zram *zram)
> +{
> + return atomic_read(&zram->init_done);
> +}
> +
> extern struct zram *zram_devices;
> unsigned int zram_get_num_devices(void);
> #ifdef CONFIG_SYSFS
> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
> #endif
>
> extern int zram_init_device(struct zram *zram);
> -extern void __zram_reset_device(struct zram *zram);
> +extern void zram_reset_device(struct zram *zram);
>
> #endif
> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> index de1eacf..b300881 100644
> --- a/drivers/staging/zram/zram_sysfs.c
> +++ b/drivers/staging/zram/zram_sysfs.c
> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
> if (!disksize)
> return -EINVAL;
>
> - down_write(&zram->init_lock);
> - if (zram->init_done) {
> - up_write(&zram->init_lock);
> + mutex_lock(&zram->init_lock);
> + down_write(&zram->req_lock);
> + if (is_initialized(zram)) {
> + up_write(&zram->req_lock);
> + mutex_unlock(&zram->init_lock);
> pr_info("Cannot change disksize for initialized device\n");
> return -EBUSY;
> }
>
> zram->disksize = PAGE_ALIGN(disksize);
> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> - up_write(&zram->init_lock);
> + up_write(&zram->req_lock);
> + mutex_unlock(&zram->init_lock);
>
> return len;
> }
> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
> {
> struct zram *zram = dev_to_zram(dev);
>
> - return sprintf(buf, "%u\n", zram->init_done);
> + return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
> }
>
> static ssize_t reset_store(struct device *dev,
> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
> if (bdev)
> fsync_bdev(bdev);
>
> - down_write(&zram->init_lock);
> - if (zram->init_done)
> - __zram_reset_device(zram);
> - up_write(&zram->init_lock);
> + zram_reset_device(zram);
>
> return len;
> }
> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
> u64 val = 0;
> struct zram *zram = dev_to_zram(dev);
>
> - if (zram->init_done)
> + if (is_initialized(zram))
> val = zs_get_total_size_bytes(zram->mem_pool);
>
> return sprintf(buf, "%llu\n", val);
> --
> 1.7.7.6
>

--
Kind regards,
Minchan Kim

2012-11-23 14:47:17

by Jerome Marchand

Subject: Re: Lockdep complaint for zram

On 11/23/2012 12:34 AM, Minchan Kim wrote:
> On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
>> On 11/21/2012 09:37 AM, Minchan Kim wrote:
>>> Hi alls,
>>>
>>> Today, I saw below complain of lockdep.
>>> As a matter of fact, I knew it long time ago but forgot that.
>>> The reason lockdep complains is that now zram uses GFP_KERNEL
>>> in reclaim path(ex, __zram_make_request) :(
>>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
>>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
>>> Of course, I can change it with __vmalloc which can receive gfp_t.
>>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
>>> allocation of GFP_KERNEL. That's why I sent the patch.
>>> https://lkml.org/lkml/2012/4/23/77
>>> Since then, I forgot it, saw the bug today and poped the question again.
>>>
>>> Yes. Fundamental problem is utter crap API vmalloc.
>>> If we can fix it, everyone would be happy. But life isn't simple like seeing
>>> my thread of the patch.
>>>
>>> So next option is to move zram_init_device into setting disksize time.
>>> But it makes unnecessary metadata waste until zram is used really(That's why
>>> Nitin move zram_init_device from disksize setting time to make_request) and
>>> it makes user should set the disksize before using, which are behavior change.
>>>
>>> I would like to clean up this issue before promoting because it might change
>>> usage behavior.
>>>
>>> Do you have any idea?
>>
>> This is a false positive due to the memory allocation in
>> zram_init_device() called from zram_make_request(). It appears to
>> lockdep that the allocation might trigger a request on the device that
>> would try to take init_lock again, but in fact it doesn't. The device
>> is not initialized yet, even less swapped on.
>
> That's not a only swap case.
> Let's think following usecase.
>
> 1) Booting
> 2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
> 3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
> 4) Written 4K page(page-A) is still page cache and isn't submitted
> to zram block device.
> 5) Memory pressure happen by some memory hogger.
> 6) VM start to reclaim and write page-A to zram0.
> 7) zram_init_device is called at last.
> 8) allocate GFP_KERNEL in zram_init_device
> 9) goto reclaim path again.
> 10) deadlock.
>
> So I think it's not false positive.

I guess you're right. That's a scenario I hadn't imagined. At any rate, my
patch fixes that.

> Even if it is, I think lock split isn't a good idea to just avoid
> lockdep warn. It makes code unnecessary complicated and it would be more
> error-prone. Let's not add another lock without performance trouble report
> by the lock.
>
> As I discussed with Nitin in this thread, lazy initialization don't have
> much point and disksize setting option isn't consistent for user behavior.
> And I expect Nitin will send patch "diet of table" soonish.
>
> So just moving the initialzation part from reclaim context to process's one
> is simple and clear solution, I believe.

Although that would avoid the deadlock (I guess; I'm not sure anymore...), it
won't stop lockdep from complaining: it still makes an allocation while
holding a lock that is also taken in a reclaim context.
Anyway, I like the idea of removing the lazy initialization. It makes things
more complicated without any actual advantage.
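
To spell the first point out (an illustration of the pattern, not an actual
lockdep report): once the allocation happens under init_lock in
disksize_store(), lockdep can still construct something like

	disksize_store()
	    down_write(&zram->init_lock);
	    vzalloc();              /* GFP_KERNEL: may enter direct reclaim */
	        ...
	        swap_writepage() to a zram swap device
	            zram_make_request()
	                down_read(&zram->init_lock);    /* possible deadlock */

whether or not that chain is actually reachable for this particular device.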

Jerome

>
>>
>> The following (quickly tested) patch should prevent lockdep complain.
>>
>> Jerome
>>
>> ---
>> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
>> From: Jerome Marchand <[email protected]>
>> Date: Thu, 22 Nov 2012 09:07:40 +0100
>> Subject: [PATCH] staging: zram: Avoid lockdep warning
>>
>> zram triggers a lockdep warning. The cause of it is the call to
>> zram_init_device() from zram_make_request(). The memory allocation in
>> zram_init_device() could start a memory reclaim which in turn could
>> cause swapout and (as it appears to lockdep) a call to
>> zram_make_request(). However this is a false positive: an
>> unititialized device can't be used as swap.
>> A solution is to split init_lock in two lock. One mutex that protects
>> init, reset and size setting and a rw_semaphore that protects requests
>> and reset. Thus init and request would be protected by different locks
>> and lockdep will be happy.
>>
>> Signed-off-by: Jerome Marchand <[email protected]>
>> ---
>> drivers/staging/zram/zram_drv.c | 41 +++++++++++++++++++-----------------
>> drivers/staging/zram/zram_drv.h | 16 ++++++++++---
>> drivers/staging/zram/zram_sysfs.c | 20 +++++++++---------
>> 3 files changed, 44 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
>> index fb4a7c9..b3bc3c4 100644
>> --- a/drivers/staging/zram/zram_drv.c
>> +++ b/drivers/staging/zram/zram_drv.c
>> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>> {
>> struct zram *zram = queue->queuedata;
>>
>> - if (unlikely(!zram->init_done) && zram_init_device(zram))
>> + if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
>> goto error;
>>
>> - down_read(&zram->init_lock);
>> - if (unlikely(!zram->init_done))
>> + down_read(&zram->req_lock);
>> + if (unlikely(!is_initialized(zram)))
>> goto error_unlock;
>>
>> if (!valid_io_request(zram, bio)) {
>> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>> }
>>
>> __zram_make_request(zram, bio, bio_data_dir(bio));
>> - up_read(&zram->init_lock);
>> + up_read(&zram->req_lock);
>>
>> return;
>>
>> error_unlock:
>> - up_read(&zram->init_lock);
>> + up_read(&zram->req_lock);
>> error:
>> bio_io_error(bio);
>> }
>> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
>> {
>> size_t index;
>>
>> - zram->init_done = 0;
>> + atomic_set(&zram->init_done, 0);
>>
>> /* Free various per-device buffers */
>> kfree(zram->compress_workmem);
>> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
>>
>> void zram_reset_device(struct zram *zram)
>> {
>> - down_write(&zram->init_lock);
>> - __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> + mutex_lock(&zram->init_lock);
>> + down_write(&zram->req_lock);
>> + if (is_initialized(zram))
>> + __zram_reset_device(zram);
>> + up_write(&zram->req_lock);
>> + mutex_unlock(&zram->init_lock);
>> }
>>
>> int zram_init_device(struct zram *zram)
>> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
>> int ret;
>> size_t num_pages;
>>
>> - down_write(&zram->init_lock);
>> + mutex_lock(&zram->init_lock);
>>
>> - if (zram->init_done) {
>> - up_write(&zram->init_lock);
>> + if (is_initialized(zram)) {
>> + mutex_unlock(&zram->init_lock);
>> return 0;
>> }
>>
>> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
>> goto fail;
>> }
>>
>> - zram->init_done = 1;
>> - up_write(&zram->init_lock);
>> + atomic_set(&zram->init_done, 1);
>> + mutex_unlock(&zram->init_lock);
>>
>> pr_debug("Initialization done!\n");
>> return 0;
>> @@ -594,7 +597,7 @@ fail_no_table:
>> zram->disksize = 0;
>> fail:
>> __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> + mutex_unlock(&zram->init_lock);
>> pr_err("Initialization failed: err=%d\n", ret);
>> return ret;
>> }
>> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
>> int ret = 0;
>>
>> init_rwsem(&zram->lock);
>> - init_rwsem(&zram->init_lock);
>> + mutex_init(&zram->init_lock);
>> + init_rwsem(&zram->req_lock);
>> spin_lock_init(&zram->stat64_lock);
>>
>> zram->queue = blk_alloc_queue(GFP_KERNEL);
>> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
>> goto out;
>> }
>>
>> - zram->init_done = 0;
>> + atomic_set(&zram->init_done, 0);
>>
>> out:
>> return ret;
>> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
>> zram = &zram_devices[i];
>>
>> destroy_device(zram);
>> - if (zram->init_done)
>> - zram_reset_device(zram);
>> + zram_reset_device(zram);
>> }
>>
>> unregister_blkdev(zram_major, "zram");
>> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
>> index df2eec4..f6bcead 100644
>> --- a/drivers/staging/zram/zram_drv.h
>> +++ b/drivers/staging/zram/zram_drv.h
>> @@ -96,9 +96,12 @@ struct zram {
>> * against concurrent read and writes */
>> struct request_queue *queue;
>> struct gendisk *disk;
>> - int init_done;
>> - /* Prevent concurrent execution of device init, reset and R/W request */
>> - struct rw_semaphore init_lock;
>> + atomic_t init_done;
>> + /* Prevent concurrent execution of device init, reset and
>> + * disksize_store */
>> + struct mutex init_lock;
>> + /* Prevent concurent execution device reset and R/W requests */
>> + struct rw_semaphore req_lock;
>> /*
>> * This is the limit on amount of *uncompressed* worth of data
>> * we can store in a disk.
>> @@ -108,6 +111,11 @@ struct zram {
>> struct zram_stats stats;
>> };
>>
>> +static inline int is_initialized(struct zram *zram)
>> +{
>> + return atomic_read(&zram->init_done);
>> +}
>> +
>> extern struct zram *zram_devices;
>> unsigned int zram_get_num_devices(void);
>> #ifdef CONFIG_SYSFS
>> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
>> #endif
>>
>> extern int zram_init_device(struct zram *zram);
>> -extern void __zram_reset_device(struct zram *zram);
>> +extern void zram_reset_device(struct zram *zram);
>>
>> #endif
>> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
>> index de1eacf..b300881 100644
>> --- a/drivers/staging/zram/zram_sysfs.c
>> +++ b/drivers/staging/zram/zram_sysfs.c
>> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
>> if (!disksize)
>> return -EINVAL;
>>
>> - down_write(&zram->init_lock);
>> - if (zram->init_done) {
>> - up_write(&zram->init_lock);
>> + mutex_lock(&zram->init_lock);
>> + down_write(&zram->req_lock);
>> + if (is_initialized(zram)) {
>> + up_write(&zram->req_lock);
>> + mutex_unlock(&zram->init_lock);
>> pr_info("Cannot change disksize for initialized device\n");
>> return -EBUSY;
>> }
>>
>> zram->disksize = PAGE_ALIGN(disksize);
>> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
>> - up_write(&zram->init_lock);
>> + up_write(&zram->req_lock);
>> + mutex_unlock(&zram->init_lock);
>>
>> return len;
>> }
>> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
>> {
>> struct zram *zram = dev_to_zram(dev);
>>
>> - return sprintf(buf, "%u\n", zram->init_done);
>> + return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
>> }
>>
>> static ssize_t reset_store(struct device *dev,
>> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
>> if (bdev)
>> fsync_bdev(bdev);
>>
>> - down_write(&zram->init_lock);
>> - if (zram->init_done)
>> - __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> + zram_reset_device(zram);
>>
>> return len;
>> }
>> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
>> u64 val = 0;
>> struct zram *zram = dev_to_zram(dev);
>>
>> - if (zram->init_done)
>> + if (is_initialized(zram))
>> val = zs_get_total_size_bytes(zram->mem_pool);
>>
>> return sprintf(buf, "%llu\n", val);
>> --
>> 1.7.7.6
>>
>

2012-11-28 05:20:25

by Minchan Kim

Subject: Re: Lockdep complaint for zram

On Fri, Nov 23, 2012 at 03:46:31PM +0100, Jerome Marchand wrote:
> On 11/23/2012 12:34 AM, Minchan Kim wrote:
> > On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
> >> On 11/21/2012 09:37 AM, Minchan Kim wrote:
> >>> Hi alls,
> >>>
> >>> Today, I saw below complain of lockdep.
> >>> As a matter of fact, I knew it long time ago but forgot that.
> >>> The reason lockdep complains is that now zram uses GFP_KERNEL
> >>> in reclaim path(ex, __zram_make_request) :(
> >>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> >>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> >>> Of course, I can change it with __vmalloc which can receive gfp_t.
> >>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> >>> allocation of GFP_KERNEL. That's why I sent the patch.
> >>> https://lkml.org/lkml/2012/4/23/77
> >>> Since then, I forgot it, saw the bug today and poped the question again.
> >>>
> >>> Yes. Fundamental problem is utter crap API vmalloc.
> >>> If we can fix it, everyone would be happy. But life isn't simple like seeing
> >>> my thread of the patch.
> >>>
> >>> So next option is to move zram_init_device into setting disksize time.
> >>> But it makes unnecessary metadata waste until zram is used really(That's why
> >>> Nitin move zram_init_device from disksize setting time to make_request) and
> >>> it makes user should set the disksize before using, which are behavior change.
> >>>
> >>> I would like to clean up this issue before promoting because it might change
> >>> usage behavior.
> >>>
> >>> Do you have any idea?
> >>
> >> This is a false positive due to the memory allocation in
> >> zram_init_device() called from zram_make_request(). It appears to
> >> lockdep that the allocation might trigger a request on the device that
> >> would try to take init_lock again, but in fact it doesn't. The device
> >> is not initialized yet, even less swapped on.
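
For what it's worth, a minimal sketch of the two usages lockdep correlates here (simplified, with hypothetical function names; not the actual driver code):

#include <linux/rwsem.h>
#include <linux/slab.h>

static DECLARE_RWSEM(init_lock);

/* Init side: the metadata allocation is made while the lock is held for
 * write, and GFP_KERNEL is allowed to enter direct reclaim. */
static void *init_side(size_t bytes)
{
	void *table;

	down_write(&init_lock);
	table = kzalloc(bytes, GFP_KERNEL);
	up_write(&init_lock);
	return table;
}

/* Request side: the same lock is taken for read, and under memory
 * pressure this path can be reached from reclaim (e.g. kswapd writing
 * back to the device), so lockdep sees the lock in both contexts. */
static void request_side(void)
{
	down_read(&init_lock);
	/* ... service the bio ... */
	up_read(&init_lock);
}
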
> >
> > That's not only a swap case.
> > Let's think about the following use case.
> >
> > 1) Booting
> > 2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
> > 3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
> > 4) The written 4K page (page-A) is still in the page cache and isn't
> > submitted to the zram block device.
> > 5) Memory pressure happens because of some memory hogger.
> > 6) The VM starts to reclaim and writes page-A to zram0.
> > 7) zram_init_device is finally called.
> > 8) It allocates with GFP_KERNEL in zram_init_device.
> > 9) We enter the reclaim path again.
> > 10) Deadlock.
> >
> > So I think it's not a false positive.
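
One conventional way to keep step 8 from recursing into step 9 is to make the allocation in the request path reclaim-safe. A minimal sketch with a hypothetical helper name (this is not the patch under discussion):

#include <linux/slab.h>

/* GFP_NOIO allows reclaim on behalf of this allocation, but forbids it
 * from starting any I/O, so it cannot submit page-A back to the very
 * device that is currently being serviced. */
static void *alloc_in_request_path(size_t bytes)
{
	return kzalloc(bytes, GFP_NOIO);
}
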
>
> I guess you're right. That's a scenario I hadn't imagined. At any rate, my
> patch fixes that.
>
> > Even if it is, I don't think splitting the lock just to avoid the lockdep
> > warning is a good idea. It makes the code unnecessarily complicated and
> > more error-prone. Let's not add another lock without a report of a real
> > performance problem caused by it.
> >
> > As I discussed with Nitin in this thread, lazy initialization doesn't have
> > much point, and the disksize setting option isn't consistent for user
> > behavior. And I expect Nitin will send the "diet of table" patch soonish.
> >
> > So just moving the initialization part from the reclaim context to the
> > process context is a simple and clear solution, I believe.
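
A minimal sketch of that direction (hypothetical, simplified; not the patchset that was actually sent): the initialization moves into the disksize_store() sysfs handler, which runs in ordinary process context where GFP_KERNEL is safe, and zram_make_request() no longer calls zram_init_device() at all.

#include <linux/mm.h>
#include <linux/genhd.h>
#include "zram_drv.h"

/* Core of a hypothetical disksize_store(): set the size, then initialize
 * immediately in process context. Error handling and the sysfs plumbing
 * are omitted. */
static int set_disksize_and_init(struct zram *zram, u64 disksize)
{
	zram->disksize = PAGE_ALIGN(disksize);
	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);

	/* zram_init_device() still takes init_lock and rechecks init_done
	 * itself; the point is only that it now runs here, never from the
	 * reclaim/writeback path. */
	return zram_init_device(zram);
}
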
>
> Although that would avoid deadlocks (I guess, I'm not sure anymore...), it
> won't stop lockdep from complaining. It still makes an allocation while

Argh, I sent that by mistake. Anyway, it's a false positive with this patch now.
We still need more patches to shut lockdep up, though. I just sent a patchset.

> holding a lock that is also taken in a reclaim context.
> Anyway, I like the idea of removing the lazy initialization. It makes things
> more complicated without any actual advantage.

Thanks for the review, Jerome.

>
> Jerome
>
> >
> >>
> >> The following (quickly tested) patch should prevent lockdep complain.
> >>
> >> Jerome
> >>
> >> ---
> >> From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
> >> From: Jerome Marchand <[email protected]>
> >> Date: Thu, 22 Nov 2012 09:07:40 +0100
> >> Subject: [PATCH] staging: zram: Avoid lockdep warning
> >>
> >> zram triggers a lockdep warning. The cause of it is the call to
> >> zram_init_device() from zram_make_request(). The memory allocation in
> >> zram_init_device() could start memory reclaim, which in turn could
> >> cause swapout and (as it appears to lockdep) a call to
> >> zram_make_request(). However, this is a false positive: an
> >> uninitialized device can't be used as swap.
> >> A solution is to split init_lock into two locks: a mutex that protects
> >> init, reset and size setting, and a rw_semaphore that protects requests
> >> and reset. Thus init and request would be protected by different locks
> >> and lockdep will be happy.
> >>
> >> Signed-off-by: Jerome Marchand <[email protected]>
> >> ---
> >> drivers/staging/zram/zram_drv.c | 41 +++++++++++++++++++-----------------
> >> drivers/staging/zram/zram_drv.h | 16 ++++++++++---
> >> drivers/staging/zram/zram_sysfs.c | 20 +++++++++---------
> >> 3 files changed, 44 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> >> index fb4a7c9..b3bc3c4 100644
> >> --- a/drivers/staging/zram/zram_drv.c
> >> +++ b/drivers/staging/zram/zram_drv.c
> >> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >> {
> >> struct zram *zram = queue->queuedata;
> >>
> >> - if (unlikely(!zram->init_done) && zram_init_device(zram))
> >> + if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
> >> goto error;
> >>
> >> - down_read(&zram->init_lock);
> >> - if (unlikely(!zram->init_done))
> >> + down_read(&zram->req_lock);
> >> + if (unlikely(!is_initialized(zram)))
> >> goto error_unlock;
> >>
> >> if (!valid_io_request(zram, bio)) {
> >> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >> }
> >>
> >> __zram_make_request(zram, bio, bio_data_dir(bio));
> >> - up_read(&zram->init_lock);
> >> + up_read(&zram->req_lock);
> >>
> >> return;
> >>
> >> error_unlock:
> >> - up_read(&zram->init_lock);
> >> + up_read(&zram->req_lock);
> >> error:
> >> bio_io_error(bio);
> >> }
> >> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
> >> {
> >> size_t index;
> >>
> >> - zram->init_done = 0;
> >> + atomic_set(&zram->init_done, 0);
> >>
> >> /* Free various per-device buffers */
> >> kfree(zram->compress_workmem);
> >> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
> >>
> >> void zram_reset_device(struct zram *zram)
> >> {
> >> - down_write(&zram->init_lock);
> >> - __zram_reset_device(zram);
> >> - up_write(&zram->init_lock);
> >> + mutex_lock(&zram->init_lock);
> >> + down_write(&zram->req_lock);
> >> + if (is_initialized(zram))
> >> + __zram_reset_device(zram);
> >> + up_write(&zram->req_lock);
> >> + mutex_unlock(&zram->init_lock);
> >> }
> >>
> >> int zram_init_device(struct zram *zram)
> >> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
> >> int ret;
> >> size_t num_pages;
> >>
> >> - down_write(&zram->init_lock);
> >> + mutex_lock(&zram->init_lock);
> >>
> >> - if (zram->init_done) {
> >> - up_write(&zram->init_lock);
> >> + if (is_initialized(zram)) {
> >> + mutex_unlock(&zram->init_lock);
> >> return 0;
> >> }
> >>
> >> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
> >> goto fail;
> >> }
> >>
> >> - zram->init_done = 1;
> >> - up_write(&zram->init_lock);
> >> + atomic_set(&zram->init_done, 1);
> >> + mutex_unlock(&zram->init_lock);
> >>
> >> pr_debug("Initialization done!\n");
> >> return 0;
> >> @@ -594,7 +597,7 @@ fail_no_table:
> >> zram->disksize = 0;
> >> fail:
> >> __zram_reset_device(zram);
> >> - up_write(&zram->init_lock);
> >> + mutex_unlock(&zram->init_lock);
> >> pr_err("Initialization failed: err=%d\n", ret);
> >> return ret;
> >> }
> >> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
> >> int ret = 0;
> >>
> >> init_rwsem(&zram->lock);
> >> - init_rwsem(&zram->init_lock);
> >> + mutex_init(&zram->init_lock);
> >> + init_rwsem(&zram->req_lock);
> >> spin_lock_init(&zram->stat64_lock);
> >>
> >> zram->queue = blk_alloc_queue(GFP_KERNEL);
> >> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
> >> goto out;
> >> }
> >>
> >> - zram->init_done = 0;
> >> + atomic_set(&zram->init_done, 0);
> >>
> >> out:
> >> return ret;
> >> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
> >> zram = &zram_devices[i];
> >>
> >> destroy_device(zram);
> >> - if (zram->init_done)
> >> - zram_reset_device(zram);
> >> + zram_reset_device(zram);
> >> }
> >>
> >> unregister_blkdev(zram_major, "zram");
> >> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
> >> index df2eec4..f6bcead 100644
> >> --- a/drivers/staging/zram/zram_drv.h
> >> +++ b/drivers/staging/zram/zram_drv.h
> >> @@ -96,9 +96,12 @@ struct zram {
> >> * against concurrent read and writes */
> >> struct request_queue *queue;
> >> struct gendisk *disk;
> >> - int init_done;
> >> - /* Prevent concurrent execution of device init, reset and R/W request */
> >> - struct rw_semaphore init_lock;
> >> + atomic_t init_done;
> >> + /* Prevent concurrent execution of device init, reset and
> >> + * disksize_store */
> >> + struct mutex init_lock;
> >> + /* Prevent concurrent execution of device reset and R/W requests */
> >> + struct rw_semaphore req_lock;
> >> /*
> >> * This is the limit on amount of *uncompressed* worth of data
> >> * we can store in a disk.
> >> @@ -108,6 +111,11 @@ struct zram {
> >> struct zram_stats stats;
> >> };
> >>
> >> +static inline int is_initialized(struct zram *zram)
> >> +{
> >> + return atomic_read(&zram->init_done);
> >> +}
> >> +
> >> extern struct zram *zram_devices;
> >> unsigned int zram_get_num_devices(void);
> >> #ifdef CONFIG_SYSFS
> >> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
> >> #endif
> >>
> >> extern int zram_init_device(struct zram *zram);
> >> -extern void __zram_reset_device(struct zram *zram);
> >> +extern void zram_reset_device(struct zram *zram);
> >>
> >> #endif
> >> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> >> index de1eacf..b300881 100644
> >> --- a/drivers/staging/zram/zram_sysfs.c
> >> +++ b/drivers/staging/zram/zram_sysfs.c
> >> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
> >> if (!disksize)
> >> return -EINVAL;
> >>
> >> - down_write(&zram->init_lock);
> >> - if (zram->init_done) {
> >> - up_write(&zram->init_lock);
> >> + mutex_lock(&zram->init_lock);
> >> + down_write(&zram->req_lock);
> >> + if (is_initialized(zram)) {
> >> + up_write(&zram->req_lock);
> >> + mutex_unlock(&zram->init_lock);
> >> pr_info("Cannot change disksize for initialized device\n");
> >> return -EBUSY;
> >> }
> >>
> >> zram->disksize = PAGE_ALIGN(disksize);
> >> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> >> - up_write(&zram->init_lock);
> >> + up_write(&zram->req_lock);
> >> + mutex_unlock(&zram->init_lock);
> >>
> >> return len;
> >> }
> >> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
> >> {
> >> struct zram *zram = dev_to_zram(dev);
> >>
> >> - return sprintf(buf, "%u\n", zram->init_done);
> >> + return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
> >> }
> >>
> >> static ssize_t reset_store(struct device *dev,
> >> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
> >> if (bdev)
> >> fsync_bdev(bdev);
> >>
> >> - down_write(&zram->init_lock);
> >> - if (zram->init_done)
> >> - __zram_reset_device(zram);
> >> - up_write(&zram->init_lock);
> >> + zram_reset_device(zram);
> >>
> >> return len;
> >> }
> >> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
> >> u64 val = 0;
> >> struct zram *zram = dev_to_zram(dev);
> >>
> >> - if (zram->init_done)
> >> + if (is_initialized(zram))
> >> val = zs_get_total_size_bytes(zram->mem_pool);
> >>
> >> return sprintf(buf, "%llu\n", val);
> >> --
> >> 1.7.7.6
> >>
> >
>

--
Kind regards,
Minchan Kim