2021-03-24 19:52:16

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

On 3/24/21 12:20 PM, Minchan Kim wrote:
> struct cma_stat's lifespan for cma_sysfs is different with
> struct cma because kobject for sysfs requires dynamic object
> while CMA is static object[1]. When CMA is initialized,
> it couldn't use slab to allocate cma_stat since slab was not
> initialized yet. Thus, it allocates the dynamic object
> in subsys_initcall.
>
> However, the cma allocation can happens before subsys_initcall
> then, it goes crash.
>
> Dmitry reported[2]:
>
> ..
> [ 1.226190] [<c027762f>] (cma_sysfs_alloc_pages_count) from [<c027706f>] (cma_alloc+0x153/0x274)
> [ 1.226720] [<c027706f>] (cma_alloc) from [<c01112ab>] (__alloc_from_contiguous+0x37/0x8c)
> [ 1.227272] [<c01112ab>] (__alloc_from_contiguous) from [<c1104af9>] (atomic_pool_init+0x7b/0x126)
> [ 1.233596] [<c1104af9>] (atomic_pool_init) from [<c0101d69>] (do_one_initcall+0x45/0x1e4)
> [ 1.234188] [<c0101d69>] (do_one_initcall) from [<c1101141>] (kernel_init_freeable+0x157/0x1a6)
> [ 1.234741] [<c1101141>] (kernel_init_freeable) from [<c0a27fd1>] (kernel_init+0xd/0xe0)
> [ 1.235289] [<c0a27fd1>] (kernel_init) from [<c0100155>] (ret_from_fork+0x11/0x1c)
>
> This patch moves those statistic fields of cma_stat into struct cma
> and introduces cma_kobject wrapper to follow kobject's rule.
>
> At the same time, it fixes other routines based on suggestions[3][4].
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
> [2] https://lore.kernel.org/linux-mm/[email protected]/
> [3] https://lore.kernel.org/linux-mm/[email protected]/
> [4] https://lore.kernel.org/linux-mm/[email protected]/
>
> Reported-by: Dmitry Osipenko <[email protected]>
> Tested-by: Dmitry Osipenko <[email protected]>
> Suggested-by: Dmitry Osipenko <[email protected]>
> Suggested-by: John Hubbard <[email protected]>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> I belive it's worth to have separate patch rather than replacing
> original patch. It will also help to merge without conflict
> since we already filed other patch based on it.
> Strictly speaking, separating fix part and readbility part
> in this patch would be better but it's gray to separate them
> since most code in this patch was done while we were fixing
> the bug. Since we don't release it yet, I hope it will work.
> Otherwise, I can send a replacement patch inclucing all of
> changes happend until now with gathering SoB.

If we still have a choice, we should not merge a patch that has a known
serious problem, such as a crash. That's only done if the broken problematic
patch has already been committed to a tree that doesn't allow rebasing,
such as of course the main linux.git.

Here, I *think* it's just in linux-next and mmotm, so we still are allowed
to fix the original patch.

thanks,
--
John Hubbard
NVIDIA


2021-03-25 03:25:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

On 24.03.21 20:45, John Hubbard wrote:
> On 3/24/21 12:20 PM, Minchan Kim wrote:
>> struct cma_stat's lifespan for cma_sysfs is different with
>> struct cma because kobject for sysfs requires dynamic object
>> while CMA is static object[1]. When CMA is initialized,
>> it couldn't use slab to allocate cma_stat since slab was not
>> initialized yet. Thus, it allocates the dynamic object
>> in subsys_initcall.
>>
>> However, the cma allocation can happens before subsys_initcall
>> then, it goes crash.
>>
>> Dmitry reported[2]:
>>
>> ..
>> [ 1.226190] [<c027762f>] (cma_sysfs_alloc_pages_count) from [<c027706f>] (cma_alloc+0x153/0x274)
>> [ 1.226720] [<c027706f>] (cma_alloc) from [<c01112ab>] (__alloc_from_contiguous+0x37/0x8c)
>> [ 1.227272] [<c01112ab>] (__alloc_from_contiguous) from [<c1104af9>] (atomic_pool_init+0x7b/0x126)
>> [ 1.233596] [<c1104af9>] (atomic_pool_init) from [<c0101d69>] (do_one_initcall+0x45/0x1e4)
>> [ 1.234188] [<c0101d69>] (do_one_initcall) from [<c1101141>] (kernel_init_freeable+0x157/0x1a6)
>> [ 1.234741] [<c1101141>] (kernel_init_freeable) from [<c0a27fd1>] (kernel_init+0xd/0xe0)
>> [ 1.235289] [<c0a27fd1>] (kernel_init) from [<c0100155>] (ret_from_fork+0x11/0x1c)
>>
>> This patch moves those statistic fields of cma_stat into struct cma
>> and introduces cma_kobject wrapper to follow kobject's rule.
>>
>> At the same time, it fixes other routines based on suggestions[3][4].
>>
>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>> [2] https://lore.kernel.org/linux-mm/[email protected]/
>> [3] https://lore.kernel.org/linux-mm/[email protected]/
>> [4] https://lore.kernel.org/linux-mm/[email protected]/
>>
>> Reported-by: Dmitry Osipenko <[email protected]>
>> Tested-by: Dmitry Osipenko <[email protected]>
>> Suggested-by: Dmitry Osipenko <[email protected]>
>> Suggested-by: John Hubbard <[email protected]>
>> Suggested-by: Matthew Wilcox <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> I belive it's worth to have separate patch rather than replacing
>> original patch. It will also help to merge without conflict
>> since we already filed other patch based on it.
>> Strictly speaking, separating fix part and readbility part
>> in this patch would be better but it's gray to separate them
>> since most code in this patch was done while we were fixing
>> the bug. Since we don't release it yet, I hope it will work.
>> Otherwise, I can send a replacement patch inclucing all of
>> changes happend until now with gathering SoB.
>
> If we still have a choice, we should not merge a patch that has a known
> serious problem, such as a crash. That's only done if the broken problematic
> patch has already been committed to a tree that doesn't allow rebasing,
> such as of course the main linux.git.
>
> Here, I *think* it's just in linux-next and mmotm, so we still are allowed
> to fix the original patch.

Yes, that's what we should do in case it's not upstream yet. Clean
resend + re-apply.


--
Thanks,

David / dhildenb