Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47A3EC54EED for ; Mon, 30 Jan 2023 01:16:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235262AbjA3BQT (ORCPT ); Sun, 29 Jan 2023 20:16:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229614AbjA3BQR (ORCPT ); Sun, 29 Jan 2023 20:16:17 -0500 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1749144B5 for ; Sun, 29 Jan 2023 17:16:15 -0800 (PST) Received: from dggpemm500001.china.huawei.com (unknown [172.30.72.54]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4P4qvq13H7z16Mdx; Mon, 30 Jan 2023 09:14:15 +0800 (CST) Received: from [10.174.177.243] (10.174.177.243) by dggpemm500001.china.huawei.com (7.185.36.107) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Mon, 30 Jan 2023 09:16:13 +0800 Message-ID: <568c10e8-c225-b3c4-483a-5bb3329de4c5@huawei.com> Date: Mon, 30 Jan 2023 09:16:13 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1 Subject: Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Content-Language: en-US To: Andrew Morton CC: Tejun Heo , Jens Axboe , Jan Kara , Shakeel Butt , Naoya Horiguchi , , , Ma Wupeng , Michal Hocko References: <20230129024451.121590-1-wangkefeng.wang@huawei.com> <20230129134815.21083b65ef3ae4c3e7fae8eb@linux-foundation.org> From: Kefeng Wang In-Reply-To: <20230129134815.21083b65ef3ae4c3e7fae8eb@linux-foundation.org> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.177.243] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To dggpemm500001.china.huawei.com (7.185.36.107) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2023/1/30 5:48, Andrew Morton wrote: > On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang wrote: > >> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"), > > Merged in 2017. > >> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg >> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could >> occurs a NULL pointer dereference, let's do not record the foreign >> writebacks for folio memcg is null in mem_cgroup_track_foreign() to >> fix it. >> >> Reported-by: Ma Wupeng >> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing") > > Merged in 2019. > >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio, >> static inline void mem_cgroup_track_foreign_dirty(struct folio *folio, >> struct bdi_writeback *wb) >> { >> + struct mem_cgroup *memcg; >> + >> if (mem_cgroup_disabled()) >> return; >> >> - if (unlikely(&folio_memcg(folio)->css != wb->memcg_css)) >> + memcg = folio_memcg(folio); >> + if (unlikely(memcg && &memcg->css != wb->memcg_css)) >> mem_cgroup_track_foreign_dirty_slowpath(folio, wb); >> } > > Has this null deref actually been observed, or is this from code > inspection? (This is why it's nice to include the Link: after a > Reported-by!) It does occurs in our internal test and report by wupeng(based v5.10), BUG: KASAN: user-memory-access in mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708 Read of size 8 at addr 0000000000001000 by task syz-executor.2/28325 CPU: 2 PID: 28325 Comm: syz-executor.2 Tainted: G W 5.10.0-03333-g48e46a146cbc-dirty #1 Hardware name: linux,dummy-virt (DT) Call trace: ... mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708 mem_cgroup_track_foreign_dirty include/linux/memcontrol.h:1880 [inline] account_page_dirtied+0x9a0/0xa90 mm/page-writeback.c:2436 __set_page_dirty+0x1f8/0x4b0 fs/buffer.c:608 __set_page_dirty_buffers+0x3d0/0x550 fs/buffer.c:668 set_page_dirty+0x158/0x500 mm/page-writeback.c:2575 filemap_page_mkwrite+0x3dc/0x490 mm/filemap.c:3224 do_page_mkwrite+0xc4/0x3d0 mm/memory.c:2786 wp_page_shared+0x14c/0x980 mm/memory.c:3118 do_wp_page+0x930/0xbc0 mm/memory.c:3219 handle_pte_fault+0x5e0/0x630 mm/memory.c:4570 __handle_mm_fault+0x41c/0x910 mm/memory.c:4690 handle_mm_fault+0x25c/0x484 mm/memory.c:4788 __do_page_fault arch/arm64/mm/fault.c:440 [inline] do_page_fault+0x3ac/0x9d4 arch/arm64/mm/fault.c:539 > > Do we have any theories why this took so many years to surface? After google, I found a similar issue[1], maybe hwpoison/mem_cgroup_track_foreign_dirty concurrency is uncommon. [1] https://syzkaller.appspot.com/bug?extid=0c84bf23aed8ee0d8399 > > I'm confused about the mention of 18365225f044, but the Fixes: target > is a different commit. Please explain this? 18365225f044 said that it will uncharge it manually from its memcg in hwpison handler, but when the new feature "writeback, memcg: Implement foreign dirty flushing" is introduced, we don't consider this, when folio's memcg is cleared by hwpison handler, the issue occurs. > > Do you think the fix should be backported into earlier -stable kernels? it's better to send stable kernel. > If so, it will need some rework due to the subsequent folio > conversion. When the patch is merged, I could refresh and send to stable maillist. > > >