Date: Tue, 8 Aug 2017 12:21:22 -0400
From: Johannes Weiner
To: Bradley Bolen
Cc: linux-mm@kvack.org, jaegeuk@kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: kernel panic on null pointer on page->mem_cgroup
Message-ID: <20170808162122.GA14689@cmpxchg.org>
References: <20170805155241.GA94821@jaegeuk-macbookpro.roam.corp.google.com> <20170808010150.4155-1-bradleybolen@gmail.com>
In-Reply-To: <20170808010150.4155-1-bradleybolen@gmail.com>

Hi Jaegeuk and Bradley,

On Mon, Aug 07, 2017 at 09:01:50PM -0400, Bradley Bolen wrote:
> I am getting a very similar error on v4.11 with an arm64 board.
>
> I, too, also see page->mem_cgroup checked to make sure that it is not
> NULL and then several instructions later it is NULL.  It does appear
> that someone is changing that member without taking the lock.
> In my setup, I see
>
> crash> bt
> PID: 72  TASK: e1f48640  CPU: 0  COMMAND: "mmcqd/1"
>  #0 [] (__crash_kexec) from []
>  #1 [] (panic) from []
>  #2 [] (svcerr_panic) from []
>  #3 [] (_SvcErr_) from []
>  #4 [] (die) from []
>  #5 [] (__do_kernel_fault) from []
>  #6 [] (do_page_fault) from []
>  #7 [] (do_DataAbort) from []
>     pc : []  lr : []  psr: a0000193
>     sp : c1a19cc8  ip : 00000000  fp : c1a19d04
>     r10: 0006ae29  r9 : 00000000  r8 : dfbf1800
>     r7 : dfbf1800  r6 : 00000001  r5 : f3c1107c  r4 : e2fb6424
>     r3 : 00000000  r2 : 00040228  r1 : 221e3000  r0 : a0000113
>     Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
>  #8 [] (__dabt_svc) from []
>  #9 [] (test_clear_page_writeback) from []
> #10 [] (end_page_writeback) from []
> #11 [] (end_swap_bio_write) from []
> #12 [] (bio_endio) from []
> #13 [] (dec_pending) from []
> #14 [] (clone_endio) from []
> #15 [] (bio_endio) from []
> #16 [] (crypt_dec_pending [dm_crypt]) from []
> #17 [] (crypt_endio [dm_crypt]) from []
> #18 [] (bio_endio) from []
> #19 [] (blk_update_request) from []
> #20 [] (blk_update_bidi_request) from []
> #21 [] (blk_end_bidi_request) from []
> #22 [] (blk_end_request) from []
> #23 [] (mmc_blk_issue_rw_rq) from []
> #24 [] (mmc_blk_issue_rq) from []
> #25 [] (mmc_queue_thread) from []
> #26 [] (kthread) from []
> crash> sym c0112540
> c0112540 (T) test_clear_page_writeback+512
>   /kernel-source/include/linux/memcontrol.h: 518
>
> crash> bt 35
> PID: 35  TASK: e1d45dc0  CPU: 1  COMMAND: "kswapd0"
>  #0 [] (__schedule) from []
>  #1 [] (schedule) from []
>  #2 [] (schedule_timeout) from []
>  #3 [] (io_schedule_timeout) from []
>  #4 [] (mempool_alloc) from []
>  #5 [] (bio_alloc_bioset) from []
>  #6 [] (get_swap_bio) from []
>  #7 [] (__swap_writepage) from []
>  #8 [] (swap_writepage) from []
>  #9 [] (shmem_writepage) from []
> #10 [] (shrink_page_list) from []
> #11 [] (shrink_inactive_list) from []
> #12 [] (shrink_node_memcg) from []
> #13 [] (shrink_node) from []
> #14 [] (kswapd) from []
> #15 [] (kthread) from []
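The check-then-NULL pattern described here can be modeled deterministically in userspace. This is a minimal sketch, not kernel code: every name in it is hypothetical, and a callback stands in for the race window (the one widened with udelay() in the report) during which another context clears page->mem_cgroup between the NULL check and the second load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_memcg {
	long nr_writeback;
};

struct model_page {
	struct model_memcg *mem_cgroup;
};

/*
 * Runs between the NULL check and the second load of page->mem_cgroup.
 * It stands in for the race window during which another context may
 * clear the pointer.
 */
typedef void (*race_window_fn)(struct model_page *page);

/*
 * Loads page->mem_cgroup twice, like "if (page->mem_cgroup)
 * use(page->mem_cgroup)". Returns what the second load saw, i.e. the
 * pointer that would actually be dereferenced.
 */
static struct model_memcg *check_then_use(struct model_page *page,
					  race_window_fn window)
{
	assert(page != NULL);
	if (page->mem_cgroup) {			/* first load: non-NULL here */
		if (window)
			window(page);
		return page->mem_cgroup;	/* second load: may now be NULL */
	}
	return NULL;
}

/* Simulates another context clearing the pointer, as the free path does. */
static void clear_mem_cgroup(struct model_page *page)
{
	page->mem_cgroup = NULL;
}
```

In the kernel the second load is what gets dereferenced, so a NULL there becomes an oops like the one above; the model only returns the pointer so the run stays single-threaded and repeatable.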
>
> It appears that uncharge_list() in mm/memcontrol.c is not taking the
> page lock when it sets mem_cgroup to NULL.  I am not familiar with the
> mm code so I do not know if this is on purpose or not.  There is a
> comment in uncharge_list that makes me believe that the crashing code
> should not have been running:
> 	/*
> 	 * Nobody should be changing or seriously looking at
> 	 * page->mem_cgroup at this point, we have fully
> 	 * exclusive access to the page.
> 	 */
> However, I am new to looking at this area of the kernel so I am not
> sure.

The page lock is for pages that are actively being used, whereas the
free path requires the page refcount to be 0; nobody else should have
access to the page at that time.

> I was able to create a reproducible scenario by using a udelay to
> increase the time between the if (page->mem_cgroup) check and the later
> dereference of it to increase the race window.  I then mounted an empty
> ext4 partition and ran the following no more than twice before it
> crashed.
> dd if=/dev/zero of=/tmp/ext4disk/test bs=1M count=100

Thanks, that's useful. I'll try to reproduce this as well.

There is a

	VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);

inside uncharge_list() that should catch any page still ending
writeback by the time it gets into that function: it fires if the page
is not hwpoisoned and still has a nonzero refcount. Can you build your
kernel with CONFIG_DEBUG_VM to enable that check?

Thanks
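For reference, the free-path argument and the debug-only check can be modeled in userspace. This is a minimal sketch under stated assumptions, not the kernel's code: all names are hypothetical, MODEL_DEBUG_VM plays the role of CONFIG_DEBUG_VM, and the model counts assertion hits instead of calling BUG() so it can keep running. The uncharge path takes no lock and relies on the refcount being 0; the check flags a page that still has references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_DEBUG_VM=y. */
#ifndef MODEL_DEBUG_VM
#define MODEL_DEBUG_VM 1
#endif

struct model_memcg {
	long nr_pages;
};

struct model_page {
	int refcount;			/* stands in for page_count(page) */
	bool hwpoison;			/* stands in for PageHWPoison(page) */
	struct model_memcg *mem_cgroup;
};

/* Counts assertion hits instead of calling BUG(), so the model keeps running. */
static int model_bug_hits;

#if MODEL_DEBUG_VM
#define MODEL_VM_BUG_ON_PAGE(cond, page) \
	do { if (cond) model_bug_hits++; (void)(page); } while (0)
#else
/* Non-debug build: the condition is type-checked but never evaluated. */
#define MODEL_VM_BUG_ON_PAGE(cond, page) \
	do { (void)sizeof(cond); (void)sizeof(page); } while (0)
#endif

/*
 * Free-path uncharge: the caller guarantees the refcount has dropped to 0,
 * so nobody else should be looking at page->mem_cgroup and no lock is taken.
 */
static void model_uncharge(struct model_page *page)
{
	assert(page != NULL);
	MODEL_VM_BUG_ON_PAGE(!page->hwpoison && page->refcount, page);
	if (page->mem_cgroup) {
		page->mem_cgroup->nr_pages--;
		page->mem_cgroup = NULL;
	}
}
```

With MODEL_DEBUG_VM defined as 0 the check compiles away, which mirrors why the real assertion only fires on kernels built with CONFIG_DEBUG_VM.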