Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp1690906ybk; Thu, 21 May 2020 12:51:24 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzplDaZ3ik3oS+G1YGO81SqMwpb1og1wQ/cy9N0VnzHS9a3VEnUFoZF9D66CgX8qaO1loWm X-Received: by 2002:a50:f017:: with SMTP id r23mr312970edl.290.1590090684588; Thu, 21 May 2020 12:51:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590090684; cv=none; d=google.com; s=arc-20160816; b=hjrSx8DRiuTYhsVTer59IZ9/lFycOCtvZrgIIr0BgWq/OFMN66C27W3hpqd/qHnxvm /Jfol2gli/yFgbNnaARDkA/0KqnKxQR3dylW7INbF5X91LD4+dae/tBUvNJFZ3NZttN3 vW5HG0/Vy4pIWQDq8EhaUIk2R2qsYulHCAjk/BsG+ez2SOB3n/v3+E++tTsE9Kk2Wef+ aIgSKenYZUh6vM93+WwnFrIk3Q7PA71NCKD8oAmc1xT4wX3bRxoY7mULJqr8TKIQeeS2 qPvfThQ7ALayD5Dw6cj3qInPhUOyfECaAG0rpOip4hV6SHdDyA55emzhUbm7l8IWzY5O P0DQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=OzuhmvdoIxBKTVFXltvkyaowfdMK/Lfhv/xnsclt3Rc=; b=iajo9Po8N77hSmJGmOpgKdebC2jquHbgx/NPB27O0UMo+EJBM04DtPj8u0u0jinmVq BX7lkdiuAxRMU83ACdTv13+hDPRi7dk4vRbuf8Fzz5dwYmyUU6GyF6QXYYKMyi2xVuLJ oZ3lNbeA+Lgtm8wjKLyhus8vVlbHME0J6P4Xm72M5bPzyMSKENXaF0aquAbUiWdE36p0 jZv7mzddlkrnemwXXzkN9Y4gR9eQ9h5KZ3HqAFakszS0zJv7TTi6MWVo4qEAkppP0F/U Ip9RDrj1rxlwAt72fX5BN7COk+aULOW7lcZmRjQEwQaDaHPUqN9dhDCK23uzMe4MJxaF iXfA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=NObNT5xw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id nj4si3942265ejb.643.2020.05.21.12.51.01; Thu, 21 May 2020 12:51:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=NObNT5xw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729326AbgEUTtS (ORCPT + 99 others); Thu, 21 May 2020 15:49:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728635AbgEUTtS (ORCPT ); Thu, 21 May 2020 15:49:18 -0400 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DE4DC061A0E for ; Thu, 21 May 2020 12:49:18 -0700 (PDT) Received: by mail-ot1-x341.google.com with SMTP id f18so6451198otq.11 for ; Thu, 21 May 2020 12:49:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=OzuhmvdoIxBKTVFXltvkyaowfdMK/Lfhv/xnsclt3Rc=; b=NObNT5xwSeyx5e1VPrIDD+ZRYKW07354XH9bCIyqu+FydtwFjKzNSzoRn3HvPqYtWo V27pcSOFHpM2Q0Vtlnap6dL05rDdRy+8fZBrGH+oe/vH3obTzfNDbfqhRNgOAAmBBAm3 B4L1WCwfj9iYFHC+LXeLop7MQIWUM3PWd4RSeWAeD1A+K7oX3YIYoiQMMa2lKmv9na+J eq3ePN7+l4HsVD2KxIrXISzAt5mouL4hS24weTfLzbGq1b+/n1RGFjthM71DGqV8vbnz pReU8EoW3+pVLl0kDjZO5fhCj7H0xVqUclSuqhR/bAfVbEiLdB1Fi3HJhtSVy59GVjrV nKwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=OzuhmvdoIxBKTVFXltvkyaowfdMK/Lfhv/xnsclt3Rc=; b=UiYVZOk8LrE9Hbk90X3V5qaLlC5goX9IXveQaElygPVA/TokDPETzAUjBNzvnpvsV8 FOUFpdKQumZWTzfiaER43+XZcZeICqMvtjuL7RHRqpWkbfUK9S1iQztpXh+DvRmQJFZj /1x4rKI8uN5XXueIyXbVe1ULDFjFzOFp6M7ie8igz7Mwyy4bxfkSPdgzj/v2uH9XH1Ji wu0Xev909jiudqppcSm7ICNkVhPVu0eZxBtFrEU3cmMm/N5ECTkv5ipq1B+b4IcBnI0M P57gIoB6MGxhhM5aki34U2MBLDh3OXjBNkCy6xeYXttoXCcNnOT5hM/CyirjlJgVTXTk MXLw== X-Gm-Message-State: AOAM5313fWnVEUut7bXlsEQmg6NQPsnujaHnhb/yp5jbIVjfGvRPhExK WSh99JnzG0glc5gQsxm8VeDCkQ== X-Received: by 2002:a9d:2da8:: with SMTP id g37mr8443930otb.293.1590090556781; Thu, 21 May 2020 12:49:16 -0700 (PDT) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id j6sm301172ots.61.2020.05.21.12.49.13 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Thu, 21 May 2020 12:49:15 -0700 (PDT) Date: Thu, 21 May 2020 12:48:58 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Qian Cai cc: Hugh Dickins , Andrew Morton , Anshuman Khandual , Johannes Weiner , Naoya Horiguchi , Zi Yan , John Hubbard , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH mmotm] mm/vmstat: Add events for PMD based THP migration without split fix In-Reply-To: <20200521185419.GB6367@ovpn-112-192.phx2.redhat.com> Message-ID: References: <20200521185419.GB6367@ovpn-112-192.phx2.redhat.com> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 21 May 2020, Qian Cai wrote: > On Thu, May 21, 2020 at 06:49:51AM -0700, Hugh Dickins wrote: > > Fix 5.7-rc6-mm1 page migration crash in unmap_and_move(): when the > > page to be migrated has been freed from under us, that is considered > > a MIGRATEPAGE_SUCCESS, but no newpage has been allocated (and I don't > > think it would ever need to be counted as a successful THP migration). > > > > Signed-off-by: Hugh Dickins > > --- > > Fix to mm-vmstat-add-events-for-pmd-based-thp-migration-without-split.patch > > > > mm/migrate.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > --- 5.7-rc6-mm1/mm/migrate.c 2020-05-20 12:21:56.117693827 -0700 > > +++ linux/mm/migrate.c 2020-05-20 15:08:12.319476978 -0700 > > @@ -1248,7 +1248,7 @@ out: > > * we want to retry. > > */ > > if (rc == MIGRATEPAGE_SUCCESS) { > > - if (PageTransHuge(newpage)) > > + if (newpage && PageTransHuge(newpage)) > > Should this be > > if (!IS_ERR_OR_NULL(newpage) && PageTransHuge(newpage)) ? No: newpage is set higher up by: newpage = get_new_page(page, private); if (!newpage) return -ENOMEM; and there's enough other places after that which would crash on an erroneous newpage. (Of course, the get_new_page() convention might be changed in future to return an error code, but no need to make such a change here and now.) > > I have also crashed here due to the buggy commit, > > unmap_and_move() -> PageTransHuge(page) -> page->compound_head > > but it said 0x00000008 instead of NULL which is aweful a lot like, 0x00000008 is where it's trying to dereference NULL->compound_head, no mystery there. > > https://lore.kernel.org/linux-mm/20200512215813.GA487759@cmpxchg.org/ > > Interesting thing is I applied this patch and the problem went away, but not > sure if it could still be ERR_PTR sometimes just not always? > > [ 210.929981][ T4159] BUG: Kernel NULL pointer dereference on read at 0x00000008 > [ 210.930009][ T4159] Faulting instruction address: 0xc0000000005196c8 > [ 210.930027][ C61] irq event stamp: 270727 > [ 210.930028][ T4159] Oops: Kernel access of bad area, sig: 11 [#1] > [ 210.930033][ T4159] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV > [ 210.930058][ C61] hardirqs last enabled at (270726): [] _raw_spin_unlock_irqrestore+0x94/0xd0 > [ 210.930064][ T4159] Modules linked in: kvm_hv kvm ip_tables x_tables xfs sd_mod tg3 bnx2x libphy mdio firmware_class ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod > [ 210.930083][ C61] hardirqs last disabled at (270727): [] _raw_spin_lock_irqsave+0x3c/0xa0 > [ 210.930086][ C61] softirqs last enabled at (270556): [] __do_softirq+0x6dc/0xaa8 > [ 210.930125][ T4159] CPU: 109 PID: 4159 Comm: test.sh Not tainted 5.7.0-rc6-next-20200521+ #112 > [ 210.930163][ C61] softirqs last disabled at (270559): [] run_ksoftirqd+0x74/0xc0 > [ 210.930260][ T4159] NIP: c0000000005196c8 LR: c000000000519568 CTR: 0000000000000000 > [ 210.930307][ T4159] REGS: c0002005b10af570 TRAP: 0300 Not tainted (5.7.0-rc6-next-20200521+) > [ 210.930342][ T4159] MSR: 900000000280b033 CR: 24248242 XER: 00000000 > [ 210.930403][ T4159] CFAR: c000000000519570 DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0 > [ 210.930403][ T4159] GPR00: c000000000519568 c0002005b10af800 c000000001765500 0000000000000000 > [ 210.930403][ T4159] GPR04: c000000001c18d28 0000000000000006 0000000035279396 fffffffef69809d2 > [ 210.930403][ T4159] GPR08: 0000201cc6240000 0000000000000000 0000000000000000 0000000000000008 > [ 210.930403][ T4159] GPR12: 0000000000008000 c000201fff670600 0000000000000000 c00c000805312a00 > [ 210.930403][ T4159] GPR16: c00000000050b090 c000000000c857d0 c00c000805312a80 0000000000000000 > [ 210.930403][ T4159] GPR20: 0000000000000001 0000000000000000 c0002005b10af978 c000201ffc7c8780 > [ 210.930403][ T4159] GPR24: fffffffffffffff5 0000000000000002 0000000020000000 0000000000000000 > [ 210.930403][ T4159] GPR28: 0000000000000007 0000000000000001 0000000000000000 c00c000805312a08 > [ 210.930740][ T4159] NIP [c0000000005196c8] migrate_pages+0xc18/0x1ad0 > [ 210.930775][ T4159] LR [c000000000519568] migrate_pages+0xab8/0x1ad0 > [ 210.930824][ T4159] Call Trace: > [ 210.930843][ T4159] [c0002005b10af800] [c000000000519568] migrate_pages+0xab8/0x1ad0 (unreliable) > [ 210.930882][ T4159] [c0002005b10af910] [c00000000050b6fc] do_migrate_range+0x25c/0x8f0 > [ 210.930940][ T4159] [c0002005b10afa10] [c00000000050e974] __offline_pages+0x6e4/0x8b0 > [ 210.930988][ T4159] [c0002005b10afb40] [c000000000887f6c] memory_block_action+0xac/0xc0 > [ 210.931016][ T4159] [c0002005b10afba0] [c000000000888618] memory_subsys_offline+0x58/0xa0 > [ 210.931030][ T4159] [c0002005b10afbd0] [c0000000008621a0] device_offline+0x100/0x140 > [ 210.931080][ T4159] [c0002005b10afc10] [c000000000888938] state_store+0x108/0x190 > [ 210.931128][ T4159] [c0002005b10afc50] [c00000000085b628] dev_attr_store+0x38/0x60 > [ 210.931176][ T4159] [c0002005b10afc70] [c0000000006b9790] sysfs_kf_write+0x70/0xb0 > [ 210.931211][ T4159] [c0002005b10afcb0] [c0000000006b895c] kernfs_fop_write+0x11c/0x270 > [ 210.931249][ T4159] [c0002005b10afd00] [c00000000057bcac] __vfs_write+0x3c/0x70 > [ 210.931273][ T4159] [c0002005b10afd20] [c00000000057f0ac] vfs_write+0xcc/0x200 > [ 210.931319][ T4159] [c0002005b10afd70] [c00000000057f44c] ksys_write+0x7c/0x140 > [ 210.931345][ T4159] [c0002005b10afdc0] [c000000000039e78] system_call_exception+0x108/0x1d0 > [ 210.931395][ T4159] [c0002005b10afe20] [c00000000000c9f0] system_call_common+0xf0/0x278 > [ 210.931445][ T4159] Instruction dump: > [ 210.931476][ T4159] 9bad0988 e90d0028 3d22ff9f 3929c670 7d49402a 394a0001 7d49412a 4bafdec5 > [ 210.931492][ T4159] 60000000 4bfff544 2fbe0000 409e04c4 71290001 40820928 e93b0000 > [ 210.931521][ T4159] ---[ end trace 03092b3800dbb5cb ]--- > [ 211.416724][ T4159] > [ 212.416810][ T4159] Kernel panic - not syncing: Fatal exception > [ 213.829268][ T4159] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > thp_migration_success(true); > > put_page(page); > > if (reason == MR_MEMORY_FAILURE) { > >