From: "Zhoujian (jay)"
To: "linux-kernel@vger.kernel.org"
CC: "gregkh@linuxfoundation.org", "tj@kernel.org", "Huangweidong (C)", "Wangjing(Hogan)", "Zhoujian (jay)"
Subject: [Question] kernfs: NULL pointer dereference in kernfs_dop_revalidate()
Date: Mon, 28 Mar 2022 11:59:49 +0000
Message-ID: <685d8898ca614138a9c15f8042e3d291@huawei.com>

Hi,

We hit a kernel panic while booting a new kernel via kexec. The kernel version is 4.18.0 (with some of our own changes). Here is the stack:

[    0.534423] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[    0.542224] PGD 0 P4D 0
[    0.542228] Oops: 0000 [#1] SMP NOPTI
[    0.559748] CPU: 80 PID: 603 Comm: systemd-journal Tainted: G OE --------- - - 4.18.0-147.5.2.5.h781_272.x86_64 #1
[    0.571262] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.58 05/25/2020
[    0.578200] RIP: 0010:kernfs_dop_revalidate+0x33/0xc0
[    0.583234] Code: 85 a4 00 00 00 48 8b 57 30 31 c0 48 85 d2 74 51 41 54 55 53 48 8b aa 60 02 00 00 48 89 fb 48 c7 c7 40 50 f0 a9 e8 9d 53 2e 00 <8b> 45 04 85 c0 78 1d 48 8b 43 18 45 31 e4 48 8b 40 30 48 85 c0 74
[    0.601919] RSP: 0018:ffffba340384fc70 EFLAGS: 00010246
[    0.607126] RAX: 0000000000000000 RBX: ffff8e6042df3140 RCX: 0000000000000030
[    0.614234] RDX: ffff8e604382bd00 RSI: 0000000000000000 RDI: ffffffffa9f05040
[    0.621340] RBP: 0000000000000000 R08: 0000746e65766575 R09: 0000000000000006
[    0.628446] R10: ffff8e604390d021 R11: 61c8864680b583eb R12: ffffba340384fcf8
[    0.635553] R13: ffffba340384fcf0 R14: ffff8e6043677320 R15: ffff8e6042df2fc0
[    0.642660] FS: 00007f2bf2cecd80(0000) GS:ffff8e617a500000(0000) knlGS:0000000000000000
[    0.650719] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.656444] CR2: 0000000000000004 CR3: 000000604398e003 CR4: 00000000003606e0
[    0.663553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.670659] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.677766] Call Trace:
[    0.680211] lookup_fast+0x258/0x2d0
[    0.683778] walk_component+0x48/0x480
[    0.687516] ? link_path_walk+0x27c/0x510
[    0.691513] path_lookupat+0x84/0x1f0
[    0.695165] filename_lookup+0xb6/0x190
[    0.698990] ? __check_object_size+0xd4/0x1a0
[    0.703338] ? strncpy_from_user+0x45/0x190
[    0.707509] ? do_faccessat+0xa2/0x230
[    0.711246] do_faccessat+0xa2/0x230
[    0.714812] do_syscall_64+0x5b/0x1b0
[    0.718470] entry_SYSCALL_64_after_hwframe+0x65/0xca

It seems that user space is calling access() to check whether a file or directory exists. The kdump service had not started yet at this point, so no vmcore was captured.
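
For context, the call trace enters the kernel through do_faccessat, which is the path an access() call takes. Below is a minimal sketch of the kind of user-space existence check that would drive this lookup. The helper name path_exists() is hypothetical and is not taken from systemd-journal's source; the actual caller and path in our case are unknown.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from systemd): probe whether a path exists.
 * access(path, F_OK) performs a path lookup in the kernel; on kernfs-backed
 * filesystems such as sysfs, a cached dentry is checked through the
 * ->d_revalidate hook (kernfs_dop_revalidate) during that lookup.
 *
 * Returns 1 if the path exists, 0 if it does not, -1 on any other error.
 */
int path_exists(const char *path)
{
    if (access(path, F_OK) == 0)
        return 1;
    if (errno == ENOENT || errno == ENOTDIR)
        return 0;
    return -1;
}
```

Calling path_exists() in a loop on a sysfs path while the corresponding kernel object is being added or removed is one way to exercise this lookup path from user space.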
I used sysrq-trigger to get a vmcore manually. Here is part of the disassembly of kernfs_dop_revalidate():

---
0 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/dir.c: 551
1 0xffffffff884c7e00: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
2 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/dir.c: 554
3 0xffffffff884c7e05: and $0x40,%esi
4 0xffffffff884c7e08: jne 0xffffffff884c7eb2
5 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/./include/linux/dcache.h: 481
6 0xffffffff884c7e0e: mov 0x30(%rdi),%rdx
7 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/dir.c: 586
8 0xffffffff884c7e12: xor %eax,%eax
9 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/dir.c: 558
10 0xffffffff884c7e14: test %rdx,%rdx
11 0xffffffff884c7e17: je 0xffffffff884c7e6a
12 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/dir.c: 551
13 0xffffffff884c7e19: push %r12
14 0xffffffff884c7e1b: push %rbp
15 0xffffffff884c7e1c: push %rbx
16 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/kernfs-internal.h: 79
17 0xffffffff884c7e1d: mov 0x260(%rdx),%rbp
18 0xffffffff884c7e24: mov %rdi,%rbx
19 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/fs/kernfs/dir.c: 562
20 0xffffffff884c7e27: mov $0xffffffff89105040,%rdi
21 0xffffffff884c7e2e: callq 0xffffffff887ad1d0
22 /usr/src/debug/kernel-4.18.0-147.5.2.5.h781_272.x86_64/linux-4.18.0-147.5.2.5.h781_272.x86_64/./include/linux/compiler.h: 188
23 0xffffffff884c7e33: mov 0x4(%rbp),%eax
......
---

The message "BUG: unable to handle kernel NULL pointer dereference at 0000000000000004" together with RIP "kernfs_dop_revalidate+0x33" points at the instruction on line 23 above, mov 0x4(%rbp),%eax, and the register dump shows RBP = 0. %rbp is loaded on line 17 from 0x260(%rdx), and 0x260 is the offset of i_private in struct inode. %rdx (the inode pointer) had already passed the NULL check on lines 10-11, so the dentry is positive but its inode's i_private is NULL. That NULL is what kernfs_dentry_node() returns, and the faulting read is apparently the one performed by kernfs_active(kn) right after mutex_lock():

---
75 static inline struct kernfs_node *kernfs_dentry_node(struct dentry *dentry)
76 {
77         if (d_really_is_negative(dentry))
78                 return NULL;
79         return d_inode(dentry)->i_private;
80 }
---
550 static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
551 {
552         struct kernfs_node *kn;
553
554         if (flags & LOOKUP_RCU)
555                 return -ECHILD;
556
557         /* Always perform fresh lookup for negatives */
558         if (d_really_is_negative(dentry))
559                 goto out_bad_unlocked;
560
561         kn = kernfs_dentry_node(dentry);
562         mutex_lock(&kernfs_mutex);
563
564         /* The kernfs node has been deactivated */
565         if (!kernfs_active(kn))
566                 goto out_bad;
567 ......
---

We've tried deleting and accessing files in sysfs concurrently from user space, but have not been able to reproduce the problem so far. :-(

Any advice about the problem, or on how to debug kernfs?

Thanks!
Jay Zhou
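
P.S. The fault-address arithmetic in the analysis can be checked with a small user-space sketch. The struct below is a simplified stand-in, not the real struct kernfs_node: it only assumes two leading 4-byte counters, which is consistent with the faulting instruction mov 0x4(%rbp),%eax. Both mock_kernfs_node and fault_address() are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the first fields of struct kernfs_node.
 * ASSUMPTION: two leading 4-byte counters, so that the second one sits
 * at offset 4, matching the displacement in "mov 0x4(%rbp),%eax".
 */
struct mock_kernfs_node {
    int count;   /* stands in for an atomic_t reference count */
    int active;  /* stands in for an atomic_t active counter   */
};

/*
 * Address the CPU accesses when loading a field located `field_offset`
 * bytes past `base`. With base == 0 (a NULL kn), the result is the value
 * reported in CR2 and in the "NULL pointer dereference at ..." message.
 */
uintptr_t fault_address(uintptr_t base, size_t field_offset)
{
    return base + field_offset;
}
```

With a NULL base, fault_address(0, offsetof(struct mock_kernfs_node, active)) evaluates to 0x4, which matches CR2 in the oops above.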