From: Adrian-Ken Rueegsegger
To: dave.hansen@linux.intel.com, osalvador@suse.de
Cc: david@redhat.com, luto@kernel.org, peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
    bp@alien8.de, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org, Adrian-Ken Rueegsegger
Subject: Unhandled page fault in vmemmap_populate on x86_64
Date: Mon, 9 May 2022 11:06:36 +0200
Message-Id: <20220509090637.24152-1-ken@codelabs.ch>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

While running Linux 5.15.32/x86_64 (with some out-of-tree patches) on
top of Muen [1], I came across a BUG/page fault triggered in
vmemmap_populate:

[    0.000000] BUG: unable to handle page fault for address: ffffea0001e00000
[    0.000000] #PF: supervisor write access in kernel mode
[    0.000000] #PF: error_code(0x0002) - not-present page
[    0.000000] PGD 1003a067 P4D 1003a067 PUD 10039067 PMD 0
[    0.000000] Oops: 0002 [#1] SMP PTI
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.32-muen #1
[    0.000000] RIP: 0010:vmemmap_populate+0x181/0x218
[    0.000000] Code: 00 a9 ff ff 1f 00 0f 84 a1 00 00 00 e8 91 f7 ff ff b9 0e 00 00 00 31 c0 48 89 ef f3 ab 48 85 f6 74 0a b0 fd 48 89 ef 48 89 f1 aa 4d 85 c0 74 7c 48 89 1d 2e e2 05 00 eb 73 48 83 3c 24 00 0f
[    0.000000] RSP: 0000:ffffffff82003e00 EFLAGS: 00010006
[    0.000000] RAX: 00000000000000fd RBX: ffffea0001e00000 RCX: 0000000000180000
[    0.000000] RDX: ffffea0000540000 RSI: 00000000001c0000 RDI: ffffea0001e00000
[    0.000000] RBP: ffffea0001dc0000 R08: 0000000000000000 R09: 0000000088000000
[    0.000000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    0.000000] R13: ffffea0001f80000 R14: ffffea0001dc0000 R15: ffff888010039070
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff823ea000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ffffea0001e00000 CR3: 000000000200a000 CR4: 00000000000406b0
[    0.000000] DR0: 000000000000003a DR1: 0000000000000003 DR2: 0000000000000000
[    0.000000] DR3: ffffea0001e00000 DR6: 000000000200a000 DR7: ffffffff82003d58
[    0.000000] Call Trace:
[    0.000000]
[    0.000000]  ? __populate_section_memmap+0x3a/0x47
[    0.000000]  ? sparse_init_nid+0xc9/0x174
[    0.000000]  ? sparse_init+0x1c1/0x1d2
[    0.000000]  ? paging_init+0x5/0xa
[    0.000000]  ? setup_arch+0x740/0x810
[    0.000000]  ? start_kernel+0x43/0x5bb
[    0.000000]  ? secondary_startup_64_no_verify+0xb0/0xbb
[    0.000000]
[    0.000000] Modules linked in:
[    0.000000] CR2: ffffea0001e00000
[    0.000000] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0
[    0.000000] ---[ end trace 44fe402cfef775de ]---

Announcing an available RAM region at 0x88000000 to Linux (via e820)
triggered the issue while placing it at 0x70000000 did not hit the bug.

Since the problem had not been observed with 5.10, I did a bisect which
pointed to commit 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd
ranges") as the culprit.

Further debugging showed that the #PF originates from
vmemmap_use_new_sub_pmd in arch/x86/mm/init_64.c. In the error case the
condition !IS_ALIGNED(start, PMD_SIZE) evaluates to true and the
page-fault is caused by the memset marking the preceding region as
unused:

	if (!IS_ALIGNED(start, PMD_SIZE))
		memset((void *)start, PAGE_UNUSED,
			start - ALIGN_DOWN(start, PMD_SIZE));

If I am not mistaken, the start variable is the wrong address to use
here, since it points to the beginning of the range that is to be
*used*. Instead the "start" of the PMD should be used, i.e.
ALIGN_DOWN(start, PMD_SIZE). Looking at arch/s390/mm/vmem.c,
vmemmap_use_new_sub_pmd seems to confirm this.

Is the above analysis correct or did I misread the code? The attached
patch fixes the observed issue for me.

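The address arithmetic behind this analysis can be checked in user space. The following is only a sketch, not kernel code: the macros are simplified stand-ins for the kernel's ALIGN_DOWN and PMD_SIZE (2 MiB assumed), the struct and function names are hypothetical, and the sample start value ffffea0001dc0000 is an inference from RBP/R14 in the register dump above, not something the kernel printed as "start".

```c
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel macros (assumptions). */
#define PMD_SIZE          0x200000ULL  /* 2 MiB PMD on x86_64 */
#define ALIGN_DOWN(x, a)  ((x) & ~((uint64_t)(a) - 1))

/* Half-open address range [begin, end); hypothetical helper type. */
struct range {
	uint64_t begin;
	uint64_t end;
};

/* Range written by the memset as quoted: it begins at start itself. */
struct range buggy_unused_range(uint64_t start)
{
	uint64_t len = start - ALIGN_DOWN(start, PMD_SIZE);

	return (struct range){ start, start + len };
}

/* Range written if the memset begins at the start of the PMD instead. */
struct range fixed_unused_range(uint64_t start)
{
	return (struct range){ ALIGN_DOWN(start, PMD_SIZE), start };
}

/* Returns 1 if r lies entirely inside the PMD that contains start. */
int stays_in_pmd(struct range r, uint64_t start)
{
	uint64_t base = ALIGN_DOWN(start, PMD_SIZE);

	return r.begin >= base && r.end <= base + PMD_SIZE;
}
```

With start = ffffea0001dc0000, the PMD spans ffffea0001c00000 to ffffea0001e00000. The quoted memset covers ffffea0001dc0000 up to ffffea0001f80000, so it runs past the end of the PMD at ffffea0001e00000, which matches the faulting address in CR2; the end of that range also matches R13. Starting at ALIGN_DOWN(start, PMD_SIZE) covers ffffea0001c00000 up to ffffea0001dc0000 and stays inside the PMD. More generally, the quoted range leaves the PMD whenever the offset of start within the PMD exceeds half of PMD_SIZE.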
Regards,
Adrian

PS: When replying please include my address as to/cc since I am not
subscribed to LKML, thanks!

[1] - https://muen.sk