From: Sai Praneeth Prakhya
To: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ricardo.ner@intel.com, matt@codeblueprint.co.uk, Sai Praneeth,
 Lee Chun-Yi, Al Stone, Borislav Petkov, Ingo Molnar, Andy Lutomirski,
 Bhupesh Sharma, Peter Zijlstra, Ard Biesheuvel
Subject: [PATCH V1 5/6] x86/mm: If in_atomic(), allocate pages without sleeping
Date: Wed, 8 Aug 2018 21:31:16 -0700
Message-Id: <1533789077-16156-6-git-send-email-sai.praneeth.prakhya@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1533789077-16156-1-git-send-email-sai.praneeth.prakhya@intel.com>
References:
 <1533789077-16156-1-git-send-email-sai.praneeth.prakhya@intel.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sai Praneeth

A page fault occurs when an EFI Runtime Service references a memory
region that it shouldn't. If the illegally accessed region is an
EFI_BOOT_SERVICES_* region, the efi specific page fault handler fixes
it up by dynamically creating VA->PA mappings using efi_map_region().

Originally, efi_map_region(), and hence the functionality of creating
mappings for efi regions, was intended to be used *only* during boot
(please note the __init modifier). When it is called at runtime (i.e.
from the efi page fault handler), the page allocator complains.

The complaint appears at runtime because the value of
"gfp_allowed_mask" changes from boot time to runtime (GFP_BOOT_MASK to
__GFP_BITS_MASK). During boot, even though efi_map_region() calls
alloc_pte_page()/alloc_pmd_page() with GFP_KERNEL, the page allocator
doesn't complain because the "__GFP_RECLAIM" flag is cleared by
"gfp_allowed_mask". At runtime the flag isn't cleared, and hence the
allocator prints the stack trace below.

BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts
1 lock held by fwts/2022:
irq event stamp: 45714
hardirqs last enabled at (45713): [] restore_regs_and_return_to_kernel+0x0/0x2c
hardirqs last disabled at (45714): [] error_entry+0x7c/0x100
softirqs last enabled at (44732): [] __do_softirq+0x387/0x49a
softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0
CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x5e/0x8b
 ___might_sleep+0x20c/0x240
 __alloc_pages_nodemask+0xc2/0x330
 get_zeroed_page+0x12/0x40
 alloc_pmd_page+0x13/0x50
 populate_pmd+0xc0/0x2e0
 ? __lock_acquire+0x439/0x740
 __cpa_process_fault+0x2e1/0x5d0
 __change_page_attr_set_clr+0x7c3/0xcd0
 ? console_unlock+0x34d/0x660
 ? kernel_map_pages_in_pgd+0x8c/0x160
 kernel_map_pages_in_pgd+0x8c/0x160
 ? printk+0x43/0x4b
 ? __map_region+0x3c/0x60
 __map_region+0x3c/0x60
 efi_map_region+0x83/0xd0
 efi_illegal_accesses_fixup+0x1ca/0x1e0
 no_context+0x112/0x390
 __do_page_fault+0xc7/0x4f0
 page_fault+0x1e/0x30
RIP: 0010:0xfffffffeffc7ccf1
RSP: 0018:ffffc9000075bbf0 EFLAGS: 00010282
RAX: 0000000000000048 RBX: ffffc9000075be10 RCX: ffffc9000075bad0
RDX: 00000000000003f8 RSI: ffffc9000075be10 RDI: fffffffeffc7cccf
RBP: ffffc9000075bdc8 R08: 0000000000000048 R09: 0000000000000048
R10: 00000000000003fd R11: 00000000000003f8 R12: ffff880032a92d80
R13: 0000000000000003 R14: 00007ffcf1eb9d50 R15: 0000000000000000
 ? efi_call+0xd1/0x160
 ? __lock_acquire+0x439/0x740
 ? _raw_spin_unlock+0x24/0x30
 ? virt_efi_get_next_high_mono_count+0x77/0xf0
 ? efi_test_ioctl+0x1ab/0xc20
 ? selinux_file_ioctl+0x122/0x1c0
 ? do_vfs_ioctl+0x92/0x6b0
 ? do_vfs_ioctl+0x92/0x6b0
 ? security_file_ioctl+0x3c/0x50
 ? selinux_capable+0x20/0x20
 ? ksys_ioctl+0x66/0x70
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x50/0x170
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the above warning by conditionally changing the allocation from
GFP_KERNEL to GFP_ATOMIC, so that the efi page fault handler can use
efi_map_region() at runtime. This change shouldn't affect any other
generic page allocations because this allocation is used only by efi
functions [1].

[1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c

	if (cpa->pgd) {
		/*
		 * Right now, we only execute this code path when mapping
		 * the EFI virtual memory map regions, no other users
		 * provide a ->pgd value. This may change in the future.
		 */
		return populate_pgd(cpa, vaddr);
	}

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Lee Chun-Yi
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/mm/pageattr.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e8d5c..1b28a333c8ce 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end)
 
 static int alloc_pte_page(pmd_t *pmd)
 {
-	pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	pte_t *pte;
+
+	if (in_atomic())
+		pte = (pte_t *)get_zeroed_page(GFP_ATOMIC);
+	else
+		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
 	if (!pte)
 		return -1;
 
@@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd)
 
 static int alloc_pmd_page(pud_t *pud)
 {
-	pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+	pmd_t *pmd;
+
+	if (in_atomic())
+		pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC);
+	else
+		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
 	if (!pmd)
 		return -1;
 
-- 
2.7.4
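
Note on the gfp_allowed_mask argument in the commit message: the
masking can be modelled in user space, which may make the boot vs.
runtime difference easier to see. The sketch below is not kernel code.
All TOY_* names and bit values are made up for illustration (the
kernel's real values differ), and toy_may_sleep()/toy_pick_gfp() are
hypothetical helpers. Only the relationships between the flag groups
are meant to follow the kernel's definitions as I understand them
around v4.17, and the sleep check is assumed to key off the direct
reclaim bit after masking.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real GFP bits. */
#define TOY_GFP_IO             0x01u
#define TOY_GFP_FS             0x02u
#define TOY_GFP_DIRECT_RECLAIM 0x04u
#define TOY_GFP_KSWAPD_RECLAIM 0x08u
#define TOY_GFP_HIGH           0x10u
#define TOY_GFP_BITS_MASK      0x1fu

/* Flag groups, composed the same way as their kernel namesakes. */
#define TOY_GFP_RECLAIM (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_ATOMIC  (TOY_GFP_HIGH | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_BOOT_MASK \
	(TOY_GFP_BITS_MASK & ~(TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS))

/*
 * Hypothetical model of the allocator's check: the request is first
 * masked by the allowed mask, and it may sleep only if the direct
 * reclaim bit survives the masking.
 */
static bool toy_may_sleep(unsigned int gfp, unsigned int allowed_mask)
{
	return ((gfp & allowed_mask) & TOY_GFP_DIRECT_RECLAIM) != 0;
}

/* Models the choice made in this patch: non-sleeping flags if atomic. */
static unsigned int toy_pick_gfp(bool atomic)
{
	return atomic ? TOY_GFP_ATOMIC : TOY_GFP_KERNEL;
}

static void toy_self_check(void)
{
	/* Boot: the mask strips reclaim bits, so GFP_KERNEL cannot sleep. */
	assert(!toy_may_sleep(TOY_GFP_KERNEL, TOY_GFP_BOOT_MASK));
	/* Runtime: nothing is stripped, so GFP_KERNEL may sleep. */
	assert(toy_may_sleep(TOY_GFP_KERNEL, TOY_GFP_BITS_MASK));
	/* Runtime, atomic context: the picked flags never allow sleeping. */
	assert(!toy_may_sleep(toy_pick_gfp(true), TOY_GFP_BITS_MASK));
}
```

In this model the same GFP_KERNEL request is harmless under the boot
mask and sleep-capable under the runtime mask, which matches why the
warning only shows up when the fault handler calls efi_map_region()
after boot.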