Subject: Re: Random crashes with i386 and efi boots
From: Guenter Roeck
Date: Tue, 11 Sep 2018 06:30:22 -0700
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, Joerg Roedel, Thomas Gleixner, Michal Hocko, Andi Kleen, Linus Torvalds, Dave Hansen, Pavel Machek, linux-efi@vger.kernel.org, x86@kernel.org
Message-ID: <877118e5-beee-4551-28d3-79e7aa52f74e@roeck-us.net>
References: <20180910215659.GA17966@roeck-us.net>

On 09/11/2018 04:52 AM, Andy Lutomirski wrote:
>
>
>> On Sep 10, 2018, at 2:56 PM, Guenter Roeck wrote:
>>
>> Hi folks,
>>
>> even after commit eeb89e2bb1ac ("x86/efi: Load fixmap GDT in
>> efi_call_phys_epilog()"), my i386/efi qemu boot tests still crash randomly
>> (roughly 5-10% of the time).
>> As before, I don't see much useful output in the qemu log (this time it
>> doesn't even complain about a triple fault).
>>
>> Debugging shows that the crash happens in efi_call_phys_epilog().
>> A sample log from a crashed test run is attached below. It appears that
>> the crash happens if there is an interrupt at a critical section of the
>> code.
>>
>> While playing with the code, I found a possible fix.
>>
>> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
>> index 05ca14222463..9959657127f4 100644
>> --- a/arch/x86/platform/efi/efi_32.c
>> +++ b/arch/x86/platform/efi/efi_32.c
>> @@ -85,10 +85,9 @@ pgd_t * __init efi_call_phys_prolog(void)
>>
>>  void __init efi_call_phys_epilog(pgd_t *save_pgd)
>>  {
>> +	load_fixmap_gdt(0);
>>  	load_cr3(save_pgd);
>>  	__flush_tlb_all();
>> -
>> -	load_fixmap_gdt(0);
>>  }
>
> We have IRQs on here? It seems plausible that we’re in a window where the
> EFI pgd doesn’t have cpu_entry_area mapped. Also, the hard coded CPU 0 is
> suspicious.
>

The hard-coded CPU 0 was always there. The call ultimately comes from
efi_enter_virtual_mode(), which is called from start_kernel(), so
presumably it is guaranteed to run on CPU 0.

> Maybe try instrumenting the code to check whether the clone_pgd_range
> calls in setup_percpu.c have happened yet?
>

The crash is seen late in the boot process, so I am quite sure they have
happened, but I can add a check if needed. I think that might be a
different problem, though.

> Your patch may well be correct, but, if we have IRQs on, we should really
> have cpu_entry_area mapped in both pgds.
>
> Or we could turn off IRQs. Why on Earth are IRQs on in a context where the
> fixmap gdt is unusable?
>

From arch/x86/platform/efi/efi.c:phys_efi_set_virtual_address_map():

	save_pgd = efi_call_phys_prolog();
	local_irq_save(flags);
	status = efi_call_phys(...);
	local_irq_restore(flags);
	efi_call_phys_epilog(save_pgd);

So, yes, interrupts are very much enabled.

I ran several additional test sequences.
With the above patch, there were no failures in 500 boots. Without it, the
failure rate (long-term average) across 500 boots is around 10%.

Another data point: moving load_fixmap_gdt(0) to directly after
load_cr3(save_pgd) does not help; it has to come first.

Guenter