From: Andy Lutomirski
Subject: Re: Random crashes with i386 and efi boots
Date: Tue, 11 Sep 2018 09:36:51 -0700
To: Guenter Roeck
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel, Joerg Roedel, Thomas Gleixner, Michal Hocko, Andi Kleen, Linus Torvalds, Dave Hansen, Pavel Machek, linux-efi@vger.kernel.org, x86@kernel.org
Message-Id: <90A7FF2E-F186-49CF-A028-CDE317BE13E1@amacapital.net>
In-Reply-To: <877118e5-beee-4551-28d3-79e7aa52f74e@roeck-us.net>
References: <20180910215659.GA17966@roeck-us.net> <877118e5-beee-4551-28d3-79e7aa52f74e@roeck-us.net>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Sep 11, 2018, at 6:30 AM, Guenter Roeck wrote:
>
> On 09/11/2018 04:52 AM, Andy Lutomirski wrote:
>>> On Sep 10, 2018, at 2:56 PM, Guenter Roeck wrote:
>>>
>>> Hi folks,
>>>
>>> even after commit eeb89e2bb1ac ("x86/efi: Load fixmap GDT in
>>> efi_call_phys_epilog()"), my i386/efi qemu boot tests still crash randomly
>>> (roughly 5-10% of the time). As before, I don't see much useful output in
>>> the qemu log (this time it doesn't even complain about a triple fault).
>>>
>>> Debugging shows that the crash happens in efi_call_phys_epilog().
>>> A sample log from a crashed test run is attached below. It appears that
>>> the crash happens if there is an interrupt at a critical section of the
>>> code.
>>>
>>> While playing with the code, I found a possible fix.
>>>
>>> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
>>> index 05ca14222463..9959657127f4 100644
>>> --- a/arch/x86/platform/efi/efi_32.c
>>> +++ b/arch/x86/platform/efi/efi_32.c
>>> @@ -85,10 +85,9 @@ pgd_t * __init efi_call_phys_prolog(void)
>>>
>>>  void __init efi_call_phys_epilog(pgd_t *save_pgd)
>>>  {
>>> +    load_fixmap_gdt(0);
>>>      load_cr3(save_pgd);
>>>      __flush_tlb_all();
>>> -
>>> -    load_fixmap_gdt(0);
>>>  }
>> We have IRQs on here? It seems plausible that we’re in a window where the EFI pgd doesn’t have cpu_entry_area mapped. Also, the hard coded CPU 0 is suspicious.
> The hard coded CPU 0 was always there. The call is ultimately from
> efi_enter_virtual_mode(), which is called from start_kernel().
> so presumably it is guaranteed to run on CPU 0.
>
>> Maybe try instrumenting the code to check whether the clone_pgd_range calls in setup_percpu.c have happened yet?
> The crash is seen late in the boot process, so I am quite sure it happened,
> but I can add a check if needed. I think that might be a different problem,
> though.
>
>> Your patch may well be correct, but, if we have IRQs on, we should really have cpu_entry_area mapped in both pgds.
>> Or we could turn off IRQs. Why on Earth are IRQs on in a context where the fixmap gdt is unusable?
>
> From arch/x86/platform/efi/efi.c:phys_efi_set_virtual_address_map():
>
>    save_pgd = efi_call_phys_prolog();
>    local_irq_save(flags);
>    status = efi_call_phys(...);
>    local_irq_restore(flags);
>
>    efi_call_phys_epilog(save_pgd);
>
> So, yes, interrupts are very much enabled.

Does fixing that solve the problem? It seems more robust.

>
> I ran several additional test sequences. With above patch, no failures with
> 500 boots. Without it, failure rate (long term average) across 500 boots
> is around 10%. Another data point: Moving load_fixmap_gdt(0); after
> load_cr3(save_pgd); does not help; it has to come first.
>
> Guenter
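
As a rough illustration of the "turn off IRQs" idea, here is a userspace sketch. It is not kernel code. It assumes a simplified model in which the prolog leaves the CPU without the kernel pgd and the fixmap GDT until the epilog restores both, and every mock_* name is made up for this example. It counts the steps at which an interrupt could arrive in that inconsistent state, first for the call order quoted above and then with the IRQ-off region widened to cover the prolog and epilog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the state discussed in this thread. */
struct mock_cpu {
    bool irqs_on;       /* interrupts enabled */
    bool kernel_pgd;    /* normal kernel page tables loaded */
    bool fixmap_gdt;    /* fixmap GDT loaded */
    int  unsafe_points; /* steps that end with IRQs on in a bad state */
};

/* After each step, note whether an interrupt could hit an inconsistent state. */
static void mock_step_done(struct mock_cpu *c)
{
    if (c->irqs_on && !(c->kernel_pgd && c->fixmap_gdt))
        c->unsafe_points++;
}

/* Assumption: the prolog switches away from the kernel pgd, then the GDT. */
static void mock_prolog(struct mock_cpu *c)
{
    c->kernel_pgd = false; mock_step_done(c);
    c->fixmap_gdt = false; mock_step_done(c);
}

/* Stand-in for the firmware call; it changes no modeled state. */
static void mock_efi_call(struct mock_cpu *c)
{
    mock_step_done(c);
}

/*
 * Restores pgd first, then GDT. The model treats both intermediate
 * states as equally bad, so it says nothing about which order is right.
 */
static void mock_epilog(struct mock_cpu *c)
{
    c->kernel_pgd = true; mock_step_done(c);
    c->fixmap_gdt = true; mock_step_done(c);
}

static void mock_set_irqs(struct mock_cpu *c, bool on)
{
    c->irqs_on = on;
    mock_step_done(c);
}

/* Order quoted above: IRQs are off only around the firmware call. */
int mock_current_order(void)
{
    struct mock_cpu c = { true, true, true, 0 };

    mock_prolog(&c);
    mock_set_irqs(&c, false);
    mock_efi_call(&c);
    mock_set_irqs(&c, true);
    mock_epilog(&c);
    assert(c.irqs_on && c.kernel_pgd && c.fixmap_gdt);
    return c.unsafe_points;
}

/* Alternative: IRQs stay off from before the prolog until after the epilog. */
int mock_irqs_off_order(void)
{
    struct mock_cpu c = { true, true, true, 0 };

    mock_set_irqs(&c, false);
    mock_prolog(&c);
    mock_efi_call(&c);
    mock_epilog(&c);
    mock_set_irqs(&c, true);
    assert(c.irqs_on && c.kernel_pgd && c.fixmap_gdt);
    return c.unsafe_points;
}
```

Under this model, mock_current_order() reports a non-zero count and mock_irqs_off_order() reports 0. Whether widening the IRQ-off region is acceptable in the real phys_efi_set_virtual_address_map() is a separate question that the sketch does not answer.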