Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754685AbdIFCr7 (ORCPT ); Tue, 5 Sep 2017 22:47:59 -0400
Received: from mga07.intel.com ([134.134.136.100]:40402 "EHLO mga07.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753735AbdIFCry (ORCPT ); Tue, 5 Sep 2017 22:47:54 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.41,482,1498546800"; d="scan'208";a="1169468939"
Message-ID: <1504665789.4482.31.camel@intel.com>
Subject: Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually
From: Sai Praneeth Prakhya
To: Bhupesh Sharma
Cc: "linux-efi@vger.kernel.org", "linux-kernel@vger.kernel.org",
        Matt Fleming, Ard Biesheuvel, "jlee@suse.com", Borislav Petkov,
        "Luck, Tony", "luto@kernel.org", "mst@redhat.com", "Neri, Ricardo",
        "Shankar, Ravi V"
Date: Tue, 05 Sep 2017 19:43:09 -0700
In-Reply-To: <1504664466.4482.25.camel@intel.com>
References: <1503963432-32055-1-git-send-email-sai.praneeth.prakhya@intel.com>
        <1504664466.4482.25.camel@intel.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.12.11-0ubuntu3
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6009
Lines: 132

On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > I get a similar crash on Qemu with linus's master branch and the V2
> > applied on top of it. Here are the details of my test environment:
> >
> > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > edk2.git/ovmf-x64
> >
> > 2. I used linus's master branch (HEAD - commit:
> > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > of the same.
> >
> > 3. I use the following qemu command line to launch the test:
> >
> > # /usr/local/bin/qemu-system-x86_64 --version
> > QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> > Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> >
> > # /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap -m
> > $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> > -vga std -boot c -cpu host -kernel $KERNEL -append
> > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
> > $INITRAMFS -bios $OVMF_FW_PATH
> >
> > And here is the crash log:
> >
> > [ 0.006054] general protection fault: 0000 [#1] SMP
> > [ 0.006459] Modules linked in:
> > [ 0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > [ 0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 0.0.0 02/06/2015
> > [ 0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
> > [ 0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > [ 0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
> > [ 0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX: 000077ff80000000
> > [ 0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI: 0000000000019a00
> > [ 0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09: ffff88007d085000
> > [ 0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12: ffffffff81e5c6a0
> > [ 0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15: 0000000000000001
> > [ 0.007000] FS: 0000000000000000(0000) GS:ffff88007d400000(0000)
> > knlGS:0000000000000000
> > [ 0.007000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4: 00000000000406b0
> > [ 0.007000] Call Trace:
> > [ 0.007000] switch_mm+0xd/0x20
> > [ 0.007000] ? switch_mm+0xd/0x20
> > [ 0.007000] efi_switch_mm+0x3e/0x4a
> > [ 0.007000] efi_call_phys_prolog+0x28/0x1ac
> > [ 0.007000] efi_enter_virtual_mode+0x35a/0x48f
> > [ 0.007000] start_kernel+0x332/0x3b8
> > [ 0.007000] x86_64_start_reservations+0x2a/0x2c
> > [ 0.007000] x86_64_start_kernel+0x178/0x18b
> > [ 0.007000] secondary_startup_64+0xa5/0xa5
> > [ 0.007000] ? secondary_startup_64+0xa5/0xa5
> > [ 0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> > 7e 89
> > [ 0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
> > [ 0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > [ 0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > the idle task!
> >
> > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > firmware and 64-bit x86 kernel) with your patches, the primary kernel
> > boots fine on Qemu:
> >
> > ovmf firmware used in this case - edk2.git/ovmf-ia32
> >
> > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > case in point 3 above), I see the primary kernel boots fine on Qemu as
> > well.
> >
> > Regards,
> > Bhupesh
>
> Hi Bhupesh,
>
> Thanks a lot for the detailed explanation. They are helpful to reproduce
> the issue quickly. From my initial debug, I think that AMD SME +
> efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
> the issue on qemu.
>
> I have tried the following combinations (all tests are on qemu):
> On Linus's tree:
> 1. With SME and efi_mm and -cpu host -> panics
> 2. With SME and efi_mm and !-cpu host -> boots
> 3. With SME and !efi_mm and -cpu host -> boots
> 4. With SME and !efi_mm and !-cpu host -> boots
> 5. With !SME and efi_mm and -cpu host -> boots
> 6. With !SME and efi_mm and !-cpu host -> boots
> 7. With !SME and !efi_mm and -cpu host -> boots
> 8. With !SME and !efi_mm and !-cpu host -> boots
>
> On Matt's tree (no SME):
> 1. With efi_mm and -cpu host -> boots
> 2. With efi_mm and !-cpu host -> boots
> 3. With !efi_mm and -cpu host -> boots
> 4. With !efi_mm and !-cpu host -> boots
>
> Summary:
> On Matt's tree (next branch), I am unable to reproduce the issue because
> they don't have SME patches.
>
> On Linus's tree, with SME patches
> (b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu host
> switch enabled in qemu, I was able to reproduce the issue.
>
> Could you please confirm if you are seeing the same behavior?
> Specially on real machines (I think, this is equivalent to -cpu host on
> qemu) because in earlier mails you have mentioned that you were able to
> reproduce this on Matt's tree, but according to my theory it shouldn't
> be the case because Matt's three doesn't have SME patches.
> Did you back port (b1b6f83ac938d176742c85757960dec2cf10e468) this commit
> to Matt's tree and then applied my patches?
>
> Your confirmation will help us in debugging the right issue.
>
> Regards,
> Sai

Sorry! I am not sure whether it's the SME patches or the PCID-based TLB
flush patches (most likely the latter, because they change the
switch_mm() code). Both sets of patches, along with 5-level paging, were
in the same pull request sent from Ingo to Linus. So, "SME patches"
above really means this commit ID
(b1b6f83ac938d176742c85757960dec2cf10e468) in Linus's tree.

I will debug this issue further and will send a V3, but to be sure that
I am debugging the right issue, Bhupesh, could you please update me as
requested in my earlier mail?

Regards,
Sai