Date: Wed, 24 Mar 2021 19:43:29 -0700 (PDT)
From: Hugh Dickins
To: Borislav Petkov
cc: Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
    Wanpeng Li, kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
    Ingo Molnar, "H . Peter Anvin", Thomas Gleixner, Makarand Sonare,
    Sean Christopherson
Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
References: <78cc2dc7-a2ee-35ac-dd47-8f3f8b62f261@redhat.com>
    <20210311200755.GE5829@zn.tnic> <20210311203206.GF5829@zn.tnic>
    <2ca37e61-08db-3e47-f2b9-8a7de60757e6@amd.com>
    <20210311214013.GH5829@zn.tnic>
    <4a72f780-3797-229e-a938-6dc5b14bec8d@amd.com>
    <20210311235215.GI5829@zn.tnic> <20210324212139.GN5010@zn.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 24 Mar 2021, Hugh Dickins wrote:
> On Wed, 24 Mar 2021, Borislav Petkov wrote:
>
> > Ok,
> >
> > some more experimenting Babu and I did lead us to:
> >
> > ---
> > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> > index f5ca15622dc9..259aa4889cad 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
> >  	 */
> >  	if (kaiser_enabled)
> >  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> > +	else
> > +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> > +
> >  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> >  }
> >
> > applied on the guest kernel which fixes the issue. And let me add Hugh
> > who did that PCID stuff at the time. So lemme summarize for Hugh and to
> > ask him nicely to sanity-check me. :-)
>
> Just a brief interim note to assure you that I'm paying attention,
> but wow, it's a long time since I gave any thought down here!
> Trying to page it all back in...
>
> I see no harm in your workaround if it works, but it's not as if
> this is a previously untried path: so I'm suspicious how an issue
> here with Globals could have gone unnoticed for so long, and need
> to understand it better.

Right, after looking into it more, I completely agree with you:
the Kaiser series (in both 4.4-stable and 4.9-stable) was simply
wrong to lose that invlpg - fine in the kaiser case when we don't
enable Globals at all, but plain wrong in the !kaiser_enabled case.
One way or another, we have somehow got away with it for three years.

I do agree with Paolo that the PCID_ASID_KERN flush would be better
moved under the "if (kaiser_enabled)" now. (And if this were ongoing
development, I'd want to rewrite the function altogether: but no,
these old stable trees are not the place for that.)

Boris, may I leave both -stable fixes to you?
Let me know if you'd prefer me to clean up my mess.

Thanks a lot for tracking this down,
Hugh

> >
> > Basically, you have an AMD host which supports PCID and INVPCID and you
> > boot on it a 4.9 guest. It explodes like the panic below.
> >
> > What fixes it is this:
> >
> > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> > index f5ca15622dc9..259aa4889cad 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
> >  	 */
> >  	if (kaiser_enabled)
> >  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> > +	else
> > +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> > +
> >  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> >  }
> >
> > ---
> >
> > and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
> > false because AMD is not affected by Meltdown, which means, there's no
> > user/kernel pagetables split.
> >
> > And that also means, you have global TLB entries which means that if you
> > look at that __native_flush_tlb_single() function, it needs to flush
> > global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
> > INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.
> >
> > But I might be completely off here thus this note...
> >
> > Thoughts?
> >
> > Thx.
> >
> >
> > [ 1.235726] ------------[ cut here ]------------
> > [ 1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> > [ 1.240926] invalid opcode: 0000 [#1] SMP
> > [ 1.243301] Modules linked in:
> > [ 1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
> > [ 1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > [ 1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
> > [ 1.253519] RIP: 0010:[] [] text_poke+0x18c/0x240
> > [ 1.256593] RSP: 0018:ffffa41bc0197d90 EFLAGS: 00010096
> > [ 1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
> > [ 1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
> > [ 1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
> > [ 1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
> > [ 1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
> > [ 1.272572] FS: 0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
> > [ 1.275791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
> > [ 1.280815] Stack:
> > [ 1.281630] ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
> > [ 1.284660] 0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
> > [ 1.287729] 1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
> > [ 1.290852] Call Trace:
> > [ 1.291782] [] ? swap_entry_free+0x12a/0x300
> > [ 1.294900] [] ? swap_entry_free+0x12b/0x300
> > [ 1.297267] [] ? text_poke_bp+0x55/0xe0
> > [ 1.299473] [] ? swap_entry_free+0x12a/0x300
> > [ 1.301896] [] ? arch_jump_label_transform+0x9c/0x120
> > [ 1.304557] [] ? set_debug_rodata+0xc/0xc
> > [ 1.306790] [] ? __jump_label_update+0x72/0x80
> > [ 1.309255] [] ? static_key_slow_inc+0x8f/0xa0
> > [ 1.311680] [] ? frontswap_register_ops+0x107/0x1d0
> > [ 1.314281] [] ? init_zswap+0x282/0x3f6
> > [ 1.316547] [] ? init_frontswap+0x8c/0x8c
> > [ 1.318784] [] ? do_one_initcall+0x4e/0x180
> > [ 1.321067] [] ? set_debug_rodata+0xc/0xc
> > [ 1.323366] [] ? kernel_init_freeable+0x16b/0x1ec
> > [ 1.325873] [] ? rest_init+0x80/0x80
> > [ 1.327989] [] ? kernel_init+0xa/0x100
> > [ 1.330092] [] ? ret_from_fork+0x44/0x70
> > [ 1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8
> > [ 1.342818] RIP [] text_poke+0x18c/0x240
> > [ 1.345859] RSP
> > [ 1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
> > [ 1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [ 1.349169]
> > [ 1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [ 1.357039]
> >
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
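
For readers following the thread, the two shapes of the flush logic discussed above can be compared in a small userspace model. This is only a sketch under stated assumptions, not kernel code and not the fix that was eventually applied to -stable: the enum, the log array and both function names are hypothetical, and the "preferred" variant is merely one reading of the suggestion to move the PCID_ASID_KERN flush under "if (kaiser_enabled)". Each model records which flush primitive would be issued instead of executing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three flush primitives named in the thread. */
enum flush_op {
	OP_INVPCID_USER,	/* invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr) */
	OP_INVPCID_KERN,	/* invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr) */
	OP_INVLPG,		/* asm volatile("invlpg (%0)" ...) */
};

#define MAX_OPS 4

static enum flush_op op_log[MAX_OPS];	/* flushes issued, in order */
static int op_count;

static void reset_log(void)
{
	op_count = 0;
}

static void record(enum flush_op op)
{
	assert(op_count < MAX_OPS);
	op_log[op_count++] = op;
}

/* Shape of the quoted hunk: invlpg added, KERN flush still unconditional. */
static void flush_single_workaround(bool kaiser_enabled)
{
	if (kaiser_enabled)
		record(OP_INVPCID_USER);
	else
		record(OP_INVLPG);

	record(OP_INVPCID_KERN);
}

/* Assumed shape with the KERN flush moved under the kaiser_enabled test. */
static void flush_single_preferred(bool kaiser_enabled)
{
	if (kaiser_enabled) {
		record(OP_INVPCID_USER);
		record(OP_INVPCID_KERN);
	} else {
		record(OP_INVLPG);
	}
}
```

In this model both variants issue an invlpg when kaiser_enabled is false, which is the case that matters for Global entries; the second variant simply skips the extra PCID_ASID_KERN flush in that case.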