From: Linus Torvalds
Date: Tue, 2 Jun 2020 12:14:43 -0700
Subject: Re: [GIT PULL] x86/mm changes for v5.8
To: Thomas Gleixner
Cc: Benjamin Herrenschmidt, Ingo Molnar, Balbir Singh, Peter Zijlstra, Andrew Morton, Borislav Petkov, Linux Kernel Mailing List, Andrew Lutomirski
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 2, 2020 at 11:29 AM Thomas Gleixner wrote:
>
> It's trivial enough to fix. We have a static key already which is
> telling us whether SMT scheduling is active.

.. but should we do it here, in switch_mm() in the first place?

Should it perhaps just return an error if user land tries to set the "flush L1$" thing on an SMT system?
And no, I don't think we care at all if people then start playing games and enabling/disabling SMT dynamically while applications are working. At that point the admin gets to keep both of the broken pieces.

Also, see my other point about how "switch_mm()" really isn't even a protection domain switch to begin with. We're still in the exact same protection domain we used to be in, with the exact same direct access to L1D$.

Why would we flush the caches on a random and irrelevant event in kernel space? switch_mm() simply isn't relevant for caches (well, unless you have fully virtual ones, but that's a completely different issue).

Wouldn't it be more sensible to treat it more like TIF_NEED_FPU_LOAD - have a per-cpu "I need to flush the cache" variable, and then the only thing a context switch does is to see if the user changed (or whatever) and then set the bit, and set TIF_NOTIFY_RESUME in the thread.

Because the L1D$ flush isn't a kernel issue, it's a "don't let user space try to attack it" issue. The kernel can already read it if it wants to.

And those are just the "big picture" issues I see. In the big picture, doing this when SMT is enabled is unbelievably stupid. And in the big picture, context switch really isn't a security domain change wrt the L1D$.

The more I look at those patches, the more I go "that's just wrong" on some of the "small picture" implementation details. Here are just a few cases that I reacted to.

Actual low-level flushing code:

 (1) the SW fallback

  (a) is only defined on Intel, and initializing the silly cache flush pages on any other vendor will fail.

  (b) seems to assume that 16 pages (order-4) is sufficient and necessary. Probably "good enough", but it's also an example of "yeah, that's expensive".

  (c) and if I read the code correctly, trying to flush the L1D$ on non-Intel without the HW support causes a WARN_ON_ONCE()! WTF?
 (2) the HW case is done for any vendor, if it reports "I have the MSR".

 (3) the VMX support certainly has various sanity checks like "oh, CPU doesn't have X86_BUG_L1TF, then I won't do this even if there was some kernel command line to say I should". But the new prctl doesn't have anything like that. It just enables that L1D$ thing mindlessly, thinking that user-land software somehow knows what it's doing. BS.

 (4) what does L1D_FLUSH_POPULATE_TLB mean? That "option" makes zero sense. It pre-populates the TLB before doing the accesses to the L1D$ pages in the SW case, but nothing at all explains why that is needed. It's clearly not needed for the caller, since the TLB population only happens for the SW fallback, not for the HW one. No documentation, no nothing. It's enabled for the VMX case, not for the non-VMX case, which makes me suspect it's some crazy "work around vm monitor page faults, because we know our SW flush fallback is just random garbage".

In other words, there are those big "high-level design" questions, but also several oddities in just the implementation details.

I really get the feeling that this feature just isn't ready.

Ingo - would you mind sending me a pull request for the (independent) TLB cleanups part of that x86/mm tree? Because everything up to and including commit bd1de2a7aace ("x86/tlb/uv: Add a forward declaration for struct flush_tlb_info") looks sane. It's only this L1D$ flushing thing at the end of that branch that I think isn't fully cooked.

           Linus