From: Dan Williams
Date: Sun, 19 Apr 2020 22:08:23 -0700
Subject: Re: [PATCH] x86/memcpy: Introduce memcpy_mcsafe_fast
To: Linus Torvalds
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, X86 ML, stable, Borislav Petkov, H. Peter Anvin, Peter Zijlstra, Tony Luck, Erwin Tsaur, Linux Kernel Mailing List, linux-nvdimm
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Apr 18, 2020 at 1:52 PM Linus Torvalds wrote:
>
> On Sat, Apr 18, 2020 at 1:30 PM Andy Lutomirski wrote:
> >
> > Maybe I'm missing something obvious, but what's the alternative? The _mcsafe variants don't just avoid the REP mess -- they also tell the kernel that this particular access is recoverable via extable.
>
> .. which they could easily do exactly the same way the user space
> accessors do, just with a much simplified model that doesn't even care
> about multiple sizes, since unaligned accesses weren't valid anyway.
>
> The thing is, all of the MCS code has been nasty. There's no reason
> for it what-so-ever that I can tell. The hardware has been so
> incredibly broken that it's basically unusable, and most of the
> software around it seems to have been about testing.
>
> So I absolutely abhor that thing. Everything about that code has
> screamed "yeah, we completely mis-designed the hardware, we're pushing
> the problems into software, and nobody even uses it or can test it so
> there's like 5 people who care".
>
> And I'm pushing back on it, because I think that the least the code
> can do is to at least be simple.
>
> For example, none of those optimizations should exist. That function
> shouldn't have been inline to begin with. And if it really really
> matters from a performance angle that it was inline (which I doubt),
> it shouldn't have looked like a memory copy, it should have looked
> like "get_user()" (except without all the complications of actually
> having to test addresses or worry about different sizes).
>
> And it almost certainly shouldn't have been done in low-level asm
> either. It could have been a single "read aligned word" interface
> using an inline asm, and then everything else could have been done as
> C code around it.

Do we have examples of doing exception handling from C? I thought all
the exception handling copy routines were assembly routines?

>
> But no. The software side is almost as messy as the hardware side is.
> I hate it. And since nobody sane can test it, and the broken hardware
> is _so_ broken that nobody should ever use it, I have continually
> pushed back against this kind of ugly nasty special code.
>
> We know the writes can't fault, since they are buffered. So they
> aren't special at all.

The writes can mmu-fault now that memcpy_mcsafe() is also used by
_copy_to_iter_mcsafe(). This allows a clean bypass of the block layer
in fs/dax.c in addition to the pmem driver access of poisoned memory.
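Dan's point that a copy_to_iter()-style front end can fail on either the source (poisoned memory) or the destination (an mmu fault on the target buffer) can be illustrated with a small user-space sketch. Everything in it is hypothetical: fault_model, try_load(), try_store() and recoverable_copy() are invented for illustration, faults are simulated instead of coming from machine-check recovery or an exception table, and this is not memcpy_mcsafe() or _copy_to_iter_mcsafe(). It only shows the contract under discussion: stop at the first failure on either side, report which side failed, and return the number of bytes not copied, the way copy_to_user() does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Which side of the copy failed, if any. */
enum copy_fault { COPY_OK, COPY_SRC_POISON, COPY_DST_FAULT };

/*
 * Hypothetical fault model for this sketch. In the kernel these outcomes
 * would come from machine-check recovery (source) and page-fault
 * exception-table fixups (destination); here they are simulated.
 */
struct fault_model {
	size_t src_poison;   /* index of first poisoned source byte, or SIZE_MAX */
	size_t dst_writable; /* destination bytes writable without a fault */
};

/* Hypothetical fallible load: fails at or past the poisoned source byte. */
static bool try_load(const unsigned char *src, size_t i,
		     const struct fault_model *fm, unsigned char *out)
{
	if (i >= fm->src_poison)
		return false;
	*out = src[i];
	return true;
}

/* Hypothetical fallible store: fails past the writable destination range. */
static bool try_store(unsigned char *dst, size_t i,
		      const struct fault_model *fm, unsigned char val)
{
	if (i >= fm->dst_writable)
		return false;
	dst[i] = val;
	return true;
}

/*
 * Copy len bytes, stopping at the first failure on either side. Returns
 * the number of bytes NOT copied (0 on success), like copy_to_user(), and
 * reports which side failed so the caller can advance by len - remaining.
 */
static size_t recoverable_copy(unsigned char *dst, const unsigned char *src,
			       size_t len, const struct fault_model *fm,
			       enum copy_fault *why)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char byte;

		if (!try_load(src, i, fm, &byte)) {
			*why = COPY_SRC_POISON;
			return len - i;
		}
		if (!try_store(dst, i, fm, byte)) {
			*why = COPY_DST_FAULT;
			return len - i;
		}
	}
	*why = COPY_OK;
	return 0;
}
```

Copying byte by byte keeps the residual count exact in this sketch; a real implementation would move words and would need to decide how precisely it reports the failing offset.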

Now that the fallback is a sane "rep; movs", it can be considered for
plain copy_to_iter() for other user copies, so you get exception
handling on kernel access of poison outside of persistent memory. To
Andy's point, I think a recoverable copy (for exceptions or faults) is
generally useful.

> We know the acceptable reads for the broken hardware basically boil
> down to a single simple word-size aligned read, so you need _one_
> special inline asm for that. The rest of the cases can be handled by
> masking and shifting if you really really need to - and done better
> that way than with byte accesses anyway.
>
> Then you have _one_ C file that implements everything using that
> single operation (ok, if people absolutely want to do sizes, I guess
> they can if they can just hide it in that one file), and you have one
> header file that exposes the interfaces to it, and you're done.
>
> And you strive hard as hell to not impact anything else, because you
> know that the hardware is unacceptable until all those special rules
> go away. Which they apparently finally have.

I understand the gripes about the mcsafe_slow() implementation, but
how do I implement mcsafe_fast() any better than how it is currently
organized, given that, setting aside machine check handling,
memcpy_mcsafe() is the core of a copy_to_iter*() front-end that can
mmu-fault on either source or destination access?
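Linus's proposal quoted above (one special inline-asm read of an aligned word, with every other case handled in ordinary C by shifting and masking) can be sketched in user-space C as follows. read_aligned_word() and copy_via_aligned_reads() are hypothetical names, not kernel interfaces. The primitive here is a plain memcpy() load with no machine-check recovery, and the shifts assume a little-endian CPU such as x86. Because whole aligned words are read, bytes just outside [src, src + len) in the same word must be readable; an aligned 8-byte read never crosses a page boundary, which is what makes this acceptable in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the one special primitive: an aligned,
 * word-sized read that reports failure instead of crashing. In the
 * kernel this would be a single inline-asm load with an exception-table
 * entry; in this user-space sketch it is a plain load that always succeeds.
 */
static bool read_aligned_word(const void *aligned_src, uint64_t *val)
{
	memcpy(val, aligned_src, sizeof(*val));
	return true;
}

/*
 * Copy len bytes using only aligned 8-byte reads, extracting the wanted
 * bytes by shifting and masking. Returns the number of bytes NOT copied.
 * Assumes little-endian byte order, so lower addresses are the low-order
 * bytes of the loaded word.
 */
static size_t copy_via_aligned_reads(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	uintptr_t base = (uintptr_t)src;
	size_t done = 0;

	while (done < len) {
		uintptr_t cur = base + done;
		unsigned int offset = cur & 7;   /* byte offset within the word */
		size_t chunk = 8 - offset;       /* bytes usable from this word */
		uint64_t val;

		if (chunk > len - done)
			chunk = len - done;
		if (!read_aligned_word((const void *)(cur - offset), &val))
			return len - done;       /* report the uncopied tail */

		val >>= offset * 8;              /* drop bytes that precede cur */
		for (size_t i = 0; i < chunk; i++)
			d[done + i] = (unsigned char)((val >> (i * 8)) & 0xff);
		done += chunk;
	}
	return 0;
}
```

Only one load per aligned word is issued, so an unaligned head or tail costs a shift and a mask rather than extra byte-sized accesses, which is the trade-off the quoted paragraph argues for.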