Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp83253ybz; Thu, 30 Apr 2020 16:57:19 -0700 (PDT) X-Google-Smtp-Source: APiQypLa9v8hEKp8Y86c3fnX2MAsDGndY7+l81lX6fV2hcTokXqWjIeToNYzdoys7oi6t1Skyf6C X-Received: by 2002:a17:906:ecb8:: with SMTP id qh24mr873243ejb.299.1588291039193; Thu, 30 Apr 2020 16:57:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588291039; cv=none; d=google.com; s=arc-20160816; b=Up6/tCdukgBqia2WBx3zW3gIJDiOL3AFATcUSpEUhDpGC88SYi946ul1Qnec/NXuK5 4iieREeLM1oBNRommrYs+JSA6Ljzzze64Vigz3UKvRDILlsxO0mi7zlboSpTgv6+FD22 sSQCpatenSa8JbrWOd5YN6HyzKeiEWRS1Y0XwuRQoN5RJDl2lCysW6l7PI6Bt1c23p1T AyqIxYRzfOe9LfJDaWpViYVEqJHOxbrrN33+k8wSpTi6V/sRivEtqjKa1wsPdOI+VKwW AWySLManhJOGvDXcw22NlDEY4uUijo9svt6cWspIRF4faQG6rX9OBsGyIdOAvUFBrnZG XSoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=7MF1/Pddp4dhGKH9x4iKD8rjSjk+ZGQHnl4uXbC/+TA=; b=uK+4G+EWrxKLewYh7X2h9H9LclUlij/ZlsfCuvpo4DCLF3hkPgUE9vRRZZ5Lh0uOTM GV8zVZA1JGbeYY1ZsxQNKpkfq4C1wG8IhU2mUzkGmt3xsxfcWnWVRcjgThwMVtHrO2Wo Bh1Tg/TF0krCq8/4W+o8+cPv0Llcs8L8a64M7jmjyn503iuwnLnsNTChpoAKtyIfNA/u DEKkOI1ugnMLX+Rd8Lg6ZzxjmAfFPXUnoWSg6Ai+hjARcyk91/ENyzfCtMv6SBnij7Rq UpyT4DcjWGACAi50bAbdlDMNTdTqold2K8rTHvQOl92MEdCgik7YEBtCRHUkJWdzuv7G yTTA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=dPpcYEEk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q12si592069edc.303.2020.04.30.16.56.38; Thu, 30 Apr 2020 16:57:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=dPpcYEEk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727062AbgD3Xwg (ORCPT + 99 others); Thu, 30 Apr 2020 19:52:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726495AbgD3Xwg (ORCPT ); Thu, 30 Apr 2020 19:52:36 -0400 Received: from mail-ej1-x641.google.com (mail-ej1-x641.google.com [IPv6:2a00:1450:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9BB6DC035495 for ; Thu, 30 Apr 2020 16:52:35 -0700 (PDT) Received: by mail-ej1-x641.google.com with SMTP id k8so6220191ejv.3 for ; Thu, 30 Apr 2020 16:52:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=7MF1/Pddp4dhGKH9x4iKD8rjSjk+ZGQHnl4uXbC/+TA=; b=dPpcYEEkvq8ZWM/jSsBIAi3UKaYNI1WVWBVTLYMB2LGGdKZ7j4RiDwdbaThqfngPt2 dY1VoMJ9o5v7csTLFC8UtnvbuZxmV7ZbXj9f8OdGd7h9jWnxiWetKuCoTCAtbTvwZhbV pn1KZl9QshPTNiecbjqpjGR8CzuoM9vPC7Qq74tfH8Ae0qpGpm6jdonNhrImgbrFxXah ZqfKIn1872Ne7dc9wTwgp7qk6Nf2NfnUMT6Lg05YH2iKhVhygSE6vJka0ZvpXBGQiUcW VmBU7vvo9gweFcrn6hy6RNAZXeCdtc1fJNXBQSThyT4VY2FiLezyerj384lGD2jBMVSM 8TFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=7MF1/Pddp4dhGKH9x4iKD8rjSjk+ZGQHnl4uXbC/+TA=; b=cOmQ/sQRX12/adtS7LSVUOAQsLTlxkawQCl3y7JVfaB37qn2UKTOi3UVpjoFNVQdBu DEm7U0OTjac/3AI6e20eOoJ5WCkhnIwRsvmFkq+zmwUFQjpM4/JyJko1zhxS2wTnnzDG pbAa69KB8OdWV8ISllrReQ6F68PFNGOPkXlxR3FRJonUM8/cpBVGqv+T3iY6iUy/4sBE JL3mgf0B8wNzMDbTsiSyR51ndh5m59AXKFGCQvWRHs1N3tTKPxZY0ZbTngyUSMdOxcao hfdpu5pNoJ6j9jmryBnJguQ5G5umcIdO43y0RTkoIB5+rf/1zkALGOzTn9hTOOEf9qqh SJFA== X-Gm-Message-State: AGi0PuYx8zobCBUPBhgL8fJjVeQY8rnxKqWoVLEC+dSTIbkXqHXfExCt RRPaanWvr6ZtEdXX+UYTEEmbDSKYdX/eaPkLTXpzwg== X-Received: by 2002:a17:906:90cc:: with SMTP id v12mr892682ejw.211.1588290754110; Thu, 30 Apr 2020 16:52:34 -0700 (PDT) MIME-Version: 1.0 References: <158823509800.2094061.9683997333958344535.stgit@dwillia2-desk3.amr.corp.intel.com> <20200430192258.GA24749@agluck-desk2.amr.corp.intel.com> In-Reply-To: From: Dan Williams Date: Thu, 30 Apr 2020 16:52:22 -0700 Message-ID: Subject: Re: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe() To: Linus Torvalds Cc: "Luck, Tony" , Andy Lutomirski , Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Borislav Petkov , stable , "the arch/x86 maintainers" , "H. Peter Anvin" , Paul Mackerras , Benjamin Herrenschmidt , Erwin Tsaur , Michael Ellerman , Arnaldo Carvalho de Melo , linux-nvdimm , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 30, 2020 at 12:56 PM Linus Torvalds wrote: > > On Thu, Apr 30, 2020 at 12:23 PM Luck, Tony wrote: > > > > How about > > > > try_copy_catch(void *dst, void *src, size_t count, int *fault) > > > > returns number of bytes not-copied (like copy_to_user etc). > > > > if return is not zero, "fault" tells you what type of fault > > cause the early stop (#PF, #MC). > > That just makes things worse. > > The problem isn't "what fault did I get?". > > The problem is "what is the point of this function?". > > In other words, I want the code to explain _why_ it uses a particular function. > > So the question the function should answer is not "Why did it take a > fault?", but "Why isn't this just a 'memcpy()'"? > > When somebody writes > > x = copy_to_user(a,b,c); > > the "why is it not a memcpy" question never comes up, because the code > is obvious. It's not a memory copy, because it's copying to user > space, and user space is different! > > In contrast, if you write > > x = copy_safe(a,b,c); > > then what is going on? There is no rhyme or reason to the call. Is the > source unsafe? Wny? Is the destination faulting? Why? Both? How? > > So "copy_to_user()" _answers_ a question. But "copy_safe()" just > results in more questions. See the difference? > > And notice that the "why did it fault" question is NOT about your > "what kind of fault did it take" question. That question is generally > pretty much uninteresting. > > The question I want answered is "why is this function being called AT ALL". > > Again, "copy_to_user()" can fail, and we know to check failure cases. > But more importantly, the _reason_ it can fail is obvious from the > name and from the use. There's no confusion about "why". > > "copy_safe()"? or "try_copy_catch()"? No such thing. It doesn't answer > that fundamental "why" question. > > And yes, this also has practical consequences. If you know that the > failure is due to the source being some special memory (and if we care > about the MC case with unaligned accesses), then the function in > question should probably strive to make those _source_ accesses be the > aligned ones. But if it's the destination that is special memory, > then it's the writes to the destination that should be aligned. If you > need both, you may need to be either mutually aligned, or do byte > accesses, or do special shifting copies. So it matters for any initial > alignment code (if the thing has alignment issues to begin with). > > I haven't even gotten an answer to the "why would the write fail". > When I asked, Dan said > > "writes can mmu-fault now that memcpy_mcsafe() is also used by > _copy_to_iter_mcsafe()" > > but as mentioned, the only reason I can find for _that_ is that the > destination is user space. > > In which case a "copy_safe()" absolutely could never work. > > If I can't figure out the "why is this function used" or even figure > out why it needs the semantics it claims it needs, then there's a > problem. > > Personally, I suspect that the *easiest* way to support the broken > nvdimm semantics is to not have a "copy" function at all as the basic > accessor model. You had me until here. Up to this point I was grokking that Andy's "_fallible" suggestion does help explain better than "_safe", because the copy is doing extra safety checks. copy_to_user() and copy_to_user_fallible() mean *something* where copy_to_user_safe() does not. However you lose me on this "broken nvdimm semantics" contention. There is nothing nvdimm-hardware specific about the copy_safe() implementation, zero, nada, nothing new to the error model that DRAM did not also inflict on the Linux implementation. This is about Linux treating memory as a disk and performing bulk transfers with the CPU instead of a DMA engine. Whereas existing memory error handling has a high chance for it to trigger on userspace accesses, large copies in kernel mode now have a higher chance of tripping over the same errors in kernel space. Since there is an error model overlaid on top of the block-I/O, software is well prepared to handle the short transfer case. > Why? Exactly because "copy" is not a fundamental well-defined action. > You get nasty combinatorial explosions of different things, where you > have three different kinds of sources (user, RAM, nvdimm) and three > different kinds of destinations. True, copy is not a fundamentally well defined operation. It's unfortunate that the low-level of a direct-I/O read(2) eventually turns into a call to memcpy() inside a typical ramdisk driver. So you're right, this isn't a copy routine as much as it's the backend of read(2)/write(2) with short transfer semantics, but by the time it gets to pmem driver it's been disconnected from its original intent and all it sees is "move bytes from here to there if you can". > And that's ignoring the whole "maybe you don't want to do a plain > copy, maybe you want to calculate a RAID checksum, or do a 'strcpy()' > or whatever". If those are ever issues, you get another explosion of > combinations. > > The only *fundamental* access would likely be a single read/write > operation, not a copy operation. Think "get_user()" instead of > "copy_from_user()". Even there you get combinatorial explosions with > access sizes, but you can often generate those automatically or with > simple patterns, and then you can build up the copy functions from > that if you really need to. The CPU overhead of synchronous bulk block-I/O transfers with the CPU is already painful, a get_user() loop adds to that without a benefit that I can currently perceive. The vast majority of driver actions and DAX operations are in PAGE_SIZE units.