Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp2054982pxb; Thu, 28 Oct 2021 15:18:08 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxQFI38FMUPnqfJG3qixqpL5pFHIDqDUnIwlSBMp7tpyQ1gM2IdRliyOhEUtwswev2IIUTe X-Received: by 2002:a63:b402:: with SMTP id s2mr5240316pgf.324.1635459488304; Thu, 28 Oct 2021 15:18:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635459488; cv=none; d=google.com; s=arc-20160816; b=q8sKffGWqA1nd5VQOYU3gPnAW2amrDBDPCPn9bLGAIsKe4d4DzHfhuBVYPtx/WOvCn /FzltMFSkNaCMs+LcMIytbzvgMFTepNVSbo0TzCL+QHJ49qUDhoNfrLPSkads7sOBFxk v90SbU8xBm+n91XHU3dZYvUXJ3ORad7OsocumQwE26YzPzE62aIBhEInO4qEX+gMlSeA iup2aIpTYgK9eMBBK21fp+GEdzMM/J5z/rG2Z/JqJRm2qq1E9mbeQFVpECx//QWs3XBh LubxbCU1AnYla9UZht6OAiAQm1lwPpiDF9qKGOZkpeuYlmXMjzENQNvxMA763NLbzDgU 6cEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=rg70L5/px9IrBTtKlC6NDcYo4kfwnEG9jTpOWhVqPcc=; b=Dh40eEEmApb+Tf6Fv9LinN4SEahZLyIuD+j6LOjwkdQsqBpBbZju3wZR83NoaR4t/D i4YIFRnSwDHyCHCjc4EQVQ6UjWhVtmT0iiyWeiQE4w/DNgAixytYon6zFFkTvYx2Y5Ih fMR61zE7IHjDNS8CKFt2j3C4Um3hbNnETyHH8Y8TIPuJx4MxRkmjUoFDW3aA9XTCn1Vd qd8L0yW16NkmT5/xaMsWc+m6w+3S6/QoVRIp6qheAl2eeIcEvSByFiKS6m2gfhaTdax1 V/9hhmLpoWiEvCitFZ1fotAE6is8tWJPBZ2x70MiGKlWYfXrkRk3bo35gLvLCb8lWfbY 5S5w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=eLMlkR3C; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f24si5592581pgh.549.2021.10.28.15.17.48; Thu, 28 Oct 2021 15:18:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=eLMlkR3C; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231445AbhJ1WSf (ORCPT + 99 others); Thu, 28 Oct 2021 18:18:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231124AbhJ1WSf (ORCPT ); Thu, 28 Oct 2021 18:18:35 -0400 Received: from mail-io1-xd34.google.com (mail-io1-xd34.google.com [IPv6:2607:f8b0:4864:20::d34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F06BC061570; Thu, 28 Oct 2021 15:16:07 -0700 (PDT) Received: by mail-io1-xd34.google.com with SMTP id n67so10111875iod.9; Thu, 28 Oct 2021 15:16:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=rg70L5/px9IrBTtKlC6NDcYo4kfwnEG9jTpOWhVqPcc=; b=eLMlkR3CrL/8PKVzyYVcuncnEdxnsfxn3riWgN8oKOQlGwnX+UChUbaymKEw6taCnn T8Axfv4krveNN1JwPJ8iFTOlfd102NDBwnWrreHYu4O3lJvrvpNe2/UNEdxrzQ9SIEC2 QLoTJw49KEGspkcIbfmZwhQPQnQHmI3RwBeznARr6NdRUey/+TThUWIxW9Sln8YZy4R8 11VLzJvcLrKyWixt2PgLtCOpiAI3GIufYgiUdKrBvJwerHJEVjKfl6H+d48ohmeM6LH4 qI+q4pHyEmmlq/Z2OQHtWTRygZ3gz8XwduyZ/uS8/ZOuJq5JPvoIFBAGbgx8he4F1Sb8 u0iw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=rg70L5/px9IrBTtKlC6NDcYo4kfwnEG9jTpOWhVqPcc=; b=L/DRI1oulBjkW6Rq0CXcppn/FcluXqjQE4oIn+JLyA9t33m+qKb4OViuIQ33dYX60R 001EoHXE37qWdx8gTzPi/MvP+0UN/jAjz+UphTB6PZu3DbBt97J7/d40RZsSwzJgNY9Y GlpuQqpzwLP1og4EplBOkxjh+GGeA8a4mHRYSjO25pI0T8fnxwcquDvChSlXldFec5Pj aXNAtzx5fmVOUO2Xdgh5oHpuTy2tlACb16SX/iqb7ULhgL5EJF1XngfyP0dIWs6/kPWa PMH60Ova2gDyBmJBWgGGUgLIqd0lij4W3MPqKDzSOmjHjD3ktpaD9GfSy10g9NwErib2 0PWQ== X-Gm-Message-State: AOAM530S+Uu4ldN9l126Q2om5ocMNGKP6+14NOqQf4mbs13MQ1WCrDvO fYNvgKUlK8dRx3MS1sKF12BqYVFquYkUq8daKII= X-Received: by 2002:a05:6602:736:: with SMTP id g22mr5165002iox.139.1635459366944; Thu, 28 Oct 2021 15:16:06 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Andreas_Gr=C3=BCnbacher?= Date: Fri, 29 Oct 2021 00:15:55 +0200 Message-ID: Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks To: Catalin Marinas Cc: Linus Torvalds , Andreas Gruenbacher , Paul Mackerras , Alexander Viro , Christoph Hellwig , "Darrick J. Wong" , Jan Kara , Matthew Wilcox , cluster-devel , linux-fsdevel , Linux Kernel Mailing List , ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Am Do., 28. Okt. 2021 um 23:21 Uhr schrieb Catalin Marinas : > One last try on this path before I switch to the other options. > > On Wed, Oct 27, 2021 at 02:14:48PM -0700, Linus Torvalds wrote: > > On Wed, Oct 27, 2021 at 12:13 PM Catalin Marinas > > wrote: > > > As an alternative, you mentioned earlier that a per-thread fault status > > > was not feasible on x86 due to races. Was this only for the hw poison > > > case? I think the uaccess is slightly different. > > > > It's not x86-specific, it's very generic. > > > > If we set some flag in the per-thread status, we'll need to be careful > > about not overwriting it if we then have a subsequent NMI that _also_ > > takes a (completely unrelated) page fault - before we then read the > > per-thread flag. > > > > Think 'perf' and fetching backtraces etc. > > > > Note that the NMI page fault can easily also be a pointer coloring > > fault on arm64, for exactly the same reason that whatever original > > copy_from_user() code was. So this is not a "oh, pointer coloring > > faults are different". They have the same re-entrancy issue. > > > > And both the "pagefault_disable" and "fault happens in interrupt > > context" cases are also the exact same 'faulthandler_disabled()' > > thing. So even at fault time they look very similar. > > They do look fairly similar but we should have the information in the > fault handler to distinguish: not a page fault (pte permission or p*d > translation), in_task(), user address, fixup handler. But I agree the > logic looks fragile. > > I think for nested contexts we can save the uaccess fault state on > exception entry, restore it on return. Or (needs some thinking on > atomicity) save it in a local variable. The high-level API would look > something like: > > unsigned long uaccess_flags; /* we could use TIF_ flags */ > > uaccess_flags = begin_retriable_uaccess(); > copied = copy_page_from_iter_atomic(...); > retry = end_retriable_uaccess(uaccess_flags); > ... > > if (!retry) > break; > > I think we'd need a TIF flag to mark the retriable region and another to > track whether a non-recoverable fault occurred. It needs prototyping. > > Anyway, if you don't like this approach, I'll look at error codes being > returned but rather than changing all copy_from_user() etc., introduce a > new API that returns different error codes depending on the fault > (e.g -EFAULT vs -EACCES). We already have copy_from_user_nofault(), we'd > need something for the iov_iter stuff to use in the fs code. We won't need any of that on the filesystem read and write paths. The two cases there are buffered and direct I/O: * In the buffered I/O case, the copying happens with page faults disabled, at a byte granularity. If that returns a short result, we need to enable page faults, check if the exact address that failed still fails (in which case we have a sub-page fault), fault in the pages, disable page faults again, and repeat. No probing for sub-page faults beyond the first byte of the fault-in address is needed. Functions fault_in_{readable,writeable} implicitly have this behavior; for fault_in_safe_writeable() the choice we have is to either add probing of the first byte for sub-page faults to this function or force callers to do that probing separately. At this point, I'd vote for the former. * In the direct I/O case, the copying happens while we're holding page references, so the only page faults that can occur during copying are sub-page faults. When iomap_dio_rw or its legacy counterpart is called with page faults disabled, we need to make sure that the caller can distinguish between page faults triggered during bio_iov_iter_get_pages() and during the copying, but that's a separate problem. (At the moment, when iomap_dio_rw fails with -EFAULT, the caller *cannot* distinguish between a bio_iov_iter_get_pages failure and a failure during synchronous copying, but that could be fixed by returning unique error codes from iomap_dio_rw.) So as far as I can see, the only problematic case we're left with is copying bigger than byte-size chunks with page faults disabled when we don't know whether the underlying pages are resident or not. My guess would be that in this case, if the copying fails, it would be perfectly acceptable to explicitly probe the entire chunk for sub-page faults. Thanks, Andreas