Date: Wed, 27 Oct 2021 20:13:01 +0100
From: Catalin Marinas
To: Linus Torvalds
Cc: Andreas Gruenbacher, Paul Mackerras, Alexander Viro, Christoph Hellwig, "Darrick J. Wong", Jan Kara, Matthew Wilcox, cluster-devel, linux-fsdevel, Linux Kernel Mailing List, ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs
Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks

On Tue, Oct 26, 2021 at 11:50:04AM -0700, Linus Torvalds wrote:
> On Tue, Oct 26, 2021 at 11:24 AM Catalin Marinas wrote:
> > While more intrusive, I'd rather change copy_page_from_iter_atomic()
> > etc. to take a pointer where to write back an error code.

[...]
> That said, the fact that these sub-page faults are always
> non-recoverable might be a hint to a solution to the problem: maybe we
> could extend the existing return code with actual negative error
> numbers.
>
> Because for _most_ cases of "copy_to/from_user()" and friends by far,
> the only thing we look for is "zero for success".
>
> We could extend the "number of bytes _not_ copied" semantics to say
> "negative means fatal", and because there are fairly few places that
> actually look at non-zero values, we could have a coccinelle script
> that actually marks those places.

As you already replied, there are some odd places where the returned
number of uncopied bytes is used. Also, for some valid cases like
copy_mount_options(), it's likely that it will fall back to
byte-at-a-time copying with MTE, since there's a good chance it would
hit a fault within a 4K page (not a fast path, though). I'd have to go
through all the cases and check whether the return value is meaningful.
The iov_iter.c functions and their callers also seem to make use of the
bytes copied in case they need to call iov_iter_revert() (though I
suppose iov_iter_iovec_advance() would skip the update in case of an
error).

As an alternative, you mentioned earlier that a per-thread fault status
was not feasible on x86 due to races. Was this only for the hw poison
case? I think the uaccess case is slightly different. We can add a
current->non_recoverable_uaccess variable, cleared on
pagefault_disable(), set only by uaccess faults and checked by the fs
code before re-attempting the fault_in(). An interrupt shouldn't do a
uaccess (well, if it does a _nofault one, we can detect in_interrupt()
in the MTE exception handler). Last time I looked at io_uring, it was
running in a separate kernel thread; I'm not sure whether this has
changed. I don't see what else would race with such a
current->non_recoverable_uaccess variable.

If that's doable, I think it's the least intrusive approach.

-- 
Catalin
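A minimal userspace sketch of the per-thread flag idea, assuming fault-in works at page granularity (so it "succeeds" while a sub-page MTE-style fault remains). Every name here (struct task, struct user_buf, sim_copy_nofault(), sim_fault_in(), fs_copy()) is hypothetical, and pagefault_disable()/pagefault_enable() are simplified stand-ins, not the kernel implementations. It only illustrates the control flow: clear the flag when faults are disabled, set it on a non-recoverable uaccess fault, and check it before retrying fault-in.

```c
/*
 * Userspace simulation (hypothetical names throughout) of the proposed
 * current->non_recoverable_uaccess flag. Not kernel code.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for per-task state: only the field proposed in the mail. */
struct task {
	int pagefault_disabled;
	bool non_recoverable_uaccess;
};

static struct task current_task;
#define current (&current_task)

/* Simplified stand-ins: disabling page faults also clears the flag. */
static void pagefault_disable(void)
{
	current->pagefault_disabled++;
	current->non_recoverable_uaccess = false;
}

static void pagefault_enable(void)
{
	current->pagefault_disabled--;
}

/* Simulated user buffer: bytes from bad_off onwards fault. */
struct user_buf {
	const char *data;
	size_t len;
	size_t bad_off;		/* first faulting byte */
	bool tag_mismatch;	/* true: sub-page (MTE-like) fault, not fixable */
};

/* Copy with faults disabled; returns bytes NOT copied, like copy_from_user(). */
static size_t sim_copy_nofault(char *dst, const struct user_buf *src,
			       size_t off, size_t n)
{
	size_t avail = src->bad_off > off ? src->bad_off - off : 0;
	size_t ok = n < avail ? n : avail;

	assert(off + n <= src->len);	/* simulation input sanity check */
	memcpy(dst, src->data + off, ok);
	/* The fault handler would set the flag only for non-recoverable faults. */
	if (ok < n && src->tag_mismatch)
		current->non_recoverable_uaccess = true;
	return n - ok;
}

/* Page-granular fault-in: fixes a missing page, cannot fix a tag mismatch. */
static void sim_fault_in(struct user_buf *src)
{
	if (!src->tag_mismatch)
		src->bad_off = src->len;
}

/* Copy-then-fault-in retry loop; bails out instead of retrying forever. */
static long fs_copy(char *dst, struct user_buf *src, size_t n)
{
	size_t done = 0;

	while (done < n) {
		pagefault_disable();
		size_t left = sim_copy_nofault(dst + done, src, done, n - done);
		bool fatal = current->non_recoverable_uaccess;
		pagefault_enable();

		done = n - left;
		if (left == 0)
			break;
		if (fatal)	/* retrying fault-in would not help */
			return done ? (long)done : -EFAULT;
		sim_fault_in(src);
	}
	return (long)done;
}
```

In this model, dropping the `fatal` check makes the `tag_mismatch` case spin forever, since the page-granular fault-in never clears a sub-page fault; with the check, the loop returns a short count, or -EFAULT if nothing was copied.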