Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp159816pxb; Wed, 27 Oct 2021 00:03:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzNrky4G590nLnPUjAYsKLw9RoZKJ75z06DFe4lwf9yYS6Ns+n86Qlvppj5Yhm0bl/fCAqt X-Received: by 2002:a17:907:160a:: with SMTP id hb10mr24789465ejc.293.1635318201128; Wed, 27 Oct 2021 00:03:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635318201; cv=none; d=google.com; s=arc-20160816; b=vTe0YuPF3pu/aRsEe503OAzPVw0KMu+4tMOMgiIvQABpzMMx12SwSzuCW+MOdbX2zu 7F5g4gWGOXej3hUoeUQIs1e+0zOB4ZnzBT8wgk07z7aw8I9tiTJw6A2UtRJNiJXY5S3T dF6exl+yLNvYKIrE7Q+wv6qjhrONgmeUP2tTiktYeBE3VTYympnTsuqjpCu+sIiyKjSG fSAPHF7Rwy0DdobJAQ1ipWB6RQhyEYy2cw30PTD43YTLBuIndKZ/sElmd4rlOtH/6fBp g3C2X96VmiK/aPAC9M6qJFts0PPFYsnmv49oiflUx0LprH2K7JvbnNKatyGl9q8YAfTc i8hw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=7b4Rr+klYGeaRICaj1F+KKAYaqUvRYsy+faVFNgWRyw=; b=i4k3MdA5LJUTqri/9qlE3Xpoy3IeXi22iqNygQ/HOFs1pQxcCVv7hC//1zA0l0YFud P2zFWMY8zT3uxF/ycbGrp0pQT/MYT1bz3GY/X14RxvpGj7HRTKk9q4XJVn1be8pfZR4W 0YGbux+yJWwwRcwZaMmzc8oTmTEM3oYhVqEyUEx8Pl3lnNR+jdRgj6Os8/C3OzJeJBU/ EQRa/sDZ2q7t5CNRpdvGFz7CEDttIOk5ems/HpVdReiBYCMOmm5A4sP5+Wb8X0Y25FMb DF53HknaUkjfUs8dDRj9iHtmtcIdwNNNrpibqqIQ3R7hGEWZPKnF/DmhRng90dlF8w+i 3jww== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id hr26si16391199ejc.417.2021.10.27.00.02.53; Wed, 27 Oct 2021 00:03:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236795AbhJZS0u (ORCPT + 99 others); Tue, 26 Oct 2021 14:26:50 -0400 Received: from mail.kernel.org ([198.145.29.99]:34294 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231297AbhJZS0t (ORCPT ); Tue, 26 Oct 2021 14:26:49 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 9302360F9D; Tue, 26 Oct 2021 18:24:21 +0000 (UTC) Date: Tue, 26 Oct 2021 19:24:18 +0100 From: Catalin Marinas To: Andreas Gruenbacher Cc: Linus Torvalds , Paul Mackerras , Alexander Viro , Christoph Hellwig , "Darrick J. Wong" , Jan Kara , Matthew Wilcox , cluster-devel , linux-fsdevel , Linux Kernel Mailing List , ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks Message-ID: References: <20211019134204.3382645-1-agruenba@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 25, 2021 at 09:00:43PM +0200, Andreas Gruenbacher wrote: > On Fri, Oct 22, 2021 at 9:23 PM Linus Torvalds > wrote: > > On Fri, Oct 22, 2021 at 8:06 AM Catalin Marinas wrote: > > > Probing only the first byte(s) in fault_in() would be ideal, no need to > > > go through all filesystems and try to change the uaccess/probing order. > > > > Let's try that. Or rather: probing just the first page - since there > > are users like that btrfs ioctl, and the direct-io path. > > For direct I/O, we actually only want to trigger page fault-in so that > we can grab page references with bio_iov_iter_get_pages. Probing for > sub-page error domains will only slow things down. If we hit -EFAULT > during the actual copy-in or copy-out, we know that the error can't be > page fault related. Similarly, in the buffered I/O case, we only > really care about the next byte, so any probing beyond that is > unnecessary. > > So maybe we should split the sub-page error domain probing off from > the fault-in functions. Or at least add an argument to the fault-in > functions that specifies the amount of memory to probe. My preferred option is not to touch fault-in for sub-page faults (though I have some draft patches, they need testing). All this fault-in and uaccess with pagefaults_disabled() is needed to avoid a deadlock when the uaccess fault handling would take the same lock. With sub-page faults, the kernel cannot fix it up anyway, so the arch code won't even attempt call handle_mm_fault() (it is not an mm fault). But the problem is the copy_*_user() etc. API that can only return the number of bytes not copied. That's what I think should be fixed. fault_in() feels like the wrong place to address this when it's not an mm fault. As for fault_in() getting another argument with the amount of sub-page probing to do, I think the API gets even more confusing. I was also thinking, with your patches for fault_in() now returning size_t, is the expectation to be precise in what cannot be copied? We don't have such requirement for copy_*_user(). While more intrusive, I'd rather change copy_page_from_iter_atomic() etc. to take a pointer where to write back an error code. If it's -EFAULT, retry the loop. If it's -EACCES/EPERM just bail out. Or maybe simply a bool set if there was an mm fault to be retried. Yet another option to return an -EAGAIN if it could not process the mm fault due to page faults being disabled. Happy to give this a try, unless there's a strong preference for the fault_in() fix-up (well, I can do both options and post them). -- Catalin