From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v4 3/8] gfs2: Fix mmap + page fault deadlocks for buffered I/O
Date: Sat, 24 Jul 2021 21:34:44 +0200
Message-Id: <20210724193449.361667-4-agruenba@redhat.com>
In-Reply-To: <20210724193449.361667-1-agruenba@redhat.com>
References: <20210724193449.361667-1-agruenba@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

In the .read_iter and .write_iter file operations, we're accessing
user-space memory while holding the inode glock.  There's a possibility
that the memory is mapped to the same file, in which case we'd recurse
on the same glock.  More complex scenarios can involve multiple glocks,
processes, and even cluster nodes.

Avoid these kinds of problems by disabling page faults while holding a
glock.  If a page fault occurs, we either end up with a partial read or
write, or with -EFAULT if nothing could be read or written.  In either
case, we drop the glock, fault in the requested pages manually, and
repeat the operation.

This locking problem in gfs2 was originally reported by Jan Kara.
Linus came up with the proposal to disable page faults.  Many thanks to
Al Viro and Matthew Wilcox for their feedback as well.

Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/file.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 55ec1cadc9e6..3aa66d4de383 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -843,6 +843,12 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	size_t written = 0;
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and then we retry.
+	 */
+
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		ret = gfs2_file_direct_read(iocb, to, &gh);
 		if (likely(ret != -ENOTBLK))
@@ -864,13 +870,20 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	}
 	ip = GFS2_I(iocb->ki_filp->f_mapping->host);
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+retry:
 	ret = gfs2_glock_nq(&gh);
 	if (ret)
 		goto out_uninit;
+	pagefault_disable();
 	ret = generic_file_read_iter(iocb, to);
+	pagefault_enable();
 	if (ret > 0)
 		written += ret;
 	gfs2_glock_dq(&gh);
+	if (unlikely(iov_iter_count(to) && (ret > 0 || ret == -EFAULT)) &&
+	    iter_is_iovec(to) &&
+	    iov_iter_fault_in_writeable(to, SIZE_MAX) == 0)
+		goto retry;
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	return written ? written : ret;
@@ -882,9 +895,22 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb, struct iov_iter *fro
 	struct inode *inode = file_inode(file);
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and then we retry.
+	 */
+
+retry:
 	current->backing_dev_info = inode_to_bdi(inode);
+	pagefault_disable();
 	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+	pagefault_enable();
 	current->backing_dev_info = NULL;
+	if (unlikely(ret == -EFAULT) &&
+	    iter_is_iovec(from) &&
+	    iov_iter_fault_in_readable(from, SIZE_MAX) == 0)
+		goto retry;
 	return ret;
 }
 
-- 
2.26.3
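
For readers who want to trace the control flow of the read path outside the kernel, the retry loop can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the gfs2 code: struct user_buf, struct model, copy_nofault(), fault_in() and model_read() are hypothetical names invented for the illustration, "pages" are 4 bytes long, and the glock is reduced to a flag plus an acquisition counter. copy_nofault() stands in for a copy done between pagefault_disable() and pagefault_enable(): it stops at the first non-resident page and returns either a short count or -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define PAGE_SZ 4	/* tiny hypothetical "pages" keep the model readable */
#define NPAGES  4

/* Hypothetical user buffer whose pages may or may not be resident. */
struct user_buf {
	char data[PAGE_SZ * NPAGES];
	bool resident[NPAGES];
};

/* Hypothetical stand-in for the inode glock. */
struct model {
	bool lock_held;
	int lock_acquisitions;
};

/*
 * Copy as if page faults were disabled: stop at the first non-resident
 * page.  Returns the number of bytes copied, or -EFAULT if nothing
 * could be copied.
 */
static ssize_t copy_nofault(struct user_buf *dst, size_t off,
			    const char *src, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pos = off + done;

		if (!dst->resident[pos / PAGE_SZ])
			break;
		dst->data[pos] = src[done];
		done++;
	}
	if (done == 0 && len > 0)
		return -EFAULT;
	return (ssize_t)done;
}

/* Make the pages resident.  Must only run while the lock is dropped. */
static int fault_in(struct model *m, struct user_buf *buf,
		    size_t off, size_t len)
{
	assert(!m->lock_held);
	for (size_t pos = off; pos < off + len; pos++)
		buf->resident[pos / PAGE_SZ] = true;
	return 0;
}

/* Same shape as the patched read path: lock, copy, unlock, maybe retry. */
static ssize_t model_read(struct model *m, struct user_buf *dst,
			  const char *src, size_t len)
{
	size_t written = 0;
	ssize_t ret;

retry:
	m->lock_held = true;		/* models gfs2_glock_nq() */
	m->lock_acquisitions++;
	ret = copy_nofault(dst, written, src + written, len - written);
	if (ret > 0)
		written += (size_t)ret;
	m->lock_held = false;		/* models gfs2_glock_dq() */
	if (written < len && (ret > 0 || ret == -EFAULT) &&
	    fault_in(m, dst, written, len - written) == 0)
		goto retry;
	return written ? (ssize_t)written : ret;
}
```

With only the first page resident, a 10-byte read copies 4 bytes on the first pass, drops the lock, faults in the rest, and completes on the second pass. Unlike this model, the kernel gives no guarantee that faulted-in pages stay resident until the retry, so the real loop may go around more than twice.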