Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp4177708pxb; Tue, 2 Nov 2021 05:35:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwugrzKdtyJhSN2DE6qi1zLwOb3wB7FLA4VPhIwmYanf+UWowVDzyPMGvNaKWXXxpMMrs8S X-Received: by 2002:a17:906:2a0d:: with SMTP id j13mr43025952eje.545.1635856530586; Tue, 02 Nov 2021 05:35:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635856530; cv=none; d=google.com; s=arc-20160816; b=yhbQwLCC6oZSo7DAAbdKAmhrUxS7VTMFnRNMi3knsv5jqCZzvXrXXumlUQT+eMK6Iv dYTUO5fTj9XRNGsB4ABlxYEP57NSHUlnfxJlpcVZlkD6o1fX0W1JYGLwm+heCLzB4498 Pn+i+iJTq18b/Et2HWB27fAUg/9Tx6bUFcK11GxplwWJSnqu5xBzAIgYdeHKlbLczhXE pzLZpkiZyL+tgIuU9QXrGESl8P5IvSww2mxgigUd/fphwhe/GDIbuzARCZY+/gVHU+SX 3/R03kBOTW1GePyV30MXs0CS0pwI+NnwsAZlV8sDb79t4hgJRwDgIA81G1yt5hnO0QZA FUEw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=faLoLpLyKy/Ab8XQ27t5jmK0JEkq3soAYUpBSCy17oY=; b=fG2PFVD9LH9NzGqx6det6sN7ck0ZIX3Z7sJ0cmlrnyeQbipVHRK604Grw1KFy16VkI J7c9dvxV4MezYw7andcZSNw8R7qX4RnMbqMdxRmAsGziOwDj5qCGM1N3hRMC9NLZXJsi Wo/AK3H5mHE/rof/bwRLWAI0T23BL+SbK5P4E9jm9wduZyaIAuXK6gNLT00eAKx/sWpw h464dkctNmybRav9ZaEyBHx0epJJvRmwbke4iLGFqdtxcsossAhXFARP2ZiNjc2LEvQt hkNP04xaIDYmAjd9BOdQ/hc+qPXmrJAf/e3GK4nmKa8Jup/mZhyzocVjnHoAkCZPs5mM JkgA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=D+TIox7+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s20si22860922ejb.327.2021.11.02.05.35.06; Tue, 02 Nov 2021 05:35:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=D+TIox7+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230349AbhKBMfn (ORCPT + 99 others); Tue, 2 Nov 2021 08:35:43 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:36537 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229530AbhKBMfn (ORCPT ); Tue, 2 Nov 2021 08:35:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1635856388; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=faLoLpLyKy/Ab8XQ27t5jmK0JEkq3soAYUpBSCy17oY=; b=D+TIox7+WsRRsXtfIBGnlbb2OKNr8yNSba1E2GHf4KJ1L1e+G13DVxet5AGGaJUyXEM7XS SWHkmQWbuPFj8U4PLq3JPNjGfVrE4WBFhS0Z+xi7QPdjySMidyYXj32TnU47N8sBJtxkcK ifmeVS7k/eZBHC+yqlLgXPkCVbFrIRQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-460-Ww8bC-2wMe2RsfP5VGf83w-1; Tue, 02 Nov 2021 08:33:03 -0400 X-MC-Unique: Ww8bC-2wMe2RsfP5VGf83w-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4574B802682; Tue, 2 Nov 2021 12:33:01 +0000 (UTC) Received: from max.localdomain (unknown [10.40.195.95]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06EF769119; Tue, 2 Nov 2021 12:32:08 +0000 (UTC) From: Andreas Gruenbacher To: cluster-devel@redhat.com Cc: Linus Torvalds , Catalin Marinas , Alexander Viro , Christoph Hellwig , "Darrick J. Wong" , Paul Mackerras , Jan Kara , Matthew Wilcox , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs@vger.kernel.org, Andreas Gruenbacher Subject: [PATCH v9 17/17] gfs2: Fix mmap + page fault deadlocks for direct I/O Date: Tue, 2 Nov 2021 13:29:45 +0100 Message-Id: <20211102122945.117744-18-agruenba@redhat.com> In-Reply-To: <20211102122945.117744-1-agruenba@redhat.com> References: <20211102122945.117744-1-agruenba@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Also disable page faults during direct I/O requests and implement a similar kind of retry logic as in the buffered I/O case. The retry logic in the direct I/O case differs from the buffered I/O case in the following way: direct I/O doesn't provide the kinds of consistency guarantees between concurrent reads and writes that buffered I/O provides, so once we lose the inode glock while faulting in user pages, we always resume the operation. We never need to return a partial read or write. This locking problem was originally reported by Jan Kara. Linus came up with the idea of disabling page faults. Many thanks to Al Viro and Matthew Wilcox for their feedback. Signed-off-by: Andreas Gruenbacher --- fs/gfs2/file.c | 99 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 87 insertions(+), 12 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index f772ee0fcae3..40e6501c02e5 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -811,22 +811,64 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to, { struct file *file = iocb->ki_filp; struct gfs2_inode *ip = GFS2_I(file->f_mapping->host); - size_t count = iov_iter_count(to); + size_t prev_count = 0, window_size = 0; + size_t written = 0; ssize_t ret; - if (!count) + /* + * In this function, we disable page faults when we're holding the + * inode glock while doing I/O. If a page fault occurs, we indicate + * that the inode glock may be dropped, fault in the pages manually, + * and retry. + * + * Unlike generic_file_read_iter, for reads, iomap_dio_rw can trigger + * physical as well as manual page faults, and we need to disable both + * kinds. + * + * For direct I/O, gfs2 takes the inode glock in deferred mode. This + * locking mode is compatible with other deferred holders, so multiple + * processes and nodes can do direct I/O to a file at the same time. + * There's no guarantee that reads or writes will be atomic. Any + * coordination among readers and writers needs to happen externally. + */ + + if (!iov_iter_count(to)) return 0; /* skip atime */ gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, gh); +retry: ret = gfs2_glock_nq(gh); if (ret) goto out_uninit; +retry_under_glock: + pagefault_disable(); + to->nofault = true; + ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, + IOMAP_DIO_PARTIAL, written); + to->nofault = false; + pagefault_enable(); + if (ret > 0) + written = ret; - ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0, 0); - gfs2_glock_dq(gh); + if (should_fault_in_pages(ret, to, &prev_count, &window_size)) { + size_t leftover; + + gfs2_holder_allow_demote(gh); + leftover = fault_in_iov_iter_writeable(to, window_size); + gfs2_holder_disallow_demote(gh); + if (leftover != window_size) { + if (!gfs2_holder_queued(gh)) + goto retry; + goto retry_under_glock; + } + } + if (gfs2_holder_queued(gh)) + gfs2_glock_dq(gh); out_uninit: gfs2_holder_uninit(gh); - return ret; + if (ret < 0) + return ret; + return written; } static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, @@ -835,10 +877,20 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; struct gfs2_inode *ip = GFS2_I(inode); - size_t len = iov_iter_count(from); - loff_t offset = iocb->ki_pos; + size_t prev_count = 0, window_size = 0; + size_t read = 0; ssize_t ret; + /* + * In this function, we disable page faults when we're holding the + * inode glock while doing I/O. If a page fault occurs, we indicate + * that the inode glock may be dropped, fault in the pages manually, + * and retry. + * + * For writes, iomap_dio_rw only triggers manual page faults, so we + * don't need to disable physical ones. + */ + /* * Deferred lock, even if its a write, since we do no allocation on * this path. All we need to change is the atime, and this lock mode @@ -848,22 +900,45 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, * VFS does. */ gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, gh); +retry: ret = gfs2_glock_nq(gh); if (ret) goto out_uninit; - +retry_under_glock: /* Silently fall back to buffered I/O when writing beyond EOF */ - if (offset + len > i_size_read(&ip->i_inode)) + if (iocb->ki_pos + iov_iter_count(from) > i_size_read(&ip->i_inode)) goto out; - ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, 0, 0); + from->nofault = true; + ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, + IOMAP_DIO_PARTIAL, read); + from->nofault = false; + if (ret == -ENOTBLK) ret = 0; + if (ret > 0) + read = ret; + + if (should_fault_in_pages(ret, from, &prev_count, &window_size)) { + size_t leftover; + + gfs2_holder_allow_demote(gh); + leftover = fault_in_iov_iter_readable(from, window_size); + gfs2_holder_disallow_demote(gh); + if (leftover != window_size) { + if (!gfs2_holder_queued(gh)) + goto retry; + goto retry_under_glock; + } + } out: - gfs2_glock_dq(gh); + if (gfs2_holder_queued(gh)) + gfs2_glock_dq(gh); out_uninit: gfs2_holder_uninit(gh); - return ret; + if (ret < 0) + return ret; + return read; } static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) -- 2.31.1