Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1097798ybt; Tue, 7 Jul 2020 07:46:32 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyi6oZRFHk9mAO7aRmyrjUyjgVeBD9LbKA3W+Sfmko+Y+6OGcUDmUWC5FUhXEkGW9QxsAdK X-Received: by 2002:a05:6402:3113:: with SMTP id dc19mr60537846edb.20.1594133192277; Tue, 07 Jul 2020 07:46:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594133192; cv=none; d=google.com; s=arc-20160816; b=tez0jCwUX18Cjz+X6+EzaRQ61AiK4Qp3xpHlLPKgBaEpdnSuQWmlt+ccay2Yf0EayG OxZ42IIXMaDq3wdJIXzc9/I9j0qMobuSWR1TgrXgFIgVA52zqcbAjV4aisZkzZcqhrR3 7VDUziSY9q0k2qpxf262omjeQMNlBQA/aeEd1jd7zPRaH4ZrmAmPqJVjIynWWV2PSqzs oJAB4p5T4FgzsHlDjvXtFd05Wre297Kv+aqBxIJHNOUjSJBfYbCfNmdQEYC6u1i75iWD 5aYUi8IlXEByw1mbXxKtsCcHTktInrYmLv5CH0ZG1aK85piM4Wj9R8NOFb/wx2y9xjOr didw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=mT7+S/JZ0X5UTjCHTtZu+xMVOeg7qKFbo24bRgqj2g8=; b=s1Yg6k9vZ6Tb4v0U0jy+blFy8DRYdTeGjlNOzmGauKeZQkXouN87m8vbABZ920AnH4 RqVH6V8HcRJ6NmD9nGyzNrMWNUwJ2VCKWKefDGoIXT6cK2qCV1IQdClJc8oK4AjYJVEy RLYlPJgcCKSWcs4hPTRZWvg3uvK92NWtxNx15aNXCLvTauSLwHtJAh7IFzHuHMxxmGwE XDuAdTy24Tf+ZqYqpVWdKDFXhKlGzVB4h8gWAVvTBXtbM2ULf9+3o82ADZ9Z7CTxX8lz VFchTnn58DCz550/RPVZyLI2kg1c9P30+T1tfNdJx3gzgtNn/J3/9sg5Xz6GW/bALjXA 0fAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SzoMb7BR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id x8si15701652edl.120.2020.07.07.07.46.09; Tue, 07 Jul 2020 07:46:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SzoMb7BR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728300AbgGGOpP (ORCPT + 99 others); Tue, 7 Jul 2020 10:45:15 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:31138 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728275AbgGGOpN (ORCPT ); Tue, 7 Jul 2020 10:45:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594133111; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mT7+S/JZ0X5UTjCHTtZu+xMVOeg7qKFbo24bRgqj2g8=; b=SzoMb7BRcF4Z6qWkhPallba4BhTvirb2gjkR7fcSc09AuxEBYXqNy6pfHm9Fyy/gSc02sQ 0qzZG49sIMa/UKsz9B41Mc1Zm/SLRkB00TdHDU2oJpMqxPtdiRC9UbmHpmBJOrPzYLs6mg bzU4BJJxBKxYciunFhZoWEDZb0y31Vs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-171-sRVWfiggMZaE7nFUWgYLQw-1; Tue, 07 Jul 2020 10:45:08 -0400 X-MC-Unique: sRVWfiggMZaE7nFUWgYLQw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 825F08005B0; Tue, 7 Jul 2020 14:45:06 +0000 (UTC) Received: from max.home.com (unknown [10.40.192.179]) by smtp.corp.redhat.com (Postfix) with ESMTP id 874DA797E8; Tue, 7 Jul 2020 14:45:04 +0000 (UTC) From: Andreas Gruenbacher To: Linus Torvalds Cc: Matthew Wilcox , Dave Chinner , Jens Axboe , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andreas Gruenbacher Subject: [RFC v3 2/2] gfs2: Rework read and page fault locking Date: Tue, 7 Jul 2020 16:44:57 +0200 Message-Id: <20200707144457.1603400-3-agruenba@redhat.com> In-Reply-To: <20200707144457.1603400-1-agruenba@redhat.com> References: <20200707144457.1603400-1-agruenba@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org So far, gfs2 has taken the inode glocks inside the ->readpage and ->readahead address space operations. Since commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed the pages to read ahead locked. With that, the current holder of the inode glock may be trying to lock one of those pages while gfs2_readahead is trying to take the inode glock, resulting in a deadlock. Fix that by moving the lock taking to the higher-level ->read_iter file and ->fault vm operations. This also gets rid of an ugly lock inversion workaround in gfs2_readpage. The cache consistency model of filesystems like gfs2 is such that if data is found in the page cache, the data is up to date and can be used without taking any filesystem locks. If a page is not cached, filesystem locks must be taken before populating the page cache. To avoid taking the inode glock when the data is already cached, gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag set. If that fails, the inode glock is taken and the operation is retried with the IOCB_NOIO flag cleared. Signed-off-by: Andreas Gruenbacher --- fs/gfs2/aops.c | 45 +------------------------------------------ fs/gfs2/file.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 51 insertions(+), 46 deletions(-) diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 72c9560f4467..68cd700a2719 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -468,21 +468,10 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct page *page) } -/** - * __gfs2_readpage - readpage - * @file: The file to read a page for - * @page: The page to read - * - * This is the core of gfs2's readpage. It's used by the internal file - * reading code as in that case we already hold the glock. Also it's - * called by gfs2_readpage() once the required lock has been granted. - */ - static int __gfs2_readpage(void *file, struct page *page) { struct gfs2_inode *ip = GFS2_I(page->mapping->host); struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host); - int error; if (i_blocksize(page->mapping->host) == PAGE_SIZE && @@ -505,36 +494,11 @@ static int __gfs2_readpage(void *file, struct page *page) * gfs2_readpage - read a page of a file * @file: The file to read * @page: The page of the file - * - * This deals with the locking required. We have to unlock and - * relock the page in order to get the locking in the right - * order. */ static int gfs2_readpage(struct file *file, struct page *page) { - struct address_space *mapping = page->mapping; - struct gfs2_inode *ip = GFS2_I(mapping->host); - struct gfs2_holder gh; - int error; - - unlock_page(page); - gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); - error = gfs2_glock_nq(&gh); - if (unlikely(error)) - goto out; - error = AOP_TRUNCATED_PAGE; - lock_page(page); - if (page->mapping == mapping && !PageUptodate(page)) - error = __gfs2_readpage(file, page); - else - unlock_page(page); - gfs2_glock_dq(&gh); -out: - gfs2_holder_uninit(&gh); - if (error && error != AOP_TRUNCATED_PAGE) - lock_page(page); - return error; + return __gfs2_readpage(file, page); } /** @@ -598,16 +562,9 @@ static void gfs2_readahead(struct readahead_control *rac) { struct inode *inode = rac->mapping->host; struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_holder gh; - gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); - if (gfs2_glock_nq(&gh)) - goto out_uninit; if (!gfs2_is_stuffed(ip)) mpage_readahead(rac, gfs2_block_map); - gfs2_glock_dq(&gh); -out_uninit: - gfs2_holder_uninit(&gh); } /** diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index fe305e4bfd37..bebde537ac8c 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -558,8 +558,29 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf) return block_page_mkwrite_return(ret); } +static vm_fault_t gfs2_fault(struct vm_fault *vmf) +{ + struct inode *inode = file_inode(vmf->vma->vm_file); + struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_holder gh; + vm_fault_t ret; + int err; + + gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); + err = gfs2_glock_nq(&gh); + if (err) { + ret = block_page_mkwrite_return(err); + goto out_uninit; + } + ret = filemap_fault(vmf); + gfs2_glock_dq(&gh); +out_uninit: + gfs2_holder_uninit(&gh); + return ret; +} + static const struct vm_operations_struct gfs2_vm_ops = { - .fault = filemap_fault, + .fault = gfs2_fault, .map_pages = filemap_map_pages, .page_mkwrite = gfs2_page_mkwrite, }; @@ -824,6 +845,9 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from) static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { + struct gfs2_inode *ip; + struct gfs2_holder gh; + size_t written = 0; ssize_t ret; if (iocb->ki_flags & IOCB_DIRECT) { @@ -832,7 +856,31 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) return ret; iocb->ki_flags &= ~IOCB_DIRECT; } - return generic_file_read_iter(iocb, to); + iocb->ki_flags |= IOCB_NOIO; + ret = generic_file_read_iter(iocb, to); + iocb->ki_flags &= ~IOCB_NOIO; + if (ret >= 0) { + if (!iov_iter_count(to)) + return ret; + written = ret; + } else { + if (ret != -EAGAIN) + return ret; + if (iocb->ki_flags & IOCB_NOWAIT) + return ret; + } + ip = GFS2_I(iocb->ki_filp->f_mapping->host); + gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); + ret = gfs2_glock_nq(&gh); + if (ret) + goto out_uninit; + ret = generic_file_read_iter(iocb, to); + if (ret > 0) + written += ret; + gfs2_glock_dq(&gh); +out_uninit: + gfs2_holder_uninit(&gh); + return written ? written : ret; } /** -- 2.26.2