2017-08-03 13:45:32

by Trond Myklebust

Subject: [PATCH v2 00/28] Reducing inode->i_lock contention in writebacks

Chuck Lever has presented measurements that appear to indicate that
the inode->i_lock can be a source of contention when doing writeback
over high-performance networks. This patch series is therefore an
attempt to address the sources of contention that his measurements
pointed to.

According to Chuck's table, the main sources of contention appear to
be in nfs_flush_incompatible(), nfs_lock_and_join_requests() and
nfs_updatepage(). Most of this patch series focuses on fixing
nfs_lock_and_join_requests(), since it holds the inode lock while
taking locks on the page group and the sub-requests themselves,
rolling them all back whenever there is contention.
By observing a few simple rules that are mostly already in place, we
can simplify that locking so that we only need to hold the
spin lock while we're dereferencing and locking the head request
pointer that is stored in page_private(page).
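
As a concrete illustration, here is a condensed sketch of the lookup
pattern the series converges on (compare the
nfs_page_find_private_request() hunk in patch 24 below); error
handling is elided and the helper is shown in its final,
mapping->private_lock form:

	static struct nfs_page *
	nfs_page_find_private_request(struct page *page)
	{
		struct address_space *mapping = page_file_mapping(page);
		struct nfs_page *req;

		if (!PagePrivate(page))		/* lockless fast path */
			return NULL;
		spin_lock(&mapping->private_lock);
		req = (struct nfs_page *)page_private(page);
		if (req)
			kref_get(&req->wb_kref);	/* pin the head before dropping the lock */
		spin_unlock(&mapping->private_lock);
		return req;
	}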

Along the way, the patches also simplify the code a little and fix
a number of subtle races that mainly occur when wsize is set to a
value smaller than the page size. The most notable such race occurs
between nfs_lock_and_join_requests() and nfs_page_group_destroy(),
and could lead to a double free of some of the sub-requests.

Finally, there are two patches tacked onto the end of the series that
attempt to improve the throttling of writes when the RPC layer is
congested. They do so by forcing each caller of nfs_initiate_pgio()
to wait until the request is being transmitted. The expectation is
that this might help improve latencies when several processes are
competing for access to the RPC transport.
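
For reference, the throttling hook amounts to the following sketch of
the nfs_initiate_pgio() tail: the rpc_wait_for_msg_send() call is the
hunk from patch 28 at the end of this series, while the FLUSH_SYNC
test is paraphrased from the surrounding context:

	task = rpc_run_task(&task_setup_data);
	if (IS_ERR(task))
		return PTR_ERR(task);
	if (how & FLUSH_SYNC) {		/* synchronous caller: wait for completion */
		ret = rpc_wait_for_completion_task(task);
		if (ret == 0)
			ret = task->tk_status;
	} else				/* async caller: still wait for transmission */
		rpc_wait_for_msg_send(task);
	rpc_put_task(task);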

v2: Further patches to eliminate use of inode->i_lock to protect the
page->private contents.
Protect the (large!) list of unstable pages using a dedicated
NFS_I(inode)->commit_mutex instead of the inode->i_lock.

Trond Myklebust (28):
NFS: Simplify page writeback
NFS: Reduce lock contention in nfs_page_find_head_request()
NFS: Reduce lock contention in nfs_try_to_update_request()
NFS: Ensure we always dereference the page head last
NFS: Fix a reference and lock leak in nfs_lock_and_join_requests()
NFS: Fix an ABBA issue in nfs_lock_and_join_requests()
NFS: Don't check request offset and size without holding a lock
NFS: Don't unlock writebacks before declaring PG_WB_END
NFS: Fix the inode request accounting when pages have subrequests
NFS: Teach nfs_try_to_update_request() to deal with request
page_groups
NFS: Remove page group limit in nfs_flush_incompatible()
NFS: Reduce inode->i_lock contention in nfs_lock_and_join_requests()
NFS: Further optimise nfs_lock_and_join_requests()
NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests()
race cases
NFS: Remove nfs_page_group_clear_bits()
NFS: Remove unused function nfs_page_group_lock_wait()
NFS: Remove unused parameter from nfs_page_group_lock()
NFS: Fix up nfs_page_group_covers_page()
NFSv4: Convert nfs_lock_and_join_requests() to use
nfs_page_find_head_request()
NFS: Refactor nfs_page_find_head_request()
NFSv4: Use a mutex to protect the per-inode commit lists
NFS: Use an atomic_long_t to count the number of requests
NFS: Use an atomic_long_t to count the number of commits
NFS: Switch to using mapping->private_lock for page writeback lookups.
NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()
NFS: Wait for requests that are locked on the commit list
SUNRPC: Add a function to allow waiting for RPC transmission
NFS: Throttle I/O to the NFS server

fs/nfs/callback_proc.c | 2 +-
fs/nfs/delegation.c | 2 +-
fs/nfs/direct.c | 4 +-
fs/nfs/inode.c | 10 +-
fs/nfs/pagelist.c | 70 +++----
fs/nfs/pnfs.c | 41 ----
fs/nfs/pnfs.h | 2 -
fs/nfs/pnfs_nfs.c | 37 ++--
fs/nfs/write.c | 440 ++++++++++++++++++++-----------------------
include/linux/nfs_fs.h | 5 +-
include/linux/nfs_page.h | 3 +-
include/linux/nfs_xdr.h | 2 +-
include/linux/sunrpc/sched.h | 3 +
net/sunrpc/sched.c | 22 +++
net/sunrpc/xprt.c | 6 +
15 files changed, 295 insertions(+), 354 deletions(-)

--
2.13.3



2017-08-03 13:45:34

by Trond Myklebust

Subject: [PATCH v2 02/28] NFS: Reduce lock contention in nfs_page_find_head_request()

Add a lockless check for whether the page might be carrying
an existing writeback before we grab the inode->i_lock.

Reported-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1d447e37f472..06e150c4e315 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -190,9 +190,11 @@ static struct nfs_page *nfs_page_find_head_request(struct page *page)
struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *req = NULL;

- spin_lock(&inode->i_lock);
- req = nfs_page_find_head_request_locked(NFS_I(inode), page);
- spin_unlock(&inode->i_lock);
+ if (PagePrivate(page) || PageSwapCache(page)) {
+ spin_lock(&inode->i_lock);
+ req = nfs_page_find_head_request_locked(NFS_I(inode), page);
+ spin_unlock(&inode->i_lock);
+ }
return req;
}

--
2.13.3


2017-08-03 13:45:36

by Trond Myklebust

Subject: [PATCH v2 03/28] NFS: Reduce lock contention in nfs_try_to_update_request()

Micro-optimisation to move the lockless check into the for(;;) loop.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 06e150c4e315..bb019096c331 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1097,13 +1097,12 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
unsigned int end;
int error;

- if (!PagePrivate(page))
- return NULL;
-
end = offset + bytes;
- spin_lock(&inode->i_lock);

for (;;) {
+ if (!(PagePrivate(page) || PageSwapCache(page)))
+ return NULL;
+ spin_lock(&inode->i_lock);
req = nfs_page_find_head_request_locked(NFS_I(inode), page);
if (req == NULL)
goto out_unlock;
@@ -1132,7 +1131,6 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
nfs_release_request(req);
if (error != 0)
goto out_err;
- spin_lock(&inode->i_lock);
}

/* Okay, the request matches. Update the region */
--
2.13.3


2017-08-03 13:45:33

by Trond Myklebust

Subject: [PATCH v2 01/28] NFS: Simplify page writeback

We don't expect the page header lock to ever be held across I/O, so
it should always be safe to wait for it, even if we're doing nonblocking
writebacks.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 30 ++++++++++--------------------
1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b1af5dee5e0a..1d447e37f472 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -366,7 +366,6 @@ nfs_page_group_clear_bits(struct nfs_page *req)
* @inode - inode associated with request page group, must be holding inode lock
* @head - head request of page group, must be holding head lock
* @req - request that couldn't lock and needs to wait on the req bit lock
- * @nonblock - if true, don't actually wait
*
* NOTE: this must be called holding page_group bit lock and inode spin lock
* and BOTH will be released before returning.
@@ -375,7 +374,7 @@ nfs_page_group_clear_bits(struct nfs_page *req)
*/
static int
nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
- struct nfs_page *req, bool nonblock)
+ struct nfs_page *req)
__releases(&inode->i_lock)
{
struct nfs_page *tmp;
@@ -396,10 +395,7 @@ nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
/* release ref from nfs_page_find_head_request_locked */
nfs_release_request(head);

- if (!nonblock)
- ret = nfs_wait_on_request(req);
- else
- ret = -EAGAIN;
+ ret = nfs_wait_on_request(req);
nfs_release_request(req);

return ret;
@@ -464,7 +460,6 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
* operations for this page.
*
* @page - the page used to lookup the "page group" of nfs_page structures
- * @nonblock - if true, don't block waiting for request locks
*
* This function joins all sub requests to the head request by first
* locking all requests in the group, cancelling any pending operations
@@ -478,7 +473,7 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
* error was encountered.
*/
static struct nfs_page *
-nfs_lock_and_join_requests(struct page *page, bool nonblock)
+nfs_lock_and_join_requests(struct page *page)
{
struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *head, *subreq;
@@ -511,14 +506,9 @@ nfs_lock_and_join_requests(struct page *page, bool nonblock)
if (ret < 0) {
spin_unlock(&inode->i_lock);

- if (!nonblock && ret == -EAGAIN) {
- nfs_page_group_lock_wait(head);
- nfs_release_request(head);
- goto try_again;
- }
-
+ nfs_page_group_lock_wait(head);
nfs_release_request(head);
- return ERR_PTR(ret);
+ goto try_again;
}

/* lock each request in the page group */
@@ -543,7 +533,7 @@ nfs_lock_and_join_requests(struct page *page, bool nonblock)
/* releases page group bit lock and
* inode spin lock and all references */
ret = nfs_unroll_locks_and_wait(inode, head,
- subreq, nonblock);
+ subreq);

if (ret == 0)
goto try_again;
@@ -624,12 +614,12 @@ nfs_error_is_fatal_on_server(int err)
* May return an error if the user signalled nfs_wait_on_request().
*/
static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
- struct page *page, bool nonblock)
+ struct page *page)
{
struct nfs_page *req;
int ret = 0;

- req = nfs_lock_and_join_requests(page, nonblock);
+ req = nfs_lock_and_join_requests(page);
if (!req)
goto out;
ret = PTR_ERR(req);
@@ -672,7 +662,7 @@ static int nfs_do_writepage(struct page *page, struct writeback_control *wbc,
int ret;

nfs_pageio_cond_complete(pgio, page_index(page));
- ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
+ ret = nfs_page_async_flush(pgio, page);
if (ret == -EAGAIN) {
redirty_page_for_writepage(wbc, page);
ret = 0;
@@ -2015,7 +2005,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)

/* blocking call to cancel all requests and join to a single (head)
* request */
- req = nfs_lock_and_join_requests(page, false);
+ req = nfs_lock_and_join_requests(page);

if (IS_ERR(req)) {
ret = PTR_ERR(req);
--
2.13.3


2017-08-03 13:45:41

by Trond Myklebust

Subject: [PATCH v2 06/28] NFS: Fix an ABBA issue in nfs_lock_and_join_requests()

All other callers of nfs_page_group_lock() appear to already hold the
page lock on the head page, so doing it in the opposite order here
is inefficient, although not deadlock prone since we roll back all
locks on contention.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1ca759719429..c940e615f5dc 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -383,7 +383,7 @@ nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
int ret;

/* relinquish all the locks successfully grabbed this run */
- for (tmp = head ; tmp != req; tmp = tmp->wb_this_page)
+ for (tmp = head->wb_this_page ; tmp != req; tmp = tmp->wb_this_page)
nfs_unlock_request(tmp);

WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
@@ -395,7 +395,7 @@ nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
spin_unlock(&inode->i_lock);

/* release ref from nfs_page_find_head_request_locked */
- nfs_release_request(head);
+ nfs_unlock_and_release_request(head);

ret = nfs_wait_on_request(req);
nfs_release_request(req);
@@ -484,10 +484,6 @@ nfs_lock_and_join_requests(struct page *page)
int ret;

try_again:
- total_bytes = 0;
-
- WARN_ON_ONCE(destroy_list);
-
spin_lock(&inode->i_lock);

/*
@@ -502,6 +498,16 @@ nfs_lock_and_join_requests(struct page *page)
return NULL;
}

+ /* lock the page head first in order to avoid an ABBA inefficiency */
+ if (!nfs_lock_request(head)) {
+ spin_unlock(&inode->i_lock);
+ ret = nfs_wait_on_request(head);
+ nfs_release_request(head);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ goto try_again;
+ }
+
/* holding inode lock, so always make a non-blocking call to try the
* page group lock */
ret = nfs_page_group_lock(head, true);
@@ -509,13 +515,14 @@ nfs_lock_and_join_requests(struct page *page)
spin_unlock(&inode->i_lock);

nfs_page_group_lock_wait(head);
- nfs_release_request(head);
+ nfs_unlock_and_release_request(head);
goto try_again;
}

/* lock each request in the page group */
- subreq = head;
- do {
+ total_bytes = head->wb_bytes;
+ for (subreq = head->wb_this_page; subreq != head;
+ subreq = subreq->wb_this_page) {
/*
* Subrequests are always contiguous, non overlapping
* and in order - but may be repeated (mirrored writes).
@@ -541,9 +548,7 @@ nfs_lock_and_join_requests(struct page *page)

return ERR_PTR(ret);
}
-
- subreq = subreq->wb_this_page;
- } while (subreq != head);
+ }

/* Now that all requests are locked, make sure they aren't on any list.
* Commit list removal accounting is done after locks are dropped */
--
2.13.3


2017-08-03 13:45:56

by Trond Myklebust

Subject: [PATCH v2 17/28] NFS: Remove unused parameter from nfs_page_group_lock()

nfs_page_group_lock() is now always called with the 'nonblock'
parameter set to 'false'.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pagelist.c | 31 +++++++++++--------------------
fs/nfs/write.c | 6 +++---
include/linux/nfs_page.h | 2 +-
3 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ced7974622dd..af6731dd4324 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -134,19 +134,14 @@ EXPORT_SYMBOL_GPL(nfs_async_iocounter_wait);
/*
* nfs_page_group_lock - lock the head of the page group
* @req - request in group that is to be locked
- * @nonblock - if true don't block waiting for lock
*
- * this lock must be held if modifying the page group list
+ * this lock must be held when traversing or modifying the page
+ * group list
*
- * return 0 on success, < 0 on error: -EDELAY if nonblocking or the
- * result from wait_on_bit_lock
- *
- * NOTE: calling with nonblock=false should always have set the
- * lock bit (see fs/buffer.c and other uses of wait_on_bit_lock
- * with TASK_UNINTERRUPTIBLE), so there is no need to check the result.
+ * return 0 on success, < 0 on error
*/
int
-nfs_page_group_lock(struct nfs_page *req, bool nonblock)
+nfs_page_group_lock(struct nfs_page *req)
{
struct nfs_page *head = req->wb_head;

@@ -155,14 +150,10 @@ nfs_page_group_lock(struct nfs_page *req, bool nonblock)
if (!test_and_set_bit(PG_HEADLOCK, &head->wb_flags))
return 0;

- if (!nonblock) {
- set_bit(PG_CONTENDED1, &head->wb_flags);
- smp_mb__after_atomic();
- return wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
+ set_bit(PG_CONTENDED1, &head->wb_flags);
+ smp_mb__after_atomic();
+ return wait_on_bit_lock(&head->wb_flags, PG_HEADLOCK,
TASK_UNINTERRUPTIBLE);
- }
-
- return -EAGAIN;
}

/*
@@ -225,7 +216,7 @@ bool nfs_page_group_sync_on_bit(struct nfs_page *req, unsigned int bit)
{
bool ret;

- nfs_page_group_lock(req, false);
+ nfs_page_group_lock(req);
ret = nfs_page_group_sync_on_bit_locked(req, bit);
nfs_page_group_unlock(req);

@@ -1016,7 +1007,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
unsigned int bytes_left = 0;
unsigned int offset, pgbase;

- nfs_page_group_lock(req, false);
+ nfs_page_group_lock(req);

subreq = req;
bytes_left = subreq->wb_bytes;
@@ -1038,7 +1029,7 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
if (mirror->pg_recoalesce)
return 0;
/* retry add_request for this subreq */
- nfs_page_group_lock(req, false);
+ nfs_page_group_lock(req);
continue;
}

@@ -1135,7 +1126,7 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,

for (midx = 0; midx < desc->pg_mirror_count; midx++) {
if (midx) {
- nfs_page_group_lock(req, false);
+ nfs_page_group_lock(req);

/* find the last request */
for (lastreq = req->wb_head;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 20d44ea328b6..0f418d825185 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -271,7 +271,7 @@ static bool nfs_page_group_covers_page(struct nfs_page *req)
unsigned int pos = 0;
unsigned int len = nfs_page_length(req->wb_page);

- nfs_page_group_lock(req, false);
+ nfs_page_group_lock(req);

do {
tmp = nfs_page_group_search_locked(req->wb_head, pos);
@@ -480,7 +480,7 @@ nfs_lock_and_join_requests(struct page *page)
}
spin_unlock(&inode->i_lock);

- ret = nfs_page_group_lock(head, false);
+ ret = nfs_page_group_lock(head);
if (ret < 0) {
nfs_unlock_and_release_request(head);
return ERR_PTR(ret);
@@ -501,7 +501,7 @@ nfs_lock_and_join_requests(struct page *page)
nfs_page_group_unlock(head);
ret = nfs_wait_on_request(subreq);
if (!ret)
- ret = nfs_page_group_lock(head, false);
+ ret = nfs_page_group_lock(head);
if (ret < 0) {
nfs_unroll_locks(inode, head, subreq);
nfs_release_request(subreq);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index de1d24cedaa2..2f4fdafb6746 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -139,7 +139,7 @@ extern size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
extern int nfs_wait_on_request(struct nfs_page *);
extern void nfs_unlock_request(struct nfs_page *req);
extern void nfs_unlock_and_release_request(struct nfs_page *);
-extern int nfs_page_group_lock(struct nfs_page *, bool);
+extern int nfs_page_group_lock(struct nfs_page *);
extern void nfs_page_group_unlock(struct nfs_page *);
extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);
extern bool nfs_async_iocounter_wait(struct rpc_task *, struct nfs_lock_context *);
--
2.13.3


2017-08-03 13:45:54

by Trond Myklebust

Subject: [PATCH v2 16/28] NFS: Remove unused function nfs_page_group_lock_wait()

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pagelist.c | 21 ---------------------
include/linux/nfs_page.h | 1 -
2 files changed, 22 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index a6f2bbd709ba..ced7974622dd 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -166,27 +166,6 @@ nfs_page_group_lock(struct nfs_page *req, bool nonblock)
}

/*
- * nfs_page_group_lock_wait - wait for the lock to clear, but don't grab it
- * @req - a request in the group
- *
- * This is a blocking call to wait for the group lock to be cleared.
- */
-void
-nfs_page_group_lock_wait(struct nfs_page *req)
-{
- struct nfs_page *head = req->wb_head;
-
- WARN_ON_ONCE(head != head->wb_head);
-
- if (!test_bit(PG_HEADLOCK, &head->wb_flags))
- return;
- set_bit(PG_CONTENDED1, &head->wb_flags);
- smp_mb__after_atomic();
- wait_on_bit(&head->wb_flags, PG_HEADLOCK,
- TASK_UNINTERRUPTIBLE);
-}
-
-/*
* nfs_page_group_unlock - unlock the head of the page group
* @req - request in group that is to be unlocked
*/
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d67b67ae6c8b..de1d24cedaa2 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -140,7 +140,6 @@ extern int nfs_wait_on_request(struct nfs_page *);
extern void nfs_unlock_request(struct nfs_page *req);
extern void nfs_unlock_and_release_request(struct nfs_page *);
extern int nfs_page_group_lock(struct nfs_page *, bool);
-extern void nfs_page_group_lock_wait(struct nfs_page *);
extern void nfs_page_group_unlock(struct nfs_page *);
extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int);
extern bool nfs_async_iocounter_wait(struct rpc_task *, struct nfs_lock_context *);
--
2.13.3


2017-08-03 13:45:48

by Trond Myklebust

Subject: [PATCH v2 11/28] NFS: Remove page group limit in nfs_flush_incompatible()

nfs_try_to_update_request() should be able to cope now.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 0b4d1ef168e0..08c1ce968cce 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1205,8 +1205,6 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
l_ctx = req->wb_lock_context;
do_flush = req->wb_page != page ||
!nfs_match_open_context(req->wb_context, ctx);
- /* for now, flush if more than 1 request in page_group */
- do_flush |= req->wb_this_page != req;
if (l_ctx && flctx &&
!(list_empty_careful(&flctx->flc_posix) &&
list_empty_careful(&flctx->flc_flock))) {
--
2.13.3


2017-08-03 13:45:59

by Trond Myklebust

Subject: [PATCH v2 20/28] NFS: Refactor nfs_page_find_head_request()

Split out the two cases so that we can treat the locking differently.
The issue is that the locking in the swapcache case is tightly tied
to the commit list locking.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a06167e20b72..8d8fa6d4cfcc 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -170,20 +170,41 @@ nfs_page_private_request(struct page *page)
* returns matching head request with reference held, or NULL if not found.
*/
static struct nfs_page *
-nfs_page_find_head_request_locked(struct nfs_inode *nfsi, struct page *page)
+nfs_page_find_private_request(struct page *page)
{
+ struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *req;

+ if (!PagePrivate(page))
+ return NULL;
+ spin_lock(&inode->i_lock);
req = nfs_page_private_request(page);
- if (!req && unlikely(PageSwapCache(page)))
- req = nfs_page_search_commits_for_head_request_locked(nfsi,
- page);
-
if (req) {
WARN_ON_ONCE(req->wb_head != req);
kref_get(&req->wb_kref);
}
+ spin_unlock(&inode->i_lock);
+ return req;
+}

+static struct nfs_page *
+nfs_page_find_swap_request(struct page *page)
+{
+ struct inode *inode = page_file_mapping(page)->host;
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs_page *req = NULL;
+ if (!PageSwapCache(page))
+ return NULL;
+ spin_lock(&inode->i_lock);
+ if (PageSwapCache(page)) {
+ req = nfs_page_search_commits_for_head_request_locked(nfsi,
+ page);
+ if (req) {
+ WARN_ON_ONCE(req->wb_head != req);
+ kref_get(&req->wb_kref);
+ }
+ }
+ spin_unlock(&inode->i_lock);
return req;
}

@@ -194,14 +215,11 @@ nfs_page_find_head_request_locked(struct nfs_inode *nfsi, struct page *page)
*/
static struct nfs_page *nfs_page_find_head_request(struct page *page)
{
- struct inode *inode = page_file_mapping(page)->host;
- struct nfs_page *req = NULL;
+ struct nfs_page *req;

- if (PagePrivate(page) || PageSwapCache(page)) {
- spin_lock(&inode->i_lock);
- req = nfs_page_find_head_request_locked(NFS_I(inode), page);
- spin_unlock(&inode->i_lock);
- }
+ req = nfs_page_find_private_request(page);
+ if (!req)
+ req = nfs_page_find_swap_request(page);
return req;
}

--
2.13.3


2017-08-03 13:45:58

by Trond Myklebust

Subject: [PATCH v2 19/28] NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()

Hide the locking from nfs_lock_and_join_requests() so that we can
separate out the requirements for swapcache pages.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 759e37d26acf..a06167e20b72 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -154,6 +154,14 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
}

+static struct nfs_page *
+nfs_page_private_request(struct page *page)
+{
+ if (!PagePrivate(page))
+ return NULL;
+ return (struct nfs_page *)page_private(page);
+}
+
/*
* nfs_page_find_head_request_locked - find head request associated with @page
*
@@ -164,11 +172,10 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
static struct nfs_page *
nfs_page_find_head_request_locked(struct nfs_inode *nfsi, struct page *page)
{
- struct nfs_page *req = NULL;
+ struct nfs_page *req;

- if (PagePrivate(page))
- req = (struct nfs_page *)page_private(page);
- else if (unlikely(PageSwapCache(page)))
+ req = nfs_page_private_request(page);
+ if (!req && unlikely(PageSwapCache(page)))
req = nfs_page_search_commits_for_head_request_locked(nfsi,
page);

@@ -448,31 +455,29 @@ nfs_lock_and_join_requests(struct page *page)
int ret;

try_again:
- if (!(PagePrivate(page) || PageSwapCache(page)))
- return NULL;
- spin_lock(&inode->i_lock);
/*
* A reference is taken only on the head request which acts as a
* reference to the whole page group - the group will not be destroyed
* until the head reference is released.
*/
- head = nfs_page_find_head_request_locked(NFS_I(inode), page);
-
- if (!head) {
- spin_unlock(&inode->i_lock);
+ head = nfs_page_find_head_request(page);
+ if (!head)
return NULL;
- }

/* lock the page head first in order to avoid an ABBA inefficiency */
if (!nfs_lock_request(head)) {
- spin_unlock(&inode->i_lock);
ret = nfs_wait_on_request(head);
nfs_release_request(head);
if (ret < 0)
return ERR_PTR(ret);
goto try_again;
}
- spin_unlock(&inode->i_lock);
+
+ /* Ensure that nobody removed the request before we locked it */
+ if (head != nfs_page_private_request(page) && !PageSwapCache(page)) {
+ nfs_unlock_and_release_request(head);
+ goto try_again;
+ }

ret = nfs_page_group_lock(head);
if (ret < 0) {
@@ -559,7 +564,7 @@ nfs_lock_and_join_requests(struct page *page)
return NULL;
}

- /* still holds ref on head from nfs_page_find_head_request_locked
+ /* still holds ref on head from nfs_page_find_head_request
* and still has lock on head from lock loop */
return head;
}
--
2.13.3


2017-08-03 13:45:57

by Trond Myklebust

Subject: [PATCH v2 18/28] NFS: Fix up nfs_page_group_covers_page()

Fix up the test in nfs_page_group_covers_page(). The simplest implementation
is to check that we have a set of intersecting or contiguous subrequests
that connect page offset 0 to nfs_page_length(req->wb_page).

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 0f418d825185..759e37d26acf 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -243,9 +243,6 @@ nfs_page_group_search_locked(struct nfs_page *head, unsigned int page_offset)
{
struct nfs_page *req;

- WARN_ON_ONCE(head != head->wb_head);
- WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &head->wb_head->wb_flags));
-
req = head;
do {
if (page_offset >= req->wb_pgbase &&
@@ -273,18 +270,15 @@ static bool nfs_page_group_covers_page(struct nfs_page *req)

nfs_page_group_lock(req);

- do {
+ for (;;) {
tmp = nfs_page_group_search_locked(req->wb_head, pos);
- if (tmp) {
- /* no way this should happen */
- WARN_ON_ONCE(tmp->wb_pgbase != pos);
- pos += tmp->wb_bytes - (pos - tmp->wb_pgbase);
- }
- } while (tmp && pos < len);
+ if (!tmp)
+ break;
+ pos = tmp->wb_pgbase + tmp->wb_bytes;
+ }

nfs_page_group_unlock(req);
- WARN_ON_ONCE(pos > len);
- return pos == len;
+ return pos >= len;
}

/* We can set the PG_uptodate flag if we see that a write request
--
2.13.3


2017-08-03 13:45:42

by Trond Myklebust

Subject: [PATCH v2 07/28] NFS: Don't check request offset and size without holding a lock

Request offsets and sizes are not guaranteed to be stable unless you
hold the request lock.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c940e615f5dc..84b6818e5278 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -523,6 +523,17 @@ nfs_lock_and_join_requests(struct page *page)
total_bytes = head->wb_bytes;
for (subreq = head->wb_this_page; subreq != head;
subreq = subreq->wb_this_page) {
+ if (!nfs_lock_request(subreq)) {
+ /* releases page group bit lock and
+ * inode spin lock and all references */
+ ret = nfs_unroll_locks_and_wait(inode, head,
+ subreq);
+
+ if (ret == 0)
+ goto try_again;
+
+ return ERR_PTR(ret);
+ }
/*
* Subrequests are always contiguous, non overlapping
* and in order - but may be repeated (mirrored writes).
@@ -533,21 +544,10 @@ nfs_lock_and_join_requests(struct page *page)
} else if (WARN_ON_ONCE(subreq->wb_offset < head->wb_offset ||
((subreq->wb_offset + subreq->wb_bytes) >
(head->wb_offset + total_bytes)))) {
+ nfs_unlock_request(subreq);
nfs_unroll_locks_and_wait(inode, head, subreq);
return ERR_PTR(-EIO);
}
-
- if (!nfs_lock_request(subreq)) {
- /* releases page group bit lock and
- * inode spin lock and all references */
- ret = nfs_unroll_locks_and_wait(inode, head,
- subreq);
-
- if (ret == 0)
- goto try_again;
-
- return ERR_PTR(ret);
- }
}

/* Now that all requests are locked, make sure they aren't on any list.
--
2.13.3


2017-08-03 13:45:46

by Trond Myklebust

Subject: [PATCH v2 09/28] NFS: Fix the inode request accounting when pages have subrequests

Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests()
manipulate the inode flags when adjusting NFS_I(inode)->nrequests.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index bb38c881fc48..ee981353d4aa 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -418,7 +418,8 @@ nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
*/
static void
nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
- struct nfs_page *old_head)
+ struct nfs_page *old_head,
+ struct inode *inode)
{
while (destroy_list) {
struct nfs_page *subreq = destroy_list;
@@ -443,9 +444,12 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
nfs_page_group_clear_bits(subreq);

/* release the PG_INODE_REF reference */
- if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags))
+ if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags)) {
nfs_release_request(subreq);
- else
+ spin_lock(&inode->i_lock);
+ NFS_I(inode)->nrequests--;
+ spin_unlock(&inode->i_lock);
+ } else
WARN_ON_ONCE(1);
} else {
WARN_ON_ONCE(test_bit(PG_CLEAN, &subreq->wb_flags));
@@ -572,25 +576,24 @@ nfs_lock_and_join_requests(struct page *page)
head->wb_bytes = total_bytes;
}

+ /* Postpone destruction of this request */
+ if (test_and_clear_bit(PG_REMOVE, &head->wb_flags)) {
+ set_bit(PG_INODE_REF, &head->wb_flags);
+ kref_get(&head->wb_kref);
+ NFS_I(inode)->nrequests++;
+ }
+
/*
* prepare head request to be added to new pgio descriptor
*/
nfs_page_group_clear_bits(head);

- /*
- * some part of the group was still on the inode list - otherwise
- * the group wouldn't be involved in async write.
- * grab a reference for the head request, iff it needs one.
- */
- if (!test_and_set_bit(PG_INODE_REF, &head->wb_flags))
- kref_get(&head->wb_kref);
-
nfs_page_group_unlock(head);

/* drop lock to clean uprequests on destroy list */
spin_unlock(&inode->i_lock);

- nfs_destroy_unlinked_subrequests(destroy_list, head);
+ nfs_destroy_unlinked_subrequests(destroy_list, head, inode);

/* still holds ref on head from nfs_page_find_head_request_locked
* and still has lock on head from lock loop */
--
2.13.3


2017-08-03 13:46:02

by Trond Myklebust

Subject: [PATCH v2 22/28] NFS: Use an atomic_long_t to count the number of requests

Use an atomic_long_t rather than forcing us to take the inode->i_lock
just in order to bump the number.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/callback_proc.c | 2 +-
fs/nfs/delegation.c | 2 +-
fs/nfs/inode.c | 7 +++----
fs/nfs/pagelist.c | 4 +---
fs/nfs/write.c | 18 +++++-------------
include/linux/nfs_fs.h | 4 ++--
6 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 5427cdf04c5a..14358de173fb 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -51,7 +51,7 @@ __be32 nfs4_callback_getattr(void *argp, void *resp,
goto out_iput;
res->size = i_size_read(inode);
res->change_attr = delegation->change_attr;
- if (nfsi->nrequests != 0)
+ if (nfs_have_writebacks(inode))
res->change_attr++;
res->ctime = inode->i_ctime;
res->mtime = inode->i_mtime;
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index d7df5e67b0c1..606dd3871f66 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -1089,7 +1089,7 @@ bool nfs4_delegation_flush_on_close(const struct inode *inode)
delegation = rcu_dereference(nfsi->delegation);
if (delegation == NULL || !(delegation->type & FMODE_WRITE))
goto out;
- if (nfsi->nrequests < delegation->pagemod_limit)
+ if (atomic_long_read(&nfsi->nrequests) < delegation->pagemod_limit)
ret = false;
out:
rcu_read_unlock();
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 34d9ebbc0dfd..0480eb02299a 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1285,7 +1285,6 @@ static bool nfs_file_has_buffered_writers(struct nfs_inode *nfsi)

static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
{
- struct nfs_inode *nfsi = NFS_I(inode);
unsigned long ret = 0;

if ((fattr->valid & NFS_ATTR_FATTR_PRECHANGE)
@@ -1315,7 +1314,7 @@ static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr
if ((fattr->valid & NFS_ATTR_FATTR_PRESIZE)
&& (fattr->valid & NFS_ATTR_FATTR_SIZE)
&& i_size_read(inode) == nfs_size_to_loff_t(fattr->pre_size)
- && nfsi->nrequests == 0) {
+ && !nfs_have_writebacks(inode)) {
i_size_write(inode, nfs_size_to_loff_t(fattr->size));
ret |= NFS_INO_INVALID_ATTR;
}
@@ -1823,7 +1822,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
if (new_isize != cur_isize) {
/* Do we perhaps have any outstanding writes, or has
* the file grown beyond our last write? */
- if (nfsi->nrequests == 0 || new_isize > cur_isize) {
+ if (!nfs_have_writebacks(inode) || new_isize > cur_isize) {
i_size_write(inode, new_isize);
if (!have_writers)
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
@@ -2012,7 +2011,7 @@ static void init_once(void *foo)
INIT_LIST_HEAD(&nfsi->access_cache_entry_lru);
INIT_LIST_HEAD(&nfsi->access_cache_inode_lru);
INIT_LIST_HEAD(&nfsi->commit_info.list);
- nfsi->nrequests = 0;
+ atomic_long_set(&nfsi->nrequests, 0);
nfsi->commit_info.ncommit = 0;
atomic_set(&nfsi->commit_info.rpcs_out, 0);
init_rwsem(&nfsi->rmdir_sem);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index af6731dd4324..ec97c301899b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -258,9 +258,7 @@ nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
inode = page_file_mapping(req->wb_page)->host;
set_bit(PG_INODE_REF, &req->wb_flags);
kref_get(&req->wb_kref);
- spin_lock(&inode->i_lock);
- NFS_I(inode)->nrequests++;
- spin_unlock(&inode->i_lock);
+ atomic_long_inc(&NFS_I(inode)->nrequests);
}
}
}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5ab5ca24b48a..08093552f115 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -434,9 +434,7 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,

if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags)) {
nfs_release_request(subreq);
- spin_lock(&inode->i_lock);
- NFS_I(inode)->nrequests--;
- spin_unlock(&inode->i_lock);
+ atomic_long_dec(&NFS_I(inode)->nrequests);
}

/* subreq is now totally disconnected from page group or any
@@ -567,9 +565,7 @@ nfs_lock_and_join_requests(struct page *page)
if (test_and_clear_bit(PG_REMOVE, &head->wb_flags)) {
set_bit(PG_INODE_REF, &head->wb_flags);
kref_get(&head->wb_kref);
- spin_lock(&inode->i_lock);
- NFS_I(inode)->nrequests++;
- spin_unlock(&inode->i_lock);
+ atomic_long_inc(&NFS_I(inode)->nrequests);
}

nfs_page_group_unlock(head);
@@ -755,7 +751,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
nfs_lock_request(req);

spin_lock(&inode->i_lock);
- if (!nfsi->nrequests &&
+ if (!nfs_have_writebacks(inode) &&
NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE))
inode->i_version++;
/*
@@ -767,7 +763,7 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
SetPagePrivate(req->wb_page);
set_page_private(req->wb_page, (unsigned long)req);
}
- nfsi->nrequests++;
+ atomic_long_inc(&nfsi->nrequests);
/* this a head request for a page group - mark it as having an
* extra reference so sub groups can follow suit.
* This flag also informs pgio layer when to bump nrequests when
@@ -786,6 +782,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_page *head;

+ atomic_long_dec(&nfsi->nrequests);
if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
head = req->wb_head;

@@ -795,11 +792,6 @@ static void nfs_inode_remove_request(struct nfs_page *req)
ClearPagePrivate(head->wb_page);
clear_bit(PG_MAPPED, &head->wb_flags);
}
- nfsi->nrequests--;
- spin_unlock(&inode->i_lock);
- } else {
- spin_lock(&inode->i_lock);
- nfsi->nrequests--;
spin_unlock(&inode->i_lock);
}

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 121a702888b4..238fdc4c46df 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -154,7 +154,7 @@ struct nfs_inode {
*/
__be32 cookieverf[2];

- unsigned long nrequests;
+ atomic_long_t nrequests;
struct nfs_mds_commit_info commit_info;

/* Open contexts for shared mmap writes */
@@ -511,7 +511,7 @@ extern void nfs_commit_free(struct nfs_commit_data *data);
static inline int
nfs_have_writebacks(struct inode *inode)
{
- return NFS_I(inode)->nrequests != 0;
+ return atomic_long_read(&NFS_I(inode)->nrequests) != 0;
}

/*
--
2.13.3


2017-08-03 13:46:06

by Trond Myklebust

Subject: [PATCH v2 25/28] NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()

Now that we no longer hold the inode->i_lock when manipulating the
commit lists, it is safe to call pnfs_put_lseg() again.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pnfs.c | 41 -----------------------------------------
fs/nfs/pnfs.h | 2 --
fs/nfs/pnfs_nfs.c | 4 ++--
3 files changed, 2 insertions(+), 45 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c383d0913b54..3125a9d7b237 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -529,47 +529,6 @@ pnfs_put_lseg(struct pnfs_layout_segment *lseg)
}
EXPORT_SYMBOL_GPL(pnfs_put_lseg);

-static void pnfs_free_lseg_async_work(struct work_struct *work)
-{
- struct pnfs_layout_segment *lseg;
- struct pnfs_layout_hdr *lo;
-
- lseg = container_of(work, struct pnfs_layout_segment, pls_work);
- lo = lseg->pls_layout;
-
- pnfs_free_lseg(lseg);
- pnfs_put_layout_hdr(lo);
-}
-
-static void pnfs_free_lseg_async(struct pnfs_layout_segment *lseg)
-{
- INIT_WORK(&lseg->pls_work, pnfs_free_lseg_async_work);
- schedule_work(&lseg->pls_work);
-}
-
-void
-pnfs_put_lseg_locked(struct pnfs_layout_segment *lseg)
-{
- if (!lseg)
- return;
-
- assert_spin_locked(&lseg->pls_layout->plh_inode->i_lock);
-
- dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
- atomic_read(&lseg->pls_refcount),
- test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
- if (atomic_dec_and_test(&lseg->pls_refcount)) {
- struct pnfs_layout_hdr *lo = lseg->pls_layout;
- if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags))
- return;
- pnfs_layout_remove_lseg(lo, lseg);
- if (!pnfs_cache_lseg_for_layoutreturn(lo, lseg)) {
- pnfs_get_layout_hdr(lo);
- pnfs_free_lseg_async(lseg);
- }
- }
-}
-
/*
* is l2 fully contained in l1?
* start1 end1
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 99731e3e332f..87f144f14d1e 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -67,7 +67,6 @@ struct pnfs_layout_segment {
u32 pls_seq;
unsigned long pls_flags;
struct pnfs_layout_hdr *pls_layout;
- struct work_struct pls_work;
};

enum pnfs_try_status {
@@ -230,7 +229,6 @@ extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
/* pnfs.c */
void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
void pnfs_put_lseg(struct pnfs_layout_segment *lseg);
-void pnfs_put_lseg_locked(struct pnfs_layout_segment *lseg);

void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *, struct nfs_fsinfo *);
void unset_pnfs_layoutdriver(struct nfs_server *);
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 2cdee8ce2094..4b0a809653d1 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -83,7 +83,7 @@ pnfs_generic_clear_request_commit(struct nfs_page *req,
}
out:
nfs_request_remove_commit_list(req, cinfo);
- pnfs_put_lseg_locked(freeme);
+ pnfs_put_lseg(freeme);
}
EXPORT_SYMBOL_GPL(pnfs_generic_clear_request_commit);

@@ -126,7 +126,7 @@ pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
if (bucket->clseg == NULL)
bucket->clseg = pnfs_get_lseg(bucket->wlseg);
if (list_empty(src)) {
- pnfs_put_lseg_locked(bucket->wlseg);
+ pnfs_put_lseg(bucket->wlseg);
bucket->wlseg = NULL;
}
}
--
2.13.3


2017-08-03 13:46:10

by Trond Myklebust

Subject: [PATCH v2 28/28] NFS: Throttle I/O to the NFS server

Ensure that we do not build up large queues of asynchronous I/O work in the
RPC layer that are not being handled by the socket.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pagelist.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 548ebc7256ff..c9a664730e94 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -625,7 +625,8 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
ret = rpc_wait_for_completion_task(task);
if (ret == 0)
ret = task->tk_status;
- }
+ } else
+ rpc_wait_for_msg_send(task);
rpc_put_task(task);
out:
return ret;
--
2.13.3


2017-08-03 13:46:04

by Trond Myklebust

Subject: [PATCH v2 24/28] NFS: Switch to using mapping->private_lock for page writeback lookups.

Switch from the inode->i_lock to mapping->private_lock for this, to
avoid contention with other metadata manipulation.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 12479c25028e..866702823869 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -172,18 +172,18 @@ nfs_page_private_request(struct page *page)
static struct nfs_page *
nfs_page_find_private_request(struct page *page)
{
- struct inode *inode = page_file_mapping(page)->host;
+ struct address_space *mapping = page_file_mapping(page);
struct nfs_page *req;

if (!PagePrivate(page))
return NULL;
- spin_lock(&inode->i_lock);
+ spin_lock(&mapping->private_lock);
req = nfs_page_private_request(page);
if (req) {
WARN_ON_ONCE(req->wb_head != req);
kref_get(&req->wb_kref);
}
- spin_unlock(&inode->i_lock);
+ spin_unlock(&mapping->private_lock);
return req;
}

@@ -743,6 +743,7 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
*/
static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
{
+ struct address_space *mapping = page_file_mapping(req->wb_page);
struct nfs_inode *nfsi = NFS_I(inode);

WARN_ON_ONCE(req->wb_this_page != req);
@@ -750,19 +751,23 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
/* Lock the request! */
nfs_lock_request(req);

- spin_lock(&inode->i_lock);
- if (!nfs_have_writebacks(inode) &&
- NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE))
- inode->i_version++;
/*
* Swap-space should not get truncated. Hence no need to plug the race
* with invalidate/truncate.
*/
+ spin_lock(&mapping->private_lock);
+ if (!nfs_have_writebacks(inode) &&
+ NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE)) {
+ spin_lock(&inode->i_lock);
+ inode->i_version++;
+ spin_unlock(&inode->i_lock);
+ }
if (likely(!PageSwapCache(req->wb_page))) {
set_bit(PG_MAPPED, &req->wb_flags);
SetPagePrivate(req->wb_page);
set_page_private(req->wb_page, (unsigned long)req);
}
+ spin_unlock(&mapping->private_lock);
atomic_long_inc(&nfsi->nrequests);
/* this a head request for a page group - mark it as having an
* extra reference so sub groups can follow suit.
@@ -770,7 +775,6 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
* adding subrequests. */
WARN_ON(test_and_set_bit(PG_INODE_REF, &req->wb_flags));
kref_get(&req->wb_kref);
- spin_unlock(&inode->i_lock);
}

/*
@@ -778,7 +782,8 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
*/
static void nfs_inode_remove_request(struct nfs_page *req)
{
- struct inode *inode = d_inode(req->wb_context->dentry);
+ struct address_space *mapping = page_file_mapping(req->wb_page);
+ struct inode *inode = mapping->host;
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_page *head;

@@ -786,13 +791,13 @@ static void nfs_inode_remove_request(struct nfs_page *req)
if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
head = req->wb_head;

- spin_lock(&inode->i_lock);
+ spin_lock(&mapping->private_lock);
if (likely(head->wb_page && !PageSwapCache(head->wb_page))) {
set_page_private(head->wb_page, 0);
ClearPagePrivate(head->wb_page);
clear_bit(PG_MAPPED, &head->wb_flags);
}
- spin_unlock(&inode->i_lock);
+ spin_unlock(&mapping->private_lock);
}

if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags))
--
2.13.3


2017-08-03 13:45:53

by Trond Myklebust

Subject: [PATCH v2 15/28] NFS: Remove nfs_page_group_clear_bits()

At this point, the only bits we ever expect to potentially see set on
the subrequests are PG_REMOVE and PG_TEARDOWN.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 29 +++--------------------------
1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ffb9934607ef..20d44ea328b6 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -347,22 +347,6 @@ static void nfs_end_page_writeback(struct nfs_page *req)
clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
}

-
-/* nfs_page_group_clear_bits
- * @req - an nfs request
- * clears all page group related bits from @req
- */
-static void
-nfs_page_group_clear_bits(struct nfs_page *req)
-{
- clear_bit(PG_TEARDOWN, &req->wb_flags);
- clear_bit(PG_UNLOCKPAGE, &req->wb_flags);
- clear_bit(PG_UPTODATE, &req->wb_flags);
- clear_bit(PG_WB_END, &req->wb_flags);
- clear_bit(PG_REMOVE, &req->wb_flags);
-}
-
-
/*
* nfs_unroll_locks_and_wait - unlock all newly locked reqs and wait on @req
*
@@ -417,13 +401,12 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
/* make sure old group is not used */
subreq->wb_this_page = subreq;

+ clear_bit(PG_REMOVE, &subreq->wb_flags);
+
/* Note: races with nfs_page_group_destroy() */
if (!kref_read(&subreq->wb_kref)) {
- bool freeme = test_bit(PG_TEARDOWN, &subreq->wb_flags);
-
- nfs_page_group_clear_bits(subreq);
/* Check if we raced with nfs_page_group_destroy() */
- if (freeme)
+ if (test_and_clear_bit(PG_TEARDOWN, &subreq->wb_flags))
nfs_free_request(subreq);
continue;
}
@@ -437,7 +420,6 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
spin_unlock(&inode->i_lock);
}

- nfs_page_group_clear_bits(subreq);
/* subreq is now totally disconnected from page group or any
* write / commit lists. last chance to wake any waiters */
nfs_unlock_and_release_request(subreq);
@@ -573,11 +555,6 @@ nfs_lock_and_join_requests(struct page *page)
spin_unlock(&inode->i_lock);
}

- /*
- * prepare head request to be added to new pgio descriptor
- */
- nfs_page_group_clear_bits(head);
-
nfs_page_group_unlock(head);

nfs_destroy_unlinked_subrequests(destroy_list, head, inode);
--
2.13.3


2017-08-03 13:45:47

by Trond Myklebust

Subject: [PATCH v2 10/28] NFS: Teach nfs_try_to_update_request() to deal with request page_groups

Simplify the code, and avoid some flushes to disk.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 60 ++++++++++++++++++++--------------------------------------
1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ee981353d4aa..0b4d1ef168e0 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1107,39 +1107,19 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,

end = offset + bytes;

- for (;;) {
- if (!(PagePrivate(page) || PageSwapCache(page)))
- return NULL;
- spin_lock(&inode->i_lock);
- req = nfs_page_find_head_request_locked(NFS_I(inode), page);
- if (req == NULL)
- goto out_unlock;
-
- /* should be handled by nfs_flush_incompatible */
- WARN_ON_ONCE(req->wb_head != req);
- WARN_ON_ONCE(req->wb_this_page != req);
-
- rqend = req->wb_offset + req->wb_bytes;
- /*
- * Tell the caller to flush out the request if
- * the offsets are non-contiguous.
- * Note: nfs_flush_incompatible() will already
- * have flushed out requests having wrong owners.
- */
- if (offset > rqend
- || end < req->wb_offset)
- goto out_flushme;
-
- if (nfs_lock_request(req))
- break;
+ req = nfs_lock_and_join_requests(page);
+ if (IS_ERR_OR_NULL(req))
+ return req;

- /* The request is locked, so wait and then retry */
- spin_unlock(&inode->i_lock);
- error = nfs_wait_on_request(req);
- nfs_release_request(req);
- if (error != 0)
- goto out_err;
- }
+ rqend = req->wb_offset + req->wb_bytes;
+ /*
+ * Tell the caller to flush out the request if
+ * the offsets are non-contiguous.
+ * Note: nfs_flush_incompatible() will already
+ * have flushed out requests having wrong owners.
+ */
+ if (offset > rqend || end < req->wb_offset)
+ goto out_flushme;

/* Okay, the request matches. Update the region */
if (offset < req->wb_offset) {
@@ -1150,17 +1130,17 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
req->wb_bytes = end - req->wb_offset;
else
req->wb_bytes = rqend - req->wb_offset;
-out_unlock:
- if (req)
- nfs_clear_request_commit(req);
- spin_unlock(&inode->i_lock);
return req;
out_flushme:
- spin_unlock(&inode->i_lock);
- nfs_release_request(req);
+ /*
+ * Note: we mark the request dirty here because
+ * nfs_lock_and_join_requests() cannot preserve
+ * commit flags, so we have to replay the write.
+ */
+ nfs_mark_request_dirty(req);
+ nfs_unlock_and_release_request(req);
error = nfs_wb_page(inode, page);
-out_err:
- return ERR_PTR(error);
+ return (error < 0) ? ERR_PTR(error) : NULL;
}

/*
--
2.13.3


2017-08-03 13:46:00

by Trond Myklebust

Subject: [PATCH v2 21/28] NFSv4: Use a mutex to protect the per-inode commit lists

The commit lists can get very large, so using the inode->i_lock can
end up affecting general metadata performance.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/direct.c | 4 ++--
fs/nfs/inode.c | 1 +
fs/nfs/pnfs_nfs.c | 15 +++++++--------
fs/nfs/write.c | 24 ++++++++++++------------
include/linux/nfs_fs.h | 1 +
5 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 6fb9fad2d1e6..d2972d537469 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -616,13 +616,13 @@ nfs_direct_write_scan_commit_list(struct inode *inode,
struct list_head *list,
struct nfs_commit_info *cinfo)
{
- spin_lock(&cinfo->inode->i_lock);
+ mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
#ifdef CONFIG_NFS_V4_1
if (cinfo->ds != NULL && cinfo->ds->nwritten != 0)
NFS_SERVER(inode)->pnfs_curr_ld->recover_commit_reqs(list, cinfo);
#endif
nfs_scan_commit_list(&cinfo->mds->list, list, cinfo, 0);
- spin_unlock(&cinfo->inode->i_lock);
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
}

static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 109279d6d91b..34d9ebbc0dfd 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2016,6 +2016,7 @@ static void init_once(void *foo)
nfsi->commit_info.ncommit = 0;
atomic_set(&nfsi->commit_info.rpcs_out, 0);
init_rwsem(&nfsi->rmdir_sem);
+ mutex_init(&nfsi->commit_mutex);
nfs4_init_once(nfsi);
}

diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 25f28fa64c57..2cdee8ce2094 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -98,14 +98,13 @@ pnfs_generic_transfer_commit_list(struct list_head *src, struct list_head *dst,
if (!nfs_lock_request(req))
continue;
kref_get(&req->wb_kref);
- if (cond_resched_lock(&cinfo->inode->i_lock))
- list_safe_reset_next(req, tmp, wb_list);
nfs_request_remove_commit_list(req, cinfo);
clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
nfs_list_add_request(req, dst);
ret++;
if ((ret == max) && !cinfo->dreq)
break;
+ cond_resched();
}
return ret;
}
@@ -119,7 +118,7 @@ pnfs_generic_scan_ds_commit_list(struct pnfs_commit_bucket *bucket,
struct list_head *dst = &bucket->committing;
int ret;

- lockdep_assert_held(&cinfo->inode->i_lock);
+ lockdep_assert_held(&NFS_I(cinfo->inode)->commit_mutex);
ret = pnfs_generic_transfer_commit_list(src, dst, cinfo, max);
if (ret) {
cinfo->ds->nwritten -= ret;
@@ -142,7 +141,7 @@ int pnfs_generic_scan_commit_lists(struct nfs_commit_info *cinfo,
{
int i, rv = 0, cnt;

- lockdep_assert_held(&cinfo->inode->i_lock);
+ lockdep_assert_held(&NFS_I(cinfo->inode)->commit_mutex);
for (i = 0; i < cinfo->ds->nbuckets && max != 0; i++) {
cnt = pnfs_generic_scan_ds_commit_list(&cinfo->ds->buckets[i],
cinfo, max);
@@ -162,7 +161,7 @@ void pnfs_generic_recover_commit_reqs(struct list_head *dst,
int nwritten;
int i;

- lockdep_assert_held(&cinfo->inode->i_lock);
+ lockdep_assert_held(&NFS_I(cinfo->inode)->commit_mutex);
restart:
for (i = 0, b = cinfo->ds->buckets; i < cinfo->ds->nbuckets; i++, b++) {
nwritten = pnfs_generic_transfer_commit_list(&b->written,
@@ -953,12 +952,12 @@ pnfs_layout_mark_request_commit(struct nfs_page *req,
struct list_head *list;
struct pnfs_commit_bucket *buckets;

- spin_lock(&cinfo->inode->i_lock);
+ mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
buckets = cinfo->ds->buckets;
list = &buckets[ds_commit_idx].written;
if (list_empty(list)) {
if (!pnfs_is_valid_lseg(lseg)) {
- spin_unlock(&cinfo->inode->i_lock);
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
cinfo->completion_ops->resched_write(cinfo, req);
return;
}
@@ -975,7 +974,7 @@ pnfs_layout_mark_request_commit(struct nfs_page *req,
cinfo->ds->nwritten++;

nfs_request_add_commit_list_locked(req, list, cinfo);
- spin_unlock(&cinfo->inode->i_lock);
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
nfs_mark_page_unstable(req->wb_page, cinfo);
}
EXPORT_SYMBOL_GPL(pnfs_layout_mark_request_commit);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8d8fa6d4cfcc..5ab5ca24b48a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -195,7 +195,7 @@ nfs_page_find_swap_request(struct page *page)
struct nfs_page *req = NULL;
if (!PageSwapCache(page))
return NULL;
- spin_lock(&inode->i_lock);
+ mutex_lock(&nfsi->commit_mutex);
if (PageSwapCache(page)) {
req = nfs_page_search_commits_for_head_request_locked(nfsi,
page);
@@ -204,7 +204,7 @@ nfs_page_find_swap_request(struct page *page)
kref_get(&req->wb_kref);
}
}
- spin_unlock(&inode->i_lock);
+ mutex_unlock(&nfsi->commit_mutex);
return req;
}

@@ -856,7 +856,8 @@ nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
* number of outstanding requests requiring a commit as well as
* the MM page stats.
*
- * The caller must hold cinfo->inode->i_lock, and the nfs_page lock.
+ * The caller must hold NFS_I(cinfo->inode)->commit_mutex, and the
+ * nfs_page lock.
*/
void
nfs_request_add_commit_list_locked(struct nfs_page *req, struct list_head *dst,
@@ -884,9 +885,9 @@ EXPORT_SYMBOL_GPL(nfs_request_add_commit_list_locked);
void
nfs_request_add_commit_list(struct nfs_page *req, struct nfs_commit_info *cinfo)
{
- spin_lock(&cinfo->inode->i_lock);
+ mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
nfs_request_add_commit_list_locked(req, &cinfo->mds->list, cinfo);
- spin_unlock(&cinfo->inode->i_lock);
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
if (req->wb_page)
nfs_mark_page_unstable(req->wb_page, cinfo);
}
@@ -964,11 +965,11 @@ nfs_clear_request_commit(struct nfs_page *req)
struct nfs_commit_info cinfo;

nfs_init_cinfo_from_inode(&cinfo, inode);
- spin_lock(&inode->i_lock);
+ mutex_lock(&NFS_I(inode)->commit_mutex);
if (!pnfs_clear_request_commit(req, &cinfo)) {
nfs_request_remove_commit_list(req, &cinfo);
}
- spin_unlock(&inode->i_lock);
+ mutex_unlock(&NFS_I(inode)->commit_mutex);
nfs_clear_page_commit(req->wb_page);
}
}
@@ -1027,7 +1028,7 @@ nfs_reqs_to_commit(struct nfs_commit_info *cinfo)
return cinfo->mds->ncommit;
}

-/* cinfo->inode->i_lock held by caller */
+/* NFS_I(cinfo->inode)->commit_mutex held by caller */
int
nfs_scan_commit_list(struct list_head *src, struct list_head *dst,
struct nfs_commit_info *cinfo, int max)
@@ -1039,13 +1040,12 @@ nfs_scan_commit_list(struct list_head *src, struct list_head *dst,
if (!nfs_lock_request(req))
continue;
kref_get(&req->wb_kref);
- if (cond_resched_lock(&cinfo->inode->i_lock))
- list_safe_reset_next(req, tmp, wb_list);
nfs_request_remove_commit_list(req, cinfo);
nfs_list_add_request(req, dst);
ret++;
if ((ret == max) && !cinfo->dreq)
break;
+ cond_resched();
}
return ret;
}
@@ -1065,7 +1065,7 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst,
{
int ret = 0;

- spin_lock(&cinfo->inode->i_lock);
+ mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
if (cinfo->mds->ncommit > 0) {
const int max = INT_MAX;

@@ -1073,7 +1073,7 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst,
cinfo, max);
ret += pnfs_scan_commit_lists(inode, cinfo, max - ret);
}
- spin_unlock(&cinfo->inode->i_lock);
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
return ret;
}

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 5cc91d6381a3..121a702888b4 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -163,6 +163,7 @@ struct nfs_inode {
/* Readers: in-flight sillydelete RPC calls */
/* Writers: rmdir */
struct rw_semaphore rmdir_sem;
+ struct mutex commit_mutex;

#if IS_ENABLED(CONFIG_NFS_V4)
struct nfs4_cached_acl *nfs4_acl;
--
2.13.3


2017-08-03 13:45:50

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 12/28] NFS: Reduce inode->i_lock contention in nfs_lock_and_join_requests()

We should no longer need the inode->i_lock, now that we've
straightened out the request locking. The locking schema is now:

1) Lock page head request
2) Lock the page group
3) Lock the subrequests one by one

Note that there is a subtle race with nfs_inode_remove_request(),
because the latter does not lock the page head when removing it from
the struct page; only the last subrequest is locked. Hence we need to
re-check that PagePrivate(page) is still set after we've locked all
the subrequests.
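
Schematically, the resulting flow is as below (a simplified sketch
only: lock_and_join_sketch() is an illustrative name, and the wait,
retry and unroll paths of the real nfs_lock_and_join_requests() are
elided):

static struct nfs_page *
lock_and_join_sketch(struct page *page, struct inode *inode)
{
	struct nfs_page *head, *subreq;

	if (!(PagePrivate(page) || PageSwapCache(page)))
		return NULL;

	/* i_lock is now held only to dereference page_private(page)... */
	spin_lock(&inode->i_lock);
	head = nfs_page_find_head_request_locked(NFS_I(inode), page);
	if (head == NULL) {
		spin_unlock(&inode->i_lock);
		return NULL;
	}
	/* 1) ...and to lock the page head request */
	nfs_lock_request(head);		/* wait-and-retry elided */
	spin_unlock(&inode->i_lock);

	/* 2) Lock the page group; blocking is safe without i_lock */
	nfs_page_group_lock(head, false);

	/* 3) Lock the subrequests one by one */
	for (subreq = head->wb_this_page; subreq != head;
	     subreq = subreq->wb_this_page)
		nfs_lock_request(subreq);	/* unroll-on-failure elided */

	/* ... merge the subrequests into head, unlock the group ... */
	nfs_page_group_unlock(head);

	/* Re-check for a lost race with nfs_inode_remove_request() */
	if (!(PagePrivate(page) || PageSwapCache(page))) {
		nfs_unlock_and_release_request(head);
		return NULL;
	}
	return head;
}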

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 08c1ce968cce..ff7c90c7ff79 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -372,15 +372,14 @@ nfs_page_group_clear_bits(struct nfs_page *req)
* @head - head request of page group, must be holding head lock
* @req - request that couldn't lock and needs to wait on the req bit lock
*
- * NOTE: this must be called holding page_group bit lock and inode spin lock
- * and BOTH will be released before returning.
+ * NOTE: this must be called holding page_group bit lock
+ * which will be released before returning.
*
* returns 0 on success, < 0 on error.
*/
static int
nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
struct nfs_page *req)
- __releases(&inode->i_lock)
{
struct nfs_page *tmp;
int ret;
@@ -395,7 +394,6 @@ nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
kref_get(&req->wb_kref);

nfs_page_group_unlock(head);
- spin_unlock(&inode->i_lock);

/* release ref from nfs_page_find_head_request_locked */
nfs_unlock_and_release_request(head);
@@ -491,8 +489,9 @@ nfs_lock_and_join_requests(struct page *page)
int ret;

try_again:
+ if (!(PagePrivate(page) || PageSwapCache(page)))
+ return NULL;
spin_lock(&inode->i_lock);
-
/*
* A reference is taken only on the head request which acts as a
* reference to the whole page group - the group will not be destroyed
@@ -514,16 +513,12 @@ nfs_lock_and_join_requests(struct page *page)
return ERR_PTR(ret);
goto try_again;
}
+ spin_unlock(&inode->i_lock);

- /* holding inode lock, so always make a non-blocking call to try the
- * page group lock */
- ret = nfs_page_group_lock(head, true);
+ ret = nfs_page_group_lock(head, false);
if (ret < 0) {
- spin_unlock(&inode->i_lock);
-
- nfs_page_group_lock_wait(head);
nfs_unlock_and_release_request(head);
- goto try_again;
+ return ERR_PTR(ret);
}

/* lock each request in the page group */
@@ -531,8 +526,10 @@ nfs_lock_and_join_requests(struct page *page)
for (subreq = head->wb_this_page; subreq != head;
subreq = subreq->wb_this_page) {
if (!nfs_lock_request(subreq)) {
- /* releases page group bit lock and
- * inode spin lock and all references */
+ /*
+ * releases page group bit lock and
+ * page locks and all references
+ */
ret = nfs_unroll_locks_and_wait(inode, head,
subreq);

@@ -580,7 +577,9 @@ nfs_lock_and_join_requests(struct page *page)
if (test_and_clear_bit(PG_REMOVE, &head->wb_flags)) {
set_bit(PG_INODE_REF, &head->wb_flags);
kref_get(&head->wb_kref);
+ spin_lock(&inode->i_lock);
NFS_I(inode)->nrequests++;
+ spin_unlock(&inode->i_lock);
}

/*
@@ -590,11 +589,14 @@ nfs_lock_and_join_requests(struct page *page)

nfs_page_group_unlock(head);

- /* drop lock to clean uprequests on destroy list */
- spin_unlock(&inode->i_lock);
-
nfs_destroy_unlinked_subrequests(destroy_list, head, inode);

+ /* Did we lose a race with nfs_inode_remove_request()? */
+ if (!(PagePrivate(page) || PageSwapCache(page))) {
+ nfs_unlock_and_release_request(head);
+ return NULL;
+ }
+
/* still holds ref on head from nfs_page_find_head_request_locked
* and still has lock on head from lock loop */
return head;
@@ -968,7 +970,7 @@ nfs_clear_page_commit(struct page *page)
WB_RECLAIMABLE);
}

-/* Called holding inode (/cinfo) lock */
+/* Called holding the request lock on @req */
static void
nfs_clear_request_commit(struct nfs_page *req)
{
@@ -977,9 +979,11 @@ nfs_clear_request_commit(struct nfs_page *req)
struct nfs_commit_info cinfo;

nfs_init_cinfo_from_inode(&cinfo, inode);
+ spin_lock(&inode->i_lock);
if (!pnfs_clear_request_commit(req, &cinfo)) {
nfs_request_remove_commit_list(req, &cinfo);
}
+ spin_unlock(&inode->i_lock);
nfs_clear_page_commit(req->wb_page);
}
}
--
2.13.3


2017-08-03 13:46:03

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 23/28] NFS: Use an atomic_long_t to count the number of commits

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/inode.c | 2 +-
fs/nfs/write.c | 12 +++++++-----
include/linux/nfs_xdr.h | 2 +-
3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 0480eb02299a..134d9f560240 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2012,7 +2012,7 @@ static void init_once(void *foo)
INIT_LIST_HEAD(&nfsi->access_cache_inode_lru);
INIT_LIST_HEAD(&nfsi->commit_info.list);
atomic_long_set(&nfsi->nrequests, 0);
- nfsi->commit_info.ncommit = 0;
+ atomic_long_set(&nfsi->commit_info.ncommit, 0);
atomic_set(&nfsi->commit_info.rpcs_out, 0);
init_rwsem(&nfsi->rmdir_sem);
mutex_init(&nfsi->commit_mutex);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 08093552f115..12479c25028e 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -857,7 +857,7 @@ nfs_request_add_commit_list_locked(struct nfs_page *req, struct list_head *dst,
{
set_bit(PG_CLEAN, &req->wb_flags);
nfs_list_add_request(req, dst);
- cinfo->mds->ncommit++;
+ atomic_long_inc(&cinfo->mds->ncommit);
}
EXPORT_SYMBOL_GPL(nfs_request_add_commit_list_locked);

@@ -903,7 +903,7 @@ nfs_request_remove_commit_list(struct nfs_page *req,
if (!test_and_clear_bit(PG_CLEAN, &(req)->wb_flags))
return;
nfs_list_remove_request(req);
- cinfo->mds->ncommit--;
+ atomic_long_dec(&cinfo->mds->ncommit);
}
EXPORT_SYMBOL_GPL(nfs_request_remove_commit_list);

@@ -1017,7 +1017,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
unsigned long
nfs_reqs_to_commit(struct nfs_commit_info *cinfo)
{
- return cinfo->mds->ncommit;
+ return atomic_long_read(&cinfo->mds->ncommit);
}

/* NFS_I(cinfo->inode)->commit_mutex held by caller */
@@ -1057,8 +1057,10 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst,
{
int ret = 0;

+ if (!atomic_long_read(&cinfo->mds->ncommit))
+ return 0;
mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
- if (cinfo->mds->ncommit > 0) {
+ if (atomic_long_read(&cinfo->mds->ncommit) > 0) {
const int max = INT_MAX;

ret = nfs_scan_commit_list(&cinfo->mds->list, dst,
@@ -1890,7 +1892,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
int ret = 0;

/* no commits means nothing needs to be done */
- if (!nfsi->commit_info.ncommit)
+ if (!atomic_long_read(&nfsi->commit_info.ncommit))
return ret;

if (wbc->sync_mode == WB_SYNC_NONE) {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ca3bcc4ed4e5..84943a4c53c7 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1476,7 +1476,7 @@ struct nfs_pgio_header {

struct nfs_mds_commit_info {
atomic_t rpcs_out;
- unsigned long ncommit;
+ atomic_long_t ncommit;
struct list_head list;
};

--
2.13.3


2017-08-03 13:46:07

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 27/28] SUNRPC: Add a function to allow waiting for RPC transmission

Sometimes, we only want to know that the RPC message is in the process
of being transmitted. Add a function to allow that.
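
For illustration, a caller that wants to throttle on transmission
might look like the sketch below (write_sketch() and its task_setup
argument are invented for this example; the real nfs_initiate_pgio()
plumbing is elided):

static int write_sketch(struct rpc_task_setup *task_setup)
{
	struct rpc_task *task;
	int ret;

	task = rpc_run_task(task_setup);
	if (IS_ERR(task))
		return PTR_ERR(task);
	/*
	 * Don't return to the caller until the message is at least
	 * in the process of being transmitted.
	 */
	ret = rpc_wait_for_msg_send(task);
	rpc_put_task(task);
	return ret;
}

Note that rpc_wait_for_msg_send() returns immediately if
RPC_TASK_MSG_XMIT has already been cleared, so the common case adds
no extra sleep.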

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/sched.h | 3 +++
net/sunrpc/sched.c | 22 ++++++++++++++++++++++
net/sunrpc/xprt.c | 6 ++++++
3 files changed, 31 insertions(+)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 50a99a117da7..15bc1cd6ee5c 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -139,6 +139,8 @@ struct rpc_task_setup {
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
#define RPC_TASK_ACTIVE 2
+#define RPC_TASK_MSG_XMIT 3
+#define RPC_TASK_MSG_XMIT_WAIT 4

#define RPC_IS_RUNNING(t) test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
#define rpc_set_running(t) set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
@@ -244,6 +246,7 @@ void rpc_free(struct rpc_task *);
int rpciod_up(void);
void rpciod_down(void);
int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *);
+int rpc_wait_for_msg_send(struct rpc_task *task);
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
struct net;
void rpc_show_tasks(struct net *);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 0cc83839c13c..30509a4d7e00 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -320,6 +320,19 @@ int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *act
EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task);

/*
+ * Allow callers to wait until the RPC message has been transmitted
+ */
+int rpc_wait_for_msg_send(struct rpc_task *task)
+{
+ if (!test_bit(RPC_TASK_MSG_XMIT, &task->tk_runstate))
+ return 0;
+ set_bit(RPC_TASK_MSG_XMIT_WAIT, &task->tk_runstate);
+ return wait_on_bit_action(&task->tk_runstate, RPC_TASK_MSG_XMIT,
+ rpc_wait_bit_killable, TASK_KILLABLE);
+}
+EXPORT_SYMBOL_GPL(rpc_wait_for_msg_send);
+
+/*
* Make an RPC task runnable.
*
* Note: If the task is ASYNC, and is being made runnable after sitting on an
@@ -700,6 +713,7 @@ rpc_reset_task_statistics(struct rpc_task *task)
{
task->tk_timeouts = 0;
task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_KILLED|RPC_TASK_SENT);
+ set_bit(RPC_TASK_MSG_XMIT, &task->tk_runstate);

rpc_init_task_statistics(task);
}
@@ -928,6 +942,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
memset(task, 0, sizeof(*task));
atomic_set(&task->tk_count, 1);
task->tk_flags = task_setup_data->flags;
+ task->tk_runstate = BIT(RPC_TASK_MSG_XMIT);
task->tk_ops = task_setup_data->callback_ops;
task->tk_calldata = task_setup_data->callback_data;
INIT_LIST_HEAD(&task->tk_task);
@@ -1012,6 +1027,13 @@ static void rpc_async_release(struct work_struct *work)

static void rpc_release_resources_task(struct rpc_task *task)
{
+ if (test_bit(RPC_TASK_MSG_XMIT, &task->tk_runstate)) {
+ clear_bit(RPC_TASK_MSG_XMIT, &task->tk_runstate);
+ smp_mb__after_atomic();
+ if (test_bit(RPC_TASK_MSG_XMIT_WAIT, &task->tk_runstate))
+ wake_up_bit(&task->tk_runstate, RPC_TASK_MSG_XMIT);
+ }
+
xprt_release(task);
if (task->tk_msg.rpc_cred) {
put_rpccred(task->tk_msg.rpc_cred);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 4654a9934269..788c1b6138c2 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -937,6 +937,12 @@ bool xprt_prepare_transmit(struct rpc_task *task)
goto out_unlock;
}
ret = true;
+ if (test_bit(RPC_TASK_MSG_XMIT, &task->tk_runstate)) {
+ clear_bit(RPC_TASK_MSG_XMIT, &task->tk_runstate);
+ smp_mb__after_atomic();
+ if (test_bit(RPC_TASK_MSG_XMIT_WAIT, &task->tk_runstate))
+ wake_up_bit(&task->tk_runstate, RPC_TASK_MSG_XMIT);
+ }
out_unlock:
spin_unlock_bh(&xprt->transport_lock);
return ret;
--
2.13.3


2017-08-03 13:45:44

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 08/28] NFS: Don't unlock writebacks before declaring PG_WB_END

We don't want nfs_lock_and_join_requests() to start fiddling with
the request before the call to nfs_page_group_sync_on_bit().

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 84b6818e5278..bb38c881fc48 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -335,8 +335,11 @@ static void nfs_end_page_writeback(struct nfs_page *req)
{
struct inode *inode = page_file_mapping(req->wb_page)->host;
struct nfs_server *nfss = NFS_SERVER(inode);
+ bool is_done;

- if (!nfs_page_group_sync_on_bit(req, PG_WB_END))
+ is_done = nfs_page_group_sync_on_bit(req, PG_WB_END);
+ nfs_unlock_request(req);
+ if (!is_done)
return;

end_page_writeback(req->wb_page);
@@ -596,7 +599,6 @@ nfs_lock_and_join_requests(struct page *page)

static void nfs_write_error_remove_page(struct nfs_page *req)
{
- nfs_unlock_request(req);
nfs_end_page_writeback(req);
generic_error_remove_page(page_file_mapping(req->wb_page),
req->wb_page);
@@ -1019,7 +1021,6 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
remove_req:
nfs_inode_remove_request(req);
next:
- nfs_unlock_request(req);
nfs_end_page_writeback(req);
nfs_release_request(req);
}
@@ -1406,7 +1407,6 @@ static void nfs_redirty_request(struct nfs_page *req)
{
nfs_mark_request_dirty(req);
set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
- nfs_unlock_request(req);
nfs_end_page_writeback(req);
nfs_release_request(req);
}
--
2.13.3


2017-08-03 13:46:06

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 26/28] NFS: Wait for requests that are locked on the commit list

If a request is on the commit list but is locked, we currently skip
it, which can lead to livelocking when the commit count never drops
to zero. Instead, wait for the request to be unlocked (dropping the
commit mutex while we wait), and then retry, as sketched below.
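
Both hunks follow the same shape; roughly (a sketch only:
scan_list_sketch() is an illustrative name, and the 'max' accounting
of the real functions is elided):

static int scan_list_sketch(struct list_head *src, struct list_head *dst,
			    struct nfs_commit_info *cinfo)
{
	struct nfs_page *req;
	int ret = 0;

	while (!list_empty(src)) {
		req = list_first_entry(src, struct nfs_page, wb_list);
		/* Pin req before the mutex can be dropped */
		kref_get(&req->wb_kref);
		if (!nfs_lock_request(req)) {
			int status;

			/* Let the lock owner make progress, then retry */
			mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
			status = nfs_wait_on_request(req);
			nfs_release_request(req);
			mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
			if (status < 0)
				break;
			/* The list may have changed; restart at its head */
			continue;
		}
		nfs_request_remove_commit_list(req, cinfo);
		nfs_list_add_request(req, dst);
		ret++;
	}
	return ret;
}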

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pagelist.c | 2 ++
fs/nfs/pnfs_nfs.c | 18 ++++++++++++++----
fs/nfs/write.c | 17 +++++++++++++----
3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ec97c301899b..548ebc7256ff 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -434,6 +434,7 @@ void nfs_release_request(struct nfs_page *req)
{
kref_put(&req->wb_kref, nfs_page_group_destroy);
}
+EXPORT_SYMBOL_GPL(nfs_release_request);

/**
* nfs_wait_on_request - Wait for a request to complete.
@@ -452,6 +453,7 @@ nfs_wait_on_request(struct nfs_page *req)
return wait_on_bit_io(&req->wb_flags, PG_BUSY,
TASK_UNINTERRUPTIBLE);
}
+EXPORT_SYMBOL_GPL(nfs_wait_on_request);

/*
* nfs_generic_pg_test - determine if requests can be coalesced
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 4b0a809653d1..303ff171cb5d 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -91,13 +91,23 @@ static int
pnfs_generic_transfer_commit_list(struct list_head *src, struct list_head *dst,
struct nfs_commit_info *cinfo, int max)
{
- struct nfs_page *req, *tmp;
+ struct nfs_page *req;
int ret = 0;

- list_for_each_entry_safe(req, tmp, src, wb_list) {
- if (!nfs_lock_request(req))
- continue;
+ while (!list_empty(src)) {
+ req = list_first_entry(src, struct nfs_page, wb_list);
+
kref_get(&req->wb_kref);
+ if (!nfs_lock_request(req)) {
+ int status;
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
+ status = nfs_wait_on_request(req);
+ nfs_release_request(req);
+ mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
+ if (status < 0)
+ break;
+ continue;
+ }
nfs_request_remove_commit_list(req, cinfo);
clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
nfs_list_add_request(req, dst);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 866702823869..5dd3b212376e 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1030,13 +1030,22 @@ int
nfs_scan_commit_list(struct list_head *src, struct list_head *dst,
struct nfs_commit_info *cinfo, int max)
{
- struct nfs_page *req, *tmp;
+ struct nfs_page *req;
int ret = 0;

- list_for_each_entry_safe(req, tmp, src, wb_list) {
- if (!nfs_lock_request(req))
- continue;
+ while (!list_empty(src)) {
+ req = list_first_entry(src, struct nfs_page, wb_list);
kref_get(&req->wb_kref);
+ if (!nfs_lock_request(req)) {
+ int status;
+ mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
+ status = nfs_wait_on_request(req);
+ nfs_release_request(req);
+ mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
+ if (status < 0)
+ break;
+ continue;
+ }
nfs_request_remove_commit_list(req, cinfo);
nfs_list_add_request(req, dst);
ret++;
--
2.13.3


2017-08-03 13:45:52

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 14/28] NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases

Since nfs_page_group_destroy() does not take any locks on the requests
to be freed, we need to ensure that we don't inadvertently free the
request in nfs_destroy_unlinked_subrequests() while the last reference
is being released elsewhere.

Do this by:

1) Taking a reference to the request unless it is already being freed
2) Checking (under the page group lock) if PG_TEARDOWN is already set before
freeing an unreferenced request in nfs_destroy_unlinked_subrequests(),
as sketched below
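
Schematically, the two rules look as follows (sketch fragments only;
the surrounding loops are in the hunks below):

	/* Rule 1: in nfs_lock_and_join_requests(), pin each subrequest
	 * before touching it; skip it if the last reference is gone */
	if (!kref_get_unless_zero(&subreq->wb_kref))
		continue;

	/* Rule 2: in nfs_destroy_unlinked_subrequests(), only free an
	 * unreferenced subrequest if PG_TEARDOWN shows that we raced
	 * with nfs_page_group_destroy() */
	if (!kref_read(&subreq->wb_kref)) {
		bool freeme = test_bit(PG_TEARDOWN, &subreq->wb_flags);

		nfs_page_group_clear_bits(subreq);
		if (freeme)
			nfs_free_request(subreq);
		continue;
	}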

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 58 +++++++++++++++++++++++++++++-----------------------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1ee5d89380d9..ffb9934607ef 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -384,10 +384,11 @@ nfs_unroll_locks(struct inode *inode, struct nfs_page *head,
struct nfs_page *tmp;

/* relinquish all the locks successfully grabbed this run */
- for (tmp = head->wb_this_page ; tmp != req; tmp = tmp->wb_this_page)
- nfs_unlock_request(tmp);
-
- WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
+ for (tmp = head->wb_this_page ; tmp != req; tmp = tmp->wb_this_page) {
+ if (!kref_read(&tmp->wb_kref))
+ continue;
+ nfs_unlock_and_release_request(tmp);
+ }
}

/*
@@ -414,36 +415,32 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
WARN_ON_ONCE(old_head != subreq->wb_head);

/* make sure old group is not used */
- subreq->wb_head = subreq;
subreq->wb_this_page = subreq;

- /* subreq is now totally disconnected from page group or any
- * write / commit lists. last chance to wake any waiters */
- nfs_unlock_request(subreq);
-
- if (!test_bit(PG_TEARDOWN, &subreq->wb_flags)) {
- /* release ref on old head request */
- nfs_release_request(old_head);
+ /* Note: races with nfs_page_group_destroy() */
+ if (!kref_read(&subreq->wb_kref)) {
+ bool freeme = test_bit(PG_TEARDOWN, &subreq->wb_flags);

nfs_page_group_clear_bits(subreq);
+ /* Check if we raced with nfs_page_group_destroy() */
+ if (freeme)
+ nfs_free_request(subreq);
+ continue;
+ }

- /* release the PG_INODE_REF reference */
- if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags)) {
- nfs_release_request(subreq);
- spin_lock(&inode->i_lock);
- NFS_I(inode)->nrequests--;
- spin_unlock(&inode->i_lock);
- } else
- WARN_ON_ONCE(1);
- } else {
- WARN_ON_ONCE(test_bit(PG_CLEAN, &subreq->wb_flags));
- /* zombie requests have already released the last
- * reference and were waiting on the rest of the
- * group to complete. Since it's no longer part of a
- * group, simply free the request */
- nfs_page_group_clear_bits(subreq);
- nfs_free_request(subreq);
+ subreq->wb_head = subreq;
+
+ if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags)) {
+ nfs_release_request(subreq);
+ spin_lock(&inode->i_lock);
+ NFS_I(inode)->nrequests--;
+ spin_unlock(&inode->i_lock);
}
+
+ nfs_page_group_clear_bits(subreq);
+ /* subreq is now totally disconnected from page group or any
+ * write / commit lists. last chance to wake any waiters */
+ nfs_unlock_and_release_request(subreq);
}
}

@@ -512,6 +509,8 @@ nfs_lock_and_join_requests(struct page *page)
for (subreq = head->wb_this_page; subreq != head;
subreq = subreq->wb_this_page) {

+ if (!kref_get_unless_zero(&subreq->wb_kref))
+ continue;
while (!nfs_lock_request(subreq)) {
/*
* Unlock page to allow nfs_page_group_sync_on_bit()
@@ -523,6 +522,7 @@ nfs_lock_and_join_requests(struct page *page)
ret = nfs_page_group_lock(head, false);
if (ret < 0) {
nfs_unroll_locks(inode, head, subreq);
+ nfs_release_request(subreq);
nfs_unlock_and_release_request(head);
return ERR_PTR(ret);
}
@@ -537,8 +537,8 @@ nfs_lock_and_join_requests(struct page *page)
} else if (WARN_ON_ONCE(subreq->wb_offset < head->wb_offset ||
((subreq->wb_offset + subreq->wb_bytes) >
(head->wb_offset + total_bytes)))) {
- nfs_unlock_request(subreq);
nfs_unroll_locks(inode, head, subreq);
+ nfs_unlock_and_release_request(subreq);
nfs_page_group_unlock(head);
nfs_unlock_and_release_request(head);
return ERR_PTR(-EIO);
--
2.13.3


2017-08-03 13:45:51

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 13/28] NFS: Further optimise nfs_lock_and_join_requests()

When locking the entire group in order to remove subrequests,
the locks are always taken in order, with the page group lock
being taken after the page head is locked. The intention is that:

1) The lock on the group head guarantees that requests may not
be removed from the group (although new entries could be appended
if we're not holding the group lock).
2) It is safe to drop and retake the page group lock while iterating
through the list, in particular when waiting for a subrequest lock
(see the sketch below).
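
A sketch of the resulting loop (lock_subrequests_sketch() is an
illustrative name; the unroll path on error is elided):

static int lock_subrequests_sketch(struct nfs_page *head)
{
	struct nfs_page *subreq;
	int ret;

	ret = nfs_page_group_lock(head, false);
	if (ret < 0)
		return ret;
	for (subreq = head->wb_this_page; subreq != head;
	     subreq = subreq->wb_this_page) {
		while (!nfs_lock_request(subreq)) {
			/*
			 * Safe to drop the group lock: holding the lock
			 * on head means subreq cannot be unlinked from
			 * the group, so the list pointers stay valid
			 * across the wait.
			 */
			nfs_page_group_unlock(head);
			ret = nfs_wait_on_request(subreq);
			if (!ret)
				ret = nfs_page_group_lock(head, false);
			if (ret < 0)
				return ret;	/* unroll elided */
		}
	}
	nfs_page_group_unlock(head);
	return 0;
}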

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 45 ++++++++++++++++++---------------------------
1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ff7c90c7ff79..1ee5d89380d9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -377,31 +377,17 @@ nfs_page_group_clear_bits(struct nfs_page *req)
*
* returns 0 on success, < 0 on error.
*/
-static int
-nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
+static void
+nfs_unroll_locks(struct inode *inode, struct nfs_page *head,
struct nfs_page *req)
{
struct nfs_page *tmp;
- int ret;

/* relinquish all the locks successfully grabbed this run */
for (tmp = head->wb_this_page ; tmp != req; tmp = tmp->wb_this_page)
nfs_unlock_request(tmp);

WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
-
- /* grab a ref on the request that will be waited on */
- kref_get(&req->wb_kref);
-
- nfs_page_group_unlock(head);
-
- /* release ref from nfs_page_find_head_request_locked */
- nfs_unlock_and_release_request(head);
-
- ret = nfs_wait_on_request(req);
- nfs_release_request(req);
-
- return ret;
}

/*
@@ -525,18 +511,21 @@ nfs_lock_and_join_requests(struct page *page)
total_bytes = head->wb_bytes;
for (subreq = head->wb_this_page; subreq != head;
subreq = subreq->wb_this_page) {
- if (!nfs_lock_request(subreq)) {
+
+ while (!nfs_lock_request(subreq)) {
/*
- * releases page group bit lock and
- * page locks and all references
+ * Unlock page to allow nfs_page_group_sync_on_bit()
+ * to succeed
*/
- ret = nfs_unroll_locks_and_wait(inode, head,
- subreq);
-
- if (ret == 0)
- goto try_again;
-
- return ERR_PTR(ret);
+ nfs_page_group_unlock(head);
+ ret = nfs_wait_on_request(subreq);
+ if (!ret)
+ ret = nfs_page_group_lock(head, false);
+ if (ret < 0) {
+ nfs_unroll_locks(inode, head, subreq);
+ nfs_unlock_and_release_request(head);
+ return ERR_PTR(ret);
+ }
}
/*
* Subrequests are always contiguous, non overlapping
@@ -549,7 +538,9 @@ nfs_lock_and_join_requests(struct page *page)
((subreq->wb_offset + subreq->wb_bytes) >
(head->wb_offset + total_bytes)))) {
nfs_unlock_request(subreq);
- nfs_unroll_locks_and_wait(inode, head, subreq);
+ nfs_unroll_locks(inode, head, subreq);
+ nfs_page_group_unlock(head);
+ nfs_unlock_and_release_request(head);
return ERR_PTR(-EIO);
}
}
--
2.13.3


2017-08-03 13:45:38

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 04/28] NFS: Ensure we always dereference the page head last

This fixes a race with nfs_page_group_sync_on_bit() whereby the
call to wake_up_bit() in nfs_page_group_unlock() could occur after
the page header had been freed.
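
An abridged view of nfs_page_group_destroy() after the patch (the
teardown loop is elided):

static void nfs_page_group_destroy(struct kref *kref)
{
	struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
	/* Sample the head pointer before anything can be freed... */
	struct nfs_page *head = req->wb_head;

	/* ... tear down the group; req itself may be freed here ... */

	/* ... and only then drop the subrequest's ref on the head */
	if (head != req)
		nfs_release_request(head);
}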

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/pagelist.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index de9066a92c0d..a6f2bbd709ba 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -306,14 +306,11 @@ static void
nfs_page_group_destroy(struct kref *kref)
{
struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
+ struct nfs_page *head = req->wb_head;
struct nfs_page *tmp, *next;

- /* subrequests must release the ref on the head request */
- if (req->wb_head != req)
- nfs_release_request(req->wb_head);
-
if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
- return;
+ goto out;

tmp = req;
do {
@@ -324,6 +321,10 @@ nfs_page_group_destroy(struct kref *kref)
nfs_free_request(tmp);
tmp = next;
} while (tmp != req);
+out:
+ /* subrequests must release the ref on the head request */
+ if (head != req)
+ nfs_release_request(head);
}

/**
--
2.13.3


2017-08-03 13:45:40

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v2 05/28] NFS: Fix a reference and lock leak in nfs_lock_and_join_requests()

Yes, this is a situation that should never happen (hence the WARN_ON),
but we should still ensure that we free up the locks and references to
the faulty pages.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index bb019096c331..1ca759719429 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -526,8 +526,7 @@ nfs_lock_and_join_requests(struct page *page)
} else if (WARN_ON_ONCE(subreq->wb_offset < head->wb_offset ||
((subreq->wb_offset + subreq->wb_bytes) >
(head->wb_offset + total_bytes)))) {
- nfs_page_group_unlock(head);
- spin_unlock(&inode->i_lock);
+ nfs_unroll_locks_and_wait(inode, head, subreq);
return ERR_PTR(-EIO);
}

--
2.13.3


2017-08-03 15:05:04

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 00/28] Reducing inode->i_lock contention in writebacks


> On Aug 3, 2017, at 9:44 AM, Trond Myklebust <[email protected]> wrote:
>
> [cover letter and diffstat snipped]

Hey Trond, do you have a topic branch somewhere I can pull this?


--
Chuck Lever




2017-08-03 15:39:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v2 00/28] Reducing inode->i_lock contention in writebacks

On Thu, 2017-08-03 at 11:04 -0400, Chuck Lever wrote:
> > [cover letter and diffstat snipped]
>
> Hey Trond, do you have a topic branch somewhere I can pull this?

Yep. I've pushed the "writeback" topic branch to git.linux.org. It has
1 extra debugging commit attached to the end, but I don't expect that
to bother you.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com