2017-12-18 15:17:40

by Mike Galbraith

[permalink] [raw]
Subject: NFS: 82ms wakeup latency 4.14-rc4

Greetings,

While doing some generic scheduler latency testing, I stumbled onto
$subject. ?To reproduce this, I simply nfs mount my box, cd to one of
it's spinning rust buckets, and do bonnie -s <ramsize>. ?With nothing
else going on in the box, I've hit > 100ms wakeup latencies.

(nouveau apparently also thinks this is uncool, and whimpers)?

....
kworker/-7421 0.N.. 82893us : nfs_release_request <-nfs_commit_release_pages
kworker/-7421 0.N.. 82893us : nfs_unlock_and_release_request <-nfs_commit_release_pages
kworker/-7421 0.N.. 82893us : nfs_unlock_request <-nfs_unlock_and_release_request
kworker/-7421 0.N.. 82893us : nfs_page_group_destroy <-nfs_commit_release_pages
kworker/-7421 0.N.. 82893us : nfs_page_group_sync_on_bit <-nfs_page_group_destroy
kworker/-7421 0.N.. 82893us : nfs_page_group_lock <-nfs_page_group_sync_on_bit
kworker/-7421 0.N.. 82893us : nfs_page_group_unlock <-nfs_page_group_sync_on_bit
kworker/-7421 0.N.. 82893us : nfs_free_request <-nfs_page_group_destroy
kworker/-7421 0.N.. 82893us : nfs_put_lock_context <-nfs_free_request
kworker/-7421 0.N.. 82893us : put_nfs_open_context <-nfs_free_request
kworker/-7421 0.N.. 82893us : __put_nfs_open_context <-nfs_free_request
kworker/-7421 0.N.. 82894us : kmem_cache_free <-nfs_page_group_destroy
kworker/-7421 0.N.. 82894us : __slab_free <-kmem_cache_free
kworker/-7421 0.N.. 82894us : clear_wb_congested <-nfs_commit_release_pages
kworker/-7421 0.N.. 82894us : nfs_init_cinfo <-nfs_commit_release_pages
kworker/-7421 0.N.. 82894us : nfs_init_cinfo_from_inode <-nfs_commit_release_pages
kworker/-7421 0.N.. 82894us : nfs_commit_end <-nfs_commit_release_pages
kworker/-7421 0.N.. 82894us : nfs_commitdata_release <-rpc_free_task
kworker/-7421 0.N.. 82894us : put_nfs_open_context <-nfs_commitdata_release
kworker/-7421 0.N.. 82894us : __put_nfs_open_context <-nfs_commitdata_release
kworker/-7421 0.N.. 82895us : mempool_free <-rpc_free_task
kworker/-7421 0.N.. 82895us : mempool_free_slab <-rpc_free_task
kworker/-7421 0.N.. 82895us : kmem_cache_free <-rpc_free_task
kworker/-7421 0.N.. 82895us : ___might_sleep <-process_one_work
kworker/-7421 0.N.. 82895us : _cond_resched <-process_one_work
kworker/-7421 0dN.1 82895us : rcu_note_context_switch <-__schedule


2017-12-18 15:32:04

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 16:17 +0100, Mike Galbraith wrote:
>
> ....
> kworker/-7421 0.N.. 82893us : nfs_release_request <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82893us : nfs_unlock_and_release_request <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82893us : nfs_unlock_request <-nfs_unlock_and_release_request
> kworker/-7421 0.N.. 82893us : nfs_page_group_destroy <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82893us : nfs_page_group_sync_on_bit <-nfs_page_group_destroy
> kworker/-7421 0.N.. 82893us : nfs_page_group_lock <-nfs_page_group_sync_on_bit
> kworker/-7421 0.N.. 82893us : nfs_page_group_unlock <-nfs_page_group_sync_on_bit
> kworker/-7421 0.N.. 82893us : nfs_free_request <-nfs_page_group_destroy
> kworker/-7421 0.N.. 82893us : nfs_put_lock_context <-nfs_free_request
> kworker/-7421 0.N.. 82893us : put_nfs_open_context <-nfs_free_request
> kworker/-7421 0.N.. 82893us : __put_nfs_open_context <-nfs_free_request
> kworker/-7421 0.N.. 82894us : kmem_cache_free <-nfs_page_group_destroy
> kworker/-7421 0.N.. 82894us : __slab_free <-kmem_cache_free
> kworker/-7421 0.N.. 82894us : clear_wb_congested <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82894us : nfs_init_cinfo <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82894us : nfs_init_cinfo_from_inode <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82894us : nfs_commit_end <-nfs_commit_release_pages
> kworker/-7421 0.N.. 82894us : nfs_commitdata_release <-rpc_free_task
> kworker/-7421 0.N.. 82894us : put_nfs_open_context <-nfs_commitdata_release
> kworker/-7421 0.N.. 82894us : __put_nfs_open_context <-nfs_commitdata_release
> kworker/-7421 0.N.. 82895us : mempool_free <-rpc_free_task
> kworker/-7421 0.N.. 82895us : mempool_free_slab <-rpc_free_task
> kworker/-7421 0.N.. 82895us : kmem_cache_free <-rpc_free_task
> kworker/-7421 0.N.. 82895us : ___might_sleep <-process_one_work
> kworker/-7421 0.N.. 82895us : _cond_resched <-process_one_work
> kworker/-7421 0dN.1 82895us : rcu_note_context_switch <-__schedule

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d0543e19098a..b42bf3b21e05 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -428,6 +428,7 @@ void nfs_free_request(struct nfs_page *req)
/* Release struct file and open context */
nfs_clear_request(req);
nfs_page_free(req);
+ cond_resched();
}

void nfs_release_request(struct nfs_page *req)

2017-12-18 16:36:03

by J. Bruce Fields

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, Dec 18, 2017 at 04:31:52PM +0100, Mike Galbraith wrote:
> On Mon, 2017-12-18 at 16:17 +0100, Mike Galbraith wrote:
> >
> > ....
> > kworker/-7421 0.N.. 82893us : nfs_release_request <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82893us : nfs_unlock_and_release_request <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82893us : nfs_unlock_request <-nfs_unlock_and_release_request
> > kworker/-7421 0.N.. 82893us : nfs_page_group_destroy <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82893us : nfs_page_group_sync_on_bit <-nfs_page_group_destroy
> > kworker/-7421 0.N.. 82893us : nfs_page_group_lock <-nfs_page_group_sync_on_bit
> > kworker/-7421 0.N.. 82893us : nfs_page_group_unlock <-nfs_page_group_sync_on_bit
> > kworker/-7421 0.N.. 82893us : nfs_free_request <-nfs_page_group_destroy
> > kworker/-7421 0.N.. 82893us : nfs_put_lock_context <-nfs_free_request
> > kworker/-7421 0.N.. 82893us : put_nfs_open_context <-nfs_free_request
> > kworker/-7421 0.N.. 82893us : __put_nfs_open_context <-nfs_free_request
> > kworker/-7421 0.N.. 82894us : kmem_cache_free <-nfs_page_group_destroy
> > kworker/-7421 0.N.. 82894us : __slab_free <-kmem_cache_free
> > kworker/-7421 0.N.. 82894us : clear_wb_congested <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82894us : nfs_init_cinfo <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82894us : nfs_init_cinfo_from_inode <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82894us : nfs_commit_end <-nfs_commit_release_pages
> > kworker/-7421 0.N.. 82894us : nfs_commitdata_release <-rpc_free_task
> > kworker/-7421 0.N.. 82894us : put_nfs_open_context <-nfs_commitdata_release
> > kworker/-7421 0.N.. 82894us : __put_nfs_open_context <-nfs_commitdata_release
> > kworker/-7421 0.N.. 82895us : mempool_free <-rpc_free_task
> > kworker/-7421 0.N.. 82895us : mempool_free_slab <-rpc_free_task
> > kworker/-7421 0.N.. 82895us : kmem_cache_free <-rpc_free_task
> > kworker/-7421 0.N.. 82895us : ___might_sleep <-process_one_work
> > kworker/-7421 0.N.. 82895us : _cond_resched <-process_one_work
> > kworker/-7421 0dN.1 82895us : rcu_note_context_switch <-__schedule
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index d0543e19098a..b42bf3b21e05 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -428,6 +428,7 @@ void nfs_free_request(struct nfs_page *req)
> /* Release struct file and open context */
> nfs_clear_request(req);
> nfs_page_free(req);
> + cond_resched();
> }
>
> void nfs_release_request(struct nfs_page *req)

This probably just shows I don't understand the issues, but: isn't this
the job of preemption? If we're not holding any locks that would
prevent scheduling here, shouldn't latency-sensitive users be building
preemptible kernels and letting the scheduler take care of this? It
seems unfortunate to require explicit cond_resched()s allovers.

Like I say, I don't really understand the issues here, so it's more a
question than an objection.... (I don't know any reason a
cond_resched() would be bad there.)

--b.

2017-12-18 16:48:18

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
>
> This probably just shows I don't understand the issues, but: isn't this
> the job of preemption?

If this were a PREEMPT kernel, yes, we'd check need_resched() on lock
release etc, but it's a PREEMPT_VOLUNTARY kernel.

-Mike

2017-12-18 17:00:20

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
>
> Like I say, I don't really understand the issues here, so it's more a
> question than an objection.... (I don't know any reason a
> cond_resched() would be bad there.)

Think of it this way: what all can be queued up behind that kworker
that is hogging CPU for huge swaths of time? ?It's not only userspace
that suffers.

-Mike

2017-12-18 17:18:09

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 18:00 +0100, Mike Galbraith wrote:
> On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
> >
> > Like I say, I don't really understand the issues here, so it's more a
> > question than an objection.... (I don't know any reason a
> > cond_resched() would be bad there.)
>
> Think of it this way: what all can be queued up behind that kworker
> that is hogging CPU for huge swaths of time? ?It's not only userspace
> that suffers.

Bah, I'm gonna sound like a damn Baptist preacher, but I gotta say,
latency matters just as much to an enterprise NOPREEMPT kernel and its
users as it does to a desktop kernel and its users. ?For max
throughput, you don't want to do work in _tiny_ quantum, because you
then lose throughput due to massive cache trashing and scheduling
overhead, but latency still does matter, and not just a little.

-Mike

2017-12-18 17:24:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 18:00 +0100, Mike Galbraith wrote:
> On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
> >
> > Like I say, I don't really understand the issues here, so it's more
> > a
> > question than an objection.... (I don't know any reason a
> > cond_resched() would be bad there.)
>
> Think of it this way: what all can be queued up behind that kworker
> that is hogging CPU for huge swaths of time? It's not only userspace
> that suffers.
>

Any cond_sched() belongs in the loop in nfs_commit_release_pages()
(where it can be mitigated) rather than in a function whose purpose is
to free memory. There is no reason to call it from the writeback or
readpages code.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2017-12-18 17:27:18

by J. Bruce Fields

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, Dec 18, 2017 at 06:17:36PM +0100, Mike Galbraith wrote:
> On Mon, 2017-12-18 at 18:00 +0100, Mike Galbraith wrote:
> > On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
> > >
> > > Like I say, I don't really understand the issues here, so it's more a
> > > question than an objection.... (I don't know any reason a
> > > cond_resched() would be bad there.)
> >
> > Think of it this way: what all can be queued up behind that kworker
> > that is hogging CPU for huge swaths of time?  It's not only userspace
> > that suffers.
>
> Bah, I'm gonna sound like a damn Baptist preacher, but I gotta say,
> latency matters just as much to an enterprise NOPREEMPT kernel and its
> users as it does to a desktop kernel and its users.  For max
> throughput, you don't want to do work in _tiny_ quantum, because you
> then lose throughput due to massive cache trashing and scheduling
> overhead, but latency still does matter, and not just a little.

Right, what I don't understand is why kernels are still built without
preemption. I'd naively assumed that was just a bandaid while we
weren't sure how much old kernel code might still depend on it for
correctness. I'd forgotten about throughput/latency tradeoffs--but
couldn't those in theory be managed by runtime configuration of the
sceduler, or at least some smaller hammer than turning off preemption
entirely?

(But, again, just idle curiosity on my part, thanks for the answers.)

--b.

2017-12-18 17:27:28

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 17:24 +0000, Trond Myklebust wrote:
> On Mon, 2017-12-18 at 18:00 +0100, Mike Galbraith wrote:
> > On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
> > >
> > > Like I say, I don't really understand the issues here, so it's more
> > > a
> > > question than an objection.... (I don't know any reason a
> > > cond_resched() would be bad there.)
> >
> > Think of it this way: what all can be queued up behind that kworker
> > that is hogging CPU for huge swaths of time? It's not only userspace
> > that suffers.
> >
>
> Any cond_sched() belongs in the loop in nfs_commit_release_pages()
> (where it can be mitigated) rather than in a function whose purpose is
> to free memory. There is no reason to call it from the writeback or
> readpages code.

(this is why bandaid didn't come equipped with changelog etc:)

2017-12-18 17:47:53

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 12:27 -0500, J. Bruce Fields wrote:
> On Mon, Dec 18, 2017 at 06:17:36PM +0100, Mike Galbraith wrote:
> > On Mon, 2017-12-18 at 18:00 +0100, Mike Galbraith wrote:
> > > On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
> > > >
> > > > Like I say, I don't really understand the issues here, so it's more a
> > > > question than an objection.... (I don't know any reason a
> > > > cond_resched() would be bad there.)
> > >
> > > Think of it this way: what all can be queued up behind that kworker
> > > that is hogging CPU for huge swaths of time? ?It's not only userspace
> > > that suffers.
> >
> > Bah, I'm gonna sound like a damn Baptist preacher, but I gotta say,
> > latency matters just as much to an enterprise NOPREEMPT kernel and its
> > users as it does to a desktop kernel and its users. ?For max
> > throughput, you don't want to do work in _tiny_ quantum, because you
> > then lose throughput due to massive cache trashing and scheduling
> > overhead, but latency still does matter, and not just a little.
>
> Right, what I don't understand is why kernels are still built without
> preemption.

Oh, that's easy.. because they absolutely SUCK at throughput, and
contrary to popular belief, they do not provide anything remotely
similar to deterministic behavior. ?They're better at latency, but not
really all that much in general use, while being horrid at getting work
done efficiently.

I tried PREEMPT, but much prefer VOLUNTARY performance/efficiency.

-Mike

2017-12-18 18:35:21

by Mike Galbraith

[permalink] [raw]
Subject: Re: NFS: 82ms wakeup latency 4.14-rc4

On Mon, 2017-12-18 at 12:27 -0500, J. Bruce Fields wrote:
> I'd forgotten about throughput/latency tradeoffs--but
> couldn't those in theory be managed by runtime configuration of the
> sceduler, or at least some smaller hammer than turning off preemption
> entirely?

A kernel that has all of the goop required to support preemption can't
possibly perform as well as a kernel that can simply assume everything
is safe. ?All that infrastructure costs cycles.

-Mike

2018-01-02 20:30:15

by Mike Galbraith

[permalink] [raw]
Subject: [patch] fs/nfs: Add a resched point to nfs_commit_release_pages()

On Mon, 2017-12-18 at 17:24 +0000, Trond Myklebust wrote:
> On Mon, 2017-12-18 at 18:00 +0100, Mike Galbraith wrote:
> > On Mon, 2017-12-18 at 11:35 -0500, J. Bruce Fields wrote:
> > >
> > > Like I say, I don't really understand the issues here, so it's more
> > > a
> > > question than an objection.... (I don't know any reason a
> > > cond_resched() would be bad there.)
> >
> > Think of it this way: what all can be queued up behind that kworker
> > that is hogging CPU for huge swaths of time? It's not only userspace
> > that suffers.
> >
>
> Any cond_sched() belongs in the loop in nfs_commit_release_pages()
> (where it can be mitigated) rather than in a function whose purpose is
> to free memory. There is no reason to call it from the writeback or
> readpages code.

Just in case NFS folks are expecting a patchlet, trash otherwise.

fs/nfs: Add a resched point to nfs_commit_release_pages()

During heavy NFS write, kworkers can do very large amounts of work
without scheduling (82ms traced). Add a resched point.

Signed-off-by: Mike Galbraith <[email protected]>
Suggested-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1837,6 +1837,7 @@ static void nfs_commit_release_pages(str
set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
next:
nfs_unlock_and_release_request(req);
+ cond_resched();
}
nfss = NFS_SERVER(data->inode);
if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)