On Wed, Nov 15, 2017 at 07:58:09PM +0900, Tetsuo Handa wrote:
> I think that Minchan's approach depends on how
>
> In our production, we have observed that the job loader gets stuck for
> 10s of seconds while doing mount operation. It turns out that it was
> stuck in register_shrinker() and some unrelated job was under memory
> pressure and spending time in shrink_slab(). Our machines have a lot
> of shrinkers registered and jobs under memory pressure has to traverse
> all of those memcg-aware shrinkers and do affect unrelated jobs which
> want to register their own shrinkers.
>
> is interpreted. If there were 100000 shrinkers and each do_shrink_slab() call
> took 1 millisecond, aborting the iteration as soon as rwsem_is_contended() would
> help a lot. But if there were 10 shrinkers and each do_shrink_slab() call took
> 10 seconds, aborting the iteration as soon as rwsem_is_contended() would help
> less. Or, there might be some specific shrinker where its do_shrink_slab() call
> takes 100 seconds. In that case, checking rwsem_is_contended() is too lazy.
In your patch, unregister() waits for shrinker->nr_active instead of
the lock, which is decreased in the same location where Minchan drops
the lock. How is that different behavior for long-running shrinkers?
Anyway, I suspect it's many shrinkers and many concurrent invocations,
so the lockbreak granularity you both chose should be fine.
From 1584129572845568066@xxx Wed Nov 15 11:01:27 +0000 2017
X-GM-THRID: 1583988486620709897
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread