From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrei Vagin <avagin@virtuozzo.com>
Cc: Andrei Vagin <avagin@openvz.org>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        <containers@lists.linux-foundation.org>,
        <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>
References: <1475772564-25627-1-git-send-email-avagin@openvz.org>
        <87eg3tclbd.fsf@x220.int.ebiederm.org>
        <20161006230616.GA2296@outlook.office365.com>
Date: Thu, 06 Oct 2016 23:45:48 -0500
In-Reply-To: <20161006230616.GA2296@outlook.office365.com> (Andrei Vagin's
        message of "Thu, 6 Oct 2016 16:06:28 -0700")
Message-ID: <87twcoahs3.fsf@x220.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [PATCH v2] mount: dont execute propagate_umount() many times for same mounts
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1869
Lines: 44

Andrei Vagin <avagin@virtuozzo.com> writes:

> On Thu, Oct 06, 2016 at 02:46:30PM -0500, Eric W. Biederman wrote:
>> Andrei Vagin <avagin@openvz.org> writes:
>> 
>> > The reason of this optimization is that umount() can hold namespace_sem
>> > for a long time, this semaphore is global, so it affects all users.
>> > Recently Eric W. Biederman added a per mount namespace limit on the
>> > number of mounts. The default number of mounts allowed per mount
>> > namespace at 100,000. Currently this value is allowed to construct a tree
>> > which requires hours to be umounted.
>> 
>> I am going to take a hard look at this as this problem sounds very
>> unfortunate.  My memory of going through this code before strongly
>> suggests that changing the last list_for_each_entry to
>> list_for_each_entry_reverse is going to impact the correctness of this
>> change.
>
> I have read this code again and you are right, list_for_each_entry can't
> be changed on list_for_each_entry_reverse here.
>
> I tested these changes more carefully and find one more issue, so I am
> going to send a new patch and would like to get your comments to it.
>
> Thank you for your time.

No problem.

A quick question.  You have introduced lookup_mnt_cont.  Is that a core
part of your fix, or do you truly have problematic long hash chains?

Simply increasing the hash table size should fix problems with long hash
chains (and there are other solutions like rhashtable that may be more
appropriate than pre-allocating a large hash table).
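
As a rough illustration of the table-size point, here is a toy sketch in
userspace C.  It is not the kernel's mount hash: the sequential integer
keys, the multiplicative hash constant and the longest_chain() helper are
all assumptions made up for the example.  It only counts how many keys
land in the fullest bucket for a given number of hash bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Toy model (assumption, not kernel code): hash the keys 0..nkeys-1
 * into a table of 2^bits buckets with a multiplicative hash and
 * return the length of the longest chain.  bits must be in 1..31.
 * Returns 0 if the bucket counters cannot be allocated.
 */
static size_t longest_chain(size_t nkeys, unsigned int bits)
{
	size_t nbuckets = (size_t)1 << bits;
	size_t *count = calloc(nbuckets, sizeof(*count));
	size_t longest = 0;
	size_t i;

	if (!count)
		return 0;

	for (i = 0; i < nkeys; i++) {
		/* Multiply modulo 2^32 and keep the top 'bits' bits. */
		uint32_t h = (uint32_t)i * 2654435769u;
		size_t bucket = h >> (32 - bits);

		if (++count[bucket] > longest)
			longest = count[bucket];
	}

	free(count);
	return longest;
}
```

With 100,000 keys and 10 hash bits (1024 buckets) some chain must hold
at least 98 entries, simply because 100000 / 1024 rounds up to 98.  With
17 bits (131072 buckets) there are more buckets than keys, so the chains
can stay short as long as the hash spreads the keys reasonably.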

If it is not long hash chains, introducing lookup_mnt_cont in your patch
is a distraction from the core of what is going on.

Perhaps I am blind, but if the hash chains are not long I don't see how
mount propagation could be more than quadratic in the worst case, as
there is only a loop within a loop.  Or is the tree walking in
propagation_next that bad?

Eric