Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932654AbcJNCba (ORCPT ); Thu, 13 Oct 2016 22:31:30 -0400 Received: from mail-it0-f67.google.com ([209.85.214.67]:34074 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932273AbcJNCbY (ORCPT ); Thu, 13 Oct 2016 22:31:24 -0400 MIME-Version: 1.0 In-Reply-To: <20161013214650.GB19836@outlook.office365.com> References: <1476141965-21429-1-git-send-email-avagin@openvz.org> <877f9c6ui8.fsf@x220.int.ebiederm.org> <87pon458l1.fsf_-_@x220.int.ebiederm.org> <20161013214650.GB19836@outlook.office365.com> From: Andrey Vagin Date: Thu, 13 Oct 2016 19:31:22 -0700 X-Google-Sender-Auth: -hDutuc9_lbqGEjaAAztrAIApfI Message-ID: Subject: Re: [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once To: Andrei Vagin Cc: "Eric W. Biederman" , Alexander Viro , Linux Containers , linux-fsdevel , LKML Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3528 Lines: 106 On Thu, Oct 13, 2016 at 2:46 PM, Andrei Vagin wrote: > On Thu, Oct 13, 2016 at 02:53:46PM -0500, Eric W. Biederman wrote: >> >> Adrei Vagin pointed out that time to executue propagate_umount can go >> non-linear (and take a ludicrious amount of time) when the mount >> propogation trees of the mounts to be unmunted by a lazy unmount >> overlap. >> >> Solve this in the most straight forward way possible, by adding a new >> mount flag to mark parts of the mount propagation tree that have been >> visited, and use that mark to skip parts of the mount propagation tree >> that have already been visited during an unmount. This guarantees >> that each mountpoint in the possibly overlapping mount propagation >> trees will be visited exactly once. >> >> Add the functions propagation_visit_next and propagation_revisit_next >> to coordinate setting and clearling the visited mount mark. >> >> Here is a script to generate such mount tree: >> $ cat run.sh >> mount -t tmpfs test-mount /mnt >> mount --make-shared /mnt >> for i in `seq $1`; do >> mkdir /mnt/test.$i >> mount --bind /mnt /mnt/test.$i >> done >> cat /proc/mounts | grep test-mount | wc -l >> time umount -l /mnt >> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done >> >> Here are the performance numbers with and without the patch: >> >> mounts | before | after (real sec) >> ----------------------------- >> 1024 | 0.071 | 0.024 >> 2048 | 0.184 | 0.030 >> 4096 | 0.604 | 0.040 >> 8912 | 4.471 | 0.043 >> 16384 | 34.826 | 0.082 >> 32768 | | 0.151 >> 65536 | | 0.289 >> 131072 | | 0.659 >> >> Andrei Vagin fixing this performance problem is part of the >> work to fix CVE-2016-6213. >> >> Cc: stable@vger.kernel.org >> Reported-by: Andrei Vagin >> Signed-off-by: "Eric W. Biederman" >> --- >> >> Andrei can you take a look at this patch and see if you can see any >> problems. My limited testing suggests this approach does a much better >> job of solving the problem you were seeing. With the time looking >> almost linear in the number of mounts now. > > I read this patch and I like the idea. > > Then I run my tests and one of them doesn't work with this patch. > I haven't found a reason yet. >> + for (m = propagation_visit_next(parent, parent); m; >> + m = propagation_visit_next(m, parent)) { >> struct mount *child = __lookup_mnt_last(&m->mnt, >> mnt->mnt_mountpoint); The reason is that this loop is called for different "mnt", but it is executed only once with this optimization. So I think the idea to mark parent will not work, because one parent can have a few children which have to be umounted. > > Here is the test: > > [root@fc24 mounts]# cat run.sh > set -e > mount -t tmpfs zdtm /mnt > mkdir -p /mnt/1 /mnt/2 > mount -t tmpfs zdtm /mnt/1 > mount --make-shared /mnt/1 > for i in `seq $1`; do > mount --bind /mnt/1 `mktemp -d /mnt/1/test.XXXXXX` > done > mount --rbind /mnt/1 /mnt/2 > cat /proc/self/mountinfo | grep zdtm | wc -l > time umount -l /mnt/1 > cat /proc/self/mountinfo | grep zdtm | wc -l > umount /mnt/2 > > > [root@fc24 mounts]# unshare -Urm ./run.sh 5 > 65 > > real 0m0.014s > user 0m0.000s > sys 0m0.004s > 33 > umount: /mnt/2: target is busy > (In some cases useful info about processes that > use the device is found by lsof(8) or fuser(1).) > >> Thanks, Andrei