Date: Thu, 20 Oct 2016 14:30:53 -0700
From: Andrei Vagin <avagin@virtuozzo.com>
To: "Eric W. Biederman"
Cc: Andrey Vagin, Alexander Viro, Linux Containers, linux-fsdevel, LKML
Subject: Re: [REVIEW][PATCH] mount: In propagate_umount handle overlapping mount propagation trees
Message-ID: <20161020213052.GA25226@outlook.office365.com>
References: <1476141965-21429-1-git-send-email-avagin@openvz.org>
 <877f9c6ui8.fsf@x220.int.ebiederm.org>
 <87pon458l1.fsf_-_@x220.int.ebiederm.org>
 <20161013214650.GB19836@outlook.office365.com>
 <87wphb4pjn.fsf@x220.int.ebiederm.org>
 <8737jy3htt.fsf_-_@x220.int.ebiederm.org>
 <20161018024000.GA4901@outlook.office365.com>
 <87r37e9mnj.fsf@xmission.com>
 <877f95ngpr.fsf_-_@xmission.com>
In-Reply-To: <877f95ngpr.fsf_-_@xmission.com>

On Tue, Oct 18, 2016 at 10:46:40PM -0500, Eric W. Biederman wrote:
> 
> Andrei Vagin pointed out that the time to execute propagate_umount can
> go non-linear (and take a ludicrous amount of time) when the mount
> propagation trees of the mounts to be unmounted by a lazy unmount
> overlap.
> 
> While investigating the horrible performance I realized that in the
> case of overlapping mount trees, since the addition of locked mount
> support, the code has been failing to unmount all of the mounts it
> should have been unmounting.
> 
> Make the walk of the mount propagation trees nearly linear by using
> MNT_MARK to mark pieces of the mount propagation trees that have
> already been visited, allowing subsequent walks to skip over
> subtrees.
> 
> Make the processing of mounts order independent by adding a list of
> mount entries that need to be unmounted, and simply adding a mount to
> that list when it becomes apparent the mount can safely be unmounted.
> For mounts that are locked on other mounts but otherwise could be
> unmounted, move them from their parent's mnt_mounts to mnt_umounts so
> that if and when their parent becomes unmounted these mounts can be
> added to the list of mounts to unmount.
> 
> Add a final pass to clear MNT_MARK and to restore mnt_mounts from
> mnt_umounts for anything that did not get unmounted.
> 
> Add the functions propagation_visit_child and propagation_revisit_child
> to coordinate walking of the mount tree and setting and clearing the
> mount mark.
> 
> The skipping of already unmounted mounts has been moved from
> __lookup_mnt_last to mark_umount_candidates, so that the new
> propagation functions can notice when the propagation tree passes
> through the initial set of unmounted mounts. Except in umount_tree,
> as part of the unmounting process, the only place unmounted mounts
> should be found is in unmounted subtrees. All of the other callers of
> __lookup_mnt_last are from mounted subtrees, so not checking for
> unmounted mounts should not affect them.
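
To keep the three passes straight while reading the patch below, here is a
condensed paraphrase of the new propagate_umount() with the passes called
out in comments (the details, e.g. the locked-mount handling, live in the
helpers; see the patch itself):

void propagate_umount(struct list_head *list)
{
        struct mount *mnt, *child;
        LIST_HEAD(to_umount);   /* mounts that are now safe to unmount */
        LIST_HEAD(tmp_list);    /* mounts actually being unmounted     */

        /* Pass 1: visit each propagation tree once, marking the children
         * seen with MNT_MARK and collecting the unmountable ones (locked
         * children are parked on their parent's mnt_umounts). */
        list_for_each_entry(mnt, list, mnt_list)
                for (child = propagation_visit_child(mnt, mnt); child;
                     child = propagation_visit_child(child, mnt))
                        start_umount_propagation(child, &to_umount);

        /* Pass 2: unmount the candidates; when a parent goes away, the
         * children parked on its mnt_umounts become candidates as well. */
        while (!list_empty(&to_umount)) {
                mnt = list_first_entry(&to_umount, struct mount, mnt_child);
                list_del_init(&mnt->mnt_child);
                mnt->mnt.mnt_flags |= MNT_UMOUNT;
                list_move_tail(&mnt->mnt_list, &tmp_list);
                if (!list_empty(&mnt->mnt_umounts))
                        list_splice_tail_init(&mnt->mnt_umounts, &to_umount);
        }

        /* Pass 3: revisit the trees, clearing MNT_MARK and restoring
         * mnt_mounts from mnt_umounts for anything left mounted. */
        list_for_each_entry(mnt, list, mnt_list)
                for (child = propagation_revisit_child(mnt, mnt); child;
                     child = propagation_revisit_child(child, mnt))
                        end_umount_propagation(child);

        list_splice_tail(&tmp_list, list);
}
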
> 
> A script to generate overlapping mount propagation trees:
> 
> $ cat run.sh
> mount -t tmpfs test-mount /mnt
> mount --make-shared /mnt
> for i in `seq $1`; do
>         mkdir /mnt/test.$i
>         mount --bind /mnt /mnt/test.$i
> done
> cat /proc/mounts | grep test-mount | wc -l
> time umount -l /mnt
> 
> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done
> 
> Here are the performance numbers with and without the patch:
> 
>  mhash  |   8192  |   8192 |    8192     | 131072 |   131072    | 104857 |   104857
>  mounts |  before |  after | after (sys) |  after | after (sys) |  after | after (sys)
> -------------------------------------------------------------------------------------
>    1024 |  0.071s | 0.020s |   0.000s    | 0.022s |   0.004s    | 0.020s |   0.004s
>    2048 |  0.184s | 0.022s |   0.004s    | 0.023s |   0.004s    | 0.022s |   0.008s
>    4096 |  0.604s | 0.025s |   0.020s    | 0.029s |   0.008s    | 0.026s |   0.004s
>    8912 |  4.471s | 0.053s |   0.020s    | 0.051s |   0.024s    | 0.047s |   0.016s
>   16384 | 34.826s | 0.088s |   0.060s    | 0.081s |   0.048s    | 0.082s |   0.052s
>   32768 |         | 0.216s |   0.172s    | 0.160s |   0.124s    | 0.160s |   0.096s
>   65536 |         | 0.819s |   0.726s    | 0.330s |   0.260s    | 0.338s |   0.256s
>  131072 |         | 4.502s |   4.168s    | 0.707s |   0.580s    | 0.709s |   0.592s
> 
> Andrei Vagin reports fixing the performance problem is part of the
> work to fix CVE-2016-6213.
> 
> A script for a pathological set of mounts:
> 
> $ cat pathological.sh
> 
> mount -t tmpfs base /mnt
> mount --make-shared /mnt
> mkdir -p /mnt/b
> 
> mount -t tmpfs test1 /mnt/b
> mount --make-shared /mnt/b
> mkdir -p /mnt/b/10
> 
> mount -t tmpfs test2 /mnt/b/10
> mount --make-shared /mnt/b/10
> mkdir -p /mnt/b/10/20
> 
> mount --rbind /mnt/b /mnt/b/10/20
> 
> unshare -Urm sleep 2
> umount -l /mnt/b
> wait %%
> 
> $ unshare -Urm pathological.sh
> 
> Cc: stable@vger.kernel.org
> Fixes: a05964f3917c ("[PATCH] shared mounts handling: umount")
> Fixes: 0c56fe31420c ("mnt: Don't propagate unmounts to locked mounts")
> Reported-by: Andrei Vagin
> Signed-off-by: "Eric W. Biederman"
> ---
> 
> Barring some stupid mistake this looks like it fixes both the
> performance and the correctness issues I was able to spot earlier.
> Andrei, if you could give this version a look over I would appreciate
> it.

Eric, could you try out this script:

[root@fc24 mounts]# cat run.sh
set -e -m

mount -t tmpfs zdtm /mnt
mkdir -p /mnt/1 /mnt/2
mount -t tmpfs zdtm /mnt/1
mount --make-shared /mnt/1
mkdir /mnt/1/1

iteration=30
if [ -n "$1" ]; then
        iteration=$1
fi

for i in `seq $iteration`; do
        mount --bind /mnt/1/1 /mnt/1/1 &
done

ret=0
for i in `seq $iteration`; do
        wait -n || ret=1
done

[ "$ret" -ne 0 ] && {
        time umount -l /mnt/1
        exit 0
}

mount --rbind /mnt/1 /mnt/2
mount --make-slave /mnt/2
mount -t tmpfs zzz /mnt/2/1

nr=`cat /proc/self/mountinfo | grep zdtm | wc -l`
echo -n "umount -l /mnt/1 -> $nr "
/usr/bin/time -f '%E' umount -l /mnt/1

nr=`cat /proc/self/mountinfo | grep zdtm | wc -l`
echo -n "umount -l /mnt/2 -> $nr "
/usr/bin/time -f '%E' umount -l /mnt/2

[root@fc24 mounts]# unshare -Urm sh run.sh 4

It hangs up on my host with this patch.

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:789]
Modules linked in: nfsv3 nfs fscache bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon i2c_piix4 parport_pc parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc virtio_net virtio_blk virtio_console crc32c_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi
CPU: 0 PID: 789 Comm: umount Not tainted 4.9.0-rc1+ #137
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
task: ffff88007c11c100 task.stack: ffffc900007c0000
RIP: 0010:[] [] __lookup_mnt_last+0x67/0x80
RSP: 0018:ffffc900007c3db0 EFLAGS: 00000286
RAX: ffff88007a5f0900 RBX: ffff88007b136620 RCX: ffff88007a3e2900
RDX: ffff88007a3e2900 RSI: ffff88007b136600 RDI: ffff88007b136600
RBP: ffffc900007c3dc0 R08: ffff880036df5850 R09: ffffffff81249664
R10: ffff88007bd84c38 R11: 0000000100000000 R12: ffff88007bce3f00
R13: ffffc900007c3e00 R14: ffff88007bce3f00 R15: 00007ffe54245328
FS:  00007ff465de0840(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f86b328128 CR3: 000000007ba23000 CR4: 00000000003406f0
Stack:
 ffff88007b136600 ffff88007b136480 ffffc900007c3df0 ffffffff8126a70a
 ffffc900007c3e58 ffffc900007c3e10 ffffc900007c3e00 ffff880079f99980
 ffffc900007c3e48 ffffffff8126b0d5 ffff88007a3e2660 ffff88007a3e24e0
Call Trace:
 [] propagation_visit_child.isra.8+0x5a/0xd0
 [] propagate_umount+0x65/0x2e0
 [] umount_tree+0x2be/0x2d0
 [] do_umount+0x13f/0x340
 [] SyS_umount+0x10e/0x120
 [] entry_SYSCALL_64_fastpath+0x1a/0xa9

Thanks,
Andrei

> 
> Unless we can find a problem I am going to call this the final version.
> 
>  fs/mount.h     |   1 +
>  fs/namespace.c |   7 +-
>  fs/pnode.c     | 198 ++++++++++++++++++++++++++++++++++++++++++++++-----------
>  fs/pnode.h     |   2 +-
>  4 files changed, 165 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index d2e25d7b64b3..00fe0d1d6ba7 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -58,6 +58,7 @@ struct mount {
>          struct mnt_namespace *mnt_ns;   /* containing namespace */
>          struct mountpoint *mnt_mp;      /* where is it mounted */
>          struct hlist_node mnt_mp_list;  /* list mounts with the same mountpoint */
> +        struct list_head mnt_umounts;   /* list of children that are being unmounted */
>  #ifdef CONFIG_FSNOTIFY
>          struct hlist_head mnt_fsnotify_marks;
>          __u32 mnt_fsnotify_mask;
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e6c234b1a645..73801391bb00 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -237,6 +237,7 @@ static struct mount *alloc_vfsmnt(const char *name)
>                  INIT_LIST_HEAD(&mnt->mnt_slave_list);
>                  INIT_LIST_HEAD(&mnt->mnt_slave);
>                  INIT_HLIST_NODE(&mnt->mnt_mp_list);
> +                INIT_LIST_HEAD(&mnt->mnt_umounts);
>  #ifdef CONFIG_FSNOTIFY
>                  INIT_HLIST_HEAD(&mnt->mnt_fsnotify_marks);
>  #endif
> @@ -650,13 +651,11 @@ struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
>          p = __lookup_mnt(mnt, dentry);
>          if (!p)
>                  goto out;
> -        if (!(p->mnt.mnt_flags & MNT_UMOUNT))
> -                res = p;
> +        res = p;
>          hlist_for_each_entry_continue(p, mnt_hash) {
>                  if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
>                          break;
> -                if (!(p->mnt.mnt_flags & MNT_UMOUNT))
> -                        res = p;
> +                res = p;
>          }
>  out:
>          return res;
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..15e30e861a14 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -390,56 +390,153 @@ void propagate_mount_unlock(struct mount *mnt)
>  }
>  
>  /*
> - * Mark all mounts that the MNT_LOCKED logic will allow to be unmounted.
> + * get the next mount in the propagation tree (that has not been visited)
> + * @m: the mount seen last
> + * @origin: the original mount from where the tree walk initiated
> + *
> + * Note that peer groups form contiguous segments of slave lists.
> + * We rely on that in get_source() to be able to find out if
> + * vfsmount found while iterating with propagation_next() is
> + * a peer of one we'd found earlier.
>   */
> -static void mark_umount_candidates(struct mount *mnt)
> +static struct mount *propagation_visit_child(struct mount *last_child,
> +                                             struct mount *origin_child)
>  {
> -        struct mount *parent = mnt->mnt_parent;
> -        struct mount *m;
> +        struct mount *m = last_child->mnt_parent;
> +        struct mount *origin = origin_child->mnt_parent;
> +        struct dentry *mountpoint = origin_child->mnt_mountpoint;
> +        struct mount *child;
> 
> -        BUG_ON(parent == mnt);
> +        /* Has this part of the propagation tree already been visited? */
> +        if (IS_MNT_MARKED(last_child))
> +                return NULL;
> 
> -        for (m = propagation_next(parent, parent); m;
> -                        m = propagation_next(m, parent)) {
> -                struct mount *child = __lookup_mnt_last(&m->mnt,
> -                                                mnt->mnt_mountpoint);
> -                if (child && (!IS_MNT_LOCKED(child) || IS_MNT_MARKED(m))) {
> -                        SET_MNT_MARK(child);
> +        SET_MNT_MARK(last_child);
> +
> +        /* are there any slaves of this mount? */
> +        if (!list_empty(&m->mnt_slave_list)) {
> +                m = first_slave(m);
> +                goto check_slave;
> +        }
> +        while (1) {
> +                struct mount *master = m->mnt_master;
> +
> +                if (master == origin->mnt_master) {
> +                        struct mount *next = next_peer(m);
> +                        while (1) {
> +                                if (next == origin)
> +                                        return NULL;
> +                                child = __lookup_mnt_last(&next->mnt, mountpoint);
> +                                if (child && !IS_MNT_MARKED(child))
> +                                        return child;
> +                                next = next_peer(next);
> +                        }
> +                } else {
> +                        while (1) {
> +                                if (m->mnt_slave.next == &master->mnt_slave_list)
> +                                        break;
> +                                m = next_slave(m);
> +check_slave:
> +                                child = __lookup_mnt_last(&m->mnt, mountpoint);
> +                                if (child && !IS_MNT_MARKED(child))
> +                                        return child;
> +                        }
>                  }
> +
> +                /* back at master */
> +                m = master;
>          }
>  }
>  
>  /*
> - * NOTE: unmounting 'mnt' naturally propagates to all other mounts its
> - * parent propagates to.
> + * get the next mount in the propagation tree (that has not been revisited)
> + * @m: the mount seen last
> + * @origin: the original mount from where the tree walk initiated
> + *
> + * Note that peer groups form contiguous segments of slave lists.
> + * We rely on that in get_source() to be able to find out if
> + * vfsmount found while iterating with propagation_next() is
> + * a peer of one we'd found earlier.
>   */
> -static void __propagate_umount(struct mount *mnt)
> +static struct mount *propagation_revisit_child(struct mount *last_child,
> +                                               struct mount *origin_child)
>  {
> -        struct mount *parent = mnt->mnt_parent;
> -        struct mount *m;
> +        struct mount *m = last_child->mnt_parent;
> +        struct mount *origin = origin_child->mnt_parent;
> +        struct dentry *mountpoint = origin_child->mnt_mountpoint;
> +        struct mount *child;
> 
> -        BUG_ON(parent == mnt);
> +        /* Has this part of the propagation tree already been revisited? */
> +        if (!IS_MNT_MARKED(last_child))
> +                return NULL;
> 
> -        for (m = propagation_next(parent, parent); m;
> -                        m = propagation_next(m, parent)) {
> 
> +        CLEAR_MNT_MARK(last_child);
> 
> -                struct mount *child = __lookup_mnt_last(&m->mnt,
> -                                                mnt->mnt_mountpoint);
> -                /*
> -                 * umount the child only if the child has no children
> -                 * and the child is marked safe to unmount.
> -                 */
> -                if (!child || !IS_MNT_MARKED(child))
> -                        continue;
> -                CLEAR_MNT_MARK(child);
> -                if (list_empty(&child->mnt_mounts)) {
> -                        list_del_init(&child->mnt_child);
> -                        child->mnt.mnt_flags |= MNT_UMOUNT;
> -                        list_move_tail(&child->mnt_list, &mnt->mnt_list);
> +        /* are there any slaves of this mount? */
> +        if (!list_empty(&m->mnt_slave_list)) {
> +                m = first_slave(m);
> +                goto check_slave;
> +        }
> +        while (1) {
> +                struct mount *master = m->mnt_master;
> +
> +                if (master == origin->mnt_master) {
> +                        struct mount *next = next_peer(m);
> +                        while (1) {
> +                                if (next == origin)
> +                                        return NULL;
> +                                child = __lookup_mnt_last(&next->mnt, mountpoint);
> +                                if (child && IS_MNT_MARKED(child))
> +                                        return child;
> +                                next = next_peer(next);
> +                        }
> +                } else {
> +                        while (1) {
> +                                if (m->mnt_slave.next == &master->mnt_slave_list)
> +                                        break;
> +                                m = next_slave(m);
> +check_slave:
> +                                child = __lookup_mnt_last(&m->mnt, mountpoint);
> +                                if (child && IS_MNT_MARKED(child))
> +                                        return child;
> +                        }
>                  }
> +
> +                /* back at master */
> +                m = master;
>          }
>  }
>  
> +static void start_umount_propagation(struct mount *child,
> +                                     struct list_head *to_umount)
> +{
> +        do {
> +                struct mount *parent = child->mnt_parent;
> +
> +                if ((child->mnt.mnt_flags & MNT_UMOUNT) ||
> +                    !list_empty(&child->mnt_mounts))
> +                        return;
> +
> +                if (!IS_MNT_LOCKED(child))
> +                        list_move_tail(&child->mnt_child, to_umount);
> +                else
> +                        list_move_tail(&child->mnt_child, &parent->mnt_umounts);
> +
> +                child = NULL;
> +                if (IS_MNT_MARKED(parent))
> +                        child = parent;
> +        } while (child);
> +}
> +
> +static void end_umount_propagation(struct mount *child)
> +{
> +        struct mount *parent = child->mnt_parent;
> +
> +        if (!list_empty(&parent->mnt_umounts))
> +                list_splice_tail_init(&parent->mnt_umounts, &parent->mnt_mounts);
> +}
> +
> +
>  /*
>   * collect all mounts that receive propagation from the mount in @list,
>   * and return these additional mounts in the same list.
> @@ -447,14 +544,39 @@ static void __propagate_umount(struct mount *mnt)
>   *
>   * vfsmount lock must be held for write
>   */
> -int propagate_umount(struct list_head *list)
> +void propagate_umount(struct list_head *list)
>  {
>          struct mount *mnt;
> +        LIST_HEAD(to_umount);
> +        LIST_HEAD(tmp_list);
> +
> +        /* Find candidates for unmounting */
> +        list_for_each_entry(mnt, list, mnt_list) {
> +                struct mount *child;
> +                for (child = propagation_visit_child(mnt, mnt); child;
> +                     child = propagation_visit_child(child, mnt))
> +                        start_umount_propagation(child, &to_umount);
> +        }
> 
> -        list_for_each_entry_reverse(mnt, list, mnt_list)
> -                mark_umount_candidates(mnt);
> +        /* Begin unmounting */
> +        while (!list_empty(&to_umount)) {
> +                mnt = list_first_entry(&to_umount, struct mount, mnt_child);
> 
> -        list_for_each_entry(mnt, list, mnt_list)
> -                __propagate_umount(mnt);
> -        return 0;
> +                list_del_init(&mnt->mnt_child);
> +                mnt->mnt.mnt_flags |= MNT_UMOUNT;
> +                list_move_tail(&mnt->mnt_list, &tmp_list);
> +
> +                if (!list_empty(&mnt->mnt_umounts))
> +                        list_splice_tail_init(&mnt->mnt_umounts, &to_umount);
> +        }
> +
> +        /* Cleanup the mount propagation tree */
> +        list_for_each_entry(mnt, list, mnt_list) {
> +                struct mount *child;
> +                for (child = propagation_revisit_child(mnt, mnt); child;
> +                     child = propagation_revisit_child(child, mnt))
> +                        end_umount_propagation(child);
> +        }
> +
> +        list_splice_tail(&tmp_list, list);
>  }
> diff --git a/fs/pnode.h b/fs/pnode.h
> index 550f5a8b4fcf..38c6cdb96b34 100644
> --- a/fs/pnode.h
> +++ b/fs/pnode.h
> @@ -41,7 +41,7 @@ static inline void set_mnt_shared(struct mount *mnt)
>  void change_mnt_propagation(struct mount *, int);
>  int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
>                  struct hlist_head *);
> -int propagate_umount(struct list_head *);
> +void propagate_umount(struct list_head *);
>  int propagate_mount_busy(struct mount *, int);
>  void propagate_mount_unlock(struct mount *);
>  void mnt_release_group_id(struct mount *);
> -- 
> 2.10.1
> 