Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755769AbXK0Mii (ORCPT ); Tue, 27 Nov 2007 07:38:38 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753742AbXK0Mi1 (ORCPT ); Tue, 27 Nov 2007 07:38:27 -0500 Received: from ebiederm.dsl.xmission.com ([166.70.28.69]:40151 "EHLO ebiederm.dsl.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753444AbXK0Mi0 (ORCPT ); Tue, 27 Nov 2007 07:38:26 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Pavel Emelyanov Cc: Linus Torvalds , Andrew Morton , "Rafael J. Wysocki" , Pavel Machek , kernel list , netdev Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage References: <20071119191000.GA1560@elf.ucw.cz> <200711192304.25087.rjw@sisk.pl> <4743026B.2020907@openvz.org> <474BFD83.6060500@openvz.org> Date: Tue, 27 Nov 2007 05:36:49 -0700 In-Reply-To: <474BFD83.6060500@openvz.org> (Pavel Emelyanov's message of "Tue, 27 Nov 2007 14:20:35 +0300") Message-ID: User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3249 Lines: 86 Pavel Emelyanov writes: > [snip] > >> >> Well I clearly goofed when I added the initial network namespace support >> for /proc/net. Currently things work but there are odd details visible >> to user space, even when we have a single network namespace. >> >> Since we do not cache proc_dir_entry dentries at the moment we can >> just modify ->lookup to return a different directory inode depending >> on the network namespace of the process looking at /proc/net, replacing >> the current technique of using a magic and fragile follow_link method. >> >> To accomplish that this patch: >> - introduces a shadow_proc method to allow different dentries to >> be returned from proc_lookup. >> - Removes the old /proc/net follow_link magic >> - Fixes a weakness in our not caching of proc generic dentries. >> >> As shadow_proc uses a task struct to decided which dentry to return we >> can go back later and fix the proc generic caching without modifying any code > that >> uses the shadow_proc method. >> >> Signed-off-by: Eric W. Biederman > > Thanks, Eric. > > Much better ('find /proc' works and so does 'ls ..'), but one > issue is still unsolved :( > > I mentioned the program, that opens the directory and dumps the > content of the /proc/self/fd. Here it is (stupid but simple): The cause is totally different this time, and is not limited to /proc/net. But this is the only side effect of the changes now. I used the one line version of your test program to confirm this: $ cd /proc/driver/ $ ls -l /proc/self/fd/100 100< . lr-x------ 1 eric eric 64 Nov 27 05:06 /proc/self/fd/100 -> /proc/driver/ $ ls -l /proc/self/fd/100 100< . lr-x------ 1 eric eric 64 Nov 27 05:07 /proc/self/fd/100 -> /proc/driver (deleted) What is happening is that the aggressive non-caching logic is dropping the dentries and thus they show up as deleted. I.e. When something triggers another lookup of that dentry revalidate drops the old dentry. This actually fixes a race in /proc today where if you open a file, remove the module for it, reload the module for that file, and then attempt to access it, lookup will return the old dentry with the old fops (even if there is not a current version of that file). kill_proc_inodes current keeps us from using the old version of the file_operations for directories (because it removes them) but it does not prevent this DOS attack where keeping a proc file open always ensures everyone will see the old version. Given that the behavior seems more correct then what we have currently I can live with files showing up as deleted from time to time. Ultimately the fix is to correct the caching logic in fs/proc/generic.c to only drop dentries when necessary and then the deleted markers will only show up when the file or directory really goes away. I don't expect user space will care about the semi-legitimate (deleted) marks. So I don't think this one down side will be a big deal for 2.6.24. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/