Date: Thu, 13 Feb 2020 05:55:27 +0000
From: Al Viro
To: "Eric W. Biederman"
Cc: Linus Torvalds, LKML, Kernel Hardening, Linux API, Linux FS Devel,
    Linux Security Module, Akinobu Mita, Alexey Dobriyan, Andrew Morton,
    Andy Lutomirski, Daniel Micay, Djalal Harouni, "Dmitry V. Levin",
    Greg Kroah-Hartman, Ingo Molnar,
    "J. Bruce Fields", Jeff Layton, Jonathan Corbet, Kees Cook,
    Oleg Nesterov, Solar Designer
Subject: Re: [PATCH v8 07/11] proc: flush task dcache entries from all procfs instances
Message-ID: <20200213055527.GS23230@ZenIV.linux.org.uk>
References: <87v9obipk9.fsf@x220.int.ebiederm.org>
 <20200212200335.GO23230@ZenIV.linux.org.uk>
 <20200212203833.GQ23230@ZenIV.linux.org.uk>
 <20200212204124.GR23230@ZenIV.linux.org.uk>
 <87lfp7h422.fsf@x220.int.ebiederm.org>
 <87pnejf6fz.fsf@x220.int.ebiederm.org>
In-Reply-To: <87pnejf6fz.fsf@x220.int.ebiederm.org>

On Wed, Feb 12, 2020 at 10:37:52PM -0600, Eric W. Biederman wrote:

> I think I have an alternate idea that could work.  Add some extra code
> into proc_task_readdir that would look for dentries that no longer
> point to tasks and d_invalidate them, with the same logic probably
> being called from a few more places as well, like proc_pid_readdir,
> proc_task_lookup, and proc_pid_lookup.
>
> We could even optimize it and have a "process died" flag we set in the
> superblock.
>
> That would batch up the freeing work until the next time someone reads
> from proc in a way that would create more dentries.  So it would
> prevent dentries from reaped zombies from growing without bound.
>
> Hmm.  Given the existence of proc_fill_cache it would really be a good
> idea if readdir and lookup performed some of the freeing work as well,
> since on readdir we always populate the dcache for all of the directory
> entries.

First of all, that won't do a damn thing when nobody is accessing the
given superblock.  What's more, readdir in the root of that procfs
instance is not enough - you need it in task/ of the group leader.

What I don't understand is the insistence on getting those dentries via
dcache lookups.  _IF_ we are willing to live with cacheline contention
(on ->d_lock of the root dentry, if nothing else), why not do the
following:

	* put all dentries of such directories ([0-9]* and [0-9]*/task/*)
	  into a list anchored in task_struct; have a non-counting reference
	  to task_struct stored in them (might simplify part of the
	  get_proc_task() users, BTW - it avoids pid-to-task_struct lookups
	  if we have a dentry and not just the inode; many callers do)
	* have ->d_release() remove the dentry from that list (the
	  per-task_struct lock protecting it nests outside of all ->d_lock)
	* on exit:

	lock the (per-task_struct) list
	while the list is non-empty
		pick the first dentry
		remove it from the list
		sb = dentry->d_sb
		try to bump sb->s_active (if non-zero, that is)
		if that failed
			continue	// nothing to do here - move on to the next one
		grab ->d_lock
		res = handle_it(dentry, &temp_list)
		drop ->d_lock
		unlock the list
		if (!list_empty(&temp_list))
			shrink_dentry_list(&temp_list)
		if (res)
			d_invalidate(dentry)
		dput(dentry)
		deactivate_super(sb)
		lock the list
	unlock the list

	handle_it(dentry, temp_list)	// ->d_lock held; this one should live in dcache.c
		if ->d_count is negative	// unlikely
			return 0
		if ->d_count is positive
			increment ->d_count
			return 1
		// OK, it's still alive, but ->d_count is 0
		__d_drop	// equivalent of d_invalidate in this case
		if not on a shrink list		// otherwise it's not our headache
			if on the LRU list
				d_lru_del
			d_shrink_add the dentry to temp_list
		return 0

And yeah, that'll dirty ->s_active for each procfs superblock that has a
dentry for our process present in the dcache.  On exit()...
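As a rough C rendering of the above - a sketch only, not a patch: it assumes
hypothetical task->proc_dentries / task->proc_dentries_lock fields and a small
proc_dent node hung off each such dentry (e.g. via ->d_fsdata); handle_it() is
the helper sketched above, not an existing primitive, and the whole thing would
have to sit in fs/dcache.c to reach shrink_dentry_list(), d_lru_del() and
d_shrink_add(), which are local to that file:

	/*
	 * Sketch only.  Hypothetical pieces: task->proc_dentries (list anchor),
	 * task->proc_dentries_lock (spinlock), struct proc_dent (per-dentry node).
	 */
	#include <linux/atomic.h>
	#include <linux/dcache.h>
	#include <linux/fs.h>
	#include <linux/list.h>
	#include <linux/sched.h>
	#include <linux/spinlock.h>

	struct proc_dent {			/* hypothetical per-dentry node */
		struct list_head node;		/* linkage in task->proc_dentries */
		struct dentry *dentry;
	};

	/*
	 * ->d_lock held.  Returns 1 if we took a reference and the caller should
	 * d_invalidate() + dput(); 0 if the dentry is dead or has been queued on
	 * temp_list for shrink_dentry_list().
	 */
	static int handle_it(struct dentry *dentry, struct list_head *temp_list)
	{
		if (dentry->d_lockref.count < 0)	/* already dead - unlikely */
			return 0;
		if (dentry->d_lockref.count > 0) {	/* still in use - pin it */
			dentry->d_lockref.count++;
			return 1;
		}
		/* alive, but refcount is 0: unhash and queue for eviction */
		__d_drop(dentry);			/* equivalent of d_invalidate() here */
		if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {	/* else not our headache */
			if (dentry->d_flags & DCACHE_LRU_LIST)
				d_lru_del(dentry);
			d_shrink_add(dentry, temp_list);
		}
		return 0;
	}

	/* called on exit: evict every proc dentry hanging off @task */
	void flush_task_proc_dentries(struct task_struct *task)
	{
		spin_lock(&task->proc_dentries_lock);
		while (!list_empty(&task->proc_dentries)) {
			struct proc_dent *pd = list_first_entry(&task->proc_dentries,
								struct proc_dent, node);
			struct dentry *dentry = pd->dentry;
			struct super_block *sb = dentry->d_sb;
			LIST_HEAD(temp_list);
			int res;

			list_del_init(&pd->node);
			/* pin the superblock; if it is already dying, nothing to do */
			if (!atomic_inc_not_zero(&sb->s_active))
				continue;

			spin_lock(&dentry->d_lock);
			res = handle_it(dentry, &temp_list);
			spin_unlock(&dentry->d_lock);
			spin_unlock(&task->proc_dentries_lock);

			if (!list_empty(&temp_list))
				shrink_dentry_list(&temp_list);
			if (res) {		/* we took a reference in handle_it() */
				d_invalidate(dentry);
				dput(dentry);
			}
			deactivate_super(sb);

			spin_lock(&task->proc_dentries_lock);
		}
		spin_unlock(&task->proc_dentries_lock);
	}

Note the lock ordering in the sketch: the per-task list lock nests outside
->d_lock, so it gets dropped before shrink_dentry_list()/dput()/deactivate_super()
and re-taken for the next iteration of the loop.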