Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp511382pxu; Thu, 7 Jan 2021 10:29:41 -0800 (PST) X-Google-Smtp-Source: ABdhPJy2o75wiuFxGt7n2jk21utdJygJYUBUr3aeqPbGL5towjDLnRKnF43PxupMKPmEq/dJG+TR X-Received: by 2002:aa7:c7d8:: with SMTP id o24mr2694379eds.328.1610044181182; Thu, 07 Jan 2021 10:29:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1610044181; cv=none; d=google.com; s=arc-20160816; b=LE7TS6J0kPCKjmXdpp1umw2LSNibf/wr5+mImqsp3eeANhTeef1aBCzpsKCZbge9l9 MYcIEDpH/AB0Wn2mc9gblkTMBWl/tOPwSk5d3fFpnNQ60c0VV70lsfn/oGA7IwP/amhQ BikdPecV/x9zy66igEO/lSCpKMX6iEv3Ddrv1q4COTCPexA8q8D9SUoGhRMpuyDEpHx5 VnDIEBkd1jh+LCR1MAMbo/U7vBRBzced6EMehJJ+CrEdlq6ZmMUEn33Ur3PiYJgoNbVu ygvc9M5rjHuuQPDubxrNhCZcw+nhqZ24qeLZf0WwBKSUgzT9PbjHTJnox6GunUrTIxOs nCNA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=9462JkE63EQA6UHNCl8dwtoqe+hYPOjExl/+gyvV9OU=; b=aWvcerhyiSfZSfGsKqn9LkOceMw35davu4D2pxYdUJmboFHCCrY2Gbf4sftQhKYyPU FhZrU4wi5zJ1eTnovO9HEyvm5cNJBhfbTOaTwUYfBY4vKQbGFJV9HsNgfevlsmLx2jfL 3kzR32uSFOd2WEqrn5l9c1oqvGflQOFcW8Hw1qWm7pjiRv9Z4KBa6V/dk8xg2oHH9L6p DBzHQSwZZ15w0L8s4Rm7kwCn28pZP1G8V0XmbeC/X8pPCNNcfFqvF3Mn4SezKiFQ4PEb QvK9mzCv49gwiiObS7WGz3Faq2cNy9xxjRpPn4l2fe32t/QcN7Fa1ySPZfRsQ5qpY7OT dpIw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=Yc07YXQh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id c5si2461392ejz.259.2021.01.07.10.29.16; Thu, 07 Jan 2021 10:29:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=Yc07YXQh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728257AbhAGS2J (ORCPT + 99 others); Thu, 7 Jan 2021 13:28:09 -0500 Received: from mail.kernel.org ([198.145.29.99]:37104 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727061AbhAGS2J (ORCPT ); Thu, 7 Jan 2021 13:28:09 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8A6FB23428; Thu, 7 Jan 2021 18:27:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1610044048; bh=66kiyYV4QsUShqFsHnKrli1XPFRb81YdO3zpTdzneUE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Yc07YXQh5qOLv6gXRzkf9/EYL2XST1GsgVq09BofgNz/SUWu92n9WD31+iF+Ksg4f rCkdjk9MI15VD4Qvv51C7QR0mfpdhV7zZjrPJxGF7RTlbxwUJqTdOeWEnUxYsicQar HUl39QgSkqTLAFejeACSRqMKK2iX77iFBDXhK/Fo= Date: Thu, 7 Jan 2021 19:28:47 +0100 From: Greg Kroah-Hartman To: Wen Yang , Christian Brauner Cc: Sasha Levin , Xunlei Pang , linux-kernel@vger.kernel.org Subject: Re: [PATCH v2 4.9 00/10] fix a race in release_task when flushing the dentry Message-ID: References: <20210107075222.62623-1-wenyang@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 08, 2021 at 12:21:38AM +0800, Wen Yang wrote: > > > 在 2021/1/7 下午8:17, Greg Kroah-Hartman 写道: > > On Thu, Jan 07, 2021 at 03:52:12PM +0800, Wen Yang wrote: > > > The dentries such as /proc//ns/ have the DCACHE_OP_DELETE flag, they > > > should be deleted when the process exits. > > > > > > Suppose the following race appears: > > > > > > release_task dput > > > -> proc_flush_task > > > -> dentry->d_op->d_delete(dentry) > > > -> __exit_signal > > > -> dentry->d_lockref.count-- and return. > > > > > > In the proc_flush_task(), if another process is using this dentry, it will > > > not be deleted. At the same time, in dput(), d_op->d_delete() can be executed > > > before __exit_signal(pid has not been hashed), d_delete returns false, so > > > this dentry still cannot be deleted. > > > > > > This dentry will always be cached (although its count is 0 and the > > > DCACHE_OP_DELETE flag is set), its parent denry will also be cached too, and > > > these dentries can only be deleted when drop_caches is manually triggered. > > > > > > This will result in wasted memory. What's more troublesome is that these > > > dentries reference pid, according to the commit f333c700c610 ("pidns: Add a > > > limit on the number of pid namespaces"), if the pid cannot be released, it > > > may result in the inability to create a new pid_ns. > > > > > > This issue was introduced by 60347f6716aa ("pid namespaces: prepare > > > proc_flust_task() to flush entries from multiple proc trees"), exposed by > > > f333c700c610 ("pidns: Add a limit on the number of pid namespaces"), and then > > > fixed by 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc"). > > > > Why are you just submitting a series for 4.9 and 4.19, what about 4.14? > > We can't have users move to a newer kernel and then experience old bugs, > > right? > > > Okay, the patches corresponding to 4.14 will be ready later. Note for some reason you didn't cc: the stable list for these patches :( > > But the larger question is why are you backporting a whole new feature > > here? Why is CLONE_PIDFD needed? That feels really wrong... > > > > The reason for backporting CLONE_PIDFD is because 7bc3e6e55acf ("proc: Use a > list of inodes to flush from proc") relies on wait_pidfd.lock. There are > indeed many associated modifications here. We are also testing it. Please > check the code more. Is the only "issue" here wasted memory? Will it eventually be freed anyway even if you do not echo to the proc file to flush caches? You mention the inability to create a new pid for a specific namespace, is that really a problem? Shouldn't the code handle such issues normally? What breaks without these changes? I think at this point, it might just time for you to move to a newer kernel release, as adding a whole new userspace feature for this feels really really odd. What is preventing you from doing that today? What holds you to older kernels that will not allow you to move forward? thanks, greg k-h