Received: by 2002:ad5:4acb:0:0:0:0:0 with SMTP id n11csp471808imw; Thu, 14 Jul 2022 05:08:15 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tr7ouoWO0Xrf6CzdAcVBHsK1cEx2t/lWoeUMu3zjwdE5A5uRBPr7pWldTCZzJgDXMa2o/R X-Received: by 2002:a05:6402:913:b0:43a:b594:93a8 with SMTP id g19-20020a056402091300b0043ab59493a8mr12097713edz.346.1657800495107; Thu, 14 Jul 2022 05:08:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657800495; cv=none; d=google.com; s=arc-20160816; b=BagcBILZFPvqckM8wMdST+4szTbpk8+t2hDlFnNiVzL1vmQoH4nJZwvqyA+TzWUM1d 0hniWD24DyJjTrHhM5u5Ai16tYPROZgIzKIPlZAXLjRCWEDYQc7Z5KI8zJPBgd6yB7YF UCFWQHwzINrKG0C3JsFu7ZXYvc192zl7JUafc9OpobvudMmDPYFx/KKzpSTtYb5LAPPb iSgTrEMcOJiKL+GkcadPj7y64FFRzDI+D4LusriB7YDjACa1JZgWTVFNoK4Y6J8aWr+T h9EHRazjVeli5gr7OStwI/i4fnGcEH70UF+FyLdPX344iANlqXKFjpjPTJsoA63gVCPW ei8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Eo54O2xCYx2NW0FGu98tWkr1EbZZUok8MtcsOnw642U=; b=XpGf/BHsOax4SBU4fBYpFme/zYlKVB0XTVsBedCg8PTlnyYdzcNwxQBxUvI1ANtynL NH+d5zuNPN5lXZPjvzlIR5OFAy0Hpw8Xdp2GUImBwofchoSYOlzLSERz6QeGq5plWAr6 0w1XqyPDp2WGP04Ij5k8xrNSk7wA79Eye1M6mM7+CSOJHriPvRwKin2T6SAxDTfY3yLX zAdwIEYEu01e23bF3lj4PKjnrb8aUdpLM9KbMVb2Iu/roVU7zZGceC/FGd52C9m3kKPU ooovbofTXwB5zo/Bp42jxzbgmEWt8yg7kGQlfM87//UUSoMFX66y+d2sQ/QGBR2Fn/9z IimQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YjwnzuDG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p9-20020a17090653c900b0072b64f0c355si2087960ejo.171.2022.07.14.05.07.50; Thu, 14 Jul 2022 05:08:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YjwnzuDG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229702AbiGNMBZ (ORCPT + 99 others); Thu, 14 Jul 2022 08:01:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229801AbiGNMBW (ORCPT ); Thu, 14 Jul 2022 08:01:22 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B321D5C362 for ; Thu, 14 Jul 2022 05:01:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1657800079; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Eo54O2xCYx2NW0FGu98tWkr1EbZZUok8MtcsOnw642U=; b=YjwnzuDGZMo3gzzDhwVlIS8vE1rjGqbk6MD5PwuaNLvRf+jHu5mbU6cvM7xvq9oSE6TmmQ 6HFRrc0YsBcQLvq1FtzSfbeam4NAM22XGi5/+096zL15g9HF/2LJh7/k9+jQGhQcg3D5Z9 YWjBmpbrzqh8ES2nRzzAKe6DqsanV7g= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-422-YwpCUZESN3WBImYq8pL5YA-1; Thu, 14 Jul 2022 08:01:18 -0400 X-MC-Unique: YwpCUZESN3WBImYq8pL5YA-1 Received: by mail-qk1-f198.google.com with SMTP id k190-20020a37bac7000000b006af6d953751so1006727qkf.13 for ; Thu, 14 Jul 2022 05:01:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=Eo54O2xCYx2NW0FGu98tWkr1EbZZUok8MtcsOnw642U=; b=CYG1BzVjjygYWomDpMPfRXy2QKx1jyIymNkm3knaM5whQogN84E7rnHN/IiSvsj8km zB/kKekMG0SpCYhEmZhYwjNOQQ6tfzQgi0KK82KRmnlVq4CL+0K/4f+SOxj6NDbu/R/i BKfR28AIICAqskKqe3RVtRuHKB+BNdiVxqP3zRYyp+c4EyyhJh1zofJ+VonIhLoFSkCO DPCSpwtKHGVL/d0ZA+6PO0CMJ6Vpsp8wmIZlIUb9c6BB5gnHkpcZ5F/33OB96pSC1VTz 3R3c1rntfVY57dZ9y3PREyJLLfPqwJ6GZ3tCRKmgedMPWyIZEZAYwAgdDk+2anQhOQGD mzSA== X-Gm-Message-State: AJIora9KwN9zgXgtROnom7klBC9B/hGtjkeKR8zxpJIDVSc4NxpdStip 0y3yMuHRbhURjixih7benp0OaClJ+U7OlvQDeVyeNr9PDZ80dG6irZQC1XnATv8bw5Z3psZtbEC akqQ+yihBn0nRg50i/1+wO8Ei X-Received: by 2002:ad4:5d6e:0:b0:473:72a7:5baf with SMTP id fn14-20020ad45d6e000000b0047372a75bafmr7668239qvb.80.1657800078125; Thu, 14 Jul 2022 05:01:18 -0700 (PDT) X-Received: by 2002:ad4:5d6e:0:b0:473:72a7:5baf with SMTP id fn14-20020ad45d6e000000b0047372a75bafmr7668200qvb.80.1657800077813; Thu, 14 Jul 2022 05:01:17 -0700 (PDT) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id n20-20020ac81e14000000b003177969a48fsm1334267qtl.21.2022.07.14.05.01.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 05:01:17 -0700 (PDT) Date: Thu, 14 Jul 2022 08:01:15 -0400 From: Brian Foster To: Zhihao Cheng Cc: ebiederm@xmission.com, willy@infradead.org, bhe@redhat.com, akpm@linux-foundation.org, kaleshsingh@google.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, yukuai3@huawei.com Subject: Re: [PATCH v5] proc: Fix a dentry lock race between release_task and lookup Message-ID: References: <20220713130029.4133533-1-chengzhihao1@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220713130029.4133533-1-chengzhihao1@huawei.com> X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 13, 2022 at 09:00:29PM +0800, Zhihao Cheng wrote: > Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc") > moved proc_flush_task() behind __exit_signal(). Then, process systemd > can take long period high cpu usage during releasing task in following > concurrent processes: > > systemd ps > kernel_waitid stat(/proc/tgid) > do_wait filename_lookup > wait_consider_task lookup_fast > release_task > __exit_signal > __unhash_process > detach_pid > __change_pid // remove task->pid_links > d_revalidate -> pid_revalidate // 0 > d_invalidate(/proc/tgid) > shrink_dcache_parent(/proc/tgid) > d_walk(/proc/tgid) > spin_lock_nested(/proc/tgid/fd) > // iterating opened fd > proc_flush_pid | > d_invalidate (/proc/tgid/fd) | > shrink_dcache_parent(/proc/tgid/fd) | > shrink_dentry_list(subdirs) ↓ > shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock > > Function d_invalidate() will remove dentry from hash firstly, but why does > proc_flush_pid() process dentry '/proc/tgid/fd' before dentry '/proc/tgid'? > That's because proc_pid_make_inode() adds proc inode in reverse order by > invoking hlist_add_head_rcu(). But proc should not add any inodes under > '/proc/tgid' except '/proc/tgid/task/pid', fix it by adding inode into > 'pid->inodes' only if the inode is /proc/tgid or /proc/tgid/task/pid. > > Performance regression: > Create 200 tasks, each task open one file for 50,000 times. Kill all > tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr). > > Before fix: > $ time killall -wq aa > real 4m40.946s # During this period, we can see 'ps' and 'systemd' > taking high cpu usage. > > After fix: > $ time killall -wq aa > real 1m20.732s # During this period, we can see 'systemd' taking > high cpu usage. > > Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054 > Signed-off-by: Zhihao Cheng > Signed-off-by: Zhang Yi > Suggested-by: Brian Foster > --- LGTM, and thanks for the tweaks: Reviewed-by: Brian Foster > v1->v2: Add new helper proc_pid_make_base_inode that performs the extra > work of adding to the pid->list. > v2->v3: Add performance regression in commit message. > v3->v4: Make proc_pid_make_base_inode() static > v4->v5: Add notes to explain what proc_pid_make_base_inode() does > fs/proc/base.c | 46 ++++++++++++++++++++++++++++++++++++++-------- > 1 file changed, 38 insertions(+), 8 deletions(-) > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index 8dfa36a99c74..93f7e3d971e4 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -1885,7 +1885,7 @@ void proc_pid_evict_inode(struct proc_inode *ei) > put_pid(pid); > } > > -struct inode *proc_pid_make_inode(struct super_block * sb, > +struct inode *proc_pid_make_inode(struct super_block *sb, > struct task_struct *task, umode_t mode) > { > struct inode * inode; > @@ -1914,11 +1914,6 @@ struct inode *proc_pid_make_inode(struct super_block * sb, > > /* Let the pid remember us for quick removal */ > ei->pid = pid; > - if (S_ISDIR(mode)) { > - spin_lock(&pid->lock); > - hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes); > - spin_unlock(&pid->lock); > - } > > task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid); > security_task_to_inode(task, inode); > @@ -1931,6 +1926,39 @@ struct inode *proc_pid_make_inode(struct super_block * sb, > return NULL; > } > > +/* > + * Generating an inode and adding it into @pid->inodes, so that task will > + * invalidate inode's dentry before being released. > + * > + * This helper is used for creating dir-type entries under '/proc' and > + * '/proc//task'. Other entries(eg. fd, stat) under '/proc/' > + * can be released by invalidating '/proc/' dentry. > + * In theory, dentries under '/proc//task' can also be released by > + * invalidating '/proc/' dentry, we reserve it to handle single > + * thread exiting situation: Any one of threads should invalidate its > + * '/proc//task/' dentry before released. > + */ > +static struct inode *proc_pid_make_base_inode(struct super_block *sb, > + struct task_struct *task, umode_t mode) > +{ > + struct inode *inode; > + struct proc_inode *ei; > + struct pid *pid; > + > + inode = proc_pid_make_inode(sb, task, mode); > + if (!inode) > + return NULL; > + > + /* Let proc_flush_pid find this directory inode */ > + ei = PROC_I(inode); > + pid = ei->pid; > + spin_lock(&pid->lock); > + hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes); > + spin_unlock(&pid->lock); > + > + return inode; > +} > + > int pid_getattr(struct user_namespace *mnt_userns, const struct path *path, > struct kstat *stat, u32 request_mask, unsigned int query_flags) > { > @@ -3369,7 +3397,8 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry, > { > struct inode *inode; > > - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO); > + inode = proc_pid_make_base_inode(dentry->d_sb, task, > + S_IFDIR | S_IRUGO | S_IXUGO); > if (!inode) > return ERR_PTR(-ENOENT); > > @@ -3671,7 +3700,8 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry, > struct task_struct *task, const void *ptr) > { > struct inode *inode; > - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO); > + inode = proc_pid_make_base_inode(dentry->d_sb, task, > + S_IFDIR | S_IRUGO | S_IXUGO); > if (!inode) > return ERR_PTR(-ENOENT); > > -- > 2.31.1 >