From: Suren Baghdasaryan
Date: Mon, 26 Jun 2023 13:22:42 -0700
Subject: Re: [PATCH v2] poll: Fix use-after-free in poll_freewait()
To: "lujialin (A)"
Cc: Eric Biggers, Tejun Heo, Johannes Weiner, Andrew Morton, Alexander Viro,
 Christian Brauner, Oleg Nesterov, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org
References: <20230614070733.113068-1-lujialin4@huawei.com> <20230614174004.GC1146@sol.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 23, 2023 at 6:58 PM Suren Baghdasaryan wrote:
>
> On Tue, Jun 20, 2023 at 5:09 PM Suren Baghdasaryan wrote:
> >
> > On Sun, Jun 18, 2023 at 6:28 AM lujialin (A) wrote:
> > >
> > > Hi Suren:
> > >
> > > kernel config:
> > > x86_64_defconfig
> > > CONFIG_PSI=y
> > > CONFIG_SLUB_DEBUG=y
> > > CONFIG_SLUB_DEBUG_ON=y
> > > CONFIG_KASAN=y
> > > CONFIG_KASAN_INLINE=y
> > >
> > > I made a change to the code to increase the probability of
> > > reproducing the issue.
> > > diff --git a/fs/select.c b/fs/select.c
> > > index 5edffee1162c..5ee5b74a8386 100644
> > > --- a/fs/select.c
> > > +++ b/fs/select.c
> > > @@ -139,6 +139,7 @@ void poll_freewait(struct poll_wqueues *pwq)
> > >  {
> > >         struct poll_table_page * p = pwq->table;
> > >         int i;
> > > +       mdelay(50);
> > >         for (i = 0; i < pwq->inline_index; i++)
> > >                 free_poll_entry(pwq->inline_entries + i);
> > >         while (p) {
> > >
> > > Here is the simple repro, test.sh:
> > > #!/bin/bash
> > >
> > > RESOURCE_TYPES=("cpu" "memory" "io" "irq")
> > > #RESOURCE_TYPES=("cpu")
> > > cgroup_num=50
> > > test_dir=/sys/fs/cgroup/test
> > >
> > > function restart_cgroup() {
> > >         num=$(expr $RANDOM % $cgroup_num + 1)
> > >         rmdir $test_dir/test_$num
> > >         mkdir $test_dir/test_$num
> > > }
> > >
> > > function create_triggers() {
> > >         num=$(expr $RANDOM % $cgroup_num + 1)
> > >         random=$(expr $RANDOM % "${#RESOURCE_TYPES[@]}")
> > >         psi_type="${RESOURCE_TYPES[${random}]}"
> > >         ./psi_monitor $test_dir/test_$num $psi_type &
> > > }
> > >
> > > mkdir $test_dir
> > > for i in $(seq 1 $cgroup_num)
> > > do
> > >         mkdir $test_dir/test_$i
> > > done
> > > for j in $(seq 1 100)
> > > do
> > >         restart_cgroup &
> > >         create_triggers &
> > > done
> > >
> > > psi_monitor.c:
> > > #include <errno.h>
> > > #include <fcntl.h>
> > > #include <stdio.h>
> > > #include <poll.h>
> > > #include <string.h>
> > > #include <unistd.h>
> > >
> > > int main(int argc, char *argv[]) {
> > >         const char trig[] = "full 1000000 1000000";
> > >         struct pollfd fds;
> > >         char filename[100];
> > >
> > >         sprintf(filename, "%s/%s.pressure", argv[1], argv[2]);
> > >
> > >         fds.fd = open(filename, O_RDWR | O_NONBLOCK);
> > >         if (fds.fd < 0) {
> > >                 printf("%s open error: %s\n", filename,strerror(errno));
> > >                 return 1;
> > >         }
> > >         fds.events = POLLPRI;
> > >         if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
> > >                 printf("%s write error: %s\n",filename,strerror(errno));
> > >                 return 1;
> > >         }
> > >         while (1) {
> > >                 poll(&fds, 1, -1);
> > >         }
> > >         close(fds.fd);
> > >         return 0;
> > > }
> >
> > Thanks a lot!
> > I'll try to get this reproduced and fixed by the end of this week.
>
> Ok, I was able to reproduce the issue and I think the ultimate problem
> here is that kernfs_release_file() can be called from both
> kernfs_fop_release() and kernfs_drain_open_files(). While
> kernfs_fop_release() is called when the FD's refcount is 0 and there
> are no users, kernfs_drain_open_files() can be called while there are
> still other users. In our scenario, kn->attr.ops->release points to
> cgroup_pressure_release() which destroys the psi trigger thinking that
> (since the file is "released") there could be no other users. So the
> shell process which issues the rmdir command will destroy the trigger
> once cgroup_pressure_release() is called and the reproducer will step
> on the freed wait_queue_head. The way kernfs_release_file() is
> implemented, it ensures that kn->attr.ops->release(of) is called only
> once (the first time), therefore cgroup_pressure_release() is never
> called in the reproducer's context. That prevents me from implementing
> some kind of refcounting scheme for psi triggers because we are never
> notified when the last user is gone.
> I think to fix this I would need to modify kernfs_release_file() and
> maybe add another operation in kernfs_ops to indicate that the last
> user is gone (something like final_release()). It's not pretty but so
> far I did not find a better way. I'll think some more over the weekend
> and will try to post a patch implementing the fix on Monday.

Posted 2 patches to fix this at:
https://lore.kernel.org/all/20230626201713.1204982-1-surenb@google.com/
Thanks,
Suren.

> Thanks,
> Suren.
>
> >
> > Suren.
> >
> > > Thanks,
> > > Lu
> > > On 2023/6/16 7:13, Suren Baghdasaryan wrote:
> > > > On Wed, Jun 14, 2023 at 11:19 AM Suren Baghdasaryan wrote:
> > > >>
> > > >> On Wed, Jun 14, 2023 at 10:40 AM Eric Biggers wrote:
> > > >>>
> > > >>> On Wed, Jun 14, 2023 at 03:07:33PM +0800, Lu Jialin wrote:
> > > >>>> We found a UAF bug in remove_wait_queue as follows:
> > > >>>>
> > > >>>> ==================================================================
> > > >>>> BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x71/0xe0
> > > >>>> Write of size 4 at addr ffff8881150d7b28 by task psi_trigger/15306
> > > >>>> Call Trace:
> > > >>>>  dump_stack+0x9c/0xd3
> > > >>>>  print_address_description.constprop.0+0x19/0x170
> > > >>>>  __kasan_report.cold+0x6c/0x84
> > > >>>>  kasan_report+0x3a/0x50
> > > >>>>  check_memory_region+0xfd/0x1f0
> > > >>>>  _raw_spin_lock_irqsave+0x71/0xe0
> > > >>>>  remove_wait_queue+0x26/0xc0
> > > >>>>  poll_freewait+0x6b/0x120
> > > >>>>  do_sys_poll+0x305/0x400
> > > >>>>  do_syscall_64+0x33/0x40
> > > >>>>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > > >>>>
> > > >>>> Allocated by task 15306:
> > > >>>>  kasan_save_stack+0x1b/0x40
> > > >>>>  __kasan_kmalloc.constprop.0+0xb5/0xe0
> > > >>>>  psi_trigger_create.part.0+0xfc/0x450
> > > >>>>  cgroup_pressure_write+0xfc/0x3b0
> > > >>>>  cgroup_file_write+0x1b3/0x390
> > > >>>>  kernfs_fop_write_iter+0x224/0x2e0
> > > >>>>  new_sync_write+0x2ac/0x3a0
> > > >>>>  vfs_write+0x365/0x430
> > > >>>>  ksys_write+0xd5/0x1b0
> > > >>>>  do_syscall_64+0x33/0x40
> > > >>>>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > > >>>>
> > > >>>> Freed by task 15850:
> > > >>>>  kasan_save_stack+0x1b/0x40
> > > >>>>  kasan_set_track+0x1c/0x30
> > > >>>>  kasan_set_free_info+0x20/0x40
> > > >>>>  __kasan_slab_free+0x151/0x180
> > > >>>>  kfree+0xba/0x680
> > > >>>>  cgroup_file_release+0x5c/0xe0
> > > >>>>  kernfs_drain_open_files+0x122/0x1e0
> > > >>>>  kernfs_drain+0xff/0x1e0
> > > >>>>  __kernfs_remove.part.0+0x1d1/0x3b0
> > > >>>>  kernfs_remove_by_name_ns+0x89/0xf0
> > > >>>>  cgroup_addrm_files+0x393/0x3d0
> > > >>>>  css_clear_dir+0x8f/0x120
> > > >>>>  kill_css+0x41/0xd0
> > > >>>>  cgroup_destroy_locked+0x166/0x300
> > > >>>>  cgroup_rmdir+0x37/0x140
> > > >>>>  kernfs_iop_rmdir+0xbb/0xf0
> > > >>>>  vfs_rmdir.part.0+0xa5/0x230
> > > >>>>  do_rmdir+0x2e0/0x320
> > > >>>>  __x64_sys_unlinkat+0x99/0xc0
> > > >>>>  do_syscall_64+0x33/0x40
> > > >>>>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > > >>>> ==================================================================
> > > >>>>
> > > >>>> When epoll() is used, wake_up_pollfree() empties the waitqueue and
> > > >>>> sets wait_queue_head to NULL before the psi trigger's waitqueue is
> > > >>>> freed. But this doesn't work when poll() is used, which leads to a
> > > >>>> UAF in poll_freewait() as follows:
> > > >>>>
> > > >>>> (cgroup_rmdir)                      |
> > > >>>> psi_trigger_destroy                 |
> > > >>>>   wake_up_pollfree(&t->event_wait)  |
> > > >>>>    synchronize_rcu();               |
> > > >>>>     kfree(t)                        |
> > > >>>>                                     | (poll_freewait)
> > > >>>>                                     |   free_poll_entry(pwq->inline_entries + i)
> > > >>>>                                     |     remove_wait_queue(entry->wait_address)
> > > >>>>                                     |       spin_lock_irqsave(&wq_head->lock)
> > > >>>>
> > > >>>> entry->wait_address in poll_freewait() is t->event_wait in
> > > >>>> cgroup_rmdir(). t->event_wait is freed in psi_trigger_destroy()
> > > >>>> before poll_freewait() is called, so wq_head in poll_freewait() has
> > > >>>> already been freed, which leads to a UAF.
> > > >
> > > > Hi Lu,
> > > > Could you please share your reproducer along with the kernel config
> > > > you used? I'm trying to reproduce this UAF but every time I delete the
> > > > cgroup being polled, poll() simply returns POLLERR.
> > > > Thanks,
> > > > Suren.
> > > >
> > > >>>>
> > > >>>> A similar problem for epoll() was fixed by commit c2dbe32d5db5
> > > >>>> ("sched/psi: Fix use-after-free in ep_remove_wait_queue()").
> > > >>>> The epoll wakeup function ep_poll_callback() empties the waitqueue
> > > >>>> and sets wait_queue_head to NULL when pollflags contains POLLFREE,
> > > >>>> and ep_remove_wait_queue() checks whether pwq->whead is NULL before
> > > >>>> calling remove_wait_queue(), which fixes the UAF bug in
> > > >>>> ep_remove_wait_queue().
> > > >>>>
> > > >>>> But the poll wakeup function pollwake() doesn't do that. To fix the
> > > >>>> problem, we empty the waitqueue and set wait_address to NULL in
> > > >>>> pollwake() when key is POLLFREE, and skip remove_wait_queue() in
> > > >>>> free_poll_entry() when wait_address is NULL, which is similar to
> > > >>>> epoll().
> > > >>>>
> > > >>>> Fixes: 0e94682b73bf ("psi: introduce psi monitor")
> > > >>>> Suggested-by: Suren Baghdasaryan
> > > >>>> Link: https://lore.kernel.org/all/CAJuCfpEoCRHkJF-=1Go9E94wchB4BzwQ1E3vHGWxNe+tEmSJoA@mail.gmail.com/#t
> > > >>>> Signed-off-by: Lu Jialin
> > > >>>> ---
> > > >>>> v2: correct commit msg and title suggested by Suren Baghdasaryan
> > > >>>> ---
> > > >>>>  fs/select.c | 20 +++++++++++++++++++-
> > > >>>>  1 file changed, 19 insertions(+), 1 deletion(-)
> > > >>>>
> > > >>>> diff --git a/fs/select.c b/fs/select.c
> > > >>>> index 0ee55af1a55c..e64c7b4e9959 100644
> > > >>>> --- a/fs/select.c
> > > >>>> +++ b/fs/select.c
> > > >>>> @@ -132,7 +132,17 @@ EXPORT_SYMBOL(poll_initwait);
> > > >>>>
> > > >>>>  static void free_poll_entry(struct poll_table_entry *entry)
> > > >>>>  {
> > > >>>> -       remove_wait_queue(entry->wait_address, &entry->wait);
> > > >>>> +       wait_queue_head_t *whead;
> > > >>>> +
> > > >>>> +       rcu_read_lock();
> > > >>>> +       /* If it is cleared by POLLFREE, it should be rcu-safe.
> > > >>>> +        * If we read NULL we need a barrier paired with smp_store_release()
> > > >>>> +        * in pollwake().
> > > >>>> +        */
> > > >>>> +       whead = smp_load_acquire(&entry->wait_address);
> > > >>>> +       if (whead)
> > > >>>> +               remove_wait_queue(whead, &entry->wait);
> > > >>>> +       rcu_read_unlock();
> > > >>>>         fput(entry->filp);
> > > >>>>  }
> > > >>>>
> > > >>>> @@ -215,6 +225,14 @@ static int pollwake(wait_queue_entry_t *wait, unsigned mode, int sync, void *key
> > > >>>>         entry = container_of(wait, struct poll_table_entry, wait);
> > > >>>>         if (key && !(key_to_poll(key) & entry->key))
> > > >>>>                 return 0;
> > > >>>> +       if (key_to_poll(key) & POLLFREE) {
> > > >>>> +               list_del_init(&wait->entry);
> > > >>>> +               /* wait_address !=NULL protects us from the race with
> > > >>>> +                * poll_freewait().
> > > >>>> +                */
> > > >>>> +               smp_store_release(&entry->wait_address, NULL);
> > > >>>> +               return 0;
> > > >>>> +       }
> > > >>>>         return __pollwake(wait, mode, sync, key);
> > > >>>
> > > >>> I don't understand why this patch is needed.
> > > >>>
> > > >>> The last time I looked at POLLFREE, it is only needed because of asynchronous
> > > >>> polls. See my explanation in the commit message of commit 50252e4b5e989ce6.
> > > >>
> > > >> Ah, I missed that. Thanks for the correction.
> > > >>
> > > >>>
> > > >>> In summary, POLLFREE solves the problem of polled waitqueues whose lifetime is
> > > >>> tied to the current task rather than to the file being polled. Also refer to
> > > >>> the comment above wake_up_pollfree(), which mentions this.
> > > >>>
> > > >>> fs/select.c is synchronous polling, not asynchronous. Therefore, it should not
> > > >>> need to handle POLLFREE.
> > > >>>
> > > >>> If there's actually a bug here, most likely it's a bug in psi_trigger_poll()
> > > >>> where it is using a waitqueue whose lifetime is tied to neither the current task
> > > >>> nor the file being polled. That needs to be fixed.
> > > >>
> > > >> Yeah. We discussed this issue in
> > > >> https://lore.kernel.org/all/CAJuCfpFb0J5ZwO6kncjRG0_4jQLXUy-_dicpH5uGiWP8aKYEJQ@mail.gmail.com
> > > >> and the root cause is that cgroup_file_release() where
> > > >> psi_trigger_destroy() is called is not tied to the cgroup file's real
> > > >> lifetime (see my analysis here:
> > > >> https://lore.kernel.org/all/CAJuCfpFZ3B4530TgsSHqp5F_gwfrDujwRYewKReJru==MdEHQg@mail.gmail.com/#t).
> > > >> I guess it's time to do a deeper surgery and figure out a way to call
> > > >> psi_trigger_destroy() when the polled cgroup file is actually being
> > > >> destroyed. I'll take a closer look into this later today.
> > > >> A fix will likely require some cgroup or kernfs code changes, so
> > > >> CC'ing Tejun for visibility.
> > > >> Thanks,
> > > >> Suren.
> > > >>
> > > >>>
> > > >>> - Eric
> > > > .
> > > >
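
The lifetime problem described in the Jun 23 analysis above (release() runs once on drain while a poller is still registered, and nobody is told when the last user goes away) can be modelled in user space. The following is only a rough single-threaded sketch under stated assumptions, not the posted kernel patches: struct trigger, trigger_get(), trigger_put() and both teardown_safe_*() helpers are hypothetical names, a "freed" flag stands in for kfree() so the state can be inspected safely, and the assumption that each user pins the trigger with its own reference is a simplification of what a final_release()-style notification might allow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a psi trigger; NOT the kernel's struct psi_trigger. */
struct trigger {
	int  refs;	/* users that may still touch the wait queue */
	bool freed;	/* set instead of calling free(), so it can be inspected */
};

static void trigger_init(struct trigger *t)
{
	t->refs = 1;	/* reference owned by the open file */
	t->freed = false;
}

static void trigger_get(struct trigger *t)
{
	assert(!t->freed);
	t->refs++;
}

/* Drop one reference; the trigger is torn down only when the last one goes. */
static void trigger_put(struct trigger *t)
{
	assert(t->refs > 0);
	if (--t->refs == 0)
		t->freed = true;
}

/*
 * Model of the reported behaviour: draining the file on rmdir runs the
 * release step once and destroys the trigger, although a poller is still
 * registered on its wait queue. Returns whether poller teardown is safe.
 */
bool teardown_safe_release_once(void)
{
	struct trigger t;

	trigger_init(&t);	/* open + write creates the trigger */
	/* poll() registers on the wait queue but takes no reference */
	t.freed = true;		/* drain -> release(): trigger destroyed */
	return !t.freed;	/* poller teardown would touch freed memory */
}

/*
 * Model of a refcounted scheme (an assumption, see the lead-in): the trigger
 * is destroyed only once the last user has dropped its reference.
 */
bool teardown_safe_refcounted(void)
{
	struct trigger t;
	bool safe;

	trigger_init(&t);	/* reference held by the open file */
	trigger_get(&t);	/* the poller pins the trigger while registered */
	trigger_put(&t);	/* drain drops only the file's reference */
	safe = !t.freed;	/* poller teardown still sees a live trigger */
	trigger_put(&t);	/* last user gone: the "final release" point */
	assert(t.freed);
	return safe;
}
```

Calling the two helpers from a small main() gives false for the release-once model and true for the refcounted one, which is the difference between the KASAN report quoted above and a clean poll_freewait().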