References: <20230614070733.113068-1-lujialin4@huawei.com> <20230614174004.GC1146@sol.localdomain>
From: Suren Baghdasaryan
Date: Thu, 15 Jun 2023 16:13:22 -0700
Subject: Re: [PATCH v2] poll: Fix use-after-free in poll_freewait()
To: Eric Biggers
Cc: Tejun Heo, Lu Jialin, Johannes Weiner, Andrew Morton, Alexander Viro, Christian Brauner, Oleg Nesterov, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 14, 2023 at 11:19 AM Suren Baghdasaryan wrote:
>
> On Wed, Jun 14, 2023 at 10:40 AM Eric Biggers wrote:
> >
> > On Wed, Jun 14, 2023 at 03:07:33PM +0800, Lu Jialin wrote:
> > > We found a UAF bug in remove_wait_queue as follows:
> > >
> > > ==================================================================
> > > BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x71/0xe0
> > > Write of size 4 at addr ffff8881150d7b28 by task psi_trigger/15306
> > > Call Trace:
> > >  dump_stack+0x9c/0xd3
> > >  print_address_description.constprop.0+0x19/0x170
> > >  __kasan_report.cold+0x6c/0x84
> > >  kasan_report+0x3a/0x50
> > >  check_memory_region+0xfd/0x1f0
> > >  _raw_spin_lock_irqsave+0x71/0xe0
> > >  remove_wait_queue+0x26/0xc0
> > >  poll_freewait+0x6b/0x120
> > >  do_sys_poll+0x305/0x400
> > >  do_syscall_64+0x33/0x40
> > >  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > >
> > > Allocated by task 15306:
> > >  kasan_save_stack+0x1b/0x40
> > >  __kasan_kmalloc.constprop.0+0xb5/0xe0
> > >  psi_trigger_create.part.0+0xfc/0x450
> > >  cgroup_pressure_write+0xfc/0x3b0
> > >  cgroup_file_write+0x1b3/0x390
> > >  kernfs_fop_write_iter+0x224/0x2e0
> > >  new_sync_write+0x2ac/0x3a0
> > >  vfs_write+0x365/0x430
> > >  ksys_write+0xd5/0x1b0
> > >  do_syscall_64+0x33/0x40
> > >  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > >
> > > Freed by task 15850:
> > >  kasan_save_stack+0x1b/0x40
> > >  kasan_set_track+0x1c/0x30
> > >  kasan_set_free_info+0x20/0x40
> > >  __kasan_slab_free+0x151/0x180
> > >  kfree+0xba/0x680
> > >  cgroup_file_release+0x5c/0xe0
> > >  kernfs_drain_open_files+0x122/0x1e0
> > >  kernfs_drain+0xff/0x1e0
> > >  __kernfs_remove.part.0+0x1d1/0x3b0
> > >  kernfs_remove_by_name_ns+0x89/0xf0
> > >  cgroup_addrm_files+0x393/0x3d0
> > >  css_clear_dir+0x8f/0x120
> > >  kill_css+0x41/0xd0
> > >  cgroup_destroy_locked+0x166/0x300
> > >  cgroup_rmdir+0x37/0x140
> > >  kernfs_iop_rmdir+0xbb/0xf0
> > >  vfs_rmdir.part.0+0xa5/0x230
> > >  do_rmdir+0x2e0/0x320
> > >  __x64_sys_unlinkat+0x99/0xc0
> > >  do_syscall_64+0x33/0x40
> > >  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > > ==================================================================
> > >
> > > When using epoll(), wake_up_pollfree() empties the waitqueue and sets
> > > wait_queue_head to NULL before the psi trigger's waitqueue is freed.
> > > But it doesn't work when using poll(), which leads to a UAF in
> > > poll_freewait() as follows:
> > >
> > > (cgroup_rmdir)                      |
> > > psi_trigger_destroy                 |
> > >   wake_up_pollfree(&t->event_wait)  |
> > >    synchronize_rcu();               |
> > >     kfree(t)                        |
> > >                                     | (poll_freewait)
> > >                                     |   free_poll_entry(pwq->inline_entries + i)
> > >                                     |     remove_wait_queue(entry->wait_address)
> > >                                     |       spin_lock_irqsave(&wq_head->lock)
> > >
> > > entry->wait_address in poll_freewait() is t->event_wait in cgroup_rmdir().
> > > t->event_wait is freed in psi_trigger_destroy() before poll_freewait() is
> > > called, therefore wq_head in poll_freewait() has already been freed, which
> > > leads to a UAF.

Hi Lu,
Could you please share your reproducer along with the kernel config
you used? I'm trying to reproduce this UAF but every time I delete the
cgroup being polled, poll() simply returns POLLERR.
Thanks,
Suren.

> > >
> > > A similar problem for epoll() was fixed by commit c2dbe32d5db5
> > > ("sched/psi: Fix use-after-free in ep_remove_wait_queue()").
> > > The epoll wakeup function ep_poll_callback() empties the waitqueue and
> > > sets wait_queue_head to NULL when pollflags is POLLFREE, and
> > > ep_remove_wait_queue() checks whether pwq->whead is NULL before calling
> > > remove_wait_queue(), which fixes the UAF bug in ep_remove_wait_queue().
> > >
> > > But the poll wakeup function pollwake() doesn't do that. To fix the
> > > problem, we empty the waitqueue and set wait_address to NULL in pollwake()
> > > when key is POLLFREE, and check wait_address for NULL before calling
> > > remove_wait_queue(), which is similar to epoll().
> > >
> > > Fixes: 0e94682b73bf ("psi: introduce psi monitor")
> > > Suggested-by: Suren Baghdasaryan
> > > Link: https://lore.kernel.org/all/CAJuCfpEoCRHkJF-=1Go9E94wchB4BzwQ1E3vHGWxNe+tEmSJoA@mail.gmail.com/#t
> > > Signed-off-by: Lu Jialin
> > > ---
> > > v2: correct commit msg and title suggested by Suren Baghdasaryan
> > > ---
> > >  fs/select.c | 20 +++++++++++++++++++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/select.c b/fs/select.c
> > > index 0ee55af1a55c..e64c7b4e9959 100644
> > > --- a/fs/select.c
> > > +++ b/fs/select.c
> > > @@ -132,7 +132,17 @@ EXPORT_SYMBOL(poll_initwait);
> > >
> > >  static void free_poll_entry(struct poll_table_entry *entry)
> > >  {
> > > -       remove_wait_queue(entry->wait_address, &entry->wait);
> > > +       wait_queue_head_t *whead;
> > > +
> > > +       rcu_read_lock();
> > > +       /* If it is cleared by POLLFREE, it should be rcu-safe.
> > > +        * If we read NULL we need a barrier paired with smp_store_release()
> > > +        * in pollwake().
> > > +        */
> > > +       whead = smp_load_acquire(&entry->wait_address);
> > > +       if (whead)
> > > +               remove_wait_queue(whead, &entry->wait);
> > > +       rcu_read_unlock();
> > >         fput(entry->filp);
> > >  }
> > >
> > > @@ -215,6 +225,14 @@ static int pollwake(wait_queue_entry_t *wait, unsigned mode, int sync, void *key
> > >         entry = container_of(wait, struct poll_table_entry, wait);
> > >         if (key && !(key_to_poll(key) & entry->key))
> > >                 return 0;
> > > +       if (key_to_poll(key) & POLLFREE) {
> > > +               list_del_init(&wait->entry);
> > > +               /* wait_address !=NULL protects us from the race with
> > > +                * poll_freewait().
> > > +                */
> > > +               smp_store_release(&entry->wait_address, NULL);
> > > +               return 0;
> > > +       }
> > >         return __pollwake(wait, mode, sync, key);
> >
> > I don't understand why this patch is needed.
> >
> > The last time I looked at POLLFREE, it is only needed because of asynchronous
> > polls. See my explanation in the commit message of commit 50252e4b5e989ce6.
>
> Ah, I missed that. Thanks for the correction.
>
> >
> > In summary, POLLFREE solves the problem of polled waitqueues whose lifetime is
> > tied to the current task rather than to the file being polled. Also refer to
> > the comment above wake_up_pollfree(), which mentions this.
> >
> > fs/select.c is synchronous polling, not asynchronous. Therefore, it should not
> > need to handle POLLFREE.
> >
> > If there's actually a bug here, most likely it's a bug in psi_trigger_poll()
> > where it is using a waitqueue whose lifetime is tied to neither the current task
> > nor the file being polled. That needs to be fixed.
>
> Yeah. We discussed this issue in
> https://lore.kernel.org/all/CAJuCfpFb0J5ZwO6kncjRG0_4jQLXUy-_dicpH5uGiWP8aKYEJQ@mail.gmail.com
> and the root cause is that cgroup_file_release() where
> psi_trigger_destroy() is called is not tied to the cgroup file's real
> lifetime (see my analysis here:
> https://lore.kernel.org/all/CAJuCfpFZ3B4530TgsSHqp5F_gwfrDujwRYewKReJru==MdEHQg@mail.gmail.com/#t).
> I guess it's time to do a deeper surgery and figure out a way to call
> psi_trigger_destroy() when the polled cgroup file is actually being
> destroyed. I'll take a closer look into this later today.
> A fix will likely require some cgroup or kernel code changes, so
> CC'ing Tejun for visibility.
> Thanks,
> Suren.
>
> >
> > - Eric
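
The handoff that the quoted v2 patch relies on can be sketched in userspace
with C11 atomics. This is only a toy model under stated assumptions: the
toy_* names and structures are hypothetical stand-ins, not kernel code, and
the model is single-threaded, so it shows the ordering of "unlink, then
publish NULL with release" against "acquire-load, then skip if NULL" but
says nothing about RCU grace periods or about whether fs/select.c is the
right place for the fix, which is the point Eric disputes above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct toy_wait_head {
    int nr_queued;                 /* stands in for the list of waiters */
};

struct toy_poll_entry {
    /* Becomes NULL once the modelled POLLFREE wakeup has run. */
    _Atomic(struct toy_wait_head *) wait_address;
};

/* Models queueing the entry on the wait head (add_wait_queue()-like). */
static inline void toy_entry_init(struct toy_poll_entry *e,
                                  struct toy_wait_head *h)
{
    assert(h != NULL);
    atomic_init(&e->wait_address, h);
    h->nr_queued++;
}

/* Waker side, modelled on the POLLFREE branch the patch adds to pollwake(). */
static inline void toy_pollfree(struct toy_poll_entry *e)
{
    struct toy_wait_head *h =
        atomic_load_explicit(&e->wait_address, memory_order_relaxed);

    if (h)
        h->nr_queued--;            /* models list_del_init(&wait->entry) */
    /* Publish NULL last, so a reader that sees NULL also sees the unlink. */
    atomic_store_explicit(&e->wait_address, NULL, memory_order_release);
}

/*
 * Waiter side, modelled on the patched free_poll_entry().
 * Returns true if it dereferenced the head, false if it skipped it.
 */
static inline bool toy_free_poll_entry(struct toy_poll_entry *e)
{
    struct toy_wait_head *h =
        atomic_load_explicit(&e->wait_address, memory_order_acquire);

    if (!h)
        return false;              /* head may already be freed: do not touch */
    h->nr_queued--;                /* models remove_wait_queue() */
    return true;
}
```

Calling toy_free_poll_entry() on a freshly initialised entry touches the
head and returns true; calling it after toy_pollfree() returns false and
leaves the head alone, which is the property the patch needs once the head
has been freed.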