Date: Fri, 13 Oct 2023 10:24:19 +0200
From: Christian Brauner
To: Dan Clash
Cc: audit@vger.kernel.org, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org, paul@paul-moore.com, axboe@kernel.dk, linux-fsdevel@vger.kernel.org, dan.clash@microsoft.com
Subject: Re: [PATCH] audit,io_uring: io_uring openat triggers audit reference count underflow
Message-ID: <20231013-insofern-gegolten-75ca48b24cf5@brauner>
In-Reply-To: <20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Thu, Oct 12, 2023 at 02:55:18PM -0700, Dan Clash wrote:
> An io_uring openat operation can update an audit reference count
> from multiple threads resulting in the call trace below.
>
> A call to io_uring_submit() with a single openat op with a flag of
> IOSQE_ASYNC results in the following reference count updates.
>
> The first part of the system call performs two increments that do not race.
>
> do_syscall_64()
>   __do_sys_io_uring_enter()
>     io_submit_sqes()
>       io_openat_prep()
>         __io_openat_prep()
>           getname()
>             getname_flags()       /* update 1 (increment) */
>               __audit_getname()   /* update 2 (increment) */
>
> The openat op is queued to an io_uring worker thread which starts the
> opportunity for a race. The system call exit performs one decrement.
>
> do_syscall_64()
>   syscall_exit_to_user_mode()
>     syscall_exit_to_user_mode_prepare()
>       __audit_syscall_exit()
>         audit_reset_context()
>           putname()               /* update 3 (decrement) */
>
> The io_uring worker thread performs one increment and two decrements.
> These updates can race with the system call decrement.
>
> io_wqe_worker()
>   io_worker_handle_work()
>     io_wq_submit_work()
>       io_issue_sqe()
>         io_openat()
>           io_openat2()
>             do_filp_open()
>               path_openat()
>                 __audit_inode()   /* update 4 (increment) */
>             putname()             /* update 5 (decrement) */
>         __audit_uring_exit()
>           audit_reset_context()
>             putname()             /* update 6 (decrement) */
>
> The fix is to change the refcnt member of struct audit_names
> from int to atomic_t.
>
> kernel BUG at fs/namei.c:262!
> Call Trace:
> ...
> ? putname+0x68/0x70
> audit_reset_context.part.0.constprop.0+0xe1/0x300
> __audit_uring_exit+0xda/0x1c0
> io_issue_sqe+0x1f3/0x450
> ? lock_timer_base+0x3b/0xd0
> io_wq_submit_work+0x8d/0x2b0
> ? __try_to_del_timer_sync+0x67/0xa0
> io_worker_handle_work+0x17c/0x2b0
> io_wqe_worker+0x10a/0x350
>
> Cc:
> Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
> Signed-off-by: Dan Clash
> ---
>  fs/namei.c         | 9 +++++----
>  include/linux/fs.h | 2 +-
>  kernel/auditsc.c   | 8 ++++----
>  3 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 567ee547492b..94565bd7e73f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -188,7 +188,7 @@ getname_flags(const char __user *filename, int flags, int *empty)
>  		}
>  	}
>
> -	result->refcnt = 1;
> +	atomic_set(&result->refcnt, 1);
>  	/* The empty path is special. */
>  	if (unlikely(!len)) {
>  		if (empty)
> @@ -249,7 +249,7 @@ getname_kernel(const char * filename)
>  	memcpy((char *)result->name, filename, len);
>  	result->uptr = NULL;
>  	result->aname = NULL;
> -	result->refcnt = 1;
> +	atomic_set(&result->refcnt, 1);
>  	audit_getname(result);
>
>  	return result;
> @@ -261,9 +261,10 @@ void putname(struct filename *name)
>  	if (IS_ERR(name))
>  		return;
>
> -	BUG_ON(name->refcnt <= 0);
> +	if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
> +		return;
>
> -	if (--name->refcnt > 0)
> +	if (!atomic_dec_and_test(&name->refcnt))
>  		return;

Fine by me. I'd write this as:

	count = atomic_dec_if_positive(&name->refcnt);
	if (WARN_ON_ONCE(unlikely(count < 0)))
		return;
	if (count > 0)
		return;
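
For anyone who wants to poke at the semantics outside the kernel, here is a rough user-space sketch of the suggestion above. It is an approximation only: it uses C11 <stdatomic.h> instead of the kernel's atomic_t, and the names name_ref, dec_if_positive() and name_put() are made up for illustration. The one kernel behaviour it relies on is that atomic_dec_if_positive() returns the old value minus one whether or not the counter was actually decremented.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct filename; only the refcount matters here. */
struct name_ref {
	atomic_int refcnt;
};

/*
 * User-space approximation of atomic_dec_if_positive(): decrement only if
 * the result stays >= 0, and return (old value - 1) in either case.
 */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;	/* counter is already <= 0: leave it untouched */
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}

enum put_result { PUT_UNDERFLOW, PUT_STILL_REFERENCED, PUT_LAST_REF };

/* Mirrors the three outcomes of the suggested putname() logic. */
static enum put_result name_put(struct name_ref *n)
{
	int count = dec_if_positive(&n->refcnt);

	if (count < 0)
		return PUT_UNDERFLOW;		/* the kernel version would WARN_ON_ONCE() */
	if (count > 0)
		return PUT_STILL_REFERENCED;	/* other holders remain */
	return PUT_LAST_REF;			/* caller may free the object */
}
```

The point of folding the check and the decrement into one operation is that no other thread can slip in between "is it still positive?" and "drop one reference", which the separate atomic_read() plus atomic_dec_and_test() pair in the patch does not guarantee.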