Date: Fri, 08 Dec 2023 17:39:01 +0000
To: Alice Ryhl
From: Benno Lossin
Cc: Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
 Björn Roy Baron, Andreas Hindborg, Peter Zijlstra, Alexander Viro,
 Christian Brauner, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
 Martijn Coenen, Joel Fernandes, Carlos Llamas, Suren Baghdasaryan,
 Dan Williams, Kees Cook, Matthew Wilcox, Thomas Gleixner, Daniel Xu,
 linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
 linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 6/7] rust: file: add `DeferredFdCloser`
In-Reply-To: <20231206-alice-file-v2-6-af617c0d9d94@google.com>
References: <20231206-alice-file-v2-0-af617c0d9d94@google.com>
 <20231206-alice-file-v2-6-af617c0d9d94@google.com>

On 12/6/23 12:59, Alice Ryhl wrote:
> +    /// Schedule a task work that closes the file descriptor when this task returns to userspace.
> +    ///
> +    /// Fails if this is called from a context where we cannot run work when returning to
> +    /// userspace. (E.g., from a kthread.)
> +    pub fn close_fd(self, fd: u32) -> Result<(), DeferredFdCloseError> {
> +        use bindings::task_work_notify_mode_TWA_RESUME as TWA_RESUME;
> +
> +        // In this method, we schedule the task work before closing the file. This is because
> +        // scheduling a task work is fallible, and we need to know whether it will fail before we
> +        // attempt to close the file.
> +
> +        // SAFETY: Getting a pointer to current is always safe.
> +        let current = unsafe { bindings::get_current() };
> +
> +        // SAFETY: Accessing the `flags` field of `current` is always safe.
> +        let is_kthread = (unsafe { (*current).flags } & bindings::PF_KTHREAD) != 0;

Since Boqun brought to my attention that we already have a wrapper for
`get_current()`, how about you use it here as well?

> +        if is_kthread {
> +            return Err(DeferredFdCloseError::TaskWorkUnavailable);
> +        }
> +
> +        // This disables the destructor of the box, so the allocation is not cleaned up
> +        // automatically below.
> +        let inner = Box::into_raw(self.inner);

Importantly this also lifts the uniqueness requirement (maybe add this
to the comment?).
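For example something along these lines (just a sketch, the exact
wording is of course up to you):

        // This disables the destructor of the box, so the allocation is not cleaned up
        // automatically below. It also lifts the uniqueness requirement of `Box`, so it is
        // fine that both this function and the scheduled task work access the allocation
        // through pointers derived from `inner`.
        let inner = Box::into_raw(self.inner);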
> +
> +        // The `callback_head` field is first in the struct, so this cast correctly gives us a
> +        // pointer to the field.
> +        let callback_head = inner.cast::<bindings::callback_head>();
> +        // SAFETY: This pointer offset operation does not go out-of-bounds.
> +        let file_field = unsafe { core::ptr::addr_of_mut!((*inner).file) };
> +
> +        // SAFETY: The `callback_head` pointer is compatible with the `do_close_fd` method.

Also, `callback_head` is valid, since it is derived from...

> +        unsafe { bindings::init_task_work(callback_head, Some(Self::do_close_fd)) };
> +        // SAFETY: The `callback_head` pointer points at a valid and fully initialized task work
> +        // that is ready to be scheduled.
> +        //
> +        // If the task work gets scheduled as-is, then it will be a no-op. However, we will update
> +        // the file pointer below to tell it which file to fput.
> +        let res = unsafe { bindings::task_work_add(current, callback_head, TWA_RESUME) };
> +
> +        if res != 0 {
> +            // SAFETY: Scheduling the task work failed, so we still have ownership of the box, so
> +            // we may destroy it.
> +            unsafe { drop(Box::from_raw(inner)) };
> +
> +            return Err(DeferredFdCloseError::TaskWorkUnavailable);

Just curious, what could make the `task_work_add` call fail? I imagine
an OOM situation, but is that all?

> +        }
> +
> +        // SAFETY: Just an FFI call. This is safe no matter what `fd` is.

I took a look at the C code and there I found this comment:

    /*
     * variant of close_fd that gets a ref on the file for later fput.
     * The caller must ensure that filp_close() called on the file.
     */

And while you do call `filp_close` later, this seems like a safety
requirement to me. Also, you do not call it when `file` is null, which
I imagine to be fine, but I do not know that since the C comment does
not cover that case.

> +        let file = unsafe { bindings::close_fd_get_file(fd) };
> +        if file.is_null() {
> +            // We don't clean up the task work since that might be expensive if the task work queue
> +            // is long. Just let it execute and let it clean up for itself.
> +            return Err(DeferredFdCloseError::BadFd);
> +        }
> +
> +        // SAFETY: The `file` pointer points at a valid file.
> +        unsafe { bindings::get_file(file) };
> +
> +        // SAFETY: Due to the above `get_file`, even if the current task holds an `fdget` to
> +        // this file right now, the refcount will not drop to zero until after it is released
> +        // with `fdput`. This is because when using `fdget`, you must always use `fdput` before

Shouldn't this be "the refcount will not drop to zero until after it is
released with `fput`."?

Why is this the SAFETY comment for `filp_close`? I am not understanding
the requirement on that function that needs this. This seems more a
justification for accessing `file` inside `do_close_fd`. In which case
I think it would be better to make it a type invariant of
`DeferredFdCloserInner`.

> +        // returning to userspace, and our task work runs after any `fdget` users have returned
> +        // to userspace.
> +        //
> +        // Note: fl_owner_t is currently a void pointer.
> +        unsafe { bindings::filp_close(file, (*current).files as bindings::fl_owner_t) };
> +
> +        // We update the file pointer that the task work is supposed to fput.
> +        //
> +        // SAFETY: Task works are executed on the current thread once we return to userspace, so
> +        // this write is guaranteed to happen before `do_close_fd` is called, which means that a
> +        // race is not possible here.
> +        //
> +        // It's okay to pass this pointer to the task work, since we just acquired a refcount with
> +        // the previous call to `get_file`. Furthermore, the refcount will not drop to zero during
> +        // an `fdget` call, since we defer the `fput` until after returning to userspace.
> +        unsafe { *file_field = file };

A synchronization question: who guarantees that this write is actually
available to the cpu that executes `do_close_fd`? Is there some
synchronization run when returning to userspace?

> +
> +        Ok(())
> +    }
> +
> +    // SAFETY: This function is an implementation detail of `close_fd`, so its safety comments
> +    // should be read in extension of that method.

Why not use this?:
- `inner` is a valid pointer to the `callback_head` field of a valid
  `DeferredFdCloserInner`.
- `inner` has exclusive access to the pointee and owns the allocation.
- `inner` originates from a call to `Box::into_raw`.

> +    unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) {
> +        // SAFETY: In `close_fd` we use this method together with a pointer that originates from a
> +        // `Box<DeferredFdCloserInner>`, and we have just been given ownership of that allocation.
> +        let inner = unsafe { Box::from_raw(inner as *mut DeferredFdCloserInner) };

Use `.cast()`.
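Taken together, roughly this is what I have in mind (untested, and the
exact wording is of course up to you):

    // SAFETY:
    // - `inner` is a valid pointer to the `callback_head` field of a valid
    //   `DeferredFdCloserInner`.
    // - `inner` has exclusive access to the pointee and owns the allocation.
    // - `inner` originates from a call to `Box::into_raw`.
    unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) {
        // SAFETY: By the safety requirements above, `inner` points at the first field of a
        // `DeferredFdCloserInner` that came from `Box::into_raw`, so we may take back
        // ownership of the allocation here.
        let inner = unsafe { Box::from_raw(inner.cast::<DeferredFdCloserInner>()) };
        // ... rest as before ...
    }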
> +        if !inner.file.is_null() {
> +            // SAFETY: This drops a refcount we acquired in `close_fd`. Since this callback runs in
> +            // a task work after we return to userspace, it is guaranteed that the current thread
> +            // doesn't hold this file with `fdget`, as `fdget` must be released before returning to
> +            // userspace.
> +            unsafe { bindings::fput(inner.file) };
> +        }
> +        // Free the allocation.
> +        drop(inner);
> +    }
> +}
> +
> +/// Represents a failure to close an fd in a deferred manner.
> +#[derive(Copy, Clone, Eq, PartialEq)]
> +pub enum DeferredFdCloseError {
> +    /// Closing the fd failed because we were unable to schedule a task work.
> +    TaskWorkUnavailable,
> +    /// Closing the fd failed because the fd does not exist.
> +    BadFd,
> +}
> +
> +impl From<DeferredFdCloseError> for Error {
> +    fn from(err: DeferredFdCloseError) -> Error {
> +        match err {
> +            DeferredFdCloseError::TaskWorkUnavailable => ESRCH,

This error reads "No such process", and I am not sure that is the best
way to express the problem in that situation. I took a quick look at
the other error codes, but could not find a better fit. Do you have any
better ideas? Or is this the error that C binder uses?

-- 
Cheers,
Benno

> +            DeferredFdCloseError::BadFd => EBADF,
> +        }
> +    }
> +}