This patchset contains the file abstractions needed by the Rust
implementation of the Binder driver.
Please see the Rust Binder RFC for usage examples:
https://lore.kernel.org/rust-for-linux/[email protected]/
Users of "rust: file: add Rust abstraction for `struct file`":
[PATCH RFC 02/20] rust_binder: add binderfs support to Rust binder
[PATCH RFC 03/20] rust_binder: add threading support
Users of "rust: cred: add Rust abstraction for `struct cred`":
[PATCH RFC 05/20] rust_binder: add nodes and context managers
[PATCH RFC 06/20] rust_binder: add oneway transactions
[PATCH RFC 11/20] rust_binder: send nodes in transaction
[PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
Users of "rust: security: add abstraction for security_secid_to_secctx":
[PATCH RFC 06/20] rust_binder: add oneway transactions
Users of "rust: file: add `FileDescriptorReservation`":
[PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
[PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
Users of "rust: file: add kuid getters":
[PATCH RFC 05/20] rust_binder: add nodes and context managers
[PATCH RFC 06/20] rust_binder: add oneway transactions
Users of "rust: file: add `DeferredFdCloser`":
[PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
Users of "rust: file: add abstraction for `poll_table`":
[PATCH RFC 07/20] rust_binder: add epoll support
This patchset has some uses of read_volatile in place of READ_ONCE.
Please see the following rfc for context on this:
https://lore.kernel.org/all/[email protected]/
This was previously sent as an rfc:
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Alice Ryhl <[email protected]>
---
Alice Ryhl (4):
rust: security: add abstraction for security_secid_to_secctx
rust: file: add `Kuid` wrapper
rust: file: add `DeferredFdCloser`
rust: file: add abstraction for `poll_table`
Wedson Almeida Filho (3):
rust: file: add Rust abstraction for `struct file`
rust: cred: add Rust abstraction for `struct cred`
rust: file: add `FileDescriptorReservation`
rust/bindings/bindings_helper.h | 9 ++
rust/bindings/lib.rs | 1 +
rust/helpers.c | 94 +++++++++++
rust/kernel/cred.rs | 73 +++++++++
rust/kernel/file.rs | 345 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/file/poll_table.rs | 97 +++++++++++
rust/kernel/lib.rs | 3 +
rust/kernel/security.rs | 78 +++++++++
rust/kernel/sync/condvar.rs | 2 +-
rust/kernel/task.rs | 71 ++++++++-
10 files changed, 771 insertions(+), 2 deletions(-)
---
base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263
change-id: 20231123-alice-file-525b98e8a724
Best regards,
--
Alice Ryhl <[email protected]>
From: Wedson Almeida Filho <[email protected]>
Add a wrapper around `struct cred` called `Credential`, and provide
functionality to get the `Credential` associated with a `File`.
Rust Binder must check the credentials of processes when they attempt to
perform various operations, and these checks usually take a
`&Credential` as parameter. The security_binder_set_context_mgr function
would be one example. This patch is necessary to access these security_*
methods from Rust.
Signed-off-by: Wedson Almeida Filho <[email protected]>
Co-developed-by: Alice Ryhl <[email protected]>
Signed-off-by: Alice Ryhl <[email protected]>
---
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 13 +++++++++
rust/kernel/cred.rs | 64 +++++++++++++++++++++++++++++++++++++++++
rust/kernel/file.rs | 16 +++++++++++
rust/kernel/lib.rs | 1 +
5 files changed, 95 insertions(+)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index beed3ef1fbc3..6d1bd2229aab 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -7,6 +7,7 @@
*/
#include <kunit/test.h>
+#include <linux/cred.h>
#include <linux/errname.h>
#include <linux/file.h>
#include <linux/fs.h>
diff --git a/rust/helpers.c b/rust/helpers.c
index 03141a3608a4..10ed69f76424 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -23,6 +23,7 @@
#include <kunit/test-bug.h>
#include <linux/bug.h>
#include <linux/build_bug.h>
+#include <linux/cred.h>
#include <linux/err.h>
#include <linux/errname.h>
#include <linux/fs.h>
@@ -164,6 +165,18 @@ struct file *rust_helper_get_file(struct file *f)
}
EXPORT_SYMBOL_GPL(rust_helper_get_file);
+const struct cred *rust_helper_get_cred(const struct cred *cred)
+{
+ return get_cred(cred);
+}
+EXPORT_SYMBOL_GPL(rust_helper_get_cred);
+
+void rust_helper_put_cred(const struct cred *cred)
+{
+ put_cred(cred);
+}
+EXPORT_SYMBOL_GPL(rust_helper_put_cred);
+
/*
* `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
* use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
new file mode 100644
index 000000000000..497058ec89bb
--- /dev/null
+++ b/rust/kernel/cred.rs
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Credentials management.
+//!
+//! C header: [`include/linux/cred.h`](../../../../include/linux/cred.h)
+//!
+//! Reference: <https://www.kernel.org/doc/html/latest/security/credentials.html>
+
+use crate::{
+ bindings,
+ types::{AlwaysRefCounted, Opaque},
+};
+
+/// Wraps the kernel's `struct cred`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `get_cred` ensures that the
+/// allocation remains valid at least until the matching call to `put_cred`.
+#[repr(transparent)]
+pub struct Credential(pub(crate) Opaque<bindings::cred>);
+
+// SAFETY: By design, the only way to access a `Credential` is via an immutable reference or an
+// `ARef`. This means that the only situation in which a `Credential` can be accessed mutably is
+// when the refcount drops to zero and the destructor runs. It is safe for that to happen on any
+// thread, so it is ok for this type to be `Send`.
+unsafe impl Send for Credential {}
+
+// SAFETY: It's OK to access `Credential` through shared references from other threads because
+// we're either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for Credential {}
+
+impl Credential {
+ /// Creates a reference to a [`Credential`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
+ /// returned [`Credential`] reference.
+ pub unsafe fn from_ptr<'a>(ptr: *const bindings::cred) -> &'a Credential {
+ // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+ // `Credential` type being transparent makes the cast ok.
+ unsafe { &*ptr.cast() }
+ }
+
+ /// Returns the effective UID of the given credential.
+ pub fn euid(&self) -> bindings::kuid_t {
+ // SAFETY: By the type invariant, we know that `self.0` is valid.
+ unsafe { (*self.0.get()).euid }
+ }
+}
+
+// SAFETY: The type invariants guarantee that `Credential` is always ref-counted.
+unsafe impl AlwaysRefCounted for Credential {
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+ unsafe { bindings::get_cred(self.0.get()) };
+ }
+
+ unsafe fn dec_ref(obj: core::ptr::NonNull<Self>) {
+ // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+ unsafe { bindings::put_cred(obj.cast().as_ptr()) };
+ }
+}
diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
index ee4ec8b919af..f1f71c3d97e2 100644
--- a/rust/kernel/file.rs
+++ b/rust/kernel/file.rs
@@ -7,6 +7,7 @@
use crate::{
bindings,
+ cred::Credential,
error::{code::*, Error, Result},
types::{ARef, AlwaysRefCounted, Opaque},
};
@@ -138,6 +139,21 @@ pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
unsafe { &*ptr.cast() }
}
+ /// Returns the credentials of the task that originally opened the file.
+ pub fn cred(&self) -> &Credential {
+ // This `read_volatile` is intended to correspond to a READ_ONCE call.
+ //
+ // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
+ //
+ // TODO: Replace with `read_once` when available on the Rust side.
+ let ptr = unsafe { core::ptr::addr_of!((*self.0.get()).f_cred).read_volatile() };
+
+ // SAFETY: The signature of this function ensures that the caller will only access the
+ // returned credential while the file is still valid, and the credential must stay valid
+ // while the file is valid.
+ unsafe { Credential::from_ptr(ptr) }
+ }
+
/// Returns the flags associated with the file.
///
/// The flags are a combination of the constants in [`flags`].
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index ce9abceab784..097fe9bb93ed 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -33,6 +33,7 @@
#[cfg(not(testlib))]
mod allocator;
mod build_assert;
+pub mod cred;
pub mod error;
pub mod file;
pub mod init;
--
2.43.0.rc1.413.gea7ed67945-goog
Adds an abstraction for viewing the string representation of a security
context.
This is needed by Rust Binder because it has feature where a process can
view the string representation of the security context for incoming
transactions. The process can use that to authenticate incoming
transactions, and since the feature is provided by the kernel, the
process can trust that the security context is legitimate.
Signed-off-by: Alice Ryhl <[email protected]>
---
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 21 +++++++++
rust/kernel/cred.rs | 8 ++++
rust/kernel/lib.rs | 1 +
rust/kernel/security.rs | 78 +++++++++++++++++++++++++++++++++
5 files changed, 109 insertions(+)
create mode 100644 rust/kernel/security.rs
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 6d1bd2229aab..81b13a953eae 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -11,6 +11,7 @@
#include <linux/errname.h>
#include <linux/file.h>
#include <linux/fs.h>
+#include <linux/security.h>
#include <linux/slab.h>
#include <linux/refcount.h>
#include <linux/wait.h>
diff --git a/rust/helpers.c b/rust/helpers.c
index 10ed69f76424..fd633d9db79a 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -30,6 +30,7 @@
#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/sched/signal.h>
+#include <linux/security.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/workqueue.h>
@@ -177,6 +178,26 @@ void rust_helper_put_cred(const struct cred *cred)
}
EXPORT_SYMBOL_GPL(rust_helper_put_cred);
+#ifndef CONFIG_SECURITY
+void rust_helper_security_cred_getsecid(const struct cred *c, u32 *secid)
+{
+ security_cred_getsecid(c, secid);
+}
+EXPORT_SYMBOL_GPL(rust_helper_security_cred_getsecid);
+
+int rust_helper_security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
+{
+ return security_secid_to_secctx(secid, secdata, seclen);
+}
+EXPORT_SYMBOL_GPL(rust_helper_security_secid_to_secctx);
+
+void rust_helper_security_release_secctx(char *secdata, u32 seclen)
+{
+ security_release_secctx(secdata, seclen);
+}
+EXPORT_SYMBOL_GPL(rust_helper_security_release_secctx);
+#endif
+
/*
* `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
* use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
index 497058ec89bb..3794937b5294 100644
--- a/rust/kernel/cred.rs
+++ b/rust/kernel/cred.rs
@@ -43,6 +43,14 @@ pub unsafe fn from_ptr<'a>(ptr: *const bindings::cred) -> &'a Credential {
unsafe { &*ptr.cast() }
}
+ /// Get the id for this security context.
+ pub fn get_secid(&self) -> u32 {
+ let mut secid = 0;
+ // SAFETY: The invariants of this type ensures that the pointer is valid.
+ unsafe { bindings::security_cred_getsecid(self.0.get(), &mut secid) };
+ secid
+ }
+
/// Returns the effective UID of the given credential.
pub fn euid(&self) -> bindings::kuid_t {
// SAFETY: By the type invariant, we know that `self.0` is valid.
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 097fe9bb93ed..342cb02c495a 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -42,6 +42,7 @@
pub mod kunit;
pub mod prelude;
pub mod print;
+pub mod security;
mod static_assert;
#[doc(hidden)]
pub mod std_vendor;
diff --git a/rust/kernel/security.rs b/rust/kernel/security.rs
new file mode 100644
index 000000000000..69c10ed89a57
--- /dev/null
+++ b/rust/kernel/security.rs
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Linux Security Modules (LSM).
+//!
+//! C header: [`include/linux/security.h`](../../../../include/linux/security.h).
+
+use crate::{
+ bindings,
+ error::{to_result, Result},
+};
+
+/// A security context string.
+///
+/// The struct has the invariant that it always contains a valid security context.
+pub struct SecurityCtx {
+ secdata: *mut core::ffi::c_char,
+ seclen: usize,
+}
+
+impl SecurityCtx {
+ /// Get the security context given its id.
+ pub fn from_secid(secid: u32) -> Result<Self> {
+ let mut secdata = core::ptr::null_mut();
+ let mut seclen = 0;
+ // SAFETY: Just a C FFI call. The pointers are valid for writes.
+ unsafe {
+ to_result(bindings::security_secid_to_secctx(
+ secid,
+ &mut secdata,
+ &mut seclen,
+ ))?;
+ }
+
+ // If the above call did not fail, then we have a valid security
+ // context, so the invariants are not violated.
+ Ok(Self {
+ secdata,
+ seclen: usize::try_from(seclen).unwrap(),
+ })
+ }
+
+ /// Returns whether the security context is empty.
+ pub fn is_empty(&self) -> bool {
+ self.seclen == 0
+ }
+
+ /// Returns the length of this security context.
+ pub fn len(&self) -> usize {
+ self.seclen
+ }
+
+ /// Returns the bytes for this security context.
+ pub fn as_bytes(&self) -> &[u8] {
+ let mut ptr = self.secdata;
+ if ptr.is_null() {
+ // Many C APIs will use null pointers for strings of length zero, but
+ // `slice::from_raw_parts` doesn't allow the pointer to be null even if the length is
+ // zero. Replace the pointer with a dangling but non-null pointer in this case.
+ debug_assert_eq!(self.seclen, 0);
+ ptr = core::ptr::NonNull::dangling().as_ptr();
+ }
+
+ // SAFETY: The call to `security_secid_to_secctx` guarantees that the pointer is valid for
+ // `seclen` bytes. Furthermore, if the length is zero, then we have ensured that the
+ // pointer is not null.
+ unsafe { core::slice::from_raw_parts(ptr.cast(), self.seclen) }
+ }
+}
+
+impl Drop for SecurityCtx {
+ fn drop(&mut self) {
+ // SAFETY: This frees a pointer that came from a successful call to
+ // `security_secid_to_secctx`.
+ unsafe {
+ bindings::security_release_secctx(self.secdata, self.seclen as u32);
+ }
+ }
+}
--
2.43.0.rc1.413.gea7ed67945-goog
From: Wedson Almeida Filho <[email protected]>
Allow for the creation of a file descriptor in two steps: first, we
reserve a slot for it, then we commit or drop the reservation. The first
step may fail (e.g., the current process ran out of available slots),
but commit and drop never fail (and are mutually exclusive).
This is needed by Rust Binder when fds are sent from one process to
another. It has to be a two-step process to properly handle the case
where multiple fds are sent: The operation must fail or succeed
atomically, which we achieve by first reserving the fds we need, and
only installing the files once we have reserved enough fds to send the
files.
Fd reservations assume that the value of `current` does not change
between the call to get_unused_fd_flags and the call to fd_install (or
put_unused_fd). By not implementing the Send trait, this abstraction
ensures that the `FileDescriptorReservation` cannot be moved into a
different process.
Signed-off-by: Wedson Almeida Filho <[email protected]>
Co-developed-by: Alice Ryhl <[email protected]>
Signed-off-by: Alice Ryhl <[email protected]>
---
rust/kernel/file.rs | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
index f1f71c3d97e2..2186a6ea3f2f 100644
--- a/rust/kernel/file.rs
+++ b/rust/kernel/file.rs
@@ -11,7 +11,7 @@
error::{code::*, Error, Result},
types::{ARef, AlwaysRefCounted, Opaque},
};
-use core::ptr;
+use core::{marker::PhantomData, ptr};
/// Flags associated with a [`File`].
pub mod flags {
@@ -180,6 +180,68 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
}
}
+/// A file descriptor reservation.
+///
+/// This allows the creation of a file descriptor in two steps: first, we reserve a slot for it,
+/// then we commit or drop the reservation. The first step may fail (e.g., the current process ran
+/// out of available slots), but commit and drop never fail (and are mutually exclusive).
+///
+/// Dropping the reservation happens in the destructor of this type.
+///
+/// # Invariants
+///
+/// The fd stored in this struct must correspond to a reserved file descriptor of the current task.
+pub struct FileDescriptorReservation {
+ fd: u32,
+ /// Prevent values of this type from being moved to a different task.
+ ///
+ /// This is necessary because the C FFI calls assume that `current` is set to the task that
+ /// owns the fd in question.
+ _not_send_sync: PhantomData<*mut ()>,
+}
+
+impl FileDescriptorReservation {
+ /// Creates a new file descriptor reservation.
+ pub fn new(flags: u32) -> Result<Self> {
+ // SAFETY: FFI call, there are no safety requirements on `flags`.
+ let fd: i32 = unsafe { bindings::get_unused_fd_flags(flags) };
+ if fd < 0 {
+ return Err(Error::from_errno(fd));
+ }
+ Ok(Self {
+ fd: fd as _,
+ _not_send_sync: PhantomData,
+ })
+ }
+
+ /// Returns the file descriptor number that was reserved.
+ pub fn reserved_fd(&self) -> u32 {
+ self.fd
+ }
+
+ /// Commits the reservation.
+ ///
+ /// The previously reserved file descriptor is bound to `file`. This method consumes the
+ /// [`FileDescriptorReservation`], so it will not be usable after this call.
+ pub fn commit(self, file: ARef<File>) {
+ // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
+ // guaranteed to have an owned ref count by its type invariants.
+ unsafe { bindings::fd_install(self.fd, file.0.get()) };
+
+ // `fd_install` consumes both the file descriptor and the file reference, so we cannot run
+ // the destructors.
+ core::mem::forget(self);
+ core::mem::forget(file);
+ }
+}
+
+impl Drop for FileDescriptorReservation {
+ fn drop(&mut self) {
+ // SAFETY: `self.fd` was returned by a previous call to `get_unused_fd_flags`.
+ unsafe { bindings::put_unused_fd(self.fd) };
+ }
+}
+
/// Represents the `EBADF` error code.
///
/// Used for methods that can only fail with `EBADF`.
--
2.43.0.rc1.413.gea7ed67945-goog
Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define
various operations on kuids such as equality and current_euid. It also
lets us provide conversions from kuid into userspace values.
Rust Binder needs these operations because it needs to compare kuids for
equality, and it needs to tell userspace about the pid and uid of
incoming transactions.
Signed-off-by: Alice Ryhl <[email protected]>
---
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 45 ++++++++++++++++++++++++++
rust/kernel/cred.rs | 5 +--
rust/kernel/task.rs | 71 ++++++++++++++++++++++++++++++++++++++++-
4 files changed, 119 insertions(+), 3 deletions(-)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 81b13a953eae..700f01840188 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -11,6 +11,7 @@
#include <linux/errname.h>
#include <linux/file.h>
#include <linux/fs.h>
+#include <linux/pid_namespace.h>
#include <linux/security.h>
#include <linux/slab.h>
#include <linux/refcount.h>
diff --git a/rust/helpers.c b/rust/helpers.c
index fd633d9db79a..58e3a9dff349 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -142,6 +142,51 @@ void rust_helper_put_task_struct(struct task_struct *t)
}
EXPORT_SYMBOL_GPL(rust_helper_put_task_struct);
+kuid_t rust_helper_task_uid(struct task_struct *task)
+{
+ return task_uid(task);
+}
+EXPORT_SYMBOL_GPL(rust_helper_task_uid);
+
+kuid_t rust_helper_task_euid(struct task_struct *task)
+{
+ return task_euid(task);
+}
+EXPORT_SYMBOL_GPL(rust_helper_task_euid);
+
+#ifndef CONFIG_USER_NS
+uid_t rust_helper_from_kuid(struct user_namespace *to, kuid_t uid)
+{
+ return from_kuid(to, uid);
+}
+EXPORT_SYMBOL_GPL(rust_helper_from_kuid);
+#endif /* CONFIG_USER_NS */
+
+bool rust_helper_uid_eq(kuid_t left, kuid_t right)
+{
+ return uid_eq(left, right);
+}
+EXPORT_SYMBOL_GPL(rust_helper_uid_eq);
+
+kuid_t rust_helper_current_euid(void)
+{
+ return current_euid();
+}
+EXPORT_SYMBOL_GPL(rust_helper_current_euid);
+
+struct user_namespace *rust_helper_current_user_ns(void)
+{
+ return current_user_ns();
+}
+EXPORT_SYMBOL_GPL(rust_helper_current_user_ns);
+
+pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ return task_tgid_nr_ns(tsk, ns);
+}
+EXPORT_SYMBOL_GPL(rust_helper_task_tgid_nr_ns);
+
struct kunit *rust_helper_kunit_get_current_test(void)
{
return kunit_get_current_test();
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
index 3794937b5294..fbc749788bfa 100644
--- a/rust/kernel/cred.rs
+++ b/rust/kernel/cred.rs
@@ -8,6 +8,7 @@
use crate::{
bindings,
+ task::Kuid,
types::{AlwaysRefCounted, Opaque},
};
@@ -52,9 +53,9 @@ pub fn get_secid(&self) -> u32 {
}
/// Returns the effective UID of the given credential.
- pub fn euid(&self) -> bindings::kuid_t {
+ pub fn euid(&self) -> Kuid {
// SAFETY: By the type invariant, we know that `self.0` is valid.
- unsafe { (*self.0.get()).euid }
+ Kuid::from_raw(unsafe { (*self.0.get()).euid })
}
}
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index b2299bc7ac1f..1a27b968a907 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -5,7 +5,12 @@
//! C header: [`include/linux/sched.h`](../../../../include/linux/sched.h).
use crate::{bindings, types::Opaque};
-use core::{marker::PhantomData, ops::Deref, ptr};
+use core::{
+ cmp::{Eq, PartialEq},
+ marker::PhantomData,
+ ops::Deref,
+ ptr,
+};
/// Returns the currently running task.
#[macro_export]
@@ -78,6 +83,12 @@ unsafe impl Sync for Task {}
/// The type of process identifiers (PIDs).
type Pid = bindings::pid_t;
+/// The type of user identifiers (UIDs).
+#[derive(Copy, Clone)]
+pub struct Kuid {
+ kuid: bindings::kuid_t,
+}
+
impl Task {
/// Returns a task reference for the currently executing task/thread.
///
@@ -132,12 +143,34 @@ pub fn pid(&self) -> Pid {
unsafe { *ptr::addr_of!((*self.0.get()).pid) }
}
+ /// Returns the UID of the given task.
+ pub fn uid(&self) -> Kuid {
+ // SAFETY: By the type invariant, we know that `self.0` is valid.
+ Kuid::from_raw(unsafe { bindings::task_uid(self.0.get()) })
+ }
+
+ /// Returns the effective UID of the given task.
+ pub fn euid(&self) -> Kuid {
+ // SAFETY: By the type invariant, we know that `self.0` is valid.
+ Kuid::from_raw(unsafe { bindings::task_euid(self.0.get()) })
+ }
+
/// Determines whether the given task has pending signals.
pub fn signal_pending(&self) -> bool {
// SAFETY: By the type invariant, we know that `self.0` is valid.
unsafe { bindings::signal_pending(self.0.get()) != 0 }
}
+ /// Returns the given task's pid in the current pid namespace.
+ pub fn pid_in_current_ns(&self) -> Pid {
+ // SAFETY: We know that `self.0.get()` is valid by the type invariant. The rest is just FFI
+ // calls.
+ unsafe {
+ let namespace = bindings::task_active_pid_ns(bindings::get_current());
+ bindings::task_tgid_nr_ns(self.0.get(), namespace)
+ }
+ }
+
/// Wakes up the task.
pub fn wake_up(&self) {
// SAFETY: By the type invariant, we know that `self.0.get()` is non-null and valid.
@@ -147,6 +180,42 @@ pub fn wake_up(&self) {
}
}
+impl Kuid {
+ /// Get the current euid.
+ pub fn current_euid() -> Kuid {
+ // SAFETY: Just an FFI call.
+ Self {
+ kuid: unsafe { bindings::current_euid() },
+ }
+ }
+
+ /// Create a `Kuid` given the raw C type.
+ pub fn from_raw(kuid: bindings::kuid_t) -> Self {
+ Self { kuid }
+ }
+
+ /// Turn this kuid into the raw C type.
+ pub fn into_raw(self) -> bindings::kuid_t {
+ self.kuid
+ }
+
+ /// Converts this kernel UID into a UID that userspace understands. Uses the namespace of the
+ /// current task.
+ pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
+ // SAFETY: Just an FFI call.
+ unsafe { bindings::from_kuid(bindings::current_user_ns(), self.kuid) }
+ }
+}
+
+impl PartialEq for Kuid {
+ fn eq(&self, other: &Kuid) -> bool {
+ // SAFETY: Just an FFI call.
+ unsafe { bindings::uid_eq(self.kuid, other.kuid) }
+ }
+}
+
+impl Eq for Kuid {}
+
// SAFETY: The type invariants guarantee that `Task` is always ref-counted.
unsafe impl crate::types::AlwaysRefCounted for Task {
fn inc_ref(&self) {
--
2.43.0.rc1.413.gea7ed67945-goog
To close an fd from kernel space, we could call `ksys_close`. However,
if we do this to an fd that is held using `fdget`, then we may trigger a
use-after-free. Introduce a helper that can be used to close an fd even
if the fd is currently held with `fdget`. This is done by grabbing an
extra refcount to the file and dropping it in a task work once we return
to userspace.
This is necessary for Rust Binder because otherwise the user might try
to have Binder close its fd for /dev/binder, which would cause problems
as this happens inside an ioctl on /dev/binder, and ioctls hold the fd
using `fdget`.
Additional motivation can be found in commit 80cd795630d6 ("binder: fix
use-after-free due to ksys_close() during fdget()") and in the comments
on `binder_do_fd_close`.
If there is some way to detect whether an fd is currently held with
`fdget`, then this could be optimized to skip the allocation and task
work when this is not the case. Another possible optimization would be
to combine several fds into a single task work, since this is used with
fd arrays that might hold several fds.
That said, it might not be necessary to optimize it, because Rust Binder
has two ways to send fds: BINDER_TYPE_FD and BINDER_TYPE_FDA. With
BINDER_TYPE_FD, it is userspace's responsibility to close the fd, so
this mechanism is used only by BINDER_TYPE_FDA, but fd arrays are used
rarely these days.
Signed-off-by: Alice Ryhl <[email protected]>
---
rust/bindings/bindings_helper.h | 2 +
rust/helpers.c | 8 ++++
rust/kernel/file.rs | 84 ++++++++++++++++++++++++++++++++++++++++-
3 files changed, 93 insertions(+), 1 deletion(-)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 700f01840188..c8daee341df6 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -9,6 +9,7 @@
#include <kunit/test.h>
#include <linux/cred.h>
#include <linux/errname.h>
+#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/pid_namespace.h>
@@ -17,6 +18,7 @@
#include <linux/refcount.h>
#include <linux/wait.h>
#include <linux/sched.h>
+#include <linux/task_work.h>
#include <linux/workqueue.h>
/* `bindgen` gets confused at certain things. */
diff --git a/rust/helpers.c b/rust/helpers.c
index 58e3a9dff349..d146bbf25aec 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -32,6 +32,7 @@
#include <linux/sched/signal.h>
#include <linux/security.h>
#include <linux/spinlock.h>
+#include <linux/task_work.h>
#include <linux/wait.h>
#include <linux/workqueue.h>
@@ -243,6 +244,13 @@ void rust_helper_security_release_secctx(char *secdata, u32 seclen)
EXPORT_SYMBOL_GPL(rust_helper_security_release_secctx);
#endif
+void rust_helper_init_task_work(struct callback_head *twork,
+ task_work_func_t func)
+{
+ init_task_work(twork, func);
+}
+EXPORT_SYMBOL_GPL(rust_helper_init_task_work);
+
/*
* `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
* use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
index 2186a6ea3f2f..578ee307093f 100644
--- a/rust/kernel/file.rs
+++ b/rust/kernel/file.rs
@@ -11,7 +11,8 @@
error::{code::*, Error, Result},
types::{ARef, AlwaysRefCounted, Opaque},
};
-use core::{marker::PhantomData, ptr};
+use alloc::boxed::Box;
+use core::{alloc::AllocError, marker::PhantomData, mem, ptr};
/// Flags associated with a [`File`].
pub mod flags {
@@ -242,6 +243,87 @@ fn drop(&mut self) {
}
}
+/// Helper used for closing file descriptors in a way that is safe even if the file is currently
+/// held using `fdget`.
+///
+/// Additional motivation can be found in commit 80cd795630d6 ("binder: fix use-after-free due to
+/// ksys_close() during fdget()") and in the comments on `binder_do_fd_close`.
+pub struct DeferredFdCloser {
+ inner: Box<DeferredFdCloserInner>,
+}
+
+/// SAFETY: This just holds an allocation with no real content, so there's no safety issue with
+/// moving it across threads.
+unsafe impl Send for DeferredFdCloser {}
+unsafe impl Sync for DeferredFdCloser {}
+
+#[repr(C)]
+struct DeferredFdCloserInner {
+ twork: mem::MaybeUninit<bindings::callback_head>,
+ file: *mut bindings::file,
+}
+
+impl DeferredFdCloser {
+ /// Create a new [`DeferredFdCloser`].
+ pub fn new() -> Result<Self, AllocError> {
+ Ok(Self {
+ inner: Box::try_new(DeferredFdCloserInner {
+ twork: mem::MaybeUninit::uninit(),
+ file: core::ptr::null_mut(),
+ })?,
+ })
+ }
+
+ /// Schedule a task work that closes the file descriptor when this task returns to userspace.
+ pub fn close_fd(mut self, fd: u32) {
+ use bindings::task_work_notify_mode_TWA_RESUME as TWA_RESUME;
+
+ let file = unsafe { bindings::close_fd_get_file(fd) };
+ if file.is_null() {
+ // Nothing further to do. The allocation is freed by the destructor of `self.inner`.
+ return;
+ }
+
+ self.inner.file = file;
+
+ // SAFETY: Since `DeferredFdCloserInner` is `#[repr(C)]`, casting the pointers gives a
+ // pointer to the `twork` field.
+ let inner = Box::into_raw(self.inner) as *mut bindings::callback_head;
+
+ // SAFETY: Getting a pointer to current is always safe.
+ let current = unsafe { bindings::get_current() };
+ // SAFETY: The `file` pointer points at a valid file.
+ unsafe { bindings::get_file(file) };
+ // SAFETY: Due to the above `get_file`, even if the current task holds an `fdget` to
+ // this file right now, the refcount will not drop to zero until after it is released
+ // with `fdput`. This is because when using `fdget`, you must always use `fdput` before
+ // returning to userspace, and our task work runs after any `fdget` users have returned
+ // to userspace.
+ //
+ // Note: fl_owner_t is currently a void pointer.
+ unsafe { bindings::filp_close(file, (*current).files as bindings::fl_owner_t) };
+ // SAFETY: The `inner` pointer is compatible with the `do_close_fd` method.
+ unsafe { bindings::init_task_work(inner, Some(Self::do_close_fd)) };
+ // SAFETY: The `inner` pointer points at a valid and fully initialized task work that is
+ // ready to be scheduled.
+ unsafe { bindings::task_work_add(current, inner, TWA_RESUME) };
+ }
+
+ // SAFETY: This function is an implementation detail of `close_fd`, so its safety comments
+ // should be read in extension of that method.
+ unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) {
+ // SAFETY: In `close_fd` we use this method together with a pointer that originates from a
+ // `Box<DeferredFdCloserInner>`, and we have just been given ownership of that allocation.
+ let inner = unsafe { Box::from_raw(inner as *mut DeferredFdCloserInner) };
+ // SAFETY: This drops a refcount we acquired in `close_fd`. Since this callback runs in a
+ // task work after we return to userspace, it is guaranteed that the current thread doesn't
+ // hold this file with `fdget`, as `fdget` must be released before returning to userspace.
+ unsafe { bindings::fput(inner.file) };
+ // Free the allocation.
+ drop(inner);
+ }
+}
+
/// Represents the `EBADF` error code.
///
/// Used for methods that can only fail with `EBADF`.
--
2.43.0.rc1.413.gea7ed67945-goog
The existing `CondVar` abstraction is a wrapper around `wait_list`, but
it does not support all use-cases of the C `wait_list` type. To be
specific, a `CondVar` cannot be registered with a `struct poll_table`.
This limitation has the advantage that you do not need to call
`synchronize_rcu` when destroying a `CondVar`.
However, we need the ability to register a `poll_table` with a
`wait_list` in Rust Binder. To enable this, introduce a type called
`PollCondVar`, which is like `CondVar` except that you can register a
`poll_table`. We also introduce `PollTable`, which is a safe wrapper
around `poll_table` that is intended to be used with `PollCondVar`.
The destructor of `PollCondVar` unconditionally calls `synchronize_rcu`
to ensure that the removal of epoll waiters has fully completed before
the `wait_list` is destroyed.
Signed-off-by: Alice Ryhl <[email protected]>
---
That said, `synchronize_rcu` is rather expensive and is not needed in
all cases: If we have never registered a `poll_table` with the
`wait_list`, then we don't need to call `synchronize_rcu`. (And this is
a common case in Binder - not all processes use Binder with epoll.) The
current implementation does not account for this, but we could change it
to store a boolean next to the `wait_list` to keep track of whether a
`poll_table` has ever been registered. It is up to discussion whether
this is desireable.
It is not clear to me whether we can implement the above without storing
an extra boolean. We could check whether the `wait_list` is empty, but
it is not clear that this is sufficient. Perhaps someone knows the
answer? If a `poll_table` has previously been registered with a
`wait_list`, is it the case that we can kfree the `wait_list` after
observing that the `wait_list` is empty without waiting for an rcu grace
period?
rust/bindings/bindings_helper.h | 2 +
rust/bindings/lib.rs | 1 +
rust/kernel/file.rs | 3 ++
rust/kernel/file/poll_table.rs | 97 +++++++++++++++++++++++++++++++++++++++++
rust/kernel/sync/condvar.rs | 2 +-
5 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index c8daee341df6..14f84aeef62d 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -13,6 +13,7 @@
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/pid_namespace.h>
+#include <linux/poll.h>
#include <linux/security.h>
#include <linux/slab.h>
#include <linux/refcount.h>
@@ -25,3 +26,4 @@
const size_t BINDINGS_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
const gfp_t BINDINGS_GFP_KERNEL = GFP_KERNEL;
const gfp_t BINDINGS___GFP_ZERO = __GFP_ZERO;
+const __poll_t BINDINGS_POLLFREE = POLLFREE;
diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
index 9bcbea04dac3..eeb291cc60db 100644
--- a/rust/bindings/lib.rs
+++ b/rust/bindings/lib.rs
@@ -51,3 +51,4 @@ mod bindings_helper {
pub const GFP_KERNEL: gfp_t = BINDINGS_GFP_KERNEL;
pub const __GFP_ZERO: gfp_t = BINDINGS___GFP_ZERO;
+pub const POLLFREE: __poll_t = BINDINGS_POLLFREE;
diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
index 578ee307093f..35576678c993 100644
--- a/rust/kernel/file.rs
+++ b/rust/kernel/file.rs
@@ -14,6 +14,9 @@
use alloc::boxed::Box;
use core::{alloc::AllocError, marker::PhantomData, mem, ptr};
+mod poll_table;
+pub use self::poll_table::{PollCondVar, PollTable};
+
/// Flags associated with a [`File`].
pub mod flags {
/// File is opened in append mode.
diff --git a/rust/kernel/file/poll_table.rs b/rust/kernel/file/poll_table.rs
new file mode 100644
index 000000000000..a26b64df0106
--- /dev/null
+++ b/rust/kernel/file/poll_table.rs
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Utilities for working with `struct poll_table`.
+
+use crate::{
+ bindings,
+ file::File,
+ prelude::*,
+ sync::{CondVar, LockClassKey},
+ types::Opaque,
+};
+use core::ops::Deref;
+
+/// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
+#[macro_export]
+macro_rules! new_poll_condvar {
+ ($($name:literal)?) => {
+ $crate::file::PollCondVar::new($crate::optional_name!($($name)?), $crate::static_lock_class!())
+ };
+}
+
+/// Wraps the kernel's `struct poll_table`.
+#[repr(transparent)]
+pub struct PollTable(Opaque<bindings::poll_table>);
+
+impl PollTable {
+ /// Creates a reference to a [`PollTable`] from a valid pointer.
+ ///
+ /// # Safety
+ ///
+ /// The caller must ensure that for the duration of 'a, the pointer will point at a valid poll
+ /// table, and that it is only accessed via the returned reference.
+ pub unsafe fn from_ptr<'a>(ptr: *mut bindings::poll_table) -> &'a mut PollTable {
+ // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+ // `PollTable` type being transparent makes the cast ok.
+ unsafe { &mut *ptr.cast() }
+ }
+
+ fn get_qproc(&self) -> bindings::poll_queue_proc {
+ let ptr = self.0.get();
+ // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
+ // field is not modified concurrently with this call.
+ unsafe { (*ptr)._qproc }
+ }
+
+ /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
+ /// using the condition variable.
+ pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
+ if let Some(qproc) = self.get_qproc() {
+ // SAFETY: The pointers to `self` and `file` are valid because they are references.
+ //
+ // Before the wait list is destroyed, the destructor of `PollCondVar` will clear
+ // everything in the wait list, so the wait list is not used after it is freed.
+ unsafe { qproc(file.0.get() as _, cv.wait_list.get(), self.0.get()) };
+ }
+ }
+}
+
+/// A wrapper around [`CondVar`] that makes it usable with [`PollTable`].
+///
+/// [`CondVar`]: crate::sync::CondVar
+#[pin_data(PinnedDrop)]
+pub struct PollCondVar {
+ #[pin]
+ inner: CondVar,
+}
+
+impl PollCondVar {
+ /// Constructs a new condvar initialiser.
+ #[allow(clippy::new_ret_no_self)]
+ pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
+ pin_init!(Self {
+ inner <- CondVar::new(name, key),
+ })
+ }
+}
+
+// Make the `CondVar` methods callable on `PollCondVar`.
+impl Deref for PollCondVar {
+ type Target = CondVar;
+
+ fn deref(&self) -> &CondVar {
+ &self.inner
+ }
+}
+
+#[pinned_drop]
+impl PinnedDrop for PollCondVar {
+ fn drop(self: Pin<&mut Self>) {
+ // Clear anything registered using `register_wait`.
+ self.inner.notify(1, bindings::POLLHUP | bindings::POLLFREE);
+ // Wait for epoll items to be properly removed.
+ //
+ // SAFETY: Just an FFI call.
+ unsafe { bindings::synchronize_rcu() };
+ }
+}
diff --git a/rust/kernel/sync/condvar.rs b/rust/kernel/sync/condvar.rs
index b679b6f6dbeb..2d276a013ec8 100644
--- a/rust/kernel/sync/condvar.rs
+++ b/rust/kernel/sync/condvar.rs
@@ -143,7 +143,7 @@ pub fn wait_uninterruptible<T: ?Sized, B: Backend>(&self, guard: &mut Guard<'_,
}
/// Calls the kernel function to notify the appropriate number of threads with the given flags.
- fn notify(&self, count: i32, flags: u32) {
+ pub(crate) fn notify(&self, count: i32, flags: u32) {
// SAFETY: `wait_list` points to valid memory.
unsafe {
bindings::__wake_up(
--
2.43.0.rc1.413.gea7ed67945-goog
On Wed, Nov 29, 2023 at 01:11:56PM +0000, Alice Ryhl wrote:
> From: Wedson Almeida Filho <[email protected]>
>
> Allow for the creation of a file descriptor in two steps: first, we
> reserve a slot for it, then we commit or drop the reservation. The first
> step may fail (e.g., the current process ran out of available slots),
> but commit and drop never fail (and are mutually exclusive).
>
> This is needed by Rust Binder when fds are sent from one process to
> another. It has to be a two-step process to properly handle the case
> where multiple fds are sent: The operation must fail or succeed
> atomically, which we achieve by first reserving the fds we need, and
> only installing the files once we have reserved enough fds to send the
> files.
>
> Fd reservations assume that the value of `current` does not change
> between the call to get_unused_fd_flags and the call to fd_install (or
> put_unused_fd). By not implementing the Send trait, this abstraction
> ensures that the `FileDescriptorReservation` cannot be moved into a
> different process.
>
> Signed-off-by: Wedson Almeida Filho <[email protected]>
> Co-developed-by: Alice Ryhl <[email protected]>
> Signed-off-by: Alice Ryhl <[email protected]>
> ---
> rust/kernel/file.rs | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
> index f1f71c3d97e2..2186a6ea3f2f 100644
> --- a/rust/kernel/file.rs
> +++ b/rust/kernel/file.rs
> @@ -11,7 +11,7 @@
> error::{code::*, Error, Result},
> types::{ARef, AlwaysRefCounted, Opaque},
> };
> -use core::ptr;
> +use core::{marker::PhantomData, ptr};
>
> /// Flags associated with a [`File`].
> pub mod flags {
> @@ -180,6 +180,68 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> }
> }
>
> +/// A file descriptor reservation.
> +///
> +/// This allows the creation of a file descriptor in two steps: first, we reserve a slot for it,
> +/// then we commit or drop the reservation. The first step may fail (e.g., the current process ran
> +/// out of available slots), but commit and drop never fail (and are mutually exclusive).
> +///
> +/// Dropping the reservation happens in the destructor of this type.
> +///
> +/// # Invariants
> +///
> +/// The fd stored in this struct must correspond to a reserved file descriptor of the current task.
> +pub struct FileDescriptorReservation {
Can we follow the traditional file terminology, i.e.,
get_unused_fd_flags() and fd_install()? At least at the beginning this
might be quite helpful instead of having to mentally map new() and
commit() onto the C functions.
> + fd: u32,
> + /// Prevent values of this type from being moved to a different task.
> + ///
> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
> + /// owns the fd in question.
> + _not_send_sync: PhantomData<*mut ()>,
I don't fully understand this. Can you explain in a little more detail
what you mean by this and how this works?
> +}
> +
> +impl FileDescriptorReservation {
> + /// Creates a new file descriptor reservation.
> + pub fn new(flags: u32) -> Result<Self> {
> + // SAFETY: FFI call, there are no safety requirements on `flags`.
> + let fd: i32 = unsafe { bindings::get_unused_fd_flags(flags) };
> + if fd < 0 {
> + return Err(Error::from_errno(fd));
> + }
> + Ok(Self {
> + fd: fd as _,
This is a cast to a u32?
> + _not_send_sync: PhantomData,
Can you please draft a quick example how that return value would be
expected to be used by a caller? It's really not clear
> + })
> + }
> +
> + /// Returns the file descriptor number that was reserved.
> + pub fn reserved_fd(&self) -> u32 {
> + self.fd
> + }
> +
> + /// Commits the reservation.
> + ///
> + /// The previously reserved file descriptor is bound to `file`. This method consumes the
> + /// [`FileDescriptorReservation`], so it will not be usable after this call.
> + pub fn commit(self, file: ARef<File>) {
> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
> + // guaranteed to have an owned ref count by its type invariants.
> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
Why file.0.get()? Where did that come from?
> +
> + // `fd_install` consumes both the file descriptor and the file reference, so we cannot run
> + // the destructors.
> + core::mem::forget(self);
> + core::mem::forget(file);
> + }
> +}
> +
> +impl Drop for FileDescriptorReservation {
> + fn drop(&mut self) {
> + // SAFETY: `self.fd` was returned by a previous call to `get_unused_fd_flags`.
> + unsafe { bindings::put_unused_fd(self.fd) };
> + }
> +}
> +
> /// Represents the `EBADF` error code.
> ///
> /// Used for methods that can only fail with `EBADF`.
>
> --
> 2.43.0.rc1.413.gea7ed67945-goog
>
On Wed, Nov 29, 2023 at 01:12:17PM +0000, Alice Ryhl wrote:
> Adds a wrapper around `kuid_t` called `Kuid`. This allows us to define
> various operations on kuids such as equality and current_euid. It also
> lets us provide conversions from kuid into userspace values.
>
> Rust Binder needs these operations because it needs to compare kuids for
> equality, and it needs to tell userspace about the pid and uid of
> incoming transactions.
>
> Signed-off-by: Alice Ryhl <[email protected]>
> ---
> rust/bindings/bindings_helper.h | 1 +
> rust/helpers.c | 45 ++++++++++++++++++++++++++
> rust/kernel/cred.rs | 5 +--
> rust/kernel/task.rs | 71 ++++++++++++++++++++++++++++++++++++++++-
> 4 files changed, 119 insertions(+), 3 deletions(-)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 81b13a953eae..700f01840188 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -11,6 +11,7 @@
> #include <linux/errname.h>
> #include <linux/file.h>
> #include <linux/fs.h>
> +#include <linux/pid_namespace.h>
> #include <linux/security.h>
> #include <linux/slab.h>
> #include <linux/refcount.h>
> diff --git a/rust/helpers.c b/rust/helpers.c
> index fd633d9db79a..58e3a9dff349 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -142,6 +142,51 @@ void rust_helper_put_task_struct(struct task_struct *t)
> }
> EXPORT_SYMBOL_GPL(rust_helper_put_task_struct);
>
> +kuid_t rust_helper_task_uid(struct task_struct *task)
> +{
> + return task_uid(task);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_task_uid);
> +
> +kuid_t rust_helper_task_euid(struct task_struct *task)
> +{
> + return task_euid(task);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_task_euid);
> +
> +#ifndef CONFIG_USER_NS
> +uid_t rust_helper_from_kuid(struct user_namespace *to, kuid_t uid)
> +{
> + return from_kuid(to, uid);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_from_kuid);
> +#endif /* CONFIG_USER_NS */
> +
> +bool rust_helper_uid_eq(kuid_t left, kuid_t right)
> +{
> + return uid_eq(left, right);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_uid_eq);
> +
> +kuid_t rust_helper_current_euid(void)
> +{
> + return current_euid();
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_current_euid);
> +
> +struct user_namespace *rust_helper_current_user_ns(void)
> +{
> + return current_user_ns();
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_current_user_ns);
> +
> +pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
> + struct pid_namespace *ns)
> +{
> + return task_tgid_nr_ns(tsk, ns);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_task_tgid_nr_ns);
I'm a bit puzzled by all these rust_helper_*() calls. Can you explain
why they are needed? Because they are/can be static inlines and that
somehow doesn't work?
> +
> struct kunit *rust_helper_kunit_get_current_test(void)
> {
> return kunit_get_current_test();
> diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
> index 3794937b5294..fbc749788bfa 100644
> --- a/rust/kernel/cred.rs
> +++ b/rust/kernel/cred.rs
> @@ -8,6 +8,7 @@
>
> use crate::{
> bindings,
> + task::Kuid,
> types::{AlwaysRefCounted, Opaque},
> };
>
> @@ -52,9 +53,9 @@ pub fn get_secid(&self) -> u32 {
> }
>
> /// Returns the effective UID of the given credential.
> - pub fn euid(&self) -> bindings::kuid_t {
> + pub fn euid(&self) -> Kuid {
> // SAFETY: By the type invariant, we know that `self.0` is valid.
> - unsafe { (*self.0.get()).euid }
> + Kuid::from_raw(unsafe { (*self.0.get()).euid })
> }
> }
>
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index b2299bc7ac1f..1a27b968a907 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -5,7 +5,12 @@
> //! C header: [`include/linux/sched.h`](../../../../include/linux/sched.h).
>
> use crate::{bindings, types::Opaque};
> -use core::{marker::PhantomData, ops::Deref, ptr};
> +use core::{
> + cmp::{Eq, PartialEq},
> + marker::PhantomData,
> + ops::Deref,
> + ptr,
> +};
>
> /// Returns the currently running task.
> #[macro_export]
> @@ -78,6 +83,12 @@ unsafe impl Sync for Task {}
> /// The type of process identifiers (PIDs).
> type Pid = bindings::pid_t;
>
> +/// The type of user identifiers (UIDs).
> +#[derive(Copy, Clone)]
> +pub struct Kuid {
> + kuid: bindings::kuid_t,
> +}
> +
> impl Task {
> /// Returns a task reference for the currently executing task/thread.
> ///
> @@ -132,12 +143,34 @@ pub fn pid(&self) -> Pid {
> unsafe { *ptr::addr_of!((*self.0.get()).pid) }
> }
>
> + /// Returns the UID of the given task.
> + pub fn uid(&self) -> Kuid {
> + // SAFETY: By the type invariant, we know that `self.0` is valid.
> + Kuid::from_raw(unsafe { bindings::task_uid(self.0.get()) })
> + }
> +
> + /// Returns the effective UID of the given task.
> + pub fn euid(&self) -> Kuid {
> + // SAFETY: By the type invariant, we know that `self.0` is valid.
> + Kuid::from_raw(unsafe { bindings::task_euid(self.0.get()) })
> + }
> +
> /// Determines whether the given task has pending signals.
> pub fn signal_pending(&self) -> bool {
> // SAFETY: By the type invariant, we know that `self.0` is valid.
> unsafe { bindings::signal_pending(self.0.get()) != 0 }
> }
>
> + /// Returns the given task's pid in the current pid namespace.
> + pub fn pid_in_current_ns(&self) -> Pid {
> + // SAFETY: We know that `self.0.get()` is valid by the type invariant. The rest is just FFI
> + // calls.
> + unsafe {
> + let namespace = bindings::task_active_pid_ns(bindings::get_current());
> + bindings::task_tgid_nr_ns(self.0.get(), namespace)
> + }
> + }
> +
> /// Wakes up the task.
> pub fn wake_up(&self) {
> // SAFETY: By the type invariant, we know that `self.0.get()` is non-null and valid.
> @@ -147,6 +180,42 @@ pub fn wake_up(&self) {
> }
> }
>
> +impl Kuid {
> + /// Get the current euid.
> + pub fn current_euid() -> Kuid {
> + // SAFETY: Just an FFI call.
> + Self {
> + kuid: unsafe { bindings::current_euid() },
> + }
> + }
> +
> + /// Create a `Kuid` given the raw C type.
> + pub fn from_raw(kuid: bindings::kuid_t) -> Self {
> + Self { kuid }
> + }
> +
> + /// Turn this kuid into the raw C type.
> + pub fn into_raw(self) -> bindings::kuid_t {
> + self.kuid
> + }
> +
> + /// Converts this kernel UID into a UID that userspace understands. Uses the namespace of the
> + /// current task.
> + pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
Hm, I wouldn't special-case this. Just expose from_kuid() and let it
take a namespace argument, no? You don't need to provide bindings for
namespaces ofc.
> + // SAFETY: Just an FFI call.
> + unsafe { bindings::from_kuid(bindings::current_user_ns(), self.kuid) }
> + }
> +}
> +
> +impl PartialEq for Kuid {
> + fn eq(&self, other: &Kuid) -> bool {
> + // SAFETY: Just an FFI call.
> + unsafe { bindings::uid_eq(self.kuid, other.kuid) }
> + }
> +}
> +
> +impl Eq for Kuid {}
Do you need that?
> +
> // SAFETY: The type invariants guarantee that `Task` is always ref-counted.
> unsafe impl crate::types::AlwaysRefCounted for Task {
> fn inc_ref(&self) {
>
> --
> 2.43.0.rc1.413.gea7ed67945-goog
>
On Wed, Nov 29, 2023 at 12:51:06PM +0000, Alice Ryhl wrote:
> This patchset contains the file abstractions needed by the Rust
> implementation of the Binder driver.
>
> Please see the Rust Binder RFC for usage examples:
> https://lore.kernel.org/rust-for-linux/[email protected]/
>
> Users of "rust: file: add Rust abstraction for `struct file`":
> [PATCH RFC 02/20] rust_binder: add binderfs support to Rust binder
> [PATCH RFC 03/20] rust_binder: add threading support
>
> Users of "rust: cred: add Rust abstraction for `struct cred`":
> [PATCH RFC 05/20] rust_binder: add nodes and context managers
> [PATCH RFC 06/20] rust_binder: add oneway transactions
> [PATCH RFC 11/20] rust_binder: send nodes in transaction
> [PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
>
> Users of "rust: security: add abstraction for security_secid_to_secctx":
> [PATCH RFC 06/20] rust_binder: add oneway transactions
>
> Users of "rust: file: add `FileDescriptorReservation`":
> [PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
> [PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
>
> Users of "rust: file: add kuid getters":
> [PATCH RFC 05/20] rust_binder: add nodes and context managers
> [PATCH RFC 06/20] rust_binder: add oneway transactions
>
> Users of "rust: file: add `DeferredFdCloser`":
> [PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
>
> Users of "rust: file: add abstraction for `poll_table`":
> [PATCH RFC 07/20] rust_binder: add epoll support
>
> This patchset has some uses of read_volatile in place of READ_ONCE.
> Please see the following rfc for context on this:
> https://lore.kernel.org/all/[email protected]/
>
> This was previously sent as an rfc:
> https://lore.kernel.org/all/[email protected]/
>
> Signed-off-by: Alice Ryhl <[email protected]>
> ---
> Alice Ryhl (4):
> rust: security: add abstraction for security_secid_to_secctx
> rust: file: add `Kuid` wrapper
> rust: file: add `DeferredFdCloser`
> rust: file: add abstraction for `poll_table`
>
> Wedson Almeida Filho (3):
> rust: file: add Rust abstraction for `struct file`
> rust: cred: add Rust abstraction for `struct cred`
> rust: file: add `FileDescriptorReservation`
>
> rust/bindings/bindings_helper.h | 9 ++
> rust/bindings/lib.rs | 1 +
> rust/helpers.c | 94 +++++++++++
> rust/kernel/cred.rs | 73 +++++++++
> rust/kernel/file.rs | 345 ++++++++++++++++++++++++++++++++++++++++
> rust/kernel/file/poll_table.rs | 97 +++++++++++
That's pretty far away from the subsystem these wrappers belong to. I
would prefer if wrappers such as this would live directly in fs/rust/
and so live within the subsystem they belong to. I think I mentioned
that before. Maybe I missed some sort of agreement here?
> rust/kernel/lib.rs | 3 +
> rust/kernel/security.rs | 78 +++++++++
> rust/kernel/sync/condvar.rs | 2 +-
> rust/kernel/task.rs | 71 ++++++++-
> 10 files changed, 771 insertions(+), 2 deletions(-)
> ---
> base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263
> change-id: 20231123-alice-file-525b98e8a724
>
> Best regards,
> --
> Alice Ryhl <[email protected]>
>
On Wed, Nov 29, 2023 at 5:31 PM Christian Brauner <[email protected]> wrote:
>
> That's pretty far away from the subsystem these wrappers belong to. I
> would prefer if wrappers such as this would live directly in fs/rust/
> and so live within the subsystem they belong to. I think I mentioned
> that before. Maybe I missed some sort of agreement here?
The plan is that the code will be moved to the right places when the
new build system is in place (WIP). Currently the "abstractions" go
inside the `kernel` crate.
Cheers,
Miguel
On Wed, Nov 29, 2023 at 05:28:27PM +0100, Christian Brauner wrote:
> > +pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
> > + struct pid_namespace *ns)
> > +{
> > + return task_tgid_nr_ns(tsk, ns);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_task_tgid_nr_ns);
>
> I'm a bit puzzled by all these rust_helper_*() calls. Can you explain
> why they are needed? Because they are/can be static inlines and that
> somehow doesn't work?
Correct, because Rust can only talk to C ABI, it cannot use C headers.
Bindgen would need to translate the full C headers into valid Rust for
that to work.
I really think the Rust peoples should spend more effort on that,
because you are quite right, all this wrappery is tedious at best.
Christian Brauner <[email protected]> writes:
> Can we follow the traditional file terminology, i.e.,
> get_unused_fd_flags() and fd_install()? At least at the beginning this
> might be quite helpful instead of having to mentally map new() and
> commit() onto the C functions.
Sure, I'll do that in the next version.
>> + /// Prevent values of this type from being moved to a different task.
>> + ///
>> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
>> + /// owns the fd in question.
>> + _not_send_sync: PhantomData<*mut ()>,
>
> I don't fully understand this. Can you explain in a little more detail
> what you mean by this and how this works?
Yeah, so, this has to do with the Rust trait `Send` that controls
whether it's okay for a value to get moved from one thread to another.
In this case, we don't want it to be `Send` so that it can't be moved to
another thread, since current might be different there.
The `Send` trait is automatically applied to structs whenever *all*
fields of the struct are `Send`. So to ensure that a struct is not
`Send`, you add a field that is not `Send`.
The `PhantomData` type used here is a special zero-sized type.
Basically, it says "pretend this struct has a field of type `*mut ()`,
but don't actually add the field". So for the purposes of `Send`, it has
a non-Send field, but since its wrapped in `PhantomData`, the field is
not there at runtime.
>> + Ok(Self {
>> + fd: fd as _,
>
> This is a cast to a u32?
Yes.
> Can you please draft a quick example how that return value would be
> expected to be used by a caller? It's really not clear
The most basic usage would look like this:
// First, reserve the fd.
let reservation = FileDescriptorReservation::new(O_CLOEXEC)?;
// Then, somehow get a file to put in it.
let file = get_file_using_fallible_operation()?;
// Finally, commit it to the fd.
reservation.commit(file);
In Rust Binder, reservations are used here:
https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/allocation.rs#L199-L210
https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/allocation.rs#L512-L541
>> + pub fn commit(self, file: ARef<File>) {
>> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
>> + // guaranteed to have an owned ref count by its type invariants.
>> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
>
> Why file.0.get()? Where did that come from?
This gets a raw pointer to the C type.
The `.0` part is a field access. `ARef` struct is a tuple struct, so its
fields are unnamed. However, the fields can still be accessed by index.
Alice
On Wed, Nov 29, 2023 at 5:55 PM Alice Ryhl <[email protected]> wrote:
> >> + pub fn commit(self, file: ARef<File>) {
> >> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
> >> + // guaranteed to have an owned ref count by its type invariants.
> >> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
> >
> > Why file.0.get()? Where did that come from?
>
> This gets a raw pointer to the C type.
>
> The `.0` part is a field access. `ARef` struct is a tuple struct, so its
> fields are unnamed. However, the fields can still be accessed by index.
Oh, sorry, this is wrong. Let me try again:
This gets a raw pointer to the C type. The `.0` part accesses the
field of type `Opaque<bindings::file>` in the Rust wrapper. Recall
that File is defined like this:
pub struct File(Opaque<bindings::file>);
The above syntax defines a tuple struct, which means that the fields
are unnamed. The `.0` syntax accesses the first field of a tuple
struct [1].
The `.get()` method is from the `Opaque` struct, which returns a raw
pointer to the C type being wrapped.
Alice
[1]: https://doc.rust-lang.org/std/keyword.struct.html#:~:text=Tuple%20structs%20are%20similar%20to,with%20regular%20tuples%2C%20namely%20foo.
On Wed, Nov 29, 2023 at 04:55:51PM +0000, Alice Ryhl wrote:
> Christian Brauner <[email protected]> writes:
> > Can we follow the traditional file terminology, i.e.,
> > get_unused_fd_flags() and fd_install()? At least at the beginning this
> > might be quite helpful instead of having to mentally map new() and
> > commit() onto the C functions.
>
> Sure, I'll do that in the next version.
>
> >> + /// Prevent values of this type from being moved to a different task.
> >> + ///
> >> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
> >> + /// owns the fd in question.
> >> + _not_send_sync: PhantomData<*mut ()>,
> >
> > I don't fully understand this. Can you explain in a little more detail
> > what you mean by this and how this works?
>
> Yeah, so, this has to do with the Rust trait `Send` that controls
> whether it's okay for a value to get moved from one thread to another.
> In this case, we don't want it to be `Send` so that it can't be moved to
> another thread, since current might be different there.
>
> The `Send` trait is automatically applied to structs whenever *all*
> fields of the struct are `Send`. So to ensure that a struct is not
> `Send`, you add a field that is not `Send`.
>
> The `PhantomData` type used here is a special zero-sized type.
> Basically, it says "pretend this struct has a field of type `*mut ()`,
> but don't actually add the field". So for the purposes of `Send`, it has
> a non-Send field, but since its wrapped in `PhantomData`, the field is
> not there at runtime.
This probably a stupid suggestion, question. But while PhantomData gives
the right hint of what is happening I wouldn't mind if that was very
explicitly called NoSendTrait or just add the explanatory comment. Yes,
that's a lot of verbiage but you'd help us a lot.
>
> >> + Ok(Self {
> >> + fd: fd as _,
> >
> > This is a cast to a u32?
>
> Yes.
>
> > Can you please draft a quick example how that return value would be
> > expected to be used by a caller? It's really not clear
>
> The most basic usage would look like this:
>
> // First, reserve the fd.
> let reservation = FileDescriptorReservation::new(O_CLOEXEC)?;
>
> // Then, somehow get a file to put in it.
> let file = get_file_using_fallible_operation()?;
>
> // Finally, commit it to the fd.
> reservation.commit(file);
Ok, the reason I asked was that I was confused about the PhantomData and
how that would figure into using the return value as I hadn't seen that
Ok(Self { }) syntax before. Thanks.
>
> In Rust Binder, reservations are used here:
> https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/allocation.rs#L199-L210
> https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/allocation.rs#L512-L541
>
> >> + pub fn commit(self, file: ARef<File>) {
> >> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
> >> + // guaranteed to have an owned ref count by its type invariants.
> >> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
> >
> > Why file.0.get()? Where did that come from?
>
> This gets a raw pointer to the C type.
>
> The `.0` part is a field access. `ARef` struct is a tuple struct, so its
Ah, there we go. It's a bit ugly tbh.
> fields are unnamed. However, the fields can still be accessed by index.
On Wed, Nov 29, 2023 at 06:14:24PM +0100, Alice Ryhl wrote:
> On Wed, Nov 29, 2023 at 5:55 PM Alice Ryhl <[email protected]> wrote:
>
> > >> + pub fn commit(self, file: ARef<File>) {
> > >> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
> > >> + // guaranteed to have an owned ref count by its type invariants.
> > >> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
> > >
> > > Why file.0.get()? Where did that come from?
> >
> > This gets a raw pointer to the C type.
> >
> > The `.0` part is a field access. `ARef` struct is a tuple struct, so its
> > fields are unnamed. However, the fields can still be accessed by index.
>
> Oh, sorry, this is wrong. Let me try again:
>
> This gets a raw pointer to the C type. The `.0` part accesses the
> field of type `Opaque<bindings::file>` in the Rust wrapper. Recall
> that File is defined like this:
>
> pub struct File(Opaque<bindings::file>);
>
> The above syntax defines a tuple struct, which means that the fields
> are unnamed. The `.0` syntax accesses the first field of a tuple
> struct [1].
>
> The `.get()` method is from the `Opaque` struct, which returns a raw
> pointer to the C type being wrapped.
It'd be nice if this could be written in a more obvious/elegant way. And
if not a comment would help. I know there'll be more text then code but
until this is second nature to read I personally won't mind... Because
searching for this specific syntax isn't really possible.
Christian Brauner <[email protected]> writes:
>>>> + /// Prevent values of this type from being moved to a different task.
>>>> + ///
>>>> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
>>>> + /// owns the fd in question.
>>>> + _not_send_sync: PhantomData<*mut ()>,
>>>
>>> I don't fully understand this. Can you explain in a little more detail
>>> what you mean by this and how this works?
>>
>> Yeah, so, this has to do with the Rust trait `Send` that controls
>> whether it's okay for a value to get moved from one thread to another.
>> In this case, we don't want it to be `Send` so that it can't be moved to
>> another thread, since current might be different there.
>>
>> The `Send` trait is automatically applied to structs whenever *all*
>> fields of the struct are `Send`. So to ensure that a struct is not
>> `Send`, you add a field that is not `Send`.
>>
>> The `PhantomData` type used here is a special zero-sized type.
>> Basically, it says "pretend this struct has a field of type `*mut ()`,
>> but don't actually add the field". So for the purposes of `Send`, it has
>> a non-Send field, but since its wrapped in `PhantomData`, the field is
>> not there at runtime.
>
> This probably a stupid suggestion, question. But while PhantomData gives
> the right hint of what is happening I wouldn't mind if that was very
> explicitly called NoSendTrait or just add the explanatory comment. Yes,
> that's a lot of verbiage but you'd help us a lot.
I suppose we could add a typedef:
type NoSendTrait = PhantomData<*mut ()>;
and use that as the field type. The way I did it here is the "standard"
way of doing it, and if you look at code outside the kernel, you will
also find them using `PhantomData` like this. However, I don't mind
adding the typedef if you think it is helpful.
Alice
Christian Brauner <[email protected]> writes:
> On Wed, Nov 29, 2023 at 06:14:24PM +0100, Alice Ryhl wrote:
> > On Wed, Nov 29, 2023 at 5:55 PM Alice Ryhl <[email protected]> wrote:
> >
> > > >> + pub fn commit(self, file: ARef<File>) {
> > > >> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
> > > >> + // guaranteed to have an owned ref count by its type invariants.
> > > >> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
> > > >
> > > > Why file.0.get()? Where did that come from?
> > >
> > > This gets a raw pointer to the C type.
> > >
> > > The `.0` part is a field access. `ARef` struct is a tuple struct, so its
> > > fields are unnamed. However, the fields can still be accessed by index.
> >
> > Oh, sorry, this is wrong. Let me try again:
> >
> > This gets a raw pointer to the C type. The `.0` part accesses the
> > field of type `Opaque<bindings::file>` in the Rust wrapper. Recall
> > that File is defined like this:
> >
> > pub struct File(Opaque<bindings::file>);
> >
> > The above syntax defines a tuple struct, which means that the fields
> > are unnamed. The `.0` syntax accesses the first field of a tuple
> > struct [1].
> >
> > The `.get()` method is from the `Opaque` struct, which returns a raw
> > pointer to the C type being wrapped.
>
> It'd be nice if this could be written in a more obvious/elegant way. And
> if not a comment would help. I know there'll be more text then code but
> until this is second nature to read I personally won't mind... Because
> searching for this specific syntax isn't really possible.
Adding a comment to every instance of this is probably not realisitic.
This kind of code will be very common in abstraction code. However,
there are two other options that I think are reasonable:
1. I can change the definition of `File` so that the field has a name:
struct File {
inner: Opaque<bindings::file>,
}
Then, it would say `file.inner.get()`.
2. Alternatively, I can add a method to file:
impl File {
#[inline]
pub fn as_ptr(&self) -> *mut bindings::file {
self.0.get()
}
}
And then write `file.as_ptr()` whenever I want a pointer.
Alice
Christian Brauner <[email protected]> writes:
> I'm a bit puzzled by all these rust_helper_*() calls. Can you explain
> why they are needed? Because they are/can be static inlines and that
> somehow doesn't work?
Yes, it's because the methods are inline. Rust can only call C methods
that are actually exported by the C code.
>> + /// Converts this kernel UID into a UID that userspace understands. Uses the namespace of the
>> + /// current task.
>> + pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
>
> Hm, I wouldn't special-case this. Just expose from_kuid() and let it
> take a namespace argument, no? You don't need to provide bindings for
> namespaces ofc.
To make `from_kuid` safe, I would need to wrap the namespace type too. I
could do that, but it would be more code than this method because I need
another wrapper struct and so on.
Personally I would prefer to special-case it until someone needs the
non-special-case. Then, they can delete this method when they introduce
the non-special-case.
But I'll do it if you think I should.
>> +impl PartialEq for Kuid {
>> + fn eq(&self, other: &Kuid) -> bool {
>> + // SAFETY: Just an FFI call.
>> + unsafe { bindings::uid_eq(self.kuid, other.kuid) }
>> + }
>> +}
>> +
>> +impl Eq for Kuid {}
>
> Do you need that?
Yes. This is the code that tells the compiler what `==` means for the
`Kuid` type. Binder uses it here:
https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/context.rs#L174
Alice
On Wed, Nov 29, 2023 at 01:12:17PM +0000, Alice Ryhl wrote:
> diff --git a/rust/helpers.c b/rust/helpers.c
> index fd633d9db79a..58e3a9dff349 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -142,6 +142,51 @@ void rust_helper_put_task_struct(struct task_struct *t)
> }
> EXPORT_SYMBOL_GPL(rust_helper_put_task_struct);
>
> +kuid_t rust_helper_task_uid(struct task_struct *task)
> +{
> + return task_uid(task);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_task_uid);
> +
> +kuid_t rust_helper_task_euid(struct task_struct *task)
> +{
> + return task_euid(task);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_task_euid);
Aren't these like ideal speculation gadgets? And shouldn't we avoid
functions like this for exactly that reason?
On Thu, Nov 30, 2023 at 09:17:56AM +0000, Alice Ryhl wrote:
> Christian Brauner <[email protected]> writes:
> >>>> + /// Prevent values of this type from being moved to a different task.
> >>>> + ///
> >>>> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
> >>>> + /// owns the fd in question.
> >>>> + _not_send_sync: PhantomData<*mut ()>,
> >>>
> >>> I don't fully understand this. Can you explain in a little more detail
> >>> what you mean by this and how this works?
> >>
> >> Yeah, so, this has to do with the Rust trait `Send` that controls
> >> whether it's okay for a value to get moved from one thread to another.
> >> In this case, we don't want it to be `Send` so that it can't be moved to
> >> another thread, since current might be different there.
> >>
> >> The `Send` trait is automatically applied to structs whenever *all*
> >> fields of the struct are `Send`. So to ensure that a struct is not
> >> `Send`, you add a field that is not `Send`.
> >>
> >> The `PhantomData` type used here is a special zero-sized type.
> >> Basically, it says "pretend this struct has a field of type `*mut ()`,
> >> but don't actually add the field". So for the purposes of `Send`, it has
> >> a non-Send field, but since its wrapped in `PhantomData`, the field is
> >> not there at runtime.
> >
> > This probably a stupid suggestion, question. But while PhantomData gives
> > the right hint of what is happening I wouldn't mind if that was very
> > explicitly called NoSendTrait or just add the explanatory comment. Yes,
> > that's a lot of verbiage but you'd help us a lot.
>
> I suppose we could add a typedef:
>
> type NoSendTrait = PhantomData<*mut ()>;
>
> and use that as the field type. The way I did it here is the "standard"
> way of doing it, and if you look at code outside the kernel, you will
> also find them using `PhantomData` like this. However, I don't mind
> adding the typedef if you think it is helpful.
I'm fine with just a comment as well. I just need to be able to read
this a bit faster. I'm basically losing half a day just dealing with
this patchset and that's not realistic if I want to keep up with other
patches that get sent.
And if you resend and someone else review you might have to answer the
same question again.
On Thu, Nov 30, 2023 at 09:36:03AM +0000, Alice Ryhl wrote:
> Christian Brauner <[email protected]> writes:
> > I'm a bit puzzled by all these rust_helper_*() calls. Can you explain
> > why they are needed? Because they are/can be static inlines and that
> > somehow doesn't work?
>
> Yes, it's because the methods are inline. Rust can only call C methods
> that are actually exported by the C code.
>
> >> + /// Converts this kernel UID into a UID that userspace understands. Uses the namespace of the
> >> + /// current task.
> >> + pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
> >
> > Hm, I wouldn't special-case this. Just expose from_kuid() and let it
> > take a namespace argument, no? You don't need to provide bindings for
> > namespaces ofc.
>
> To make `from_kuid` safe, I would need to wrap the namespace type too. I
> could do that, but it would be more code than this method because I need
> another wrapper struct and so on.
>
> Personally I would prefer to special-case it until someone needs the
> non-special-case. Then, they can delete this method when they introduce
> the non-special-case.
>
> But I'll do it if you think I should.
No, don't start wrapping namespaces as well. You already do parts of LSM
as well.
>
> >> +impl PartialEq for Kuid {
> >> + fn eq(&self, other: &Kuid) -> bool {
> >> + // SAFETY: Just an FFI call.
> >> + unsafe { bindings::uid_eq(self.kuid, other.kuid) }
> >> + }
> >> +}
> >> +
> >> +impl Eq for Kuid {}
> >
> > Do you need that?
>
> Yes. This is the code that tells the compiler what `==` means for the
> `Kuid` type. Binder uses it here:
Ok, thanks.
Christian Brauner <[email protected]> writes:
> On Thu, Nov 30, 2023 at 09:17:56AM +0000, Alice Ryhl wrote:
>> Christian Brauner <[email protected]> writes:
>>>>>> + /// Prevent values of this type from being moved to a different task.
>>>>>> + ///
>>>>>> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
>>>>>> + /// owns the fd in question.
>>>>>> + _not_send_sync: PhantomData<*mut ()>,
>>>>>
>>>>> I don't fully understand this. Can you explain in a little more detail
>>>>> what you mean by this and how this works?
>>>>
>>>> Yeah, so, this has to do with the Rust trait `Send` that controls
>>>> whether it's okay for a value to get moved from one thread to another.
>>>> In this case, we don't want it to be `Send` so that it can't be moved to
>>>> another thread, since current might be different there.
>>>>
>>>> The `Send` trait is automatically applied to structs whenever *all*
>>>> fields of the struct are `Send`. So to ensure that a struct is not
>>>> `Send`, you add a field that is not `Send`.
>>>>
>>>> The `PhantomData` type used here is a special zero-sized type.
>>>> Basically, it says "pretend this struct has a field of type `*mut ()`,
>>>> but don't actually add the field". So for the purposes of `Send`, it has
>>>> a non-Send field, but since its wrapped in `PhantomData`, the field is
>>>> not there at runtime.
>>>
>>> This probably a stupid suggestion, question. But while PhantomData gives
>>> the right hint of what is happening I wouldn't mind if that was very
>>> explicitly called NoSendTrait or just add the explanatory comment. Yes,
>>> that's a lot of verbiage but you'd help us a lot.
>>
>> I suppose we could add a typedef:
>>
>> type NoSendTrait = PhantomData<*mut ()>;
>>
>> and use that as the field type. The way I did it here is the "standard"
>> way of doing it, and if you look at code outside the kernel, you will
>> also find them using `PhantomData` like this. However, I don't mind
>> adding the typedef if you think it is helpful.
>
> I'm fine with just a comment as well. I just need to be able to read
> this a bit faster. I'm basically losing half a day just dealing with
> this patchset and that's not realistic if I want to keep up with other
> patches that get sent.
>
> And if you resend and someone else review you might have to answer the
> same question again.
What do you think about this wording?
/// Prevent values of this type from being moved to a different task.
///
/// This field has the type `PhantomData<*mut ()>`, which does not
/// implement the Send trait. By adding a field with this property, we
/// ensure that the `FileDescriptorReservation` struct will not
/// implement the Send trait either. This has the consequence that the
/// compiler will prevent you from moving values of type
/// `FileDescriptorReservation` into a different task, which we want
/// because other tasks might have a different value of `current`. We
/// want to avoid that because `fd_install` assumes that the value of
/// `current` is unchanged since the call to `get_unused_fd_flags`.
///
/// The `PhantomData` type has size zero, so the field does not exist at
/// runtime.
Alice
On 30.11.23 12:54, Alice Ryhl wrote:
> Christian Brauner <[email protected]> writes:
>> On Thu, Nov 30, 2023 at 09:17:56AM +0000, Alice Ryhl wrote:
>>> Christian Brauner <[email protected]> writes:
>>>>>>> + /// Prevent values of this type from being moved to a different task.
>>>>>>> + ///
>>>>>>> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
>>>>>>> + /// owns the fd in question.
>>>>>>> + _not_send_sync: PhantomData<*mut ()>,
>>>>>>
>>>>>> I don't fully understand this. Can you explain in a little more detail
>>>>>> what you mean by this and how this works?
>>>>>
>>>>> Yeah, so, this has to do with the Rust trait `Send` that controls
>>>>> whether it's okay for a value to get moved from one thread to another.
>>>>> In this case, we don't want it to be `Send` so that it can't be moved to
>>>>> another thread, since current might be different there.
>>>>>
>>>>> The `Send` trait is automatically applied to structs whenever *all*
>>>>> fields of the struct are `Send`. So to ensure that a struct is not
>>>>> `Send`, you add a field that is not `Send`.
>>>>>
>>>>> The `PhantomData` type used here is a special zero-sized type.
>>>>> Basically, it says "pretend this struct has a field of type `*mut ()`,
>>>>> but don't actually add the field". So for the purposes of `Send`, it has
>>>>> a non-Send field, but since its wrapped in `PhantomData`, the field is
>>>>> not there at runtime.
>>>>
>>>> This probably a stupid suggestion, question. But while PhantomData gives
>>>> the right hint of what is happening I wouldn't mind if that was very
>>>> explicitly called NoSendTrait or just add the explanatory comment. Yes,
>>>> that's a lot of verbiage but you'd help us a lot.
>>>
>>> I suppose we could add a typedef:
>>>
>>> type NoSendTrait = PhantomData<*mut ()>;
>>>
>>> and use that as the field type. The way I did it here is the "standard"
>>> way of doing it, and if you look at code outside the kernel, you will
>>> also find them using `PhantomData` like this. However, I don't mind
>>> adding the typedef if you think it is helpful.
>>
>> I'm fine with just a comment as well. I just need to be able to read
>> this a bit faster. I'm basically losing half a day just dealing with
>> this patchset and that's not realistic if I want to keep up with other
>> patches that get sent.
>>
>> And if you resend and someone else review you might have to answer the
>> same question again.
>
> What do you think about this wording?
>
> /// Prevent values of this type from being moved to a different task.
> ///
> /// This field has the type `PhantomData<*mut ()>`, which does not
> /// implement the Send trait. By adding a field with this property, we
> /// ensure that the `FileDescriptorReservation` struct will not
> /// implement the Send trait either. This has the consequence that the
> /// compiler will prevent you from moving values of type
> /// `FileDescriptorReservation` into a different task, which we want
> /// because other tasks might have a different value of `current`. We
> /// want to avoid that because `fd_install` assumes that the value of
> /// `current` is unchanged since the call to `get_unused_fd_flags`.
> ///
> /// The `PhantomData` type has size zero, so the field does not exist at
> /// runtime.
>
> Alice
I don't think it is a good idea to add this big comment to every
`PhantomData` field. I would much rather have a type alias:
/// Zero-sized type to mark types not [`Send`].
///
/// Add this type as a field to your struct if your type should not be sent to a different task.
/// Since [`Send`] is an auto trait, adding a single field that is [`!Send`] will ensure that the
/// whole type is [`!Send`].
///
/// If a type is [`!Send`] it is impossible to give control over an instance of the type to another
/// task. This is useful when a type stores task-local information for example file descriptors.
pub type NotSend = PhantomData<*mut ()>;
If you have suggestions for improving the doc comment or the name,
please go ahead.
This doesn't mean that there should be no comment on the `NotSend`
field of `FileDescriptorReservation`, but I don't want to repeat
the `Send` stuff all over the place (since it comes up a lot):
/// Ensure that `FileDescriptorReservation` cannot be sent to a different task, since there the
/// value of `current` is different. We want to avoid that because `fd_install` assumes that the
/// value of `current` is unchanged since the call to `get_unused_fd_flags`.
_not_send: NotSend,
--
Cheers,
Benno
On Thu, Nov 30, 2023 at 12:17:14PM +0000, Benno Lossin wrote:
> On 30.11.23 12:54, Alice Ryhl wrote:
> > Christian Brauner <[email protected]> writes:
> >> On Thu, Nov 30, 2023 at 09:17:56AM +0000, Alice Ryhl wrote:
> >>> Christian Brauner <[email protected]> writes:
> >>>>>>> + /// Prevent values of this type from being moved to a different task.
> >>>>>>> + ///
> >>>>>>> + /// This is necessary because the C FFI calls assume that `current` is set to the task that
> >>>>>>> + /// owns the fd in question.
> >>>>>>> + _not_send_sync: PhantomData<*mut ()>,
> >>>>>>
> >>>>>> I don't fully understand this. Can you explain in a little more detail
> >>>>>> what you mean by this and how this works?
> >>>>>
> >>>>> Yeah, so, this has to do with the Rust trait `Send` that controls
> >>>>> whether it's okay for a value to get moved from one thread to another.
> >>>>> In this case, we don't want it to be `Send` so that it can't be moved to
> >>>>> another thread, since current might be different there.
> >>>>>
> >>>>> The `Send` trait is automatically applied to structs whenever *all*
> >>>>> fields of the struct are `Send`. So to ensure that a struct is not
> >>>>> `Send`, you add a field that is not `Send`.
> >>>>>
> >>>>> The `PhantomData` type used here is a special zero-sized type.
> >>>>> Basically, it says "pretend this struct has a field of type `*mut ()`,
> >>>>> but don't actually add the field". So for the purposes of `Send`, it has
> >>>>> a non-Send field, but since its wrapped in `PhantomData`, the field is
> >>>>> not there at runtime.
> >>>>
> >>>> This probably a stupid suggestion, question. But while PhantomData gives
> >>>> the right hint of what is happening I wouldn't mind if that was very
> >>>> explicitly called NoSendTrait or just add the explanatory comment. Yes,
> >>>> that's a lot of verbiage but you'd help us a lot.
> >>>
> >>> I suppose we could add a typedef:
> >>>
> >>> type NoSendTrait = PhantomData<*mut ()>;
> >>>
> >>> and use that as the field type. The way I did it here is the "standard"
> >>> way of doing it, and if you look at code outside the kernel, you will
> >>> also find them using `PhantomData` like this. However, I don't mind
> >>> adding the typedef if you think it is helpful.
> >>
> >> I'm fine with just a comment as well. I just need to be able to read
> >> this a bit faster. I'm basically losing half a day just dealing with
> >> this patchset and that's not realistic if I want to keep up with other
> >> patches that get sent.
> >>
> >> And if you resend and someone else review you might have to answer the
> >> same question again.
> >
> > What do you think about this wording?
> >
> > /// Prevent values of this type from being moved to a different task.
> > ///
> > /// This field has the type `PhantomData<*mut ()>`, which does not
> > /// implement the Send trait. By adding a field with this property, we
> > /// ensure that the `FileDescriptorReservation` struct will not
> > /// implement the Send trait either. This has the consequence that the
> > /// compiler will prevent you from moving values of type
> > /// `FileDescriptorReservation` into a different task, which we want
> > /// because other tasks might have a different value of `current`. We
> > /// want to avoid that because `fd_install` assumes that the value of
> > /// `current` is unchanged since the call to `get_unused_fd_flags`.
> > ///
> > /// The `PhantomData` type has size zero, so the field does not exist at
> > /// runtime.
> >
> > Alice
>
> I don't think it is a good idea to add this big comment to every
> `PhantomData` field. I would much rather have a type alias:
>
> /// Zero-sized type to mark types not [`Send`].
> ///
> /// Add this type as a field to your struct if your type should not be sent to a different task.
> /// Since [`Send`] is an auto trait, adding a single field that is [`!Send`] will ensure that the
> /// whole type is [`!Send`].
> ///
> /// If a type is [`!Send`] it is impossible to give control over an instance of the type to another
> /// task. This is useful when a type stores task-local information for example file descriptors.
> pub type NotSend = PhantomData<*mut ()>;
>
> If you have suggestions for improving the doc comment or the name,
> please go ahead.
>
> This doesn't mean that there should be no comment on the `NotSend`
> field of `FileDescriptorReservation`, but I don't want to repeat
> the `Send` stuff all over the place (since it comes up a lot):
>
> /// Ensure that `FileDescriptorReservation` cannot be sent to a different task, since there the
> /// value of `current` is different. We want to avoid that because `fd_install` assumes that the
> /// value of `current` is unchanged since the call to `get_unused_fd_flags`.
> _not_send: NotSend,
Seems sane to me. But I would suggest to move away from the "send"
terminology?
* CurrentOnly
* AccessCurrentTask vs AccessForeignTask
* NoForeignTaskAccess
* TaskLocalContext
* TaskCurrentAccess
Or some other variant thereof.
On Wed, Nov 29, 2023 at 05:48:15PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 05:28:27PM +0100, Christian Brauner wrote:
>
> > > +pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
> > > + struct pid_namespace *ns)
> > > +{
> > > + return task_tgid_nr_ns(tsk, ns);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_task_tgid_nr_ns);
> >
> > I'm a bit puzzled by all these rust_helper_*() calls. Can you explain
> > why they are needed? Because they are/can be static inlines and that
> > somehow doesn't work?
>
> Correct, because Rust can only talk to C ABI, it cannot use C headers.
> Bindgen would need to translate the full C headers into valid Rust for
> that to work.
>
> I really think the Rust peoples should spend more effort on that,
> because you are quite right, all this wrappery is tedious at best.
The problem is that we end up with a long list of explicit exports that
also are all really weirdly named like rust_helper_*(). I wouldn't even
complain if it they were somehow auto-generated but as you say that
might be out of scope.
The thing is though that if I want to change the static inlines I know
also have to very likely care about these explicit Rust wrappers which
seems less than ideal.
So if we could not do rust_helper_*() exports we'd probably be better
off.
On 11/29/23 13:51, Alice Ryhl wrote:
> + /// Returns the credentials of the task that originally opened the file.
> + pub fn cred(&self) -> &Credential {
> + // This `read_volatile` is intended to correspond to a READ_ONCE call.
> + //
> + // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
> + //
> + // TODO: Replace with `read_once` when available on the Rust side.
> + let ptr = unsafe { core::ptr::addr_of!((*self.0.get()).f_cred).read_volatile() };
> +
> + // SAFETY: The signature of this function ensures that the caller will only access the
> + // returned credential while the file is still valid, and the credential must stay valid
> + // while the file is valid.
About the last part of this safety comment, is this a guarantee from the
C side? If yes, then I would phrase it that way:
... while the file is still valid, and the C side ensures that the
credentials stay valid while the file is valid.
--
Cheers,
Benno
> + unsafe { Credential::from_ptr(ptr) }
> + }
> +
On 11/29/23 14:11, Alice Ryhl wrote:
> +/// A security context string.
> +///
> +/// The struct has the invariant that it always contains a valid security context.
Refactor to use the `# Invariants` section:
# Invariants
`secdata` points to a valid security context.
I also do not know what a "valid security context" is, so a link to the
definition wouldn't hurt.
> +pub struct SecurityCtx {
> + secdata: *mut core::ffi::c_char,
> + seclen: usize,
> +}
> +
> +impl SecurityCtx {
> + /// Get the security context given its id.
> + pub fn from_secid(secid: u32) -> Result<Self> {
> + let mut secdata = core::ptr::null_mut();
> + let mut seclen = 0;
> + // SAFETY: Just a C FFI call. The pointers are valid for writes.
> + unsafe {
> + to_result(bindings::security_secid_to_secctx(
> + secid,
> + &mut secdata,
> + &mut seclen,
> + ))?;
> + }
> +
> + // If the above call did not fail, then we have a valid security
> + // context, so the invariants are not violated.
Should be tagged `INVARIANT`.
> + Ok(Self {
> + secdata,
> + seclen: usize::try_from(seclen).unwrap(),
> + })
> + }
> +
> + /// Returns whether the security context is empty.
> + pub fn is_empty(&self) -> bool {
> + self.seclen == 0
> + }
> +
> + /// Returns the length of this security context.
> + pub fn len(&self) -> usize {
> + self.seclen
> + }
> +
> + /// Returns the bytes for this security context.
> + pub fn as_bytes(&self) -> &[u8] {
> + let mut ptr = self.secdata;
> + if ptr.is_null() {
> + // Many C APIs will use null pointers for strings of length zero, but
I would just write that the secctx API uses null pointers to denote a
string of length zero.
> + // `slice::from_raw_parts` doesn't allow the pointer to be null even if the length is
> + // zero. Replace the pointer with a dangling but non-null pointer in this case.
> + debug_assert_eq!(self.seclen, 0);
I am feeling a bit uncomfortable with this, why can't we just return
an empty slice in this case?
> + ptr = core::ptr::NonNull::dangling().as_ptr();
> + }
> +
> + // SAFETY: The call to `security_secid_to_secctx` guarantees that the pointer is valid for
> + // `seclen` bytes. Furthermore, if the length is zero, then we have ensured that the
> + // pointer is not null.
> + unsafe { core::slice::from_raw_parts(ptr.cast(), self.seclen) }
> + }
> +}
> +
> +impl Drop for SecurityCtx {
> + fn drop(&mut self) {
> + // SAFETY: This frees a pointer that came from a successful call to
> + // `security_secid_to_secctx`.
This should be part of the type invariant.
--
Cheers,
Benno
> + unsafe {
> + bindings::security_release_secctx(self.secdata, self.seclen as u32);
> + }
> + }
> +}
> --
> 2.43.0.rc1.413.gea7ed67945-goog
>
On 11/29/23 14:11, Alice Ryhl wrote:
> +impl FileDescriptorReservation {
> + /// Creates a new file descriptor reservation.
> + pub fn new(flags: u32) -> Result<Self> {
> + // SAFETY: FFI call, there are no safety requirements on `flags`.
> + let fd: i32 = unsafe { bindings::get_unused_fd_flags(flags) };
> + if fd < 0 {
> + return Err(Error::from_errno(fd));
> + }
I think here we could also use the modified `to_result` function that
returns a `u32` if the value is non-negative.
> + Ok(Self {
> + fd: fd as _,
> + _not_send_sync: PhantomData,
> + })
> + }
> +
> + /// Returns the file descriptor number that was reserved.
> + pub fn reserved_fd(&self) -> u32 {
> + self.fd
> + }
> +
> + /// Commits the reservation.
> + ///
> + /// The previously reserved file descriptor is bound to `file`. This method consumes the
> + /// [`FileDescriptorReservation`], so it will not be usable after this call.
> + pub fn commit(self, file: ARef<File>) {
> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
> + // guaranteed to have an owned ref count by its type invariants.
> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
> +
> + // `fd_install` consumes both the file descriptor and the file reference, so we cannot run
> + // the destructors.
> + core::mem::forget(self);
> + core::mem::forget(file);
Would be useful to have an `ARef::into_raw` function that would do
the `forget` for us.
--
Cheers,
Benno
On 11/29/23 14:12, Alice Ryhl wrote:
> + /// Returns the given task's pid in the current pid namespace.
> + pub fn pid_in_current_ns(&self) -> Pid {
> + // SAFETY: We know that `self.0.get()` is valid by the type invariant. The rest is just FFI
> + // calls.
> + unsafe {
> + let namespace = bindings::task_active_pid_ns(bindings::get_current());
> + bindings::task_tgid_nr_ns(self.0.get(), namespace)
> + }
I would split this into two `unsafe` blocks.
> + }
> +
> /// Wakes up the task.
> pub fn wake_up(&self) {
> // SAFETY: By the type invariant, we know that `self.0.get()` is non-null and valid.
> @@ -147,6 +180,42 @@ pub fn wake_up(&self) {
> }
> }
>
> +impl Kuid {
> + /// Get the current euid.
> + pub fn current_euid() -> Kuid {
> + // SAFETY: Just an FFI call.
> + Self {
> + kuid: unsafe { bindings::current_euid() },
> + }
Would expect a call to `from_raw` here instead of `Self {}`.
> + }
> +
> + /// Create a `Kuid` given the raw C type.
> + pub fn from_raw(kuid: bindings::kuid_t) -> Self {
> + Self { kuid }
> + }
Is there a reason that this is named `from_raw` and not just a normal
`From` impl? AFAICT any `bindings::kuid_t` is a valid `Kuid`.
> +
> + /// Turn this kuid into the raw C type.
> + pub fn into_raw(self) -> bindings::kuid_t {
> + self.kuid
> + }
> +
> + /// Converts this kernel UID into a UID that userspace understands. Uses the namespace of the
> + /// current task.
Why not:
/// Converts this kernel UID into a userspace UID.
///
/// Uses the namespace of the current task.
--
Cheers,
Benno
> + pub fn into_uid_in_current_ns(self) -> bindings::uid_t {
> + // SAFETY: Just an FFI call.
> + unsafe { bindings::from_kuid(bindings::current_user_ns(), self.kuid) }
> + }
> +}
On 11/29/23 14:12, Alice Ryhl wrote:
> + /// Schedule a task work that closes the file descriptor when this task returns to userspace.
> + pub fn close_fd(mut self, fd: u32) {
> + use bindings::task_work_notify_mode_TWA_RESUME as TWA_RESUME;
> +
> + let file = unsafe { bindings::close_fd_get_file(fd) };
> + if file.is_null() {
> + // Nothing further to do. The allocation is freed by the destructor of `self.inner`.
> + return;
> + }
> +
> + self.inner.file = file;
> +
> + // SAFETY: Since `DeferredFdCloserInner` is `#[repr(C)]`, casting the pointers gives a
> + // pointer to the `twork` field.
> + let inner = Box::into_raw(self.inner) as *mut bindings::callback_head;
Here you can just use `.cast::<...>()`.
> + // SAFETY: Getting a pointer to current is always safe.
> + let current = unsafe { bindings::get_current() };
> + // SAFETY: The `file` pointer points at a valid file.
> + unsafe { bindings::get_file(file) };
> + // SAFETY: Due to the above `get_file`, even if the current task holds an `fdget` to
> + // this file right now, the refcount will not drop to zero until after it is released
> + // with `fdput`. This is because when using `fdget`, you must always use `fdput` before
> + // returning to userspace, and our task work runs after any `fdget` users have returned
> + // to userspace.
> + //
> + // Note: fl_owner_t is currently a void pointer.
> + unsafe { bindings::filp_close(file, (*current).files as bindings::fl_owner_t) };
> + // SAFETY: The `inner` pointer is compatible with the `do_close_fd` method.
> + unsafe { bindings::init_task_work(inner, Some(Self::do_close_fd)) };
> + // SAFETY: The `inner` pointer points at a valid and fully initialized task work that is
> + // ready to be scheduled.
> + unsafe { bindings::task_work_add(current, inner, TWA_RESUME) };
I am a bit confused, when does `do_close_fd` actually run? Does
`TWA_RESUME` mean that `inner` is scheduled to run after the current
task has been completed?
> + }
> +
> + // SAFETY: This function is an implementation detail of `close_fd`, so its safety comments
> + // should be read in extension of that method.
> + unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) {
> + // SAFETY: In `close_fd` we use this method together with a pointer that originates from a
> + // `Box<DeferredFdCloserInner>`, and we have just been given ownership of that allocation.
> + let inner = unsafe { Box::from_raw(inner as *mut DeferredFdCloserInner) };
In order for this call to be sound, `inner` must be an exclusive
pointer (including any possible references into the `callback_head`).
Is this the case?
--
Cheers,
Benno
> + // SAFETY: This drops a refcount we acquired in `close_fd`. Since this callback runs in a
> + // task work after we return to userspace, it is guaranteed that the current thread doesn't
> + // hold this file with `fdget`, as `fdget` must be released before returning to userspace.
> + unsafe { bindings::fput(inner.file) };
> + // Free the allocation.
> + drop(inner);
> + }
> +}
> +
> /// Represents the `EBADF` error code.
> ///
> /// Used for methods that can only fail with `EBADF`.
>
> --
> 2.43.0.rc1.413.gea7ed67945-goog
>
On 11/29/23 14:12, Alice Ryhl wrote:
> diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
> index 578ee307093f..35576678c993 100644
> --- a/rust/kernel/file.rs
> +++ b/rust/kernel/file.rs
> @@ -14,6 +14,9 @@
> use alloc::boxed::Box;
> use core::{alloc::AllocError, marker::PhantomData, mem, ptr};
>
> +mod poll_table;
> +pub use self::poll_table::{PollCondVar, PollTable};
I think it makes more sense to put it under `rust/kernel/sync/`.
> + fn get_qproc(&self) -> bindings::poll_queue_proc {
> + let ptr = self.0.get();
> + // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
> + // field is not modified concurrently with this call.
What ensures this? Maybe use a type invariant?
> + unsafe { (*ptr)._qproc }
> + }
[...]
> +impl PollCondVar {
> + /// Constructs a new condvar initialiser.
> + #[allow(clippy::new_ret_no_self)]
This is no longer needed, as Gary fixed this, see [1].
[1]: https://github.com/rust-lang/rust-clippy/issues/7344
> + pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
> + pin_init!(Self {
> + inner <- CondVar::new(name, key),
> + })
> + }
> +}
> +
> +// Make the `CondVar` methods callable on `PollCondVar`.
> +impl Deref for PollCondVar {
> + type Target = CondVar;
> +
> + fn deref(&self) -> &CondVar {
> + &self.inner
> + }
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for PollCondVar {
> + fn drop(self: Pin<&mut Self>) {
> + // Clear anything registered using `register_wait`.
> + self.inner.notify(1, bindings::POLLHUP | bindings::POLLFREE);
Isn't notifying only a single thread problematic, since a user could
misuse the `PollCondVar` (since all functions of `CondVar` are also
accessible) and also `.wait()` on the condvar? When dropping a
`PollCondVar` it might notify only the user `.wait()`, but not the
`PollTable`. Or am I missing something?
--
Cheers,
Benno
> + // Wait for epoll items to be properly removed.
> + //
> + // SAFETY: Just an FFI call.
> + unsafe { bindings::synchronize_rcu() };
> + }
> +}
On Wed, Nov 29, 2023 at 01:12:51PM +0000, Alice Ryhl wrote:
> The existing `CondVar` abstraction is a wrapper around `wait_list`, but
> it does not support all use-cases of the C `wait_list` type. To be
> specific, a `CondVar` cannot be registered with a `struct poll_table`.
> This limitation has the advantage that you do not need to call
> `synchronize_rcu` when destroying a `CondVar`.
>
> However, we need the ability to register a `poll_table` with a
> `wait_list` in Rust Binder. To enable this, introduce a type called
> `PollCondVar`, which is like `CondVar` except that you can register a
> `poll_table`. We also introduce `PollTable`, which is a safe wrapper
> around `poll_table` that is intended to be used with `PollCondVar`.
>
> The destructor of `PollCondVar` unconditionally calls `synchronize_rcu`
> to ensure that the removal of epoll waiters has fully completed before
> the `wait_list` is destroyed.
>
> Signed-off-by: Alice Ryhl <[email protected]>
> ---
> That said, `synchronize_rcu` is rather expensive and is not needed in
> all cases: If we have never registered a `poll_table` with the
> `wait_list`, then we don't need to call `synchronize_rcu`. (And this is
> a common case in Binder - not all processes use Binder with epoll.) The
> current implementation does not account for this, but we could change it
> to store a boolean next to the `wait_list` to keep track of whether a
> `poll_table` has ever been registered. It is up to discussion whether
> this is desireable.
>
> It is not clear to me whether we can implement the above without storing
> an extra boolean. We could check whether the `wait_list` is empty, but
> it is not clear that this is sufficient. Perhaps someone knows the
> answer? If a `poll_table` has previously been registered with a
That won't be sufficient, considering this:
CPU 0 CPU 1
ep_remove_wait_queue():
whead = smp_load_acquire(&pwq->whead); // whead is not NULL
PollCondVar::drop():
self.inner.notify():
<for each wait entry in the list>
ep_poll_callback():
<remove wait entry>
smp_store_release(&ep_pwq_from_wait(wait)->whead, NULL);
<lock the waitqueue>
waitqueue_active() // return false, since the queue is emtpy
<unlock>
...
<free the waitqueue>
if (whead) {
remove_wait_queue(whead, &pwq->wait); // Use-after-free BOOM!
}
Note that moving the `wait_list` empty checking before
`self.inner.notify()` won't change the result, since there might be a
`notify` called by users before `PollCondVar::drop()`, hence the same
result.
Regards,
Boqun
> `wait_list`, is it the case that we can kfree the `wait_list` after
> observing that the `wait_list` is empty without waiting for an rcu grace
> period?
>
[...]
On Wed, Nov 29, 2023 at 01:12:51PM +0000, Alice Ryhl wrote:
[...]
> +// Make the `CondVar` methods callable on `PollCondVar`.
> +impl Deref for PollCondVar {
> + type Target = CondVar;
> +
> + fn deref(&self) -> &CondVar {
> + &self.inner
> + }
> +}
I generally think we should avoid using Deref for "subclass pattern" due
to the potential confusion for the code readers (of the deref() usage).
Would it be possible we start with `impl AsRef<CondVar>`?
Thanks!
Regards,
Boqun
Benno Lossin <[email protected]> writes:
> On 11/29/23 13:51, Alice Ryhl wrote:
>> + /// Returns the credentials of the task that originally opened the file.
>> + pub fn cred(&self) -> &Credential {
>> + // This `read_volatile` is intended to correspond to a READ_ONCE call.
>> + //
>> + // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
>> + //
>> + // TODO: Replace with `read_once` when available on the Rust side.
>> + let ptr = unsafe { core::ptr::addr_of!((*self.0.get()).f_cred).read_volatile() };
>> +
>> + // SAFETY: The signature of this function ensures that the caller will only access the
>> + // returned credential while the file is still valid, and the credential must stay valid
>> + // while the file is valid.
>
> About the last part of this safety comment, is this a guarantee from the
> C side? If yes, then I would phrase it that way:
>
> ... while the file is still valid, and the C side ensures that the
> credentials stay valid while the file is valid.
Yes, that's my intention with this code.
But I guess this is a good question for Christian Brauner to confirm:
If I read the credential from the `f_cred` field, is it guaranteed that
the pointer remains valid for at least as long as the file?
Or should I do some dance along the lines of "lock file, increment
refcount on credential, unlock file"?
Alice
On Fri, Dec 01, 2023 at 09:06:35AM +0000, Alice Ryhl wrote:
> Benno Lossin <[email protected]> writes:
> > On 11/29/23 13:51, Alice Ryhl wrote:
> >> + /// Returns the credentials of the task that originally opened the file.
> >> + pub fn cred(&self) -> &Credential {
> >> + // This `read_volatile` is intended to correspond to a READ_ONCE call.
> >> + //
> >> + // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
> >> + //
> >> + // TODO: Replace with `read_once` when available on the Rust side.
> >> + let ptr = unsafe { core::ptr::addr_of!((*self.0.get()).f_cred).read_volatile() };
> >> +
> >> + // SAFETY: The signature of this function ensures that the caller will only access the
> >> + // returned credential while the file is still valid, and the credential must stay valid
> >> + // while the file is valid.
> >
> > About the last part of this safety comment, is this a guarantee from the
> > C side? If yes, then I would phrase it that way:
> >
> > ... while the file is still valid, and the C side ensures that the
> > credentials stay valid while the file is valid.
>
> Yes, that's my intention with this code.
>
> But I guess this is a good question for Christian Brauner to confirm:
>
> If I read the credential from the `f_cred` field, is it guaranteed that
> the pointer remains valid for at least as long as the file?
>
> Or should I do some dance along the lines of "lock file, increment
> refcount on credential, unlock file"?
The lifetime of the f_cred reference is at least as long as the lifetime
of the file:
// file not yet visible anywhere
some_file = alloc_file*()
-> init_file()
{
file->f_cred = get_cred(cred /* usually current_cred() */)
}
// install into fd_table -> irreversible, thing visible, possibly shared
fd_install(1234, some_file)
// last fput
fput()
// atomic_dec_and_test() dance:
-> file_free() // either "delayed" through task work, workqueue, or
// sometimes freed right away if file hasn't been opened,
// i.e., if fd_install() wasn't called
-> put_cred(file->f_cred)
In order to access anything you must hold a reference to the file or
files->file_lock. IOW, no poking around in f->f_cred or any field for
that matter just under rcu_read_lock() for example. Because files are
SLAB_TYPESAFE_BY_RCU. You might be poking in someone else's creds then.
Benno Lossin <[email protected]> writes:
> On 11/29/23 14:11, Alice Ryhl wrote:
>> + /// Returns the bytes for this security context.
>> + pub fn as_bytes(&self) -> &[u8] {
>> + let mut ptr = self.secdata;
>> + if ptr.is_null() {
>> + // Many C APIs will use null pointers for strings of length zero, but
>
> I would just write that the secctx API uses null pointers to denote a
> string of length zero.
I don't actually know whether it can ever be null, I just wanted to stay
on the safe side.
>> + // `slice::from_raw_parts` doesn't allow the pointer to be null even if the length is
>> + // zero. Replace the pointer with a dangling but non-null pointer in this case.
>> + debug_assert_eq!(self.seclen, 0);
>
> I am feeling a bit uncomfortable with this, why can't we just return
> an empty slice in this case?
I can do that, but to be clear, what I'm doing here is also definitely
okay.
Alice
Benno Lossin <[email protected]> writes:
> On 11/29/23 14:11, Alice Ryhl wrote:
> > +impl FileDescriptorReservation {
> > + /// Creates a new file descriptor reservation.
> > + pub fn new(flags: u32) -> Result<Self> {
> > + // SAFETY: FFI call, there are no safety requirements on `flags`.
> > + let fd: i32 = unsafe { bindings::get_unused_fd_flags(flags) };
> > + if fd < 0 {
> > + return Err(Error::from_errno(fd));
> > + }
>
> I think here we could also use the modified `to_result` function that
> returns a `u32` if the value is non-negative.
I'll look into that for the next version.
>> + /// Commits the reservation.
>> + ///
>> + /// The previously reserved file descriptor is bound to `file`. This method consumes the
>> + /// [`FileDescriptorReservation`], so it will not be usable after this call.
>> + pub fn commit(self, file: ARef<File>) {
>> + // SAFETY: `self.fd` was previously returned by `get_unused_fd_flags`, and `file.ptr` is
>> + // guaranteed to have an owned ref count by its type invariants.
>> + unsafe { bindings::fd_install(self.fd, file.0.get()) };
>> +
>> + // `fd_install` consumes both the file descriptor and the file reference, so we cannot run
>> + // the destructors.
>> + core::mem::forget(self);
>> + core::mem::forget(file);
>
> Would be useful to have an `ARef::into_raw` function that would do
> the `forget` for us.
That makes sense to me, but I don't think it needs to happen in this patchset.
Alice
Benno Lossin <[email protected]> writes:
>> + // SAFETY: The `inner` pointer points at a valid and fully initialized task work that is
>> + // ready to be scheduled.
>> + unsafe { bindings::task_work_add(current, inner, TWA_RESUME) };
>
> I am a bit confused, when does `do_close_fd` actually run? Does
> `TWA_RESUME` mean that `inner` is scheduled to run after the current
> task has been completed?
When the current syscall returns to userspace.
>> + // SAFETY: This function is an implementation detail of `close_fd`, so its safety comments
>> + // should be read in extension of that method.
>> + unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) {
>> + // SAFETY: In `close_fd` we use this method together with a pointer that originates from a
>> + // `Box<DeferredFdCloserInner>`, and we have just been given ownership of that allocation.
>> + let inner = unsafe { Box::from_raw(inner as *mut DeferredFdCloserInner) };
>
> In order for this call to be sound, `inner` must be an exclusive
> pointer (including any possible references into the `callback_head`).
> Is this the case?
Yes, when this is called, it's been removed from the linked list of task
work. That's why we can kfree it.
>> + // SAFETY: Since `DeferredFdCloserInner` is `#[repr(C)]`, casting the pointers gives a
>> + // pointer to the `twork` field.
>> + let inner = Box::into_raw(self.inner) as *mut bindings::callback_head;
>
> Here you can just use `.cast::<...>()`.
Will do.
Alice
Boqun Feng <[email protected]> writes:
>> That said, `synchronize_rcu` is rather expensive and is not needed in
>> all cases: If we have never registered a `poll_table` with the
>> `wait_list`, then we don't need to call `synchronize_rcu`. (And this is
>> a common case in Binder - not all processes use Binder with epoll.) The
>> current implementation does not account for this, but we could change it
>> to store a boolean next to the `wait_list` to keep track of whether a
>> `poll_table` has ever been registered. It is up to discussion whether
>> this is desireable.
>>
>> It is not clear to me whether we can implement the above without storing
>> an extra boolean. We could check whether the `wait_list` is empty, but
>> it is not clear that this is sufficient. Perhaps someone knows the
>> answer? If a `poll_table` has previously been registered with a
>
> That won't be sufficient, considering this:
>
> CPU 0 CPU 1
> ep_remove_wait_queue():
> whead = smp_load_acquire(&pwq->whead); // whead is not NULL
> PollCondVar::drop():
> self.inner.notify():
> <for each wait entry in the list>
> ep_poll_callback():
> <remove wait entry>
> smp_store_release(&ep_pwq_from_wait(wait)->whead, NULL);
> <lock the waitqueue>
> waitqueue_active() // return false, since the queue is emtpy
> <unlock>
> ...
> <free the waitqueue>
> if (whead) {
> remove_wait_queue(whead, &pwq->wait); // Use-after-free BOOM!
> }
>
>
> Note that moving the `wait_list` empty checking before
> `self.inner.notify()` won't change the result, since there might be a
> `notify` called by users before `PollCondVar::drop()`, hence the same
> result.
>
> Regards,
> Boqun
Thank you for confirming my suspicion.
Alice
On 12/1/23 11:48, Alice Ryhl wrote:
> Benno Lossin <[email protected]> writes:
>> On 11/29/23 14:11, Alice Ryhl wrote:
>>> + /// Returns the bytes for this security context.
>>> + pub fn as_bytes(&self) -> &[u8] {
>>> + let mut ptr = self.secdata;
>>> + if ptr.is_null() {
>>> + // Many C APIs will use null pointers for strings of length zero, but
>>
>> I would just write that the secctx API uses null pointers to denote a
>> string of length zero.
>
> I don't actually know whether it can ever be null, I just wanted to stay
> on the safe side.
I see, can someone from the C side confirm/refute this?
I found the comment a bit weird, since it is phrased in a general way.
If it turns out that the pointer can never be null, maybe use `NonNull`
instead (I would then also move the length check into the constructor)?
You can probably also do this if the pointer is allowed to be null,
assuming that you then do not need to call `security_release_secctx`.
>>> + // `slice::from_raw_parts` doesn't allow the pointer to be null even if the length is
>>> + // zero. Replace the pointer with a dangling but non-null pointer in this case.
>>> + debug_assert_eq!(self.seclen, 0);
>>
>> I am feeling a bit uncomfortable with this, why can't we just return
>> an empty slice in this case?
>
> I can do that, but to be clear, what I'm doing here is also definitely
> okay.
Yes it is okay, but I see this similar to avoiding `unsafe` code when it
can be done safely. In this example we are not strictly avoiding any
`unsafe` code, but we are avoiding a codepath with `unsafe` code. You
should of course still keep the `debug_assert_eq`.
--
Cheers,
Benno
On 12/1/23 12:35, Alice Ryhl wrote:
> Benno Lossin <[email protected]> writes:
>>> + // SAFETY: The `inner` pointer points at a valid and fully initialized task work that is
>>> + // ready to be scheduled.
>>> + unsafe { bindings::task_work_add(current, inner, TWA_RESUME) };
>>
>> I am a bit confused, when does `do_close_fd` actually run? Does
>> `TWA_RESUME` mean that `inner` is scheduled to run after the current
>> task has been completed?
>
> When the current syscall returns to userspace.
What happens when I use `DeferredFdCloser` outside of a syscall? Will
it never run? Maybe add some documentation about that?
>>> + // SAFETY: This function is an implementation detail of `close_fd`, so its safety comments
>>> + // should be read in extension of that method.
>>> + unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) {
>>> + // SAFETY: In `close_fd` we use this method together with a pointer that originates from a
>>> + // `Box<DeferredFdCloserInner>`, and we have just been given ownership of that allocation.
>>> + let inner = unsafe { Box::from_raw(inner as *mut DeferredFdCloserInner) };
>>
>> In order for this call to be sound, `inner` must be an exclusive
>> pointer (including any possible references into the `callback_head`).
>> Is this the case?
>
> Yes, when this is called, it's been removed from the linked list of task
> work. That's why we can kfree it.
Please add this to the SAFETY comment.
--
Cheers,
Benno
Christian Brauner <[email protected]> writes:
> On Fri, Dec 01, 2023 at 09:06:35AM +0000, Alice Ryhl wrote:
> > Benno Lossin <[email protected]> writes:
> > > On 11/29/23 13:51, Alice Ryhl wrote:
> > >> + /// Returns the credentials of the task that originally opened the file.
> > >> + pub fn cred(&self) -> &Credential {
> > >> + // This `read_volatile` is intended to correspond to a READ_ONCE call.
> > >> + //
> > >> + // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
> > >> + //
> > >> + // TODO: Replace with `read_once` when available on the Rust side.
> > >> + let ptr = unsafe { core::ptr::addr_of!((*self.0.get()).f_cred).read_volatile() };
> > >> +
> > >> + // SAFETY: The signature of this function ensures that the caller will only access the
> > >> + // returned credential while the file is still valid, and the credential must stay valid
> > >> + // while the file is valid.
> > >
> > > About the last part of this safety comment, is this a guarantee from the
> > > C side? If yes, then I would phrase it that way:
> > >
> > > ... while the file is still valid, and the C side ensures that the
> > > credentials stay valid while the file is valid.
> >
> > Yes, that's my intention with this code.
> >
> > But I guess this is a good question for Christian Brauner to confirm:
> >
> > If I read the credential from the `f_cred` field, is it guaranteed that
> > the pointer remains valid for at least as long as the file?
> >
> > Or should I do some dance along the lines of "lock file, increment
> > refcount on credential, unlock file"?
>
> The lifetime of the f_cred reference is at least as long as the lifetime
> of the file:
>
> // file not yet visible anywhere
> some_file = alloc_file*()
> -> init_file()
> {
> file->f_cred = get_cred(cred /* usually current_cred() */)
> }
>
>
> // install into fd_table -> irreversible, thing visible, possibly shared
> fd_install(1234, some_file)
>
> // last fput
> fput()
> // atomic_dec_and_test() dance:
> -> file_free() // either "delayed" through task work, workqueue, or
> // sometimes freed right away if file hasn't been opened,
> // i.e., if fd_install() wasn't called
> -> put_cred(file->f_cred)
>
> In order to access anything you must hold a reference to the file or
> files->file_lock. IOW, no poking around in f->f_cred or any field for
> that matter just under rcu_read_lock() for example. Because files are
> SLAB_TYPESAFE_BY_RCU. You might be poking in someone else's creds then.
Okay, we aren't dealing with the rcu case in this patchset, so we know
that it wont be freed while we're accessing it.
I guess this means that the `f_cred` field is immutable, which means
that I don't need READ_ONCE to read it? I'll use an ordinary load in the
next version.
Alice
Benno Lossin <[email protected]> writes:
> On 12/1/23 12:35, Alice Ryhl wrote:
>> Benno Lossin <[email protected]> writes:
>>>> + // SAFETY: The `inner` pointer points at a valid and fully initialized task work that is
>>>> + // ready to be scheduled.
>>>> + unsafe { bindings::task_work_add(current, inner, TWA_RESUME) };
>>>
>>> I am a bit confused, when does `do_close_fd` actually run? Does
>>> `TWA_RESUME` mean that `inner` is scheduled to run after the current
>>> task has been completed?
>>
>> When the current syscall returns to userspace.
>
> What happens when I use `DeferredFdCloser` outside of a syscall? Will
> it never run? Maybe add some documentation about that?
Christian Brauner, I think I need your help here.
I spent a bunch of time today trying to understand the correct way of
closing an fd held with fdget, and I'm unsure what the best way is.
So, first, `task_work_add` only really works when we're called from a
syscall. For one, it's fallible, and for another, you shouldn't even
attempt to use it from a kthread. (See e.g., the implementation of
`fput` in `fs/file_table.c`.)
To handle the above, we could fall back to the workqueue and schedule
the `fput` there when we are on a kthread or `task_work_add` fails. And
since I don't really care about the performance of this utility, let's
say we just unconditionally use the workqueue to simplify the
implementation.
However, it's not clear to me that this is okay. Consider this
execution: (please compare to `binder_deferred_fd_close`)
Thread A Thread B (workqueue)
fdget()
close_fd_get_file()
get_file()
filp_close()
schedule_work(do_close_fd)
// we are preempted
fput()
fdput()
And now, since the workqueue can run before thread A returns to
userspace, we are in trouble again, right? Unless I missed an upgrade
to shared file descriptor somewhere that somehow makes this okay? I
looked around the C code and couldn't find one and I guess such an
upgrade has to happen before the call to `fdget` anyway?
In Binder, the above is perfectly fine since it closes the fd from a
context where `task_work_add` will always work, and a task work
definitely runs after the `fdput`. But I added this as a utility in the
shared kernel crate, and I want to avoid the situation where someone
comes along later and uses it from a kthread, gets the fallback to
workqueue, and then has an UAF due to the previously mentioned
execution...
What do you advise that I do?
Maybe the answer is just that, if you're in a context where it makes
sense to talk about an fd of the current task, then task_work_add will
also definitely work? So if `task_work_add` won't work, then
`close_fd_get_file` will return a null pointer and we never reach the
`task_work_add`. This seems fragile though.
Alice
On Tue, Dec 5, 2023 at 3:43 PM Alice Ryhl <[email protected]> wrote:
>
> Benno Lossin <[email protected]> writes:
> > On 12/1/23 12:35, Alice Ryhl wrote:
> >> Benno Lossin <[email protected]> writes:
> >>>> + // SAFETY: The `inner` pointer points at a valid and fully initialized task work that is
> >>>> + // ready to be scheduled.
> >>>> + unsafe { bindings::task_work_add(current, inner, TWA_RESUME) };
> >>>
> >>> I am a bit confused, when does `do_close_fd` actually run? Does
> >>> `TWA_RESUME` mean that `inner` is scheduled to run after the current
> >>> task has been completed?
> >>
> >> When the current syscall returns to userspace.
> >
> > What happens when I use `DeferredFdCloser` outside of a syscall? Will
> > it never run? Maybe add some documentation about that?
>
> Christian Brauner, I think I need your help here.
>
> I spent a bunch of time today trying to understand the correct way of
> closing an fd held with fdget, and I'm unsure what the best way is.
>
> So, first, `task_work_add` only really works when we're called from a
> syscall. For one, it's fallible, and for another, you shouldn't even
> attempt to use it from a kthread. (See e.g., the implementation of
> `fput` in `fs/file_table.c`.)
>
> To handle the above, we could fall back to the workqueue and schedule
> the `fput` there when we are on a kthread or `task_work_add` fails. And
> since I don't really care about the performance of this utility, let's
> say we just unconditionally use the workqueue to simplify the
> implementation.
>
> However, it's not clear to me that this is okay. Consider this
> execution: (please compare to `binder_deferred_fd_close`)
>
> Thread A Thread B (workqueue)
> fdget()
> close_fd_get_file()
> get_file()
> filp_close()
> schedule_work(do_close_fd)
> // we are preempted
> fput()
> fdput()
>
> And now, since the workqueue can run before thread A returns to
> userspace, we are in trouble again, right? Unless I missed an upgrade
> to shared file descriptor somewhere that somehow makes this okay? I
> looked around the C code and couldn't find one and I guess such an
> upgrade has to happen before the call to `fdget` anyway?
>
> In Binder, the above is perfectly fine since it closes the fd from a
> context where `task_work_add` will always work, and a task work
> definitely runs after the `fdput`. But I added this as a utility in the
> shared kernel crate, and I want to avoid the situation where someone
> comes along later and uses it from a kthread, gets the fallback to
> workqueue, and then has an UAF due to the previously mentioned
> execution...
>
> What do you advise that I do?
>
> Maybe the answer is just that, if you're in a context where it makes
> sense to talk about an fd of the current task, then task_work_add will
> also definitely work? So if `task_work_add` won't work, then
> `close_fd_get_file` will return a null pointer and we never reach the
> `task_work_add`. This seems fragile though.
>
> Alice
Ah! I realized that there's another option: Report an error if we
can't schedule the task work.
I didn't suggest this originally because I didn't want to leak the
file in the error path, and I couldn't think of anything else sane to
do.
But! We can schedule the task work *first*, then attempt to close the
file. This way, the file doesn't get closed in the error path. And
there's no race condition since the task work is guaranteed to get
scheduled later on the same thread, so there's no way for it to get
executed in between us scheduling it and closing the file.
Thoughts?
Alice
On Thu, Nov 30, 2023 at 01:46:36PM +0100, Christian Brauner wrote:
> On Wed, Nov 29, 2023 at 05:48:15PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 05:28:27PM +0100, Christian Brauner wrote:
> >
> > > > +pid_t rust_helper_task_tgid_nr_ns(struct task_struct *tsk,
> > > > + struct pid_namespace *ns)
> > > > +{
> > > > + return task_tgid_nr_ns(tsk, ns);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(rust_helper_task_tgid_nr_ns);
> > >
> > > I'm a bit puzzled by all these rust_helper_*() calls. Can you explain
> > > why they are needed? Because they are/can be static inlines and that
> > > somehow doesn't work?
> >
> > Correct, because Rust can only talk to C ABI, it cannot use C headers.
> > Bindgen would need to translate the full C headers into valid Rust for
> > that to work.
> >
> > I really think the Rust peoples should spend more effort on that,
> > because you are quite right, all this wrappery is tedious at best.
I suspect even if the manpower existed to go that route we'd end up
regretting it, because then the Rust compiler would need to be able to
handle _all_ the craziness a modern C compiler knows how to do -
preprocessor magic/devilry isn't even the worst of it, it gets even
worse when you start to consider things like bitfields and all the crazy
__attributes__(()) people have invented.
Swift went that route, but they have Apple funding them, and I doubt
even they would want anything to do with Linux kernel C.
IOW: yes, the extra friction from not being able to do full C -> Rust
translation is annoying now, but probably a good thing in the long run.
> The problem is that we end up with a long list of explicit exports that
> also are all really weirdly named like rust_helper_*(). I wouldn't even
> complain if it they were somehow auto-generated but as you say that
> might be out of scope.
I think we'd need help from the C side to auto generate them - what we
really want is for them to be inline, not static inline, but of course
that has never really worked for functions used across a single C file.
But maybe C compiler people are smarter these days?
Just a keyword to to tell the C compiler "take this static inline and
generate a compiled version in this .c file" would be all we need.
I could see it being handy for other things, too: as Linus has been
saying, we tend to inline too much code these days, and part of the
reason for that is we make a function inline because of the _one_
fastpath that needs it, but there's 3 more slowpaths that don't.
And right now we don't have any sane way of having a function be
available with both inlined and outlined versions, besides the same kind
of manual wrappers the Rust people are doing here... so we should
probably just fix that.
On Thu, Nov 30, 2023 at 11:36:35AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 01:12:17PM +0000, Alice Ryhl wrote:
>
> > diff --git a/rust/helpers.c b/rust/helpers.c
> > index fd633d9db79a..58e3a9dff349 100644
> > --- a/rust/helpers.c
> > +++ b/rust/helpers.c
> > @@ -142,6 +142,51 @@ void rust_helper_put_task_struct(struct task_struct *t)
> > }
> > EXPORT_SYMBOL_GPL(rust_helper_put_task_struct);
> >
> > +kuid_t rust_helper_task_uid(struct task_struct *task)
> > +{
> > + return task_uid(task);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_task_uid);
> > +
> > +kuid_t rust_helper_task_euid(struct task_struct *task)
> > +{
> > + return task_euid(task);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_task_euid);
>
> Aren't these like ideal speculation gadgets? And shouldn't we avoid
> functions like this for exactly that reason?
I think asking the Rust people to care about that is probably putting
too many constraints on them, unless you actually have an idea for
something better to do...
(loudly giving the CPU manufacturers the middle finger for making _all_
of us deal with this bullshit)
On Wed, Nov 29, 2023 at 05:31:44PM +0100, Christian Brauner wrote:
> On Wed, Nov 29, 2023 at 12:51:06PM +0000, Alice Ryhl wrote:
> > This patchset contains the file abstractions needed by the Rust
> > implementation of the Binder driver.
> >
> > Please see the Rust Binder RFC for usage examples:
> > https://lore.kernel.org/rust-for-linux/[email protected]/
> >
> > Users of "rust: file: add Rust abstraction for `struct file`":
> > [PATCH RFC 02/20] rust_binder: add binderfs support to Rust binder
> > [PATCH RFC 03/20] rust_binder: add threading support
> >
> > Users of "rust: cred: add Rust abstraction for `struct cred`":
> > [PATCH RFC 05/20] rust_binder: add nodes and context managers
> > [PATCH RFC 06/20] rust_binder: add oneway transactions
> > [PATCH RFC 11/20] rust_binder: send nodes in transaction
> > [PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
> >
> > Users of "rust: security: add abstraction for security_secid_to_secctx":
> > [PATCH RFC 06/20] rust_binder: add oneway transactions
> >
> > Users of "rust: file: add `FileDescriptorReservation`":
> > [PATCH RFC 13/20] rust_binder: add BINDER_TYPE_FD support
> > [PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
> >
> > Users of "rust: file: add kuid getters":
> > [PATCH RFC 05/20] rust_binder: add nodes and context managers
> > [PATCH RFC 06/20] rust_binder: add oneway transactions
> >
> > Users of "rust: file: add `DeferredFdCloser`":
> > [PATCH RFC 14/20] rust_binder: add BINDER_TYPE_FDA support
> >
> > Users of "rust: file: add abstraction for `poll_table`":
> > [PATCH RFC 07/20] rust_binder: add epoll support
> >
> > This patchset has some uses of read_volatile in place of READ_ONCE.
> > Please see the following rfc for context on this:
> > https://lore.kernel.org/all/[email protected]/
> >
> > This was previously sent as an rfc:
> > https://lore.kernel.org/all/[email protected]/
> >
> > Signed-off-by: Alice Ryhl <[email protected]>
> > ---
> > Alice Ryhl (4):
> > rust: security: add abstraction for security_secid_to_secctx
> > rust: file: add `Kuid` wrapper
> > rust: file: add `DeferredFdCloser`
> > rust: file: add abstraction for `poll_table`
> >
> > Wedson Almeida Filho (3):
> > rust: file: add Rust abstraction for `struct file`
> > rust: cred: add Rust abstraction for `struct cred`
> > rust: file: add `FileDescriptorReservation`
> >
> > rust/bindings/bindings_helper.h | 9 ++
> > rust/bindings/lib.rs | 1 +
> > rust/helpers.c | 94 +++++++++++
> > rust/kernel/cred.rs | 73 +++++++++
> > rust/kernel/file.rs | 345 ++++++++++++++++++++++++++++++++++++++++
> > rust/kernel/file/poll_table.rs | 97 +++++++++++
>
> That's pretty far away from the subsystem these wrappers belong to. I
> would prefer if wrappers such as this would live directly in fs/rust/
> and so live within the subsystem they belong to. I think I mentioned
> that before. Maybe I missed some sort of agreement here?
I spoke to Miguel about this and it was my understanding that everything
was in place for moving Rust wrappers to the proper directory -
previously there was build system stuff blocking, but he said that's all
working now. Perhaps the memo just didn't get passed down?
(My vote would actually be for fs/ directly, not fs/rust, and a 1:1
mapping between .c files and the .rs files that wrap them).
On Wed, Dec 06, 2023 at 03:02:24PM -0500, Kent Overstreet wrote:
> On Thu, Nov 30, 2023 at 11:36:35AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 01:12:17PM +0000, Alice Ryhl wrote:
> >
> > > diff --git a/rust/helpers.c b/rust/helpers.c
> > > index fd633d9db79a..58e3a9dff349 100644
> > > --- a/rust/helpers.c
> > > +++ b/rust/helpers.c
> > > @@ -142,6 +142,51 @@ void rust_helper_put_task_struct(struct task_struct *t)
> > > }
> > > EXPORT_SYMBOL_GPL(rust_helper_put_task_struct);
> > >
> > > +kuid_t rust_helper_task_uid(struct task_struct *task)
> > > +{
> > > + return task_uid(task);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_task_uid);
> > > +
> > > +kuid_t rust_helper_task_euid(struct task_struct *task)
> > > +{
> > > + return task_euid(task);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_task_euid);
> >
> > Aren't these like ideal speculation gadgets? And shouldn't we avoid
> > functions like this for exactly that reason?
>
> I think asking the Rust people to care about that is probably putting
> too many constraints on them, unless you actually have an idea for
> something better to do...
It's not a constraint, it is a "we can not do this as it is buggy
because cpus are broken and we need to protect users from those bugs."
If we were to accept this type of code, then the people who are going
"it's safer to write kernel code in Rust" would be "pleasantly
surprised" when it turns out that their systems are actually more
insecure.
Hint, when "known broken" code is found in code review, it can not just
be ignored.
thanks,
greg k-h
On Thu, Dec 07, 2023 at 08:18:37AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Dec 06, 2023 at 03:02:24PM -0500, Kent Overstreet wrote:
> > On Thu, Nov 30, 2023 at 11:36:35AM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 29, 2023 at 01:12:17PM +0000, Alice Ryhl wrote:
> > >
> > > > diff --git a/rust/helpers.c b/rust/helpers.c
> > > > index fd633d9db79a..58e3a9dff349 100644
> > > > --- a/rust/helpers.c
> > > > +++ b/rust/helpers.c
> > > > @@ -142,6 +142,51 @@ void rust_helper_put_task_struct(struct task_struct *t)
> > > > }
> > > > EXPORT_SYMBOL_GPL(rust_helper_put_task_struct);
> > > >
> > > > +kuid_t rust_helper_task_uid(struct task_struct *task)
> > > > +{
> > > > + return task_uid(task);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(rust_helper_task_uid);
> > > > +
> > > > +kuid_t rust_helper_task_euid(struct task_struct *task)
> > > > +{
> > > > + return task_euid(task);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(rust_helper_task_euid);
> > >
> > > Aren't these like ideal speculation gadgets? And shouldn't we avoid
> > > functions like this for exactly that reason?
> >
> > I think asking the Rust people to care about that is probably putting
> > too many constraints on them, unless you actually have an idea for
> > something better to do...
>
> It's not a constraint, it is a "we can not do this as it is buggy
> because cpus are broken and we need to protect users from those bugs."
>
> If we were to accept this type of code, then the people who are going
> "it's safer to write kernel code in Rust" would be "pleasantly
> surprised" when it turns out that their systems are actually more
> insecure.
>
> Hint, when "known broken" code is found in code review, it can not just
> be ignored.
We're talking about a CPU bug, not a Rust bug, and maybe try a nm
--size-sort and see what you find before throwing stones at them...
> On Nov 30, 2023, at 4:46 AM, Christian Brauner <[email protected]> wrote:
>
> I wouldn't even
> complain if it they were somehow auto-generated but as you say that
> might be out of scope.
FYI, rust-bindgen got an experimental feature of this nature earlier this year:
https://github.com/rust-lang/rust-bindgen/pull/2335
Though apparently it has significant limitations meriting it the “experimental” title.
Regarding the issue of wrappers not being inlined, it's possible to get LLVM to optimize C and Rust code together into an object file, with the help of a compatible Clang and LLD:
@ rustc -O --emit llvm-bc a.rs
@ clang --target=x86_64-unknown-linux-gnu -O2 -c -emit-llvm -o b.bc b.c
@ ld.lld -r -o c.o a.bc b.bc
Basically LTO but within the scope of a single object file. This would be redundant in cases where kernel-wide LTO is enabled.
Using this approach might slow down compilation a bit due to needing to pass the LLVM bitcode between multiple commands, but probably not very much.
Just chiming in as someone not involved in Rust for Linux but familiar with these tools. Perhaps this has been considered before and rejected for some reason; I wouldn’t know.
On Fri, Dec 8, 2023 at 6:28 AM comex <[email protected]> wrote:
>
> Regarding the issue of wrappers not being inlined, it's possible to get LLVM to optimize C and Rust code together into an object file, with the help of a compatible Clang and LLD:
>
> @ rustc -O --emit llvm-bc a.rs
> @ clang --target=x86_64-unknown-linux-gnu -O2 -c -emit-llvm -o b.bc b.c
> @ ld.lld -r -o c.o a.bc b.bc
>
> Basically LTO but within the scope of a single object file. This would be redundant in cases where kernel-wide LTO is enabled.
>
> Using this approach might slow down compilation a bit due to needing to pass the LLVM bitcode between multiple commands, but probably not very much.
>
> Just chiming in as someone not involved in Rust for Linux but familiar with these tools. Perhaps this has been considered before and rejected for some reason; I wouldn’t know.
Thanks comex for chiming in, much appreciated.
Yeah, this is what we have been calling the "local-LTO hack" and it
was one of the possibilities we were considering for non-LTO kernel
builds for performance reasons originally. I don't recall who
originally suggested it in one of our meetings (Gary or Björn
perhaps).
If LLVM folks think LLVM-wise nothing will break, then we are happy to
go ahead with that (since it also solves the performance side), but it
would be nice to know if it will always be OK to build like that, i.e.
I think Andreas actually tried it and it seemed to work and boot, but
the worry is whether there is something subtle that could have bad
codegen in the future.
(We will also need to worry about GCC.)
Cheers,
Miguel
On Wed, Dec 06, 2023 at 02:59:11PM -0500, Kent Overstreet wrote:
> I suspect even if the manpower existed to go that route we'd end up
> regretting it, because then the Rust compiler would need to be able to
> handle _all_ the craziness a modern C compiler knows how to do -
> preprocessor magic/devilry isn't even the worst of it, it gets even
> worse when you start to consider things like bitfields and all the crazy
> __attributes__(()) people have invented.
Dude, clang can already handle all of that. Both rust and clang are
build on top of llvm, they generate the same IR, you can simply feed a
string into libclang and get IR out of it, which you can splice into
your rust generated IR.
On Wed, Dec 6, 2023 at 9:05 PM Kent Overstreet
<[email protected]> wrote:
>
> I spoke to Miguel about this and it was my understanding that everything
> was in place for moving Rust wrappers to the proper directory -
> previously there was build system stuff blocking, but he said that's all
> working now. Perhaps the memo just didn't get passed down?
No, it is being worked on (please see my sibling reply).
> (My vote would actually be for fs/ directly, not fs/rust, and a 1:1
> mapping between .c files and the .rs files that wrap them).
Thanks Kent for voting :)
Though note that an exact 1:1 mapping is going to be hard, e.g.
consider nested Rust submodules which would go in folders or
abstractions that you may arrange differently even if they wrap the
same concepts.
But, yeah, one should try to avoid to diverge without a good reason,
of course, especially in the beginning.
Cheers,
Miguel
On Fri, Dec 8, 2023 at 8:19 AM Miguel Ojeda
<[email protected]> wrote:
>
> On Fri, Dec 8, 2023 at 6:28 AM comex <[email protected]> wrote:
> >
> > Regarding the issue of wrappers not being inlined, it's possible to get LLVM to optimize C and Rust code together into an object file, with the help of a compatible Clang and LLD:
> >
> > @ rustc -O --emit llvm-bc a.rs
> > @ clang --target=x86_64-unknown-linux-gnu -O2 -c -emit-llvm -o b.bc b.c
> > @ ld.lld -r -o c.o a.bc b.bc
> >
> > Basically LTO but within the scope of a single object file. This would be redundant in cases where kernel-wide LTO is enabled.
> >
> > Using this approach might slow down compilation a bit due to needing to pass the LLVM bitcode between multiple commands, but probably not very much.
> >
> > Just chiming in as someone not involved in Rust for Linux but familiar with these tools. Perhaps this has been considered before and rejected for some reason; I wouldn’t know.
>
> Thanks comex for chiming in, much appreciated.
>
> Yeah, this is what we have been calling the "local-LTO hack" and it
> was one of the possibilities we were considering for non-LTO kernel
> builds for performance reasons originally. I don't recall who
> originally suggested it in one of our meetings (Gary or Björn
> perhaps).
>
> If LLVM folks think LLVM-wise nothing will break, then we are happy to
On paper, nothing comes to mind. No promises though.
From a build system perspective, I'd rather just point users towards
LTO if they have this concern. We support full and thin lto. This
proposal would add a third variant for just rust drivers. Each
variation on LTO has a maintenance cost and each have had their own
distinct fun bugs in the past. Not sure an additional variant is
worth the maintenance cost, even if it's technically feasible.
> go ahead with that (since it also solves the performance side), but it
> would be nice to know if it will always be OK to build like that, i.e.
> I think Andreas actually tried it and it seemed to work and boot, but
> the worry is whether there is something subtle that could have bad
> codegen in the future.
>
> (We will also need to worry about GCC.)
>
> Cheers,
> Miguel
--
Thanks,
~Nick Desaulniers
On Fri, Dec 8, 2023 at 6:09 PM Nick Desaulniers <[email protected]> wrote:
>
> On paper, nothing comes to mind. No promises though.
Thanks Nick -- that is useful nevertheless.
> From a build system perspective, I'd rather just point users towards
> LTO if they have this concern. We support full and thin lto. This
> proposal would add a third variant for just rust drivers. Each
> variation on LTO has a maintenance cost and each have had their own
> distinct fun bugs in the past. Not sure an additional variant is
> worth the maintenance cost, even if it's technically feasible.
I was thinking it would be something always done for Rust object
files: under a normal "no LTO" build, the Rust object files would
always get the cross-language inlining done and therefore no extra
dimension in the matrix. Would that help?
I think it is worth at least considering, given there is also a
non-trivial amount of performance to gain if we always do it, e.g.
Andreas wanted it for non-LTO kernel for this reason.
Cheers,
Miguel
On Fri, Dec 08, 2023 at 09:08:47AM -0800, Nick Desaulniers wrote:
> On Fri, Dec 8, 2023 at 8:19 AM Miguel Ojeda
> <[email protected]> wrote:
> >
> > On Fri, Dec 8, 2023 at 6:28 AM comex <[email protected]> wrote:
> > >
> > > Regarding the issue of wrappers not being inlined, it's possible to get LLVM to optimize C and Rust code together into an object file, with the help of a compatible Clang and LLD:
> > >
> > > @ rustc -O --emit llvm-bc a.rs
> > > @ clang --target=x86_64-unknown-linux-gnu -O2 -c -emit-llvm -o b.bc b.c
> > > @ ld.lld -r -o c.o a.bc b.bc
> > >
> > > Basically LTO but within the scope of a single object file. This would be redundant in cases where kernel-wide LTO is enabled.
> > >
> > > Using this approach might slow down compilation a bit due to needing to pass the LLVM bitcode between multiple commands, but probably not very much.
> > >
> > > Just chiming in as someone not involved in Rust for Linux but familiar with these tools. Perhaps this has been considered before and rejected for some reason; I wouldn’t know.
> >
> > Thanks comex for chiming in, much appreciated.
> >
> > Yeah, this is what we have been calling the "local-LTO hack" and it
> > was one of the possibilities we were considering for non-LTO kernel
> > builds for performance reasons originally. I don't recall who
> > originally suggested it in one of our meetings (Gary or Björn
> > perhaps).
> >
> > If LLVM folks think LLVM-wise nothing will break, then we are happy to
>
> On paper, nothing comes to mind. No promises though.
>
> From a build system perspective, I'd rather just point users towards
> LTO if they have this concern. We support full and thin lto. This
> proposal would add a third variant for just rust drivers. Each
> variation on LTO has a maintenance cost and each have had their own
> distinct fun bugs in the past. Not sure an additional variant is
> worth the maintenance cost, even if it's technically feasible.
>
Actually, the "LTO" in "local-LTO" may be misleading ;-) The problem we
want to resolve here is letting Rust code call small C functions (or
macros) without exporting the symbols. To me, it's really just "static
linking" a library (right now it's rust/helpers.o) contains small C
functions and macros used by Rust into a Rust driver kmodule, the "LTO"
part can be optional: let the linker make the call.
Regards,
Boqun
> > go ahead with that (since it also solves the performance side), but it
> > would be nice to know if it will always be OK to build like that, i.e.
> > I think Andreas actually tried it and it seemed to work and boot, but
> > the worry is whether there is something subtle that could have bad
> > codegen in the future.
> >
> > (We will also need to worry about GCC.)
> >
> > Cheers,
> > Miguel
>
>
>
> --
> Thanks,
> ~Nick Desaulniers
On Fri, Dec 08, 2023 at 05:26:16PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 06, 2023 at 02:59:11PM -0500, Kent Overstreet wrote:
>
> > I suspect even if the manpower existed to go that route we'd end up
> > regretting it, because then the Rust compiler would need to be able to
> > handle _all_ the craziness a modern C compiler knows how to do -
> > preprocessor magic/devilry isn't even the worst of it, it gets even
> > worse when you start to consider things like bitfields and all the crazy
> > __attributes__(()) people have invented.
>
> Dude, clang can already handle all of that. Both rust and clang are
> build on top of llvm, they generate the same IR, you can simply feed a
> string into libclang and get IR out of it, which you can splice into
> your rust generated IR.
If only it were that simple :)
This is struct definitions we're talking about, not code, so what you
want isn't even IR, what you're generating is a memory layout for a
type, linked in with all your other type information.
And people critize Linux for being a giant monorepo that makes no
considerations for making our code reusable in other contexts; clang and
LLVM are no different. But that's not really the issue because you're
going to need a huge chunk of clang to even parse this stuff, what you
really want is a way to invoke clang and dump _type information_ in a
standardized, easy to consume way. What you want is actually more akin
to the debug info that's generated today.
So... yeah, sure, lovely if it existed, but not the world we live in :)
(As an aside, I've actually got an outstanding bug filed with rustc
because it needs to be able to handle types that are marked both packed
and aligned... if anyone in this thread _does_ know some rust compiler
folks, we need that for bcachefs on disk format types).
On Fri, Dec 08, 2023 at 09:08:47AM -0800, Nick Desaulniers wrote:
> From a build system perspective, I'd rather just point users towards
> LTO if they have this concern. We support full and thin lto. This
> proposal would add a third variant for just rust drivers. Each
> variation on LTO has a maintenance cost and each have had their own
> distinct fun bugs in the past. Not sure an additional variant is
> worth the maintenance cost, even if it's technically feasible.
If we're allowed to talk about ideal solutions ... I hate putting
code in header files. I'd rather be able to put, eg:
__force_inline int put_page_testzero(struct page *page)
{
VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
return page_ref_dec_and_test(page);
}
__force_inline int folio_put_testzero(struct folio *folio)
{
return put_page_testzero(&folio->page);
}
__force_inline void folio_put(struct folio *folio)
{
if (folio_put_testzero(folio))
__folio_put(folio);
}
into a .c file and have both C and Rust inline folio_put(),
folio_put_testzero(), put_page_testzero(), VM_BUG_ON_PAGE() and
page_ref_dec_and_test(), but not even attempt to inline __folio_put()
(because We Know Better, and have determined that is the point at
which to stop).
On Dec 8, 2023, at 8:19 AM, Miguel Ojeda <[email protected]> wrote:
>
> If LLVM folks think LLVM-wise nothing will break, then we are happy to
> go ahead with that (since it also solves the performance side), but it
> would be nice to know if it will always be OK to build like that, i.e.
> I think Andreas actually tried it and it seemed to work and boot, but
> the worry is whether there is something subtle that could have bad
> codegen in the future.
One potential issue is incompatibility between the LLVM versions used by rustc, Clang, and LLD. At minimum, whichever tool is reading bitcode (LLD in my example) should have an LLVM version >= that of the tools producing bitcode, since newer LLVM versions can read older bitcode but not vice versa. But ideally the tools would all just be linked against the same copy of LLVM.
If you’re getting your tools from a distro, then that may already be true for you. But if you’re using upstream rustc binaries, those are built against a custom branch of LLVM, which is based on upstream release versions but adds a handful of patches [1]; by policy, those patches can include cherry-picks of miscompilation fixes that are upstream but haven’t made it into a release yet [2]. Upstream rustc binaries are accompanied by a copy of LLD linked against the same LLVM library, named rust-lld, but there’s no corresponding copy of Clang [3]. I’d say that agreement between rustc and LLD is the most important thing, but it would be nice if they'd make a matching Clang available through rustup.
[1] https://github.com/llvm/llvm-project/compare/release/17.x...rust-lang:llvm-project:rustc/17.0-2023-09-19
[2] https://rustc-dev-guide.rust-lang.org/backend/updating-llvm.html#bugfix-updates
[3] https://github.com/rust-lang/rust/issues/56371