2023-02-24 10:53:50

by Asahi Lina

[permalink] [raw]
Subject: [PATCH 0/5] rust: Add io_pgtable and RTKit abstractions

Hi everyone!

This series is part of the set of dependencies for the drm/asahi
Apple M1/M2 GPU driver.

It builds on top of the error helper series [1] and adds the skeleton
of a Device abstraction and two proper subsystem abstractions that use
it: io_pgtable and RTKit.

Many kernel subsystems need to take `struct device` pointers. In the
downstream rust-for-linux/rust branch, this is abstracted as a RawDevice
trait that specific `struct device` subtypes (like `platform_device`)
can implement such that all device subclasses can interoperate with
device class-agnostic kernel APIs. Importing this trivial trait is all
that is needed to start building and upstreaming abstractions.

The first patch in this series just adds a private Sealed trait, which
is a common Rust pattern to allow crates to have traits that cannot be
implemented outside the trait. This is important where implementing such
traits incorrectly could lead to soundness issues, and where in general
there is no reason for anything outside the kernel crate to do so.

The next patch introduces the RawDevice trait, which just represents
anything that contains a `struct device`. This is enough for kernel
abstractions that need to pass down such pointers to the underlying C
code to compile.

The third patch adds the io_pgtable abstraction. The Apple GPU driver
uses this common code to manage the GPU page tables, which are mostly
standard ARMv8 page tables. The common code does need a new variant type
to handle the specific PTE permission scheme and a few other details,
which I already submitted [2]. This abstraction does not include that
new type. If the underlying C support goes in first, I can update this
series to add the missing wrapper in a later revision (it's just a few
extra lines). See the panfrost driver for another example of a GPU
driver reusing common io_pgtable code (with a specific format variant
too).

The fourth patch adds the Apple RTKit subsystem abstraction, which is
part of drivers/soc/apple. This builds on some support patches that
should already be in the SoC tree queued for 6.3. While this code only
makes sense on ARM64 (which does not yet have upstream Rust support),
it should compile on x86 too with CONFIG_COMPILE_TEST. If you want to
compile test this yourself at this point, you will need to merge in [3]
(this will no longer be required as soon as Linus pulls the SoC stuff
in this merge window).

Note: Although this patch adds a consumer for the Error stuff introduced
previously, it is conditionally compiled so we still need the dead_code
allow tags to avoid warnings when RTKit is not enabled. Also, since the
Rust build system does not yet have split crates with helpers/bindings
to allow for modular abstractions, this abstraction hard-depends on
CONFIG_APPLE_RTKIT=y. We expect to remove this restriction once the
build system is improved, but from talks with the DRM folks we're okay
with the driver introducing built-in dependencies on some kernel
subsystems (even though the driver itself is buildable as a module).

The fifth patch adds some more bits of the Device abstraction, to give
context on the direction that is headed in. It includes a basic `Device`
type that represents an owned reference to a generic `struct device`.
This doesn't have to go in right now, but I thought it would be useful
to provide some more context.

The branch at [4] contains the full series of patches rebased on
upstream leading to the complete driver, for reference on how these APIs
and abstractions are used.

Please let me know if you have any comments on how this is structured or
would like to see the abstractions upstreamed separate! It's very
difficult to upstream things only when they have a user, since the
dependency chain for that user (the GPU driver) is very long. I hope
that bundling these together helps give some context and get some
momentum going with the upstreaming process.

[1] https://lore.kernel.org/rust-for-linux/[email protected]/T/#t
[2] https://lore.kernel.org/asahi/[email protected]/
[3] https://github.com/AsahiLinux/linux.git asahi-soc-rtkit-pmgr-6.3
[4] https://github.com/AsahiLinux/linux/tree/gpu/rebase-20230224

Signed-off-by: Asahi Lina <[email protected]>
---
Asahi Lina (3):
rust: Add a Sealed trait
rust: io_pgtable: Add io_pgtable abstraction
rust: soc: apple: rtkit: Add Apple RTKit abstraction

Wedson Almeida Filho (2):
rust: device: Add a minimal RawDevice trait
rust: device: Add a stub abstraction for devices
rust/bindings/bindings_helper.h | 3 +
rust/helpers.c | 13 ++
rust/kernel/device.rs | 97 +++++++++++
rust/kernel/io_pgtable.rs | 351 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 9 ++
rust/kernel/soc/apple/mod.rs | 6 +
rust/kernel/soc/apple/rtkit.rs | 259 +++++++++++++++++++++++++++++
rust/kernel/soc/mod.rs | 5 +
8 files changed, 743 insertions(+)
---
base-commit: 0ac13d87afc0086c0be43e7988173295a0864d5d
change-id: 20230224-rust-iopt-rtkit-6d8b554d204c

Thank you,
~~ Lina



2023-02-24 10:53:54

by Asahi Lina

[permalink] [raw]
Subject: [PATCH 1/5] rust: Add a Sealed trait

Some traits exposed by the kernel crate may not be intended to be
implemented by downstream modules. Add a Sealed trait to allow avoiding
this using the sealed trait pattern.

Signed-off-by: Asahi Lina <[email protected]>
---
rust/kernel/lib.rs | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 223564f9f0cc..82dff6f4cf60 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -46,6 +46,11 @@ pub use macros;
#[doc(hidden)]
pub use build_error::build_error;

+pub(crate) mod private {
+ #[allow(unreachable_pub)]
+ pub trait Sealed {}
+}
+
/// Prefix to appear before log messages printed from within the `kernel` crate.
const __LOG_PREFIX: &[u8] = b"rust_kernel\0";


--
2.35.1


2023-02-24 10:54:06

by Asahi Lina

[permalink] [raw]
Subject: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

From: Wedson Almeida Filho <[email protected]>

Add a RawDevice trait which can be implemented by any type representing
a device class (such as a PlatformDevice). This is the minimum amount of
Device support code required to unblock abstractions that need to take
device pointers.

Lina: Rewrote commit message, and dropped everything except RawDevice.

Co-developed-by: Miguel Ojeda <[email protected]>
Signed-off-by: Miguel Ojeda <[email protected]>
Signed-off-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Asahi Lina <[email protected]>
---
rust/bindings/bindings_helper.h | 1 +
rust/kernel/device.rs | 23 +++++++++++++++++++++++
rust/kernel/lib.rs | 1 +
3 files changed, 25 insertions(+)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 75d85bd6c592..3632a39a28a6 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -6,6 +6,7 @@
* Sorted alphabetically.
*/

+#include <linux/device.h>
#include <linux/slab.h>
#include <linux/refcount.h>

diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
new file mode 100644
index 000000000000..9be021e393ca
--- /dev/null
+++ b/rust/kernel/device.rs
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Generic devices that are part of the kernel's driver model.
+//!
+//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
+
+use crate::bindings;
+
+/// A raw device.
+///
+/// # Safety
+///
+/// Implementers must ensure that the `*mut device` returned by [`RawDevice::raw_device`] is
+/// related to `self`, that is, actions on it will affect `self`. For example, if one calls
+/// `get_device`, then the refcount on the device represented by `self` will be incremented.
+///
+/// Additionally, implementers must ensure that the device is never renamed. Commit a5462516aa99
+/// ("driver-core: document restrictions on device_rename()") has details on why `device_rename`
+/// should not be used.
+pub unsafe trait RawDevice {
+ /// Returns the raw `struct device` related to `self`.
+ fn raw_device(&self) -> *mut bindings::device;
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 82dff6f4cf60..de44092718f8 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -29,6 +29,7 @@ compile_error!("Missing kernel configuration for conditional compilation");
#[cfg(not(testlib))]
mod allocator;
mod build_assert;
+pub mod device;
pub mod error;
pub mod prelude;
pub mod print;

--
2.35.1


2023-02-24 10:54:23

by Asahi Lina

[permalink] [raw]
Subject: [PATCH 3/5] rust: io_pgtable: Add io_pgtable abstraction

The io_pgtable subsystem implements page table management for various
IOMMU page table formats. This abstraction allows Rust drivers for
devices with an embedded MMU to use this shared code directly.

Gather structures are not implemented yet, since we don't have a
consumer for that functionality. That can be added and refactored when
someone needs it.

It's worth noting that although this abstraction is nominally used to
manage page tables and mapping physical memory addresses, the
abstraction API itself is not unsafe. This is because, by itself, it can
only be used to manage page tables in isolation, which have no effect on
the system. In Rust, we typically use `unsafe` to mark operations that
actually introduce the safety requirements (that is, where the
responsibilities are created). Here, that would be actually installing
the page table root pointer into a device register (the downstream iomem
abstractions are unsafe for this reason, because devices can do DMA).

Signed-off-by: Asahi Lina <[email protected]>
---
rust/bindings/bindings_helper.h | 1 +
rust/kernel/io_pgtable.rs | 351 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 2 +
3 files changed, 354 insertions(+)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 3632a39a28a6..88c65431d3ad 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -7,6 +7,7 @@
*/

#include <linux/device.h>
+#include <linux/io-pgtable.h>
#include <linux/slab.h>
#include <linux/refcount.h>

diff --git a/rust/kernel/io_pgtable.rs b/rust/kernel/io_pgtable.rs
new file mode 100644
index 000000000000..19029b1fdfd8
--- /dev/null
+++ b/rust/kernel/io_pgtable.rs
@@ -0,0 +1,351 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! IOMMU page table management
+//!
+//! C header: [`include/io-pgtable.h`](../../../../include/io-pgtable.h)
+
+use crate::{
+ bindings, device,
+ error::{code::*, to_result, Result},
+ types::{ForeignOwnable, ScopeGuard},
+};
+
+use core::marker::PhantomData;
+use core::mem;
+use core::num::NonZeroU64;
+
+/// Protection flags used with IOMMU mappings.
+pub mod prot {
+ /// Read access.
+ pub const READ: u32 = bindings::IOMMU_READ;
+ /// Write access.
+ pub const WRITE: u32 = bindings::IOMMU_WRITE;
+ /// Request cache coherency.
+ pub const CACHE: u32 = bindings::IOMMU_CACHE;
+ /// Request no-execute permission.
+ pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
+ /// MMIO peripheral mapping.
+ pub const MMIO: u32 = bindings::IOMMU_MMIO;
+ /// Privileged mapping.
+ pub const PRIV: u32 = bindings::IOMMU_PRIV;
+}
+
+/// Represents a requested io_pgtable configuration.
+pub struct Config {
+ /// Quirk bitmask (type-specific).
+ pub quirks: usize,
+ /// Valid page sizes, as a bitmask of powers of two.
+ pub pgsize_bitmap: usize,
+ /// Input address space size in bits.
+ pub ias: usize,
+ /// Output address space size in bits.
+ pub oas: usize,
+ /// IOMMU uses coherent accesses for page table walks.
+ pub coherent_walk: bool,
+}
+
+/// IOMMU callbacks for TLB and page table management.
+///
+/// Users must implement this trait to perform the TLB flush actions for this IOMMU, if
+/// required.
+pub trait FlushOps {
+ /// User-specified type owned by the IOPagetable that will be passed to TLB operations.
+ type Data: ForeignOwnable + Send + Sync;
+
+ /// Synchronously invalidate the entire TLB context.
+ fn tlb_flush_all(data: <Self::Data as ForeignOwnable>::Borrowed<'_>);
+
+ /// Synchronously invalidate all intermediate TLB state (sometimes referred to as the "walk
+ /// cache") for a virtual address range.
+ fn tlb_flush_walk(
+ data: <Self::Data as ForeignOwnable>::Borrowed<'_>,
+ iova: usize,
+ size: usize,
+ granule: usize,
+ );
+
+ /// Optional callback to queue up leaf TLB invalidation for a single page.
+ ///
+ /// IOMMUs that cannot batch TLB invalidation operations efficiently will typically issue
+ /// them here, but others may decide to update the iommu_iotlb_gather structure and defer
+ /// the invalidation until iommu_iotlb_sync() instead.
+ ///
+ /// TODO: Implement the gather argument for batching.
+ fn tlb_add_page(
+ data: <Self::Data as ForeignOwnable>::Borrowed<'_>,
+ iova: usize,
+ granule: usize,
+ );
+}
+
+/// Inner page table info shared across all table types.
+/// # Invariants
+///
+/// - [`self.ops`] is valid and non-null.
+/// - [`self.cfg`] is valid and non-null.
+#[doc(hidden)]
+pub struct IoPageTableInner {
+ ops: *mut bindings::io_pgtable_ops,
+ cfg: bindings::io_pgtable_cfg,
+ data: *mut core::ffi::c_void,
+}
+
+/// Helper trait to get the config type for a single page table type from the union.
+pub trait GetConfig {
+ /// Returns the specific output configuration for this page table type.
+ fn cfg(iopt: &impl IoPageTable) -> &Self
+ where
+ Self: Sized;
+}
+
+/// A generic IOMMU page table
+pub trait IoPageTable: crate::private::Sealed {
+ #[doc(hidden)]
+ const FLUSH_OPS: bindings::iommu_flush_ops;
+
+ #[doc(hidden)]
+ fn new_fmt<T: FlushOps>(
+ dev: &dyn device::RawDevice,
+ format: u32,
+ config: Config,
+ data: T::Data,
+ ) -> Result<IoPageTableInner> {
+ let ptr = data.into_foreign() as *mut _;
+ let guard = ScopeGuard::new(|| {
+ // SAFETY: `ptr` came from a previous call to `into_foreign`.
+ unsafe { T::Data::from_foreign(ptr) };
+ });
+
+ let mut raw_cfg = bindings::io_pgtable_cfg {
+ quirks: config.quirks.try_into()?,
+ pgsize_bitmap: config.pgsize_bitmap.try_into()?,
+ ias: config.ias.try_into()?,
+ oas: config.oas.try_into()?,
+ coherent_walk: config.coherent_walk,
+ tlb: &Self::FLUSH_OPS,
+ iommu_dev: dev.raw_device(),
+ // SAFETY: This is an output field which is fine to zero-init.
+ __bindgen_anon_1: unsafe { mem::zeroed() },
+ };
+
+ // SAFETY: FFI call, all input pointers are valid.
+ let ops = unsafe {
+ bindings::alloc_io_pgtable_ops(format as bindings::io_pgtable_fmt, &mut raw_cfg, ptr)
+ };
+
+ if ops.is_null() {
+ return Err(EINVAL);
+ }
+
+ guard.dismiss();
+ Ok(IoPageTableInner {
+ ops,
+ cfg: raw_cfg,
+ data: ptr,
+ })
+ }
+
+ /// Map a range of pages.
+ fn map_pages(
+ &mut self,
+ iova: usize,
+ paddr: usize,
+ pgsize: usize,
+ pgcount: usize,
+ prot: u32,
+ ) -> Result<usize> {
+ let mut mapped: usize = 0;
+
+ // SAFETY: FFI call, ops is valid per the type invariant.
+ to_result(unsafe {
+ (*self.inner().ops).map_pages.unwrap()(
+ self.inner().ops,
+ iova as u64,
+ paddr as u64,
+ pgsize,
+ pgcount,
+ prot as i32,
+ bindings::GFP_KERNEL,
+ &mut mapped,
+ )
+ })?;
+
+ Ok(mapped)
+ }
+
+ /// Unmap a range of pages.
+ fn unmap_pages(
+ &mut self,
+ iova: usize,
+ pgsize: usize,
+ pgcount: usize,
+ // TODO: gather: *mut iommu_iotlb_gather,
+ ) -> usize {
+ // SAFETY: FFI call, ops is valid per the type invariant.
+ unsafe {
+ (*self.inner().ops).unmap_pages.unwrap()(
+ self.inner().ops,
+ iova as u64,
+ pgsize,
+ pgcount,
+ core::ptr::null_mut(),
+ )
+ }
+ }
+
+ /// Translate an IOVA to the corresponding physical address, if mapped.
+ fn iova_to_phys(&self, iova: usize) -> Option<NonZeroU64> {
+ // SAFETY: FFI call, ops is valid per the type invariant.
+ NonZeroU64::new(unsafe {
+ (*self.inner().ops).iova_to_phys.unwrap()(self.inner().ops, iova as u64)
+ })
+ }
+
+ #[doc(hidden)]
+ fn inner(&self) -> &IoPageTableInner;
+
+ #[doc(hidden)]
+ fn raw_cfg(&self) -> &bindings::io_pgtable_cfg {
+ &self.inner().cfg
+ }
+}
+
+// SAFETY: All abstraction operations either require mutable references or are thread-safe,
+// and io_pgtable_ops objects can be passed between threads without issue.
+unsafe impl Send for IoPageTableInner {}
+unsafe impl Sync for IoPageTableInner {}
+
+unsafe extern "C" fn tlb_flush_all_callback<T: FlushOps>(cookie: *mut core::ffi::c_void) {
+ // SAFETY: The cookie is always a ForeignOwnable of the right type, per new_fmt().
+ T::tlb_flush_all(unsafe { T::Data::borrow(cookie) });
+}
+
+unsafe extern "C" fn tlb_flush_walk_callback<T: FlushOps>(
+ iova: core::ffi::c_ulong,
+ size: usize,
+ granule: usize,
+ cookie: *mut core::ffi::c_void,
+) {
+ // SAFETY: The cookie is always a ForeignOwnable of the right type, per new_fmt().
+ T::tlb_flush_walk(
+ unsafe { T::Data::borrow(cookie) },
+ iova as usize,
+ size,
+ granule,
+ );
+}
+
+unsafe extern "C" fn tlb_add_page_callback<T: FlushOps>(
+ _gather: *mut bindings::iommu_iotlb_gather,
+ iova: core::ffi::c_ulong,
+ granule: usize,
+ cookie: *mut core::ffi::c_void,
+) {
+ // SAFETY: The cookie is always a ForeignOwnable of the right type, per new_fmt().
+ T::tlb_add_page(unsafe { T::Data::borrow(cookie) }, iova as usize, granule);
+}
+
+macro_rules! iopt_cfg {
+ ($name:ident, $field:ident, $type:ident) => {
+ /// An IOMMU page table configuration for a specific kind of pagetable.
+ pub type $name = bindings::$type;
+
+ impl GetConfig for $name {
+ fn cfg(iopt: &impl IoPageTable) -> &$name {
+ // SAFETY: The type system ensures we are accessing the right union field.
+ unsafe { &iopt.raw_cfg().__bindgen_anon_1.$field }
+ }
+ }
+ };
+}
+
+impl GetConfig for () {
+ fn cfg(_iopt: &impl IoPageTable) -> &() {
+ &()
+ }
+}
+
+macro_rules! iopt_type {
+ ($type:ident, $cfg:ty, $fmt:ident) => {
+ /// Represents an IOPagetable of this type.
+ pub struct $type<T: FlushOps>(IoPageTableInner, PhantomData<T>);
+
+ impl<T: FlushOps> $type<T> {
+ /// Creates a new IOPagetable implementation of this type.
+ pub fn new(dev: &dyn device::RawDevice, config: Config, data: T::Data) -> Result<Self> {
+ Ok(Self(
+ <Self as IoPageTable>::new_fmt::<T>(dev, bindings::$fmt, config, data)?,
+ PhantomData,
+ ))
+ }
+
+ /// Get the configuration for this IOPagetable.
+ pub fn cfg(&self) -> &$cfg {
+ <$cfg as GetConfig>::cfg(self)
+ }
+ }
+
+ impl<T: FlushOps> crate::private::Sealed for $type<T> {}
+
+ impl<T: FlushOps> IoPageTable for $type<T> {
+ const FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
+ tlb_flush_all: Some(tlb_flush_all_callback::<T>),
+ tlb_flush_walk: Some(tlb_flush_walk_callback::<T>),
+ tlb_add_page: Some(tlb_add_page_callback::<T>),
+ };
+
+ fn inner(&self) -> &IoPageTableInner {
+ &self.0
+ }
+ }
+
+ impl<T: FlushOps> Drop for $type<T> {
+ fn drop(&mut self) {
+ // SAFETY: The pointer is valid by the type invariant.
+ unsafe { bindings::free_io_pgtable_ops(self.0.ops) };
+
+ // Free context data.
+ //
+ // SAFETY: This matches the call to `into_foreign` from `new_fmt`.
+ unsafe { T::Data::from_foreign(self.0.data) };
+ }
+ }
+ };
+}
+
+// Ew, bindgen unions really are quite messy...
+iopt_cfg!(
+ ARMLPAES1Cfg,
+ arm_lpae_s1_cfg,
+ io_pgtable_cfg__bindgen_ty_1__bindgen_ty_1
+);
+iopt_cfg!(
+ ARMLPAES2Cfg,
+ arm_lpae_s2_cfg,
+ io_pgtable_cfg__bindgen_ty_1__bindgen_ty_2
+);
+iopt_cfg!(
+ ARMv7SCfg,
+ arm_v7s_cfg,
+ io_pgtable_cfg__bindgen_ty_1__bindgen_ty_3
+);
+iopt_cfg!(
+ ARMMaliLPAECfg,
+ arm_mali_lpae_cfg,
+ io_pgtable_cfg__bindgen_ty_1__bindgen_ty_4
+);
+iopt_cfg!(
+ AppleDARTCfg,
+ apple_dart_cfg,
+ io_pgtable_cfg__bindgen_ty_1__bindgen_ty_5
+);
+
+iopt_type!(ARM32LPAES1, ARMLPAES1Cfg, io_pgtable_fmt_ARM_32_LPAE_S1);
+iopt_type!(ARM32LPAES2, ARMLPAES2Cfg, io_pgtable_fmt_ARM_32_LPAE_S2);
+iopt_type!(ARM64LPAES1, ARMLPAES1Cfg, io_pgtable_fmt_ARM_64_LPAE_S1);
+iopt_type!(ARM64LPAES2, ARMLPAES2Cfg, io_pgtable_fmt_ARM_64_LPAE_S2);
+iopt_type!(ARMv7S, ARMv7SCfg, io_pgtable_fmt_ARM_V7S);
+iopt_type!(ARMMaliLPAE, ARMMaliLPAECfg, io_pgtable_fmt_ARM_MALI_LPAE);
+iopt_type!(AMDIOMMUV1, (), io_pgtable_fmt_AMD_IOMMU_V1);
+iopt_type!(AppleDART, AppleDARTCfg, io_pgtable_fmt_APPLE_DART);
+iopt_type!(AppleDART2, AppleDARTCfg, io_pgtable_fmt_APPLE_DART2);
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index de44092718f8..9944086d7e09 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -31,6 +31,8 @@ mod allocator;
mod build_assert;
pub mod device;
pub mod error;
+#[cfg(CONFIG_IOMMU_IO_PGTABLE)]
+pub mod io_pgtable;
pub mod prelude;
pub mod print;
mod static_assert;

--
2.35.1


2023-02-24 10:54:50

by Asahi Lina

[permalink] [raw]
Subject: [PATCH 4/5] rust: soc: apple: rtkit: Add Apple RTKit abstraction

RTKit is Apple's proprietary real-time operating system framework, used
across many subdevices on Apple Silicon platforms including NVMe, system
management, GPU, etc. Add Rust abstractions for this subsystem, so that
it can be used by upcoming Rust drivers.

This API is safe under the expectation that all RTKit coprocessors
either have no unfiltered access to system memory (SMC), are behind a
DART IOMMU (DCP, etc.), or require other unsafe operations to be granted
access to arbitrary system memory (GFX, where the coprocessor page
tables need to be mutated in bootloader-allocated memory blocks in order
to map additional memory, which is a requirement to get it to do
anything interesting beyond basic startup.)

Note: Although ARM64 support is not yet merged, this can be built on amd64
with CONFIG_COMPILE_TEST=y.

Signed-off-by: Asahi Lina <[email protected]>
---
rust/bindings/bindings_helper.h | 1 +
rust/kernel/lib.rs | 1 +
rust/kernel/soc/apple/mod.rs | 6 +
rust/kernel/soc/apple/rtkit.rs | 259 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/soc/mod.rs | 5 +
5 files changed, 272 insertions(+)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 88c65431d3ad..c920d6242e3a 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -10,6 +10,7 @@
#include <linux/io-pgtable.h>
#include <linux/slab.h>
#include <linux/refcount.h>
+#include <linux/soc/apple/rtkit.h>

/* `bindgen` gets confused at certain things. */
const gfp_t BINDINGS_GFP_KERNEL = GFP_KERNEL;
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 9944086d7e09..78108cbbf814 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -35,6 +35,7 @@ pub mod error;
pub mod io_pgtable;
pub mod prelude;
pub mod print;
+pub mod soc;
mod static_assert;
#[doc(hidden)]
pub mod std_vendor;
diff --git a/rust/kernel/soc/apple/mod.rs b/rust/kernel/soc/apple/mod.rs
new file mode 100644
index 000000000000..dd69db63677d
--- /dev/null
+++ b/rust/kernel/soc/apple/mod.rs
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+
+//! Apple SoC drivers
+
+#[cfg(CONFIG_APPLE_RTKIT = "y")]
+pub mod rtkit;
diff --git a/rust/kernel/soc/apple/rtkit.rs b/rust/kernel/soc/apple/rtkit.rs
new file mode 100644
index 000000000000..595b9b3dda96
--- /dev/null
+++ b/rust/kernel/soc/apple/rtkit.rs
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+
+//! Support for Apple RTKit coprocessors.
+//!
+//! C header: [`include/linux/soc/apple/rtkit.h`](../../../../include/linux/gpio/driver.h)
+
+use crate::{
+ bindings, device,
+ error::{code::*, from_kernel_err_ptr, from_kernel_result, to_result, Result},
+ str::CStr,
+ types::{ForeignOwnable, ScopeGuard},
+};
+
+use alloc::boxed::Box;
+use core::marker::PhantomData;
+use core::ptr;
+use macros::vtable;
+
+/// Trait to represent allocatable buffers for the RTKit core.
+///
+/// Users must implement this trait for their own representation of those allocations.
+pub trait Buffer {
+ /// Returns the IOVA (virtual address) of the buffer from RTKit's point of view, or an error if
+ /// unavailable.
+ fn iova(&self) -> Result<usize>;
+
+ /// Returns a mutable byte slice of the buffer contents, or an
+ /// error if unavailable.
+ fn buf(&mut self) -> Result<&mut [u8]>;
+}
+
+/// Callback operations for an RTKit client.
+#[vtable]
+pub trait Operations {
+ /// Arbitrary user context type.
+ type Data: ForeignOwnable + Send + Sync;
+
+ /// Type representing an allocated buffer for RTKit.
+ type Buffer: Buffer;
+
+ /// Called when RTKit crashes.
+ fn crashed(_data: <Self::Data as ForeignOwnable>::Borrowed<'_>) {}
+
+ /// Called when a message was received on a non-system endpoint. Called in non-IRQ context.
+ fn recv_message(
+ _data: <Self::Data as ForeignOwnable>::Borrowed<'_>,
+ _endpoint: u8,
+ _message: u64,
+ ) {
+ }
+
+ /// Called in IRQ context when a message was received on a non-system endpoint.
+ ///
+ /// Must return `true` if the message is handled, or `false` to process it in
+ /// the handling thread.
+ fn recv_message_early(
+ _data: <Self::Data as ForeignOwnable>::Borrowed<'_>,
+ _endpoint: u8,
+ _message: u64,
+ ) -> bool {
+ false
+ }
+
+ /// Allocate a buffer for use by RTKit.
+ fn shmem_alloc(
+ _data: <Self::Data as ForeignOwnable>::Borrowed<'_>,
+ _size: usize,
+ ) -> Result<Self::Buffer> {
+ Err(EINVAL)
+ }
+
+ /// Map an existing buffer used by RTKit at a device-specified virtual address.
+ fn shmem_map(
+ _data: <Self::Data as ForeignOwnable>::Borrowed<'_>,
+ _iova: usize,
+ _size: usize,
+ ) -> Result<Self::Buffer> {
+ Err(EINVAL)
+ }
+}
+
+/// Represents `struct apple_rtkit *`.
+///
+/// # Invariants
+///
+/// The rtk pointer is valid.
+/// The data pointer is a valid pointer from T::Data::into_foreign().
+pub struct RtKit<T: Operations> {
+ rtk: *mut bindings::apple_rtkit,
+ data: *mut core::ffi::c_void,
+ _p: PhantomData<T>,
+}
+
+unsafe extern "C" fn crashed_callback<T: Operations>(cookie: *mut core::ffi::c_void) {
+ // SAFETY: cookie is always a PointerWrapper of the right type, passed in new().
+ T::crashed(unsafe { T::Data::borrow(cookie) });
+}
+
+unsafe extern "C" fn recv_message_callback<T: Operations>(
+ cookie: *mut core::ffi::c_void,
+ endpoint: u8,
+ message: u64,
+) {
+ // SAFETY: cookie is always a PointerWrapper of the right type, passed in new().
+ T::recv_message(unsafe { T::Data::borrow(cookie) }, endpoint, message);
+}
+
+unsafe extern "C" fn recv_message_early_callback<T: Operations>(
+ cookie: *mut core::ffi::c_void,
+ endpoint: u8,
+ message: u64,
+) -> bool {
+ // SAFETY: cookie is always a PointerWrapper of the right type, passed in new().
+ T::recv_message_early(unsafe { T::Data::borrow(cookie) }, endpoint, message)
+}
+
+unsafe extern "C" fn shmem_setup_callback<T: Operations>(
+ cookie: *mut core::ffi::c_void,
+ bfr: *mut bindings::apple_rtkit_shmem,
+) -> core::ffi::c_int {
+ // SAFETY: `bfr` is a valid buffer.
+ let bfr_mut = unsafe { &mut *bfr };
+
+ let buf = if bfr_mut.iova != 0 {
+ bfr_mut.is_mapped = true;
+ T::shmem_map(
+ // SAFETY: `cookie` came from a previous call to `into_foreign`.
+ unsafe { T::Data::borrow(cookie) },
+ bfr_mut.iova as usize,
+ bfr_mut.size,
+ )
+ } else {
+ bfr_mut.is_mapped = false;
+ // SAFETY: `cookie` came from a previous call to `into_foreign`.
+ T::shmem_alloc(unsafe { T::Data::borrow(cookie) }, bfr_mut.size)
+ };
+
+ from_kernel_result! {
+ let mut buf = buf?;
+ let iova = buf.iova()?;
+ let slice = buf.buf()?;
+
+ if slice.len() < bfr_mut.size {
+ return Err(ENOMEM);
+ }
+
+ bfr_mut.iova = iova as u64;
+ bfr_mut.buffer = slice.as_mut_ptr() as *mut _;
+
+ // Now box the returned buffer type and stash it in the private pointer of the
+ // `apple_rtkit_shmem` struct for safekeeping.
+ bfr_mut.private = Box::into_raw(Box::try_new(buf)?) as *mut _;
+ Ok(0)
+ }
+}
+
+unsafe extern "C" fn shmem_destroy_callback<T: Operations>(
+ _cookie: *mut core::ffi::c_void,
+ bfr: *mut bindings::apple_rtkit_shmem,
+) {
+ // SAFETY: `bfr` is a valid buffer.
+ let bfr_mut = unsafe { &mut *bfr };
+ if !bfr_mut.private.is_null() {
+ // SAFETY: Per shmem_setup_callback, this has to be a pointer to a Buffer if it is set.
+ unsafe {
+ core::mem::drop(Box::from_raw(bfr_mut.private as *mut T::Buffer));
+ }
+ bfr_mut.private = core::ptr::null_mut();
+ }
+}
+
+impl<T: Operations> RtKit<T> {
+ const VTABLE: bindings::apple_rtkit_ops = bindings::apple_rtkit_ops {
+ crashed: Some(crashed_callback::<T>),
+ recv_message: Some(recv_message_callback::<T>),
+ recv_message_early: Some(recv_message_early_callback::<T>),
+ shmem_setup: if T::HAS_SHMEM_ALLOC || T::HAS_SHMEM_MAP {
+ Some(shmem_setup_callback::<T>)
+ } else {
+ None
+ },
+ shmem_destroy: if T::HAS_SHMEM_ALLOC || T::HAS_SHMEM_MAP {
+ Some(shmem_destroy_callback::<T>)
+ } else {
+ None
+ },
+ };
+
+ /// Creates a new RTKit client for a given device and optional mailbox name or index.
+ pub fn new(
+ dev: &dyn device::RawDevice,
+ mbox_name: Option<&'static CStr>,
+ mbox_idx: usize,
+ data: T::Data,
+ ) -> Result<Self> {
+ let ptr = data.into_foreign() as *mut _;
+ let guard = ScopeGuard::new(|| {
+ // SAFETY: `ptr` came from a previous call to `into_foreign`.
+ unsafe { T::Data::from_foreign(ptr) };
+ });
+ // SAFETY: This just calls the C init function.
+ let rtk = unsafe {
+ from_kernel_err_ptr(bindings::apple_rtkit_init(
+ dev.raw_device(),
+ ptr,
+ match mbox_name {
+ Some(s) => s.as_char_ptr(),
+ None => ptr::null(),
+ },
+ mbox_idx.try_into()?,
+ &Self::VTABLE,
+ ))
+ }?;
+
+ guard.dismiss();
+ // INVARIANT: `rtk` and `data` are valid here.
+ Ok(Self {
+ rtk,
+ data: ptr,
+ _p: PhantomData,
+ })
+ }
+
+ /// Boots (wakes up) the RTKit coprocessor.
+ pub fn boot(&mut self) -> Result {
+ // SAFETY: `rtk` is valid per the type invariant.
+ to_result(unsafe { bindings::apple_rtkit_boot(self.rtk) })
+ }
+
+ /// Starts a non-system endpoint.
+ pub fn start_endpoint(&mut self, endpoint: u8) -> Result {
+ // SAFETY: `rtk` is valid per the type invariant.
+ to_result(unsafe { bindings::apple_rtkit_start_ep(self.rtk, endpoint) })
+ }
+
+ /// Sends a message to a given endpoint.
+ pub fn send_message(&mut self, endpoint: u8, message: u64) -> Result {
+ // SAFETY: `rtk` is valid per the type invariant.
+ to_result(unsafe {
+ bindings::apple_rtkit_send_message(self.rtk, endpoint, message, ptr::null_mut(), false)
+ })
+ }
+}
+
+// SAFETY: `RtKit` operations require a mutable reference.
+unsafe impl<T: Operations> Sync for RtKit<T> {}
+unsafe impl<T: Operations> Send for RtKit<T> {}
+
+impl<T: Operations> Drop for RtKit<T> {
+ fn drop(&mut self) {
+ // SAFETY: The pointer is valid by the type invariant.
+ unsafe { bindings::apple_rtkit_free(self.rtk) };
+
+ // Free context data.
+ //
+ // SAFETY: This matches the call to `into_foreign` from `new` in the success case.
+ unsafe { T::Data::from_foreign(self.data) };
+ }
+}
diff --git a/rust/kernel/soc/mod.rs b/rust/kernel/soc/mod.rs
new file mode 100644
index 000000000000..e3024042e74f
--- /dev/null
+++ b/rust/kernel/soc/mod.rs
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! SoC drivers
+
+pub mod apple;

--
2.35.1


2023-02-24 10:55:03

by Asahi Lina

[permalink] [raw]
Subject: [PATCH 5/5] rust: device: Add a stub abstraction for devices

From: Wedson Almeida Filho <[email protected]>

Add a Device type which represents an owned reference to a generic
struct device. This minimal implementation just handles reference
counting and allows the user to get the device name.

Lina: Rewrote commit message, dropped the Amba bits, and squashed in
simple changes to the core Device code from latter commits in
rust-for-linux/rust. Also include the rust_helper_dev_get_drvdata
helper which will be needed by consumers later on anyway.

Co-developed-by: Miguel Ojeda <[email protected]>
Signed-off-by: Miguel Ojeda <[email protected]>
Signed-off-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Asahi Lina <[email protected]>
---
rust/helpers.c | 13 +++++++++
rust/kernel/device.rs | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/rust/helpers.c b/rust/helpers.c
index 04b9be46e887..54954fd80c77 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -20,6 +20,7 @@

#include <linux/bug.h>
#include <linux/build_bug.h>
+#include <linux/device.h>
#include <linux/err.h>
#include <linux/refcount.h>

@@ -65,6 +66,18 @@ long rust_helper_PTR_ERR(__force const void *ptr)
}
EXPORT_SYMBOL_GPL(rust_helper_PTR_ERR);

+void *rust_helper_dev_get_drvdata(struct device *dev)
+{
+ return dev_get_drvdata(dev);
+}
+EXPORT_SYMBOL_GPL(rust_helper_dev_get_drvdata);
+
+const char *rust_helper_dev_name(const struct device *dev)
+{
+ return dev_name(dev);
+}
+EXPORT_SYMBOL_GPL(rust_helper_dev_name);
+
/*
* We use `bindgen`'s `--size_t-is-usize` option to bind the C `size_t` type
* as the Rust `usize` type, so we can use it in contexts where Rust
diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
index 9be021e393ca..e57da622d817 100644
--- a/rust/kernel/device.rs
+++ b/rust/kernel/device.rs
@@ -4,7 +4,7 @@
//!
//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)

-use crate::bindings;
+use crate::{bindings, str::CStr};

/// A raw device.
///
@@ -20,4 +20,78 @@ use crate::bindings;
pub unsafe trait RawDevice {
/// Returns the raw `struct device` related to `self`.
fn raw_device(&self) -> *mut bindings::device;
+
+ /// Returns the name of the device.
+ fn name(&self) -> &CStr {
+ let ptr = self.raw_device();
+
+ // SAFETY: `ptr` is valid because `self` keeps it alive.
+ let name = unsafe { bindings::dev_name(ptr) };
+
+ // SAFETY: The name of the device remains valid while it is alive (because the device is
+ // never renamed, per the safety requirement of this trait). This is guaranteed to be the
+ // case because the reference to `self` outlives the one of the returned `CStr` (enforced
+ // by the compiler because of their lifetimes).
+ unsafe { CStr::from_char_ptr(name) }
+ }
+}
+
+/// A ref-counted device.
+///
+/// # Invariants
+///
+/// `ptr` is valid, non-null, and has a non-zero reference count. One of the references is owned by
+/// `self`, and will be decremented when `self` is dropped.
+pub struct Device {
+ pub(crate) ptr: *mut bindings::device,
+}
+
+// SAFETY: `Device` only holds a pointer to a C device, which is safe to be used from any thread.
+unsafe impl Send for Device {}
+
+// SAFETY: `Device` only holds a pointer to a C device, references to which are safe to be used
+// from any thread.
+unsafe impl Sync for Device {}
+
+impl Device {
+ /// Creates a new device instance.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
+ pub unsafe fn new(ptr: *mut bindings::device) -> Self {
+ // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
+ unsafe { bindings::get_device(ptr) };
+ // INVARIANT: The safety requirements satisfy all but one invariant, which is that `self`
+ // owns a reference. This is satisfied by the call to `get_device` above.
+ Self { ptr }
+ }
+
+ /// Creates a new device instance from an existing [`RawDevice`] instance.
+ pub fn from_dev(dev: &dyn RawDevice) -> Self {
+ // SAFETY: The requirements are satisfied by the existence of `RawDevice` and its safety
+ // requirements.
+ unsafe { Self::new(dev.raw_device()) }
+ }
+}
+
+// SAFETY: The device returned by `raw_device` is the one for which we hold a reference.
+unsafe impl RawDevice for Device {
+ fn raw_device(&self) -> *mut bindings::device {
+ self.ptr
+ }
+}
+
+impl Drop for Device {
+ fn drop(&mut self) {
+ // SAFETY: By the type invariants, we know that `self` owns a reference, so it is safe to
+ // relinquish it now.
+ unsafe { bindings::put_device(self.ptr) };
+ }
+}
+
+impl Clone for Device {
+ fn clone(&self) -> Self {
+ Device::from_dev(self)
+ }
}

--
2.35.1


2023-02-24 11:19:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Fri, Feb 24, 2023 at 07:53:17PM +0900, Asahi Lina wrote:
> From: Wedson Almeida Filho <[email protected]>
>
> Add a Device type which represents an owned reference to a generic
> struct device. This minimal implementation just handles reference
> counting and allows the user to get the device name.

What good is just the device name? I'm all for proper bindings to hook
up properly to the driver core, but this feels like it's not really
doing much of anything.

Do you have a real user that we can see how this is interacting?

And what does a driver care about the device name anyway? It should
only be using the dev_*() calls to print that info out to the log, and
never messing around with it in any other format as that's what
userspace expects.

> Lina: Rewrote commit message, dropped the Amba bits, and squashed in
> simple changes to the core Device code from latter commits in
> rust-for-linux/rust. Also include the rust_helper_dev_get_drvdata
> helper which will be needed by consumers later on anyway.
>
> Co-developed-by: Miguel Ojeda <[email protected]>
> Signed-off-by: Miguel Ojeda <[email protected]>
> Signed-off-by: Wedson Almeida Filho <[email protected]>
> Signed-off-by: Asahi Lina <[email protected]>
> ---
> rust/helpers.c | 13 +++++++++
> rust/kernel/device.rs | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/rust/helpers.c b/rust/helpers.c
> index 04b9be46e887..54954fd80c77 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -20,6 +20,7 @@
>
> #include <linux/bug.h>
> #include <linux/build_bug.h>
> +#include <linux/device.h>
> #include <linux/err.h>
> #include <linux/refcount.h>
>
> @@ -65,6 +66,18 @@ long rust_helper_PTR_ERR(__force const void *ptr)
> }
> EXPORT_SYMBOL_GPL(rust_helper_PTR_ERR);
>
> +void *rust_helper_dev_get_drvdata(struct device *dev)
> +{
> + return dev_get_drvdata(dev);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_dev_get_drvdata);

No matching dev_set_drvdata()? What good is getting a random void
pointer if you couldn't set it in the first place? :)

> +const char *rust_helper_dev_name(const struct device *dev)
> +{
> + return dev_name(dev);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_dev_name);

Again, why? What is going to use this?

And I don't really understand the rules you are putting on the name
string after calling this, more below:

> +
> /*
> * We use `bindgen`'s `--size_t-is-usize` option to bind the C `size_t` type
> * as the Rust `usize` type, so we can use it in contexts where Rust
> diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
> index 9be021e393ca..e57da622d817 100644
> --- a/rust/kernel/device.rs
> +++ b/rust/kernel/device.rs
> @@ -4,7 +4,7 @@
> //!
> //! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
>
> -use crate::bindings;
> +use crate::{bindings, str::CStr};
>
> /// A raw device.
> ///
> @@ -20,4 +20,78 @@ use crate::bindings;
> pub unsafe trait RawDevice {
> /// Returns the raw `struct device` related to `self`.
> fn raw_device(&self) -> *mut bindings::device;
> +
> + /// Returns the name of the device.
> + fn name(&self) -> &CStr {
> + let ptr = self.raw_device();
> +
> + // SAFETY: `ptr` is valid because `self` keeps it alive.
> + let name = unsafe { bindings::dev_name(ptr) };
> +
> + // SAFETY: The name of the device remains valid while it is alive (because the device is
> + // never renamed, per the safety requirement of this trait). This is guaranteed to be the
> + // case because the reference to `self` outlives the one of the returned `CStr` (enforced
> + // by the compiler because of their lifetimes).
> + unsafe { CStr::from_char_ptr(name) }

Why can the device never be renamed? Devices are renamed all the time,
sometimes when you least expect it (i.e. by userspace). So how is this
considered "safe"? and actually correct?

Again, maybe seeing a real user of this might make more sense, but
as-is, this feels wrong and not needed at all.


> + }
> +}
> +
> +/// A ref-counted device.
> +///
> +/// # Invariants
> +///
> +/// `ptr` is valid, non-null, and has a non-zero reference count. One of the references is owned by
> +/// `self`, and will be decremented when `self` is dropped.
> +pub struct Device {
> + pub(crate) ptr: *mut bindings::device,
> +}
> +
> +// SAFETY: `Device` only holds a pointer to a C device, which is safe to be used from any thread.
> +unsafe impl Send for Device {}
> +
> +// SAFETY: `Device` only holds a pointer to a C device, references to which are safe to be used
> +// from any thread.
> +unsafe impl Sync for Device {}
> +
> +impl Device {
> + /// Creates a new device instance.
> + ///
> + /// # Safety
> + ///
> + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
> + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
> + unsafe { bindings::get_device(ptr) };

You don't check the return value of get_device()? What if it failed
(hint, it can)?


> + // INVARIANT: The safety requirements satisfy all but one invariant, which is that `self`
> + // owns a reference. This is satisfied by the call to `get_device` above.
> + Self { ptr }
> + }
> +
> + /// Creates a new device instance from an existing [`RawDevice`] instance.
> + pub fn from_dev(dev: &dyn RawDevice) -> Self {

I am a rust newbie, but I don't understand this "RawDevice" here at all.


> + // SAFETY: The requirements are satisfied by the existence of `RawDevice` and its safety
> + // requirements.
> + unsafe { Self::new(dev.raw_device()) }
> + }
> +}
> +
> +// SAFETY: The device returned by `raw_device` is the one for which we hold a reference.
> +unsafe impl RawDevice for Device {
> + fn raw_device(&self) -> *mut bindings::device {
> + self.ptr
> + }

What does this raw device do? What is the relationship to a "real"
device? Maybe it's just my lack of rust knowledge here showing, so any
hints would be appreciated.

> +}
> +
> +impl Drop for Device {
> + fn drop(&mut self) {
> + // SAFETY: By the type invariants, we know that `self` owns a reference, so it is safe to
> + // relinquish it now.
> + unsafe { bindings::put_device(self.ptr) };

So it is now ok for this to be freed?

One meta-comment, why is any of this needed at all? No driver should
ever be dealing with a "raw" struct device pointer at all, right? They
should be calling into subsystems that give it a pointer to a
bus-specific pointer type (gpio, usb, pci, etc.)

So I'm thinking that adding support for "raw" struct device pointers
feels ripe for abuse in a "the code should not be doing that" type of
thing.

Unless you are writing a new bus/class in rust?

thanks,

greg k-h

2023-02-24 11:23:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 07:53:14PM +0900, Asahi Lina wrote:
> From: Wedson Almeida Filho <[email protected]>
>
> Add a RawDevice trait which can be implemented by any type representing
> a device class (such as a PlatformDevice). This is the minimum amount of
> Device support code required to unblock abstractions that need to take
> device pointers.
>
> Lina: Rewrote commit message, and dropped everything except RawDevice.
>
> Co-developed-by: Miguel Ojeda <[email protected]>
> Signed-off-by: Miguel Ojeda <[email protected]>
> Signed-off-by: Wedson Almeida Filho <[email protected]>
> Signed-off-by: Asahi Lina <[email protected]>
> ---
> rust/bindings/bindings_helper.h | 1 +
> rust/kernel/device.rs | 23 +++++++++++++++++++++++
> rust/kernel/lib.rs | 1 +
> 3 files changed, 25 insertions(+)
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 75d85bd6c592..3632a39a28a6 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -6,6 +6,7 @@
> * Sorted alphabetically.
> */
>
> +#include <linux/device.h>
> #include <linux/slab.h>
> #include <linux/refcount.h>
>
> diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
> new file mode 100644
> index 000000000000..9be021e393ca
> --- /dev/null
> +++ b/rust/kernel/device.rs
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Generic devices that are part of the kernel's driver model.
> +//!
> +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
> +
> +use crate::bindings;
> +
> +/// A raw device.
> +///
> +/// # Safety
> +///
> +/// Implementers must ensure that the `*mut device` returned by [`RawDevice::raw_device`] is
> +/// related to `self`, that is, actions on it will affect `self`. For example, if one calls
> +/// `get_device`, then the refcount on the device represented by `self` will be incremented.

What is a "implementer" here? Rust code? C code? Who is calling
get_device() here, rust code or the driver code or something else? How
are you matching up the reference logic of this structure with the fact
that the driver core actually owns the reference, not any rust code?


> +///
> +/// Additionally, implementers must ensure that the device is never renamed. Commit a5462516aa99
> +/// ("driver-core: document restrictions on device_rename()") has details on why `device_rename`
> +/// should not be used.

As much as I would have liked that, that commit was from 2010 and
device_rename() is still being used by some pretty large subsystems
(networking and IB) and I don't see that going away any year soon.

So yes, your device will be renamed, so don't mess with the device name
like I mentioned in review of path 5/5 in this series.

> +pub unsafe trait RawDevice {
> + /// Returns the raw `struct device` related to `self`.
> + fn raw_device(&self) -> *mut bindings::device;

Again, what prevents this device from going away? I don't see a call to
get_device() here :(

And again, why are bindings needed for a "raw" struct device at all?
Shouldn't the bus-specific wrappings work better?

thanks,

greg k-h

2023-02-24 13:15:27

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
> On Fri, Feb 24, 2023 at 07:53:14PM +0900, Asahi Lina wrote:
>> From: Wedson Almeida Filho <[email protected]>
>>
>> Add a RawDevice trait which can be implemented by any type representing
>> a device class (such as a PlatformDevice). This is the minimum amount of
>> Device support code required to unblock abstractions that need to take
>> device pointers.
>>
>> Lina: Rewrote commit message, and dropped everything except RawDevice.
>>
>> Co-developed-by: Miguel Ojeda <[email protected]>
>> Signed-off-by: Miguel Ojeda <[email protected]>
>> Signed-off-by: Wedson Almeida Filho <[email protected]>
>> Signed-off-by: Asahi Lina <[email protected]>
>> ---
>> rust/bindings/bindings_helper.h | 1 +
>> rust/kernel/device.rs | 23 +++++++++++++++++++++++
>> rust/kernel/lib.rs | 1 +
>> 3 files changed, 25 insertions(+)
>>
>> diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
>> new file mode 100644
>> index 000000000000..9be021e393ca
>> --- /dev/null
>> +++ b/rust/kernel/device.rs
>> @@ -0,0 +1,23 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Generic devices that are part of the kernel's driver model.
>> +//!
>> +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
>> +
>> +use crate::bindings;
>> +
>> +/// A raw device.
>> +///
>> +/// # Safety
>> +///
>> +/// Implementers must ensure that the `*mut device` returned by [`RawDevice::raw_device`] is
>> +/// related to `self`, that is, actions on it will affect `self`. For example, if one calls
>> +/// `get_device`, then the refcount on the device represented by `self` will be incremented.
>
> What is a "implementer" here? Rust code? C code? Who is calling
> get_device() here, rust code or the driver code or something else? How
> are you matching up the reference logic of this structure with the fact
> that the driver core actually owns the reference, not any rust code?

This is a Rust trait, so that would be Rust code. In practice that means
other abstractions for different device buses, and the generic one which
I sent as patch 5. It's not a structure, so it doesn't have any
reference logic itself. That would go into the implementing struct (like
`Device`). What the comment is saying is that other parts of the Rust
kernel crate may make assumptions about the relationship between the
lifetime of the struct implementing RawDevice and the lifetime of the
underlying `struct device`. So for example, it's not legal to return a
borrowed reference to a `struct device` that might go away before `self`
does.

We should probably make this `Sealed` too, to make sure no driver can
even try to implement it (which probably wouldn't make sense)... this
should only ever be implemented by stuff in the kernel crate.

The reference is owned by whoever owns the reference, no? If you call
get_device(), you own a new reference to the device (which means you can
keep the pointer around until you call put_device()). Normally in Rust
you would have logic where whatever Rust structure that owns a reference
(which in practice is most things that implement RawDevice) would also
have a `Clone` trait impl that calls `get_device()` and duplicates
itself, and a `Drop` trait impl that calls `put_device()`. In Rust
terms, the Rust structure "owns" the device reference, and whatever code
creates that structure needs to either be allowed to steal the reference
from its caller or explicitly call `get_device()`.

You can also elide that in some cases though, like when you're just
passing a device reference down from a callback into Rust code. Then the
caller guarantees it has a device reference, which will outlive the
execution of the callback. So the wrapper that calls user Rust code can
materialize a RawDevice implementation without calling `get_device()`,
and then pass a Rust reference to it (which means the downstream code
can't steal it), and then make sure it gets destroyed without a
`put_device()` before the callback returns. In Rust terms, that's a code
path that passes a borrowed reference to the device down into user code.

This kind of ties back to why Rust rather than C... C does not encode
concepts like "borrowing a reference" or "taking over a reference" in
the type system, so it's really easy to mess up the refcounting and end
up with dangling references or leaks. Rust does, so once you wrap a C
API with the matching Rust semantics, there's no way to mess up and have
those issues any more.

>
>> +///
>> +/// Additionally, implementers must ensure that the device is never renamed. Commit a5462516aa99
>> +/// ("driver-core: document restrictions on device_rename()") has details on why `device_rename`
>> +/// should not be used.
>
> As much as I would have liked that, that commit was from 2010 and
> device_rename() is still being used by some pretty large subsystems
> (networking and IB) and I don't see that going away any year soon.
>
> So yes, your device will be renamed, so don't mess with the device name
> like I mentioned in review of path 5/5 in this series.

I think we can just drop that part (and the name stuff) then. Wedson, is
that okay?

>
>> +pub unsafe trait RawDevice {
>> + /// Returns the raw `struct device` related to `self`.
>> + fn raw_device(&self) -> *mut bindings::device;
>
> Again, what prevents this device from going away? I don't see a call to
> get_device() here :(

That would be up to the caller, if it needs to keep the pointer around.
Think of `raw_device()` as just "please return the `struct device *`
pointer for yourself, thanks". If you need to store that pointer for
later, you need to call `get_device()`. But most of the time it will be
used transiently, and then you don't need to take any references since
the safety requirement above guarantees that `self` owns a reference.
The caller then just needs to make sure not to throw away `self` before
it finishes using the returned pointer.

Keep in mind that this is an internal abstraction, it's there to be
implemented and used by the kernel crate itself. User code can't
actually do anything with the returned pointer without using `unsafe`
(so it's safe to expose this) and can't implement the trait without
`unsafe` either. But it can take dynamic trait arguments of type
RawDevice and pass them along, which is perfectly fine. Rust guarantees
that the right `get_device()` and `put_device()` are called when
references get cloned or dropped, for owned objects, and for borrowed
references there is no need to touch the refcount.

> And again, why are bindings needed for a "raw" struct device at all?
> Shouldn't the bus-specific wrappings work better?

Because lots of kernel subsystems need to be able to accept "any" device
and don't care about the bus! That's what this is for. All the bus
wrappers would implement this so they can be used as an argument for all
those subsystems (plus a generic one when you just need to pass around
an actual owned generic reference and no longer need bus-specific
operations - you can materialize that out of a RawDevice impl, which is
when get_device() would be called). That's why I'm introducing this now,
because both io_pgtable and rtkit need to take `struct device` pointers
on the C side so we need some "generic struct device" view on the Rust side.

~~ Lina

2023-02-24 14:11:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

Thanks for the detailed rust explainations, I'd like to just highlight
one thing:

On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
> > And again, why are bindings needed for a "raw" struct device at all?
> > Shouldn't the bus-specific wrappings work better?
>
> Because lots of kernel subsystems need to be able to accept "any" device
> and don't care about the bus! That's what this is for.

That's great, but:

> All the bus
> wrappers would implement this so they can be used as an argument for all
> those subsystems (plus a generic one when you just need to pass around
> an actual owned generic reference and no longer need bus-specific
> operations - you can materialize that out of a RawDevice impl, which is
> when get_device() would be called). That's why I'm introducing this now,
> because both io_pgtable and rtkit need to take `struct device` pointers
> on the C side so we need some "generic struct device" view on the Rust side.

In looking at both ftkit and io_pgtable, those seem to be good examples
of how "not to use a struct device", so trying to make safe bindings
from Rust to these frameworks is very ironic :)

rtkit takes a struct device pointer and then never increments it,
despite saving it off, which is unsafe. It then only uses it to print
out messages if things go wrong (or right in some cases), which is odd.
So it can get away from using a device pointer entirely, except for the
devm_apple_rtkit_init() call, which I doubt you want to call from rust
code, right?

for io_pgtable, that's a bit messier, you want to pass in a device that
io_pgtable treats as a "device" but again, it is NEVER properly
reference counted, AND, it is only needed to try to figure out the bus
operations that dma memory should be allocated from for this device. So
what would be better to save off there would be a pointer to the bus,
which is constant and soon will be read-only so there are no lifetime
rules needed at all (see the major struct bus_type changes going into
6.3-rc1 that will enable that to happen).

So the two subsystems you want to call from rust code don't properly
handle the reference count of the object you are going to pass into it,
and only need it for debugging and iommu stuff, which is really only the
bus that the device is on, not good examples to start out with :)

Yeah, this is yack-shaving, sorry, but it's how we clean up core
subsystems for apis and implementations that are not really correct and
were not noticed at the time.

Can we see some users of this code posted so I can see how struct device
is going to work in a rust driver? That's the thing I worry most about
the rust/C interaction here as we have two different ways of thinking
about reference counts from the two worlds and putting them together is
going to be "interesting", as can be seen here already.

thanks,

greg k-h

2023-02-24 14:19:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 03:11:05PM +0100, Greg Kroah-Hartman wrote:
> Thanks for the detailed rust explainations, I'd like to just highlight
> one thing:
>
> On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
> > On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
> > > And again, why are bindings needed for a "raw" struct device at all?
> > > Shouldn't the bus-specific wrappings work better?
> >
> > Because lots of kernel subsystems need to be able to accept "any" device
> > and don't care about the bus! That's what this is for.
>
> That's great, but:
>
> > All the bus
> > wrappers would implement this so they can be used as an argument for all
> > those subsystems (plus a generic one when you just need to pass around
> > an actual owned generic reference and no longer need bus-specific
> > operations - you can materialize that out of a RawDevice impl, which is
> > when get_device() would be called). That's why I'm introducing this now,
> > because both io_pgtable and rtkit need to take `struct device` pointers
> > on the C side so we need some "generic struct device" view on the Rust side.
>
> In looking at both ftkit and io_pgtable, those seem to be good examples
> of how "not to use a struct device", so trying to make safe bindings
> from Rust to these frameworks is very ironic :)
>
> rtkit takes a struct device pointer and then never increments it,
> despite saving it off, which is unsafe. It then only uses it to print
> out messages if things go wrong (or right in some cases), which is odd.
> So it can get away from using a device pointer entirely, except for the
> devm_apple_rtkit_init() call, which I doubt you want to call from rust
> code, right?
>
> for io_pgtable, that's a bit messier, you want to pass in a device that
> io_pgtable treats as a "device" but again, it is NEVER properly
> reference counted, AND, it is only needed to try to figure out the bus
> operations that dma memory should be allocated from for this device. So
> what would be better to save off there would be a pointer to the bus,
> which is constant and soon will be read-only so there are no lifetime
> rules needed at all (see the major struct bus_type changes going into
> 6.3-rc1 that will enable that to happen).
>
> So the two subsystems you want to call from rust code don't properly
> handle the reference count of the object you are going to pass into it,
> and only need it for debugging and iommu stuff, which is really only the
> bus that the device is on, not good examples to start out with :)
>
> Yeah, this is yack-shaving, sorry, but it's how we clean up core
> subsystems for apis and implementations that are not really correct and
> were not noticed at the time.
>
> Can we see some users of this code posted so I can see how struct device
> is going to work in a rust driver? That's the thing I worry most about
> the rust/C interaction here as we have two different ways of thinking
> about reference counts from the two worlds and putting them together is
> going to be "interesting", as can be seen here already.

Also, where are you getting your 'struct device' from in the first
place? What bus is createing it and giving it to your rust driver?

thanks,

greg k-h

2023-02-24 14:33:20

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 2023-02-24 14:11, Greg Kroah-Hartman wrote:
> Thanks for the detailed rust explainations, I'd like to just highlight
> one thing:
>
> On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
>> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
>>> And again, why are bindings needed for a "raw" struct device at all?
>>> Shouldn't the bus-specific wrappings work better?
>>
>> Because lots of kernel subsystems need to be able to accept "any" device
>> and don't care about the bus! That's what this is for.
>
> That's great, but:
>
>> All the bus
>> wrappers would implement this so they can be used as an argument for all
>> those subsystems (plus a generic one when you just need to pass around
>> an actual owned generic reference and no longer need bus-specific
>> operations - you can materialize that out of a RawDevice impl, which is
>> when get_device() would be called). That's why I'm introducing this now,
>> because both io_pgtable and rtkit need to take `struct device` pointers
>> on the C side so we need some "generic struct device" view on the Rust side.
>
> In looking at both ftkit and io_pgtable, those seem to be good examples
> of how "not to use a struct device", so trying to make safe bindings
> from Rust to these frameworks is very ironic :)
>
> rtkit takes a struct device pointer and then never increments it,
> despite saving it off, which is unsafe. It then only uses it to print
> out messages if things go wrong (or right in some cases), which is odd.
> So it can get away from using a device pointer entirely, except for the
> devm_apple_rtkit_init() call, which I doubt you want to call from rust
> code, right?
>
> for io_pgtable, that's a bit messier, you want to pass in a device that
> io_pgtable treats as a "device" but again, it is NEVER properly
> reference counted, AND, it is only needed to try to figure out the bus
> operations that dma memory should be allocated from for this device. So
> what would be better to save off there would be a pointer to the bus,
> which is constant and soon will be read-only so there are no lifetime
> rules needed at all (see the major struct bus_type changes going into
> 6.3-rc1 that will enable that to happen).

FWIW the DMA API *has* to know which specific device it's operating
with, since the relevant properties can and do vary even between
different devices within a single bus_type (e.g. DMA masks).

In the case of io-pgtable at least, there's no explicit refcounting
since the struct device must be the one representing the physical
platform/PCI/etc. device consuming the pagetable, so if that were to
disappear from underneath its driver while the pagetable is still in
use, things would already have gone very very wrong indeed :)

Cheers,
Robin.

> So the two subsystems you want to call from rust code don't properly
> handle the reference count of the object you are going to pass into it,
> and only need it for debugging and iommu stuff, which is really only the
> bus that the device is on, not good examples to start out with :)
>
> Yeah, this is yack-shaving, sorry, but it's how we clean up core
> subsystems for apis and implementations that are not really correct and
> were not noticed at the time.
>
> Can we see some users of this code posted so I can see how struct device
> is going to work in a rust driver? That's the thing I worry most about
> the rust/C interaction here as we have two different ways of thinking
> about reference counts from the two worlds and putting them together is
> going to be "interesting", as can be seen here already.
>
> thanks,
>
> greg k-h

2023-02-24 14:44:19

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait



On 2023/02/24 23:11, Greg Kroah-Hartman wrote:
> Thanks for the detailed rust explainations, I'd like to just highlight
> one thing:
>
> On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
>> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
>>> And again, why are bindings needed for a "raw" struct device at all?
>>> Shouldn't the bus-specific wrappings work better?
>>
>> Because lots of kernel subsystems need to be able to accept "any" device
>> and don't care about the bus! That's what this is for.
>
> That's great, but:
>
>> All the bus
>> wrappers would implement this so they can be used as an argument for all
>> those subsystems (plus a generic one when you just need to pass around
>> an actual owned generic reference and no longer need bus-specific
>> operations - you can materialize that out of a RawDevice impl, which is
>> when get_device() would be called). That's why I'm introducing this now,
>> because both io_pgtable and rtkit need to take `struct device` pointers
>> on the C side so we need some "generic struct device" view on the Rust side.
>
> In looking at both ftkit and io_pgtable, those seem to be good examples
> of how "not to use a struct device", so trying to make safe bindings
> from Rust to these frameworks is very ironic :)

And this is why I want to use Rust, and why writing the abstractions for
C code is so difficult... Rust encodes all these rules in the type
system, but C doesn't, and so many kernel APIs don't document any of
this or what the requirements are...

> rtkit takes a struct device pointer and then never increments it,
> despite saving it off, which is unsafe. It then only uses it to print
> out messages if things go wrong (or right in some cases), which is odd.
> So it can get away from using a device pointer entirely, except for the
> devm_apple_rtkit_init() call, which I doubt you want to call from rust
> code, right?

That sounds like we need to fix the C side to grab a reference ^^

We do need to pass the device to the init function though
(apple_rtkit_init(), this is in the SoC tree which I mentioned as a
prequisite and already on the way to 6.3-rc1), since at the very least
it has to pick up the mailbox and all that to initialize.

Alternatively we could say that the C API contract is that the user of
rtkit has to own a reference, and then the Rust abstraction would have
to take that reference to make a safe abstraction, but that doesn't
sound like the better option.

What do you recommend for things that want to print device-associated
messages, if not holding a reference to the device? Or did I
misunderstand what you meant? Just pr_foo() isn't great because we have
a lot of instances of rtkit and then you wouldn't know which device the
messages are about...

> for io_pgtable, that's a bit messier, you want to pass in a device that
> io_pgtable treats as a "device" but again, it is NEVER properly
> reference counted, AND, it is only needed to try to figure out the bus
> operations that dma memory should be allocated from for this device. So
> what would be better to save off there would be a pointer to the bus,
> which is constant and soon will be read-only so there are no lifetime
> rules needed at all (see the major struct bus_type changes going into
> 6.3-rc1 that will enable that to happen).
>
> So the two subsystems you want to call from rust code don't properly
> handle the reference count of the object you are going to pass into it,
> and only need it for debugging and iommu stuff, which is really only the
> bus that the device is on, not good examples to start out with :)

Well, they're two examples that are dependencies for the driver I wrote,
and I don't think you want me picking easy examples with zero known
upcoming users... ^^;;

> Yeah, this is yack-shaving, sorry, but it's how we clean up core
> subsystems for apis and implementations that are not really correct and
> were not noticed at the time.

I'm fine with helping fix all this, and I don't expect all the
underlying C code to be perfect already either! I already fixed one
locking bug in DRM and spent a lot of time trying to figure out lifetime
rules there, but I didn't dig into rtkit/io_pgtable and didn't realize
they don't take references properly...

> Can we see some users of this code posted so I can see how struct device
> is going to work in a rust driver? That's the thing I worry most about
> the rust/C interaction here as we have two different ways of thinking
> about reference counts from the two worlds and putting them together is
> going to be "interesting", as can be seen here already.

I linked a tree with everything in the cover letter ([4]), look in
drivers/gpu/drm/asahi for the actual driver. But there are a lot of
other dependencies that have to go in before that will compile
(everything else in that branch...)

I know it's hard to review without examples, but I also can't just post
the driver and everything else as one series now, there's still a lot to
be improved and fixed and I'm working with the Rust folks on figuring
out a roadmap for that... and waiting until "everything" is ready and
perfect would mean we don't get anything done in the meantime and fall
into a pit of endless rebasing and coordinating downstream trees, which
also isn't good...

~~ Lina

2023-02-24 14:45:28

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 2023/02/24 23:19, Greg Kroah-Hartman wrote:>> Can we see some users
of this code posted so I can see how struct device
>> is going to work in a rust driver? That's the thing I worry most about
>> the rust/C interaction here as we have two different ways of thinking
>> about reference counts from the two worlds and putting them together is
>> going to be "interesting", as can be seen here already.
>
> Also, where are you getting your 'struct device' from in the first
> place? What bus is createing it and giving it to your rust driver?

That would be platform for my GPU driver, matched via OF compatible.

~~ Lina

2023-02-24 14:49:22

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait



On 2023/02/24 23:32, Robin Murphy wrote:
> On 2023-02-24 14:11, Greg Kroah-Hartman wrote:
>> Thanks for the detailed rust explainations, I'd like to just highlight
>> one thing:
>>
>> On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
>>> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
>>>> And again, why are bindings needed for a "raw" struct device at all?
>>>> Shouldn't the bus-specific wrappings work better?
>>>
>>> Because lots of kernel subsystems need to be able to accept "any" device
>>> and don't care about the bus! That's what this is for.
>>
>> That's great, but:
>>
>>> All the bus
>>> wrappers would implement this so they can be used as an argument for all
>>> those subsystems (plus a generic one when you just need to pass around
>>> an actual owned generic reference and no longer need bus-specific
>>> operations - you can materialize that out of a RawDevice impl, which is
>>> when get_device() would be called). That's why I'm introducing this now,
>>> because both io_pgtable and rtkit need to take `struct device` pointers
>>> on the C side so we need some "generic struct device" view on the
>>> Rust side.
>>
>> In looking at both ftkit and io_pgtable, those seem to be good examples
>> of how "not to use a struct device", so trying to make safe bindings
>> from Rust to these frameworks is very ironic :)
>>
>> rtkit takes a struct device pointer and then never increments it,
>> despite saving it off, which is unsafe.  It then only uses it to print
>> out messages if things go wrong (or right in some cases), which is odd.
>> So it can get away from using a device pointer entirely, except for the
>> devm_apple_rtkit_init() call, which I doubt you want to call from rust
>> code, right?
>>
>> for io_pgtable, that's a bit messier, you want to pass in a device that
>> io_pgtable treats as a "device" but again, it is NEVER properly
>> reference counted, AND, it is only needed to try to figure out the bus
>> operations that dma memory should be allocated from for this device.  So
>> what would be better to save off there would be a pointer to the bus,
>> which is constant and soon will be read-only so there are no lifetime
>> rules needed at all (see the major struct bus_type changes going into
>> 6.3-rc1 that will enable that to happen).
>
> FWIW the DMA API *has* to know which specific device it's operating
> with, since the relevant properties can and do vary even between
> different devices within a single bus_type (e.g. DMA masks).
>
> In the case of io-pgtable at least, there's no explicit refcounting
> since the struct device must be the one representing the physical
> platform/PCI/etc. device consuming the pagetable, so if that were to
> disappear from underneath its driver while the pagetable is still in
> use, things would already have gone very very wrong indeed :)

There's no terribly good way to encode this relationship in safe Rust as
far as I know. So although it might be "obvious" (and I think my driver
can never violate it as it is currently designed), this means the Rust
abstraction will have to take the device reference if the C side does
not, because safe rust abstractions have to actually make these bugs
impossible and nothing stops a Rust driver from, say, stashing an
io_pgtable reference into a global and letting the device go away.

~~ Lina

2023-02-24 15:15:29

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 2023-02-24 14:48, Asahi Lina wrote:
>
>
> On 2023/02/24 23:32, Robin Murphy wrote:
>> On 2023-02-24 14:11, Greg Kroah-Hartman wrote:
>>> Thanks for the detailed rust explainations, I'd like to just highlight
>>> one thing:
>>>
>>> On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
>>>> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
>>>>> And again, why are bindings needed for a "raw" struct device at all?
>>>>> Shouldn't the bus-specific wrappings work better?
>>>>
>>>> Because lots of kernel subsystems need to be able to accept "any" device
>>>> and don't care about the bus! That's what this is for.
>>>
>>> That's great, but:
>>>
>>>> All the bus
>>>> wrappers would implement this so they can be used as an argument for all
>>>> those subsystems (plus a generic one when you just need to pass around
>>>> an actual owned generic reference and no longer need bus-specific
>>>> operations - you can materialize that out of a RawDevice impl, which is
>>>> when get_device() would be called). That's why I'm introducing this now,
>>>> because both io_pgtable and rtkit need to take `struct device` pointers
>>>> on the C side so we need some "generic struct device" view on the
>>>> Rust side.
>>>
>>> In looking at both ftkit and io_pgtable, those seem to be good examples
>>> of how "not to use a struct device", so trying to make safe bindings
>>> from Rust to these frameworks is very ironic :)
>>>
>>> rtkit takes a struct device pointer and then never increments it,
>>> despite saving it off, which is unsafe.  It then only uses it to print
>>> out messages if things go wrong (or right in some cases), which is odd.
>>> So it can get away from using a device pointer entirely, except for the
>>> devm_apple_rtkit_init() call, which I doubt you want to call from rust
>>> code, right?
>>>
>>> for io_pgtable, that's a bit messier, you want to pass in a device that
>>> io_pgtable treats as a "device" but again, it is NEVER properly
>>> reference counted, AND, it is only needed to try to figure out the bus
>>> operations that dma memory should be allocated from for this device.  So
>>> what would be better to save off there would be a pointer to the bus,
>>> which is constant and soon will be read-only so there are no lifetime
>>> rules needed at all (see the major struct bus_type changes going into
>>> 6.3-rc1 that will enable that to happen).
>>
>> FWIW the DMA API *has* to know which specific device it's operating
>> with, since the relevant properties can and do vary even between
>> different devices within a single bus_type (e.g. DMA masks).
>>
>> In the case of io-pgtable at least, there's no explicit refcounting
>> since the struct device must be the one representing the physical
>> platform/PCI/etc. device consuming the pagetable, so if that were to
>> disappear from underneath its driver while the pagetable is still in
>> use, things would already have gone very very wrong indeed :)
>
> There's no terribly good way to encode this relationship in safe Rust as
> far as I know. So although it might be "obvious" (and I think my driver
> can never violate it as it is currently designed), this means the Rust
> abstraction will have to take the device reference if the C side does
> not, because safe rust abstractions have to actually make these bugs
> impossible and nothing stops a Rust driver from, say, stashing an
> io_pgtable reference into a global and letting the device go away.

If someone did that, then simply holding a struct device reference
wouldn't guarantee much, since it only prevents the pointer itself from
becoming invalid - it still doesn't say any of the data *in* the
structure is still valid and "safe" for what a DMA API call might do
with it.

At the very least you'd probably have to somehow also guarantee that the
device has a driver bound (which is the closest thing to a general
indication of valid DMA ops across all architectures) and block it from
unbinding for the lifetime of the reference, but that would then mean a
simple driver which expects to tear down its io-pgtable from its .remove
callback could never be unbound due to the circular dependency :/

Cheers,
Robin.

2023-02-24 15:18:22

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

Hi!

First, I should say that some of this is really an example, and this
abstraction doesn't need to go in as part of this series. I just added
it at the end as an example of how RawDevice could/will be implemented,
but there is more discussion that needs to happen around Devices
(especially around how actual driver implementations and device data
stuff work) I think. The main goal of having the RawDevice trait now is
that we can start reviewing and upstreaming things that aren't part of
the device model but take device references. Otherwise we end up
serializing everything too much and it will be difficult to get
everything upstream in a reasonable timeframe...

Also note that I didn't write any of this code so if you have questions
about *why* it was all designed like this, I think Wedson will have to
answer that ^^

On 2023/02/24 20:19, Greg Kroah-Hartman wrote:
> On Fri, Feb 24, 2023 at 07:53:17PM +0900, Asahi Lina wrote:
>> From: Wedson Almeida Filho <[email protected]>
>>
>> Add a Device type which represents an owned reference to a generic
>> struct device. This minimal implementation just handles reference
>> counting and allows the user to get the device name.
>
> What good is just the device name? I'm all for proper bindings to hook
> up properly to the driver core, but this feels like it's not really
> doing much of anything.

I think we can just drop this, I don't use it. It was just an example of
how a RawDevice method might work that is in the downstream tree, and I
kept it here.

A more practical example would be `of_node()`, which returns the OF node
associated with a device and is how the OF abstraction I wrote hooks
into the device model. I think I can submit that one pretty soon too, if
I'm not mistaken.

>> +void *rust_helper_dev_get_drvdata(struct device *dev)
>> +{
>> + return dev_get_drvdata(dev);
>> +}
>> +EXPORT_SYMBOL_GPL(rust_helper_dev_get_drvdata);
>
> No matching dev_set_drvdata()? What good is getting a random void
> pointer if you couldn't set it in the first place? :)

I thought something else used this, but looking again the bus drivers
just use stuff like platform_{set,get}_drvdata. I'll drop it, I'm not
sure why I thought I needed this later...

>> +impl Device {
>> + /// Creates a new device instance.
>> + ///
>> + /// # Safety
>> + ///
>> + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
>> + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
>> + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
>> + unsafe { bindings::get_device(ptr) };
>
> You don't check the return value of get_device()? What if it failed
> (hint, it can)?

Really? I looked at the implementation and I don't see how it can fail,
as long as the argument is non-NULL and valid... (which is a
precondition of this unsafe function). Did I miss something?

>> + // INVARIANT: The safety requirements satisfy all but one invariant, which is that `self`
>> + // owns a reference. This is satisfied by the call to `get_device` above.
>> + Self { ptr }
>> + }
>> +
>> + /// Creates a new device instance from an existing [`RawDevice`] instance.
>> + pub fn from_dev(dev: &dyn RawDevice) -> Self {
>
> I am a rust newbie, but I don't understand this "RawDevice" here at all.

RawDevice is a trait, so this means "a dynamic reference to any type
that implements RawDevice". That could be a reference to a bus device
outright (like a PlatformDevice), or another Device, or a type-erased
dynamic object that is "some device type" but you don't know which (like
a reference taken from a Box<dyn RawDevice>).

What this means is that you can always create a Device from a reference
to anything that is a device (a RawDevice impl). This code path does a
get_device(), so it creates a new reference that becomes logically owned
by the Device object.

Think of it like "clone, and erase the specific device type to a generic
device". This is the operation you would use in a kernel abstraction
that needs to take "some device", and then grab its own reference and
keep it for later, but which does not care about the specific bus type.

In particular, the way the bus abstractions work right now downstream,
the specific bus device types are not cloneable and only represent
borrowed device references that exist for the lifetime of the driver
probe callback. The idea is that you grab references to all the
bus-specific resources you need within (there is revocability logic to
make sure resources can disappear without breaking pointers, failing
gracefully if accessed) and do any bus-specific init you need, and then
you no longer need to directly touch the bus device after that. This is
definitely not going to cover some corner cases, but I haven't had any
need to use platform device ops in my GPU driver after probe, so it
works for me. But the driver does need to pass around device references
and keep them in some inner structures to make things like `dev_warn!()`
macros work, and that is what `Device` and `RawDevice` are for: `Device`
is an owning reference that can be cloned and constructed out of the
original `PlatformDevice` and can outlive it.
>> + // SAFETY: The requirements are satisfied by the existence of `RawDevice` and its safety
>> + // requirements.
>> + unsafe { Self::new(dev.raw_device()) }
>> + }
>> +}
>> +
>> +// SAFETY: The device returned by `raw_device` is the one for which we hold a reference.
>> +unsafe impl RawDevice for Device {
>> + fn raw_device(&self) -> *mut bindings::device {
>> + self.ptr
>> + }
>
> What does this raw device do? What is the relationship to a "real"
> device? Maybe it's just my lack of rust knowledge here showing, so any
> hints would be appreciated.

This means that a Device (an owned reference to a generic struct device)
can be used anywhere you need a generic device reference (this sounds
tautological but the idea is that specific bus device types can also be
used, which is why this layer of indirection exists).

In this case it's just an accessor for self.ptr since a Device is just a
wrapper around the pointer.

>
>> +}
>> +
>> +impl Drop for Device {
>> + fn drop(&mut self) {
>> + // SAFETY: By the type invariants, we know that `self` owns a reference, so it is safe to
>> + // relinquish it now.
>> + unsafe { bindings::put_device(self.ptr) };
>
> So it is now ok for this to be freed?

Yes, because the Rust type system guarantees that Drop only gets called
when the Device no longer has an owner. That means there are no
references of any kind left and the object is ceasing to exist. Rust
also blocks you from manually calling drop() (this is a special case,
without it you could drop something more than once).

In other words, a Device represents a reference to a struct device. Once
the Device gets dropped, there is nobody left to potentially use that
reference through the Device by definition. (Of course you can invalidly
stash away the raw pointer like those C API issues you found... but
that's a different problem).

One way to solve the C abstraction issue would be to embed a Device in
the RtKit/IoPageTableInner structs (the respective Rust abstraction
types), creating it out of the passed RawDevice reference with
Device::from_raw() in the constructor for those. That means you'd
automatically `get_device()` on construction and `put_device()` when the
abstraction objects get dropped, and that would solve all the problems
without any extra cleanup or management code.

> One meta-comment, why is any of this needed at all? No driver should
> ever be dealing with a "raw" struct device pointer at all, right? They
> should be calling into subsystems that give it a pointer to a
> bus-specific pointer type (gpio, usb, pci, etc.)

I'm getting the feeling maybe the name is just bad. Maybe it should be
called AbstractDevice or IntoDevice or something like that. It's just
"some device".

Drivers indeed should never be dealing with actual device pointers, the
pointers are for consumption by the kernel crate only. That this is done
using trait functions that can technically be called by the driver code
is just the way Rust models this: you need *some* trait function to get
at the raw pointer (because the kernel crate needs it) and the trait
needs to be public (because it's part of the public interface as the
abstract device type) and there is no such thing as private trait
functions in a public trait in Rust. But the idea is that this doesn't
matter: sure, drivers can *get* a raw device pointer but they can't *do*
anything with it without unsafe.

Technically though, if we wanted to hide this further, we could make it
return a newtype wrapper around the raw pointer. Then as long as the
inner field is not public, drivers wouldn't even be able to get at the
raw pointer but the kernel crate would. I'm not sure if this is worth
it, though. It's not like drivers can't work around that with unsafe
anyway, once you use unsafe you can do all kinds of crazy stuff like raw
memory reinterpretations, copies, and transmutations, which will allow
you to cheat the entire type system if you want... but of course the
whole idea here is that we make things impossible in safe code and then
we look at the unsafe blocks with a magnifying glass during review to
make sure they aren't doing anything actually unsafe or crazy.

~~ Lina

2023-02-24 15:24:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 11:43:54PM +0900, Asahi Lina wrote:
>
>
> On 2023/02/24 23:11, Greg Kroah-Hartman wrote:
> > Thanks for the detailed rust explainations, I'd like to just highlight
> > one thing:
> >
> > On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
> >> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
> >>> And again, why are bindings needed for a "raw" struct device at all?
> >>> Shouldn't the bus-specific wrappings work better?
> >>
> >> Because lots of kernel subsystems need to be able to accept "any" device
> >> and don't care about the bus! That's what this is for.
> >
> > That's great, but:
> >
> >> All the bus
> >> wrappers would implement this so they can be used as an argument for all
> >> those subsystems (plus a generic one when you just need to pass around
> >> an actual owned generic reference and no longer need bus-specific
> >> operations - you can materialize that out of a RawDevice impl, which is
> >> when get_device() would be called). That's why I'm introducing this now,
> >> because both io_pgtable and rtkit need to take `struct device` pointers
> >> on the C side so we need some "generic struct device" view on the Rust side.
> >
> > In looking at both ftkit and io_pgtable, those seem to be good examples
> > of how "not to use a struct device", so trying to make safe bindings
> > from Rust to these frameworks is very ironic :)
>
> And this is why I want to use Rust, and why writing the abstractions for
> C code is so difficult... Rust encodes all these rules in the type
> system, but C doesn't, and so many kernel APIs don't document any of
> this or what the requirements are...

I totally agree, and is why I'm wanting to review this stuff.

> > rtkit takes a struct device pointer and then never increments it,
> > despite saving it off, which is unsafe. It then only uses it to print
> > out messages if things go wrong (or right in some cases), which is odd.
> > So it can get away from using a device pointer entirely, except for the
> > devm_apple_rtkit_init() call, which I doubt you want to call from rust
> > code, right?
>
> That sounds like we need to fix the C side to grab a reference ^^

I agree. Or remove the reference entirely.

> We do need to pass the device to the init function though
> (apple_rtkit_init(), this is in the SoC tree which I mentioned as a
> prequisite and already on the way to 6.3-rc1), since at the very least
> it has to pick up the mailbox and all that to initialize.

As I said, the rtkit code does not actually use the device pointer at
all except for some (I would say pointless) log messages. So we could
change the rtkit code either way, I would go for the removal.

> Alternatively we could say that the C API contract is that the user of
> rtkit has to own a reference, and then the Rust abstraction would have
> to take that reference to make a safe abstraction, but that doesn't
> sound like the better option.

Yeah, that feels odd.

> What do you recommend for things that want to print device-associated
> messages, if not holding a reference to the device?

If you aren't holding a reference to the device, that means you aren't
associated to it at all, so you better not be printing out anything
related to any device as that pointer could be invalid at any point in
time.

> Or did I
> misunderstand what you meant? Just pr_foo() isn't great because we have
> a lot of instances of rtkit and then you wouldn't know which device the
> messages are about...

Then the rtkit code needs to be changed to properly grab the reference
and actually use it for something other than just a log message. If it
only wants it for a log message, then let's just drop it and have the
rtkit code go quiet, as when kernel code is working properly, it should
be quiet. If something goes wrong, the code that called into rtkit can
print out a message based on the error return values.

I have no idea what "rtkit" is, if it's an interface to hardware, why
doesn't it have its own struct device that it creates and manages and
uses instead? In my quick glance, that feels like the real solution
here instead of just "I hope this pointer is going to be valid" like it
lives with today. Odds are you can't remove a rtkit device at runtime,
so no one has noticed this yet...

> > for io_pgtable, that's a bit messier, you want to pass in a device that
> > io_pgtable treats as a "device" but again, it is NEVER properly
> > reference counted, AND, it is only needed to try to figure out the bus
> > operations that dma memory should be allocated from for this device. So
> > what would be better to save off there would be a pointer to the bus,
> > which is constant and soon will be read-only so there are no lifetime
> > rules needed at all (see the major struct bus_type changes going into
> > 6.3-rc1 that will enable that to happen).
> >
> > So the two subsystems you want to call from rust code don't properly
> > handle the reference count of the object you are going to pass into it,
> > and only need it for debugging and iommu stuff, which is really only the
> > bus that the device is on, not good examples to start out with :)
>
> Well, they're two examples that are dependencies for the driver I wrote,
> and I don't think you want me picking easy examples with zero known
> upcoming users... ^^;;

Hey, this is good, I like it, no complaints!

> > Yeah, this is yack-shaving, sorry, but it's how we clean up core
> > subsystems for apis and implementations that are not really correct and
> > were not noticed at the time.
>
> I'm fine with helping fix all this, and I don't expect all the
> underlying C code to be perfect already either! I already fixed one
> locking bug in DRM and spent a lot of time trying to figure out lifetime
> rules there, but I didn't dig into rtkit/io_pgtable and didn't realize
> they don't take references properly...

The iomem code is under heavy-flux right now so let's see what 6.3-rc1
looks like to determine if this still needs to be resolved or not. I
hope the latest set of fixes/reworks for that subsystem land. Either
way, I'll look into what needs to be done there as obviously, the
current code is not correct.

> > Can we see some users of this code posted so I can see how struct device
> > is going to work in a rust driver? That's the thing I worry most about
> > the rust/C interaction here as we have two different ways of thinking
> > about reference counts from the two worlds and putting them together is
> > going to be "interesting", as can be seen here already.
>
> I linked a tree with everything in the cover letter ([4]), look in
> drivers/gpu/drm/asahi for the actual driver. But there are a lot of
> other dependencies that have to go in before that will compile
> (everything else in that branch...)

Sorry, it's hard to notice random github repos and branches, especially
when traveling, that's why we use email for review.

> I know it's hard to review without examples, but I also can't just post
> the driver and everything else as one series now, there's still a lot to
> be improved and fixed and I'm working with the Rust folks on figuring
> out a roadmap for that... and waiting until "everything" is ready and
> perfect would mean we don't get anything done in the meantime and fall
> into a pit of endless rebasing and coordinating downstream trees, which
> also isn't good...

Yeah, it's a chicken and egg issue right now, no worries, I understand.
This is going to take some cycles to get right.

thanks,

greg k-h

2023-02-24 15:25:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 11:44:59PM +0900, Asahi Lina wrote:
> On 2023/02/24 23:19, Greg Kroah-Hartman wrote:>> Can we see some users
> of this code posted so I can see how struct device
> >> is going to work in a rust driver? That's the thing I worry most about
> >> the rust/C interaction here as we have two different ways of thinking
> >> about reference counts from the two worlds and putting them together is
> >> going to be "interesting", as can be seen here already.
> >
> > Also, where are you getting your 'struct device' from in the first
> > place? What bus is createing it and giving it to your rust driver?
>
> That would be platform for my GPU driver, matched via OF compatible.

Ick, a platform device? The GPU isn't on the PCI bus? Wow, that's
horrid...

2023-02-24 15:32:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 02:32:47PM +0000, Robin Murphy wrote:
> On 2023-02-24 14:11, Greg Kroah-Hartman wrote:
> > Thanks for the detailed rust explainations, I'd like to just highlight
> > one thing:
> >
> > On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
> > > On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
> > > > And again, why are bindings needed for a "raw" struct device at all?
> > > > Shouldn't the bus-specific wrappings work better?
> > >
> > > Because lots of kernel subsystems need to be able to accept "any" device
> > > and don't care about the bus! That's what this is for.
> >
> > That's great, but:
> >
> > > All the bus
> > > wrappers would implement this so they can be used as an argument for all
> > > those subsystems (plus a generic one when you just need to pass around
> > > an actual owned generic reference and no longer need bus-specific
> > > operations - you can materialize that out of a RawDevice impl, which is
> > > when get_device() would be called). That's why I'm introducing this now,
> > > because both io_pgtable and rtkit need to take `struct device` pointers
> > > on the C side so we need some "generic struct device" view on the Rust side.
> >
> > In looking at both ftkit and io_pgtable, those seem to be good examples
> > of how "not to use a struct device", so trying to make safe bindings
> > from Rust to these frameworks is very ironic :)
> >
> > rtkit takes a struct device pointer and then never increments it,
> > despite saving it off, which is unsafe. It then only uses it to print
> > out messages if things go wrong (or right in some cases), which is odd.
> > So it can get away from using a device pointer entirely, except for the
> > devm_apple_rtkit_init() call, which I doubt you want to call from rust
> > code, right?
> >
> > for io_pgtable, that's a bit messier, you want to pass in a device that
> > io_pgtable treats as a "device" but again, it is NEVER properly
> > reference counted, AND, it is only needed to try to figure out the bus
> > operations that dma memory should be allocated from for this device. So
> > what would be better to save off there would be a pointer to the bus,
> > which is constant and soon will be read-only so there are no lifetime
> > rules needed at all (see the major struct bus_type changes going into
> > 6.3-rc1 that will enable that to happen).
>
> FWIW the DMA API *has* to know which specific device it's operating with,
> since the relevant properties can and do vary even between different devices
> within a single bus_type (e.g. DMA masks).

What bus_type has different DMA masks depending on the device on that
bus today?

And the iommu ops are on the bus, not the device, but there is a
iommu_group on the device, is that what you are referring to?

Am I getting iommu and dma stuff confused here? A bus also has dma
callbacks, but yet the device itself has dma_ops?

> In the case of io-pgtable at least, there's no explicit refcounting since
> the struct device must be the one representing the physical
> platform/PCI/etc. device consuming the pagetable, so if that were to
> disappear from underneath its driver while the pagetable is still in use,
> things would already have gone very very wrong indeed :)

But that could happen at any point in time, the device can be removed
from the system with no warning, how do you guarantee that io-pgtable is
being initialized with a device that can NOT be removed?

Think of drawers containing CPUs and PCI devices and memory, Linux
has supported hot-removal of those for decades. (well, not for memory,
we could only hot-add that...)

Again, passing in a pointer to a struct device, and saving it off
WITHOUT incrementing the reference count is not ok, that's not how
reference counts work...

But again, let's see about disconnecting the iommu ops entirely from the
device and just relying on the bus, that should work better, rigth?

thanks,

greg k-h

2023-02-24 15:34:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Sat, Feb 25, 2023 at 12:10:47AM +0900, Asahi Lina wrote:
> >> +impl Device {
> >> + /// Creates a new device instance.
> >> + ///
> >> + /// # Safety
> >> + ///
> >> + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> >> + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
> >> + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
> >> + unsafe { bindings::get_device(ptr) };
> >
> > You don't check the return value of get_device()? What if it failed
> > (hint, it can)?
>
> Really? I looked at the implementation and I don't see how it can fail,
> as long as the argument is non-NULL and valid... (which is a
> precondition of this unsafe function). Did I miss something?

This function has changed a bit over time, but yes, it could fail if
someone else just dropped the last reference right before you tried to
grab a new one (look at the implementation of kobject_get()).

Yes, if you "know" you have a non-zero reference count first, it should
never fail, but how do you know this? We have the ability to check the
return value of the function, shouldn't that happen?

thanks,

greg k-h

2023-02-24 15:42:59

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 25/02/2023 00.24, Greg Kroah-Hartman wrote:
>> What do you recommend for things that want to print device-associated
>> messages, if not holding a reference to the device?
>
> If you aren't holding a reference to the device, that means you aren't
> associated to it at all, so you better not be printing out anything
> related to any device as that pointer could be invalid at any point in
> time.

The RTKit code talks to the firmware coprocessor that is part of the
device, so it definitely is associated to it... among other things it
prints out firmware logs from the device and crash logs, manages memory
buffers (which have default implementations but can be overridden by the
user driver), and more. It's essentially library code shared by all
device drivers that interact with devices with these coprocessors.

>> Or did I
>> misunderstand what you meant? Just pr_foo() isn't great because we have
>> a lot of instances of rtkit and then you wouldn't know which device the
>> messages are about...
>
> Then the rtkit code needs to be changed to properly grab the reference
> and actually use it for something other than just a log message. If it
> only wants it for a log message, then let's just drop it and have the
> rtkit code go quiet, as when kernel code is working properly, it should
> be quiet. If something goes wrong, the code that called into rtkit can
> print out a message based on the error return values.

Keep in mind rtkit does things like print out crash logs, and you
wouldn't want the caller to be responsible for that (and we definitely
want those in dmesg since these coprocessors are non-recoverable, it's
almost as bad as a kernel panic: you will have to reboot to be able to
use the machine properly again). I find those crash logs very useful to
figure out what went wrong with the GPU (especially if combined with a
memory dump which we don't expose to regular users right now, but which
I have ideas for... but even without that, just assert messages from the
coprocessor or fault instruction pointers that I can correlate with the
firmware are very useful on their own).

Right now rtkit also prints out syslogs from the coprocessors. That's
noisy for some but I think very useful, since we're dealing with reverse
engineered drivers. We'll probably want to silence those for some noisy
coprocessors at some point, but I don't think we want to do that until
things are all upstream, stable, and with a larger user base... until
then I think we'd much rather be spammy and have a better chance of
debugging rare issues, which often happen with these coprocessors
running big firmware blobs... there are a lot of subtleties in getting
the interfaces right, never mind cache coherence issues!

> I have no idea what "rtkit" is, if it's an interface to hardware, why
> doesn't it have its own struct device that it creates and manages and
> uses instead? In my quick glance, that feels like the real solution
> here instead of just "I hope this pointer is going to be valid" like it
> lives with today. Odds are you can't remove a rtkit device at runtime,
> so no one has noticed this yet...

Well, they're all embedded into the SoC, yes.

RTKit is Apple's firmware RTOS, and also the name for the
semi-standardized mailbox/shared-memory interface shared by different
firmwares using it. The Linux code to drive it doesn't create its own
"struct device" because the rtkit code is just library code that is
extended by the downstream drivers (like mine). How each driver
interacts with rtkit varies widely... NVMe almost doesn't at all other
than for power management, there is actually a downstream "rtkit-helper"
driver that is a proper standalone device wrapper for one case (MTP)
where it really doesn't need to interact at all... in my case with the
GPU, almost everything is shared memory and doorbells over the mailbox.
Other drivers like DCP actually send pointers over multiple mailbox
endpoints, or do most of their data exchange directly over messages like
that (SMC).

So in a way, if we consider it driver library code, it's not
unreasonable for RTKit to require that the device you pass it outlives
it. Certainly, if the device is getting unbound from your driver, you'd
need to tear down RTKit as part of that in any reasonable situation.

>> I know it's hard to review without examples, but I also can't just post
>> the driver and everything else as one series now, there's still a lot to
>> be improved and fixed and I'm working with the Rust folks on figuring
>> out a roadmap for that... and waiting until "everything" is ready and
>> perfect would mean we don't get anything done in the meantime and fall
>> into a pit of endless rebasing and coordinating downstream trees, which
>> also isn't good...
>
> Yeah, it's a chicken and egg issue right now, no worries, I understand.
> This is going to take some cycles to get right.
>

Thank you ^^

~~ Lina

2023-02-24 15:45:28

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 25/02/2023 00.25, Greg Kroah-Hartman wrote:
> On Fri, Feb 24, 2023 at 11:44:59PM +0900, Asahi Lina wrote:
>> On 2023/02/24 23:19, Greg Kroah-Hartman wrote:>> Can we see some users
>> of this code posted so I can see how struct device
>>>> is going to work in a rust driver? That's the thing I worry most about
>>>> the rust/C interaction here as we have two different ways of thinking
>>>> about reference counts from the two worlds and putting them together is
>>>> going to be "interesting", as can be seen here already.
>>>
>>> Also, where are you getting your 'struct device' from in the first
>>> place? What bus is createing it and giving it to your rust driver?
>>
>> That would be platform for my GPU driver, matched via OF compatible.
>
> Ick, a platform device? The GPU isn't on the PCI bus? Wow, that's
> horrid...

The internal NVMe controller also isn't on the PCI bus (which is worse
since that actually has standard PCI bindings)... nothing is on PCI
buses other than external PCI/Thunderbolt devices. There are no internal
fake-PCI devices on Apple Silicon like there are on Intel. That's how it
is on most SoCs (though Apple is the only vendor who has dared to go as
far as integrating NVMe as far as I know)...

~~ Lina

2023-02-24 15:51:52

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On 25/02/2023 00.34, Greg Kroah-Hartman wrote:
> On Sat, Feb 25, 2023 at 12:10:47AM +0900, Asahi Lina wrote:
>>>> +impl Device {
>>>> + /// Creates a new device instance.
>>>> + ///
>>>> + /// # Safety
>>>> + ///
>>>> + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
>>>> + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
>>>> + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
>>>> + unsafe { bindings::get_device(ptr) };
>>>
>>> You don't check the return value of get_device()? What if it failed
>>> (hint, it can)?
>>
>> Really? I looked at the implementation and I don't see how it can fail,
>> as long as the argument is non-NULL and valid... (which is a
>> precondition of this unsafe function). Did I miss something?
>
> This function has changed a bit over time, but yes, it could fail if
> someone else just dropped the last reference right before you tried to
> grab a new one (look at the implementation of kobject_get()).
>
> Yes, if you "know" you have a non-zero reference count first, it should
> never fail, but how do you know this? We have the ability to check the
> return value of the function, shouldn't that happen?

Well, we know this because it is part of the invariant. If we don't
uphold invariants, things fall apart... that's why we create safe
abstractions that don't let you break those invariants after all!

In this particular case though, I don't see what we can usefully do
here. `device_get()` is going to be part of Clone impls and things like
that which are non-fallible. At most we can assert!() and rust-panic
(which is a BUG as far as I know) if it fails... would that be preferable?

~~ Lina

2023-02-24 16:23:26

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 25/02/2023 00.14, Robin Murphy wrote:
> On 2023-02-24 14:48, Asahi Lina wrote:
>>
>>
>> On 2023/02/24 23:32, Robin Murphy wrote:
>>> FWIW the DMA API *has* to know which specific device it's operating
>>> with, since the relevant properties can and do vary even between
>>> different devices within a single bus_type (e.g. DMA masks).
>>>
>>> In the case of io-pgtable at least, there's no explicit refcounting
>>> since the struct device must be the one representing the physical
>>> platform/PCI/etc. device consuming the pagetable, so if that were to
>>> disappear from underneath its driver while the pagetable is still in
>>> use, things would already have gone very very wrong indeed :)
>>
>> There's no terribly good way to encode this relationship in safe Rust as
>> far as I know. So although it might be "obvious" (and I think my driver
>> can never violate it as it is currently designed), this means the Rust
>> abstraction will have to take the device reference if the C side does
>> not, because safe rust abstractions have to actually make these bugs
>> impossible and nothing stops a Rust driver from, say, stashing an
>> io_pgtable reference into a global and letting the device go away.
>
> If someone did that, then simply holding a struct device reference
> wouldn't guarantee much, since it only prevents the pointer itself from
> becoming invalid - it still doesn't say any of the data *in* the
> structure is still valid and "safe" for what a DMA API call might do
> with it.
>
> At the very least you'd probably have to somehow also guarantee that the
> device has a driver bound (which is the closest thing to a general
> indication of valid DMA ops across all architectures) and block it from
> unbinding for the lifetime of the reference, but that would then mean a
> simple driver which expects to tear down its io-pgtable from its .remove
> callback could never be unbound due to the circular dependency :/

I'd like to hear from the other Rust folks about ideas on how to solve
this ^^. I think it might fit into the Revocable model for device
resources, but it's going to be a bit of an issue for my driver because
for me, the io_pgtable objects are deep in the Vm object that has its
own refcounting, and those are created and destroyed dynamically at
runtime, not just during device probe...

But it's also possible that we have to resign ourselves to a somewhat
leaky/unsound abstraction under the understanding that the C API has
requirements that are difficult to encode in the Rust type system...

There's actually a much longer story to tell here though... I've done a
lot of thinking about what "safe" means in the context of the kernel,
what the useful safety boundaries are, and things like that. Especially
within complex drivers, the answer is a lot greyer than what you tend to
get with Rust userspace code. But at the end of the day, Rust lets you
make things much safer and reduce the chances of bugs creeping in
dramatically, even if you have some slightly unsound abstractions and
other issues. Maybe I should write a long form blog post about this,
would people find that useful?

For whatever this anecdote is worth, my GPU driver is about 16000 lines
of Rust and has ~100 unsafe blocks, and has been in a few downstream
distro repos since December with actual users and nobody has ever
managed to oops it as far as I know, including other Mesa developers who
have been feeding it all kinds of broken command buffers and things like
that. It has survived OOMs leading to untested error paths, UAPI
fuzzing, and endless GPU faults/timeouts... and I definitely am not a
genius who writes bug-free code! The firmware itself is also very easy
to crash (raw unsafe pointers everywhere in shared memory), and crashes
are unrecoverable (you need to reboot), but I try to use Rust's type
system and ownership model to extend safety to that interface as much as
possible and I think people other than me have only managed to get the
GPU firmware to crash twice (both cache coherency related, I *think*
I've finally eliminated that issue at the root now). The uptime on my M2
streaming machine running a decent GPU workload for 20 hours or so
weekly is 44 days and counting... I think that old kernel even has some
minor memory leaks that I've since fixed (you *can* get those in Rust
with circular references), but it's so minor it's going to be at least a
few more months before it becomes a problem.

So I guess what I'm saying is that at the end of the day, if we can't
get an interface to be 100% safe and sound and usable, that's probably
okay. We're still getting a lot of safe mileage out of the other 99%! ^^

~~ Lina

2023-02-24 16:32:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Sat, Feb 25, 2023 at 12:51:38AM +0900, Asahi Lina wrote:
> On 25/02/2023 00.34, Greg Kroah-Hartman wrote:
> > On Sat, Feb 25, 2023 at 12:10:47AM +0900, Asahi Lina wrote:
> >>>> +impl Device {
> >>>> + /// Creates a new device instance.
> >>>> + ///
> >>>> + /// # Safety
> >>>> + ///
> >>>> + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> >>>> + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
> >>>> + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
> >>>> + unsafe { bindings::get_device(ptr) };
> >>>
> >>> You don't check the return value of get_device()? What if it failed
> >>> (hint, it can)?
> >>
> >> Really? I looked at the implementation and I don't see how it can fail,
> >> as long as the argument is non-NULL and valid... (which is a
> >> precondition of this unsafe function). Did I miss something?
> >
> > This function has changed a bit over time, but yes, it could fail if
> > someone else just dropped the last reference right before you tried to
> > grab a new one (look at the implementation of kobject_get()).
> >
> > Yes, if you "know" you have a non-zero reference count first, it should
> > never fail, but how do you know this? We have the ability to check the
> > return value of the function, shouldn't that happen?
>
> Well, we know this because it is part of the invariant. If we don't
> uphold invariants, things fall apart... that's why we create safe
> abstractions that don't let you break those invariants after all!
>
> In this particular case though, I don't see what we can usefully do
> here. `device_get()` is going to be part of Clone impls and things like
> that which are non-fallible. At most we can assert!() and rust-panic
> (which is a BUG as far as I know) if it fails... would that be preferable?

That's a good question, I don't know. In thinking about it some more,
I think we are ok with this as-is for now.

BUT, I want to see any code that is actually using a struct device, and
I really want to review the heck out of the platform device api that you
must have somewhere, as that might show some other issues around this
type of thing that might be lurking.

thanks,

greg k-h

2023-02-24 16:53:32

by Asahi Lina

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On 25/02/2023 01.31, Greg Kroah-Hartman wrote:
> On Sat, Feb 25, 2023 at 12:51:38AM +0900, Asahi Lina wrote:
>> On 25/02/2023 00.34, Greg Kroah-Hartman wrote:
>>> On Sat, Feb 25, 2023 at 12:10:47AM +0900, Asahi Lina wrote:
>>>>>> +impl Device {
>>>>>> + /// Creates a new device instance.
>>>>>> + ///
>>>>>> + /// # Safety
>>>>>> + ///
>>>>>> + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
>>>>>> + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
>>>>>> + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
>>>>>> + unsafe { bindings::get_device(ptr) };
>>>>>
>>>>> You don't check the return value of get_device()? What if it failed
>>>>> (hint, it can)?
>>>>
>>>> Really? I looked at the implementation and I don't see how it can fail,
>>>> as long as the argument is non-NULL and valid... (which is a
>>>> precondition of this unsafe function). Did I miss something?
>>>
>>> This function has changed a bit over time, but yes, it could fail if
>>> someone else just dropped the last reference right before you tried to
>>> grab a new one (look at the implementation of kobject_get()).
>>>
>>> Yes, if you "know" you have a non-zero reference count first, it should
>>> never fail, but how do you know this? We have the ability to check the
>>> return value of the function, shouldn't that happen?
>>
>> Well, we know this because it is part of the invariant. If we don't
>> uphold invariants, things fall apart... that's why we create safe
>> abstractions that don't let you break those invariants after all!
>>
>> In this particular case though, I don't see what we can usefully do
>> here. `device_get()` is going to be part of Clone impls and things like
>> that which are non-fallible. At most we can assert!() and rust-panic
>> (which is a BUG as far as I know) if it fails... would that be preferable?
>
> That's a good question, I don't know. In thinking about it some more,
> I think we are ok with this as-is for now.
>
> BUT, I want to see any code that is actually using a struct device, and
> I really want to review the heck out of the platform device api that you
> must have somewhere, as that might show some other issues around this
> type of thing that might be lurking.

The platform abstraction (which I didn't write either) is here [1],
though that's just pulled straight from RfL right now without proper
commit/attribution info and has a few other dependencies still.

But the refcounting is actually really boring on that one... there is
none! A platform::Device instance only exists during probe and cannot be
cloned or stolen, so it only represents a borrowed reference during that
time, and obviously the subsystem is guaranteed to hold a reference
while the driver probe callback runs (right? I mean, things would be
incredibly broken if that weren't the case and probe could randomly lose
the reference halfway through). There is definitely going to come a time
when this isn't enough, but right now with the device resource model
stuff it works, and of course it's really easy to review since it's so
simple!

More to the point though: I think it's probably okay if some things
can't be checked in these initial submissions, but then we check them as
more users (in `kernel` and drivers) come in later? As you mentioned
there's a chicken-and-egg issue, and also I'd say a mismatch between
"what you need to validate certain invariants/design decisions" and
"what is reasonable to submit as a single series". But at the end of the
day we're going to have to refer back to this code when reviewing
further code that uses it inside `kernel` anyway (especially this early
on), so it's probably okay to fix anything that we missed or discover is
broken then? I certainly don't mind adding a bunch of "fix broken thing"
patches at the head of later reviews if we run into issues ^^

[1]
https://github.com/Rust-for-Linux/linux/pull/969/commits/33770890aa71a491d81ec2cb2b7a45d953d1b7dd


~~ Lina

2023-02-24 18:52:18

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 2023-02-24 15:32, Greg Kroah-Hartman wrote:
> On Fri, Feb 24, 2023 at 02:32:47PM +0000, Robin Murphy wrote:
>> On 2023-02-24 14:11, Greg Kroah-Hartman wrote:
>>> Thanks for the detailed rust explainations, I'd like to just highlight
>>> one thing:
>>>
>>> On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
>>>> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
>>>>> And again, why are bindings needed for a "raw" struct device at all?
>>>>> Shouldn't the bus-specific wrappings work better?
>>>>
>>>> Because lots of kernel subsystems need to be able to accept "any" device
>>>> and don't care about the bus! That's what this is for.
>>>
>>> That's great, but:
>>>
>>>> All the bus
>>>> wrappers would implement this so they can be used as an argument for all
>>>> those subsystems (plus a generic one when you just need to pass around
>>>> an actual owned generic reference and no longer need bus-specific
>>>> operations - you can materialize that out of a RawDevice impl, which is
>>>> when get_device() would be called). That's why I'm introducing this now,
>>>> because both io_pgtable and rtkit need to take `struct device` pointers
>>>> on the C side so we need some "generic struct device" view on the Rust side.
>>>
>>> In looking at both ftkit and io_pgtable, those seem to be good examples
>>> of how "not to use a struct device", so trying to make safe bindings
>>> from Rust to these frameworks is very ironic :)
>>>
>>> rtkit takes a struct device pointer and then never increments it,
>>> despite saving it off, which is unsafe. It then only uses it to print
>>> out messages if things go wrong (or right in some cases), which is odd.
>>> So it can get away from using a device pointer entirely, except for the
>>> devm_apple_rtkit_init() call, which I doubt you want to call from rust
>>> code, right?
>>>
>>> for io_pgtable, that's a bit messier, you want to pass in a device that
>>> io_pgtable treats as a "device" but again, it is NEVER properly
>>> reference counted, AND, it is only needed to try to figure out the bus
>>> operations that dma memory should be allocated from for this device. So
>>> what would be better to save off there would be a pointer to the bus,
>>> which is constant and soon will be read-only so there are no lifetime
>>> rules needed at all (see the major struct bus_type changes going into
>>> 6.3-rc1 that will enable that to happen).
>>
>> FWIW the DMA API *has* to know which specific device it's operating with,
>> since the relevant properties can and do vary even between different devices
>> within a single bus_type (e.g. DMA masks).
>
> What bus_type has different DMA masks depending on the device on that
> bus today?

Certainly PCI and platform, which are about 99% of what matters in terms
of DMA. You may say "ewww" again, but even on something as "normal" as a
PCIe GPU chip, the audio codec function may well have a different DMA
mask from the GPU function, let alone distinct devices across
cards/slots/segments/etc. differing - e.g. some PCIe USB controllers are
still limited to 32 bits, where your more-capable GPU/NVMe/NIC etc.
would definitely not be cool with a lowest-common-denominator constraint.

> And the iommu ops are on the bus, not the device, but there is a
> iommu_group on the device, is that what you are referring to?
>
> Am I getting iommu and dma stuff confused here? A bus also has dma
> callbacks, but yet the device itself has dma_ops?

Confusion is excusable here - the bus "dma_configure" callbacks are
mostly actually IOMMU configuration, which is a bit of historical legacy
that still needs more cleaning up (because it makes things happen in the
wrong order from what the IOMMU API really wants). I'm hoping to get to
that soon once we land my current series finishing the bus->iommu_ops
removal.

(FWIW what I want to do is flip things around so the buses just provide
a method for the IOMMU API to to retrieve whatever bus-defined IDs it
needs to associate with a given device, rather than having the bus code
call into the IOMMU API itself as currently.)

>> In the case of io-pgtable at least, there's no explicit refcounting since
>> the struct device must be the one representing the physical
>> platform/PCI/etc. device consuming the pagetable, so if that were to
>> disappear from underneath its driver while the pagetable is still in use,
>> things would already have gone very very wrong indeed :)
>
> But that could happen at any point in time, the device can be removed
> from the system with no warning, how do you guarantee that io-pgtable is
> being initialized with a device that can NOT be removed?
>
> Think of drawers containing CPUs and PCI devices and memory, Linux
> has supported hot-removal of those for decades. (well, not for memory,
> we could only hot-add that...)
>
> Again, passing in a pointer to a struct device, and saving it off
> WITHOUT incrementing the reference count is not ok, that's not how
> reference counts work...

io-pgtable is just a utility library which that device's driver is
calling into. It never does anything with that pointer by itself, and
it's implicitly expected that the ops are only going to be called by
whoever allocated them. If the driver needs to cope with a surprise
removal event while bound, that's on the driver, but I'd assume the bus
code is still expected to at least give it a chance to clean up
gracefully before the struct device completely disappears. I don't think
I've ever seen a driver explicitly take a reference on its own device
for "normal" operation, so either everything's wrong, or that guarantee
must already come from deeper in the driver core.

Yes, *technically* someone could invoke io-pgtable ops from some context
other than the driver which allocated them, but it would be wrong and
the answer would be "don't do that". We've got a huge quantity of APIs
all over Linux where the expected usage model is intuitive, but
impractical to formally assert, so it's just assumed that you do the
right thing and expect fireworks if you don't. I do fully appreciate
that "don't do that" is not an easy thing for Rust to nail down.

There's nothing inherently special about saving a pointer in a structure
either - if someone's failing to respect the expected conventions, then
the lifetime of a pointer *anywhere* matters, and even something as
innocuous as dev_name() should strictly take a reference on its argument
in case it's called concurrently with the device being freed. In fact,
consider if get_device() is pre-empted right on entry, and by the time
it comes back to dereference the kobject, someone else has racily
dropped the last reference and already freed it. Unless we can draw a
line of reasonable expectations somewhere, we've no pretence of being
able to guarantee *anything*.

> But again, let's see about disconnecting the iommu ops entirely from the
> device and just relying on the bus, that should work better, rigth?

Um, other way - iommu_ops have almost finished moving *off* the bus to
be per-device. You acked the patch the other week and seemed pleased
with it ;)

But either way that's orthogonal to this case, since the device that
io-pgtable deals with cannot have iommu_ops of its own (it *is* the
IOMMU, or at least a GPU MMU acting as one).

Cheers,
Robin.

2023-02-24 19:22:21

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 5:23 PM Asahi Lina <[email protected]> wrote:
>
> So I guess what I'm saying is that at the end of the day, if we can't
> get an interface to be 100% safe and sound and usable, that's probably
> okay. We're still getting a lot of safe mileage out of the other 99%! ^^

We talked a few times about what approach to take for things where a
fully safe API is not feasible. There have been differing opinions in
the past.

One approach would be requiring a "global `unsafe`" so to speak, once,
somewhere in the API -- it does not improve things much, but at least
it would make the user acknowledge the pitfalls of that particular
API/feature/subsystem/hardware/... e.g. for module unloading, one
could ask for an `unsafe` inside the `module!` macro invocation (like
`unsafe unloadable: true,`). This would allow for `// SAFETY: ...`
comments.

Another approach would be declaring some of those "external entities"
outside the scope of Rust's safety guarantees, like it is done for
e.g. `/proc/self/mem` in userspace Rust [1]. They would be documented
wherever relevant, and perhaps we could have an "acknowledged
soundness holes" list.

Having said that, of course, what we definitely don't want to allow is
for subsystems to provide unsound safe APIs for no reason.

[1] https://doc.rust-lang.org/stable/std/os/unix/io/index.html#procselfmem-and-similar-os-features

Cheers,
Miguel

2023-02-25 07:00:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, Feb 24, 2023 at 06:52:04PM +0000, Robin Murphy wrote:
> > But again, let's see about disconnecting the iommu ops entirely from the
> > device and just relying on the bus, that should work better, rigth?
>
> Um, other way - iommu_ops have almost finished moving *off* the bus to be
> per-device. You acked the patch the other week and seemed pleased with it ;)

Oops, yes, sorry, I meant the other way around in my comment, see, it's
confusing :)

And yes, I want to see your patch series merged, it will make my "clean
up the driver core lifetime rules" work a lot easier, is it still
planned for 6.3-rc1?

> But either way that's orthogonal to this case, since the device that
> io-pgtable deals with cannot have iommu_ops of its own (it *is* the IOMMU,
> or at least a GPU MMU acting as one).

Agreed, and thanks for the detailed explaination, it makes more sense
now. But I still think the pointer reference needs to be incremented
when saved off to be at least a bit more sane.

thanks,

greg k-h

2023-02-25 17:17:34

by Alyssa Rosenzweig

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

February 24, 2023 10:25 AM, "Greg Kroah-Hartman" <[email protected]> wrote:

> On Fri, Feb 24, 2023 at 11:44:59PM +0900, Asahi Lina wrote:
>
>> On 2023/02/24 23:19, Greg Kroah-Hartman wrote:>> Can we see some users
>> of this code posted so I can see how struct device
>> is going to work in a rust driver? That's the thing I worry most about
>> the rust/C interaction here as we have two different ways of thinking
>> about reference counts from the two worlds and putting them together is
>> going to be "interesting", as can be seen here already.
>>
>> Also, where are you getting your 'struct device' from in the first
>> place? What bus is createing it and giving it to your rust driver?
>>
>> That would be platform for my GPU driver, matched via OF compatible.
>
> Ick, a platform device? The GPU isn't on the PCI bus? Wow, that's
> horrid...

This is bog standard for Arm SoCs... As far as I know, it's all platform devices in the Arm GPU world: Mali, Adreno, Tegra, VideoCore, and yes, Imaginapple. not really sure what good PCI would do for integrated GPUs.

2023-03-05 06:39:43

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

Hi Greg,

As Lina points out, I wrote this code, so I'll try to answer your questions.

Lina, apologies for the delay in participating, I've been distracted
with other stuff.

On Fri, 24 Feb 2023 at 08:19, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Fri, Feb 24, 2023 at 07:53:17PM +0900, Asahi Lina wrote:
> > From: Wedson Almeida Filho <[email protected]>
> >
> > Add a Device type which represents an owned reference to a generic
> > struct device. This minimal implementation just handles reference
> > counting and allows the user to get the device name.
>
> What good is just the device name? I'm all for proper bindings to hook
> up properly to the driver core, but this feels like it's not really
> doing much of anything.
>
> Do you have a real user that we can see how this is interacting?
>
> And what does a driver care about the device name anyway? It should
> only be using the dev_*() calls to print that info out to the log, and
> never messing around with it in any other format as that's what
> userspace expects.

I added this for the pl061 gpio driver, which initialises the gpio
chip label with the device name:
https://elixir.bootlin.com/linux/v6.2.2/source/drivers/gpio/gpio-pl061.c#L333

Someone suggested this driver as good example in some of these earlier
interminable threads. The source code for it is in our Rust-for-Linux
rust branch.

> > Lina: Rewrote commit message, dropped the Amba bits, and squashed in
> > simple changes to the core Device code from latter commits in
> > rust-for-linux/rust. Also include the rust_helper_dev_get_drvdata
> > helper which will be needed by consumers later on anyway.
> >
> > Co-developed-by: Miguel Ojeda <[email protected]>
> > Signed-off-by: Miguel Ojeda <[email protected]>
> > Signed-off-by: Wedson Almeida Filho <[email protected]>
> > Signed-off-by: Asahi Lina <[email protected]>
> > ---
> > rust/helpers.c | 13 +++++++++
> > rust/kernel/device.rs | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 88 insertions(+), 1 deletion(-)
> >
> > diff --git a/rust/helpers.c b/rust/helpers.c
> > index 04b9be46e887..54954fd80c77 100644
> > --- a/rust/helpers.c
> > +++ b/rust/helpers.c
> > @@ -20,6 +20,7 @@
> >
> > #include <linux/bug.h>
> > #include <linux/build_bug.h>
> > +#include <linux/device.h>
> > #include <linux/err.h>
> > #include <linux/refcount.h>
> >
> > @@ -65,6 +66,18 @@ long rust_helper_PTR_ERR(__force const void *ptr)
> > }
> > EXPORT_SYMBOL_GPL(rust_helper_PTR_ERR);
> >
> > +void *rust_helper_dev_get_drvdata(struct device *dev)
> > +{
> > + return dev_get_drvdata(dev);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_dev_get_drvdata);
>
> No matching dev_set_drvdata()? What good is getting a random void
> pointer if you couldn't set it in the first place? :)
>
> > +const char *rust_helper_dev_name(const struct device *dev)
> > +{
> > + return dev_name(dev);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_dev_name);
>
> Again, why? What is going to use this?
>
> And I don't really understand the rules you are putting on the name
> string after calling this, more below:
>
> > +
> > /*
> > * We use `bindgen`'s `--size_t-is-usize` option to bind the C `size_t` type
> > * as the Rust `usize` type, so we can use it in contexts where Rust
> > diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
> > index 9be021e393ca..e57da622d817 100644
> > --- a/rust/kernel/device.rs
> > +++ b/rust/kernel/device.rs
> > @@ -4,7 +4,7 @@
> > //!
> > //! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
> >
> > -use crate::bindings;
> > +use crate::{bindings, str::CStr};
> >
> > /// A raw device.
> > ///
> > @@ -20,4 +20,78 @@ use crate::bindings;
> > pub unsafe trait RawDevice {
> > /// Returns the raw `struct device` related to `self`.
> > fn raw_device(&self) -> *mut bindings::device;
> > +
> > + /// Returns the name of the device.
> > + fn name(&self) -> &CStr {
> > + let ptr = self.raw_device();
> > +
> > + // SAFETY: `ptr` is valid because `self` keeps it alive.
> > + let name = unsafe { bindings::dev_name(ptr) };
> > +
> > + // SAFETY: The name of the device remains valid while it is alive (because the device is
> > + // never renamed, per the safety requirement of this trait). This is guaranteed to be the
> > + // case because the reference to `self` outlives the one of the returned `CStr` (enforced
> > + // by the compiler because of their lifetimes).
> > + unsafe { CStr::from_char_ptr(name) }
>
> Why can the device never be renamed? Devices are renamed all the time,
> sometimes when you least expect it (i.e. by userspace). So how is this
> considered "safe"? and actually correct?
>
> Again, maybe seeing a real user of this might make more sense, but
> as-is, this feels wrong and not needed at all.

This requirement is to allow callers to use the string without having
to make a copy of it.

If subsystems/buses are not following what the C documentation says,
as you point out in another thread, we have a several options: (a)
remove access to names altogether, (b) leave things as they are, then
those subsystems wouldn't be able to honour the safety requirements of
this trait therefore they wouldn't implement it, (c) make a copy of
the string, etc.

Since we don't have immediate plans to upstream the pl061 driver, I
think we should just remove the name functionality for now and revisit
it later if the need arises.

>
> > + }
> > +}
> > +
> > +/// A ref-counted device.
> > +///
> > +/// # Invariants
> > +///
> > +/// `ptr` is valid, non-null, and has a non-zero reference count. One of the references is owned by
> > +/// `self`, and will be decremented when `self` is dropped.
> > +pub struct Device {
> > + pub(crate) ptr: *mut bindings::device,
> > +}
> > +
> > +// SAFETY: `Device` only holds a pointer to a C device, which is safe to be used from any thread.
> > +unsafe impl Send for Device {}
> > +
> > +// SAFETY: `Device` only holds a pointer to a C device, references to which are safe to be used
> > +// from any thread.
> > +unsafe impl Sync for Device {}
> > +
> > +impl Device {
> > + /// Creates a new device instance.
> > + ///
> > + /// # Safety
> > + ///
> > + /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> > + pub unsafe fn new(ptr: *mut bindings::device) -> Self {
> > + // SAFETY: By the safety requirements, ptr is valid and its refcounted will be incremented.
> > + unsafe { bindings::get_device(ptr) };
>
> You don't check the return value of get_device()? What if it failed
> (hint, it can)?

I think this was already discussed, but anyway: the caller owns a
reference to this device, so `get_device` is safe to call and cannot
fail.

Note that `get_device` calls `kobject_get`, which doesn't fail. I
suppose you're hinting at `kobject_get_unless_zero`, which of course
can fail but isn't in use here.

>
> > + // INVARIANT: The safety requirements satisfy all but one invariant, which is that `self`
> > + // owns a reference. This is satisfied by the call to `get_device` above.
> > + Self { ptr }
> > + }
> > +
> > + /// Creates a new device instance from an existing [`RawDevice`] instance.
> > + pub fn from_dev(dev: &dyn RawDevice) -> Self {
>
> I am a rust newbie, but I don't understand this "RawDevice" here at all.

Different buses will have their own Rust "Device" type, for example,
pci::Device, amba::Device, platform::Device that wrap their C
counterparts pci_dev, amba_device, platform_device.

"RawDevice" is a trait for functionality that is common to all
devices. It exposes the "struct device" of each bus/subsystem so that
functions that work on any "struct device", for example, `clk_get`,
`pr_info`. will automatically work on all subsystems.

For example, as part writing Rust abstractions for a platform devices,
we have a platform::Device type, which is wrapper around `struct
platform_device`. It has a bunch of associated functions that do
things that are specific to the platform bus. But then they also
implement the `RawDevice` trait (by implementing `raw_device` that
returns &pdev->dev), which allows drivers to call `clk_get` and the
printing functions directly.

Let's say `pdev` is a platform device; if we wanted to call `clk_get`
in C, we'd do something like:

clk = clk_get(&pdev->dev, NULL);

In Rust, we'd do:

clk = pdev.clk_get(None);

(Note that we didn't have to know that pdev had a field whose type is
a `struct device` that we could use to call clk_get on; `RawDevice`
encoded this information.)

Does the intent of the abstraction make sense to you now?

> > + // SAFETY: The requirements are satisfied by the existence of `RawDevice` and its safety
> > + // requirements.
> > + unsafe { Self::new(dev.raw_device()) }
> > + }
> > +}
> > +
> > +// SAFETY: The device returned by `raw_device` is the one for which we hold a reference.
> > +unsafe impl RawDevice for Device {
> > + fn raw_device(&self) -> *mut bindings::device {
> > + self.ptr
> > + }
>
> What does this raw device do? What is the relationship to a "real"
> device? Maybe it's just my lack of rust knowledge here showing, so any
> hints would be appreciated.
>

This is the function that returns the `struct device` field embedded
in the different devices for each bus.

> > +}
> > +
> > +impl Drop for Device {
> > + fn drop(&mut self) {
> > + // SAFETY: By the type invariants, we know that `self` owns a reference, so it is safe to
> > + // relinquish it now.
> > + unsafe { bindings::put_device(self.ptr) };
>
> So it is now ok for this to be freed?
>
> One meta-comment, why is any of this needed at all? No driver should
> ever be dealing with a "raw" struct device pointer at all, right? They
> should be calling into subsystems that give it a pointer to a
> bus-specific pointer type (gpio, usb, pci, etc.)
>
> So I'm thinking that adding support for "raw" struct device pointers
> feels ripe for abuse in a "the code should not be doing that" type of
> thing.
>
> Unless you are writing a new bus/class in rust?

Drivers will not use this directly. They'll benefit from it as I
described above to be able to call bus-agnostic functions.

Implementers of bus abstractions will use this to say "my device also
has a `struct device` in it, and I want it to benefit from all
bus-agnostic device manipulation functions".

Implementers of abstractions of bus-agnostic device functions will add
their abstractions to `RawDevice` instead of bus-specific types so
that they work with all devices.

Cheers,
-Wedson

2023-03-05 06:52:25

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On Fri, 24 Feb 2023 at 13:23, Asahi Lina <[email protected]> wrote:
>
> On 25/02/2023 00.14, Robin Murphy wrote:
> > On 2023-02-24 14:48, Asahi Lina wrote:
> >>
> >>
> >> On 2023/02/24 23:32, Robin Murphy wrote:
> >>> FWIW the DMA API *has* to know which specific device it's operating
> >>> with, since the relevant properties can and do vary even between
> >>> different devices within a single bus_type (e.g. DMA masks).
> >>>
> >>> In the case of io-pgtable at least, there's no explicit refcounting
> >>> since the struct device must be the one representing the physical
> >>> platform/PCI/etc. device consuming the pagetable, so if that were to
> >>> disappear from underneath its driver while the pagetable is still in
> >>> use, things would already have gone very very wrong indeed :)
> >>
> >> There's no terribly good way to encode this relationship in safe Rust as
> >> far as I know. So although it might be "obvious" (and I think my driver
> >> can never violate it as it is currently designed), this means the Rust
> >> abstraction will have to take the device reference if the C side does
> >> not, because safe rust abstractions have to actually make these bugs
> >> impossible and nothing stops a Rust driver from, say, stashing an
> >> io_pgtable reference into a global and letting the device go away.
> >
> > If someone did that, then simply holding a struct device reference
> > wouldn't guarantee much, since it only prevents the pointer itself from
> > becoming invalid - it still doesn't say any of the data *in* the
> > structure is still valid and "safe" for what a DMA API call might do
> > with it.
> >
> > At the very least you'd probably have to somehow also guarantee that the
> > device has a driver bound (which is the closest thing to a general
> > indication of valid DMA ops across all architectures) and block it from
> > unbinding for the lifetime of the reference, but that would then mean a
> > simple driver which expects to tear down its io-pgtable from its .remove
> > callback could never be unbound due to the circular dependency :/
>
> I'd like to hear from the other Rust folks about ideas on how to solve
> this ^^. I think it might fit into the Revocable model for device
> resources, but it's going to be a bit of an issue for my driver because
> for me, the io_pgtable objects are deep in the Vm object that has its
> own refcounting, and those are created and destroyed dynamically at
> runtime, not just during device probe...

Our bus abstractions all require that driver data stored in devices
implement the `DeviceRemoval` trait, which has a single function:
device_remove. The idea is that this function will do something to
relinquish access to resources that shouldn't be accessed after the
device is unbound (device_remove is called, for whatever reason). This
mechanism is generic and allows us to break circular dependencies.

One way to implement the `DeviceRemoval` trait is to hold all
resources in "revocable" wrappers (of which we have several that have
different constraints/requirements), and then revoke access when
`device_remove` is called (this results in blocking while concurrent
users are running, then running "destructors" when they're all done
and before returning). The part that isn't great about this approach
is that we introduce error paths everywhere these resources are used
(because they may have been revoked).

2023-03-09 11:27:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Sun, Mar 05, 2023 at 03:39:25AM -0300, Wedson Almeida Filho wrote:
> > > + /// Returns the name of the device.
> > > + fn name(&self) -> &CStr {
> > > + let ptr = self.raw_device();
> > > +
> > > + // SAFETY: `ptr` is valid because `self` keeps it alive.
> > > + let name = unsafe { bindings::dev_name(ptr) };
> > > +
> > > + // SAFETY: The name of the device remains valid while it is alive (because the device is
> > > + // never renamed, per the safety requirement of this trait). This is guaranteed to be the
> > > + // case because the reference to `self` outlives the one of the returned `CStr` (enforced
> > > + // by the compiler because of their lifetimes).
> > > + unsafe { CStr::from_char_ptr(name) }
> >
> > Why can the device never be renamed? Devices are renamed all the time,
> > sometimes when you least expect it (i.e. by userspace). So how is this
> > considered "safe"? and actually correct?
> >
> > Again, maybe seeing a real user of this might make more sense, but
> > as-is, this feels wrong and not needed at all.
>
> This requirement is to allow callers to use the string without having
> to make a copy of it.
>
> If subsystems/buses are not following what the C documentation says,
> as you point out in another thread, we have a several options: (a)
> remove access to names altogether, (b) leave things as they are, then
> those subsystems wouldn't be able to honour the safety requirements of
> this trait therefore they wouldn't implement it, (c) make a copy of
> the string, etc.

How about we fix the documentation in the .c code and also drop this as
you really don't need it now.

Want to send a patch for the driver core code fix?

> > > + // owns a reference. This is satisfied by the call to `get_device` above.
> > > + Self { ptr }
> > > + }
> > > +
> > > + /// Creates a new device instance from an existing [`RawDevice`] instance.
> > > + pub fn from_dev(dev: &dyn RawDevice) -> Self {
> >
> > I am a rust newbie, but I don't understand this "RawDevice" here at all.
>
> Different buses will have their own Rust "Device" type, for example,
> pci::Device, amba::Device, platform::Device that wrap their C
> counterparts pci_dev, amba_device, platform_device.
>
> "RawDevice" is a trait for functionality that is common to all
> devices. It exposes the "struct device" of each bus/subsystem so that
> functions that work on any "struct device", for example, `clk_get`,
> `pr_info`. will automatically work on all subsystems.

Why is this being called "Raw" then? Why not just "Device" to follow
along with the naming scheme that the kernel already uses?

> For example, as part writing Rust abstractions for a platform devices,
> we have a platform::Device type, which is wrapper around `struct
> platform_device`. It has a bunch of associated functions that do
> things that are specific to the platform bus. But then they also
> implement the `RawDevice` trait (by implementing `raw_device` that
> returns &pdev->dev), which allows drivers to call `clk_get` and the
> printing functions directly.
>
> Let's say `pdev` is a platform device; if we wanted to call `clk_get`
> in C, we'd do something like:
>
> clk = clk_get(&pdev->dev, NULL);
>
> In Rust, we'd do:
>
> clk = pdev.clk_get(None);
>
> (Note that we didn't have to know that pdev had a field whose type is
> a `struct device` that we could use to call clk_get on; `RawDevice`
> encoded this information.)
>
> Does the intent of the abstraction make sense to you now?

A bit more, yes. But I want to see some real users before agreeing that
it is sane :)

thanks,

greg k-h

2023-03-09 16:55:49

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Thu, 9 Mar 2023 at 08:24, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Sun, Mar 05, 2023 at 03:39:25AM -0300, Wedson Almeida Filho wrote:
> > > > + /// Returns the name of the device.
> > > > + fn name(&self) -> &CStr {
> > > > + let ptr = self.raw_device();
> > > > +
> > > > + // SAFETY: `ptr` is valid because `self` keeps it alive.
> > > > + let name = unsafe { bindings::dev_name(ptr) };
> > > > +
> > > > + // SAFETY: The name of the device remains valid while it is alive (because the device is
> > > > + // never renamed, per the safety requirement of this trait). This is guaranteed to be the
> > > > + // case because the reference to `self` outlives the one of the returned `CStr` (enforced
> > > > + // by the compiler because of their lifetimes).
> > > > + unsafe { CStr::from_char_ptr(name) }
> > >
> > > Why can the device never be renamed? Devices are renamed all the time,
> > > sometimes when you least expect it (i.e. by userspace). So how is this
> > > considered "safe"? and actually correct?
> > >
> > > Again, maybe seeing a real user of this might make more sense, but
> > > as-is, this feels wrong and not needed at all.
> >
> > This requirement is to allow callers to use the string without having
> > to make a copy of it.
> >
> > If subsystems/buses are not following what the C documentation says,
> > as you point out in another thread, we have a several options: (a)
> > remove access to names altogether, (b) leave things as they are, then
> > those subsystems wouldn't be able to honour the safety requirements of
> > this trait therefore they wouldn't implement it, (c) make a copy of
> > the string, etc.
>
> How about we fix the documentation in the .c code and also drop this as
> you really don't need it now.
>
> Want to send a patch for the driver core code fix?

Sure, will do.

> > > > + // owns a reference. This is satisfied by the call to `get_device` above.
> > > > + Self { ptr }
> > > > + }
> > > > +
> > > > + /// Creates a new device instance from an existing [`RawDevice`] instance.
> > > > + pub fn from_dev(dev: &dyn RawDevice) -> Self {
> > >
> > > I am a rust newbie, but I don't understand this "RawDevice" here at all.
> >
> > Different buses will have their own Rust "Device" type, for example,
> > pci::Device, amba::Device, platform::Device that wrap their C
> > counterparts pci_dev, amba_device, platform_device.
> >
> > "RawDevice" is a trait for functionality that is common to all
> > devices. It exposes the "struct device" of each bus/subsystem so that
> > functions that work on any "struct device", for example, `clk_get`,
> > `pr_info`. will automatically work on all subsystems.
>
> Why is this being called "Raw" then? Why not just "Device" to follow
> along with the naming scheme that the kernel already uses?

Because it gives us access to underlying raw `struct device` pointer,
in Rust raw pointers are those unsafe `*mut T` or `*const T`. I'm not
married to the name though, we should probably look for a better one
if this one is confusing.

Just "Device" is already taken. It's a ref-counted `struct device` (it
calls get_device/put_device in the right places automatically,
guarantees no dandling pointers); it is meant to be used by code that
needs to hold on to devices when they don't care about the bus. (It in
fact implements `RawDevice`.)

How about `IsDevice`?

Then, for example, the platform bus would implement `IsDevice` for
`plaform::Device`.

> > For example, as part writing Rust abstractions for a platform devices,
> > we have a platform::Device type, which is wrapper around `struct
> > platform_device`. It has a bunch of associated functions that do
> > things that are specific to the platform bus. But then they also
> > implement the `RawDevice` trait (by implementing `raw_device` that
> > returns &pdev->dev), which allows drivers to call `clk_get` and the
> > printing functions directly.
> >
> > Let's say `pdev` is a platform device; if we wanted to call `clk_get`
> > in C, we'd do something like:
> >
> > clk = clk_get(&pdev->dev, NULL);
> >
> > In Rust, we'd do:
> >
> > clk = pdev.clk_get(None);
> >
> > (Note that we didn't have to know that pdev had a field whose type is
> > a `struct device` that we could use to call clk_get on; `RawDevice`
> > encoded this information.)
> >
> > Does the intent of the abstraction make sense to you now?
>
> A bit more, yes. But I want to see some real users before agreeing that
> it is sane :)

Fair enough.

Cheers,
-Wedson

2023-03-09 17:14:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Thu, Mar 09, 2023 at 01:46:39PM -0300, Wedson Almeida Filho wrote:
> On Thu, 9 Mar 2023 at 08:24, Greg Kroah-Hartman
> <[email protected]> wrote:
> > > > > + // owns a reference. This is satisfied by the call to `get_device` above.
> > > > > + Self { ptr }
> > > > > + }
> > > > > +
> > > > > + /// Creates a new device instance from an existing [`RawDevice`] instance.
> > > > > + pub fn from_dev(dev: &dyn RawDevice) -> Self {
> > > >
> > > > I am a rust newbie, but I don't understand this "RawDevice" here at all.
> > >
> > > Different buses will have their own Rust "Device" type, for example,
> > > pci::Device, amba::Device, platform::Device that wrap their C
> > > counterparts pci_dev, amba_device, platform_device.
> > >
> > > "RawDevice" is a trait for functionality that is common to all
> > > devices. It exposes the "struct device" of each bus/subsystem so that
> > > functions that work on any "struct device", for example, `clk_get`,
> > > `pr_info`. will automatically work on all subsystems.
> >
> > Why is this being called "Raw" then? Why not just "Device" to follow
> > along with the naming scheme that the kernel already uses?
>
> Because it gives us access to underlying raw `struct device` pointer,
> in Rust raw pointers are those unsafe `*mut T` or `*const T`. I'm not
> married to the name though, we should probably look for a better one
> if this one is confusing.
>
> Just "Device" is already taken. It's a ref-counted `struct device` (it
> calls get_device/put_device in the right places automatically,
> guarantees no dandling pointers); it is meant to be used by code that
> needs to hold on to devices when they don't care about the bus. (It in
> fact implements `RawDevice`.)

I don't understand, why do you need both of these? Why can't one just
do? Why would you need one without the other? I would think that
"Device" and "RawDevice" here would be the same thing, that is a way to
refer to a "larger" underlying struct device memory chunk in a way that
can be passed around without knowing, or caring, what the "real" device
type is.

> How about `IsDevice`?

That sounds like a question, and would return a boolean, not a structure :)

> Then, for example, the platform bus would implement `IsDevice` for
> `plaform::Device`.

I don't really understand that, sorry.

thanks,

greg k-h

2023-03-09 19:06:23

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Thu, 9 Mar 2023 at 14:11, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Thu, Mar 09, 2023 at 01:46:39PM -0300, Wedson Almeida Filho wrote:
> > On Thu, 9 Mar 2023 at 08:24, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > > > > > + // owns a reference. This is satisfied by the call to `get_device` above.
> > > > > > + Self { ptr }
> > > > > > + }
> > > > > > +
> > > > > > + /// Creates a new device instance from an existing [`RawDevice`] instance.
> > > > > > + pub fn from_dev(dev: &dyn RawDevice) -> Self {
> > > > >
> > > > > I am a rust newbie, but I don't understand this "RawDevice" here at all.
> > > >
> > > > Different buses will have their own Rust "Device" type, for example,
> > > > pci::Device, amba::Device, platform::Device that wrap their C
> > > > counterparts pci_dev, amba_device, platform_device.
> > > >
> > > > "RawDevice" is a trait for functionality that is common to all
> > > > devices. It exposes the "struct device" of each bus/subsystem so that
> > > > functions that work on any "struct device", for example, `clk_get`,
> > > > `pr_info`. will automatically work on all subsystems.
> > >
> > > Why is this being called "Raw" then? Why not just "Device" to follow
> > > along with the naming scheme that the kernel already uses?
> >
> > Because it gives us access to underlying raw `struct device` pointer,
> > in Rust raw pointers are those unsafe `*mut T` or `*const T`. I'm not
> > married to the name though, we should probably look for a better one
> > if this one is confusing.
> >
> > Just "Device" is already taken. It's a ref-counted `struct device` (it
> > calls get_device/put_device in the right places automatically,
> > guarantees no dandling pointers); it is meant to be used by code that
> > needs to hold on to devices when they don't care about the bus. (It in
> > fact implements `RawDevice`.)
>
> I don't understand, why do you need both of these? Why can't one just
> do? Why would you need one without the other? I would think that
> "Device" and "RawDevice" here would be the same thing, that is a way to
> refer to a "larger" underlying struct device memory chunk in a way that
> can be passed around without knowing, or caring, what the "real" device
> type is.

`Device` is a struct, it is the Rust abstraction for C's `struct device`.

Let's use the platform bus as our running example: we have
`platform::Device` as the Rust abstraction for C's `struct
platform_device`.

Let's use `clk_get`as our running example of a function that takes a
`struct device` as argument.

If we have a platform device, we can't just call `clk_get` because the
types don't match. In C, we access the `dev` field of `struct
platform_device` before we call `clk_get` (i.e., we call
clk_get(&pdev->dev, ...)), but in Rust we don't want to make the
fields of `platform::Device` public, especially because they're fields
of a C struct. So as part of `platform::Device` we'd have to implement
something like:

impl platform::Device {
fn get_device(&self) -> &Device {
...
}
}

Then calling `clk_get` would be something like:

pdev.get_device().clk_get(...)

The problem is that `clk_get` doesn't know that `platform::Device` is
a device, that's why we need this `get_device()` call on each bus
abstraction of a device, plus on each call to bus-agnostic device
functions.

Since we're implementing this "adapter" function anyway, we may as
well put in a _trait_ and improve how people use it. We say: every
struct that is a device should implement the `RawDevice` (or
`IsDevice`, or whatever we decide to call it) trait. Then
`platform::Device` would still have to implement something like:

impl RawDevice for platform::Device {
fn get_device(&self) -> &Device {
...
}
}

(Note that we went from `impl X` to `impl RawDevice for X`.)

With this trait, users can call `clk_get` on a platform device as follows:

pdev.clk_get(...)

So we improve the experience of driver developers by having bus
abstraction developers add "RawDevice for" to their impl block of the
`get_device` function for their device.

> > How about `IsDevice`?
>
> That sounds like a question, and would return a boolean, not a structure :)

`RawDevice` is not a struct, it's a trait that is implemented by
bus-specific structs.

> > Then, for example, the platform bus would implement `IsDevice` for
> > `plaform::Device`.
>
> I don't really understand that, sorry.

I tried to explain it above. I think the source of the problem is the
distinction between `struct` and `trait`, the latter of course doesn't
exist in C.

Cheers,
-Wedson

2023-03-09 19:43:47

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Thu, Mar 9, 2023 at 6:11 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> I don't understand, why do you need both of these? Why can't one just
> do? Why would you need one without the other? I would think that
> "Device" and "RawDevice" here would be the same thing, that is a way to
> refer to a "larger" underlying struct device memory chunk in a way that
> can be passed around without knowing, or caring, what the "real" device
> type is.
>
> That sounds like a question, and would return a boolean, not a structure :)
>
> I don't really understand that, sorry.

To complement what Wedson wrote, perhaps this helps a bit, since there
are a few `*Device` things around (I will prefix things with the
module they are in for clarity):

- `device::RawDevice` is a trait, not a struct. This means it is
just an interface/property/tag/feature. Then types (like structs) can
say "I support/provide/implement this trait". In this case, it
requires those types implementing `device::RawDevice` to provide a
suitable `raw_device()` method.

- Then, you have `device::Device`. This is an actual struct,
containing a `struct device *`. It implements `device::RawDevice`, by
returning the pointer.

- Then there are others like `platform::Device` or `amba::Device`.
They are also structs, containing e.g. `struct platform_device *` or
anything they may need. They implement `device::RawDevice` too if it
makes sense, say, by returning a pointer to the `->dev` in their C
struct.

So you have two different kinds of entities here. One is the trait,
and the others are structs that promise to implement that trait
correctly, regardless of how each struct manages to do it.

The trait can also then provide extra default functionality for those
that have implemented it. For instance, the `clk_get()` method Wedson
mentioned can be defined in the trait, and in its body we can call the
`raw_device()` to do so. `raw_device()` is available because we are
inside the trait, even if each struct implementing the trait provides
a different implementation.

Traits are sometimes named after the "main" method they contain, e.g.
`ToString` may be a trait with a `to_string()` method. Here
`device::RawDevice` has a `raw_device()` method, so it makes sense in
that way.

Wedson was suggesting `device::IsDevice` as an alternative, because
`device::Device` is taken for the ref-counted device. Of course, other
arrangements of names could be possible.

`IsDevice` uses Pascal case, so by convention it would be fairly clear
that it would not be a function returning a boolean. But one may
expect that it is a trait that implements a `is_device()` method,
though. (And if that was the case, which it isn't, one could then
indeed expect that such a method would return the boolean you
expected).

Cheers,
Miguel

2023-03-13 17:03:59

by Gary Guo

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Thu, 9 Mar 2023 16:06:06 -0300
Wedson Almeida Filho <[email protected]> wrote:

> On Thu, 9 Mar 2023 at 14:11, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Thu, Mar 09, 2023 at 01:46:39PM -0300, Wedson Almeida Filho wrote:
> > > On Thu, 9 Mar 2023 at 08:24, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > > > > > + // owns a reference. This is satisfied by the call to `get_device` above.
> > > > > > > + Self { ptr }
> > > > > > > + }
> > > > > > > +
> > > > > > > + /// Creates a new device instance from an existing [`RawDevice`] instance.
> > > > > > > + pub fn from_dev(dev: &dyn RawDevice) -> Self {
> > > > > >
> > > > > > I am a rust newbie, but I don't understand this "RawDevice" here at all.
> > > > >
> > > > > Different buses will have their own Rust "Device" type, for example,
> > > > > pci::Device, amba::Device, platform::Device that wrap their C
> > > > > counterparts pci_dev, amba_device, platform_device.
> > > > >
> > > > > "RawDevice" is a trait for functionality that is common to all
> > > > > devices. It exposes the "struct device" of each bus/subsystem so that
> > > > > functions that work on any "struct device", for example, `clk_get`,
> > > > > `pr_info`. will automatically work on all subsystems.
> > > >
> > > > Why is this being called "Raw" then? Why not just "Device" to follow
> > > > along with the naming scheme that the kernel already uses?
> > >
> > > Because it gives us access to underlying raw `struct device` pointer,
> > > in Rust raw pointers are those unsafe `*mut T` or `*const T`. I'm not
> > > married to the name though, we should probably look for a better one
> > > if this one is confusing.
> > >
> > > Just "Device" is already taken. It's a ref-counted `struct device` (it
> > > calls get_device/put_device in the right places automatically,
> > > guarantees no dandling pointers); it is meant to be used by code that
> > > needs to hold on to devices when they don't care about the bus. (It in
> > > fact implements `RawDevice`.)
> >
> > I don't understand, why do you need both of these? Why can't one just
> > do? Why would you need one without the other? I would think that
> > "Device" and "RawDevice" here would be the same thing, that is a way to
> > refer to a "larger" underlying struct device memory chunk in a way that
> > can be passed around without knowing, or caring, what the "real" device
> > type is.
>
> `Device` is a struct, it is the Rust abstraction for C's `struct device`.
>
> Let's use the platform bus as our running example: we have
> `platform::Device` as the Rust abstraction for C's `struct
> platform_device`.
>
> Let's use `clk_get`as our running example of a function that takes a
> `struct device` as argument.
>
> If we have a platform device, we can't just call `clk_get` because the
> types don't match. In C, we access the `dev` field of `struct
> platform_device` before we call `clk_get` (i.e., we call
> clk_get(&pdev->dev, ...)), but in Rust we don't want to make the
> fields of `platform::Device` public, especially because they're fields
> of a C struct. So as part of `platform::Device` we'd have to implement
> something like:
>
> impl platform::Device {
> fn get_device(&self) -> &Device {
> ...
> }
> }
>
> Then calling `clk_get` would be something like:
>
> pdev.get_device().clk_get(...)

Just thinking out loud here, would it work if `platform::Device`
implements `Deref<Target = Device>`?

Best,
Gary

2023-03-13 17:52:43

by Gary Guo

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Fri, 24 Feb 2023 19:53:17 +0900
Asahi Lina <[email protected]> wrote:

> From: Wedson Almeida Filho <[email protected]>
>
> Add a Device type which represents an owned reference to a generic
> struct device. This minimal implementation just handles reference
> counting and allows the user to get the device name.
>
> Lina: Rewrote commit message, dropped the Amba bits, and squashed in
> simple changes to the core Device code from latter commits in
> rust-for-linux/rust. Also include the rust_helper_dev_get_drvdata
> helper which will be needed by consumers later on anyway.
>
> Co-developed-by: Miguel Ojeda <[email protected]>
> Signed-off-by: Miguel Ojeda <[email protected]>
> Signed-off-by: Wedson Almeida Filho <[email protected]>
> Signed-off-by: Asahi Lina <[email protected]>
> ---
> rust/helpers.c | 13 +++++++++
> rust/kernel/device.rs | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/rust/helpers.c b/rust/helpers.c
> index 04b9be46e887..54954fd80c77 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -20,6 +20,7 @@
>
> #include <linux/bug.h>
> #include <linux/build_bug.h>
> +#include <linux/device.h>
> #include <linux/err.h>
> #include <linux/refcount.h>
>
> @@ -65,6 +66,18 @@ long rust_helper_PTR_ERR(__force const void *ptr)
> }
> EXPORT_SYMBOL_GPL(rust_helper_PTR_ERR);
>
> +void *rust_helper_dev_get_drvdata(struct device *dev)
> +{
> + return dev_get_drvdata(dev);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_dev_get_drvdata);
> +
> +const char *rust_helper_dev_name(const struct device *dev)
> +{
> + return dev_name(dev);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_dev_name);
> +
> /*
> * We use `bindgen`'s `--size_t-is-usize` option to bind the C `size_t` type
> * as the Rust `usize` type, so we can use it in contexts where Rust
> diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
> index 9be021e393ca..e57da622d817 100644
> --- a/rust/kernel/device.rs
> +++ b/rust/kernel/device.rs
> @@ -4,7 +4,7 @@
> //!
> //! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
>
> -use crate::bindings;
> +use crate::{bindings, str::CStr};
>
> /// A raw device.
> ///
> @@ -20,4 +20,78 @@ use crate::bindings;
> pub unsafe trait RawDevice {
> /// Returns the raw `struct device` related to `self`.
> fn raw_device(&self) -> *mut bindings::device;
> +
> + /// Returns the name of the device.
> + fn name(&self) -> &CStr {
> + let ptr = self.raw_device();
> +
> + // SAFETY: `ptr` is valid because `self` keeps it alive.
> + let name = unsafe { bindings::dev_name(ptr) };
> +
> + // SAFETY: The name of the device remains valid while it is alive (because the device is
> + // never renamed, per the safety requirement of this trait). This is guaranteed to be the
> + // case because the reference to `self` outlives the one of the returned `CStr` (enforced
> + // by the compiler because of their lifetimes).
> + unsafe { CStr::from_char_ptr(name) }
> + }
> +}
> +
> +/// A ref-counted device.
> +///
> +/// # Invariants
> +///
> +/// `ptr` is valid, non-null, and has a non-zero reference count. One of the references is owned by
> +/// `self`, and will be decremented when `self` is dropped.
> +pub struct Device {
> + pub(crate) ptr: *mut bindings::device,
> +}
> +

Shouldn't this be

#[repr(transparent)]
pub struct Device(Opaque<bindings::device>);

?

Best,
Gary

2023-03-13 18:06:27

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH 5/5] rust: device: Add a stub abstraction for devices

On Mon, Mar 13, 2023 at 05:52:02PM +0000, Gary Guo wrote:
[...]
> > diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
> > index 9be021e393ca..e57da622d817 100644
> > --- a/rust/kernel/device.rs
> > +++ b/rust/kernel/device.rs
> > @@ -4,7 +4,7 @@
> > //!
> > //! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
> >
> > -use crate::bindings;
> > +use crate::{bindings, str::CStr};
> >
> > /// A raw device.
> > ///
> > @@ -20,4 +20,78 @@ use crate::bindings;
> > pub unsafe trait RawDevice {
> > /// Returns the raw `struct device` related to `self`.
> > fn raw_device(&self) -> *mut bindings::device;
> > +
> > + /// Returns the name of the device.
> > + fn name(&self) -> &CStr {
> > + let ptr = self.raw_device();
> > +
> > + // SAFETY: `ptr` is valid because `self` keeps it alive.
> > + let name = unsafe { bindings::dev_name(ptr) };
> > +
> > + // SAFETY: The name of the device remains valid while it is alive (because the device is
> > + // never renamed, per the safety requirement of this trait). This is guaranteed to be the
> > + // case because the reference to `self` outlives the one of the returned `CStr` (enforced
> > + // by the compiler because of their lifetimes).
> > + unsafe { CStr::from_char_ptr(name) }
> > + }
> > +}
> > +
> > +/// A ref-counted device.
> > +///
> > +/// # Invariants
> > +///
> > +/// `ptr` is valid, non-null, and has a non-zero reference count. One of the references is owned by
> > +/// `self`, and will be decremented when `self` is dropped.
> > +pub struct Device {
> > + pub(crate) ptr: *mut bindings::device,
> > +}
> > +
>
> Shouldn't this be
>
> #[repr(transparent)]
> pub struct Device(Opaque<bindings::device>);
>
> ?

I have the same feeling, for `task_struct`, we have

#[repr(transparent)]
pub struct Task(pub(crate) UnsafeCell<bindings::task_struct>);

and

pub struct TaskRef<'a> {
task: &'a Task,
_not_send: PhantomData<*mut ()>,
}

I wonder whether we should do the similar for `Device`, meaning `Device`
is just a tranparent representation for `struct device` and another
type (say, `DeviceRef`) is introduced as the reference type, or we can
just use `ARef`.

Regards,
Boqun



>
> Best,
> Gary