Add an (always) reference-counted abstraction for a generic C `struct
device`. This abstraction encapsulates existing `struct device` instances
and manages its reference count.
Subsystems may use this abstraction as a base to abstract subsystem
specific device instances based on a generic `struct device`, such as
`struct pci_dev`.
Co-developed-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
---
rust/helpers.c | 1 +
rust/kernel/device.rs | 97 +++++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 1 +
3 files changed, 99 insertions(+)
create mode 100644 rust/kernel/device.rs
diff --git a/rust/helpers.c b/rust/helpers.c
index 2c37a0f5d7a8..0e02b2c64c72 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -23,6 +23,7 @@
#include <kunit/test-bug.h>
#include <linux/bug.h>
#include <linux/build_bug.h>
+#include <linux/device.h>
#include <linux/err.h>
#include <linux/errname.h>
#include <linux/mutex.h>
diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
new file mode 100644
index 000000000000..6e22b0b97928
--- /dev/null
+++ b/rust/kernel/device.rs
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Generic devices that are part of the kernel's driver model.
+//!
+//! C header: [`include/linux/device.h`](srctree/include/linux/device.h)
+
+use crate::{
+ bindings,
+ types::{ARef, Opaque},
+};
+use core::ptr;
+
+/// A reference-counted device.
+///
+/// This structure represents the Rust abstraction for a C `struct device`. This implementation
+/// abstracts the usage of an already existing C `struct device` within Rust code that we get
+/// passed from the C side.
+///
+/// An instance of this abstraction can be obtained temporarily or permanent.
+///
+/// A temporary one is bound to the lifetime of the C `struct device` pointer used for creation.
+/// A permanent instance is always reference-counted and hence not restricted by any lifetime
+/// boundaries.
+///
+/// For subsystems it is recommended to create a permanent instance to wrap into a subsystem
+/// specifc device structure (e.g. `pci::Device`). This is useful for passing it to drivers in
+/// `T::probe()`, such that a driver can store the `ARef<Device>` (equivalent to storing a
+/// `struct device` pointer in a C driver) for arbitrary purposes, e.g. allocating DMA coherent
+/// memory.
+///
+/// # Invariants
+///
+/// The pointer stored in `Self` is non-null and valid for the lifetime of the `ARef` instance. In
+/// particular, the `ARef` instance owns an increment on the underlying object’s reference count.
+#[repr(transparent)]
+pub struct Device(Opaque<bindings::device>);
+
+impl Device {
+ /// Creates a new reference-counted abstraction instance of an existing `struct device` pointer.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count,
+ /// i.e. it must be ensured that the reference count of the C `struct device` `ptr` points to
+ /// can't drop to zero, for the duration of this function call.
+ pub unsafe fn from_raw(ptr: *mut bindings::device) -> ARef<Self> {
+ // SAFETY: By the safety requirements, ptr is valid.
+ // Initially increase the reference count by one to compensate for the final decrement once
+ // this newly created `ARef<Device>` instance is dropped.
+ unsafe { bindings::get_device(ptr) };
+
+ // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::device`.
+ let ptr = ptr.cast::<Self>();
+
+ // SAFETY: By the safety requirements, ptr is valid.
+ unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr)) }
+ }
+
+ /// Obtain the raw `struct device *`.
+ pub(crate) fn as_raw(&self) -> *mut bindings::device {
+ self.0.get()
+ }
+
+ /// Convert a raw C `struct device` pointer to a `&'a Device`.
+ ///
+ /// # Safety
+ ///
+ /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count,
+ /// i.e. it must be ensured that the reference count of the C `struct device` `ptr` points to
+ /// can't drop to zero, for the duration of this function call and the entire duration when the
+ /// returned reference exists.
+ pub unsafe fn as_ref<'a>(ptr: *mut bindings::device) -> &'a Self {
+ // SAFETY: Guaranteed by the safety requirements of the function.
+ unsafe { &*ptr.cast() }
+ }
+}
+
+// SAFETY: Instances of `Device` are always reference-counted.
+unsafe impl crate::types::AlwaysRefCounted for Device {
+ fn inc_ref(&self) {
+ // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
+ unsafe { bindings::get_device(self.as_raw()) };
+ }
+
+ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+ // SAFETY: The safety requirements guarantee that the refcount is non-zero.
+ unsafe { bindings::put_device(obj.cast().as_ptr()) }
+ }
+}
+
+// SAFETY: `Device` only holds a pointer to a C `struct device`, which is safe to be used from any
+// thread.
+unsafe impl Send for Device {}
+
+// SAFETY: `Device` only holds a pointer to a C `struct device`, references to which are safe to be
+// used from any thread.
+unsafe impl Sync for Device {}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index fbd91a48ff8b..dd1207f1a873 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -28,6 +28,7 @@
pub mod alloc;
mod build_assert;
+pub mod device;
pub mod error;
pub mod init;
pub mod ioctl;
--
2.45.1
On Mon, Jun 10, 2024 at 08:02:27PM +0200, Danilo Krummrich wrote:
[...]
> +/// A reference-counted device.
> +///
> +/// This structure represents the Rust abstraction for a C `struct device`. This implementation
> +/// abstracts the usage of an already existing C `struct device` within Rust code that we get
> +/// passed from the C side.
> +///
> +/// An instance of this abstraction can be obtained temporarily or permanent.
> +///
> +/// A temporary one is bound to the lifetime of the C `struct device` pointer used for creation.
> +/// A permanent instance is always reference-counted and hence not restricted by any lifetime
> +/// boundaries.
> +///
> +/// For subsystems it is recommended to create a permanent instance to wrap into a subsystem
> +/// specifc device structure (e.g. `pci::Device`). This is useful for passing it to drivers in
> +/// `T::probe()`, such that a driver can store the `ARef<Device>` (equivalent to storing a
> +/// `struct device` pointer in a C driver) for arbitrary purposes, e.g. allocating DMA coherent
> +/// memory.
> +///
> +/// # Invariants
> +///
> +/// The pointer stored in `Self` is non-null and valid for the lifetime of the `ARef` instance. In
> +/// particular, the `ARef` instance owns an increment on the underlying object’s reference count.
> +#[repr(transparent)]
> +pub struct Device(Opaque<bindings::device>);
> +
[...]
> +
> +// SAFETY: `Device` only holds a pointer to a C `struct device`, which is safe to be used from any
> +// thread.
> +unsafe impl Send for Device {}
> +
> +// SAFETY: `Device` only holds a pointer to a C `struct device`, references to which are safe to be
> +// used from any thread.
> +unsafe impl Sync for Device {}
These comments need some rework, `Device` is not a pointer to `struct
device` anymore. For the `Sync` one, how about:
// SAFETY: `Device` can be shared among threads because all immutable
// methods are protected by the synchronization in `struct device`.
unsafe impl Sync for Device {}
and for `Send`, I actually don't think we can easily say the generic
`Device` is `Send`: you can create a `struct device` where `->release`
requires to be run on the same thread that creates the `device`, and
nothing is wrong about it, I think (e.g. making a thread be the sole
owner of some special devices). Unless, in the #Invariants of `Device`,
and the #safety of `from_ptr`, you mention that `Device` assume its
`->release` can be called on any thread.
Regards,
Boqun
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index fbd91a48ff8b..dd1207f1a873 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -28,6 +28,7 @@
>
> pub mod alloc;
> mod build_assert;
> +pub mod device;
> pub mod error;
> pub mod init;
> pub mod ioctl;
> --
> 2.45.1
>
On Mon, Jun 10, 2024 at 11:38:07AM -0700, Boqun Feng wrote:
> On Mon, Jun 10, 2024 at 08:02:27PM +0200, Danilo Krummrich wrote:
> [...]
> > +/// A reference-counted device.
> > +///
> > +/// This structure represents the Rust abstraction for a C `struct device`. This implementation
> > +/// abstracts the usage of an already existing C `struct device` within Rust code that we get
> > +/// passed from the C side.
> > +///
> > +/// An instance of this abstraction can be obtained temporarily or permanent.
> > +///
> > +/// A temporary one is bound to the lifetime of the C `struct device` pointer used for creation.
> > +/// A permanent instance is always reference-counted and hence not restricted by any lifetime
> > +/// boundaries.
> > +///
> > +/// For subsystems it is recommended to create a permanent instance to wrap into a subsystem
> > +/// specifc device structure (e.g. `pci::Device`). This is useful for passing it to drivers in
> > +/// `T::probe()`, such that a driver can store the `ARef<Device>` (equivalent to storing a
> > +/// `struct device` pointer in a C driver) for arbitrary purposes, e.g. allocating DMA coherent
> > +/// memory.
> > +///
> > +/// # Invariants
> > +///
> > +/// The pointer stored in `Self` is non-null and valid for the lifetime of the `ARef` instance. In
> > +/// particular, the `ARef` instance owns an increment on the underlying object’s reference count.
> > +#[repr(transparent)]
> > +pub struct Device(Opaque<bindings::device>);
> > +
> [...]
> > +
> > +// SAFETY: `Device` only holds a pointer to a C `struct device`, which is safe to be used from any
> > +// thread.
> > +unsafe impl Send for Device {}
> > +
> > +// SAFETY: `Device` only holds a pointer to a C `struct device`, references to which are safe to be
> > +// used from any thread.
> > +unsafe impl Sync for Device {}
>
> These comments need some rework, `Device` is not a pointer to `struct
> device` anymore. For the `Sync` one, how about:
Indeed, I forgot to update them.
>
> // SAFETY: `Device` can be shared among threads because all immutable
> // methods are protected by the synchronization in `struct device`.
> unsafe impl Sync for Device {}
Sounds good.
>
> and for `Send`, I actually don't think we can easily say the generic
> `Device` is `Send`: you can create a `struct device` where `->release`
> requires to be run on the same thread that creates the `device`, and
> nothing is wrong about it, I think (e.g. making a thread be the sole
Hm, I guess in this case we actually can't argue that it's the owners fault to
pass a pointer of this device somewhere else. Since it's C the owner can't
enforce that someone else is not taking a reference and prevent sharing
ownership...
> owner of some special devices). Unless, in the #Invariants of `Device`,
> and the #safety of `from_ptr`, you mention that `Device` assume its
> `->release` can be called on any thread.
...hence, I agree we should indeed add to the #Invariants and #Safety section
that `->release` must be callable from any thread.
However, this is just theory, do we actually have cases where `device::release`
is not allowed to be called from any thread? If so, this would be very confusing
for a reference counted type from a design point of view...
- Danilo
>
> Regards,
> Boqun
>
> > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> > index fbd91a48ff8b..dd1207f1a873 100644
> > --- a/rust/kernel/lib.rs
> > +++ b/rust/kernel/lib.rs
> > @@ -28,6 +28,7 @@
> >
> > pub mod alloc;
> > mod build_assert;
> > +pub mod device;
> > pub mod error;
> > pub mod init;
> > pub mod ioctl;
> > --
> > 2.45.1
> >
>
On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> ...hence, I agree we should indeed add to the #Invariants and #Safety section
> that `->release` must be callable from any thread.
>
> However, this is just theory, do we actually have cases where `device::release`
> is not allowed to be called from any thread? If so, this would be very confusing
> for a reference counted type from a design point of view...
What do you mean exactly "by any thread"? Maybe not from interrupt
context, but any other normal thread (i.e. that you can sleep in), it
should be fine to call release() in.
thanks,
greg k-h
On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > that `->release` must be callable from any thread.
> >
> > However, this is just theory, do we actually have cases where `device::release`
@Danilo, right, it's only theorical, but it's good to call it out since
it's the requirement for a safe Rust abstraction.
> > is not allowed to be called from any thread? If so, this would be very confusing
> > for a reference counted type from a design point of view...
>
> What do you mean exactly "by any thread"? Maybe not from interrupt
The `Send` trait here doesn't really differ between interrupt contexts
and process contexts, so "by any thread", it includes all the contexts.
However, we rely on klint[1] to detect context mismatch in compile time
(it's still a WIP though). For this case, we would need to mark the
`Device::dec_ref` function as might sleep.
Regards,
Boqun
[1]: https://rust-for-linux.com/klint
> context, but any other normal thread (i.e. that you can sleep in), it
> should be fine to call release() in.
>
> thanks,
>
> greg k-h
On 6/11/24 18:13, Boqun Feng wrote:
> On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
>> On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
>>> ...hence, I agree we should indeed add to the #Invariants and #Safety section
>>> that `->release` must be callable from any thread.
>>>
>>> However, this is just theory, do we actually have cases where `device::release`
>
> @Danilo, right, it's only theorical, but it's good to call it out since
> it's the requirement for a safe Rust abstraction.
>
>>> is not allowed to be called from any thread? If so, this would be very confusing
>>> for a reference counted type from a design point of view...
>>
>> What do you mean exactly "by any thread"? Maybe not from interrupt
>
> The `Send` trait here doesn't really differ between interrupt contexts
> and process contexts, so "by any thread", it includes all the contexts.
> However, we rely on klint[1] to detect context mismatch in compile time
> (it's still a WIP though). For this case, we would need to mark the
> `Device::dec_ref` function as might sleep.
I think once we can detect it on compile time we should do that.
As for the might_sleep(), if we want to have this check, we should probably
do it in device_release() [1], rather than in the Rust code only.
[1] https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L2557
>
> Regards,
> Boqun
>
> [1]: https://rust-for-linux.com/klint
>
>> context, but any other normal thread (i.e. that you can sleep in), it
>> should be fine to call release() in.
>>
>> thanks,
>>
>> greg k-h
>
On 6/11/24 18:13, Boqun Feng wrote:
> On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
>> On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
>>> ...hence, I agree we should indeed add to the #Invariants and #Safety section
>>> that `->release` must be callable from any thread.
>>>
>>> However, this is just theory, do we actually have cases where `device::release`
>
> @Danilo, right, it's only theorical, but it's good to call it out since
> it's the requirement for a safe Rust abstraction.
Similar to my previous reply, if we want to call this out as safety requirement
in `Device::from_raw`, we probably want to add it to the documentation of the C
`struct device`, such that we can argue that this is an invariant of C's
`struct device`.
Otherwise we'd have to write something like:
"It must also be ensured that the `->release` function of a `struct device` can
be called from any non-atomic context. While not being officially documented this
is guaranteed by the invariant of `struct device`."
>
>>> is not allowed to be called from any thread? If so, this would be very confusing
>>> for a reference counted type from a design point of view...
>>
>> What do you mean exactly "by any thread"? Maybe not from interrupt
>
> The `Send` trait here doesn't really differ between interrupt contexts
> and process contexts, so "by any thread", it includes all the contexts.
> However, we rely on klint[1] to detect context mismatch in compile time
> (it's still a WIP though). For this case, we would need to mark the
> `Device::dec_ref` function as might sleep.
>
> Regards,
> Boqun
>
> [1]: https://rust-for-linux.com/klint
>
>> context, but any other normal thread (i.e. that you can sleep in), it
>> should be fine to call release() in.
>>
>> thanks,
>>
>> greg k-h
>
On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> On 6/11/24 18:13, Boqun Feng wrote:
> > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > that `->release` must be callable from any thread.
> > > >
> > > > However, this is just theory, do we actually have cases where `device::release`
> >
> > @Danilo, right, it's only theorical, but it's good to call it out since
> > it's the requirement for a safe Rust abstraction.
>
> Similar to my previous reply, if we want to call this out as safety requirement
> in `Device::from_raw`, we probably want to add it to the documentation of the C
> `struct device`, such that we can argue that this is an invariant of C's
> `struct device`.
>
> Otherwise we'd have to write something like:
>
> "It must also be ensured that the `->release` function of a `struct device` can
> be called from any non-atomic context. While not being officially documented this
> is guaranteed by the invariant of `struct device`."
In the 20+ years of the driver model being part of the kernel, I don't
think this has come up yet, so maybe you can call the release function
in irq context. I don't know, I was just guessing :)
So let's not go adding constraints that we just do not have please.
Same goes for the C code, so the rust code is no different here.
thanks,
greg k-h
On Wed, Jun 12, 2024 at 05:35:21PM +0200, Danilo Krummrich wrote:
> On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> > On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > > On 6/11/24 18:13, Boqun Feng wrote:
> > > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > > that `->release` must be callable from any thread.
> > > > > >
> > > > > > However, this is just theory, do we actually have cases where `device::release`
> > > >
> > > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > > it's the requirement for a safe Rust abstraction.
> > >
> > > Similar to my previous reply, if we want to call this out as safety requirement
> > > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > > `struct device`, such that we can argue that this is an invariant of C's
> > > `struct device`.
> > >
> > > Otherwise we'd have to write something like:
> > >
> > > "It must also be ensured that the `->release` function of a `struct device` can
> > > be called from any non-atomic context. While not being officially documented this
> > > is guaranteed by the invariant of `struct device`."
> >
> > In the 20+ years of the driver model being part of the kernel, I don't
> > think this has come up yet, so maybe you can call the release function
> > in irq context. I don't know, I was just guessing :)
>
> Ah, I see. I thought you know and it's defined, but just not documented.
>
> This means it's simply undefined what we expect to happen when the last
> reference of a device is dropped from atomic context.
>
> Now, I understand (and would even expect) that practically this has never been
> an issue. You'd need two circumstances, release() actually does something that
> is not allowed in atomic context plus the last device reference is dropped from
> atomic context - rather unlikely.
>
> >
> > So let's not go adding constraints that we just do not have please.
> > Same goes for the C code, so the rust code is no different here.
>
> I agree we shouldn't add random constraints, but for writing safe code we also
> have to rely on defined behavior.
As the rust code is relying on C code that could change at any point in
time, how can that ever be "safe"? :)
Sorry, this type of definition annoys me.
> I see two options:
>
> (1) We globally (for struct device) define from which context release() is
> allowed to be called.
If you want, feel free to do that work please. And then find out how to
enforce it in the driver core.
> (2) We define it for the Rust abstraction only and just constrain it to
> non-atomic context to be able to give a safety guarantee. We can't say
> "might be safe from any context, but we don't know".
Why can't you say that? Your "saftey" barrier ends/begins at the
interaction with the rust/c layer. You can't "guarantee" anything on
the C side...
> But again, this is really just a formality, the C code does it all the way and
> practically there never was an issue, which means we actually do follow some
> rules, it's just about writing them down. :)
Again, this has NEVER come up in 20+ years, so maybe it's just not an
issue? Not to say it isn't, but maybe do some tests and see what
happens...
thanks,
greg k-h
On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > On 6/11/24 18:13, Boqun Feng wrote:
> > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > that `->release` must be callable from any thread.
> > > > >
> > > > > However, this is just theory, do we actually have cases where `device::release`
> > >
> > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > it's the requirement for a safe Rust abstraction.
> >
> > Similar to my previous reply, if we want to call this out as safety requirement
> > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > `struct device`, such that we can argue that this is an invariant of C's
> > `struct device`.
> >
> > Otherwise we'd have to write something like:
> >
> > "It must also be ensured that the `->release` function of a `struct device` can
> > be called from any non-atomic context. While not being officially documented this
> > is guaranteed by the invariant of `struct device`."
>
> In the 20+ years of the driver model being part of the kernel, I don't
> think this has come up yet, so maybe you can call the release function
> in irq context. I don't know, I was just guessing :)
Ah, I see. I thought you know and it's defined, but just not documented.
This means it's simply undefined what we expect to happen when the last
reference of a device is dropped from atomic context.
Now, I understand (and would even expect) that practically this has never been
an issue. You'd need two circumstances, release() actually does something that
is not allowed in atomic context plus the last device reference is dropped from
atomic context - rather unlikely.
>
> So let's not go adding constraints that we just do not have please.
> Same goes for the C code, so the rust code is no different here.
I agree we shouldn't add random constraints, but for writing safe code we also
have to rely on defined behavior.
I see two options:
(1) We globally (for struct device) define from which context release() is
allowed to be called.
(2) We define it for the Rust abstraction only and just constrain it to
non-atomic context to be able to give a safety guarantee. We can't say
"might be safe from any context, but we don't know".
But again, this is really just a formality, the C code does it all the way and
practically there never was an issue, which means we actually do follow some
rules, it's just about writing them down. :)
- Danilo
>
> thanks,
>
> greg k-h
>
On Wed, Jun 12, 2024 at 05:50:42PM +0200, Greg KH wrote:
> On Wed, Jun 12, 2024 at 05:35:21PM +0200, Danilo Krummrich wrote:
> > On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> > > On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > > > On 6/11/24 18:13, Boqun Feng wrote:
> > > > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > > > that `->release` must be callable from any thread.
> > > > > > >
> > > > > > > However, this is just theory, do we actually have cases where `device::release`
> > > > >
> > > > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > > > it's the requirement for a safe Rust abstraction.
> > > >
> > > > Similar to my previous reply, if we want to call this out as safety requirement
> > > > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > > > `struct device`, such that we can argue that this is an invariant of C's
> > > > `struct device`.
> > > >
> > > > Otherwise we'd have to write something like:
> > > >
> > > > "It must also be ensured that the `->release` function of a `struct device` can
> > > > be called from any non-atomic context. While not being officially documented this
> > > > is guaranteed by the invariant of `struct device`."
> > >
> > > In the 20+ years of the driver model being part of the kernel, I don't
> > > think this has come up yet, so maybe you can call the release function
> > > in irq context. I don't know, I was just guessing :)
> >
> > Ah, I see. I thought you know and it's defined, but just not documented.
> >
> > This means it's simply undefined what we expect to happen when the last
> > reference of a device is dropped from atomic context.
> >
> > Now, I understand (and would even expect) that practically this has never been
> > an issue. You'd need two circumstances, release() actually does something that
> > is not allowed in atomic context plus the last device reference is dropped from
> > atomic context - rather unlikely.
> >
> > >
> > > So let's not go adding constraints that we just do not have please.
> > > Same goes for the C code, so the rust code is no different here.
> >
> > I agree we shouldn't add random constraints, but for writing safe code we also
> > have to rely on defined behavior.
>
> As the rust code is relying on C code that could change at any point in
> time, how can that ever be "safe"? :)
That's the same as with any other API. If the logic of an API is changed the
users (e.g a Rust abstraction) of the API have to be adjusted.
>
> Sorry, this type of definition annoys me.
>
> > I see two options:
> >
> > (1) We globally (for struct device) define from which context release() is
> > allowed to be called.
>
> If you want, feel free to do that work please. And then find out how to
> enforce it in the driver core.
If we *would* define non-atomic context only, we could enforce it with
might_sleep() for instance.
If we *would* define any context, there is nothing to enforce, but we'd need to
validate that no implementer of release() voids that.
The former is a constaint you don't want to add, the latter a lot of work. What
if we at least define that implementers of release() must *minimally* make sure
that it can be call from any non-atomic context.
That'd be something we can rely on in Rust.
>
> > (2) We define it for the Rust abstraction only and just constrain it to
> > non-atomic context to be able to give a safety guarantee. We can't say
> > "might be safe from any context, but we don't know".
>
> Why can't you say that? Your "saftey" barrier ends/begins at the
> interaction with the rust/c layer. You can't "guarantee" anything on
> the C side...
Not guarantee as in "technically enforce it", but guarantee as in "we have rules
that we follow".
The former would be best, but it isn't always possible. The latter we can always
do (and should IMHO).
>
> > But again, this is really just a formality, the C code does it all the way and
> > practically there never was an issue, which means we actually do follow some
> > rules, it's just about writing them down. :)
>
> Again, this has NEVER come up in 20+ years, so maybe it's just not an
> issue? Not to say it isn't, but maybe do some tests and see what
> happens...
Oh, I fully agree with that. Let me try to explain a bit what this is about:
In Rust we have the `Send` and `Sync` marker traits. If a type (e.g. `Device`)
implements `Send` it means that it's safe to pass an instance of this type
between threads. Which is clearly something we want to do with a `Device`.
If I don't implement `Sync` for `Device` the compiler will prevent me from
sending it between threads, e.g. by disallowing me to put an instance of
`Device` into another data structure that is potentially passed between threads.
If I implement `Sync` I have to add a safety comment on why it is safe to pass
`Device` between threads. And here we have what Boqun pointed out: `Device` can
only be passed between threads when we're allowed to drop the last reference
from any thread. In the case of the kernel this can be any non-atomic context,
any context or any other subset. But I have to write something here that is
a defined rule and can be relied on.
- Danilo
>
> thanks,
>
> greg k-h
>
On Wed, Jun 12, 2024 at 06:18:38PM +0200, Danilo Krummrich wrote:
> On Wed, Jun 12, 2024 at 05:50:42PM +0200, Greg KH wrote:
> > On Wed, Jun 12, 2024 at 05:35:21PM +0200, Danilo Krummrich wrote:
> > > On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> > > > On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > > > > On 6/11/24 18:13, Boqun Feng wrote:
> > > > > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > > > > that `->release` must be callable from any thread.
> > > > > > > >
> > > > > > > > However, this is just theory, do we actually have cases where `device::release`
> > > > > >
> > > > > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > > > > it's the requirement for a safe Rust abstraction.
> > > > >
> > > > > Similar to my previous reply, if we want to call this out as safety requirement
> > > > > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > > > > `struct device`, such that we can argue that this is an invariant of C's
> > > > > `struct device`.
> > > > >
> > > > > Otherwise we'd have to write something like:
> > > > >
> > > > > "It must also be ensured that the `->release` function of a `struct device` can
> > > > > be called from any non-atomic context. While not being officially documented this
> > > > > is guaranteed by the invariant of `struct device`."
> > > >
> > > > In the 20+ years of the driver model being part of the kernel, I don't
> > > > think this has come up yet, so maybe you can call the release function
> > > > in irq context. I don't know, I was just guessing :)
> > >
> > > Ah, I see. I thought you know and it's defined, but just not documented.
> > >
> > > This means it's simply undefined what we expect to happen when the last
> > > reference of a device is dropped from atomic context.
> > >
> > > Now, I understand (and would even expect) that practically this has never been
> > > an issue. You'd need two circumstances, release() actually does something that
> > > is not allowed in atomic context plus the last device reference is dropped from
> > > atomic context - rather unlikely.
> > >
> > > >
> > > > So let's not go adding constraints that we just do not have please.
> > > > Same goes for the C code, so the rust code is no different here.
> > >
> > > I agree we shouldn't add random constraints, but for writing safe code we also
> > > have to rely on defined behavior.
> >
> > As the rust code is relying on C code that could change at any point in
> > time, how can that ever be "safe"? :)
>
> That's the same as with any other API. If the logic of an API is changed the
> users (e.g a Rust abstraction) of the API have to be adjusted.
Agreed, just like any other in-kernel code, so there shouldn't be
anything special here.
> > Sorry, this type of definition annoys me.
> >
> > > I see two options:
> > >
> > > (1) We globally (for struct device) define from which context release() is
> > > allowed to be called.
> >
> > If you want, feel free to do that work please. And then find out how to
> > enforce it in the driver core.
>
> If we *would* define non-atomic context only, we could enforce it with
> might_sleep() for instance.
might_sleep() isn't always correct from what I remember.
> If we *would* define any context, there is nothing to enforce, but we'd need to
> validate that no implementer of release() voids that.
Trying to validate that might be hard, again, I don't think it's worth
it.
> The former is a constaint you don't want to add, the latter a lot of work. What
> if we at least define that implementers of release() must *minimally* make sure
> that it can be call from any non-atomic context.
>
> That'd be something we can rely on in Rust.
Determining if you are, or are not, in atomic context is almost
impossible in C, I don't know how you are going to do that at build time
in Rust. Good luck!
> Oh, I fully agree with that. Let me try to explain a bit what this is about:
>
> In Rust we have the `Send` and `Sync` marker traits. If a type (e.g. `Device`)
> implements `Send` it means that it's safe to pass an instance of this type
> between threads. Which is clearly something we want to do with a `Device`.
>
> If I don't implement `Sync` for `Device` the compiler will prevent me from
> sending it between threads, e.g. by disallowing me to put an instance of
> `Device` into another data structure that is potentially passed between threads.
>
> If I implement `Sync` I have to add a safety comment on why it is safe to pass
> `Device` between threads. And here we have what Boqun pointed out: `Device` can
> only be passed between threads when we're allowed to drop the last reference
> from any thread. In the case of the kernel this can be any non-atomic context,
> any context or any other subset. But I have to write something here that is
> a defined rule and can be relied on.
You really have two things here, a matrix of:
can transfer between threads
can call in irq context
that are independent and not related to each other at all.
Looks like Rust has built in support for the first. And nothing for the
second as that is a very kernel-specific thing.
So let's not confuse the two please. `Send` and `Sync` should be fine
for a device pointer to be passed around, as long as the reference is
incremented, as that's what all of the kernel C code does today. Let's
not worry about irq context at all, that's independent and can be
handled at a later time, if at all, with a different "marking" as it's
independent of the current two things.
thanks,
greg k-h
On Wed, Jun 12, 2024 at 07:13:31PM +0200, Greg KH wrote:
> On Wed, Jun 12, 2024 at 06:18:38PM +0200, Danilo Krummrich wrote:
> > On Wed, Jun 12, 2024 at 05:50:42PM +0200, Greg KH wrote:
> > > On Wed, Jun 12, 2024 at 05:35:21PM +0200, Danilo Krummrich wrote:
> > > > On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> > > > > On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > > > > > On 6/11/24 18:13, Boqun Feng wrote:
> > > > > > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > > > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > > > > > that `->release` must be callable from any thread.
> > > > > > > > >
> > > > > > > > > However, this is just theory, do we actually have cases where `device::release`
> > > > > > >
> > > > > > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > > > > > it's the requirement for a safe Rust abstraction.
> > > > > >
> > > > > > Similar to my previous reply, if we want to call this out as safety requirement
> > > > > > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > > > > > `struct device`, such that we can argue that this is an invariant of C's
> > > > > > `struct device`.
> > > > > >
> > > > > > Otherwise we'd have to write something like:
> > > > > >
> > > > > > "It must also be ensured that the `->release` function of a `struct device` can
> > > > > > be called from any non-atomic context. While not being officially documented this
> > > > > > is guaranteed by the invariant of `struct device`."
> > > > >
> > > > > In the 20+ years of the driver model being part of the kernel, I don't
> > > > > think this has come up yet, so maybe you can call the release function
> > > > > in irq context. I don't know, I was just guessing :)
> > > >
> > > > Ah, I see. I thought you know and it's defined, but just not documented.
> > > >
> > > > This means it's simply undefined what we expect to happen when the last
> > > > reference of a device is dropped from atomic context.
> > > >
> > > > Now, I understand (and would even expect) that practically this has never been
> > > > an issue. You'd need two circumstances, release() actually does something that
> > > > is not allowed in atomic context plus the last device reference is dropped from
> > > > atomic context - rather unlikely.
> > > >
> > > > >
> > > > > So let's not go adding constraints that we just do not have please.
> > > > > Same goes for the C code, so the rust code is no different here.
> > > >
> > > > I agree we shouldn't add random constraints, but for writing safe code we also
> > > > have to rely on defined behavior.
> > >
> > > As the rust code is relying on C code that could change at any point in
> > > time, how can that ever be "safe"? :)
> >
> > That's the same as with any other API. If the logic of an API is changed the
> > users (e.g a Rust abstraction) of the API have to be adjusted.
>
> Agreed, just like any other in-kernel code, so there shouldn't be
> anything special here.
>
> > > Sorry, this type of definition annoys me.
> > >
> > > > I see two options:
> > > >
> > > > (1) We globally (for struct device) define from which context release() is
> > > > allowed to be called.
> > >
> > > If you want, feel free to do that work please. And then find out how to
> > > enforce it in the driver core.
> >
> > If we *would* define non-atomic context only, we could enforce it with
> > might_sleep() for instance.
>
> might_sleep() isn't always correct from what I remember.
>
> > If we *would* define any context, there is nothing to enforce, but we'd need to
> > validate that no implementer of release() voids that.
>
> Trying to validate that might be hard, again, I don't think it's worth
> it.
>
> > The former is a constaint you don't want to add, the latter a lot of work. What
> > if we at least define that implementers of release() must *minimally* make sure
> > that it can be call from any non-atomic context.
> >
> > That'd be something we can rely on in Rust.
>
> Determining if you are, or are not, in atomic context is almost
> impossible in C, I don't know how you are going to do that at build time
> in Rust. Good luck!
Note, the real problem with calling release() on a device, is when you
do so from within the device's sysfs file. That causes all sorts of
"fun" times that people have had to work around over the years. It's
not something you probably need to worry about yet, as you are not
writing a bus subsystem in Rust yet, but it is something that can get
tricky very quickly. Look at the hoops that scsi has to deal with if
you want to see the gory details...
But this is all just "logic" bugs, not thread or irq issues at all.
Nothing is different, nor should be different, for code doing this in C
or in rust.
What I'm trying to get at is that you can't document all of the
"constraints" you are dealing with when handling C apis like this,
sorry. It's going to be hard, if not impossible, to worry about that at
this point in the api. It's better left to be documented where the real
call happens (i.e. in the C code), than attempting to keep it up to date
in the c<->rust binding point.
thanks,
greg k-h
On Wed, Jun 12, 2024 at 07:13:31PM +0200, Greg KH wrote:
> On Wed, Jun 12, 2024 at 06:18:38PM +0200, Danilo Krummrich wrote:
> > On Wed, Jun 12, 2024 at 05:50:42PM +0200, Greg KH wrote:
> > > On Wed, Jun 12, 2024 at 05:35:21PM +0200, Danilo Krummrich wrote:
> > > > On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> > > > > On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > > > > > On 6/11/24 18:13, Boqun Feng wrote:
> > > > > > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > > > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > > > > > that `->release` must be callable from any thread.
> > > > > > > > >
> > > > > > > > > However, this is just theory, do we actually have cases where `device::release`
> > > > > > >
> > > > > > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > > > > > it's the requirement for a safe Rust abstraction.
> > > > > >
> > > > > > Similar to my previous reply, if we want to call this out as safety requirement
> > > > > > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > > > > > `struct device`, such that we can argue that this is an invariant of C's
> > > > > > `struct device`.
> > > > > >
> > > > > > Otherwise we'd have to write something like:
> > > > > >
> > > > > > "It must also be ensured that the `->release` function of a `struct device` can
> > > > > > be called from any non-atomic context. While not being officially documented this
> > > > > > is guaranteed by the invariant of `struct device`."
> > > > >
> > > > > In the 20+ years of the driver model being part of the kernel, I don't
> > > > > think this has come up yet, so maybe you can call the release function
> > > > > in irq context. I don't know, I was just guessing :)
> > > >
> > > > Ah, I see. I thought you know and it's defined, but just not documented.
> > > >
> > > > This means it's simply undefined what we expect to happen when the last
> > > > reference of a device is dropped from atomic context.
> > > >
> > > > Now, I understand (and would even expect) that practically this has never been
> > > > an issue. You'd need two circumstances, release() actually does something that
> > > > is not allowed in atomic context plus the last device reference is dropped from
> > > > atomic context - rather unlikely.
> > > >
> > > > >
> > > > > So let's not go adding constraints that we just do not have please.
> > > > > Same goes for the C code, so the rust code is no different here.
> > > >
> > > > I agree we shouldn't add random constraints, but for writing safe code we also
> > > > have to rely on defined behavior.
> > >
> > > As the rust code is relying on C code that could change at any point in
> > > time, how can that ever be "safe"? :)
> >
> > That's the same as with any other API. If the logic of an API is changed the
> > users (e.g a Rust abstraction) of the API have to be adjusted.
>
> Agreed, just like any other in-kernel code, so there shouldn't be
> anything special here.
>
> > > Sorry, this type of definition annoys me.
> > >
> > > > I see two options:
> > > >
> > > > (1) We globally (for struct device) define from which context release() is
> > > > allowed to be called.
> > >
> > > If you want, feel free to do that work please. And then find out how to
> > > enforce it in the driver core.
> >
> > If we *would* define non-atomic context only, we could enforce it with
> > might_sleep() for instance.
>
> might_sleep() isn't always correct from what I remember.
>
> > If we *would* define any context, there is nothing to enforce, but we'd need to
> > validate that no implementer of release() voids that.
>
> Trying to validate that might be hard, again, I don't think it's worth
> it.
>
> > The former is a constaint you don't want to add, the latter a lot of work. What
> > if we at least define that implementers of release() must *minimally* make sure
> > that it can be call from any non-atomic context.
> >
> > That'd be something we can rely on in Rust.
>
> Determining if you are, or are not, in atomic context is almost
> impossible in C, I don't know how you are going to do that at build time
> in Rust. Good luck!
We can't always enforce things, but we can still follow rules even if we can't
enforce them with tools or language features.
In particular, nothing prevents C code to define from which context things are
allowed to be called. When I was working on drm_gpuvm [1] this was one of the
major aspects I had to get right and documented; keyword dma_fence signalling
critical sections.
[1] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gpuvm.c
>
> > Oh, I fully agree with that. Let me try to explain a bit what this is about:
> >
> > In Rust we have the `Send` and `Sync` marker traits. If a type (e.g. `Device`)
> > implements `Send` it means that it's safe to pass an instance of this type
> > between threads. Which is clearly something we want to do with a `Device`.
> >
> > If I don't implement `Sync` for `Device` the compiler will prevent me from
> > sending it between threads, e.g. by disallowing me to put an instance of
> > `Device` into another data structure that is potentially passed between threads.
> >
> > If I implement `Sync` I have to add a safety comment on why it is safe to pass
> > `Device` between threads. And here we have what Boqun pointed out: `Device` can
> > only be passed between threads when we're allowed to drop the last reference
> > from any thread. In the case of the kernel this can be any non-atomic context,
> > any context or any other subset. But I have to write something here that is
> > a defined rule and can be relied on.
>
> You really have two things here, a matrix of:
> can transfer between threads
> can call in irq context
> that are independent and not related to each other at all.
>
> Looks like Rust has built in support for the first. And nothing for the
> second as that is a very kernel-specific thing.
The language documentation defines `Send` as "can be transferred between
threads", likely because it's written from a userspace perspective. But in
the kernel context it actually means can be transferred between any context,
thread, IRQ, etc.
If this isn't true, then we have to add a comment what is allowed (e.g. any
non-atomic context) and what's not allowed.
>
> So let's not confuse the two please. `Send` and `Sync` should be fine
> for a device pointer to be passed around, as long as the reference is
> incremented, as that's what all of the kernel C code does today. Let's
> not worry about irq context at all, that's independent and can be
> handled at a later time, if at all, with a different "marking" as it's
> independent of the current two things.
That'd be great, but as mentioned above, we only have `Send`, but nothing like
`SendIrq`, hence `Send` really means any context.
Given your proposal, to just say it's fine to pass between (actual) threads and
ignore IRQ context for now, we have to implement `Send`, but document that IRQ
context is not covered.
We can either do that in the Rust abstraction as safety requirement, or we can,
as proposed previously, add a comment to the C `struct device` documentation
that implementers of release() must *at least* make sure that it can be called
from any non-atomic context. We can then refer to that.
- Danilo
>
> thanks,
>
> greg k-h
>
On Wed, Jun 12, 2024 at 10:56:43PM +0200, Danilo Krummrich wrote:
> > > Oh, I fully agree with that. Let me try to explain a bit what this is about:
> > >
> > > In Rust we have the `Send` and `Sync` marker traits. If a type (e.g. `Device`)
> > > implements `Send` it means that it's safe to pass an instance of this type
> > > between threads. Which is clearly something we want to do with a `Device`.
> > >
> > > If I don't implement `Sync` for `Device` the compiler will prevent me from
> > > sending it between threads, e.g. by disallowing me to put an instance of
> > > `Device` into another data structure that is potentially passed between threads.
> > >
> > > If I implement `Sync` I have to add a safety comment on why it is safe to pass
> > > `Device` between threads. And here we have what Boqun pointed out: `Device` can
> > > only be passed between threads when we're allowed to drop the last reference
> > > from any thread. In the case of the kernel this can be any non-atomic context,
> > > any context or any other subset. But I have to write something here that is
> > > a defined rule and can be relied on.
> >
> > You really have two things here, a matrix of:
> > can transfer between threads
> > can call in irq context
> > that are independent and not related to each other at all.
> >
> > Looks like Rust has built in support for the first. And nothing for the
> > second as that is a very kernel-specific thing.
>
> The language documentation defines `Send` as "can be transferred between
> threads", likely because it's written from a userspace perspective. But in
> the kernel context it actually means can be transferred between any context,
> thread, IRQ, etc.
>
> If this isn't true, then we have to add a comment what is allowed (e.g. any
> non-atomic context) and what's not allowed.
That isn't good, you are going to have to change how `Send` works here
if you think it's safe to do stuff in all of those different contexts,
as that is almost never going to be true.
Why not just stick with "accessed from another thread" and leave it at
that, as again, the combinations here are a matrix, not a boolean yes/no
type of thing.
> > So let's not confuse the two please. `Send` and `Sync` should be fine
> > for a device pointer to be passed around, as long as the reference is
> > incremented, as that's what all of the kernel C code does today. Let's
> > not worry about irq context at all, that's independent and can be
> > handled at a later time, if at all, with a different "marking" as it's
> > independent of the current two things.
>
> That'd be great, but as mentioned above, we only have `Send`, but nothing like
> `SendIrq`, hence `Send` really means any context.
>
> Given your proposal, to just say it's fine to pass between (actual) threads and
> ignore IRQ context for now, we have to implement `Send`, but document that IRQ
> context is not covered.
>
> We can either do that in the Rust abstraction as safety requirement, or we can,
> as proposed previously, add a comment to the C `struct device` documentation
> that implementers of release() must *at least* make sure that it can be called
> from any non-atomic context. We can then refer to that.
As someone who has written comments in code for functions that are
always ignored, I am happy to see you think that this is actually going
to do anything :)
Heck, I have put huge messages in kernel code for when people implement
the api wrong[1], and they still ignore that at runtime. Only way to get
it noticed is if you break someone's build.
So you all need to really define what `Send` means, as it sounds like a
userspace thing that isn't going to fit in well within the kernel model.
thanks,
greg k-h
[1] See the pr_debug() calls in kobject_cleanup() as proof, people just
create an "empty" release function to shut them up, thinking that is
the correct solution when obviously it is not...
On Thu, Jun 13, 2024 at 07:47:13AM +0200, Greg KH wrote:
> On Wed, Jun 12, 2024 at 10:56:43PM +0200, Danilo Krummrich wrote:
> > > > Oh, I fully agree with that. Let me try to explain a bit what this is about:
> > > >
> > > > In Rust we have the `Send` and `Sync` marker traits. If a type (e.g. `Device`)
> > > > implements `Send` it means that it's safe to pass an instance of this type
> > > > between threads. Which is clearly something we want to do with a `Device`.
> > > >
> > > > If I don't implement `Sync` for `Device` the compiler will prevent me from
> > > > sending it between threads, e.g. by disallowing me to put an instance of
> > > > `Device` into another data structure that is potentially passed between threads.
> > > >
> > > > If I implement `Sync` I have to add a safety comment on why it is safe to pass
> > > > `Device` between threads. And here we have what Boqun pointed out: `Device` can
> > > > only be passed between threads when we're allowed to drop the last reference
> > > > from any thread. In the case of the kernel this can be any non-atomic context,
> > > > any context or any other subset. But I have to write something here that is
> > > > a defined rule and can be relied on.
> > >
> > > You really have two things here, a matrix of:
> > > can transfer between threads
> > > can call in irq context
> > > that are independent and not related to each other at all.
> > >
> > > Looks like Rust has built in support for the first. And nothing for the
> > > second as that is a very kernel-specific thing.
> >
> > The language documentation defines `Send` as "can be transferred between
> > threads", likely because it's written from a userspace perspective. But in
> > the kernel context it actually means can be transferred between any context,
> > thread, IRQ, etc.
> >
> > If this isn't true, then we have to add a comment what is allowed (e.g. any
> > non-atomic context) and what's not allowed.
>
> That isn't good, you are going to have to change how `Send` works here
> if you think it's safe to do stuff in all of those different contexts,
> as that is almost never going to be true.
>
> Why not just stick with "accessed from another thread" and leave it at
> that, as again, the combinations here are a matrix, not a boolean yes/no
> type of thing.
I probably didn't phrase this very well, let me try again.
What the compiler can do for us currently is to check whether a data structure
is kept only in it's current context or whether it is send to another one - it
can't distinguish context types though.
So, the actual definition of `Send` isn't really "can be send across threads",
but "is not restricted to be kept in a single context".
This means that if something is "is not restricted to be kept in a single
context", but limited to certain context type, we, unfortunately, just have to
make it `Send`, but document the restriction.
>
> > > So let's not confuse the two please. `Send` and `Sync` should be fine
> > > for a device pointer to be passed around, as long as the reference is
> > > incremented, as that's what all of the kernel C code does today. Let's
> > > not worry about irq context at all, that's independent and can be
> > > handled at a later time, if at all, with a different "marking" as it's
> > > independent of the current two things.
> >
> > That'd be great, but as mentioned above, we only have `Send`, but nothing like
> > `SendIrq`, hence `Send` really means any context.
> >
> > Given your proposal, to just say it's fine to pass between (actual) threads and
> > ignore IRQ context for now, we have to implement `Send`, but document that IRQ
> > context is not covered.
> >
> > We can either do that in the Rust abstraction as safety requirement, or we can,
> > as proposed previously, add a comment to the C `struct device` documentation
> > that implementers of release() must *at least* make sure that it can be called
> > from any non-atomic context. We can then refer to that.
>
> As someone who has written comments in code for functions that are
> always ignored, I am happy to see you think that this is actually going
> to do anything :)
One advantage we have in Rust is, that if something has a specific requirement
we mark it as `unsafe` and the user of the API has to put it in an `unsafe`
block, which we require a sfety comment for, where the user of the API has to
explain why this is the correct thing to do in this case.
In other words we can technically enforce that people read the comment and
explain how they ensure to fulfill what the comment asks for.
>
> Heck, I have put huge messages in kernel code for when people implement
> the api wrong[1], and they still ignore that at runtime. Only way to get
> it noticed is if you break someone's build.
You only see the ones who still do it wrong. You probably don't have visibility
of the ones who did it right due to your effort to write it down. :-)
>
> So you all need to really define what `Send` means, as it sounds like a
> userspace thing that isn't going to fit in well within the kernel model.
Please see the explanation above.
>
> thanks,
>
> greg k-h
>
> [1] See the pr_debug() calls in kobject_cleanup() as proof, people just
> create an "empty" release function to shut them up, thinking that is
> the correct solution when obviously it is not...
>
On Thu, 2024-06-13 at 14:22 +0200, Danilo Krummrich wrote:
> On Thu, Jun 13, 2024 at 07:47:13AM +0200, Greg KH wrote:
> > On Wed, Jun 12, 2024 at 10:56:43PM +0200, Danilo Krummrich wrote:
> >
> > As someone who has written comments in code for functions that are
> > always ignored, I am happy to see you think that this is actually
> > going
> > to do anything :)
>
> One advantage we have in Rust is, that if something has a specific
> requirement
> we mark it as `unsafe` and the user of the API has to put it in an
> `unsafe`
> block, which we require a sfety comment for, where the user of the
> API has to
> explain why this is the correct thing to do in this case.
>
> In other words we can technically enforce that people read the
> comment and
> explain how they ensure to fulfill what the comment asks for.
Been going through this thread to catch up on the status of the patches
here: I've been working with danilo for a while, but have been busy
with kernel modesetting bindings for rust. Anyway, as someone who's
spent quite a lot of time writing up really detailed documentation for
some of the DisplayPort MST kernel APIs only to keep running into
situations where said documentation was ignored (not to criticize
anyone! I'm certainly guilty of having misread documentation or not
reading something myself :) I like to think about unsafe blocks in the
sense of being one thing I'm less likely to have to keep repeating
myself about over a mailing list. Because I know at the very least that
a contributor has to actively ignore a compiler warning or be aware in
some sense that there is some kind of safety constraint they're being
held responsible for that will come up during code review. If it's just
a comment in C code, I don't even have a guarantee that a contributor's
IDE isn't hiding large kdoc comments by default.
Also, in regards to rust code relying on C code not being guaranteed
safe even if it is 'safe' code: frankly, sometimes I wish "unsafe" was
just named "danger". "un-safe" implies code outside the block is
perfectly "safe" - but it is still possible to introduce UB in entirely
safe code bases (source: cve-rs) and most rust contributors I've talked
to don't see safe code as being perfectly-safe either.
>
> >
> > Heck, I have put huge messages in kernel code for when people
> > implement
> > the api wrong[1], and they still ignore that at runtime. Only way
> > to get
> > it noticed is if you break someone's build.
>
> You only see the ones who still do it wrong. You probably don't have
> visibility
> of the ones who did it right due to your effort to write it down. :-)
>
> >
> > So you all need to really define what `Send` means, as it sounds
> > like a
> > userspace thing that isn't going to fit in well within the kernel
> > model.
>
> Please see the explanation above.
>
> >
> > thanks,
> >
> > greg k-h
> >
> > [1] See the pr_debug() calls in kobject_cleanup() as proof, people
> > just
> > create an "empty" release function to shut them up, thinking
> > that is
> > the correct solution when obviously it is not...
> >
>
--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat