2024-02-14 17:57:14

by Danilo Krummrich

[permalink] [raw]
Subject: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

Add functions to convert a CString to upper- / lowercase, either
in-place or by creating a copy of the original CString.

Naming followes the one from the Rust stdlib, where functions starting
with 'to' create a copy and functions starting with 'make' perform an
in-place conversion.

This is required by the Nova project (GSP only Rust successor of
Nouveau) to convert stringified enum values (representing different GPU
chipsets) to strings in order to generate the corresponding firmware
paths. See also [1].

[1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust

Signed-off-by: Danilo Krummrich <[email protected]>
---
Changes in V3:
- add an `impl DerefMut for CString`, such that these functions can be defined
for `CStr` as `&mut self` and still be called on a `CString`
Changes in V2:
- expand commit message mentioning the use case
- expand function doc comments to match the ones from Rust's stdlib
- rename to_* to make_* and add the actual to_* implementations
---
rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index 7d848b83add4..02d6e510b852 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -5,7 +5,7 @@
use alloc::alloc::AllocError;
use alloc::vec::Vec;
use core::fmt::{self, Write};
-use core::ops::{self, Deref, Index};
+use core::ops::{self, Deref, DerefMut, Index};

use crate::{
bindings,
@@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
unsafe { core::mem::transmute(bytes) }
}

+ /// Creates a mutable [`CStr`] from a `[u8]` without performing any
+ /// additional checks.
+ ///
+ /// # Safety
+ ///
+ /// `bytes` *must* end with a `NUL` byte, and should only have a single
+ /// `NUL` byte (or the string will be truncated).
+ #[inline]
+ pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
+ // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
+ unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
+ }
+
/// Returns a C pointer to the string.
#[inline]
pub const fn as_char_ptr(&self) -> *const core::ffi::c_char {
@@ -206,6 +219,32 @@ pub unsafe fn as_str_unchecked(&self) -> &str {
pub fn to_cstring(&self) -> Result<CString, AllocError> {
CString::try_from(self)
}
+
+ /// Converts this [`CStr`] to its ASCII lower case equivalent in-place.
+ ///
+ /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
+ /// but non-ASCII letters are unchanged.
+ ///
+ /// To return a new lowercased value without modifying the existing one, use
+ /// [`to_ascii_lowercase()`].
+ ///
+ /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase
+ pub fn make_ascii_lowercase(&mut self) {
+ self.0.make_ascii_lowercase();
+ }
+
+ /// Converts this [`CStr`] to its ASCII upper case equivalent in-place.
+ ///
+ /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
+ /// but non-ASCII letters are unchanged.
+ ///
+ /// To return a new uppercased value without modifying the existing one, use
+ /// [`to_ascii_uppercase()`].
+ ///
+ /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase
+ pub fn make_ascii_uppercase(&mut self) {
+ self.0.make_ascii_uppercase();
+ }
}

impl fmt::Display for CStr {
@@ -581,6 +620,40 @@ pub fn try_from_fmt(args: fmt::Arguments<'_>) -> Result<Self, Error> {
// exist in the buffer.
Ok(Self { buf })
}
+
+ /// Returns a copy of this [`CString`] where each character is mapped to its
+ /// ASCII lower case equivalent.
+ ///
+ /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
+ /// but non-ASCII letters are unchanged.
+ ///
+ /// To lowercase the value in-place, use [`make_ascii_lowercase`].
+ ///
+ /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
+ pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
+ let mut s = (*self).to_cstring()?;
+
+ s.make_ascii_lowercase();
+
+ return Ok(s);
+ }
+
+ /// Returns a copy of this [`CString`] where each character is mapped to its
+ /// ASCII upper case equivalent.
+ ///
+ /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
+ /// but non-ASCII letters are unchanged.
+ ///
+ /// To uppercase the value in-place, use [`make_ascii_uppercase`].
+ ///
+ /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
+ pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
+ let mut s = (*self).to_cstring()?;
+
+ s.make_ascii_uppercase();
+
+ return Ok(s);
+ }
}

impl Deref for CString {
@@ -593,6 +666,12 @@ fn deref(&self) -> &Self::Target {
}
}

+impl DerefMut for CString {
+ fn deref_mut(&mut self) -> &mut Self::Target {
+ unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
+ }
+}
+
impl<'a> TryFrom<&'a CStr> for CString {
type Error = AllocError;


base-commit: 7e90b5c295ec1e47c8ad865429f046970c549a66
--
2.43.0



2024-02-14 19:29:49

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> Add functions to convert a CString to upper- / lowercase, either
> in-place or by creating a copy of the original CString.
>
> Naming followes the one from the Rust stdlib, where functions starting
> with 'to' create a copy and functions starting with 'make' perform an
> in-place conversion.
>
> This is required by the Nova project (GSP only Rust successor of
> Nouveau) to convert stringified enum values (representing different GPU
> chipsets) to strings in order to generate the corresponding firmware
> paths. See also [1].
>
> [1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust
>
> Signed-off-by: Danilo Krummrich <[email protected]>
> ---
> Changes in V3:
> - add an `impl DerefMut for CString`, such that these functions can be defined
> for `CStr` as `&mut self` and still be called on a `CString`
> Changes in V2:
> - expand commit message mentioning the use case
> - expand function doc comments to match the ones from Rust's stdlib
> - rename to_* to make_* and add the actual to_* implementations
> ---
> rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 80 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
> index 7d848b83add4..02d6e510b852 100644
> --- a/rust/kernel/str.rs
> +++ b/rust/kernel/str.rs
> @@ -5,7 +5,7 @@
> use alloc::alloc::AllocError;
> use alloc::vec::Vec;
> use core::fmt::{self, Write};
> -use core::ops::{self, Deref, Index};
> +use core::ops::{self, Deref, DerefMut, Index};
>
> use crate::{
> bindings,
> @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> unsafe { core::mem::transmute(bytes) }
> }
>
> + /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> + /// additional checks.
> + ///
> + /// # Safety
> + ///
> + /// `bytes` *must* end with a `NUL` byte, and should only have a single
> + /// `NUL` byte (or the string will be truncated).
> + #[inline]
> + pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> + // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> + unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }

First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
I think the dereference (or reborrow) is only safe if `CStr` is
`#[repr(transparent)]. I.e.

#[repr(transparent)]
pub struct CStr([u8]);

with that you can implement the function as (you can still use `cast()`
implementation, but I sometimes find `transmute` is more simple).

pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
// SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
// safe to do, and per the function safety requirement, `bytes`
// is a valid `CStr`.
unsafe { core::mem::transmute(bytes) }
}

but this is just my thought, better wait for others' feedback as well.

Regards,
Boqun

2024-02-14 20:36:46

by Alice Ryhl

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On 2/14/24 20:27, Boqun Feng wrote:
> On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
>> Add functions to convert a CString to upper- / lowercase, either
>> in-place or by creating a copy of the original CString.
>>
>> Naming followes the one from the Rust stdlib, where functions starting
>> with 'to' create a copy and functions starting with 'make' perform an
>> in-place conversion.
>>
>> This is required by the Nova project (GSP only Rust successor of
>> Nouveau) to convert stringified enum values (representing different GPU
>> chipsets) to strings in order to generate the corresponding firmware
>> paths. See also [1].
>>
>> [1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust
>>
>> Signed-off-by: Danilo Krummrich <[email protected]>
>> ---
>> Changes in V3:
>> - add an `impl DerefMut for CString`, such that these functions can be defined
>> for `CStr` as `&mut self` and still be called on a `CString`
>> Changes in V2:
>> - expand commit message mentioning the use case
>> - expand function doc comments to match the ones from Rust's stdlib
>> - rename to_* to make_* and add the actual to_* implementations
>> ---
>> rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 80 insertions(+), 1 deletion(-)
>>
>> diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
>> index 7d848b83add4..02d6e510b852 100644
>> --- a/rust/kernel/str.rs
>> +++ b/rust/kernel/str.rs
>> @@ -5,7 +5,7 @@
>> use alloc::alloc::AllocError;
>> use alloc::vec::Vec;
>> use core::fmt::{self, Write};
>> -use core::ops::{self, Deref, Index};
>> +use core::ops::{self, Deref, DerefMut, Index};
>>
>> use crate::{
>> bindings,
>> @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
>> unsafe { core::mem::transmute(bytes) }
>> }
>>
>> + /// Creates a mutable [`CStr`] from a `[u8]` without performing any
>> + /// additional checks.
>> + ///
>> + /// # Safety
>> + ///
>> + /// `bytes` *must* end with a `NUL` byte, and should only have a single
>> + /// `NUL` byte (or the string will be truncated).
>> + #[inline]
>> + pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
>> + // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
>> + unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
>
> First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> I think the dereference (or reborrow) is only safe if `CStr` is
> `#[repr(transparent)]. I.e.
>
> #[repr(transparent)]
> pub struct CStr([u8]);
>
> with that you can implement the function as (you can still use `cast()`
> implementation, but I sometimes find `transmute` is more simple).
>
> pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> // safe to do, and per the function safety requirement, `bytes`
> // is a valid `CStr`.
> unsafe { core::mem::transmute(bytes) }
> }
>
> but this is just my thought, better wait for others' feedback as well.

Transmuting references is generally frowned upon. It's better to use a
pointer cast.

As for .cast() vs the `as` operator, I'm not sure you can use .cast() in
this case since the pointers are unsized. So you might have to use `as`
instead.

Alice

2024-02-15 01:18:33

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> On 2/14/24 20:27, Boqun Feng wrote:
> > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > Add functions to convert a CString to upper- / lowercase, either
> > > in-place or by creating a copy of the original CString.
> > >
> > > Naming followes the one from the Rust stdlib, where functions starting
> > > with 'to' create a copy and functions starting with 'make' perform an
> > > in-place conversion.
> > >
> > > This is required by the Nova project (GSP only Rust successor of
> > > Nouveau) to convert stringified enum values (representing different GPU
> > > chipsets) to strings in order to generate the corresponding firmware
> > > paths. See also [1].
> > >
> > > [1] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/String.20manipulation.20in.20kernel.20Rust
> > >
> > > Signed-off-by: Danilo Krummrich <[email protected]>
> > > ---
> > > Changes in V3:
> > > - add an `impl DerefMut for CString`, such that these functions can be defined
> > > for `CStr` as `&mut self` and still be called on a `CString`
> > > Changes in V2:
> > > - expand commit message mentioning the use case
> > > - expand function doc comments to match the ones from Rust's stdlib
> > > - rename to_* to make_* and add the actual to_* implementations
> > > ---
> > > rust/kernel/str.rs | 81 +++++++++++++++++++++++++++++++++++++++++++++-
> > > 1 file changed, 80 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
> > > index 7d848b83add4..02d6e510b852 100644
> > > --- a/rust/kernel/str.rs
> > > +++ b/rust/kernel/str.rs
> > > @@ -5,7 +5,7 @@
> > > use alloc::alloc::AllocError;
> > > use alloc::vec::Vec;
> > > use core::fmt::{self, Write};
> > > -use core::ops::{self, Deref, Index};
> > > +use core::ops::{self, Deref, DerefMut, Index};
> > > use crate::{
> > > bindings,
> > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > unsafe { core::mem::transmute(bytes) }
> > > }
> > > + /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > + /// additional checks.
> > > + ///
> > > + /// # Safety
> > > + ///
> > > + /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > + /// `NUL` byte (or the string will be truncated).
> > > + #[inline]
> > > + pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > + // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > + unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> >
> > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > I think the dereference (or reborrow) is only safe if `CStr` is
> > `#[repr(transparent)]. I.e.
> >
> > #[repr(transparent)]
> > pub struct CStr([u8]);
> >
> > with that you can implement the function as (you can still use `cast()`
> > implementation, but I sometimes find `transmute` is more simple).
> >
> > pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > // safe to do, and per the function safety requirement, `bytes`
> > // is a valid `CStr`.
> > unsafe { core::mem::transmute(bytes) }
> > }
> >
> > but this is just my thought, better wait for others' feedback as well.
>
> Transmuting references is generally frowned upon. It's better to use a
> pointer cast.
>

Ok, but honestly, I don't think the pointer casting is better ;-) What
wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
adding two levels of pointer casting is kinda noise. (Also
`from_bytes_with_nul` uses `transmute` as well).

> As for .cast() vs the `as` operator, I'm not sure you can use .cast() in
> this case since the pointers are unsized. So you might have to use `as`
> instead.
>

You're right, that's a bit unfortunate..

Regards,
Boqun

> Alice

2024-02-15 10:01:55

by Alice Ryhl

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On Thu, Feb 15, 2024 at 2:18 AM Boqun Feng <[email protected]> wrote:
>
> On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> > On 2/14/24 20:27, Boqun Feng wrote:
> > > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > > --- a/rust/kernel/str.rs
> > > > +++ b/rust/kernel/str.rs
> > > > @@ -5,7 +5,7 @@
> > > > use alloc::alloc::AllocError;
> > > > use alloc::vec::Vec;
> > > > use core::fmt::{self, Write};
> > > > -use core::ops::{self, Deref, Index};
> > > > +use core::ops::{self, Deref, DerefMut, Index};
> > > > use crate::{
> > > > bindings,
> > > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > > unsafe { core::mem::transmute(bytes) }
> > > > }
> > > > + /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > > + /// additional checks.
> > > > + ///
> > > > + /// # Safety
> > > > + ///
> > > > + /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > > + /// `NUL` byte (or the string will be truncated).
> > > > + #[inline]
> > > > + pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > + // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > > + unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > >
> > > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > > I think the dereference (or reborrow) is only safe if `CStr` is
> > > `#[repr(transparent)]. I.e.
> > >
> > > #[repr(transparent)]
> > > pub struct CStr([u8]);
> > >
> > > with that you can implement the function as (you can still use `cast()`
> > > implementation, but I sometimes find `transmute` is more simple).
> > >
> > > pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > > // safe to do, and per the function safety requirement, `bytes`
> > > // is a valid `CStr`.
> > > unsafe { core::mem::transmute(bytes) }
> > > }
> > >
> > > but this is just my thought, better wait for others' feedback as well.
> >
> > Transmuting references is generally frowned upon. It's better to use a
> > pointer cast.
> >
>
> Ok, but honestly, I don't think the pointer casting is better ;-) What
> wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
> adding two levels of pointer casting is kinda noise. (Also
> `from_bytes_with_nul` uses `transmute` as well).

Here's my logic for preferring pointer casts: Transmute raises
questions about the layout of fat pointers, whereas pointer casts are
obviously okay.

Alice

2024-02-15 16:52:06

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On Thu, Feb 15, 2024 at 10:38:07AM +0100, Alice Ryhl wrote:
> On Thu, Feb 15, 2024 at 2:18 AM Boqun Feng <[email protected]> wrote:
> >
> > On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> > > On 2/14/24 20:27, Boqun Feng wrote:
> > > > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > > > --- a/rust/kernel/str.rs
> > > > > +++ b/rust/kernel/str.rs
> > > > > @@ -5,7 +5,7 @@
> > > > > use alloc::alloc::AllocError;
> > > > > use alloc::vec::Vec;
> > > > > use core::fmt::{self, Write};
> > > > > -use core::ops::{self, Deref, Index};
> > > > > +use core::ops::{self, Deref, DerefMut, Index};
> > > > > use crate::{
> > > > > bindings,
> > > > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > > > unsafe { core::mem::transmute(bytes) }
> > > > > }
> > > > > + /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > > > + /// additional checks.
> > > > > + ///
> > > > > + /// # Safety
> > > > > + ///
> > > > > + /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > > > + /// `NUL` byte (or the string will be truncated).
> > > > > + #[inline]
> > > > > + pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > > + // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > > > + unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > > >
> > > > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > > > I think the dereference (or reborrow) is only safe if `CStr` is
> > > > `#[repr(transparent)]. I.e.
> > > >
> > > > #[repr(transparent)]
> > > > pub struct CStr([u8]);
> > > >
> > > > with that you can implement the function as (you can still use `cast()`
> > > > implementation, but I sometimes find `transmute` is more simple).
> > > >
> > > > pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > > > // safe to do, and per the function safety requirement, `bytes`
> > > > // is a valid `CStr`.
> > > > unsafe { core::mem::transmute(bytes) }
> > > > }
> > > >
> > > > but this is just my thought, better wait for others' feedback as well.
> > >
> > > Transmuting references is generally frowned upon. It's better to use a
> > > pointer cast.
> > >
> >
> > Ok, but honestly, I don't think the pointer casting is better ;-) What
> > wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
> > adding two levels of pointer casting is kinda noise. (Also
> > `from_bytes_with_nul` uses `transmute` as well).
>
> Here's my logic for preferring pointer casts: Transmute raises
> questions about the layout of fat pointers, whereas pointer casts are
> obviously okay.
>

But in this case, eventually you need to worry about fat pointer layout
when you dereference the `*mut CStr`, right? In other words, the
dereference is only safe if `*mut [u8]` has the same fat pointer layout
as `*mut CStr`. I prefer to transmute here because it's a newtype
paradigm, and transmute kinda makes that clear.

Regards,
Boqun

> Alice

2024-02-16 09:14:32

by Alice Ryhl

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On Thu, Feb 15, 2024 at 5:51 PM Boqun Feng <[email protected]> wrote:
>
> On Thu, Feb 15, 2024 at 10:38:07AM +0100, Alice Ryhl wrote:
> > On Thu, Feb 15, 2024 at 2:18 AM Boqun Feng <[email protected]> wrote:
> > >
> > > On Wed, Feb 14, 2024 at 08:59:06PM +0100, Alice Ryhl wrote:
> > > > On 2/14/24 20:27, Boqun Feng wrote:
> > > > > On Wed, Feb 14, 2024 at 06:24:10PM +0100, Danilo Krummrich wrote:
> > > > > > --- a/rust/kernel/str.rs
> > > > > > +++ b/rust/kernel/str.rs
> > > > > > @@ -5,7 +5,7 @@
> > > > > > use alloc::alloc::AllocError;
> > > > > > use alloc::vec::Vec;
> > > > > > use core::fmt::{self, Write};
> > > > > > -use core::ops::{self, Deref, Index};
> > > > > > +use core::ops::{self, Deref, DerefMut, Index};
> > > > > > use crate::{
> > > > > > bindings,
> > > > > > @@ -143,6 +143,19 @@ pub const fn from_bytes_with_nul(bytes: &[u8]) -> Result<&Self, CStrConvertError
> > > > > > unsafe { core::mem::transmute(bytes) }
> > > > > > }
> > > > > > + /// Creates a mutable [`CStr`] from a `[u8]` without performing any
> > > > > > + /// additional checks.
> > > > > > + ///
> > > > > > + /// # Safety
> > > > > > + ///
> > > > > > + /// `bytes` *must* end with a `NUL` byte, and should only have a single
> > > > > > + /// `NUL` byte (or the string will be truncated).
> > > > > > + #[inline]
> > > > > > + pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > > > + // SAFETY: Properties of `bytes` guaranteed by the safety precondition.
> > > > > > + unsafe { &mut *(bytes as *mut [u8] as *mut CStr) }
> > > > >
> > > > > First `.cast::<[u8]>().cast::<CStr>()` is preferred than `as`. Besides,
> > > > > I think the dereference (or reborrow) is only safe if `CStr` is
> > > > > `#[repr(transparent)]. I.e.
> > > > >
> > > > > #[repr(transparent)]
> > > > > pub struct CStr([u8]);
> > > > >
> > > > > with that you can implement the function as (you can still use `cast()`
> > > > > implementation, but I sometimes find `transmute` is more simple).
> > > > >
> > > > > pub const unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
> > > > > // SAFETY: `CStr` is transparent to `[u8]`, so the transmute is
> > > > > // safe to do, and per the function safety requirement, `bytes`
> > > > > // is a valid `CStr`.
> > > > > unsafe { core::mem::transmute(bytes) }
> > > > > }
> > > > >
> > > > > but this is just my thought, better wait for others' feedback as well.
> > > >
> > > > Transmuting references is generally frowned upon. It's better to use a
> > > > pointer cast.
> > > >
> > >
> > > Ok, but honestly, I don't think the pointer casting is better ;-) What
> > > wants to be done here is simply converting a `&mut [u8]` to `&mut CStr`,
> > > adding two levels of pointer casting is kinda noise. (Also
> > > `from_bytes_with_nul` uses `transmute` as well).
> >
> > Here's my logic for preferring pointer casts: Transmute raises
> > questions about the layout of fat pointers, whereas pointer casts are
> > obviously okay.
> >
>
> But in this case, eventually you need to worry about fat pointer layout
> when you dereference the `*mut CStr`, right? In other words, the
> dereference is only safe if `*mut [u8]` has the same fat pointer layout
> as `*mut CStr`. I prefer to transmute here because it's a newtype
> paradigm, and transmute kinda makes that clear.

No, if the `*mut CStr` and `*mut [u8]` types disagree on whether the
data or vtable pointer is first in the layout, then an as cast should
swap them.

The question of whether their vtables (well I guess it's just a length
in this case) are compatible is separate.

Alice

2024-02-16 16:53:54

by Alice Ryhl

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

> + pub fn make_ascii_lowercase(&mut self) {
> + self.0.make_ascii_lowercase();
> + }

It's important to note here that this doesn't remove or introduce NUL
bytes.

pub fn make_ascii_lowercase(&mut self) {
// INVARIANT: This doesn't introduce or remove NUL bytes in the c
// string.
self.0.make_ascii_lowercase();
}

Ditto for make_ascii_uppercase. (But not the to_* methods.)

> + /// Returns a copy of this [`CString`] where each character is mapped to its
> + /// ASCII lower case equivalent.
> + ///
> + /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
> + /// but non-ASCII letters are unchanged.
> + ///
> + /// To lowercase the value in-place, use [`make_ascii_lowercase`].
> + ///
> + /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
> + pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
> + let mut s = (*self).to_cstring()?;
> +
> + s.make_ascii_lowercase();
> +
> + return Ok(s);
> + }
> +
> + /// Returns a copy of this [`CString`] where each character is mapped to its
> + /// ASCII upper case equivalent.
> + ///
> + /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
> + /// but non-ASCII letters are unchanged.
> + ///
> + /// To uppercase the value in-place, use [`make_ascii_uppercase`].
> + ///
> + /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
> + pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
> + let mut s = (*self).to_cstring()?;
> +
> + s.make_ascii_uppercase();
> +
> + return Ok(s);
> + }

Please move these to `CStr` as well.

> +impl DerefMut for CString {
> + fn deref_mut(&mut self) -> &mut Self::Target {
> + unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
> + }
> +}

Needs a safety comment.

impl DerefMut for CString {
fn deref_mut(&mut self) -> &mut Self::Target {
// SAFETY: A `CString` is always NUL-terminated and contains no
// other NUL bytes.
unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
}
}

Alice

2024-02-16 17:14:02

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On 2/16/24 17:53, Alice Ryhl wrote:
>> + pub fn make_ascii_lowercase(&mut self) {
>> + self.0.make_ascii_lowercase();
>> + }
>
> It's important to note here that this doesn't remove or introduce NUL
> bytes.
>
> pub fn make_ascii_lowercase(&mut self) {
> // INVARIANT: This doesn't introduce or remove NUL bytes in the c
> // string.
> self.0.make_ascii_lowercase();
> }
>
> Ditto for make_ascii_uppercase. (But not the to_* methods.)
>
>> + /// Returns a copy of this [`CString`] where each character is mapped to its
>> + /// ASCII lower case equivalent.
>> + ///
>> + /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z',
>> + /// but non-ASCII letters are unchanged.
>> + ///
>> + /// To lowercase the value in-place, use [`make_ascii_lowercase`].
>> + ///
>> + /// [`make_ascii_lowercase`]: str::make_ascii_lowercase
>> + pub fn to_ascii_lowercase(&self) -> Result<CString, AllocError> {
>> + let mut s = (*self).to_cstring()?;
>> +
>> + s.make_ascii_lowercase();
>> +
>> + return Ok(s);
>> + }
>> +
>> + /// Returns a copy of this [`CString`] where each character is mapped to its
>> + /// ASCII upper case equivalent.
>> + ///
>> + /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
>> + /// but non-ASCII letters are unchanged.
>> + ///
>> + /// To uppercase the value in-place, use [`make_ascii_uppercase`].
>> + ///
>> + /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
>> + pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
>> + let mut s = (*self).to_cstring()?;
>> +
>> + s.make_ascii_uppercase();
>> +
>> + return Ok(s);
>> + }
>
> Please move these to `CStr` as well.

That would result into two copies if I actually want a CString, wouldn't it?

Also, what would be the use case? And even if someone wants to have a CStr
again, couldn't we just deref the resulting CString?

- Danilo

>
>> +impl DerefMut for CString {
>> + fn deref_mut(&mut self) -> &mut Self::Target {
>> + unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
>> + }
>> +}
>
> Needs a safety comment.
>
> impl DerefMut for CString {
> fn deref_mut(&mut self) -> &mut Self::Target {
> // SAFETY: A `CString` is always NUL-terminated and contains no
> // other NUL bytes.
> unsafe { CStr::from_bytes_with_nul_unchecked_mut(&mut *self.buf) }
> }
> }
>
> Alice
>


2024-02-16 19:26:04

by Alice Ryhl

[permalink] [raw]
Subject: Re: [PATCH v3] rust: str: add {make,to}_{upper,lower}case() to CString

On 2/16/24 18:11, Danilo Krummrich wrote:
> On 2/16/24 17:53, Alice Ryhl wrote:
>>> +    /// Returns a copy of this [`CString`] where each character is
>>> mapped to its
>>> +    /// ASCII upper case equivalent.
>>> +    ///
>>> +    /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
>>> +    /// but non-ASCII letters are unchanged.
>>> +    ///
>>> +    /// To uppercase the value in-place, use [`make_ascii_uppercase`].
>>> +    ///
>>> +    /// [`make_ascii_uppercase`]: str::make_ascii_uppercase
>>> +    pub fn to_ascii_uppercase(&self) -> Result<CString, AllocError> {
>>> +        let mut s = (*self).to_cstring()?;
>>> +
>>> +        s.make_ascii_uppercase();
>>> +
>>> +        return Ok(s);
>>> +    }
>>
>> Please move these to `CStr` as well.
>
> That would result into two copies if I actually want a CString, wouldn't
> it?
>
> Also, what would be the use case? And even if someone wants to have a CStr
> again, couldn't we just deref the resulting CString?

To clarify, I want you to move it to the `impl CStr` block. That changes
the type of the `self` argument. I don't want you to change the return
type - that should still be `CString`.

Currently, if I have a `&CStr` and I want an uppercase `CString`, I
can't do that with this method.

Alice