2022-08-06 10:50:31

by Konstantin Shelekhin

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

> +unsafe impl GlobalAlloc for KernelAllocator {
> +    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
> +        // `krealloc()` is used instead of `kmalloc()` because the latter is
> +        // an inline function and cannot be bound to as a result.
> +        unsafe { bindings::krealloc(ptr::null(), layout.size(), bindings::GFP_KERNEL) as *mut u8 }
> +    }
> +
> +    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
> +        unsafe {
> +            bindings::kfree(ptr as *const core::ffi::c_void);
> +        }
> +    }
> +}

I sense possible problems here. It's common for kernel code to pass
flags during memory allocations.

For example:

struct bio *bio;

for (...) {
        bio = bio_alloc_bioset(bdev, nr_vecs, opf, GFP_NOIO, bs);
        if (!bio)
                return -ENOMEM;
}

Without GFP_NOIO we can run into a deadlock, because the kernel will try
to give us free memory by flushing the dirty pages, but we need that
memory to actually do the flushing, and boom, deadlock.

Or we can be allocating some structs under spinlock (yeah, that happens too):

struct efc_vport *vport;

spin_lock_irqsave(...);
vport = kzalloc(sizeof(*vport), GFP_ATOMIC);
if (!vport) {
        spin_unlock_irqrestore(...);
        return NULL;
}
spin_unlock_irqrestore(...);

The same can (and probably will) happen with, e.g., Vec elements. So some
form of flags passing should be supported in the try_* variants:

let mut vec = Vec::try_new(GFP_ATOMIC)?;

vec.try_push(GFP_ATOMIC, 1)?;
vec.try_push(GFP_ATOMIC, 2)?;
vec.try_push(GFP_ATOMIC, 3)?;
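
To make the shape concrete, the signatures could look roughly like the
sketch below (the Flags and AllocError types are made-up stand-ins for
gfp_t and an allocation failure, not existing kernel crate types):

    // Illustrative sketch only.
    #[derive(Clone, Copy)]
    pub struct Flags(pub u32);   // would wrap gfp_t
    pub struct AllocError;       // allocation failure is returned, not panicked on

    // Every allocating operation takes the flags explicitly.
    pub trait VecExt<T>: Sized {
        fn try_new(flags: Flags) -> Result<Self, AllocError>;
        fn try_push(&mut self, flags: Flags, item: T) -> Result<(), AllocError>;
    }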


2022-08-06 11:54:22

by Miguel Ojeda

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Sat, Aug 6, 2022 at 12:25 PM Konstantin Shelekhin
<[email protected]> wrote:
>
> I sense possible problems here. It's common for kernel code to pass
> flags during memory allocations.

Yes, of course. We will support this, but how exactly it will look,
to what extent upstream Rust's `alloc` could support our use cases,
etc. has been under discussion for a long time.

For instance, see https://github.com/Rust-for-Linux/linux/pull/815 for
a potential extension trait approach with no allocator carried on the
type that Andreas wrote after a discussion in the last informal call:

let a = Box::try_new_atomic(101)?;
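
To give an idea of the shape (this is only a rough sketch of the idea,
not the code in that PR; the trait name and AllocError are illustrative),
the extension trait is roughly:

    // Sketch only -- names and details are illustrative.
    pub struct AllocError;

    pub trait BoxExt<T>: Sized {
        /// Allocates with GFP_ATOMIC semantics; failure is returned as an error.
        fn try_new_atomic(value: T) -> Result<Self, AllocError>;
    }

    impl<T> BoxExt<T> for Box<T> {
        fn try_new_atomic(value: T) -> Result<Self, AllocError> {
            // The real implementation would go through the kernel allocator
            // with GFP_ATOMIC; Box::new() here is only a placeholder.
            Ok(Box::new(value))
        }
    }

Note how no allocator (and no flags) appears in the `Box<T>` type itself,
only in the constructor.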

Cheers,
Miguel

2022-08-06 12:17:52

by Konstantin Shelekhin

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Sat, Aug 06, 2022 at 01:22:52PM +0200, Miguel Ojeda wrote:
> > I sense possible problems here. It's common for kernel code to pass
> > flags during memory allocations.
>
> Yes, of course. We will support this, but how exactly it will look,
> to what extent upstream Rust's `alloc` could support our use cases,
> etc. has been under discussion for a long time.
>
> For instance, see https://github.com/Rust-for-Linux/linux/pull/815 for
> a potential extension trait approach with no allocator carried on the
> type that Andreas wrote after a discussion in the last informal call:
>
> let a = Box::try_new_atomic(101)?;

IMO it's just easier to always pass flags like this:

let a = Box::try_new(GFP_KERNEL | GFP_DMA, 101)?;

But as long as some form of allocate_with_flags() is present in the API,
that's just what we need.

P.S. Thanks for the quick reply!

2022-08-06 15:22:24

by Matthew Wilcox

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Sat, Aug 06, 2022 at 01:22:52PM +0200, Miguel Ojeda wrote:
> On Sat, Aug 6, 2022 at 12:25 PM Konstantin Shelekhin
> <[email protected]> wrote:
> >
> > I sense possible problems here. It's common for kernel code to pass
> > flags during memory allocations.
>
> Yes, of course. We will support this, but how exactly it will look,
> to what extent upstream Rust's `alloc` could support our use cases,
> etc. has been under discussion for a long time.
>
> For instance, see https://github.com/Rust-for-Linux/linux/pull/815 for
> a potential extension trait approach with no allocator carried on the
> type that Andreas wrote after a discussion in the last informal call:
>
> let a = Box::try_new_atomic(101)?;

Something I've been wondering about for a while is ...

struct task_struct {
        ...
+       gfp_t gfp_flags;
        ...
};

We've already done some work towards this with the scoped allocation
API for NOIO and NOFS, but having spin_lock() turn current->gfp_flags
into GFP_ATOMIC might not be the worst idea in the world.

2022-09-19 14:14:06

by Wedson Almeida Filho

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Sat, Aug 06, 2022 at 03:57:35PM +0100, Matthew Wilcox wrote:
> On Sat, Aug 06, 2022 at 01:22:52PM +0200, Miguel Ojeda wrote:
> > On Sat, Aug 6, 2022 at 12:25 PM Konstantin Shelekhin
> > <[email protected]> wrote:
> > >
> > > I sense possible problems here. It's common for kernel code to pass
> > > flags during memory allocations.
> >
> > Yes, of course. We will support this, but how exactly it will look,
> > to what extent upstream Rust's `alloc` could support our use cases,
> > etc. has been under discussion for a long time.
> >
> > For instance, see https://github.com/Rust-for-Linux/linux/pull/815 for
> > a potential extension trait approach with no allocator carried on the
> > type that Andreas wrote after a discussion in the last informal call:
> >
> > let a = Box::try_new_atomic(101)?;
>
> Something I've been wondering about for a while is ...
>
> struct task_struct {
>         ...
> +       gfp_t gfp_flags;
>         ...
> };

For GFP_ATOMIC, we could use preempt_count except that it isn't always
enabled. Conveniently, it is already separated out into its own config.
How do people feel about removing CONFIG_PREEMPT_COUNT and having the
count always enabled?

We would then have a way to reliably detect when we are in atomic
context and we could catch other scenarios beyond allocation. For
example, I recently noticed that the following code (as a minimal
reproduction) does _not_ lead to a deadlock when CONFIG_PREEMPT=n:

rcu_read_lock();
synchronize_rcu();

Boqun explained to me that the reason is that synchronize_rcu() is not
supposed to be called from an rcu read-side critical section, and the
current implementation takes advantage of this fact; plus there is no
way to detect whether we're in atomic context. This is all well and
good, but if one makes this mistake, the result is a potential
use-after-free. Always having preempt_count would allow us to behave
differently here -- this is another conversation, but at least we'd
have the option to choose what to do. (And it seems to me that
detecting this and deadlocking or BUG'ing would be preferable to the
alternative [CC'ing Kees].)

Anyway, any objections to a patch series that would amount to the changes
below?

Thanks,
-Wedson

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 90fbe4a3f9c8..c88c932dba5b 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -224,7 +224,6 @@ THUMB( fpreg .req r7 )
/*
* Increment/decrement the preempt count.
*/
-#ifdef CONFIG_PREEMPT_COUNT
.macro inc_preempt_count, ti, tmp
ldr \tmp, [\ti, #TI_PREEMPT] @ get preempt count
add \tmp, \tmp, #1 @ increment it
@@ -241,16 +240,6 @@ THUMB( fpreg .req r7 )
get_thread_info \ti
dec_preempt_count \ti, \tmp
.endm
-#else
- .macro inc_preempt_count, ti, tmp
- .endm
-
- .macro dec_preempt_count, ti, tmp
- .endm
-
- .macro dec_preempt_count_ti, ti, tmp
- .endm
-#endif

#define USERL(l, x...) \
9999: x; \
diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
index d2b4ac06e4ed..ecdeaa4d5190 100644
--- a/arch/arm/kernel/iwmmxt.S
+++ b/arch/arm/kernel/iwmmxt.S
@@ -95,9 +95,7 @@ ENTRY(iwmmxt_task_enable)
mov r2, r2 @ cpwait
bl concan_save

-#ifdef CONFIG_PREEMPT_COUNT
get_thread_info r10
-#endif
4: dec_preempt_count r10, r3
ret r9 @ normal exit from exception

diff --git a/arch/xtensa/kernel/entry.S b/arch/xtensa/kernel/entry.S
index 272fff587907..8ad94e13d0f0 100644
--- a/arch/xtensa/kernel/entry.S
+++ b/arch/xtensa/kernel/entry.S
@@ -832,7 +832,7 @@ ENTRY(debug_exception)
* preemption if we have HW breakpoints to preserve DEBUGCAUSE.DBNUM
* meaning.
*/
-#if defined(CONFIG_PREEMPT_COUNT) && defined(CONFIG_HAVE_HW_BREAKPOINT)
+#if defined(CONFIG_HAVE_HW_BREAKPOINT)
GET_THREAD_INFO(a2, a1)
l32i a3, a2, TI_PRE_COUNT
addi a3, a3, 1
diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 47e845353ffa..811f34dbb80b 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -22,7 +22,6 @@ config DRM_I915_DEBUG
depends on EXPERT # only for developers
depends on !COMPILE_TEST # never built by robots
select DEBUG_FS
- select PREEMPT_COUNT
select I2C_CHARDEV
select STACKDEPOT
select DRM_DP_AUX_CHARDEV
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index c10d68cdc3ca..93686bdb9707 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -293,8 +293,7 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
(Wmax))
#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 10, 1000)

-/* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#if defined(CONFIG_DRM_I915_DEBUG)
# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
#else
# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
index bbc4730a6505..1e03d54b0b6f 100644
--- a/include/linux/bit_spinlock.h
+++ b/include/linux/bit_spinlock.h
@@ -90,10 +90,8 @@ static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
{
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
return test_bit(bitnum, addr);
-#elif defined CONFIG_PREEMPT_COUNT
- return preempt_count();
#else
- return 1;
+ return preempt_count();
#endif
}

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1f1099dac3f0..a05e40dccb0a 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -600,16 +600,14 @@ do { \

#define lockdep_assert_preemption_enabled() \
do { \
- WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
- __lockdep_enabled && \
+ WARN_ON_ONCE(__lockdep_enabled && \
(preempt_count() != 0 || \
!this_cpu_read(hardirqs_enabled))); \
} while (0)

#define lockdep_assert_preemption_disabled() \
do { \
- WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
- __lockdep_enabled && \
+ WARN_ON_ONCE(__lockdep_enabled && \
(preempt_count() == 0 && \
this_cpu_read(hardirqs_enabled))); \
} while (0)
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 2e677e6ad09f..4cddcb17489a 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -271,9 +271,7 @@ static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
* context, so (on !SMP) we only need preemption to be disabled
* and TINY_RCU does that for us.
*/
-# ifdef CONFIG_PREEMPT_COUNT
VM_BUG_ON(!in_atomic() && !irqs_disabled());
-# endif
VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
folio_ref_add(folio, count);
#else
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index b4381f255a5c..77da73007375 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -56,8 +56,7 @@
#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)

/*
- * Disable preemption until the scheduler is running -- use an unconditional
- * value so that it also works on !PREEMPT_COUNT kernels.
+ * Disable preemption until the scheduler is running.
*
* Reset by start_kernel()->sched_init()->init_idle()->init_idle_preempt_count().
*/
@@ -69,7 +68,6 @@
*
* preempt_count() == 2*PREEMPT_DISABLE_OFFSET
*
- * Note: PREEMPT_DISABLE_OFFSET is 0 for !PREEMPT_COUNT kernels.
* Note: See finish_task_switch().
*/
#define FORK_PREEMPT_COUNT (2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
@@ -133,11 +131,7 @@ static __always_inline unsigned char interrupt_context_level(void)
/*
* The preempt_count offset after preempt_disable();
*/
-#if defined(CONFIG_PREEMPT_COUNT)
-# define PREEMPT_DISABLE_OFFSET PREEMPT_OFFSET
-#else
-# define PREEMPT_DISABLE_OFFSET 0
-#endif
+#define PREEMPT_DISABLE_OFFSET PREEMPT_OFFSET

/*
* The preempt_count offset after spin_lock()
@@ -154,7 +148,7 @@ static __always_inline unsigned char interrupt_context_level(void)
*
* spin_lock_bh()
*
- * Which need to disable both preemption (CONFIG_PREEMPT_COUNT) and
+ * Which need to disable both preemption and
* softirqs, such that unlock sequences of:
*
* spin_unlock();
@@ -196,8 +190,6 @@ extern void preempt_count_sub(int val);
#define preempt_count_inc() preempt_count_add(1)
#define preempt_count_dec() preempt_count_sub(1)

-#ifdef CONFIG_PREEMPT_COUNT
-
#define preempt_disable() \
do { \
preempt_count_inc(); \
@@ -263,27 +255,6 @@ do { \
__preempt_count_dec(); \
} while (0)

-#else /* !CONFIG_PREEMPT_COUNT */
-
-/*
- * Even if we don't have any preemption, we need preempt disable/enable
- * to be barriers, so that we don't have things like get_user/put_user
- * that can cause faults and scheduling migrate into our preempt-protected
- * region.
- */
-#define preempt_disable() barrier()
-#define sched_preempt_enable_no_resched() barrier()
-#define preempt_enable_no_resched() barrier()
-#define preempt_enable() barrier()
-#define preempt_check_resched() do { } while (0)
-
-#define preempt_disable_notrace() barrier()
-#define preempt_enable_no_resched_notrace() barrier()
-#define preempt_enable_notrace() barrier()
-#define preemptible() 0
-
-#endif /* CONFIG_PREEMPT_COUNT */
-
#ifdef MODULE
/*
* Modules have no business playing preemption tricks.
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 47e5d374c7eb..8761b85c7874 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -225,9 +225,6 @@ static inline bool pagefault_disabled(void)
*
* This function should only be used by the fault handlers. Other users should
* stick to pagefault_disabled().
- * Please NEVER use preempt_disable() to disable the fault handler. With
- * !CONFIG_PREEMPT_COUNT, this is like a NOP. So the handler won't be disabled.
- * in_atomic() will report different values based on !CONFIG_PREEMPT_COUNT.
*/
#define faulthandler_disabled() (pagefault_disabled() || in_atomic())

diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index c2f1fd95a821..3d2f7ded0fee 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -86,12 +86,8 @@ config PREEMPT_RT

endchoice

-config PREEMPT_COUNT
- bool
-
config PREEMPTION
bool
- select PREEMPT_COUNT

config PREEMPT_DYNAMIC
bool "Preemption behaviour defined on boot"
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0..34d395f007a7 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -122,7 +122,6 @@ config RCU_STRICT_GRACE_PERIOD
bool "Provide debug RCU implementation with short grace periods"
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
default n
- select PREEMPT_COUNT if PREEMPT=n
help
Select this option to build an RCU variant that is strict about
grace periods, making them as short as it can. This limits
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..c21a83e7b534 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2478,7 +2478,7 @@ static __latent_entropy void rcu_core(void)
WARN_ON_ONCE(!rdp->beenonline);

/* Report any deferred quiescent states if preemption enabled. */
- if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) {
+ if (!(preempt_count() & PREEMPT_MASK)) {
rcu_preempt_deferred_qs(current);
} else if (rcu_preempt_need_deferred_qs(current)) {
set_tsk_need_resched(current);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 438ecae6bd7e..432aa6e6cff7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -29,7 +29,7 @@ static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
(IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_held()) ||
rcu_lockdep_is_held_nocb(rdp) ||
(rdp == this_cpu_ptr(&rcu_data) &&
- !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) ||
+ !preemptible()) ||
rcu_current_is_nocb_kthread(rdp)),
"Unsafe read of RCU_NOCB offloaded state"
);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee28253c9ac0..938f41569ae9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5111,8 +5111,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
* finish_task_switch() for details.
*
* finish_task_switch() will drop rq->lock() and lower preempt_count
- * and the preempt_enable() will end up enabling preemption (on
- * PREEMPT_COUNT kernels).
+ * and the preempt_enable() will end up enabling preemption.
*/

finish_task_switch(prev);
@@ -9901,9 +9900,6 @@ void __cant_sleep(const char *file, int line, int preempt_offset)
if (irqs_disabled())
return;

- if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
- return;
-
if (preempt_count() > preempt_offset)
return;

@@ -9933,9 +9929,6 @@ void __cant_migrate(const char *file, int line)
if (is_migration_disabled(current))
return;

- if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
- return;
-
if (preempt_count() > 0)
return;

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index bcbe60d6c80c..3dd4c8ccc7b2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1230,7 +1230,6 @@ config PROVE_LOCKING
select DEBUG_RWSEMS
select DEBUG_WW_MUTEX_SLOWPATH
select DEBUG_LOCK_ALLOC
- select PREEMPT_COUNT if !ARCH_NO_PREEMPT
select TRACE_IRQFLAGS
default n
help
@@ -1431,7 +1430,6 @@ config DEBUG_LOCKDEP

config DEBUG_ATOMIC_SLEEP
bool "Sleep inside atomic section checking"
- select PREEMPT_COUNT
depends on DEBUG_KERNEL
depends on !ARCH_NO_PREEMPT
help
diff --git a/mm/slub.c b/mm/slub.c
index 862dbd9af4f5..b01767226476 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3106,19 +3106,15 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
{
void *p;

-#ifdef CONFIG_PREEMPT_COUNT
/*
* We may have been preempted and rescheduled on a different
* cpu before disabling preemption. Need to reload cpu area
* pointer.
*/
c = slub_get_cpu_ptr(s->cpu_slab);
-#endif
-
p = ___slab_alloc(s, gfpflags, node, addr, c);
-#ifdef CONFIG_PREEMPT_COUNT
slub_put_cpu_ptr(s->cpu_slab);
-#endif
+
return p;
}

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-T b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-T
index c70cf0405f24..e332b9b4d8c3 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-T
@@ -9,4 +9,3 @@ CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_DEBUG_ATOMIC_SLEEP=y
-#CHECK#CONFIG_PREEMPT_COUNT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-U b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-U
index bc9eeabaa1b1..fac0047579c2 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-U
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-U
@@ -7,4 +7,3 @@ CONFIG_PREEMPT_DYNAMIC=n
CONFIG_RCU_TRACE=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TINY01 b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
index 0953c52fcfd7..8363b0b546b7 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TINY01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
@@ -11,4 +11,3 @@ CONFIG_RCU_TRACE=n
#CHECK#CONFIG_RCU_STALL_COMMON=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
index a75b16991a92..4c596905b6b2 100644
--- a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
+++ b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
@@ -4,7 +4,6 @@ This document gives a brief rationale for the TINY_RCU test cases.
Kconfig Parameters:

CONFIG_DEBUG_LOCK_ALLOC -- Do all three and none of the three.
-CONFIG_PREEMPT_COUNT
CONFIG_RCU_TRACE

The theory here is that randconfig testing will hit the other six possible
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index 42acb1a64ce1..9e851c80c5eb 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -42,7 +42,6 @@ CONFIG_64BIT

Used only to check CONFIG_RCU_FANOUT value, inspection suffices.

-CONFIG_PREEMPT_COUNT
CONFIG_PREEMPT_RCU

Redundant with CONFIG_PREEMPT, ignore.
diff --git a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
index 283d7103334f..d0d485d48649 100644
--- a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
+++ b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
@@ -8,7 +8,6 @@
#undef CONFIG_HOTPLUG_CPU
#undef CONFIG_MODULES
#undef CONFIG_NO_HZ_FULL_SYSIDLE
-#undef CONFIG_PREEMPT_COUNT
#undef CONFIG_PREEMPT_RCU
#undef CONFIG_PROVE_RCU
#undef CONFIG_RCU_NOCB_CPU

2022-09-19 17:12:19

by Linus Torvalds

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 7:07 AM Wedson Almeida Filho <[email protected]> wrote:
>
> For GFP_ATOMIC, we could use preempt_count except that it isn't always
> enabled. Conveniently, it is already separated out into its own config.
> How do people feel about removing CONFIG_PREEMPT_COUNT and having the
> count always enabled?
>
> We would then have a way to reliably detect when we are in atomic
> context [..]

No.

First off, it's not true. There are non-preempt atomic contexts too,
like interrupts disabled etc. Can you enumerate all those? Possibly.

But more importantly, doing "depending on context, I silently and
automatically do different things" is simply WRONG. Don't do it. It's
a disaster.

Doing that for *debugging* is fine. So having a

WARN_ON_ONCE(in_atomic_context());

is a perfectly fine debug check to find people who do bad bad things,
and we have lots of variations of that theme (ie might_sleep(), but
also things like lockdep_assert_held() and friends that assert other
constraints entirely).

But having *behavior changes* depending on context is a total
disaster. And that's invariably why people want this disgusting thing.

They want to do broken things like "I want to allocate memory, and I
don't want to care where I am, so I want the memory allocator to just
do the whole GFP_ATOMIC for me".

And that is FUNDAMENTALLY BROKEN.

If you want to allocate memory, and you don't want to care about what
context you are in, or whether you are holding spinlocks etc, then you
damn well shouldn't be doing kernel programming. Not in C, and not in
Rust.

It really is that simple. Contexts like this ("I am in a critical
region, I must not do memory allocation or use sleeping locks") are
*fundamental* to kernel programming. It has nothing to do with the
language, and everything to do with the problem space.

So don't go down this "let's have the allocator just know if you're in
an atomic context automatically" path. It's wrong. It's complete
garbage. It may generate kernel code that superficially "works", but
one that is fundamentally broken, and will fail and become unreliable
under memory pressure.

The thing is, when you do kernel programming, and you're holding a
spinlock and need to allocate memory, you generally shouldn't allocate
memory at all, you should go "Oh, maybe I need to do the allocation
*before* getting the lock".

And it's not just locks. It's also "I'm in a hardware interrupt", but
now the solution is fundamentally different. Maybe you still want to
do pre-allocation, but now you're in a driver interrupt and the
pre-allocation has to happen in another code sequence entirely,
because obviously the interrupt itself is asynchronous.

But more commonly, you just want to use GFP_ATOMIC, and go "ok, I know
the VM layer tries to keep a _limited_ set of pre-allocated buffers
around".

But it needs to be explicit, because that GFP_ATOMIC pool of
allocations really is very limited, and you as the allocator need to
make it *explicit* that yeah, now you're not just doing a random
allocation, you are doing one of these *special* allocations that will
eat into that very limited global pool of allocations.

So no, you cannot and MUST NOT have an allocator that silently just
dips into that special pool, without the user being aware or
requesting it.

That really is very very fundamental. Allocators that "just work" in
different contexts are broken garbage within the context of a kernel.

Please just accept this, and really *internalize* it. Because this
isn't actually just about allocators. Allocators may be one very
common special case of this kind of issue, and they come up quite
often as a result, but this whole "your code needs to *understand* the
special restrictions that the kernel is under" is something that is
quite fundamental in general.

It shows up in various other situations too, like "Oh, this code can
run in an interrupt handler" (which is *different* from the atomicity
of just "while holding a lock", because it implies a level of random
nesting that is very relevant for locking).

Or sometimes it's subtler things than just correctness, ie "I'm
running under a critical lock, so I must be very careful because there
are scalability issues". The code may *work* just fine, but things
like tasklist_lock are very very special.

So embrace that kernel requirement. Kernels are special. We do things,
and we have constraints that pretty much no other environment has.

It is often what makes kernel programming interesting - because it's
challenging. We have a very limited stack. We have some very direct
and deep interactions with the CPU, with things like IO and
interrupts. Some constraints are always there, and others are
context-dependent.

The whole "really know what context this code is running within" is
really important. You may want to write very explicit comments about
it.

And you definitely should always write your code knowing about it.

Linus

2022-09-19 18:12:18

by Linus Torvalds

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 9:09 AM Linus Torvalds
<[email protected]> wrote:
>
> The whole "really know what context this code is running within" is
> really important. You may want to write very explicit comments about
> it.

Side note: a corollary of this is that people should avoid "dynamic
context" things like the plague, because it makes for such pain when
the context isn't statically obvious.

So things like conditional locking should generally be avoided as much
as humanly possible. Either you take the lock or you don't - don't
write code where the lock context depends on some argument value or
flag, for example.

Code like this is fine:

    if (some_condition) {
        spin_lock(&mylock);
        xyz();
        spin_unlock(&mylock);
    }

because 'xyz()' is always run in the same context. But avoid patterns like

    if (some_condition)
        spin_lock(&mylock);
    xyz();
    if (same_condition)
        spin_unlock(&mylock);

where now 'xyz()' sometimes does something with the lock held, and
sometimes not. That way lies insanity.

Now, obviously, the context for helper functions (like the Rust kernel
crate is, pretty much by definition) depends on the context
of the callers of said helpers, so in that sense the above kind of
"sometimes in locked context, sometimes not" will always be the case.

So those kinds of helper functions will generally need to be either
insensitive to context and usable in all contexts (best), or
documented - and verified with debug code like 'might_sleep()' - that
they only run in certain contexts.

And then in the worst case there's a gfp_flag that says "you can only
do these kinds of allocations" or whatever, but even then you should
strive to never have other dynamic behavior (ie please try to avoid
behavior like having an "already locked" argument and then taking a
lock depending on that).

Because if you follow those rules, at least you can statically see the
context from a call chain (so, for example, the stack trace of an oops
will make the context unambiguous, because there's hopefully no lock
or interrupt disabling or similar that has some dynamic behavior like
in that second example of "xyz()").

Do we have places in the kernel that do conditional locking? Yes we
do. Examples like that second case do exist. It's bad. Sometimes you
can't avoid it. But you can always *strive* to avoid it, and to
minimize those kinds of "context depends on other things"
situations.

And we should strive very hard to make those kinds of contexts very
clear and explicit and not dynamic exactly because it's so important
in the kernel, and it has subtle implications wrt other locking, and
memory allocations.

Linus

2022-09-19 18:35:58

by Wedson Almeida Filho

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 10:20:52AM -0700, Linus Torvalds wrote:
> On Mon, Sep 19, 2022 at 9:09 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > The whole "really know what context this code is running within" is
> > really important. You may want to write very explicit comments about
> > it.
>
> Side note: a corollary of this is that people should avoid "dynamic
> context" things like the plague, because it makes for such pain when
> the context isn't statically obvious.

As you know, we're trying to guarantee the absence of undefined
behaviour for code written in Rust. And the context is _really_
important, so important that leaving it up to comments isn't enough.

I don't care as much about allocation flags as I do about sleeping in an
rcu read-side critical region. When CONFIG_PREEMPT=n, if some CPU makes
the mistake of sleeping between rcu_read_lock()/rcu_read_unlock(), RCU
will take that as a quiescent state, which may cause unsuspecting code
waiting for a grace period to wake up too early and potentially free
memory that is still in use, which is obviously undefined behaviour.

We generally have two routes to avoid undefined behaviour: detect at
compile time (and fail compilation) or at runtime (and stop things
before they go too far). The former, while feasible, would require some
static analysis or passing tokens as arguments to guarantee that we're
in a sleepable context when sleeping (all elided at compile time, so
zero-cost in terms of run-time performance), but would likely be
painful to program with.
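
For example, the token-passing approach could look roughly like the
sketch below (nothing like this exists today, the names are made up):

    // A zero-sized "proof of context" token.
    pub struct SleepableContext(());

    impl SleepableContext {
        /// # Safety
        ///
        /// Callers promise they are in process context: no spinlocks held,
        /// IRQs enabled, and not inside an RCU read-side critical section.
        pub unsafe fn assume() -> Self {
            SleepableContext(())
        }
    }

    /// Anything that may sleep demands the token, so the requirement is
    /// checked at compile time and the token vanishes from the generated code.
    pub fn blocking_work(_ctx: &SleepableContext) {
        // ... may sleep here ...
    }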

Always having preempt_count would allow us to detect such issues in RCU
at runtime (for both C and Rust) and prevent use-after-frees.

Do you have an opinion on the above?

Cheers,
-Wedson

2022-09-19 21:01:35

by Linus Torvalds

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 11:05 AM Wedson Almeida Filho
<[email protected]> wrote:
>
> As you know, we're trying to guarantee the absence of undefined
> behaviour for code written in Rust. And the context is _really_
> important, so important that leaving it up to comments isn't enough.

You need to realize that

(a) reality trumps fantasy

(b) kernel needs trump any Rust needs

And the *reality* is that there are no absolute guarantees. Ever. The
"Rust is safe" is not some kind of absolute guarantee of code safety.
Never has been. Anybody who believes that should probably re-take
their kindergarten year, and stop believing in the Easter bunny and
Santa Claus.

Even "safe" rust code in user space will do things like panic when
things go wrong (overflows, allocation failures, etc). If you don't
realize that that is NOT some kind of true safety, I don't know what
to say.

Not completing the operation at all, is *not* really any better than
getting the wrong answer, it's only more debuggable.

In the kernel, "panic and stop" is not an option (it's actively worse
than even the wrong answer, since it's really not debuggable), so the
kernel version of "panic" is "WARN_ON_ONCE()" and continue with the
wrong answer.

So this is something that I really *need* the Rust people to
understand. That whole reality of "safe" not being some absolute
thing, and the reality that the kernel side *requires* slightly
different rules than user space traditionally does.

> I don't care as much about allocation flags as I do about sleeping in an
> rcu read-side critical region. When CONFIG_PREEMPT=n, if some CPU makes
> the mistake of sleeping between rcu_read_lock()/rcu_read_unlock(), RCU
> will take that as a quiescent state, which may cause unsuspecting code
> waiting for a grace period to wake up too early and potentially free
> memory that is still in use, which is obviously undefined behaviour.

So?

You had a bug. Shit happens. We have a lot of debugging tools that
will give you a *HUGE* warning when said shit happens, including
sending automated reports to the distro maker. And then you fix the
bug.

Think of that "debugging tools give a huge warning" as being the
equivalent of std::panic in standard rust. Yes, the kernel will
continue (unless you have panic-on-warn set), because the kernel
*MUST* continue in order for that "report to upstream" to have a
chance of happening.

So it's technically a very different implementation from std::panic,
but you should basically see it as exactly that: a *technical*
difference, not a conceptual one. The rules for how the kernel deals
with bugs is just different, because we don't have core-files and
debuggers in the general case.

(And yes, you can have a kernel debugger, and you can just have the
WARN_ON_ONCE trigger the debugger, but think of all those billions of
devices that are in normal users' hands).

And yes, in certain configurations, even those warnings will be turned
off because the state tracking isn't done. Again, that's just reality.
You don't need to use those configurations yourself if you don't like
them, but that does *NOT* mean that you get to say "nobody else gets
to use those configurations either".

Deal with it.

Or, you know, if you can't deal with the rules that the kernel
requires, then just don't do kernel programming.

Because in the end it really is that simple. I really need you to
understand that Rust in the kernel is dependent on *kernel* rules. Not
some other random rules that exist elsewhere.

Linus

2022-09-19 23:12:19

by Wedson Almeida Filho

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 01:42:44PM -0700, Linus Torvalds wrote:
> On Mon, Sep 19, 2022 at 11:05 AM Wedson Almeida Filho
> <[email protected]> wrote:
> >
> > As you know, we're trying to guarantee the absence of undefined
> > behaviour for code written in Rust. And the context is _really_
> > important, so important that leaving it up to comments isn't enough.
>
> You need to realize that
>
> (a) reality trumps fantasy
>
> (b) kernel needs trump any Rust needs
>
> And the *reality* is that there are no absolute guarantees. Ever. The
> "Rust is safe" is not some kind of absolute guarantee of code safety.
> Never has been. Anybody who believes that should probably re-take
> their kindergarten year, and stop believing in the Easter bunny and
> Santa Claus.
>
> Even "safe" rust code in user space will do things like panic when
> things go wrong (overflows, allocation failures, etc). If you don't
> realize that that is NOT some kind of true safety, I don't know what
> to say.

No one is talking about absolute safety guarantees. I am talking about
specific ones that Rust makes: these are well-documented and formally
defined.

> Not completing the operation at all, is *not* really any better than
> getting the wrong answer, it's only more debuggable.
>
> In the kernel, "panic and stop" is not an option (it's actively worse
> than even the wrong answer, since it's really not debuggable), so the
> kernel version of "panic" is "WARN_ON_ONCE()" and continue with the
> wrong answer.
>
> So this is something that I really *need* the Rust people to
> understand. That whole reality of "safe" not being some absolute
> thing, and the reality that the kernel side *requires* slightly
> different rules than user space traditionally does.
>
> > I don't care as much about allocation flags as I do about sleeping in an
> > rcu read-side critical region. When CONFIG_PREEMPT=n, if some CPU makes
> > the mistake of sleeping between rcu_read_lock()/rcu_read_unlock(), RCU
> > will take that as a quiescent state, which may cause unsuspecting code
> > waiting for a grace period to wake up too early and potentially free
> > memory that is still in use, which is obviously undefined behaviour.
>
> So?
>
> You had a bug. Shit happens. We have a lot of debugging tools that
> will give you a *HUGE* warning when said shit happens, including
> sending automated reports to the distro maker. And then you fix the
> bug.
>
> Think of that "debugging tools give a huge warning" as being the
> equivalent of std::panic in standard rust. Yes, the kernel will
> continue (unless you have panic-on-warn set), because the kernel
> *MUST* continue in order for that "report to upstream" to have a
> chance of happening.
>
> So it's technically a very different implementation from std::panic,
> but you should basically see it as exactly that: a *technical*
> difference, not a conceptual one. The rules for how the kernel deals
> with bugs is just different, because we don't have core-files and
> debuggers in the general case.
>
> (And yes, you can have a kernel debugger, and you can just have the
> WARN_ON_ONCE trigger the debugger, but think of all those billions of
> devices that are in normal users' hands).
>
> And yes, in certain configurations, even those warnings will be turned
> off because the state tracking isn't done. Again, that's just reality.
> You don't need to use those configurations yourself if you don't like
> them, but that does *NOT* mean that you get to say "nobody else gets
> to use those configurations either".
>
> Deal with it.

While I disagree with some of what you write, the point is taken.

But I won't give up on Rust guarantees just yet, I'll try to find
ergonomic ways to enforce them at compile time.

Thanks,
-Wedson

2022-09-19 23:56:09

by Linus Torvalds

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 3:35 PM Wedson Almeida Filho <[email protected]> wrote:
>
> No one is talking about absolute safety guarantees. I am talking about
> specific ones that Rust makes: these are well-documented and formally
> defined.

If you cannot get over the fact that the kernel may have other
requirements that trump any language standards, we really can't work
together.

Those Rust rules may make sense in other environments. But the kernel
really does have hard requirements that you continue to limp along
even if some fundamental rule has been violated. Exactly because
there's often no separate environment outside the kernel that can deal
with it.

End result: a compiler - or language infrastructure - that says "my
rules are so ingrained that I cannot do that" is not one that is valid
for kernel work.

This is not really any different from the whole notion of "allocation
failures cannot panic" that Rust people seemed to readily understand
is a major kernel requirement, and that the kernel needed a graceful
failure return instead of a hard panic.

Also note that the kernel is perfectly willing to say "I will use
compiler flags that disable certain guarantees". We do it all the
time.

For example, the C standard has a lot of "the compiler is allowed to
make this assumption". And then we disagree with those, and so "kernel
C" is different.

For example, the standard says that dereferencing a NULL pointer is
undefined behavior, so a C compiler can see a dereference of a pointer
to be a guarantee that said pointer isn't NULL, and remove any
subsequent NULL pointer tests.

That turns out to be one of those "obviously true in a perfect world,
but problematic in a real world with bugs", and we tell the compiler
to not do that by passing it the '-fno-delete-null-pointer-checks'
flag, because the compiler _depending_ on undefined behavior and
changing code generation in the build ends up being a really bad idea
from a security standpoint.

Now, in C, most of these kinds of things come from the C standard
being very lax, and having much too many "this is undefined behavior"
rules. So in almost all cases we end up saying "we want the
well-defined implementation, not the 'strictly speaking, the language
specs allows the compiler to do Xyz".

Rust comes from a different direction than C, and it may well be that
we very much need some of the rules to be relaxed.

And hey, Rust people do know about "sometimes the rules have to be
relaxed". When it comes to integer overflows etc, there's a
"overflow-checks" flag, typically used for debug vs release builds.

The kernel has similar issues where sometimes you might want the
strict checking (lockdep etc), and sometimes you may end up being less
strict and miss a few rules (eg "we don't maintain a preempt count for
this config, so we can't check RCU mode violations").

> But I won't give up on Rust guarantees just yet, I'll try to find
> ergonomic ways to enforce them at compile time.

I think that compile-time static checking is wonderful, and as much as
possible should be done 100% statically so that people cannot write
incorrect programs.

But we all know that static checking is limited, and then the amount
of dynamic checking for violations is often something that will have
to depend on environment flags, because it may come with an exorbitant
price in the checking.

Exactly like integer overflow checking.

Linus

2022-09-20 00:19:45

by Alex Gaynor

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

I think there is some amount of talking past each other here.

Rust's rules are that a function that's safe must not exhibit UB, no
matter what arguments it's called with. This can be done with
static checking or dynamic checking, with obvious trade-offs between
the two.

We've had pretty good success, thus far, modeling various kernel
subsystems with APIs that follow this rule. But when there's not a
good way, consistent with the kernel's idioms, to expose a kernel API
in Rust that's safe, that doesn't mean it's impossible! In those cases
we expose an `unsafe fn` in Rust. This means that the caller (e.g.,
driver code) needs to ensure it meets the required pre-conditions for
calling that function.
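
As a sketch of that pattern (a made-up function, purely for illustration):

    /// # Safety
    ///
    /// The caller must be in a context where sleeping is allowed: no
    /// spinlocks held, IRQs enabled, and not inside an RCU read-side
    /// critical section.
    pub unsafe fn do_something_that_may_sleep() {
        // ... body elided ...
    }

    // A driver then documents why the pre-conditions hold at the call site:
    //
    //     // SAFETY: called from process context with no locks held.
    //     unsafe { do_something_that_may_sleep() };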

This is how we square the circle of: How do we prioritize the kernel's
goals, while also staying consistent with Rust's notion of safety?

Wedson's point is that, when possible, finding ways to expose safe
functions is better, since it puts less of a burden on driver authors.
But, sadly, we all know we won't be able to find them in all
circumstances -- we just want to have it as a goal to find them
whenever possible.

Regards,
Alex

On Mon, Sep 19, 2022 at 7:40 PM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, Sep 19, 2022 at 3:35 PM Wedson Almeida Filho <[email protected]> wrote:
> >
> > No one is talking about absolute safety guarantees. I am talking about
> > specific ones that Rust makes: these are well-documented and formally
> > defined.
>
> If you cannot get over the fact that the kernel may have other
> requirements that trump any language standards, we really can't work
> together.
>
> Those Rust rules may make sense in other environments. But the kernel
> really does have hard requirements that you continue to limp along
> even if some fundamental rule has been violated. Exactly because
> there's often no separate environment outside the kernel that can deal
> with it.
>
> End result: a compiler - or language infrastructure - that says "my
> rules are so ingrained that I cannot do that" is not one that is valid
> for kernel work.
>
> This is not really any different from the whole notion of "allocation
> failures cannot panic" that Rust people seemed to readily understand
> is a major kernel requirement, and that the kernel needed a graceful
> failure return instead of a hard panic.
>
> Also note that the kernel is perfectly willing to say "I will use
> compiler flags that disable certain guarantees". We do it all the
> time.
>
> For example, the C standard has a lot of "the compiler is allowed to
> make this assumption". And then we disagree with those, and so "kernel
> C" is different.
>
> For example, the standard says that dereferencing a NULL pointer is
> undefined behavior, so a C compiler can see a dereference of a pointer
> to be a guarantee that said pointer isn't NULL, and remove any
> subsequent NULL pointer tests.
>
> That turns out to be one of those "obviously true in a perfect world,
> but problematic in a real world with bugs", and we tell the compiler
> to not do that by passing it the '-fno-delete-null-pointer-checks'
> flag, because the compiler _depending_ on undefined behavior and
> changing code generation in the build ends up being a really bad idea
> from a security standpoint.
>
> Now, in C, most of these kinds of things come from the C standard
> being very lax, and having much too many "this is undefined behavior"
> rules. So in almost all cases we end up saying "we want the
> well-defined implementation, not the 'strictly speaking, the language
> specs allows the compiler to do Xyz".
>
> Rust comes from a different direction than C, and it may well be that
> we very much need some of the rules to be relaxed.
>
> And hey, Rust people do know about "sometimes the rules have to be
> relaxed". When it comes to integer overflows etc, there's a
> "overflow-checks" flag, typically used for debug vs release builds.
>
> The kernel has similar issues where sometimes you might want the
> strict checking (lockdep etc), and sometimes you may end up being less
> strict and miss a few rules (eg "we don't maintain a preempt count for
> this config, so we can't check RCU mode violations").
>
> > But I won't give up on Rust guarantees just yet, I'll try to find
> > ergonomic ways to enforce them at compile time.
>
> I think that compile-time static checking is wonderful, and as much as
> possible should be done 100% statically so that people cannot write
> incorrect programs.
>
> But we all know that static checking is limited, and then the amount
> of dynamic checking for violations is often something that will have
> to depend on environment flags, because it may come with an exorbitant
> price in the checking.
>
> Exactly like integer overflow checking.
>
> Linus



--
All that is necessary for evil to succeed is for good people to do nothing.

2022-09-20 00:47:18

by Wedson Almeida Filho

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Mon, Sep 19, 2022 at 04:39:56PM -0700, Linus Torvalds wrote:
> On Mon, Sep 19, 2022 at 3:35 PM Wedson Almeida Filho <[email protected]> wrote:
> >
> > No one is talking about absolute safety guarantees. I am talking about
> > specific ones that Rust makes: these are well-documented and formally
> > defined.
>
> If you cannot get over the fact that the kernel may have other
> requirements that trump any language standards, we really can't work
> together.
>
> Those Rust rules may make sense in other environments. But the kernel
> really does have hard requirements that you continue to limp along
> even if some fundamental rule has been violated. Exactly because
> there's often no separate environment outside the kernel that can deal
> with it.
>
> End result: a compiler - or language infrastructure - that says "my
> rules are so ingrained that I cannot do that" is not one that is valid
> for kernel work.
>
> This is not really any different from the whole notion of "allocation
> failures cannot panic" that Rust people seemed to readily understand
> is a major kernel requirement, and that the kernel needed a graceful
> failure return instead of a hard panic.

I am not a programming language/compiler person. My background is in
kernel programming (Linux is not the only kernel in existence) so I have
no difficulty whatsoever with kernel requirements; the graceful handling
of allocation failures that you brought up is in fact something I added
soon after I started working on rust-for-linux and realised (admittedly
in a bit of a shock) that userland Rust didn't have this at the
time.

>
> Also note that the kernel is perfectly willing to say "I will use
> compiler flags that disable certain guarantees". We do it all the
> time.
>
> For example, the C standard has a lot of "the compiler is allowed to
> make this assumption". And then we disagree with those, and so "kernel
> C" is different.
>
> For example, the standard says that dereferencing a NULL pointer is
> undefined behavior, so a C compiler can see a dereference of a pointer
> to be a guarantee that said pointer isn't NULL, and remove any
> subsequent NULL pointer tests.
>
> That turns out to be one of those "obviously true in a perfect world,
> but problematic in a real world with bugs", and we tell the compiler
> to not do that by passing it the '-fno-delete-null-pointer-checks'
> flag, because the compiler _depending_ on undefined behavior and
> changing code generation in the build ends up being a really bad idea
> from a security standpoint.
>
> Now, in C, most of these kinds of things come from the C standard
> being very lax, and having much too many "this is undefined behavior"
> rules. So in almost all cases we end up saying "we want the
> well-defined implementation, not the 'strictly speaking, the language
> specs allows the compiler to do Xyz".
>
> Rust comes from a different direction than C, and it may well be that
> we very much need some of the rules to be relaxed.

Sure. In fact, we are likely to be able to influence the Rust language
more, and more quickly, than C. So if things don't make sense, we may be
able to change them.

There are also opportunities here, for example, a compatible memory
model and guaranteed honouring of dependency chains for more efficient
synchronisation primitives. I'm not claiming this is surely going to
happen or that it's easy, just that there's an opportunity to do better
than C here.

> And hey, Rust people do know about "sometimes the rules have to be
> relaxed". When it comes to integer overflows etc, there's a
> "overflow-checks" flag, typically used for debug vs release builds.
>
> The kernel has similar issues where sometimes you might want the
> strict checking (lockdep etc), and sometimes you may end up being less
> strict and miss a few rules (eg "we don't maintain a preempt count for
> this config, so we can't check RCU mode violations").
>
> > But I won't give up on Rust guarantees just yet, I'll try to find
> > ergonomic ways to enforce them at compile time.
>
> I think that compile-time static checking is wonderful, and as much as
> possible should be done 100% statically so that people cannot write
> incorrect programs.
>
> But we all know that static checking is limited, and then the amount
> of dynamic checking for violations is often something that will have
> to depend on environment flags, because it may come with an exorbitant
> price in the checking.

If we can't find an ergonomic way to enforce safety guarantees, we have
fallbacks like unsafe functions or runtime enforcement. And as you
point out, if the runtime cost is too high in certain scenarios, we
do allow users to give up safety with a config as in the integer
overflow case you mentioned. We do try to minimise those.

Cheers,
-Wedson

2022-09-21 11:28:47

by Konstantin Shelekhin

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Sat, Aug 06, 2022 at 01:22:52PM +0200, Miguel Ojeda wrote:
> On Sat, Aug 6, 2022 at 12:25 PM Konstantin Shelekhin
> <[email protected]> wrote:
> >
> > I sense possible problems here. It's common for kernel code to pass
> > flags during memory allocations.
>
> Yes, of course. We will support this, but how exactly it will look,
> to what extent upstream Rust's `alloc` could support our use cases,
> etc. has been under discussion for a long time.
>
> For instance, see https://github.com/Rust-for-Linux/linux/pull/815 for
> a potential extension trait approach with no allocator carried on the
> type that Andreas wrote after a discussion in the last informal call:
>
> let a = Box::try_new_atomic(101)?;

In my opinion, the rest of the thread clearly shows that the
conservative approach is currently the only solid option. I suggest the
following explicit API:

let a = Box::try_new(size, flags)?;
Vec::try_push(item, flags)?;

etc. Whadda you think?

2022-09-21 12:02:34

by Greg Kroah-Hartman

Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate

On Wed, Sep 21, 2022 at 02:23:42PM +0300, Konstantin Shelekhin wrote:
> On Sat, Aug 06, 2022 at 01:22:52PM +0200, Miguel Ojeda wrote:
> > On Sat, Aug 6, 2022 at 12:25 PM Konstantin Shelekhin
> > <[email protected]> wrote:
> > >
> > > I sense possible problems here. It's common for kernel code to pass
> > > flags during memory allocations.
> >
> > Yes, of course. We will support this, but how exactly it will look,
> > to what extent upstream Rust's `alloc` could support our use cases,
> > etc. has been under discussion for a long time.
> >
> > For instance, see https://github.com/Rust-for-Linux/linux/pull/815 for
> > a potential extension trait approach with no allocator carried on the
> > type that Andreas wrote after a discussion in the last informal call:
> >
> > let a = Box::try_new_atomic(101)?;
>
> In my opinion, the rest of the thread clearly shows that the
> conservative approach is currently the only solid option. I suggest the
> following explicit API:
>
> let a = Box::try_new(size, flags)?;
> Vec::try_push(item, flags)?;
>
> etc. Whadda you think?

Please, yes. This fits the current kernel memory allocation pattern and
allows for proper propagation of the allocation flags as needed through
the system. This is going to be required in any non-trivial kernel code
anyway, so we might as well do it correctly from the beginning.
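
That is, a helper that can run in more than one context just takes the
flags from its caller and hands them down, along the lines of this sketch
(Flags, AllocError and the types around them are made up for illustration):

    #[derive(Clone, Copy)]
    pub struct Flags(pub u32);
    pub struct AllocError;

    pub struct EventLog {
        events: Vec<u32>,
    }

    impl EventLog {
        // The caller decides: GFP_KERNEL from process context, GFP_ATOMIC
        // from an IRQ handler or under a spinlock.
        pub fn record(&mut self, flags: Flags, ev: u32) -> Result<(), AllocError> {
            self.try_push(flags, ev)
        }

        fn try_push(&mut self, _flags: Flags, ev: u32) -> Result<(), AllocError> {
            // Placeholder: a real kernel Vec would allocate with `_flags` here.
            self.events.push(ev);
            Ok(())
        }
    }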

It also allows for flags to change over time, which also happens.

thanks,

greg k-h