From: Tom Zanussi <[email protected]>
Dear RT Folks,
This is the RT stable review cycle of patch 5.4.161-rt67-rc1.
Please scream at me if I messed something up. Please test the patches
too.
The -rc release will be uploaded to kernel.org and will be deleted
when the final release is out. This is just a review release (or
release candidate).
The pre-releases will not be pushed to the git repository, only the
final release is.
If all goes well, this patch will be converted to the next main
release on 2021-12-04.
To build 5.4.161-rt67-rc1 directly, the following patches should be applied:
https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.4.tar.xz
https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.4.161.xz
https://www.kernel.org/pub/linux/kernel/projects/rt/5.4/patch-5.4.161-rt67-rc1.patch.xz
You can also build from 5.4.161-rt66 by applying the incremental patch:
https://www.kernel.org/pub/linux/kernel/projects/rt/5.4/incr/patch-5.4.161-rt66-rt67-rc1.patch.xz
Enjoy,
-- Tom
Mike Galbraith (1):
mm, zsmalloc: Convert zsmalloc_handle.lock to spinlock_t
Sebastian Andrzej Siewior (6):
sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD
preempt: Move preempt_enable_no_resched() to the RT block
mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on
PREEMPT_RT
fscache: Use only one fscache_object_cong_wait.
fscache: Use only one fscache_object_cong_wait.
locking: Drop might_resched() from might_sleep_no_state_check()
Tom Zanussi (1):
Linux 5.4.161-rt67-rc1
fs/fscache/internal.h | 1 -
fs/fscache/main.c | 6 ------
fs/fscache/object.c | 13 +++++--------
include/linux/kernel.h | 2 +-
include/linux/preempt.h | 6 +++---
init/Kconfig | 2 +-
kernel/sched/core.c | 2 +-
localversion-rt | 2 +-
mm/zsmalloc.c | 12 ++++++------
9 files changed, 18 insertions(+), 28 deletions(-)
--
2.17.1
From: Sebastian Andrzej Siewior <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream commit 39609ed79d420e0b966e16a1d695733c2d3b9a7f ]
With PREEMPT_RT enabled all hrtimers callbacks will be invoked in
softirq mode unless they are explicitly marked as HRTIMER_MODE_HARD.
During boot kthread_bind() is used for the creation of per-CPU threads
and then hangs in wait_task_inactive() if the ksoftirqd is not
yet up and running.
The hang disappeared since commit
26c7295be0c5e ("kthread: Do not preempt current task if it is going to call schedule()")
but enabling function trace on boot reliably leads to the freeze on boot
behaviour again.
The timer in wait_task_inactive() can not be directly used by an user
interface to abuse it and create a mass wake of several tasks at the
same time which would to long sections with disabled interrupts.
Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.
Switch the timer to HRTIMER_MODE_REL_HARD.
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b32fbded588..022c7b78642d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2158,7 +2158,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
ktime_t to = NSEC_PER_SEC / HZ;
set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_hrtimeout(&to, HRTIMER_MODE_REL);
+ schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);
continue;
}
--
2.17.1
From: Mike Galbraith <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream 5.10 commit f2d9006d27c9b12563b8e577951ff5021f3b36b2 ]
local_lock_t becoming a synonym of spinlock_t had consequences for the RT
mods to zsmalloc, which were taking a mutex while holding a local_lock,
inspiring a lockdep "BUG: Invalid wait context" gripe.
Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.
Cc: [email protected]
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
mm/zsmalloc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e64eca4b0601..9fc494fe70ea 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -81,7 +81,7 @@
struct zsmalloc_handle {
unsigned long addr;
- struct mutex lock;
+ spinlock_t lock;
};
#define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))
@@ -368,7 +368,7 @@ static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
if (p) {
struct zsmalloc_handle *zh = p;
- mutex_init(&zh->lock);
+ spin_lock_init(&zh->lock);
}
#endif
return (unsigned long)p;
@@ -926,7 +926,7 @@ static inline int testpin_tag(unsigned long handle)
#ifdef CONFIG_PREEMPT_RT
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
- return mutex_is_locked(&zh->lock);
+ return spin_is_locked(&zh->lock);
#else
return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle);
#endif
@@ -937,7 +937,7 @@ static inline int trypin_tag(unsigned long handle)
#ifdef CONFIG_PREEMPT_RT
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
- return mutex_trylock(&zh->lock);
+ return spin_trylock(&zh->lock);
#else
return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
#endif
@@ -948,7 +948,7 @@ static void pin_tag(unsigned long handle)
#ifdef CONFIG_PREEMPT_RT
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
- return mutex_lock(&zh->lock);
+ return spin_lock(&zh->lock);
#else
bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
#endif
@@ -959,7 +959,7 @@ static void unpin_tag(unsigned long handle)
#ifdef CONFIG_PREEMPT_RT
struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
- return mutex_unlock(&zh->lock);
+ return spin_unlock(&zh->lock);
#else
bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
#endif
--
2.17.1
From: Sebastian Andrzej Siewior <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream commit 1a45b3551ef852193c3d338888132c4925d0690d ]
preempt_enable_no_resched() should point to preempt_enable() on
PREEMPT_RT so nobody is playing any preempt tricks and enables
preemption without checking for the need-resched flag.
This was misplaced in v3.14.0-rt1 und remained unnoticed until now.
Point preempt_enable_no_resched() and preempt_enable() on RT.
Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/preempt.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index adb085fe31e4..bbc3592b6f04 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -211,12 +211,12 @@ do { \
preempt_count_dec(); \
} while (0)
-#ifdef CONFIG_PREEMPT_RT
+#ifndef CONFIG_PREEMPT_RT
# define preempt_enable_no_resched() sched_preempt_enable_no_resched()
-# define preempt_check_resched_rt() preempt_check_resched()
+# define preempt_check_resched_rt() barrier();
#else
# define preempt_enable_no_resched() preempt_enable()
-# define preempt_check_resched_rt() barrier();
+# define preempt_check_resched_rt() preempt_check_resched()
#endif
#define preemptible() (preempt_count() == 0 && !irqs_disabled())
--
2.17.1
From: Sebastian Andrzej Siewior <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream commit aae93144898af113331668f53f80cb83f5a07360 ]
TRANSPARENT_HUGEPAGE:
There are potential non-deterministic delays to an RT thread if a critical
memory region is not THP-aligned and a non-RT buffer is located in the same
hugepage-aligned region. It's also possible for an unrelated thread to migrate
pages belonging to an RT task incurring unexpected page faults due to memory
defragmentation even if khugepaged is disabled.
Regular HUGEPAGEs are not affected by this can be used.
NUMA_BALANCING:
There is a non-deterministic delay to mark PTEs PROT_NONE to gather NUMA fault
samples, increased page faults of regions even if mlocked and non-deterministic
delays when migrating pages.
[Mel Gorman worded 99% of the commit description].
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Mel Gorman <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Tom Zanussi <[email protected]>
---
init/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/init/Kconfig b/init/Kconfig
index 266802704c06..c733392fe237 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -769,7 +769,7 @@ config NUMA_BALANCING
bool "Memory placement aware NUMA scheduler"
depends on ARCH_SUPPORTS_NUMA_BALANCING
depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
- depends on SMP && NUMA && MIGRATION
+ depends on SMP && NUMA && MIGRATION && !PREEMPT_RT
help
This option adds support for automatic NUMA aware memory/task placement.
The mechanism is quite primitive and is based on migrating memory when
--
2.17.1
From: Sebastian Andrzej Siewior <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream commit 74920695ab51a6d180dcd6554193cc8427758360 ]
In the commit mentioned below, fscache was converted from slow-work to
workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed()
did not use a per-CPU workqueue. They choose from two global waitqueues
depending on the SLOW_WORK_VERY_SLOW bit which was not set so it always
one waitqueue.
I can't find out how it is ensured that a waiter on certain CPU is woken
up be the other side. My guess is that the timeout in schedule_timeout()
ensures that it does not wait forever (or a random wake up).
fscache_object_sleep_till_congested() must be invoked from preemptible
context in order for schedule() to work. In this case this_cpu_ptr()
should complain with CONFIG_DEBUG_PREEMPT enabled except the thread is
bound to one CPU.
wake_up() wakes only one waiter and I'm not sure if it is guaranteed
that only one waiter exists.
Replace the per-CPU waitqueue with one global waitqueue.
Fixes: 8b8edefa2fffb ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <[email protected]>
Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
fs/fscache/internal.h | 1 -
fs/fscache/main.c | 6 ------
fs/fscache/object.c | 11 +++++------
3 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index d09d4e69c818..b557eb2263d2 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -95,7 +95,6 @@ extern unsigned fscache_debug;
extern struct kobject *fscache_root;
extern struct workqueue_struct *fscache_object_wq;
extern struct workqueue_struct *fscache_op_wq;
-DECLARE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index e1f1083b61a5..00367233ef8a 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -41,8 +41,6 @@ struct kobject *fscache_root;
struct workqueue_struct *fscache_object_wq;
struct workqueue_struct *fscache_op_wq;
-DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
/* these values serve as lower bounds, will be adjusted in fscache_init() */
static unsigned fscache_object_max_active = 4;
static unsigned fscache_op_max_active = 2;
@@ -139,7 +137,6 @@ unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
static int __init fscache_init(void)
{
unsigned int nr_cpus = num_possible_cpus();
- unsigned int cpu;
int ret;
fscache_object_max_active =
@@ -162,9 +159,6 @@ static int __init fscache_init(void)
if (!fscache_op_wq)
goto error_op_wq;
- for_each_possible_cpu(cpu)
- init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
-
ret = fscache_proc_init();
if (ret < 0)
goto error_proc;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index cfeba839a0f2..c93860274f2f 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -807,6 +807,8 @@ void fscache_object_destroy(struct fscache_object *object)
}
EXPORT_SYMBOL(fscache_object_destroy);
+static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);
+
/*
* enqueue an object for metadata-type processing
*/
@@ -815,12 +817,10 @@ void fscache_enqueue_object(struct fscache_object *object)
_enter("{OBJ%x}", object->debug_id);
if (fscache_get_object(object, fscache_obj_get_queue) >= 0) {
- wait_queue_head_t *cong_wq =
- &get_cpu_var(fscache_object_cong_wait);
if (queue_work(fscache_object_wq, &object->work)) {
if (fscache_object_congested())
- wake_up(cong_wq);
+ wake_up(&fscache_object_cong_wait);
} else
fscache_put_object(object, fscache_obj_put_queue);
@@ -842,16 +842,15 @@ void fscache_enqueue_object(struct fscache_object *object)
*/
bool fscache_object_sleep_till_congested(signed long *timeoutp)
{
- wait_queue_head_t *cong_wq = this_cpu_ptr(&fscache_object_cong_wait);
DEFINE_WAIT(wait);
if (fscache_object_congested())
return true;
- add_wait_queue_exclusive(cong_wq, &wait);
+ add_wait_queue_exclusive(&fscache_object_cong_wait, &wait);
if (!fscache_object_congested())
*timeoutp = schedule_timeout(*timeoutp);
- finish_wait(cong_wq, &wait);
+ finish_wait(&fscache_object_cong_wait, &wait);
return fscache_object_congested();
}
--
2.17.1
From: Sebastian Andrzej Siewior <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream commit 514342eb43a760575d6d9a366506a41ab7ec4888 ]
This is an update of the original patch, removing put_cpu_var() which
was overseen in the initial patch.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
fs/fscache/object.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index c93860274f2f..959384c91f79 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -823,8 +823,6 @@ void fscache_enqueue_object(struct fscache_object *object)
wake_up(&fscache_object_cong_wait);
} else
fscache_put_object(object, fscache_obj_put_queue);
-
- put_cpu_var(fscache_object_cong_wait);
}
}
--
2.17.1
From: Sebastian Andrzej Siewior <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
[ Upstream 5.10 commit e88f48e796b2286b565ee95ca8c46f32e051cd8c ]
might_sleep_no_state_check() serves the same purpose as might_sleep()
except it is used before sleeping locks are acquired and therefore does
not check task_struct::state because the state is preserved.
That state is preserved in the locking slow path so we must not schedule
at the begin of the locking function because the state will be lost and
not preserved at that time.
Remove might_resched() from might_sleep_no_state_check() to avoid losing the
state before it is preserved.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/kernel.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f5ec1ddbfe07..fac917085516 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -229,7 +229,7 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
# define might_sleep_no_state_check() \
- do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+ do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)
/**
* cant_sleep - annotation for functions that cannot sleep
--
2.17.1
From: Tom Zanussi <[email protected]>
v5.4.161-rt67-rc1 stable review patch.
If anyone has any objections, please let me know.
-----------
Signed-off-by: Tom Zanussi <[email protected]>
---
localversion-rt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/localversion-rt b/localversion-rt
index d42c0971b041..6f295236237c 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt66
+-rt67-rc1
--
2.17.1