Dear RT folks!
I'm pleased to announce the v3.18.7-rt1 patch set. It was running over
the weekend on my x86 box and was still alive this morning. However, it
is still the first release in the v3.18 -RT series.
I haven't followed the mailing list or commented on / applied any patches
from the list for -RT while being busy getting this release done (except
for one patch I needed anyway). This is about to change: I will try to
go through my RT-inbox before doing the next release.
Changes since v3.14.25-rt22:
- Rebased to v3.18.
- Added "Work Simple" from Daniel Wagner. I needed it for cgroup's RCU
handling.
- "timers: do not raise softirq unconditionally" has been reverted. It
has been added to optimize the FULL NOHZ case on CPUs which could avoid
a softirq / timer wake up. Its been reverted because it breaks the
switch to highres.
- Added a few patches to get MQ-BLK and cgroups working.
Known issues:
- bcache is disabled.
- lazy preempt on x86_64 leads to a crash with some load.
- CPU hotplug works in general. Steven's test script, however,
usually deadlocks on the second invocation.
- xor / raid_pq
I had max latency jumping up to 67563us on one CPU while the next
lower max was 58us. I tracked it down to the module init code of
xor and raid_pq. Both disable preemption while measuring the
performance of the individual implementations.
The RT patch against 3.18.7 can be found here:
https://www.kernel.org/pub/linux/kernel/projects/rt/3.18/patch-3.18.7-rt1.patch.xz
The split quilt queue is available at:
https://www.kernel.org/pub/linux/kernel/projects/rt/3.18/patches-3.18.7-rt1.tar.xz
Sebastian
* Sebastian Andrzej Siewior | 2015-02-16 12:18:22 [+0100]:
>Known issues:
>
> - xor / raid_pq
> I had max latency jumping up to 67563us on one CPU while the next
> lower max was 58us. I tracked it down to the module init code of
> xor and raid_pq. Both disable preemption while measuring the
> performance of the individual implementations.
The patch at the bottom gets rid of it.
How important is this preempt_disable(), and how likely is it that we
could use a precomputed priority list of functions instead of this
runtime check? XOR already prefers the AVX-based xor if available, so the
numbers / test at runtime could be removed.
Is there a case where SSE is worse than MMX on one CPU X but better on
another, and this is why we do it?
diff --git a/crypto/xor.c b/crypto/xor.c
index 35d6b3a..19e20f5 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -70,7 +70,7 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
tmpl->next = template_list;
template_list = tmpl;
- preempt_disable();
+ preempt_disable_nort();
/*
* Count the number of XORs done during a whole jiffy, and use
@@ -94,7 +94,7 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
max = count;
}
- preempt_enable();
+ preempt_enable_nort();
speed = max * (HZ * BENCH_SIZE / 1024);
tmpl->speed = speed;
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 7d0e5cd..e9920d4 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -142,7 +142,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
perf = 0;
- preempt_disable();
+ preempt_disable_nort();
j0 = jiffies;
while ((j1 = jiffies) == j0)
cpu_relax();
@@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
(*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
perf++;
}
- preempt_enable();
+ preempt_enable_nort();
if (perf > bestperf) {
bestperf = perf;
Sebastian
On Mon, Feb 16, 2015 at 12:18:22PM +0100, Sebastian Andrzej Siewior wrote:
> - CPU hotplug works in general. Steven's test script however
> deadlocks usually on the second invocation.
Where can I find Steve's hotplug test script?
Thanks,
Richard
On Mon, 16 Feb 2015 21:12:40 +0100
Richard Cochran <[email protected]> wrote:
> On Mon, Feb 16, 2015 at 12:18:22PM +0100, Sebastian Andrzej Siewior wrote:
> > - CPU hotplug works in general. Steven's test script however
> > deadlocks usually on the second invocation.
>
> Where can I find Steve's hotplug test script?
Here, I posted it before. Hmm, I wonder if we should post it on some
web site. I could add it to my own.
/me does that...
OK, it's now here too:
http://rostedt.homelinux.com/code/stress-cpu-hotplug
-- Steve
Hi Sebastian,
On 02/16/2015 12:18 PM, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.18.7-rt1 patch set. It was running over
> the weekend on my x86 box and was still alive this morning. However, it
> is still the first release in the v3.18 -RT series.
> I haven't followed the mailing list or commented on / applied any patches
> from the list for -RT while being busy getting this release done (except
> for one patch I needed anyway). This is about to change: I will try to
> go through my RT-inbox before doing the next release.
I needed the patch below to get it running stably under load on my shiny box.
cheers,
daniel
From c517743659575932d7b7c94a08276d0cee8a2fdd Mon Sep 17 00:00:00 2001
From: Daniel Wagner <[email protected]>
Date: Fri, 11 Jul 2014 15:26:13 +0200
Subject: [PATCH] thermal: Defer thermal wakeups to threads
On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
call schedule while we run in irq context.
[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
Let's defer the work to a kthread.
Signed-off-by: Daniel Wagner <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
---
drivers/thermal/x86_pkg_temp_thermal.c | 49 ++++++++++++++++++++++++++++++++--
1 file changed, 47 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/x86_pkg_temp_thermal.c b/drivers/thermal/x86_pkg_temp_thermal.c
index 9ea3d9d..001ba02 100644
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -29,6 +29,7 @@
#include <linux/pm.h>
#include <linux/thermal.h>
#include <linux/debugfs.h>
+#include <linux/work-simple.h>
#include <asm/cpu_device_id.h>
#include <asm/mce.h>
@@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
}
}
-static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+static void platform_thermal_notify_work(struct swork_event *event)
{
unsigned long flags;
int cpu = smp_processor_id();
@@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
pkg_work_scheduled[phy_id]) {
disable_pkg_thres_interrupt();
spin_unlock_irqrestore(&pkg_work_lock, flags);
- return -EINVAL;
+ return;
}
pkg_work_scheduled[phy_id] = 1;
spin_unlock_irqrestore(&pkg_work_lock, flags);
@@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
schedule_delayed_work_on(cpu,
&per_cpu(pkg_temp_thermal_threshold_work, cpu),
msecs_to_jiffies(notify_delay_ms));
+}
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+static struct swork_event notify_work;
+
+static int thermal_notify_work_init(void)
+{
+ int err;
+
+ err = swork_get();
+ if (!err)
+ return err;
+
+ INIT_SWORK(¬ify_work, platform_thermal_notify_work);
+ return 0;
+}
+
+static void thermal_notify_work_cleanup(void)
+{
+ swork_put();
+}
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+ swork_queue(¬ify_work);
return 0;
}
+#else /* !CONFIG_PREEMPT_RT_FULL */
+
+static int thermal_notify_work_init(void) { return 0; }
+
+static void thermal_notify_work_cleanup(void) { }
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+ platform_thermal_notify_work(NULL);
+
+ return 0;
+}
+#endif /* CONFIG_PREEMPT_RT_FULL */
+
static int find_siblings_cpu(int cpu)
{
int i;
@@ -594,6 +634,10 @@ static int __init pkg_temp_thermal_init(void)
for_each_online_cpu(i)
if (get_core_online(i))
goto err_ret;
+
+ if (!thermal_notify_work_init())
+ goto err_ret;
+
__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
cpu_notifier_register_done();
@@ -619,6 +663,7 @@ static void __exit pkg_temp_thermal_exit(void)
cpu_notifier_register_begin();
__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
+ thermal_notify_work_cleanup();
mutex_lock(&phy_dev_list_mutex);
list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
/* Retore old MSR value for package thermal interrupt */
--
2.1.0
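For anyone not familiar with the simple work queue used above, here is a
minimal usage sketch. The names are illustrative; only swork_get(),
swork_put(), INIT_SWORK() and swork_queue() are the actual -rt API:

#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/work-simple.h>		/* -rt queue only */

static struct swork_event my_event;

static void my_handler(struct swork_event *event)
{
	/* runs in the swork kthread: preemptible, may take sleeping locks */
}

static irqreturn_t my_irq(int irq, void *dev)
{
	swork_queue(&my_event);		/* safe from hard irq context */
	return IRQ_HANDLED;
}

static int __init my_init(void)
{
	int err = swork_get();		/* bring up the shared worker */

	if (err)
		return err;
	INIT_SWORK(&my_event, my_handler);
	return 0;
}

static void __exit my_exit(void)
{
	swork_put();			/* drop our reference on the worker */
}

module_init(my_init);
module_exit(my_exit);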
On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
> Known issues:
>
> - lazy preempt on x86_64 leads to a crash with some load.
The below still works for me. (it doesn't make nohz_full actually work
in rt, but at least folks who want to tinker with it can do so)
If context tracking is enabled, we can recurse, and explode violently.
Add missing checks to preempt_schedule_context().
Fix other inconsistencies spotted while searching for the little SOB.
Signed-off-by: Mike Galbraith <[email protected]>
---
arch/x86/Kconfig | 2 +-
arch/x86/include/asm/thread_info.h | 1 +
include/linux/preempt.h | 2 +-
include/linux/preempt_mask.h | 10 ++++++++--
kernel/fork.c | 1 +
kernel/sched/core.c | 18 ++++++------------
kernel/sched/fair.c | 2 +-
7 files changed, 19 insertions(+), 17 deletions(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -21,7 +21,7 @@ config X86_64
### Arch settings
config X86
def_bool y
- select HAVE_PREEMPT_LAZY if X86_32
+ select HAVE_PREEMPT_LAZY
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
select ARCH_HAS_FAST_MULTIPLIER
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -46,6 +46,7 @@ struct thread_info {
.flags = 0, \
.cpu = 0, \
.saved_preempt_count = INIT_PREEMPT_COUNT, \
+ .preempt_lazy_count = 0, \
.addr_limit = KERNEL_DS, \
.restart_block = { \
.fn = do_no_restart_syscall, \
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -91,8 +91,8 @@ do { \
#define preempt_lazy_enable() \
do { \
- dec_preempt_lazy_count(); \
barrier(); \
+ dec_preempt_lazy_count(); \
preempt_check_resched(); \
} while (0)
--- a/include/linux/preempt_mask.h
+++ b/include/linux/preempt_mask.h
@@ -118,9 +118,15 @@ extern int in_serving_softirq(void);
((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
#ifdef CONFIG_PREEMPT_COUNT
-# define preemptible() (preempt_count() == 0 && !irqs_disabled())
+# define preemptible() (preempt_count() == 0 && !irqs_disabled())
+#ifdef CONFIG_PREEMPT_LAZY
+# define preemptible_lazy() (preempt_lazy_count() == 0 || need_resched_now())
#else
-# define preemptible() 0
+# define preemptible_lazy() 1
+#endif
+#else
+# define preemptible() 0
+# define preemptible_lazy() 0
#endif
#endif /* LINUX_PREEMPT_MASK_H */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -348,6 +348,7 @@ static struct task_struct *dup_task_stru
setup_thread_stack(tsk, orig);
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
+ clear_tsk_need_resched_lazy(tsk);
set_task_stack_end_magic(tsk);
#ifdef CONFIG_CC_STACKPROTECTOR
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2861,8 +2861,8 @@ void migrate_enable(void)
p->migrate_disable = 0;
unpin_current_cpu();
- preempt_enable();
preempt_lazy_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(migrate_enable);
#else
@@ -3099,19 +3099,13 @@ asmlinkage __visible void __sched notrac
{
/*
* If there is a non-zero preempt_count or interrupts are disabled,
- * we do not want to preempt the current task. Just return..
+ * we do not want to preempt the current task. Just return. For
+ * lazy preemption we also check for non-zero preempt_count_lazy,
+ * and bail if no immediate preemption is required.
*/
- if (likely(!preemptible()))
+ if (likely(!preemptible() || !preemptible_lazy()))
return;
-#ifdef CONFIG_PREEMPT_LAZY
- /*
- * Check for lazy preemption
- */
- if (current_thread_info()->preempt_lazy_count &&
- !test_thread_flag(TIF_NEED_RESCHED))
- return;
-#endif
do {
__preempt_count_add(PREEMPT_ACTIVE);
/*
@@ -3155,7 +3149,7 @@ asmlinkage __visible void __sched notrac
{
enum ctx_state prev_ctx;
- if (likely(!preemptible()))
+ if (likely(!preemptible() || !preemptible_lazy()))
return;
do {
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4761,7 +4761,7 @@ static void check_preempt_wakeup(struct
* prevents us from potentially nominating it as a false LAST_BUDDY
* below.
*/
- if (test_tsk_need_resched(curr))
+ if (test_tsk_need_resched(curr) || test_tsk_need_resched_lazy(curr))
return;
/* Idle tasks are by definition preempted by non-idle tasks. */
This shuts the gripe up.
[ 57.170576] ------------[ cut here ]------------
[ 57.170583] WARNING: CPU: 3 PID: 71 at kernel/time/tick-sched.c:167 can_stop_full_tick+0x1ae/0x260()
[ 57.170617] CPU: 3 PID: 71 Comm: sirq-timer/3 Not tainted 3.18.7-rt1 #4
[ 57.170617] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
[ 57.170618] 0000000000000009 ffff88040b21fc48 ffffffff815bb722 0000000000000002
[ 57.170619] 0000000000000000 ffff88040b21fc88 ffffffff810512f1 ffff88040b21fcc8
[ 57.170620] ffff88041eccd360 ffff88041ecc0000 000000000000000b 0000000000000000
[ 57.170620] Call Trace:
[ 57.170624] [<ffffffff815bb722>] dump_stack+0x4f/0x9e
[ 57.170626] [<ffffffff810512f1>] warn_slowpath_common+0x81/0xc0
[ 57.170628] [<ffffffff810513ea>] warn_slowpath_null+0x1a/0x20
[ 57.170628] [<ffffffff810c1f2e>] can_stop_full_tick+0x1ae/0x260
[ 57.170629] [<ffffffff810c2071>] __tick_nohz_full_check+0x71/0xb0
[ 57.170630] [<ffffffff810c20be>] nohz_full_kick_work_func+0xe/0x10
[ 57.170631] [<ffffffff81115d3f>] irq_work_run_list+0x3f/0x60
[ 57.170632] [<ffffffff8111617e>] irq_work_tick+0x3e/0x90
[ 57.170634] [<ffffffff810b04c5>] run_timer_softirq+0x35/0x2f0
[ 57.170635] [<ffffffff8107f0a2>] ? __vtime_account_system+0x32/0x40
[ 57.170637] [<ffffffff81055794>] do_current_softirqs.isra.11+0x1c4/0x3c0
[ 57.170638] [<ffffffff81055a55>] run_ksoftirqd+0x25/0x40
[ 57.170639] [<ffffffff810723dd>] smpboot_thread_fn+0x1dd/0x340
[ 57.170640] [<ffffffff81072200>] ? smpboot_register_percpu_thread+0x100/0x100
[ 57.170641] [<ffffffff8106e4ab>] kthread+0xbb/0xe0
[ 57.170642] [<ffffffff8106e3f0>] ? kthread_worker_fn+0x190/0x190
[ 57.170644] [<ffffffff815c1cac>] ret_from_fork+0x7c/0xb0
[ 57.170644] [<ffffffff8106e3f0>] ? kthread_worker_fn+0x190/0x190
[ 57.170645] ---[ end trace 0000000000000002 ]---
Signed-off-by: Mike Galbraith <[email protected]>
---
kernel/time/tick-sched.c | 2 ++
1 file changed, 2 insertions(+)
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -222,6 +222,8 @@ void __tick_nohz_full_check(void)
static void nohz_full_kick_work_func(struct irq_work *work)
{
+ if (in_serving_softirq())
+ return;
__tick_nohz_full_check();
}
Locking functions previously using read_lock_irq()/read_lock_irqsave() were
changed to local_irq_disable/save(), leading to gripes. Use nort variants.
[ 2423.966857] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
[ 2423.966858] in_atomic(): 0, irqs_disabled(): 1, pid: 5947, name: alsa-sink-ALC88
[ 2423.966860] CPU: 5 PID: 5947 Comm: alsa-sink-ALC88 Not tainted 3.18.7-rt1 #9
[ 2423.966860] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
[ 2423.966862] ffff880409316240 ffff88040866fa38 ffffffff815bdeb5 0000000000000002
[ 2423.966863] 0000000000000000 ffff88040866fa58 ffffffff81073c86 ffffffffa03b2640
[ 2423.966864] ffff88040239ec00 ffff88040866fa78 ffffffff815c3d34 ffffffffa03b2640
[ 2423.966864] Call Trace:
[ 2423.966868] [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
[ 2423.966870] [<ffffffff81073c86>] __might_sleep+0xe6/0x150
[ 2423.966880] [<ffffffff815c3d34>] __rt_spin_lock+0x24/0x50
[ 2423.966883] [<ffffffff815c4044>] rt_read_lock+0x34/0x40
[ 2423.966887] [<ffffffffa03a2979>] snd_pcm_stream_lock+0x29/0x70 [snd_pcm]
[ 2423.966890] [<ffffffffa03a355d>] snd_pcm_playback_poll+0x5d/0x120 [snd_pcm]
[ 2423.966892] [<ffffffff811937a2>] do_sys_poll+0x322/0x5b0
[ 2423.966895] [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 2423.966897] [<ffffffffa03fa023>] ? azx_cc_read+0x23/0x30 [snd_hda_controller]
[ 2423.966899] [<ffffffff810b9029>] ? timecounter_read+0x19/0x50
[ 2423.966901] [<ffffffffa03fb187>] ? azx_get_wallclock_tstamp+0x97/0xc0 [snd_hda_controller]
[ 2423.966904] [<ffffffffa03aaed1>] ? snd_pcm_update_hw_ptr0+0x1b1/0x470 [snd_pcm]
[ 2423.966906] [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 2423.966907] [<ffffffff81051d2a>] ? unpin_current_cpu+0x1a/0x70
[ 2423.966910] [<ffffffff81079ec0>] ? migrate_enable+0xe0/0x1e0
[ 2423.966912] [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966914] [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966915] [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966916] [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966919] [<ffffffffa03a5ba2>] ? snd_pcm_common_ioctl1+0x1c2/0xda0 [snd_pcm]
[ 2423.966920] [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 2423.966923] [<ffffffffa03a690b>] ? snd_pcm_playback_ioctl1+0x18b/0x2d0 [snd_pcm]
[ 2423.966924] [<ffffffff811c6fa9>] ? eventfd_ctx_read+0x179/0x1e0
[ 2423.966926] [<ffffffffa03a6a84>] ? snd_pcm_playback_ioctl+0x34/0x40 [snd_pcm]
[ 2423.966927] [<ffffffff810b661e>] ? ktime_get_ts64+0x4e/0xf0
[ 2423.966928] [<ffffffff81193d48>] SyS_ppoll+0x1a8/0x1c0
[ 2423.966930] [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <[email protected]>
---
sound/core/pcm_native.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -104,7 +104,7 @@ EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock)
void snd_pcm_stream_lock_irq(struct snd_pcm_substream *substream)
{
if (!substream->pcm->nonatomic)
- local_irq_disable();
+ local_irq_disable_nort();
snd_pcm_stream_lock(substream);
}
EXPORT_SYMBOL_GPL(snd_pcm_stream_lock_irq);
@@ -113,7 +113,7 @@ void snd_pcm_stream_unlock_irq(struct sn
{
snd_pcm_stream_unlock(substream);
if (!substream->pcm->nonatomic)
- local_irq_enable();
+ local_irq_enable_nort();
}
EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irq);
@@ -121,7 +121,7 @@ unsigned long _snd_pcm_stream_lock_irqsa
{
unsigned long flags = 0;
if (!substream->pcm->nonatomic)
- local_irq_save(flags);
+ local_irq_save_nort(flags);
snd_pcm_stream_lock(substream);
return flags;
}
@@ -132,7 +132,7 @@ void snd_pcm_stream_unlock_irqrestore(st
{
snd_pcm_stream_unlock(substream);
if (!substream->pcm->nonatomic)
- local_irq_restore(flags);
+ local_irq_restore_nort(flags);
}
EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irqrestore);
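For reference, the _nort IRQ helpers used above are provided by the -RT
series; a sketch of their definitions. On PREEMPT_RT_FULL the stream
lock is a sleeping lock anyway, so hard interrupts stay enabled; on a
stock kernel they map to the usual primitives:

#ifdef CONFIG_PREEMPT_RT_FULL
# define local_irq_disable_nort()	do { } while (0)
# define local_irq_enable_nort()	do { } while (0)
# define local_irq_save_nort(flags)	local_save_flags(flags)
# define local_irq_restore_nort(flags)	(void)(flags)
#else
# define local_irq_disable_nort()	local_irq_disable()
# define local_irq_enable_nort()	local_irq_enable()
# define local_irq_save_nort(flags)	local_irq_save(flags)
# define local_irq_restore_nort(flags)	local_irq_restore(flags)
#endif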
[ 37.667792] Installing knfsd (copyright (C) 1996 [email protected]).
[ 37.720307] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
[ 37.720307] in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
[ 37.720318] Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
[ 37.720320] CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
[ 37.720320] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
[ 37.720321] ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
[ 37.720322] 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
[ 37.720322] ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
[ 37.720323] Call Trace:
[ 37.720326] [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
[ 37.720328] [<ffffffff81073c86>] __might_sleep+0xe6/0x150
[ 37.720329] [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
[ 37.720335] [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
[ 37.720340] [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
[ 37.720345] [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
[ 37.720349] [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
[ 37.720351] [<ffffffff810b6f47>] ? __getnstimeofday64+0x37/0xd0
[ 37.720352] [<ffffffff810b6fee>] ? getnstimeofday64+0xe/0x30
[ 37.720353] [<ffffffff810b702a>] ? do_gettimeofday+0x1a/0x50
[ 37.720356] [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
[ 37.720358] [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 37.720361] [<ffffffff811512c7>] ? might_fault+0x47/0x50
[ 37.720363] [<ffffffffa072e410>] ? write_versions+0x320/0x320 [nfsd]
[ 37.720365] [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
[ 37.720366] [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
[ 37.720368] [<ffffffff814c5082>] ? release_sock+0x152/0x1a0
[ 37.720369] [<ffffffff8117f889>] SyS_write+0x49/0xb0
[ 37.720371] [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <[email protected]>
---
net/sunrpc/svc_xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -357,7 +357,7 @@ static void svc_xprt_do_enqueue(struct s
return;
}
- cpu = get_cpu();
+ cpu = get_cpu_light();
pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
spin_lock_bh(&pool->sp_lock);
@@ -390,7 +390,7 @@ static void svc_xprt_do_enqueue(struct s
}
spin_unlock_bh(&pool->sp_lock);
- put_cpu();
+ put_cpu_light();
}
/*
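For reference, get_cpu_light()/put_cpu_light() are the -RT replacements
for get_cpu()/put_cpu(); a sketch of their definitions. Instead of
disabling preemption they only pin the task to its CPU with
migrate_disable(), which is legal to hold across sleeping locks:

#ifndef CONFIG_PREEMPT_RT_FULL
# define get_cpu_light()	get_cpu()
# define put_cpu_light()	put_cpu()
#else
# define get_cpu_light()	({ migrate_disable(); smp_processor_id(); })
# define put_cpu_light()	migrate_enable()
#endif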
On Wed, 2015-02-18 at 12:27 +0100, Mike Galbraith wrote:
> This shuts the gripe up.
Gee, that was a particularly bad way to shut it up. Never mind :)
-Mike
On Wed, 2015-02-18 at 12:21 +0100, Mike Galbraith wrote:
> On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
>
> > Known issues:
> >
> > - lazy preempt on x86_64 leads to a crash with some load.
>
> The below still works for me. (it doesn't make nohz_full actually work
> in rt, but at least folks who want to tinker with it can do so)
Actually, with the below (and other local hacks obviously), it's looking
pretty darn good on my little desktop box.
---
kernel/sched/core.c | 2 +-
kernel/time/tick-sched.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -810,7 +810,7 @@ bool sched_can_stop_tick(void)
* nr_running update is assumed to be visible
* after IPI is sent from wakers.
*/
- if (this_rq()->nr_running > 1)
+ if (this_rq()->nr_running - task_is_softirqd(current) > 1)
return false;
return true;
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -164,7 +164,7 @@ bool tick_nohz_full_running;
static bool can_stop_full_tick(void)
{
- WARN_ON_ONCE(!irqs_disabled());
+ WARN_ON_ONCE_NONRT(!irqs_disabled()); <== better way to shut warning up ;-)
if (!sched_can_stop_tick()) {
trace_tick_stop(0, "more than 1 task in runqueue\n");
With 2 shielded cores measuring perturbations (output 1 line/min)
homer:~ # cgexec -g cpuset:rtcpus taskset -c 2 pert 60
3591.74 MHZ CPU
perturbation threshold 0.013 usecs.
pert/s: 12 >7.76us: 0 min: 0.04 max: 0.78 avg: 0.09 sum/s: 1us overhead: 0.00%
pert/s: 7 >7.02us: 0 min: 0.04 max: 0.70 avg: 0.05 sum/s: 0us overhead: 0.00%
pert/s: 14 >8.11us: 1 min: 0.04 max: 35.72 avg: 0.09 sum/s: 1us overhead: 0.00%
pert/s: 39 >7.35us: 0 min: 0.04 max: 0.93 avg: 0.11 sum/s: 4us overhead: 0.00%
pert/s: 44 >6.66us: 0 min: 0.04 max: 0.84 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 46 >6.05us: 0 min: 0.04 max: 1.00 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 36 >5.49us: 0 min: 0.04 max: 0.71 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 42 >5.21us: 0 min: 0.04 max: 5.43 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 24 >4.73us: 0 min: 0.04 max: 0.74 avg: 0.10 sum/s: 2us overhead: 0.00%
pert/s: 33 >4.31us: 0 min: 0.04 max: 0.90 avg: 0.10 sum/s: 3us overhead: 0.00%
pert/s: 34 >3.92us: 0 min: 0.04 max: 0.70 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 64 >3.59us: 0 min: 0.04 max: 1.05 avg: 0.12 sum/s: 7us overhead: 0.00%
pert/s: 53 >3.27us: 0 min: 0.04 max: 0.66 avg: 0.11 sum/s: 6us overhead: 0.00%
pert/s: 55 >2.98us: 0 min: 0.04 max: 0.72 avg: 0.11 sum/s: 6us overhead: 0.00%
pert/s: 38 >2.70us: 0 min: 0.04 max: 0.35 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 35 >2.46us: 0 min: 0.04 max: 0.43 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 36 >2.25us: 0 min: 0.04 max: 0.58 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 44 >2.08us: 0 min: 0.04 max: 0.93 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 45 >1.91us: 0 min: 0.04 max: 0.76 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 33 >1.75us: 0 min: 0.04 max: 0.55 avg: 0.10 sum/s: 3us overhead: 0.00%
Terminated
homer:~ # cgexec -g cpuset:rtcpus taskset -c 3 pert 60
3591.73 MHZ CPU
perturbation threshold 0.013 usecs.
pert/s: 11 >6.26us: 0 min: 0.04 max: 0.77 avg: 0.08 sum/s: 1us overhead: 0.00%
pert/s: 7 >5.67us: 0 min: 0.04 max: 0.71 avg: 0.06 sum/s: 0us overhead: 0.00%
pert/s: 8 >5.14us: 0 min: 0.04 max: 0.57 avg: 0.05 sum/s: 0us overhead: 0.00%
pert/s: 39 >4.67us: 0 min: 0.04 max: 0.94 avg: 0.11 sum/s: 4us overhead: 0.00%
pert/s: 44 >4.25us: 0 min: 0.04 max: 0.83 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 46 >3.89us: 0 min: 0.04 max: 1.12 avg: 0.12 sum/s: 5us overhead: 0.00%
pert/s: 36 >3.54us: 0 min: 0.04 max: 0.70 avg: 0.11 sum/s: 4us overhead: 0.00%
pert/s: 40 >3.42us: 1 min: 0.04 max: 4.63 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 24 >3.12us: 0 min: 0.04 max: 0.60 avg: 0.10 sum/s: 2us overhead: 0.00%
pert/s: 34 >2.86us: 0 min: 0.04 max: 0.97 avg: 0.11 sum/s: 4us overhead: 0.00%
pert/s: 35 >2.60us: 0 min: 0.04 max: 0.39 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 64 >2.39us: 0 min: 0.04 max: 0.94 avg: 0.12 sum/s: 8us overhead: 0.00%
pert/s: 52 >2.19us: 0 min: 0.04 max: 0.73 avg: 0.12 sum/s: 6us overhead: 0.00%
pert/s: 55 >2.01us: 0 min: 0.04 max: 0.62 avg: 0.12 sum/s: 6us overhead: 0.00%
pert/s: 38 >1.83us: 0 min: 0.04 max: 0.35 avg: 0.11 sum/s: 4us overhead: 0.00%
pert/s: 35 >1.68us: 0 min: 0.04 max: 0.43 avg: 0.10 sum/s: 4us overhead: 0.00%
pert/s: 36 >1.54us: 0 min: 0.04 max: 0.61 avg: 0.11 sum/s: 4us overhead: 0.00%
pert/s: 45 >1.44us: 0 min: 0.04 max: 0.92 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 43 >1.33us: 0 min: 0.04 max: 0.64 avg: 0.11 sum/s: 5us overhead: 0.00%
pert/s: 34 >1.23us: 0 min: 0.04 max: 0.54 avg: 0.10 sum/s: 4us overhead: 0.00%
Terminated
On Tue, 17 Feb 2015 09:37:44 +0100
Daniel Wagner <[email protected]> wrote:
> I needed the patch below to get it running stable under load on my
> shiny box.
FWIW, this patch makes 3.18-rt survive thermal events on my laptop.
> From c517743659575932d7b7c94a08276d0cee8a2fdd Mon Sep 17 00:00:00 2001
> From: Daniel Wagner <[email protected]>
> Date: Fri, 11 Jul 2014 15:26:13 +0200
> Subject: [PATCH] thermal: Defer thermal wakeups to threads
>
> On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
> call schedule while we run in irq context.
>
> [<ffffffff816850ac>] dump_stack+0x4e/0x8f
> [<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
> [<ffffffff816896b4>] __schedule+0x5b4/0x700
> [<ffffffff8168982a>] schedule+0x2a/0x90
> [<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
> [<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
> [<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
> [<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
> [<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
> [<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
> [<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
>
> Let's defer the work to a kthread.
>
> Signed-off-by: Daniel Wagner <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> ---
> drivers/thermal/x86_pkg_temp_thermal.c | 49 ++++++++++++++++++++++++++++++++--
> 1 file changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/x86_pkg_temp_thermal.c b/drivers/thermal/x86_pkg_temp_thermal.c
> index 9ea3d9d..001ba02 100644
> --- a/drivers/thermal/x86_pkg_temp_thermal.c
> +++ b/drivers/thermal/x86_pkg_temp_thermal.c
> @@ -29,6 +29,7 @@
> #include <linux/pm.h>
> #include <linux/thermal.h>
> #include <linux/debugfs.h>
> +#include <linux/work-simple.h>
> #include <asm/cpu_device_id.h>
> #include <asm/mce.h>
>
> @@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
> }
> }
>
> -static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> +static void platform_thermal_notify_work(struct swork_event *event)
> {
> unsigned long flags;
> int cpu = smp_processor_id();
> @@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> pkg_work_scheduled[phy_id]) {
> disable_pkg_thres_interrupt();
> spin_unlock_irqrestore(&pkg_work_lock, flags);
> - return -EINVAL;
> + return;
> }
> pkg_work_scheduled[phy_id] = 1;
> spin_unlock_irqrestore(&pkg_work_lock, flags);
> @@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> schedule_delayed_work_on(cpu,
> &per_cpu(pkg_temp_thermal_threshold_work, cpu),
> msecs_to_jiffies(notify_delay_ms));
> +}
> +
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +static struct swork_event notify_work;
> +
> +static int thermal_notify_work_init(void)
> +{
> + int err;
> +
> + err = swork_get();
> + if (!err)
> + return err;
> +
> + INIT_SWORK(¬ify_work, platform_thermal_notify_work);
> + return 0;
> +}
> +
> +static void thermal_notify_work_cleanup(void)
> +{
> + swork_put();
> +}
> +
> +static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> +{
> + swork_queue(¬ify_work);
> return 0;
> }
>
> +#else /* !CONFIG_PREEMPT_RT_FULL */
> +
> +static int thermal_notify_work_init(void) { return 0; }
> +
> +static void thermal_notify_work_cleanup(void) { }
> +
> +static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> +{
> + platform_thermal_notify_work(NULL);
> +
> + return 0;
> +}
> +#endif /* CONFIG_PREEMPT_RT_FULL */
> +
> static int find_siblings_cpu(int cpu)
> {
> int i;
> @@ -594,6 +634,10 @@ static int __init pkg_temp_thermal_init(void)
> for_each_online_cpu(i)
> if (get_core_online(i))
> goto err_ret;
> +
> + if (!thermal_notify_work_init())
> + goto err_ret;
> +
> __register_hotcpu_notifier(&pkg_temp_thermal_notifier);
> cpu_notifier_register_done();
>
> @@ -619,6 +663,7 @@ static void __exit pkg_temp_thermal_exit(void)
>
> cpu_notifier_register_begin();
> __unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
> + thermal_notify_work_cleanup();
> mutex_lock(&phy_dev_list_mutex);
> list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
> /* Retore old MSR value for package thermal interrupt */
--
Joakim
* Joakim Hernberg | 2015-02-19 10:36:01 [+0100]:
>On Tue, 17 Feb 2015 09:37:44 +0100
>Daniel Wagner <[email protected]> wrote:
>
>> I needed the patch below to get it running stable under load on my
>> shiny box.
>
>FWIW, this patch makes 3.18-rt survive thermal events on my laptop.
Okay. I applied a slightly modified version of it, so that init() is
invoked before the callback function is assigned.
On module exit there is one piece missing: after the callback is
removed we should flush the swork queue to ensure that no callback
is still pending or executing. But this won't work right now because the
swait API lacks that feature, so it has to be postponed.
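Just to illustrate the missing piece: such a flush could presumably be
built as a barrier event plus a completion once the infrastructure is
there. A purely hypothetical sketch, swork_flush() does not exist in the
current work-simple API:

struct swork_flush_event {
	struct swork_event	sev;
	struct completion	done;
};

static void swork_flush_fn(struct swork_event *sev)
{
	struct swork_flush_event *f =
		container_of(sev, struct swork_flush_event, sev);

	/* everything queued before this event has been processed */
	complete(&f->done);
}

/* hypothetical: wait until all previously queued events have run */
static void swork_flush(void)
{
	struct swork_flush_event f;

	INIT_SWORK(&f.sev, swork_flush_fn);
	init_completion(&f.done);
	swork_queue(&f.sev);		/* FIFO: runs after pending events */
	wait_for_completion(&f.done);	/* never call from the worker itself */
}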
Sebastian
* Mike Galbraith | 2015-02-18 15:09:23 [+0100]:
>Locking functions previously using read_lock_irq()/read_lock_irqsave() were
>changed to local_irq_disable/save(), leading to gripes. Use nort variants.
applied
Sebastian
* Mike Galbraith | 2015-02-18 16:05:28 [+0100]:
>[ 37.667792] Installing knfsd (copyright (C) 1996 [email protected]).
>[ 37.720307] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
>[ 37.720307] in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
>[ 37.720318] Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
applied
Sebastian
* Sebastian Andrzej Siewior | 2015-02-25 14:55:01 [+0100]:
>Okay. I applied a slightly modified version of it, so that init() is
From 8334ac498f104c00e5d93e3e83d3bcec1a993cec Mon Sep 17 00:00:00 2001
From: Daniel Wagner <[email protected]>
Date: Tue, 17 Feb 2015 09:37:44 +0100
Subject: [PATCH] thermal: Defer thermal wakeups to threads
On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
call schedule while we run in irq context.
[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
Let's defer the work to a kthread.
Signed-off-by: Daniel Wagner <[email protected]>
[bigeasy: reorder init/deinit position. TODO: flush swork on exit]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
drivers/thermal/x86_pkg_temp_thermal.c | 51 ++++++++++++++++++++++++++++++++--
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/drivers/thermal/x86_pkg_temp_thermal.c b/drivers/thermal/x86_pkg_temp_thermal.c
index 9ea3d9d49ffc..ab7c400a9d5c 100644
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -29,6 +29,7 @@
#include <linux/pm.h>
#include <linux/thermal.h>
#include <linux/debugfs.h>
+#include <linux/work-simple.h>
#include <asm/cpu_device_id.h>
#include <asm/mce.h>
@@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
}
}
-static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+static void platform_thermal_notify_work(struct swork_event *event)
{
unsigned long flags;
int cpu = smp_processor_id();
@@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
pkg_work_scheduled[phy_id]) {
disable_pkg_thres_interrupt();
spin_unlock_irqrestore(&pkg_work_lock, flags);
- return -EINVAL;
+ return;
}
pkg_work_scheduled[phy_id] = 1;
spin_unlock_irqrestore(&pkg_work_lock, flags);
@@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
schedule_delayed_work_on(cpu,
&per_cpu(pkg_temp_thermal_threshold_work, cpu),
msecs_to_jiffies(notify_delay_ms));
+}
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+static struct swork_event notify_work;
+
+static int thermal_notify_work_init(void)
+{
+ int err;
+
+ err = swork_get();
+ if (!err)
+ return err;
+
+ INIT_SWORK(¬ify_work, platform_thermal_notify_work);
return 0;
}
+static void thermal_notify_work_cleanup(void)
+{
+ swork_put();
+}
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+ swork_queue(¬ify_work);
+ return 0;
+}
+
+#else /* !CONFIG_PREEMPT_RT_FULL */
+
+static int thermal_notify_work_init(void) { return 0; }
+
+static void thermal_notify_work_cleanup(void) { }
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+ platform_thermal_notify_work(NULL);
+
+ return 0;
+}
+#endif /* CONFIG_PREEMPT_RT_FULL */
+
static int find_siblings_cpu(int cpu)
{
int i;
@@ -584,6 +624,9 @@ static int __init pkg_temp_thermal_init(void)
if (!x86_match_cpu(pkg_temp_thermal_ids))
return -ENODEV;
+ if (!thermal_notify_work_init())
+ return -ENODEV;
+
spin_lock_init(&pkg_work_lock);
platform_thermal_package_notify =
pkg_temp_thermal_platform_thermal_notify;
@@ -594,6 +637,7 @@ static int __init pkg_temp_thermal_init(void)
for_each_online_cpu(i)
if (get_core_online(i))
goto err_ret;
+
__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
cpu_notifier_register_done();
@@ -608,7 +652,7 @@ static int __init pkg_temp_thermal_init(void)
kfree(pkg_work_scheduled);
platform_thermal_package_notify = NULL;
platform_thermal_package_rate_control = NULL;
-
+ thermal_notify_work_cleanup();
return -ENODEV;
}
@@ -633,6 +677,7 @@ static void __exit pkg_temp_thermal_exit(void)
mutex_unlock(&phy_dev_list_mutex);
platform_thermal_package_notify = NULL;
platform_thermal_package_rate_control = NULL;
+ thermal_notify_work_cleanup();
for_each_online_cpu(i)
cancel_delayed_work_sync(
&per_cpu(pkg_temp_thermal_threshold_work, i));
--
2.1.4
Hi Sebastian,
On 02/26/2015 09:48 AM, Sebastian Andrzej Siewior wrote:
> * Sebastian Andrzej Siewior | 2015-02-25 14:55:01 [+0100]:
> +
> +static int thermal_notify_work_init(void)
> +{
> + int err;
> +
> + err = swork_get();
> + if (!err)
> + return err;
I think this should be:
if (err)
return err;
cheers,
daniel
On 02/27/2015 07:40 AM, Daniel Wagner wrote:
> Hi Sebastian
Hi Daniel,
> On 02/26/2015 09:48 AM, Sebastian Andrzej Siewior wrote:
>> * Sebastian Andrzej Siewior | 2015-02-25 14:55:01 [+0100]:
>> +
>> +static int thermal_notify_work_init(void)
>> +{
>> + int err;
>> +
>> + err = swork_get();
>> + if (!err)
>> + return err;
>
> I think this should be:
>
> if (err)
> return err;
Makes sense.
>
> cheers,
> daniel
Sebastian