2022-09-18 23:21:50

by Joel Fernandes

[permalink] [raw]
Subject: [RFC] rcu/nocb: Fix possible bugs in rcu_barrier()

When going through the lazy-rcu work, I noticed that
rcu_barrier_entrain() does not really wake up the rcuog GP thread in any
path after entraining. This means it is possible the GP thread is not
awakened soon (say there were no CBs in the cblist after entraining
time).

Further, nothing appears to be calling the rcu_barrier callback
directly in the case the ->cblist was empty which means if the IPI gets
delayed enough to make the ->cblist empty and it turns out to be the last
CPU holding, then nothing calls completes rcu_state.barrier_completion.

Fix both these issues.

A note on the wakeup, there are 3 cases AFAICS after the call to
rcu_nocb_flush_bypass():

1. The rdp->cblist has pending CBs.

2. The rdp->cblist has all done CBs.

3. The rdp->cblist has no CBs at all (say the IPI took a long time to
arrive and some other path dequeued them in the meanwhile).

For #3, entraining a CB is not needed and we should bail. For #1 and
needed. But for #2 it is needed.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
I only build tested this and wanted to post it in advance for discussions. I
will test it more soon. Thanks.

kernel/rcu/tree.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 18f07e167d5e..65d439286757 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3904,10 +3904,11 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
/*
* If needed, entrain an rcu_barrier() callback on rdp->cblist.
*/
-static void rcu_barrier_entrain(struct rcu_data *rdp)
+static void rcu_barrier_entrain(struct rcu_data *rdp, unsigned long flags)
{
unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
+ bool was_alldone;

lockdep_assert_held(&rcu_state.barrier_lock);
if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
@@ -3916,14 +3917,20 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
rdp->barrier_head.func = rcu_barrier_callback;
debug_rcu_head_queue(&rdp->barrier_head);
rcu_nocb_lock(rdp);
+ was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+
if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
atomic_inc(&rcu_state.barrier_cpu_count);
+ __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
} else {
+ /* rdp->cblist is empty so directly call the callback. */
+ atomic_inc(&rcu_state.barrier_cpu_count);
+ rcu_barrier_callback(&rdp->barrier_head);
debug_rcu_head_unqueue(&rdp->barrier_head);
rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
+ rcu_nocb_unlock(rdp);
}
- rcu_nocb_unlock(rdp);
smp_store_release(&rdp->barrier_seq_snap, gseq);
}

@@ -3932,15 +3939,16 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
*/
static void rcu_barrier_handler(void *cpu_in)
{
+ unsigned long flags;
uintptr_t cpu = (uintptr_t)cpu_in;
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

lockdep_assert_irqs_disabled();
WARN_ON_ONCE(cpu != rdp->cpu);
WARN_ON_ONCE(cpu != smp_processor_id());
- raw_spin_lock(&rcu_state.barrier_lock);
- rcu_barrier_entrain(rdp);
- raw_spin_unlock(&rcu_state.barrier_lock);
+ raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
+ rcu_barrier_entrain(rdp, flags);
+ raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
}

/**
@@ -4007,7 +4015,7 @@ void rcu_barrier(void)
continue;
}
if (!rcu_rdp_cpu_online(rdp)) {
- rcu_barrier_entrain(rdp);
+ rcu_barrier_entrain(rdp, flags);
WARN_ON_ONCE(READ_ONCE(rdp->barrier_seq_snap) != gseq);
raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu, rcu_state.barrier_sequence);
@@ -4333,7 +4341,7 @@ void rcutree_migrate_callbacks(int cpu)

raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
WARN_ON_ONCE(rcu_rdp_cpu_online(rdp));
- rcu_barrier_entrain(rdp);
+ rcu_barrier_entrain(rdp, flags);
my_rdp = this_cpu_ptr(&rcu_data);
my_rnp = my_rdp->mynode;
rcu_nocb_lock(my_rdp); /* irqs already disabled. */
--
2.37.3.968.ga6b4b080e4-goog


2022-09-18 23:22:39

by Joel Fernandes

[permalink] [raw]
Subject: Re: [RFC] rcu/nocb: Fix possible bugs in rcu_barrier()



On 9/18/2022 6:12 PM, Joel Fernandes (Google) wrote:
> When going through the lazy-rcu work, I noticed that
> rcu_barrier_entrain() does not really wake up the rcuog GP thread in any
> path after entraining. This means it is possible the GP thread is not
> awakened soon (say there were no CBs in the cblist after entraining
> time).
>
> Further, nothing appears to be calling the rcu_barrier callback
> directly in the case the ->cblist was empty which means if the IPI gets
> delayed enough to make the ->cblist empty and it turns out to be the last
> CPU holding, then nothing calls completes rcu_state.barrier_completion.
>
> Fix both these issues.
>
> A note on the wakeup, there are 3 cases AFAICS after the call to
> rcu_nocb_flush_bypass():
>
> 1. The rdp->cblist has pending CBs.
>
> 2. The rdp->cblist has all done CBs.
>
> 3. The rdp->cblist has no CBs at all (say the IPI took a long time to
> arrive and some other path dequeued them in the meanwhile).
>
> For #3, entraining a CB is not needed and we should bail. For #1 and
> needed. But for #2 it is needed.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> I only build tested this and wanted to post it in advance for discussions. I
> will test it more soon. Thanks.
>
> kernel/rcu/tree.c | 22 +++++++++++++++-------
> 1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 18f07e167d5e..65d439286757 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3904,10 +3904,11 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
> /*
> * If needed, entrain an rcu_barrier() callback on rdp->cblist.
> */
> -static void rcu_barrier_entrain(struct rcu_data *rdp)
> +static void rcu_barrier_entrain(struct rcu_data *rdp, unsigned long flags)
> {
> unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
> unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
> + bool was_alldone;
>
> lockdep_assert_held(&rcu_state.barrier_lock);
> if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
> @@ -3916,14 +3917,20 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
> rdp->barrier_head.func = rcu_barrier_callback;
> debug_rcu_head_queue(&rdp->barrier_head);
> rcu_nocb_lock(rdp);
> + was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +
> if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
> atomic_inc(&rcu_state.barrier_cpu_count);
> + __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */

I noticed that __call_rcu_nocb_wake() does a restore of the flags, which may not
work for all callers of rcu_nocb_entrain(). Sigh. The patch needs to be fixed
for that, but at least it highlights the potential issues.

2022-09-19 10:18:30

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC] rcu/nocb: Fix possible bugs in rcu_barrier()

On Sun, Sep 18, 2022 at 10:12:31PM +0000, Joel Fernandes (Google) wrote:
> When going through the lazy-rcu work, I noticed that
> rcu_barrier_entrain() does not really wake up the rcuog GP thread in any
> path after entraining. This means it is possible the GP thread is not
> awakened soon (say there were no CBs in the cblist after entraining
> time).

Right.

>
> Further, nothing appears to be calling the rcu_barrier callback
> directly in the case the ->cblist was empty which means if the IPI gets
> delayed enough to make the ->cblist empty and it turns out to be the last
> CPU holding, then nothing calls completes rcu_state.barrier_completion.

No need for that, if the cblist is empty there is no need for a callback
to enqueue.

Thanks.

2022-09-19 21:35:11

by Joel Fernandes

[permalink] [raw]
Subject: Re: [RFC] rcu/nocb: Fix possible bugs in rcu_barrier()



On 9/19/2022 5:34 AM, Frederic Weisbecker wrote:
> On Sun, Sep 18, 2022 at 10:12:31PM +0000, Joel Fernandes (Google) wrote:
>> When going through the lazy-rcu work, I noticed that
>> rcu_barrier_entrain() does not really wake up the rcuog GP thread in any
>> path after entraining. This means it is possible the GP thread is not
>> awakened soon (say there were no CBs in the cblist after entraining
>> time).
>
> Right.
>
>>
>> Further, nothing appears to be calling the rcu_barrier callback
>> directly in the case the ->cblist was empty which means if the IPI gets
>> delayed enough to make the ->cblist empty and it turns out to be the last
>> CPU holding, then nothing calls completes rcu_state.barrier_completion.
>
> No need for that, if the cblist is empty there is no need for a callback
> to enqueue.
>

Thanks! I was worried about the race where an smp_call_function_single() takes a
long time to IPI. But I missed that the smp_call_function_single() in
rcu_barrier() is in a synchronous wait. I wrongly thought thought that the
waiting was facilitated by barrier_completion which has a whole different purpose.

- Joel

2022-09-21 08:28:54

by kernel test robot

[permalink] [raw]
Subject: [rcu/nocb] 79b696b5c3: WARNING:at_kernel/rcu/tree_nocb.h:#rcu_barrier_entrain

Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 79b696b5c31c8729415f065a03f806031377a036 ("[RFC] rcu/nocb: Fix possible bugs in rcu_barrier()")
url: https://github.com/intel-lab-lkp/linux/commits/Joel-Fernandes-Google/rcu-nocb-Fix-possible-bugs-in-rcu_barrier/20220919-061429
base: https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev
patch link: https://lore.kernel.org/rcu/[email protected]

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


[ 36.665659][ T1] ------------[ cut here ]------------
[ 36.666591][ T1] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_nocb.h:1570 rcu_barrier_entrain+0x82/0xa8
[ 36.668410][ T1] Modules linked in:
[ 36.672563][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00134-g79b696b5c31c #1 f5b9e13ee9995ca41e807f76606276f1115eac0d
[ 36.674561][ T1] EIP: rcu_barrier_entrain+0x82/0xa8
[ 36.675625][ T1] Code: 5c c7 87 b8 00 00 00 6b 0d 0b 41 e8 e8 5b 00 00 a1 40 b5 89 43 89 f2 8d 47 5c e8 8a 5c 00 00 84 c0 74 0b f0 ff 05 f4 56 bf 43 <0f> 0b eb 13 f0 ff 05 f4 56 bf 43 89 f0 e8 e4 fe ff ff a1 f4 56 bf
All code
========
0: 5c pop %rsp
1: c7 87 b8 00 00 00 6b movl $0x410b0d6b,0xb8(%rdi)
8: 0d 0b 41
b: e8 e8 5b 00 00 callq 0x5bf8
10: a1 40 b5 89 43 89 f2 movabs 0x478df2894389b540,%eax
17: 8d 47
19: 5c pop %rsp
1a: e8 8a 5c 00 00 callq 0x5ca9
1f: 84 c0 test %al,%al
21: 74 0b je 0x2e
23: f0 ff 05 f4 56 bf 43 lock incl 0x43bf56f4(%rip) # 0x43bf571e
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb 13 jmp 0x41
2e: f0 ff 05 f4 56 bf 43 lock incl 0x43bf56f4(%rip) # 0x43bf5729
35: 89 f0 mov %esi,%eax
37: e8 e4 fe ff ff callq 0xffffffffffffff20
3c: a1 .byte 0xa1
3d: f4 hlt
3e: 56 push %rsi
3f: bf .byte 0xbf

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb 13 jmp 0x17
4: f0 ff 05 f4 56 bf 43 lock incl 0x43bf56f4(%rip) # 0x43bf56ff
b: 89 f0 mov %esi,%eax
d: e8 e4 fe ff ff callq 0xfffffffffffffef6
12: a1 .byte 0xa1
13: f4 hlt
14: 56 push %rsi
15: bf .byte 0xbf
[ 36.678492][ T1] EAX: 46021101 EBX: 00000001 ECX: 00000000 EDX: 00000000
[ 36.679585][ T1] ESI: eca91ff4 EDI: eca91f40 EBP: 40847f00 ESP: 40847ef4
[ 36.680728][ T1] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
[ 36.681932][ T1] CR0: 80050033 CR2: ff9ff000 CR3: 04208000 CR4: 00040690
[ 36.683037][ T1] Call Trace:
[ 36.683451][ T1] ? rcu_barrier_handler (tree.c:?)
[ 36.684262][ T1] ? rcu_barrier_entrain+0xa8/0xa8
[ 36.685262][ T1] ? generic_exec_single (smp.c:?)
[ 36.686100][ T1] ? smp_call_function_single (??:?)
[ 36.687136][ T1] ? rcu_barrier_entrain+0xa8/0xa8
[ 36.688124][ T1] ? rcu_barrier_entrain+0xa8/0xa8
[ 36.689128][ T1] ? lockdep_hardirqs_on_prepare (lockdep.c:?)
[ 36.689990][ T1] ? trace_hardirqs_on (??:?)
[ 36.690794][ T1] ? rcu_barrier (??:?)
[ 36.691662][ T1] ? rcu_barrier (??:?)
[ 36.692398][ T1] ? rest_init (main.c:?)
[ 36.693144][ T1] ? kernel_init (main.c:?)
[ 36.693700][ T1] ? ret_from_fork (??:?)
[ 36.694311][ T1] irq event stamp: 1063192
[ 36.694951][ T1] hardirqs last enabled at (1063191): _raw_spin_unlock_irqrestore (??:?)
[ 36.696452][ T1] hardirqs last disabled at (1063192): generic_exec_single (smp.c:?)
[ 36.698020][ T1] softirqs last enabled at (1052092): __do_softirq (??:?)
[ 36.699459][ T1] softirqs last disabled at (1051913): do_softirq_own_stack (??:?)
[ 36.700959][ T1] ---[ end trace 0000000000000000 ]---
[ 36.737332][ T1] Write protecting kernel text and read-only data: 41556k
[ 36.767108][ T1] Run /init as init process
[ 36.767902][ T1] with arguments:
[ 36.768500][ T1] /init
[ 36.769014][ T1] with environment:
[ 36.769603][ T1] HOME=/
[ 36.770115][ T1] TERM=linux
[ 36.770690][ T1] RESULT_ROOT=/result/boot/1/vm-snb-i386/openwrt-i386-generic-20190428.cgz/i386-randconfig-a015-20220919/gcc-11/79b696b5c31c8729415f065a03f806031377a036/3
[ 36.773157][ T1] BOOT_IMAGE=/pkg/linux/i386-randconfig-a015-20220919/gcc-11/79b696b5c31c8729415f065a03f806031377a036/vmlinuz-6.0.0-rc1-00134-g79b696b5c31c
[ 36.775405][ T1] branch=linux-review/Joel-Fernandes-Google/rcu-nocb-Fix-possible-bugs-in-rcu_barrier/20220919-061429
[ 36.777284][ T1] job=/lkp/jobs/scheduled/vm-meta-24/boot-1-openwrt-i386-generic-20190428.cgz-79b696b5c31c8729415f065a03f806031377a036-20220920-43941-5brzx0-3.yaml
[ 36.779634][ T1] user=lkp
[ 36.780172][ T1] ARCH=i386
[ 36.780736][ T1] kconfig=i386-randconfig-a015-20220919
[ 36.781730][ T1] commit=79b696b5c31c8729415f065a03f806031377a036
[ 36.782780][ T1] max_uptime=600
[ 36.783407][ T1] LKP_SERVER=internal-lkp-server
[ 36.784202][ T1] selinux=0
[ 36.784713][ T1] softlockup_panic=1
[ 36.785424][ T1] nmi_watchdog=panic
[ 36.786093][ T1] prompt_ramdisk=0
[ 36.786969][ T1] vga=normal
[ 36.848294][ T1] init: Console is alive
[ 36.857169][ T218] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[ 36.859638][ T218] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[ 36.861425][ T218] kmodloader (218) used greatest stack depth: 6456 bytes left
[ 36.871627][ T1] init: - preinit -
[ 36.914550][ T223] cat (223) used greatest stack depth: 6384 bytes left
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
[ 39.989086][ T243] mount_root: mounting /dev/root
[ 40.005076][ T220] urandom-seed: Seed file not found (/etc/urandom.seed)
[ 40.010546][ T220] sh (220) used greatest stack depth: 6244 bytes left
[ 40.015695][ T1] procd: - early -
[ 40.139445][ T247] udevtrigger (247) used greatest stack depth: 6180 bytes left
[ 40.643117][ T1] procd: - ubus -
[ 45.766969][ C0] random: crng init done
[ 45.774093][ T1] procd: - init -
Please press Enter to activate this console.
[ 45.856832][ T268] kmodloader: loading kernel modules from /etc/modules.d/*
[ 45.860342][ T268] kmodloader: done loading kernel modules from /etc/modules.d/*
[ 47.791780][ T466] fw3 (466) used greatest stack depth: 6080 bytes left
LKP: ttyS0: 746: Kernel tests: Boot OK!
LKP: ttyS0: 746: HOSTNAME vm-snb-i386, MAC 52:54:00:12:34:56, kernel 6.0.0-rc1-00134-g79b696b5c31c 1
LKP: ttyS0: 746: /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-meta-24/boot-1-openwrt-i386-generic-20190428.cgz-79b696b5c31c8729415f065a03f806031377a036-20220920-43941-5brzx0-3.yaml
[ 50.752708][ T759] mkdir: can't create directory '/sys/kernel/debug': Operation not permitted
[ 50.752708][ T759] mount: mounting debug on /sys/kernel/debug failed: No such file or directory
[ 52.897596][ T891] process 891 (wait) attempted a POSIX timer syscall while CONFIG_POSIX_TIMERS is not set
LKP: ttyS0: 746: LKP: rebooting forcely
[ 76.720983][ T746] sysrq: Emergency Sync
[ 76.722077][ T12] Emergency Sync complete
[ 76.722841][ T746] sysrq: Resetting


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]


To reproduce:

# build kernel
cd linux
cp config-6.0.0-rc1-00134-g79b696b5c31c .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (8.55 kB)
config-6.0.0-rc1-00134-g79b696b5c31c (174.96 kB)
job-script (5.06 kB)
dmesg.xz (19.00 kB)
Download all attachments