2014-04-10 06:54:19

by Fengguang Wu

[permalink] [raw]
Subject: [uml/do_xor_speed] WARNING: CPU: 0 PID: 1 at init/main.c:708 do_one_initcall()

Hi Boaz,

I catch the below warning again in

git://git.open-osd.org/linux-open-osd.git raid6

commit 60e833d86a01ad9d8204a04b8db324106ab50395
Author: Boaz Harrosh <[email protected]>
AuthorDate: Thu Jul 19 15:22:37 2012 +0300
Commit: Boaz Harrosh <[email protected]>
CommitDate: Tue Apr 8 08:42:14 2014 +0300

RFC: do_xor_speed Broken on UML do to jiffies

Remember that hang I reported a while back on UML. Well
I'm at it again, and it still hangs and I found why.

I have dprinted jiffies and it never advances during the
loop at do_xor_speed. There for it is stuck in an endless
loop. I have also dprinted current_kernel_time() and it
returns the same constant value as well.

Note that it does usually work on UML, only during
the modprobe of xor.ko while that test is running. It looks
like some lucking is preventing the clock from ticking.

However ktime_get_ts does work for me so I changed the code
as below, so I can work. See how I put several safety
guards, to never get hangs again.
And I think my time based approach is more accurate then
previous system.

UML guys please investigate the jiffies issue? what is
xor.ko not doing right?

Signed-off-by: Boaz Harrosh <[email protected]>

+------------------------------------------------------------------+------------+------------+
| | 60e833d86a | 158b685625 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 22 | 13 |
| WARNING:CPU:PID:at_init/main.c:do_one_initcall() | 22 | 12 |
| initcall_calibrate_xor_blocks_returned_with_preemption_imbalance | 22 | 12 |
| backtrace:warn_slowpath_fmt | 22 | 12 |
| backtrace:kernel_init_freeable | 22 | 12 |
| INFO:possible_irq_lock_inversion_dependency_detected | 0 | 12 |
| backtrace:page_writeback_init | 0 | 12 |
| backtrace:smp_init | 0 | 12 |
| backtrace:kvm_guest_init | 0 | 12 |
| backtrace:kvmclock_init | 0 | 12 |
| BUG:kernel_early_hang_without_any_printk_output | 0 | 1 |
+------------------------------------------------------------------+------------+------------+

[ 0.381056] xor: using function: pIII_sse (0.192 MB/sec)
[ 0.382268] ------------[ cut here ]------------
[ 0.382268] ------------[ cut here ]------------
[ 0.383315] WARNING: CPU: 0 PID: 1 at init/main.c:708 do_one_initcall+0x11d/0x136()

git bisect start 158b6856256ef0171c2c7db83832b249861e9d3b 455c6fdbd219161bd09b1165f11699d6d73de11c --
git bisect good 646a0bcc54fefa29c32fc03bebde6ff613a3047b # 00:26 22+ 0 Merge 'drm-exynos/exynos-drm-next-todo' into devel-f4-i386-201404081359
git bisect good 65c4c808e7aa0d49fac20baeebbb1869ee5dfe14 # 00:34 22+ 22 Merge 'vireshk/isolate-cpusets' into devel-f4-i386-201404081359
git bisect bad 78fb990bc3dd5c7a71b01c87e9e94d218720bfac # 00:59 0- 8 Merge 'open-osd/raid6' into devel-f4-i386-201404081359
git bisect good 523fd2062839e80b615d5382bc9c44736cc24c8c # 01:20 22+ 0 ore: (trivial) reformat some code
git bisect good a7b6083b7dfa79c3109e200317fe190d7b61d1eb # 01:30 22+ 0 ore: Support for raid 6
git bisect bad edffc9c132ffb305179d016fc68d8e947bad83f4 # 01:41 0- 20 SQUASHME: ore: 2 raid6 bugs
git bisect bad 60e833d86a01ad9d8204a04b8db324106ab50395 # 01:49 0- 20 RFC: do_xor_speed Broken on UML do to jiffies
# first bad commit: [60e833d86a01ad9d8204a04b8db324106ab50395] RFC: do_xor_speed Broken on UML do to jiffies
git bisect good a7b6083b7dfa79c3109e200317fe190d7b61d1eb # 01:54 66+ 0 ore: Support for raid 6
git bisect bad 158b6856256ef0171c2c7db83832b249861e9d3b # 01:54 0- 13 0day head guard for 'devel-f4-i386-201404081359'
git bisect good 39de65aa2c3eee901db020a4f1396998e09602a3 # 01:59 66+ 0 Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect good 35e2933ca69ef1b8061e5aee090535410336e063 # 02:05 66+ 0 Add linux-next specific files for 20140409


This script may reproduce the error.

-----------------------------------------------------------------------------
#!/bin/bash

kernel=$1
initrd=quantal-core-i386.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/$initrd

kvm=(
qemu-system-x86_64 -cpu kvm64 -enable-kvm
-kernel $kernel
-initrd $initrd
-smp 2
-m 256M
-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
-net user,vlan=0
-net nic,vlan=1,model=e1000
-net user,vlan=1
-boot order=nc
-no-reboot
-watchdog i6300esb
-serial stdio
-display none
-monitor null
)

append=(
debug
sched_debug
apic=debug
ignore_loglevel
sysrq_always_enabled
panic=10
prompt_ramdisk=0
earlyprintk=ttyS0,115200
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
)

"${kvm[@]}" --append "${append[*]}"
-----------------------------------------------------------------------------

Thanks,
Fengguang


Attachments:
(No filename) (5.52 kB)
dmesg-quantal-lkp-ib03-65:20140410014919:i386-randconfig-fd1-0408:3.14.0-rc8-00007-g60e833d:2 (146.98 kB)
i386-randconfig-fd1-0408-158b6856256ef0171c2c7db83832b249861e9d3b-WARNING:---at----do_one_initcall+-x-86190.log (24.86 kB)
config-3.14.0-rc8-00007-g60e833d (90.71 kB)
Download all attachments

2014-04-10 07:44:00

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [uml/do_xor_speed] WARNING: CPU: 0 PID: 1 at init/main.c:708 do_one_initcall()

On 04/10/2014 09:53 AM, Fengguang Wu wrote:
> Hi Boaz,
>
> I catch the below warning again in
>
> git://git.open-osd.org/linux-open-osd.git raid6
>

OK Thanks. Daa I thought I fixed it but I forgot the obvious problem.

I pushed a new tree I think it might be good this time. (finally)

Thanks
Boaz


> commit 60e833d86a01ad9d8204a04b8db324106ab50395
> Author: Boaz Harrosh <[email protected]>
> AuthorDate: Thu Jul 19 15:22:37 2012 +0300
> Commit: Boaz Harrosh <[email protected]>
> CommitDate: Tue Apr 8 08:42:14 2014 +0300
>
> RFC: do_xor_speed Broken on UML do to jiffies
>
> Remember that hang I reported a while back on UML. Well
> I'm at it again, and it still hangs and I found why.
>
> I have dprinted jiffies and it never advances during the
> loop at do_xor_speed. There for it is stuck in an endless
> loop. I have also dprinted current_kernel_time() and it
> returns the same constant value as well.
>
> Note that it does usually work on UML, only during
> the modprobe of xor.ko while that test is running. It looks
> like some lucking is preventing the clock from ticking.
>
> However ktime_get_ts does work for me so I changed the code
> as below, so I can work. See how I put several safety
> guards, to never get hangs again.
> And I think my time based approach is more accurate then
> previous system.
>
> UML guys please investigate the jiffies issue? what is
> xor.ko not doing right?
>
> Signed-off-by: Boaz Harrosh <[email protected]>
>
<>
> [ 0.383315] WARNING: CPU: 0 PID: 1 at init/main.c:708 do_one_initcall+0x11d/0x136()