2010-08-27 12:35:08

by Qian Cai

Subject: kdump regression compared to v2.6.35

Just a heads-up: the kdump kernel hangs at the end of the log below. Bisect indicated that cc41f5cede3c63836d1c0958204630b07f5b5ee7 was also good.

Kernel command line: ro root=/dev/mapper/vg_intels3e3601-lv_root rd_LVM_LV=vg_intels3e3601/lv_root rd_LVM_LV=vg_intels3e3601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS0,115200 irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130412K@49792K elfcorehdr=180204K memmap=128K$896K memmap=2124K#1978520K memmap=4280K#1980644K memmap=464K$1984924K memmap=156K#1985388K memmap=84K$1985544K memmap=92K#1985628K memmap=8K$1985720K memmap=124K#1985728K memmap=136K$1985852K memmap=192K#1985988K memmap=260K$1986180K memmap=2816K#1986440K memmap=176K$1989256K memmap=3028K#1989432K memmap=2048K#1992460K memmap=2072K#1994508K memmap=8K$1996580K memmap=324K#1996588K memmap=428K$1996912K memmap=31584K#1997340K memmap=960K$2028924K memmap=1248K#2029884K memmap=288K#2031132K memmap=192K#2031420K memmap=327684K$2031612K memmap=16384K$4128768K memmap=16K$4174960K memmap=16384K$4177920K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Disabling memory control group subsystem
PID hash table entries: 512 (order: 0, 4096 bytes)
Checking aperture...
No AGP bridge found
Queued invalidation will be enabled to support x2apic and Intr-remapping.
Subtract (141 early reservations)
#1 [0004000000 - 00052b1548] TEXT DATA BSS
#2 [000ab63000 - 000afef000] RAMDISK
#3 [00052b2000 - 00052b2220] BRK
#4 [000009b000 - 00000fdd20] BIOS reserved
#5 [00000fdd30 - 0000100000] BIOS reserved
#6 [00000fdd20 - 00000fdd30] MP-table mpf
#7 [0000000012 - 000000f012] MP-table mpc
#8 [0000010000 - 0000012000] TRAMPOLINE
#9 [0000012000 - 0000016000] ACPI WAKEUP
#10 [0000016000 - 0000017000] PGTABLE
#11 [0000017000 - 000001703c] ACPI SLIT
#12 [000000f040 - 000000f4c0] MEMNODEMAP
#13 [00030a0000 - 00030c7000] NODE_DATA
#14 [00030c7000 - 00030c8000] BOOTMEM
#15 [00034c8000 - 00034c8030] BOOTMEM
#16 [00038c9000 - 00038ca000] BOOTMEM
#17 [00038ca000 - 00038cb000] BOOTMEM
#18 [0003a00000 - 0003e00000] MEMMAP 0
#19 [00030c8000 - 00030e0000] BOOTMEM
#20 [00030e0000 - 00030f8000] BOOTMEM
#21 [00030f8000 - 00030f9000] BOOTMEM
#22 [00030f9000 - 00030f9041] BOOTMEM
#23 [00030f9080 - 00030f9149] BOOTMEM
#24 [00030f9180 - 00030f9768] BOOTMEM
#25 [00030f9780 - 00030f97e8] BOOTMEM
#26 [00030f9800 - 00030f9868] BOOTMEM
#27 [00030f9880 - 00030f98e8] BOOTMEM
#28 [00030f9900 - 00030f9968] BOOTMEM
#29 [00030f9980 - 00030f99e8] BOOTMEM
#30 [00030f9a00 - 00030f9a68] BOOTMEM
#31 [00030f9a80 - 00030f9ae8] BOOTMEM
#32 [00030f9b00 - 00030f9b68] BOOTMEM
#33 [00030f9b80 - 00030f9be8] BOOTMEM
#34 [00030f9c00 - 00030f9c68] BOOTMEM
#35 [00030f9c80 - 00030f9ce8] BOOTMEM
#36 [00030f9d00 - 00030f9d68] BOOTMEM
#37 [00030f9d80 - 00030f9de8] BOOTMEM
#38 [00030f9e00 - 00030f9e68] BOOTMEM
#39 [00030f9e80 - 00030f9ee8] BOOTMEM
#40 [00030f9f00 - 00030f9f68] BOOTMEM
#41 [00030f9f80 - 00030f9fe8] BOOTMEM
#42 [00030fa000 - 00030fa068] BOOTMEM
#43 [00030fa080 - 00030fa0e8] BOOTMEM
#44 [00030fa100 - 00030fa168] BOOTMEM
#45 [00030fa180 - 00030fa1e8] BOOTMEM
#46 [00030fa200 - 00030fa268] BOOTMEM
#47 [00030fa280 - 00030fa2e8] BOOTMEM
#48 [00030fa300 - 00030fa368] BOOTMEM
#49 [00030fa380 - 00030fa3e8] BOOTMEM
#50 [00030fa400 - 00030fa468] BOOTMEM
#51 [00030fa480 - 00030fa4e8] BOOTMEM
#52 [00030fa500 - 00030fa568] BOOTMEM
#53 [00030fa580 - 00030fa5e8] BOOTMEM
#54 [00030fa600 - 00030fa668] BOOTMEM
#55 [00030fa680 - 00030fa6e8] BOOTMEM
#56 [00030fa700 - 00030fa768] BOOTMEM
#57 [00030fa780 - 00030fa7e8] BOOTMEM
#58 [00030fa800 - 00030fa820] BOOTMEM
#59 [00030fa840 - 00030fac0e] BOOTMEM
#60 [00030fac40 - 00030fb00e] BOOTMEM
#61 [0005400000 - 000541e000] BOOTMEM
#62 [0005420000 - 000543e000] BOOTMEM
#63 [0005440000 - 000545e000] BOOTMEM
#64 [0005460000 - 000547e000] BOOTMEM
#65 [0005480000 - 000549e000] BOOTMEM
#66 [00054a0000 - 00054be000] BOOTMEM
#67 [00054c0000 - 00054de000] BOOTMEM
#68 [00054e0000 - 00054fe000] BOOTMEM
#69 [0005500000 - 000551e000] BOOTMEM
#70 [0005520000 - 000553e000] BOOTMEM
#71 [0005540000 - 000555e000] BOOTMEM
#72 [0005560000 - 000557e000] BOOTMEM
#73 [0005580000 - 000559e000] BOOTMEM
#74 [00055a0000 - 00055be000] BOOTMEM
#75 [00055c0000 - 00055de000] BOOTMEM
#76 [00055e0000 - 00055fe000] BOOTMEM
#77 [0005600000 - 000561e000] BOOTMEM
#78 [0005620000 - 000563e000] BOOTMEM
#79 [0005640000 - 000565e000] BOOTMEM
#80 [0005660000 - 000567e000] BOOTMEM
#81 [0005680000 - 000569e000] BOOTMEM
#82 [00056a0000 - 00056be000] BOOTMEM
#83 [00056c0000 - 00056de000] BOOTMEM
#84 [00056e0000 - 00056fe000] BOOTMEM
#85 [0005700000 - 000571e000] BOOTMEM
#86 [0005720000 - 000573e000] BOOTMEM
#87 [0005740000 - 000575e000] BOOTMEM
#88 [0005760000 - 000577e000] BOOTMEM
#89 [0005780000 - 000579e000] BOOTMEM
#90 [00057a0000 - 00057be000] BOOTMEM
#91 [00057c0000 - 00057de000] BOOTMEM
#92 [00057e0000 - 00057fe000] BOOTMEM
#93 [0005800000 - 000581e000] BOOTMEM
#94 [0005820000 - 000583e000] BOOTMEM
#95 [0005840000 - 000585e000] BOOTMEM
#96 [0005860000 - 000587e000] BOOTMEM
#97 [0005880000 - 000589e000] BOOTMEM
#98 [00058a0000 - 00058be000] BOOTMEM
#99 [00058c0000 - 00058de000] BOOTMEM
#100 [00058e0000 - 00058fe000] BOOTMEM
#101 [0005900000 - 000591e000] BOOTMEM
#102 [0005920000 - 000593e000] BOOTMEM
#103 [0005940000 - 000595e000] BOOTMEM
#104 [0005960000 - 000597e000] BOOTMEM
#105 [0005980000 - 000599e000] BOOTMEM
#106 [00059a0000 - 00059be000] BOOTMEM
#107 [00059c0000 - 00059de000] BOOTMEM
#108 [00059e0000 - 00059fe000] BOOTMEM
#109 [0005a00000 - 0005a1e000] BOOTMEM
#110 [0005a20000 - 0005a3e000] BOOTMEM
#111 [0005a40000 - 0005a5e000] BOOTMEM
#112 [0005a60000 - 0005a7e000] BOOTMEM
#113 [0005a80000 - 0005a9e000] BOOTMEM
#114 [0005aa0000 - 0005abe000] BOOTMEM
#115 [0005ac0000 - 0005ade000] BOOTMEM
#116 [0005ae0000 - 0005afe000] BOOTMEM
#117 [0005b00000 - 0005b1e000] BOOTMEM
#118 [0005b20000 - 0005b3e000] BOOTMEM
#119 [0005b40000 - 0005b5e000] BOOTMEM
#120 [0005b60000 - 0005b7e000] BOOTMEM
#121 [0005b80000 - 0005b9e000] BOOTMEM
#122 [0005ba0000 - 0005bbe000] BOOTMEM
#123 [0005bc0000 - 0005bde000] BOOTMEM
#124 [0005be0000 - 0005bfe000] BOOTMEM
#125 [00030fd040 - 00030fd048] BOOTMEM
#126 [00030fd080 - 00030fd088] BOOTMEM
#127 [00030fd0c0 - 00030fd1c0] BOOTMEM
#128 [00030fd1c0 - 00030fd3c0] BOOTMEM
#129 [00030fd3c0 - 00030fd4d0] BOOTMEM
#130 [00030fd500 - 00030fd548] BOOTMEM
#131 [00030fd580 - 00030fd5c8] BOOTMEM
#132 [00030fb040 - 00030fb240] BOOTMEM
#133 [00030fb240 - 00030fb440] BOOTMEM
#134 [00030fb440 - 00030fb640] BOOTMEM
#135 [00030fb640 - 00030fb840] BOOTMEM
#136 [00030fb840 - 00030fba40] BOOTMEM
#137 [00030fba40 - 00030fbc40] BOOTMEM
#138 [00030fbc40 - 00030fbe40] BOOTMEM
#139 [00030fbe40 - 00030fc040] BOOTMEM
#140 [00030fc040 - 00030fd040] BOOTMEM
Memory: 94968k/180204k available (4766k kernel code, 49156k absent, 36080k reserved, 7642k data, 1448k init)
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is disabled.
Verbose stalled-CPUs detection is disabled.
NR_IRQS:262400 nr_irqs:2008
Extended CMOS year: 2000
Spurious LAPIC timer interrupt on cpu 0
Console: colour VGA+ 80x25
console [ttyS0] enabled
Fast TSC calibration using PIT
Detected 1994.798 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 3989.59 BogoMIPS (lpj=1994798)
pid_max: default: 65536 minimum: 512
Security Framework initialized
SELinux: Initializing.
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
mce: CPU supports 22 MCE banks
using mwait in idle threads.
Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
... version: 3
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 000000070000000f
SMP alternatives: switching to UP code
ACPI: Core revision 20100702
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 18514 entries in 73 pages
DMAR: Host address width 48
DMAR: DRHD base: 0x000000fd800000 flags: 0x0
IOMMU 0: reg_base_addr fd800000 ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: DRHD base: 0x000000fd000000 flags: 0x1
IOMMU 1: reg_base_addr fd000000 ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: RMRR base: 0x0000007be29000 end: 0x0000007be2bfff
DMAR: RMRR base: 0x0000007be16000 end: 0x0000007be16fff
DMAR: RMRR base: 0x0000007be13000 end: 0x0000007be13fff
DMAR: RMRR base: 0x0000007be10000 end: 0x0000007be10fff
DMAR: RMRR base: 0x0000007be0d000 end: 0x0000007be0dfff
DMAR: RMRR base: 0x0000007be0a000 end: 0x0000007be0afff
DMAR: RMRR base: 0x0000007be07000 end: 0x0000007be07fff
DMAR: RMRR base: 0x0000007be04000 end: 0x0000007be04fff
DMAR: RMRR base: 0x0000007be01000 end: 0x0000007be01fff
DMAR: ATSR flags: 0x0
DMAR: ATSR flags: 0x0
DMAR: RHSA base: 0x000000fd000000 proximity domain: 0x0
DMAR: RHSA base: 0x000000fd800000 proximity domain: 0x2
IOAPIC id 10 under DRHD base 0xfd800000 IOMMU 0
IOAPIC id 8 under DRHD base 0xfd000000 IOMMU 1
IOAPIC id 9 under DRHD base 0xfd000000 IOMMU 1
Enabled Interrupt-remapping
Setting APIC routing to cluster x2apic
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz stepping 06
Brought up 1 CPUs
Total of 1 processors activated (3989.59 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
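
For readers decoding the command line in the log above: on x86, `memmap=<size>@<start>` marks a region as usable RAM, `memmap=<size>$<start>` marks it reserved, and `memmap=<size>#<start>` marks it as ACPI data. The following is a minimal sketch, not part of the original report, that parses a few of the options and cross-checks them against the "Memory:" line. The helper name `parse_memmap` and the shortened `CMDLINE` string are assumptions made for illustration, and the parser only handles the `K` suffix used in this log.

```python
import re

# A few options copied from the command line above; the full list is longer.
CMDLINE = (
    "memmap=exactmap memmap=640K@0K memmap=130412K@49792K "
    "elfcorehdr=180204K memmap=128K$896K memmap=2124K#1978520K"
)

# Separator meanings for the x86 memmap= boot parameter.
KINDS = {"@": "usable RAM", "$": "reserved", "#": "ACPI data"}


def parse_memmap(cmdline):
    """Return (size_kib, start_kib, kind) for each memmap=<size>K<sep><start>K option.

    Hypothetical helper: options without a region (memmap=exactmap) are skipped.
    """
    pattern = re.compile(r"memmap=(\d+)K([@$#])(\d+)K")
    return [(int(size), int(start), KINDS[sep])
            for size, sep, start in pattern.findall(cmdline)]


regions = parse_memmap(CMDLINE)
usable = [(size, start) for size, start, kind in regions if kind == "usable RAM"]

# End of the highest usable region, in KiB.
usable_end = max(start + size for size, start in usable)

# Figures copied from the "Memory:" line of the same log, in KiB.
available, total, absent, reserved = 94968, 180204, 49156, 36080

for size, start, kind in regions:
    print(f"{start:>8}K +{size:>7}K  {kind}")
print("usable RAM ends at", usable_end, "K")
print("matches Memory total:", usable_end == total)
print("available + absent + reserved == total:",
      available + absent + reserved == total)
```

In this log the second usable region ends at 49792K + 130412K = 180204K, which is the same value given for `elfcorehdr=` and the total in the "Memory:" line, and the available, absent and reserved figures add up to that total as well.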


2010-08-28 15:20:00

by Qian Cai

Subject: Re: kdump regression compared to v2.6.35

Hmm, bisecting mainline pointed to a commit from before v2.6.35, which has to be wrong, since we know that v2.6.35 works fine and v2.6.36-rc1 is broken. Below is the log, starting from HEAD (bad) and v2.6.35 (good).

I also noticed this message during the bisect:
# git bisect good
Bisecting: a merge base must be tested
[21aa9af03d06cb1d19a3738e5cf12acff984e69b] sched: add hooks for workqueue

# git bisect log
git bisect start
# good: [ab69bcd66fb4be64edfc767365cb9eb084961246] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
git bisect good ab69bcd66fb4be64edfc767365cb9eb084961246
# good: [1cfd2bda8c486ae0e7a8005354758ebb68172bca] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
git bisect good 1cfd2bda8c486ae0e7a8005354758ebb68172bca
# bad: [faa38b5e0e092914764cdba9f83d31a3f794d182] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect bad faa38b5e0e092914764cdba9f83d31a3f794d182
# bad: [5df6b8e65ad0f2eaee202ff002ac00d1ac605315] Merge branch 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect bad 5df6b8e65ad0f2eaee202ff002ac00d1ac605315
# bad: [1fc7995d19139d6f99203b43c161968f3f554a15] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
git bisect bad 1fc7995d19139d6f99203b43c161968f3f554a15
# good: [e8779776afbd5f2d5315cf48c4257ca7e9b250fb] Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good e8779776afbd5f2d5315cf48c4257ca7e9b250fb
# bad: [f34217977d717385a3e9fd7018ac39fade3964c0] workqueue: implement unbound workqueue
git bisect bad f34217977d717385a3e9fd7018ac39fade3964c0
# good: [7e27d6e778cd87b6f2415515d7127eba53fe5d02] Linux 2.6.35-rc3
git bisect good 7e27d6e778cd87b6f2415515d7127eba53fe5d02
# good: [21aa9af03d06cb1d19a3738e5cf12acff984e69b] sched: add hooks for workqueue
git bisect good 21aa9af03d06cb1d19a3738e5cf12acff984e69b
# good: [c8e55f360210c1bc49bea5d62bc3939b7ee13483] workqueue: implement worker states
git bisect good c8e55f360210c1bc49bea5d62bc3939b7ee13483
# good: [d320c03830b17af64e4547075003b1eeb274bc6c] workqueue: s/__create_workqueue()/alloc_workqueue()/, and add system workqueues
git bisect good d320c03830b17af64e4547075003b1eeb274bc6c
# good: [4ce48b37bfedc2bc11e61eae76784887e88b922c] workqueue: fix race condition in flush_workqueue()
git bisect good 4ce48b37bfedc2bc11e61eae76784887e88b922c
# good: [d313dd85ad846bc768d58e9ceb28588f917f4c9a] workqueue: fix worker management invocation without pending works
git bisect good d313dd85ad846bc768d58e9ceb28588f917f4c9a
# good: [bdbc5dd7de5d07d6c9d3536e598956165a031d4c] workqueue: prepare for WQ_UNBOUND implementation
git bisect good bdbc5dd7de5d07d6c9d3536e598956165a031d4c

Any suggestions on how to track it down?
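
A long bisect log like the one above can be condensed to its most recent "good" and "bad" marks, which shows where the search had narrowed to. The following is a minimal sketch, not something from the original thread: the helper names `parse_bisect_log` and `last_marked` are hypothetical, and the embedded excerpt shortens one merge subject. It only summarizes the log and does not by itself identify the culprit.

```python
import re

# Excerpt of the `git bisect log` output above (first subject shortened).
BISECT_LOG = """\
git bisect start
# good: [e8779776afbd5f2d5315cf48c4257ca7e9b250fb] Merge branch 'x86-mce-for-linus'
git bisect good e8779776afbd5f2d5315cf48c4257ca7e9b250fb
# bad: [f34217977d717385a3e9fd7018ac39fade3964c0] workqueue: implement unbound workqueue
git bisect bad f34217977d717385a3e9fd7018ac39fade3964c0
# good: [bdbc5dd7de5d07d6c9d3536e598956165a031d4c] workqueue: prepare for WQ_UNBOUND implementation
git bisect good bdbc5dd7de5d07d6c9d3536e598956165a031d4c
"""

# Matches the comment lines git writes before each recorded verdict.
MARK = re.compile(r"# (good|bad): \[([0-9a-f]{40})\] (.*)")


def parse_bisect_log(text):
    """Return (verdict, sha, subject) tuples in the order they were recorded."""
    entries = []
    for line in text.splitlines():
        match = MARK.match(line)
        if match:
            entries.append(match.groups())
    return entries


def last_marked(entries, verdict):
    """Return (sha, subject) of the most recent entry with the given verdict, or None."""
    for entry_verdict, sha, subject in reversed(entries):
        if entry_verdict == verdict:
            return sha, subject
    return None


entries = parse_bisect_log(BISECT_LOG)
for verdict in ("good", "bad"):
    sha, subject = last_marked(entries, verdict)
    print(f"last {verdict}: {sha[:12]} {subject}")
```

Fed the complete log instead of the excerpt, the same two helpers report the final good and bad marks of the whole session.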


2010-08-28 18:37:20

by Qian Cai

Subject: Re: kdump regression compared to v2.6.35

Never mind, I found the bad commit:

commit 3b7433b8a8a83c87972065b1852b7dcae691e464
Merge: 4a386c3 6ee0578
Author: Linus Torvalds <[email protected]>
Date: Sat Aug 7 12:42:58 2010 -0700

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c


> > #95 [0005840000 - 000585e000] BOOTMEM
> > #96 [0005860000 - 000587e000] BOOTMEM
> > #97 [0005880000 - 000589e000] BOOTMEM
> > #98 [00058a0000 - 00058be000] BOOTMEM
> > #99 [00058c0000 - 00058de000] BOOTMEM
> > #100 [00058e0000 - 00058fe000] BOOTMEM
> > #101 [0005900000 - 000591e000] BOOTMEM
> > #102 [0005920000 - 000593e000] BOOTMEM
> > #103 [0005940000 - 000595e000] BOOTMEM
> > #104 [0005960000 - 000597e000] BOOTMEM
> > #105 [0005980000 - 000599e000] BOOTMEM
> > #106 [00059a0000 - 00059be000] BOOTMEM
> > #107 [00059c0000 - 00059de000] BOOTMEM
> > #108 [00059e0000 - 00059fe000] BOOTMEM
> > #109 [0005a00000 - 0005a1e000] BOOTMEM
> > #110 [0005a20000 - 0005a3e000] BOOTMEM
> > #111 [0005a40000 - 0005a5e000] BOOTMEM
> > #112 [0005a60000 - 0005a7e000] BOOTMEM
> > #113 [0005a80000 - 0005a9e000] BOOTMEM
> > #114 [0005aa0000 - 0005abe000] BOOTMEM
> > #115 [0005ac0000 - 0005ade000] BOOTMEM
> > #116 [0005ae0000 - 0005afe000] BOOTMEM
> > #117 [0005b00000 - 0005b1e000] BOOTMEM
> > #118 [0005b20000 - 0005b3e000] BOOTMEM
> > #119 [0005b40000 - 0005b5e000] BOOTMEM
> > #120 [0005b60000 - 0005b7e000] BOOTMEM
> > #121 [0005b80000 - 0005b9e000] BOOTMEM
> > #122 [0005ba0000 - 0005bbe000] BOOTMEM
> > #123 [0005bc0000 - 0005bde000] BOOTMEM
> > #124 [0005be0000 - 0005bfe000] BOOTMEM
> > #125 [00030fd040 - 00030fd048] BOOTMEM
> > #126 [00030fd080 - 00030fd088] BOOTMEM
> > #127 [00030fd0c0 - 00030fd1c0] BOOTMEM
> > #128 [00030fd1c0 - 00030fd3c0] BOOTMEM
> > #129 [00030fd3c0 - 00030fd4d0] BOOTMEM
> > #130 [00030fd500 - 00030fd548] BOOTMEM
> > #131 [00030fd580 - 00030fd5c8] BOOTMEM
> > #132 [00030fb040 - 00030fb240] BOOTMEM
> > #133 [00030fb240 - 00030fb440] BOOTMEM
> > #134 [00030fb440 - 00030fb640] BOOTMEM
> > #135 [00030fb640 - 00030fb840] BOOTMEM
> > #136 [00030fb840 - 00030fba40] BOOTMEM
> > #137 [00030fba40 - 00030fbc40] BOOTMEM
> > #138 [00030fbc40 - 00030fbe40] BOOTMEM
> > #139 [00030fbe40 - 00030fc040] BOOTMEM
> > #140 [00030fc040 - 00030fd040] BOOTMEM
> > Memory: 94968k/180204k available (4766k kernel code, 49156k absent,
> > 36080k reserved, 7642k data, 1448k init)
> > Hierarchical RCU implementation.
> > RCU-based detection of stalled CPUs is disabled.
> > Verbose stalled-CPUs detection is disabled.
> > NR_IRQS:262400 nr_irqs:2008
> > Extended CMOS year: 2000
> > Spurious LAPIC timer interrupt on cpu 0
> > Console: colour VGA+ 80x25
> > console [ttyS0] enabled
> > Fast TSC calibration using PIT
> > Detected 1994.798 MHz processor.
> > Calibrating delay loop (skipped), value calculated using timer
> > frequency.. 3989.59 BogoMIPS (lpj=1994798)
> > pid_max: default: 65536 minimum: 512
> > Security Framework initialized
> > SELinux: Initializing.
> > Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
> > Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
> > Mount-cache hash table entries: 256
> > Initializing cgroup subsys ns
> > Initializing cgroup subsys cpuacct
> > Initializing cgroup subsys memory
> > Initializing cgroup subsys devices
> > Initializing cgroup subsys freezer
> > Initializing cgroup subsys net_cls
> > Initializing cgroup subsys blkio
> > CPU: Physical Processor ID: 3
> > CPU: Processor Core ID: 0
> > mce: CPU supports 22 MCE banks
> > using mwait in idle threads.
> > Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
> > ... version: 3
> > ... bit width: 48
> > ... generic registers: 4
> > ... value mask: 0000ffffffffffff
> > ... max period: 000000007fffffff
> > ... fixed-purpose events: 3
> > ... event mask: 000000070000000f
> > SMP alternatives: switching to UP code
> > ACPI: Core revision 20100702
> > ftrace: converting mcount calls to 0f 1f 44 00 00
> > ftrace: allocating 18514 entries in 73 pages
> > DMAR: Host address width 48
> > DMAR: DRHD base: 0x000000fd800000 flags: 0x0
> > IOMMU 0: reg_base_addr fd800000 ver 1:0 cap c90780106f0462 ecap f020fe
> > DMAR: DRHD base: 0x000000fd000000 flags: 0x1
> > IOMMU 1: reg_base_addr fd000000 ver 1:0 cap c90780106f0462 ecap f020fe
> > DMAR: RMRR base: 0x0000007be29000 end: 0x0000007be2bfff
> > DMAR: RMRR base: 0x0000007be16000 end: 0x0000007be16fff
> > DMAR: RMRR base: 0x0000007be13000 end: 0x0000007be13fff
> > DMAR: RMRR base: 0x0000007be10000 end: 0x0000007be10fff
> > DMAR: RMRR base: 0x0000007be0d000 end: 0x0000007be0dfff
> > DMAR: RMRR base: 0x0000007be0a000 end: 0x0000007be0afff
> > DMAR: RMRR base: 0x0000007be07000 end: 0x0000007be07fff
> > DMAR: RMRR base: 0x0000007be04000 end: 0x0000007be04fff
> > DMAR: RMRR base: 0x0000007be01000 end: 0x0000007be01fff
> > DMAR: ATSR flags: 0x0
> > DMAR: ATSR flags: 0x0
> > DMAR: RHSA base: 0x000000fd000000 proximity domain: 0x0
> > DMAR: RHSA base: 0x000000fd800000 proximity domain: 0x2
> > IOAPIC id 10 under DRHD base 0xfd800000 IOMMU 0
> > IOAPIC id 8 under DRHD base 0xfd000000 IOMMU 1
> > IOAPIC id 9 under DRHD base 0xfd000000 IOMMU 1
> > Enabled Interrupt-remapping
> > Setting APIC routing to cluster x2apic
> > ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> > CPU0: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz stepping 06
> > Brought up 1 CPUs
> > Total of 1 processors activated (3989.59 BogoMIPS).
> > devtmpfs: initialized
> > regulator: core version 0.5
> > NET: Registered protocol family 16
> > ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> > ACPI: bus type pci registered
> > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > 0x80000000-0x8fffffff] (base 0x80000000)
> > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> > PCI: Using configuration type 1 for base access
> > bio: create slab <bio-0> at 0
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec

2010-08-29 07:01:23

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35

Further bisecting pointed to this bad commit from the merge. Given that the kdump kernel was running with maxcpus=1, I guess this work caused fs/bio.c to hang in the workqueue on UP. Reverting the whole merge let kdump work again.

commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
Author: Tejun Heo <[email protected]>
Date: Tue Jun 29 10:07:14 2010 +0200

workqueue: implement concurrency managed dynamic worker pool

Instead of creating a worker for each cwq and putting it into the
shared pool, manage per-cpu workers dynamically.

Works aren't supposed to be cpu cycle hogs and maintaining just enough
concurrency to prevent work processing from stalling due to lack of
processing context is optimal. gcwq keeps the number of concurrent
active workers to minimum but no less. As long as there's one or more
running workers on the cpu, no new worker is scheduled so that works
can be processed in batch as much as possible but when the last
running worker blocks, gcwq immediately schedules new worker so that
the cpu doesn't sit idle while there are works to be processed.

gcwq always keeps at least single idle worker around. When a new
worker is necessary and the worker is the last idle one, the worker
assumes the role of "manager" and manages the worker pool -
ie. creates another worker. Forward-progress is guaranteed by having
dedicated rescue workers for workqueues which may be necessary while
creating a new worker. When the manager is having problem creating a
new worker, mayday timer activates and rescue workers are summoned to
the cpu and execute works which might be necessary to create new
workers.

Trustee is expanded to serve the role of manager while a CPU is being
taken down and stays down. As no new works are supposed to be queued
on a dead cpu, it just needs to drain all the existing ones. Trustee
continues to try to create new workers and summon rescuers as long as
there are pending works. If the CPU is brought back up while the
trustee is still trying to drain the gcwq from the previous offlining,
the trustee will kill all idles ones and tell workers which are still
busy to rebind to the cpu, and pass control over to gcwq which assumes
the manager role as necessary.

Concurrency managed worker pool reduces the number of workers
drastically. Only workers which are necessary to keep the processing
going are created and kept. Also, it reduces cache footprint by
avoiding unnecessarily switching contexts between different workers.

Please note that this patch does not increase max_active of any
workqueue. All workqueues can still only process one work per cpu.

Signed-off-by: Tejun Heo <[email protected]>
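
The scheduling rule in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's implementation: `ToyWorkerPool` and all of its fields are hypothetical names, workers are tracked as plain counters rather than threads, and rescuers, the mayday timer, and the trustee are left out entirely.

```python
from collections import deque


class ToyWorkerPool:
    """Toy model of one per-CPU pool; counts workers instead of tracking threads."""

    def __init__(self):
        self.pending = deque()  # queued works not yet started
        self.idle = 1           # the pool keeps at least one idle worker
        self.running = 0
        self.blocked = 0
        self.created = 1        # total workers ever created
        self.started = []       # order in which works were started

    def queue(self, work):
        self.pending.append(work)
        self._wake_if_needed()

    def _wake_if_needed(self):
        # No new worker is scheduled while another one is still running.
        if not self.pending or self.running > 0:
            return
        if self.idle == 1:
            # The last idle worker acts as "manager" and creates a spare first.
            self.idle += 1
            self.created += 1
        self.idle -= 1
        self.running += 1
        self.started.append(self.pending.popleft())

    def worker_blocks(self):
        # A running worker sleeps inside its work; hand the CPU to another one.
        self.running -= 1
        self.blocked += 1
        self._wake_if_needed()

    def worker_finishes(self):
        # A running worker completes its work and takes the next one if any.
        if self.pending:
            self.started.append(self.pending.popleft())
        else:
            self.running -= 1
            self.idle += 1


pool = ToyWorkerPool()
for name in ("A", "B", "C"):
    pool.queue(name)        # only "A" starts; "B" and "C" wait behind it
pool.worker_blocks()        # last running worker blocks, so "B" starts at once
pool.worker_finishes()      # the same worker picks up "C" (batching)
pool.worker_finishes()      # nothing left, so the worker goes idle
print(pool.started, pool.created)
```

In this model a second work item never starts while a worker is still running, which is the batching behaviour the commit message describes. If the wake-up after a worker blocks were missed, the remaining works would stay pending forever; that is the kind of stall a single-CPU hang would look like, though the thread has not established the actual cause.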

----- [email protected] wrote:

> Never mind. I figured out the bad commit,
>
> commit 3b7433b8a8a83c87972065b1852b7dcae691e464
> Merge: 4a386c3 6ee0578
> Author: Linus Torvalds <[email protected]>
> Date: Sat Aug 7 12:42:58 2010 -0700
>
> Merge branch 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
>
> * 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
> workqueue: mark init_workqueues() as early_initcall()
> workqueue: explain for_each_*cwq_cpu() iterators
> fscache: fix build on !CONFIG_SYSCTL
> slow-work: kill it
> gfs2: use workqueue instead of slow-work
> drm: use workqueue instead of slow-work
> cifs: use workqueue instead of slow-work
> fscache: drop references to slow-work
> fscache: convert operation to use workqueue instead of
> slow-work
> fscache: convert object to use workqueue instead of slow-work
> workqueue: fix how cpu number is stored in work->data
> workqueue: fix mayday_mask handling on UP
> workqueue: fix build problem on !CONFIG_SMP
> workqueue: fix locking in retry path of maybe_create_worker()
> async: use workqueue for worker pool
> workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
> workqueue: implement unbound workqueue
> workqueue: prepare for WQ_UNBOUND implementation
> libata: take advantage of cmwq and remove concurrency
> limitations
> workqueue: fix worker management invocation without pending
> works
> ...
>
> Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial
> conflicts in
> include/linux/workqueue.h, kernel/trace/Kconfig and
> kernel/workqueue.c
>
>
> ----- [email protected] wrote:
>
> > Hmm, after bisect the mainline, it pointed to a wrong commit earlier
> > than v2.6.35, since we knew that v2.6.35 was working fine and
> > v2.6.36-rc1 was broken. Here was the log starting from HEAD (bad) and
> > v2.6.35 (good).
> >
> > I noticed that we got this too,
> > # git bisect good
> > Bisecting: a merge base must be tested
> > [21aa9af03d06cb1d19a3738e5cf12acff984e69b] sched: add hooks for
> > workqueue
> >
> > # git bisect log
> > git bisect start
> > # good: [ab69bcd66fb4be64edfc767365cb9eb084961246] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
> > git bisect good ab69bcd66fb4be64edfc767365cb9eb084961246
> > # good: [1cfd2bda8c486ae0e7a8005354758ebb68172bca] Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
> > git bisect good 1cfd2bda8c486ae0e7a8005354758ebb68172bca
> > # bad: [faa38b5e0e092914764cdba9f83d31a3f794d182] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
> > git bisect bad faa38b5e0e092914764cdba9f83d31a3f794d182
> > # bad: [5df6b8e65ad0f2eaee202ff002ac00d1ac605315] Merge branch 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> > git bisect bad 5df6b8e65ad0f2eaee202ff002ac00d1ac605315
> > # bad: [1fc7995d19139d6f99203b43c161968f3f554a15] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
> > git bisect bad 1fc7995d19139d6f99203b43c161968f3f554a15
> > # good: [e8779776afbd5f2d5315cf48c4257ca7e9b250fb] Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
> > git bisect good e8779776afbd5f2d5315cf48c4257ca7e9b250fb
> > # bad: [f34217977d717385a3e9fd7018ac39fade3964c0] workqueue: implement unbound workqueue
> > git bisect bad f34217977d717385a3e9fd7018ac39fade3964c0
> > # good: [7e27d6e778cd87b6f2415515d7127eba53fe5d02] Linux 2.6.35-rc3
> > git bisect good 7e27d6e778cd87b6f2415515d7127eba53fe5d02
> > # good: [21aa9af03d06cb1d19a3738e5cf12acff984e69b] sched: add hooks for workqueue
> > git bisect good 21aa9af03d06cb1d19a3738e5cf12acff984e69b
> > # good: [c8e55f360210c1bc49bea5d62bc3939b7ee13483] workqueue: implement worker states
> > git bisect good c8e55f360210c1bc49bea5d62bc3939b7ee13483
> > # good: [d320c03830b17af64e4547075003b1eeb274bc6c] workqueue: s/__create_workqueue()/alloc_workqueue()/, and add system workqueues
> > git bisect good d320c03830b17af64e4547075003b1eeb274bc6c
> > # good: [4ce48b37bfedc2bc11e61eae76784887e88b922c] workqueue: fix race condition in flush_workqueue()
> > git bisect good 4ce48b37bfedc2bc11e61eae76784887e88b922c
> > # good: [d313dd85ad846bc768d58e9ceb28588f917f4c9a] workqueue: fix worker management invocation without pending works
> > git bisect good d313dd85ad846bc768d58e9ceb28588f917f4c9a
> > # good: [bdbc5dd7de5d07d6c9d3536e598956165a031d4c] workqueue: prepare for WQ_UNBOUND implementation
> > git bisect good bdbc5dd7de5d07d6c9d3536e598956165a031d4c
> >
> > Any suggestion how to track it down?

2010-08-29 09:04:45

by Tejun Heo

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35

On 08/29/2010 09:01 AM, [email protected] wrote:
> Further bisect indicated this bad commit from the merge. Given kdump
> kernel was running with maxcpus=1, I guess this work caused fs/bio.c
> hung in the workqueue on UP. Reverted the whole merge let kdump work
> again.

Can you please pull from the following git tree and see whether it
fixes the problem? There was a bug in nr_active accounting.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-linus

Thanks.

--
tejun

2010-08-29 11:25:27

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> On 08/29/2010 09:01 AM, [email protected] wrote:
> > Further bisect indicated this bad commit from the merge. Given kdump
> > kernel was running with maxcpus=1, I guess this work caused fs/bio.c
> > hung in the workqueue on UP. Reverted the whole merge let kdump work
> > again.
>
> Can you please pull from the following git tree and see whether it
> fixes the problem? There was a bug in nr_active accounting.
It had the same problem.
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-linus
>
> Thanks.
>
> --
> tejun

2010-08-29 11:27:55

by Tejun Heo

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35

Hello,

On 08/29/2010 01:24 PM, CAI Qian wrote:
>> On 08/29/2010 09:01 AM, [email protected] wrote:
>>> Further bisect indicated this bad commit from the merge. Given kdump
>>> kernel was running with maxcpus=1, I guess this work caused fs/bio.c
>>> hung in the workqueue on UP. Reverted the whole merge let kdump work
>>> again.
>>
>> Can you please pull from the following git tree and see whether it
>> fixes the problem? There was a bug in nr_active accounting.
>
> It had the same problem.

I see. Hmm... a different issue then. Can you please tell me how to
reproduce the problem?

Thanks.

--
tejun

2010-08-29 11:41:48

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> Hello,
>
> On 08/29/2010 01:24 PM, CAI Qian wrote:
> >> On 08/29/2010 09:01 AM, [email protected] wrote:
> >>> Further bisect indicated this bad commit from the merge. Given kdump
> >>> kernel was running with maxcpus=1, I guess this work caused fs/bio.c
> >>> hung in the workqueue on UP. Reverted the whole merge let kdump work
> >>> again.
> >>
> >> Can you please pull from the following git tree and see whether it
> >> fixes the problem? There was a bug in nr_active accounting.
> >
> > It had the same problem.
>
> I see. Hmm... a different issue then. Can you please tell me how to
> reproduce the problem?
First, configure kdump (see Documentation/kdump/kdump.txt). It might be easier if you are using a distro that provides advanced kdump tools. Then trigger the kdump with echo c >/proc/sysrq-trigger. In case it is needed, here is the system information:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 64
Thread(s) per core: 2
Core(s) per socket: 8
CPU socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 46
Stepping: 6
CPU MHz: 1064.000
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 18432K
NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60
NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61
NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62
NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63
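
The reproduction steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested procedure: the helper names, the kernel and initrd paths, and the root device are all hypothetical, and distro kdump tools normally do this work themselves. The extra parameters (irqpoll, maxcpus=1, reset_devices) are the ones visible in the kdump kernel command line quoted earlier in this thread.

```python
def capture_cmdline(first_kernel_cmdline):
    """Derive a capture-kernel command line from the first kernel's one."""
    # crashkernel= only makes sense for the first kernel, so drop it here.
    params = [p for p in first_kernel_cmdline.split()
              if not p.startswith("crashkernel=")]
    for extra in ("irqpoll", "maxcpus=1", "reset_devices"):
        if extra not in params:
            params.append(extra)
    return " ".join(params)


def kexec_load_argv(kernel, initrd, cmdline):
    """Build the argv for loading a panic (capture) kernel with kexec -p."""
    return ["kexec", "-p", kernel, f"--initrd={initrd}", f"--append={cmdline}"]


# Hypothetical paths and root device, for illustration only.
cmdline = capture_cmdline("ro root=/dev/sda1 console=ttyS0,115200 crashkernel=128M")
argv = kexec_load_argv("/boot/vmlinuz-kdump", "/boot/initrd-kdump.img", cmdline)
print(" ".join(argv))
# After the capture kernel is loaded, the crash is triggered as root with:
#   echo c > /proc/sysrq-trigger
```

The sketch only builds and prints the command rather than running it, since loading a capture kernel and crashing the machine needs root and a first kernel booted with a crashkernel= reservation.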

> Thanks.
>
> --
> tejun

2010-08-29 11:56:58

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- [email protected] wrote:

> ----- "Tejun Heo" <[email protected]> wrote:
>
> > Hello,
> >
> > On 08/29/2010 01:24 PM, CAI Qian wrote:
> > >> On 08/29/2010 09:01 AM, [email protected] wrote:
> > >>> Further bisect indicated this bad commit from the merge. Given kdump
> > >>> kernel was running with maxcpus=1, I guess this work caused fs/bio.c
> > >>> hung in the workqueue on UP. Reverted the whole merge let kdump work
> > >>> again.
> > >>
> > >> Can you please pull from the following git tree and see whether it
> > >> fixes the problem? There was a bug in nr_active accounting.
> > >
> > > It had the same problem.
> >
> > I see. Hmm... a different issue then. Can you please tell me how to
> > reproduce the problem?
It is easy to reproduce by passing maxcpus=1 to the first kernel.
> First, to configure kdump - see Documentation/kdump/kdump.txt. It
> might be easier if you are using a distro that provide advanced kdump
> tools. Then, to trigger the kdump - echo c >/proc/sysrq-trigger. In
> case needed, here is the system information,
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> CPU(s): 64
> Thread(s) per core: 2
> Core(s) per socket: 8
> CPU socket(s): 4
> NUMA node(s): 4
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 46
> Stepping: 6
> CPU MHz: 1064.000
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 18432K
> NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60
> NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61
> NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62
> NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63
>
> > Thanks.
> >
> > --
> > tejun

2010-08-29 11:59:06

by Tejun Heo

Subject: Re: kdump regression compared to v2.6.35

On 08/29/2010 01:56 PM, CAI Qian wrote:
> It is easy to reproduce by passing maxcpus=1 to the first kernel.

Do you mean booting w/ maxcpus=1 hangs the first kernel even w/o
kdump?

Thanks.

--
tejun
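
Whether a given boot really carried maxcpus=1 can be checked from its kernel command line. The sketch below is a minimal illustration, with assumptions: the helper names are hypothetical, the example string is an abbreviated form of the command line that appears in this thread's boot logs, quoted values containing spaces are not handled, and it assumes that the last occurrence of a repeated parameter is the one that takes effect.

```python
# Abbreviated from the command line shown in the thread's boot log.
EXAMPLE_CMDLINE = (
    "ro root=/dev/mapper/vg_intels3e3601-lv_root "
    "rd_LVM_LV=vg_intels3e3601/lv_root rd_LVM_LV=vg_intels3e3601/lv_swap "
    "console=ttyS0,115200n81 crashkernel=128M maxcpus=1"
)


def parse_cmdline(cmdline):
    """Map each parameter name to the list of values it was given.

    Bare flags such as "ro" are recorded with the value None. Parameters
    may repeat (rd_LVM_LV does), so every name maps to a list.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


def boots_single_cpu(cmdline):
    """True if the command line asks for maxcpus=1 (last occurrence wins)."""
    values = parse_cmdline(cmdline).get("maxcpus", [])
    return bool(values) and values[-1] == "1"


if __name__ == "__main__":
    print(boots_single_cpu(EXAMPLE_CMDLINE))
```

On a running system the same check could be applied to the contents of /proc/cmdline.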

2010-08-29 12:03:32

by Qian Cai

Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> On 08/29/2010 01:56 PM, CAI Qian wrote:
> > It is easy to reproduce by passing maxcpus=1 to the first kernel.
>
> Do you mean booting w/ maxcpus=1 hangs the first kernel even w/o
> kdump?
Yes, here was the log,

Linux version 2.6.36-rc2-mm1-orig+ ([email protected]) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #3 SMP Sun Aug 29 07:27:06 EDT 2010
Command line: ro root=/dev/mapper/vg_intels3e3601-lv_root rd_LVM_LV=vg_intels3e3601/lv_root rd_LVM_LV=vg_intels3e3601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS0,115200n81 crashkernel=128M maxcpus=1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000099800 (usable)
BIOS-e820: 0000000000099800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000078c51000 (usable)
BIOS-e820: 0000000078c51000 - 0000000078e66000 (ACPI NVS)
BIOS-e820: 0000000078e66000 - 0000000079238000 (ACPI data)
BIOS-e820: 0000000079238000 - 00000000792ac000 (reserved)
BIOS-e820: 00000000792ac000 - 00000000792b9000 (ACPI data)
BIOS-e820: 00000000792b9000 - 00000000792ce000 (reserved)
BIOS-e820: 00000000792ce000 - 00000000792e1000 (ACPI data)
BIOS-e820: 00000000792e1000 - 00000000792e3000 (reserved)
BIOS-e820: 00000000792e3000 - 00000000792ec000 (ACPI data)
BIOS-e820: 00000000792ec000 - 00000000792f8000 (reserved)
BIOS-e820: 00000000792f8000 - 00000000792f9000 (ACPI data)
BIOS-e820: 00000000792f9000 - 00000000792ff000 (reserved)
BIOS-e820: 00000000792ff000 - 0000000079320000 (ACPI data)
BIOS-e820: 0000000079320000 - 0000000079341000 (reserved)
BIOS-e820: 0000000079341000 - 0000000079370000 (ACPI data)
BIOS-e820: 0000000079370000 - 00000000793b1000 (reserved)
BIOS-e820: 00000000793b1000 - 000000007996a000 (ACPI data)
BIOS-e820: 000000007996a000 - 0000000079b6a000 (ACPI NVS)
BIOS-e820: 0000000079b6a000 - 0000000079cae000 (ACPI data)
BIOS-e820: 0000000079cae000 - 0000000079cde000 (reserved)
BIOS-e820: 0000000079cde000 - 0000000079d72000 (ACPI data)
BIOS-e820: 0000000079d72000 - 0000000079d75000 (reserved)
BIOS-e820: 0000000079d75000 - 0000000079e05000 (ACPI data)
BIOS-e820: 0000000079e05000 - 0000000079e70000 (reserved)
BIOS-e820: 0000000079e70000 - 000000007bd5f000 (ACPI data)
BIOS-e820: 000000007bd5f000 - 000000007be4f000 (reserved)
BIOS-e820: 000000007be4f000 - 000000007bf87000 (ACPI data)
BIOS-e820: 000000007bf87000 - 000000007bfcf000 (ACPI NVS)
BIOS-e820: 000000007bfcf000 - 000000007bfff000 (ACPI data)
BIOS-e820: 000000007bfff000 - 0000000090000000 (reserved)
BIOS-e820: 00000000fc000000 - 00000000fd000000 (reserved)
BIOS-e820: 00000000fed1c000 - 00000000fed20000 (reserved)
BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000001080000000 (usable)
NX (Execute Disable) protection: active
DMI 2.5 present.
No AGP bridge found
last_pfn = 0x1080000 max_arch_pfn = 0x400000000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
last_pfn = 0x78c51 max_arch_pfn = 0x400000000
found SMP MP-table at [ffff8800000fdd80] fdd80
init_memory_mapping: 0000000000000000-0000000078c51000
init_memory_mapping: 0000000100000000-0000001080000000
RAMDISK: 3754d000 - 37ff0000
Reserving 128MB of memory at 48MB for crashkernel (System RAM: 67584MB)
ACPI: RSDP 00000000000f0410 00024 (v02 QUANTA)
ACPI: XSDT 000000007bffe120 000B4 (v01 QUANTA QSSC-S4R 00000000 01000013)
ACPI: FACP 000000007bffd000 000F4 (v04 QUANTA QSSC-S4R 00000000 MSFT 0100000D)
ACPI: DSDT 000000007bfe3000 19F6A (v02 QUANTA QSSC-S4R 00000003 MSFT 0100000D)
ACPI: FACS 000000007bf87000 00040
ACPI: APIC 000000007bfe2000 003E4 (v02 QUANTA QSSC-S4R 00000000 MSFT 0100000D)
ACPI: MSCT 000000007bfe1000 00090 (v01 QUANTA QSSC-S4R 00000001 MSFT 0100000D)
ACPI: MCFG 000000007bfe0000 0003C (v01 QUANTA QSSC-S4R 00000001 MSFT 0100000D)
ACPI: HPET 000000007bfdf000 00038 (v01 QUANTA QSSC-S4R 00000001 MSFT 0100000D)
ACPI: SLIT 000000007bfde000 0003C (v01 QUANTA QSSC-S4R 00000001 MSFT 0100000D)
ACPI: SRAT 000000007bfdd000 00930 (v02 QUANTA QSSC-S4R 00000001 MSFT 0100000D)
ACPI: SPCR 000000007bfdc000 00050 (v01 QUANTA QSSC-S4R 00000000 MSFT 0100000D)
ACPI: WDDT 000000007bfdb000 00040 (v01 QUANTA QSSC-S4R 00000000 MSFT 0100000D)
ACPI: SSDT 000000007bf4a000 3CFA4 (v02 QUANTA QSSC-S4R 00004000 INTL 20061109)
ACPI: SSDT 000000007bfda000 00174 (v02 QUANTA QSSC-S4R 00004000 INTL 20061109)
ACPI: PMCT 000000007bfd9000 00060 (v01 QUANTA QSSC-S4R 00000000 MSFT 0100000D)
ACPI: MIGT 000000007bfd8000 00040 (v01 QUANTA QSSC-S4R 00000000 MSFT 0100000D)
ACPI: TCPA 000000007bfd5000 00032 (v00 QUANTA QSSC-S4R 00000000 00000000)
ACPI: HEST 000000007bfd4000 000A8 (v01 QUANTA QSSC-S4R 00000001 INTL 00000001)
ACPI: BERT 000000007bfd3000 00030 (v01 QUANTA QSSC-S4R 00000001 INTL 00000001)
ACPI: ERST 000000007bfd2000 00230 (v01 QUANTA QSSC-S4R 00000001 INTL 00000001)
ACPI: EINJ 000000007bfd1000 00130 (v01 QUANTA QSSC-S4R 00000001 INTL 00000001)
SRAT: PXM 0 -> APIC 0x00 -> Node 0
SRAT: PXM 2 -> APIC 0x40 -> Node 1
SRAT: PXM 1 -> APIC 0x20 -> Node 2
SRAT: PXM 3 -> APIC 0x60 -> Node 3
SRAT: PXM 0 -> APIC 0x02 -> Node 0
SRAT: PXM 2 -> APIC 0x42 -> Node 1
SRAT: PXM 1 -> APIC 0x22 -> Node 2
SRAT: PXM 3 -> APIC 0x62 -> Node 3
SRAT: PXM 0 -> APIC 0x04 -> Node 0
SRAT: PXM 2 -> APIC 0x44 -> Node 1
SRAT: PXM 1 -> APIC 0x24 -> Node 2
SRAT: PXM 3 -> APIC 0x64 -> Node 3
SRAT: PXM 0 -> APIC 0x06 -> Node 0
SRAT: PXM 2 -> APIC 0x46 -> Node 1
SRAT: PXM 1 -> APIC 0x26 -> Node 2
SRAT: PXM 3 -> APIC 0x66 -> Node 3
SRAT: PXM 0 -> APIC 0x10 -> Node 0
SRAT: PXM 2 -> APIC 0x50 -> Node 1
SRAT: PXM 1 -> APIC 0x30 -> Node 2
SRAT: PXM 3 -> APIC 0x70 -> Node 3
SRAT: PXM 0 -> APIC 0x12 -> Node 0
SRAT: PXM 2 -> APIC 0x52 -> Node 1
SRAT: PXM 1 -> APIC 0x32 -> Node 2
SRAT: PXM 3 -> APIC 0x72 -> Node 3
SRAT: PXM 0 -> APIC 0x14 -> Node 0
SRAT: PXM 2 -> APIC 0x54 -> Node 1
SRAT: PXM 1 -> APIC 0x34 -> Node 2
SRAT: PXM 3 -> APIC 0x74 -> Node 3
SRAT: PXM 0 -> APIC 0x16 -> Node 0
SRAT: PXM 2 -> APIC 0x56 -> Node 1
SRAT: PXM 1 -> APIC 0x36 -> Node 2
SRAT: PXM 3 -> APIC 0x76 -> Node 3
SRAT: PXM 0 -> APIC 0x01 -> Node 0
SRAT: PXM 2 -> APIC 0x41 -> Node 1
SRAT: PXM 1 -> APIC 0x21 -> Node 2
SRAT: PXM 3 -> APIC 0x61 -> Node 3
SRAT: PXM 0 -> APIC 0x03 -> Node 0
SRAT: PXM 2 -> APIC 0x43 -> Node 1
SRAT: PXM 1 -> APIC 0x23 -> Node 2
SRAT: PXM 3 -> APIC 0x63 -> Node 3
SRAT: PXM 0 -> APIC 0x05 -> Node 0
SRAT: PXM 2 -> APIC 0x45 -> Node 1
SRAT: PXM 1 -> APIC 0x25 -> Node 2
SRAT: PXM 3 -> APIC 0x65 -> Node 3
SRAT: PXM 0 -> APIC 0x07 -> Node 0
SRAT: PXM 2 -> APIC 0x47 -> Node 1
SRAT: PXM 1 -> APIC 0x27 -> Node 2
SRAT: PXM 3 -> APIC 0x67 -> Node 3
SRAT: PXM 0 -> APIC 0x11 -> Node 0
SRAT: PXM 2 -> APIC 0x51 -> Node 1
SRAT: PXM 1 -> APIC 0x31 -> Node 2
SRAT: PXM 3 -> APIC 0x71 -> Node 3
SRAT: PXM 0 -> APIC 0x13 -> Node 0
SRAT: PXM 2 -> APIC 0x53 -> Node 1
SRAT: PXM 1 -> APIC 0x33 -> Node 2
SRAT: PXM 3 -> APIC 0x73 -> Node 3
SRAT: PXM 0 -> APIC 0x15 -> Node 0
SRAT: PXM 2 -> APIC 0x55 -> Node 1
SRAT: PXM 1 -> APIC 0x35 -> Node 2
SRAT: PXM 3 -> APIC 0x75 -> Node 3
SRAT: PXM 0 -> APIC 0x17 -> Node 0
SRAT: PXM 2 -> APIC 0x57 -> Node 1
SRAT: PXM 1 -> APIC 0x37 -> Node 2
SRAT: PXM 3 -> APIC 0x77 -> Node 3
SRAT: Node 0 PXM 0 0-80000000
SRAT: Node 0 PXM 0 100000000-480000000
SRAT: Node 2 PXM 1 480000000-880000000
SRAT: Node 1 PXM 2 880000000-c80000000
SRAT: Node 3 PXM 3 c80000000-1080000000
SRAT: Node 0 PXM 0 1100000000-3100000000
SRAT: Hotplug area 17825792 -> 51380224 has existing memory
SRAT: Node 0 PXM 0 3100000000-5100000000
SRAT: Hotplug area 51380224 -> 84934656 has existing memory
SRAT: Node 2 PXM 1 5100000000-7100000000
SRAT: Hotplug area 84934656 -> 118489088 has existing memory
SRAT: Node 2 PXM 1 7100000000-9100000000
SRAT: Hotplug area 118489088 -> 152043520 has existing memory
SRAT: Node 1 PXM 2 9100000000-b100000000
SRAT: Hotplug area 152043520 -> 185597952 has existing memory
SRAT: Node 1 PXM 2 b100000000-d100000000
SRAT: Hotplug area 185597952 -> 219152384 has existing memory
SRAT: Node 3 PXM 3 d100000000-f100000000
SRAT: Hotplug area 219152384 -> 252706816 has existing memory
SRAT: Node 3 PXM 3 f100000000-11100000000
SRAT: Hotplug area 252706816 -> 286261248 has existing memory
SRAT: Node 0 [0,80000000) + [100000000,480000000) -> [0,480000000)
SRAT: Node 0 [1100000000,3100000000) + [3100000000,5100000000) -> [1100000000,5100000000)
SRAT: Node 2 [5100000000,7100000000) + [7100000000,9100000000) -> [5100000000,9100000000)
SRAT: Node 1 [9100000000,b100000000) + [b100000000,d100000000) -> [9100000000,d100000000)
SRAT: Node 3 [d100000000,f100000000) + [f100000000,11100000000) -> [d100000000,11100000000)
Initmem setup node 0 0000000000000000-0000000480000000
NODE_DATA [0000000100000000 - 0000000100026fff]
Initmem setup node 1 0000000880000000-0000000c80000000
NODE_DATA [0000000880000000 - 0000000880026fff]
Initmem setup node 2 0000000480000000-0000000880000000
NODE_DATA [0000000480000000 - 0000000480026fff]
Initmem setup node 3 0000000c80000000-0000001080000000
NODE_DATA [0000000c80000000 - 0000000c80026fff]
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x01080000
Movable zone start PFN for each node
early_node_map[6] active PFN ranges
0: 0x00000010 -> 0x00000099
0: 0x00000100 -> 0x00078c51
0: 0x00100000 -> 0x00480000
2: 0x00480000 -> 0x00880000
1: 0x00880000 -> 0x00c80000
3: 0x00c80000 -> 0x01080000
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x20] lapic_id[0x40] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
ACPI: LAPIC (acpi_id[0x30] lapic_id[0x60] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x22] lapic_id[0x42] enabled)
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
ACPI: LAPIC (acpi_id[0x32] lapic_id[0x62] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x24] lapic_id[0x44] enabled)
ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
ACPI: LAPIC (acpi_id[0x34] lapic_id[0x64] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x26] lapic_id[0x46] enabled)
ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
ACPI: LAPIC (acpi_id[0x36] lapic_id[0x66] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x28] lapic_id[0x50] enabled)
ACPI: LAPIC (acpi_id[0x18] lapic_id[0x30] enabled)
ACPI: LAPIC (acpi_id[0x38] lapic_id[0x70] enabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x2a] lapic_id[0x52] enabled)
ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x32] enabled)
ACPI: LAPIC (acpi_id[0x3a] lapic_id[0x72] enabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x14] enabled)
ACPI: LAPIC (acpi_id[0x2c] lapic_id[0x54] enabled)
ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x34] enabled)
ACPI: LAPIC (acpi_id[0x3c] lapic_id[0x74] enabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x16] enabled)
ACPI: LAPIC (acpi_id[0x2e] lapic_id[0x56] enabled)
ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x36] enabled)
ACPI: LAPIC (acpi_id[0x3e] lapic_id[0x76] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x21] lapic_id[0x41] enabled)
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x21] enabled)
ACPI: LAPIC (acpi_id[0x31] lapic_id[0x61] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x23] lapic_id[0x43] enabled)
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x23] enabled)
ACPI: LAPIC (acpi_id[0x33] lapic_id[0x63] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x25] lapic_id[0x45] enabled)
ACPI: LAPIC (acpi_id[0x15] lapic_id[0x25] enabled)
ACPI: LAPIC (acpi_id[0x35] lapic_id[0x65] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x27] lapic_id[0x47] enabled)
ACPI: LAPIC (acpi_id[0x17] lapic_id[0x27] enabled)
ACPI: LAPIC (acpi_id[0x37] lapic_id[0x67] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x29] lapic_id[0x51] enabled)
ACPI: LAPIC (acpi_id[0x19] lapic_id[0x31] enabled)
ACPI: LAPIC (acpi_id[0x39] lapic_id[0x71] enabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x13] enabled)
ACPI: LAPIC (acpi_id[0x2b] lapic_id[0x53] enabled)
ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x33] enabled)
ACPI: LAPIC (acpi_id[0x3b] lapic_id[0x73] enabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x15] enabled)
ACPI: LAPIC (acpi_id[0x2d] lapic_id[0x55] enabled)
ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x35] enabled)
ACPI: LAPIC (acpi_id[0x3d] lapic_id[0x75] enabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x17] enabled)
ACPI: LAPIC (acpi_id[0x2f] lapic_id[0x57] enabled)
ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x37] enabled)
ACPI: LAPIC (acpi_id[0x3f] lapic_id[0x77] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x08] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x09] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0a] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0b] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0c] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0d] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0e] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0f] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x10] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x11] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x12] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x13] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x14] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x15] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x16] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x17] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x18] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x19] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x1a] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x1b] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x1c] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x1d] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x1e] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x1f] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x20] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x21] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x22] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x23] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x24] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x25] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x26] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x27] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x28] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x29] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x2a] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x2b] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x2c] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x2d] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x2e] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x2f] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x30] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x31] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x32] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x33] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x34] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x35] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x36] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x37] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x38] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x39] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x3a] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x3b] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x3c] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x3d] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x3e] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x3f] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec01000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-47
ACPI: IOAPIC (id[0x0a] address[0xfec04000] gsi_base[48])
IOAPIC[2]: apic_id 10, version 32, address 0xfec04000, GSI 48-71
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a401 base: 0xfed00000
SMP: Allowing 64 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 0000000000099000 - 000000000009a000
PM: Registered nosave memory: 000000000009a000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
PM: Registered nosave memory: 0000000078c51000 - 0000000078e66000
PM: Registered nosave memory: 0000000078e66000 - 0000000079238000
PM: Registered nosave memory: 0000000079238000 - 00000000792ac000
PM: Registered nosave memory: 00000000792ac000 - 00000000792b9000
PM: Registered nosave memory: 00000000792b9000 - 00000000792ce000
PM: Registered nosave memory: 00000000792ce000 - 00000000792e1000
PM: Registered nosave memory: 00000000792e1000 - 00000000792e3000
PM: Registered nosave memory: 00000000792e3000 - 00000000792ec000
PM: Registered nosave memory: 00000000792ec000 - 00000000792f8000
PM: Registered nosave memory: 00000000792f8000 - 00000000792f9000
PM: Registered nosave memory: 00000000792f9000 - 00000000792ff000
PM: Registered nosave memory: 00000000792ff000 - 0000000079320000
PM: Registered nosave memory: 0000000079320000 - 0000000079341000
PM: Registered nosave memory: 0000000079341000 - 0000000079370000
PM: Registered nosave memory: 0000000079370000 - 00000000793b1000
PM: Registered nosave memory: 00000000793b1000 - 000000007996a000
PM: Registered nosave memory: 000000007996a000 - 0000000079b6a000
PM: Registered nosave memory: 0000000079b6a000 - 0000000079cae000
PM: Registered nosave memory: 0000000079cae000 - 0000000079cde000
PM: Registered nosave memory: 0000000079cde000 - 0000000079d72000
PM: Registered nosave memory: 0000000079d72000 - 0000000079d75000
PM: Registered nosave memory: 0000000079d75000 - 0000000079e05000
PM: Registered nosave memory: 0000000079e05000 - 0000000079e70000
PM: Registered nosave memory: 0000000079e70000 - 000000007bd5f000
PM: Registered nosave memory: 000000007bd5f000 - 000000007be4f000
PM: Registered nosave memory: 000000007be4f000 - 000000007bf87000
PM: Registered nosave memory: 000000007bf87000 - 000000007bfcf000
PM: Registered nosave memory: 000000007bfcf000 - 000000007bfff000
PM: Registered nosave memory: 000000007bfff000 - 0000000090000000
PM: Registered nosave memory: 0000000090000000 - 00000000fc000000
PM: Registered nosave memory: 00000000fc000000 - 00000000fd000000
PM: Registered nosave memory: 00000000fd000000 - 00000000fed1c000
PM: Registered nosave memory: 00000000fed1c000 - 00000000fed20000
PM: Registered nosave memory: 00000000fed20000 - 00000000ff000000
PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
Allocating PCI resources starting at 90000000 (gap: 90000000:6c000000)
Booting paravirtualized kernel on bare hardware
setup_percpu: NR_CPUS:4096 nr_cpumask_bits:64 nr_cpu_ids:64 nr_node_ids:4
PERCPU: Embedded 30 pages/cpu @ffff880002400000 s90368 r8192 d24320 u131072
pcpu-alloc: s90368 r8192 d24320 u131072 alloc=1*2097152
pcpu-alloc: [0] 00 04 08 12 16 20 24 28 32 36 40 44 48 52 56 60
pcpu-alloc: [1] 01 05 09 13 17 21 25 29 33 37 41 45 49 53 57 61
pcpu-alloc: [2] 02 06 10 14 18 22 26 30 34 38 42 46 50 54 58 62
pcpu-alloc: [3] 03 07 11 15 19 23 27 31 35 39 43 47 51 55 59 63
Built 4 zonelists in Zone order, mobility grouping on. Total pages: 16510938
Policy zone: Normal
Kernel command line: ro root=/dev/mapper/vg_intels3e3601-lv_root rd_LVM_LV=vg_intels3e3601/lv_root rd_LVM_LV=vg_intels3e3601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS0,115200n81 crashkernel=128M maxcpus=1
PID hash table entries: 4096 (order: 3, 32768 bytes)
Checking aperture...
No AGP bridge found
Subtract (169 early reservations)
#1 [0001000000 - 00022a0408] TEXT DATA BSS
#2 [003754d000 - 0037ff0000] RAMDISK
#3 [00022a1000 - 00022a1354] BRK
#4 [0000099800 - 00000fdd80] BIOS reserved
#5 [00000fdd90 - 0000100000] BIOS reserved
#6 [00000fdd80 - 00000fdd90] MP-table mpf
#7 [0000000012 - 000000f012] MP-table mpc
#8 [0000010000 - 0000012000] TRAMPOLINE
#9 [0000012000 - 0000016000] ACPI WAKEUP
#10 [0000016000 - 0000018000] PGTABLE
#11 [0000018000 - 0000056000] PGTABLE
#12 [0003000000 - 000b000000] CRASH KERNEL
#13 [0000056000 - 000005603c] ACPI SLIT
#14 [0000056040 - 00000564c0] MEMNODEMAP
#15 [0100000000 - 0100027000] NODE_DATA
#16 [0880000000 - 0880027000] NODE_DATA
#17 [0480000000 - 0480027000] NODE_DATA
#18 [0c80000000 - 0c80027000] NODE_DATA
#19 [00022a1380 - 00022a2380] BOOTMEM
#20 [00022a2380 - 00022a3380] BOOTMEM
#21 [0480027000 - 0480028000] BOOTMEM
#22 [0880027000 - 0880028000] BOOTMEM
#23 [0c80027000 - 0c80028000] BOOTMEM
#24 [00026a3380 - 00026a3f80] BOOTMEM
#25 [0480028000 - 0480028c00] BOOTMEM
#26 [0880028000 - 0880028c00] BOOTMEM
#27 [0c80028000 - 0c80028c00] BOOTMEM
#28 [0100027000 - 0100028000] BOOTMEM
#29 [0100028000 - 0100029000] BOOTMEM
#30 [0100200000 - 010e200000] MEMMAP 0
#31 [0480200000 - 048e200000] MEMMAP 2
#32 [0880200000 - 088e200000] MEMMAP 1
#33 [0c80200000 - 0c8e200000] MEMMAP 3
#34 [00022a3380 - 00022bb380] BOOTMEM
#35 [00022bb380 - 00022d3380] BOOTMEM
#36 [00022d3380 - 00022eb380] BOOTMEM
#37 [0880028c00 - 0880040c00] BOOTMEM
#38 [0480028c00 - 0480040c00] BOOTMEM
#39 [0c80028c00 - 0c80040c00] BOOTMEM
#40 [00022ec000 - 00022ed000] BOOTMEM
#41 [00022a0440 - 00022a0481] BOOTMEM
#42 [00022a04c0 - 00022a0589] BOOTMEM
#43 [00022a05c0 - 00022a0e10] BOOTMEM
#44 [00022a0e40 - 00022a0ea8] BOOTMEM
#45 [00022a0ec0 - 00022a0f28] BOOTMEM
#46 [00022a0f40 - 00022a0fa8] BOOTMEM
#47 [00022eb380 - 00022eb3e8] BOOTMEM
#48 [00022eb400 - 00022eb468] BOOTMEM
#49 [00022eb480 - 00022eb4e8] BOOTMEM
#50 [00022eb500 - 00022eb568] BOOTMEM
#51 [00022eb580 - 00022eb5e8] BOOTMEM
#52 [00022eb600 - 00022eb668] BOOTMEM
#53 [00022eb680 - 00022eb6e8] BOOTMEM
#54 [00022eb700 - 00022eb768] BOOTMEM
#55 [00022eb780 - 00022eb7e8] BOOTMEM
#56 [00022eb800 - 00022eb868] BOOTMEM
#57 [00022eb880 - 00022eb8e8] BOOTMEM
#58 [00022eb900 - 00022eb968] BOOTMEM
#59 [00022eb980 - 00022eb9e8] BOOTMEM
#60 [00022eba00 - 00022eba68] BOOTMEM
#61 [00022eba80 - 00022ebae8] BOOTMEM
#62 [00022ebb00 - 00022ebb68] BOOTMEM
#63 [00022ebb80 - 00022ebbe8] BOOTMEM
#64 [00022ebc00 - 00022ebc68] BOOTMEM
#65 [00022ebc80 - 00022ebce8] BOOTMEM
#66 [00022ebd00 - 00022ebd68] BOOTMEM
#67 [00022ebd80 - 00022ebde8] BOOTMEM
#68 [00022ebe00 - 00022ebe68] BOOTMEM
#69 [00022ebe80 - 00022ebee8] BOOTMEM
#70 [00022ebf00 - 00022ebf68] BOOTMEM
#71 [00022ebf80 - 00022ebfe8] BOOTMEM
#72 [00022ed000 - 00022ed068] BOOTMEM
#73 [00022ed080 - 00022ed0e8] BOOTMEM
#74 [00022ed100 - 00022ed168] BOOTMEM
#75 [00022ed180 - 00022ed1e8] BOOTMEM
#76 [00022ed200 - 00022ed268] BOOTMEM
#77 [00022ed280 - 00022ed2e8] BOOTMEM
#78 [00022ed300 - 00022ed368] BOOTMEM
#79 [00022ed380 - 00022ed3e8] BOOTMEM
#80 [00022ed400 - 00022ed468] BOOTMEM
#81 [00022a0fc0 - 00022a0fe0] BOOTMEM
#82 [00022ed480 - 00022ed4a0] BOOTMEM
#83 [00022ed4c0 - 00022ed5c7] BOOTMEM
#84 [00022ed600 - 00022ed707] BOOTMEM
#85 [0002400000 - 000241e000] BOOTMEM
#86 [0002420000 - 000243e000] BOOTMEM
#87 [0002440000 - 000245e000] BOOTMEM
#88 [0002460000 - 000247e000] BOOTMEM
#89 [0002480000 - 000249e000] BOOTMEM
#90 [00024a0000 - 00024be000] BOOTMEM
#91 [00024c0000 - 00024de000] BOOTMEM
#92 [00024e0000 - 00024fe000] BOOTMEM
#93 [0002500000 - 000251e000] BOOTMEM
#94 [0002520000 - 000253e000] BOOTMEM
#95 [0002540000 - 000255e000] BOOTMEM
#96 [0002560000 - 000257e000] BOOTMEM
#97 [0002580000 - 000259e000] BOOTMEM
#98 [00025a0000 - 00025be000] BOOTMEM
#99 [00025c0000 - 00025de000] BOOTMEM
#100 [00025e0000 - 00025fe000] BOOTMEM
#101 [088e200000 - 088e21e000] BOOTMEM
#102 [088e220000 - 088e23e000] BOOTMEM
#103 [088e240000 - 088e25e000] BOOTMEM
#104 [088e260000 - 088e27e000] BOOTMEM
#105 [088e280000 - 088e29e000] BOOTMEM
#106 [088e2a0000 - 088e2be000] BOOTMEM
#107 [088e2c0000 - 088e2de000] BOOTMEM
#108 [088e2e0000 - 088e2fe000] BOOTMEM
#109 [088e300000 - 088e31e000] BOOTMEM
#110 [088e320000 - 088e33e000] BOOTMEM
#111 [088e340000 - 088e35e000] BOOTMEM
#112 [088e360000 - 088e37e000] BOOTMEM
#113 [088e380000 - 088e39e000] BOOTMEM
#114 [088e3a0000 - 088e3be000] BOOTMEM
#115 [088e3c0000 - 088e3de000] BOOTMEM
#116 [088e3e0000 - 088e3fe000] BOOTMEM
#117 [048e200000 - 048e21e000] BOOTMEM
#118 [048e220000 - 048e23e000] BOOTMEM
#119 [048e240000 - 048e25e000] BOOTMEM
#120 [048e260000 - 048e27e000] BOOTMEM
#121 [048e280000 - 048e29e000] BOOTMEM
#122 [048e2a0000 - 048e2be000] BOOTMEM
#123 [048e2c0000 - 048e2de000] BOOTMEM
#124 [048e2e0000 - 048e2fe000] BOOTMEM
#125 [048e300000 - 048e31e000] BOOTMEM
#126 [048e320000 - 048e33e000] BOOTMEM
#127 [048e340000 - 048e35e000] BOOTMEM
#128 [048e360000 - 048e37e000] BOOTMEM
#129 [048e380000 - 048e39e000] BOOTMEM
#130 [048e3a0000 - 048e3be000] BOOTMEM
#131 [048e3c0000 - 048e3de000] BOOTMEM
#132 [048e3e0000 - 048e3fe000] BOOTMEM
#133 [0c8e200000 - 0c8e21e000] BOOTMEM
#134 [0c8e220000 - 0c8e23e000] BOOTMEM
#135 [0c8e240000 - 0c8e25e000] BOOTMEM
#136 [0c8e260000 - 0c8e27e000] BOOTMEM
#137 [0c8e280000 - 0c8e29e000] BOOTMEM
#138 [0c8e2a0000 - 0c8e2be000] BOOTMEM
#139 [0c8e2c0000 - 0c8e2de000] BOOTMEM
#140 [0c8e2e0000 - 0c8e2fe000] BOOTMEM
#141 [0c8e300000 - 0c8e31e000] BOOTMEM
#142 [0c8e320000 - 0c8e33e000] BOOTMEM
#143 [0c8e340000 - 0c8e35e000] BOOTMEM
#144 [0c8e360000 - 0c8e37e000] BOOTMEM
#145 [0c8e380000 - 0c8e39e000] BOOTMEM
#146 [0c8e3a0000 - 0c8e3be000] BOOTMEM
#147 [0c8e3c0000 - 0c8e3de000] BOOTMEM
#148 [0c8e3e0000 - 0c8e3fe000] BOOTMEM
#149 [00022ef740 - 00022ef760] BOOTMEM
#150 [00022ef780 - 00022ef7a0] BOOTMEM
#151 [00022ef7c0 - 00022ef8c0] BOOTMEM
#152 [00022ef8c0 - 00022efac0] BOOTMEM
#153 [00022efac0 - 00022efbd0] BOOTMEM
#154 [00022efc00 - 00022efc48] BOOTMEM
#155 [00022efc80 - 00022efcc8] BOOTMEM
#156 [00022ed740 - 00022ed940] BOOTMEM
#157 [00022ed940 - 00022edb40] BOOTMEM
#158 [00022edb40 - 00022edd40] BOOTMEM
#159 [00022edd40 - 00022edf40] BOOTMEM
#160 [00022edf40 - 00022ee140] BOOTMEM
#161 [00022ee140 - 00022ee340] BOOTMEM
#162 [00022ee340 - 00022ee540] BOOTMEM
#163 [00022ee540 - 00022ee740] BOOTMEM
#164 [00022efd00 - 00022f7d00] BOOTMEM
#165 [000b000000 - 000f000000] BOOTMEM
#166 [00022f7d00 - 0002317d00] BOOTMEM
#167 [0002317d00 - 0002357d00] BOOTMEM
#168 [0000059cc0 - 0000061cc0] BOOTMEM
Memory: 65836168k/69206016k available (4775k kernel code, 2216088k absent, 1153760k reserved, 7568k data, 1444k init)
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is disabled.
Verbose stalled-CPUs detection is disabled.
NR_IRQS:262400 nr_irqs:2008
Extended CMOS year: 2000
Console: colour VGA+ 80x25
console [ttyS0] enabled
allocated 671088640 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Fast TSC calibration using PIT
Detected 1995.288 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 3990.57 BogoMIPS (lpj=1995288)
pid_max: default: 65536 minimum: 512
Security Framework initialized
SELinux: Initializing.
Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
mce: CPU supports 22 MCE banks
CPU0: Thermal monitoring enabled (TM1)
using mwait in idle threads.
Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
... version: 3
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 000000070000000f
SMP alternatives: switching to UP code
ACPI: Core revision 20100702
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 18528 entries in 73 pages
Not enabling x2apic, Intr-remapping init failed.
Setting APIC routing to physical flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz stepping 06
Brought up 1 CPUs
Total of 1 processors activated (3990.57 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
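
As a cross-check on the log above, the memory figures it reports are consistent with each other. The short sketch below redoes the arithmetic; the variable names are illustrative, the input values are copied from the log (last_pfn, the CRASH KERNEL early reservation), and a 4 KiB page size is assumed, as is usual on x86_64.

```python
PAGE_SIZE = 4096          # bytes per page (assumed x86_64 base page size)
KIB = 1024
MIB = 1024 * 1024

# Values copied from the boot log.
last_pfn = 0x1080000                          # "last_pfn = 0x1080000"
crash_start, crash_end = 0x3000000, 0xB000000  # "#12 [...] CRASH KERNEL"

# Total address space covered up to last_pfn, as in "Memory: .../69206016k".
total_kib = last_pfn * PAGE_SIZE // KIB
# The same figure in MiB, as in "(System RAM: 67584MB)".
total_mib = total_kib // KIB

# Size and placement of the crashkernel=128M reservation.
crash_size_mib = (crash_end - crash_start) // MIB
crash_offset_mib = crash_start // MIB

print(total_kib, total_mib, crash_size_mib, crash_offset_mib)
# prints: 69206016 67584 128 48
```

These match the "Reserving 128MB of memory at 48MB for crashkernel (System RAM: 67584MB)" and "Memory: 65836168k/69206016k available" lines.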

> Thanks.
>
> --
> tejun

2010-08-29 12:43:50

by Tejun Heo

Subject: Re: kdump regression compared to v2.6.35

Hello,

On 08/29/2010 02:03 PM, CAI Qian wrote:
>
> ----- "Tejun Heo" <[email protected]> wrote:
>
>> On 08/29/2010 01:56 PM, CAI Qian wrote:
>>> It is easy to reproduce by passing maxcpus=1 to the first kernel.
>>
>> Do you mean booting w/ maxcpus=1 hangs the first kernel even w/o
>> kdump?
> Yes, here was the log,

Hmmm... I can't reproduce it here. I wonder what the difference is.
Can you please trigger sysrq-t after the boot is hung and post the
result?

Thanks.

--
tejun

2010-08-30 03:43:17

by Qian Cai

Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> Hello,
>
> On 08/29/2010 02:03 PM, CAI Qian wrote:
> >
> > ----- "Tejun Heo" <[email protected]> wrote:
> >
> >> On 08/29/2010 01:56 PM, CAI Qian wrote:
> >>> It is easy to reproduce by passing maxcpus=1 to the first kernel.
> >>
> >> Do you mean booting w/ maxcpus=1 hangs the first kernel even w/o
> >> kdump?
> > Yes, here was the log,
>
> Hmmm... I can't reproduce it here. I wonder what the difference is.
> Can you please trigger sysrq-t after the boot is hung and post the
> result?
SysRq keys did not work for me at this early stage of booting, even after adding sysrq_always_enabled.
> Thanks.
>
> --
> tejun

2010-08-30 08:36:53

by Tejun Heo

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35

Hello,

On 08/30/2010 05:42 AM, CAI Qian wrote:
>> Hmmm... I can't reproduce it here. I wonder what the difference is.
>> Can you please trigger sysrq-t after the boot is hung and post the
>> result?
> Sysrq keys did not work at this early stage of booting for me even
> adding sysrq_always_enabled.

I see. I'll prepare a debug patch that tries to catch long execution
delays in the workqueue code. In the meantime, can you please turn on
the hangcheck timer and see whether it triggers after the kernel hangs
during boot?

Thanks.

--
tejun

2010-08-30 10:25:16

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> Hello,
>
> On 08/30/2010 05:42 AM, CAI Qian wrote:
> >> Hmmm... I can't reproduce it here. I wonder what the difference is.
> >> Can you please trigger sysrq-t after the boot is hung and post the
> >> result?
> > Sysrq keys did not work at this early stage of booting for me even
> > adding sysrq_always_enabled.
>
> I see. I'll prepare a debug patch which tries to catch long execution
> delays in workqueue but in the meantime can you please turn on
> hangcheck timer and see whether it triggers after the kernel hangs
> during boot?
I can't see any difference with the hangcheck timer enabled.
> Thanks.
>
> --
> tejun

2010-08-30 12:57:32

by Tejun Heo

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35

On 08/30/2010 12:24 PM, CAI Qian wrote:
> Can't see any difference with hangcheck timer enabled.

Hmm, odd. So, here's the debug patch I mentioned. It will periodically
check all works and report if any work is being delayed for too long.
If the max wait goes over 30secs, it will dump all task states and
disable itself. Can you please apply the patch on top of rc2 +
wq#for-linus and report the output? It should tell us who's stuck
where.

Thanks.

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index f11100f..282322c 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -83,6 +83,8 @@ struct work_struct {
#ifdef CONFIG_LOCKDEP
struct lockdep_map lockdep_map;
#endif
+ unsigned long queued_on;
+ unsigned long activated_on;
};

#define WORK_DATA_INIT() ATOMIC_LONG_INIT(WORK_STRUCT_NO_CPU)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a2dccfc..9f95169 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -913,6 +913,8 @@ static void insert_work(struct cpu_workqueue_struct *cwq,
{
struct global_cwq *gcwq = cwq->gcwq;

+ work->queued_on = work->activated_on = jiffies;
+
/* we own @work, set data and link */
set_work_cwq(work, cwq, extra_flags);

@@ -996,13 +998,14 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
if (likely(cwq->nr_active < cwq->max_active)) {
cwq->nr_active++;
worklist = gcwq_determine_ins_pos(gcwq, cwq);
+ insert_work(cwq, work, worklist, work_flags);
} else {
work_flags |= WORK_STRUCT_DELAYED;
worklist = &cwq->delayed_works;
+ insert_work(cwq, work, worklist, work_flags);
+ work->activated_on--;
}

- insert_work(cwq, work, worklist, work_flags);
-
spin_unlock_irqrestore(&gcwq->lock, flags);
}

@@ -1669,6 +1672,7 @@ static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
struct work_struct, entry);
struct list_head *pos = gcwq_determine_ins_pos(cwq->gcwq, cwq);

+ work->activated_on = jiffies;
move_linked_works(work, pos, NULL);
__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
cwq->nr_active++;
@@ -2810,7 +2814,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *name,
* list. Grab it, set max_active accordingly and add the new
* workqueue to workqueues list.
*/
- spin_lock(&workqueue_lock);
+ spin_lock_irq(&workqueue_lock);

if (workqueue_freezing && wq->flags & WQ_FREEZEABLE)
for_each_cwq_cpu(cpu, wq)
@@ -2818,7 +2822,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *name,

list_add(&wq->list, &workqueues);

- spin_unlock(&workqueue_lock);
+ spin_unlock_irq(&workqueue_lock);

return wq;
err:
@@ -2849,9 +2853,9 @@ void destroy_workqueue(struct workqueue_struct *wq)
* wq list is used to freeze wq, remove from list after
* flushing is complete in case freeze races us.
*/
- spin_lock(&workqueue_lock);
+ spin_lock_irq(&workqueue_lock);
list_del(&wq->list);
- spin_unlock(&workqueue_lock);
+ spin_unlock_irq(&workqueue_lock);

/* sanity check */
for_each_cwq_cpu(cpu, wq) {
@@ -2891,23 +2895,23 @@ void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)

max_active = wq_clamp_max_active(max_active, wq->flags, wq->name);

- spin_lock(&workqueue_lock);
+ spin_lock_irq(&workqueue_lock);

wq->saved_max_active = max_active;

for_each_cwq_cpu(cpu, wq) {
struct global_cwq *gcwq = get_gcwq(cpu);

- spin_lock_irq(&gcwq->lock);
+ spin_lock(&gcwq->lock);

if (!(wq->flags & WQ_FREEZEABLE) ||
!(gcwq->flags & GCWQ_FREEZING))
get_cwq(gcwq->cpu, wq)->max_active = max_active;

- spin_unlock_irq(&gcwq->lock);
+ spin_unlock(&gcwq->lock);
}

- spin_unlock(&workqueue_lock);
+ spin_unlock_irq(&workqueue_lock);
}
EXPORT_SYMBOL_GPL(workqueue_set_max_active);

@@ -3419,7 +3423,7 @@ void freeze_workqueues_begin(void)
{
unsigned int cpu;

- spin_lock(&workqueue_lock);
+ spin_lock_irq(&workqueue_lock);

BUG_ON(workqueue_freezing);
workqueue_freezing = true;
@@ -3428,7 +3432,7 @@ void freeze_workqueues_begin(void)
struct global_cwq *gcwq = get_gcwq(cpu);
struct workqueue_struct *wq;

- spin_lock_irq(&gcwq->lock);
+ spin_lock(&gcwq->lock);

BUG_ON(gcwq->flags & GCWQ_FREEZING);
gcwq->flags |= GCWQ_FREEZING;
@@ -3440,10 +3444,10 @@ void freeze_workqueues_begin(void)
cwq->max_active = 0;
}

- spin_unlock_irq(&gcwq->lock);
+ spin_unlock(&gcwq->lock);
}

- spin_unlock(&workqueue_lock);
+ spin_unlock_irq(&workqueue_lock);
}

/**
@@ -3464,7 +3468,7 @@ bool freeze_workqueues_busy(void)
unsigned int cpu;
bool busy = false;

- spin_lock(&workqueue_lock);
+ spin_lock_irq(&workqueue_lock);

BUG_ON(!workqueue_freezing);

@@ -3488,7 +3492,7 @@ bool freeze_workqueues_busy(void)
}
}
out_unlock:
- spin_unlock(&workqueue_lock);
+ spin_unlock_irq(&workqueue_lock);
return busy;
}

@@ -3505,7 +3509,7 @@ void thaw_workqueues(void)
{
unsigned int cpu;

- spin_lock(&workqueue_lock);
+ spin_lock_irq(&workqueue_lock);

if (!workqueue_freezing)
goto out_unlock;
@@ -3514,7 +3518,7 @@ void thaw_workqueues(void)
struct global_cwq *gcwq = get_gcwq(cpu);
struct workqueue_struct *wq;

- spin_lock_irq(&gcwq->lock);
+ spin_lock(&gcwq->lock);

BUG_ON(!(gcwq->flags & GCWQ_FREEZING));
gcwq->flags &= ~GCWQ_FREEZING;
@@ -3535,15 +3539,82 @@ void thaw_workqueues(void)

wake_up_worker(gcwq);

- spin_unlock_irq(&gcwq->lock);
+ spin_unlock(&gcwq->lock);
}

workqueue_freezing = false;
out_unlock:
- spin_unlock(&workqueue_lock);
+ spin_unlock_irq(&workqueue_lock);
}
#endif /* CONFIG_FREEZER */

+#define WQ_CHECK_INTERVAL (10 * HZ)
+static void workqueue_check_timer_fn(unsigned long data);
+static DEFINE_TIMER(workqueue_check_timer, workqueue_check_timer_fn, 0, 0);
+
+static void workqueue_check_timer_fn(unsigned long data)
+{
+ unsigned long now = jiffies;
+ unsigned long wait, max_wait = 0;
+ unsigned int cpu;
+ unsigned long flags;
+
+ spin_lock_irqsave(&workqueue_lock, flags);
+
+ for_each_gcwq_cpu(cpu) {
+ struct global_cwq *gcwq = get_gcwq(cpu);
+ struct workqueue_struct *wq;
+ struct work_struct *work;
+
+ spin_lock(&gcwq->lock);
+
+ list_for_each_entry(wq, &workqueues, list) {
+ struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
+
+ if (!cwq)
+ continue;
+
+ list_for_each_entry(work, &cwq->delayed_works, entry) {
+ WARN_ON_ONCE(!time_before(work->activated_on,
+ work->queued_on));
+ wait = now - work->queued_on;
+ if (wait < WQ_CHECK_INTERVAL)
+ continue;
+ max_wait = max(max_wait, wait);
+ printk("XXX %s/%d %p:%pf delayed for %ums\n",
+ wq->name,
+ gcwq->cpu != WORK_CPU_UNBOUND ? gcwq->cpu : -1,
+ work, work->func, jiffies_to_msecs(wait));
+ }
+ }
+
+ list_for_each_entry(work, &gcwq->worklist, entry) {
+ WARN_ON_ONCE(time_before(work->activated_on,
+ work->queued_on));
+ wait = now - work->activated_on;
+ if (wait < WQ_CHECK_INTERVAL)
+ continue;
+ max_wait = max(max_wait, wait);
+ printk("XXX %s/%d %p:%pf pending for %ums after delayed %ums\n",
+ get_work_cwq(work)->wq->name,
+ gcwq->cpu != WORK_CPU_UNBOUND ? gcwq->cpu : -1,
+ work, work->func,
+ jiffies_to_msecs(wait),
+ jiffies_to_msecs(work->activated_on - work->queued_on));
+ }
+
+ spin_unlock(&gcwq->lock);
+ }
+
+ spin_unlock_irqrestore(&workqueue_lock, flags);
+
+ if (max_wait > 30 * HZ) {
+ printk("XXX max_wait over 30secs, dumping tasks\n");
+ show_state();
+ } else
+ mod_timer(&workqueue_check_timer, now + WQ_CHECK_INTERVAL / 2);
+}
+
static int __init init_workqueues(void)
{
unsigned int cpu;
@@ -3596,6 +3667,7 @@ static int __init init_workqueues(void)
system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
WQ_UNBOUND_MAX_ACTIVE);
BUG_ON(!system_wq || !system_long_wq || !system_nrt_wq);
+ mod_timer(&workqueue_check_timer, jiffies + WQ_CHECK_INTERVAL / 2);
return 0;
}
early_initcall(init_workqueues);
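
A minimal userspace sketch of the scan that the patch above performs, for readers who want to follow the logic without the kernel locking. It uses the 10-second reporting interval from the patch and the 30-second dump threshold from its description. The names `fake_work`, `scan_stalled`, `should_dump`, and `WQ_DUMP_THRESHOLD`, and the tick rate of 1000, are assumptions made for illustration and do not appear in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are not kernel definitions. */
#define HZ 1000UL                     /* assumed tick rate */
#define WQ_CHECK_INTERVAL (10 * HZ)   /* report threshold, as in the patch */
#define WQ_DUMP_THRESHOLD (30 * HZ)   /* "over 30secs" from the description */

struct fake_work {
	const char *name;          /* illustrative label only */
	unsigned long queued_on;   /* tick count when the item was queued */
};

/*
 * Scan @works at tick @now, print every item that has waited at least
 * WQ_CHECK_INTERVAL ticks, and return the longest such wait (0 if none).
 * Unsigned subtraction keeps the wait correct across counter wraparound.
 */
static unsigned long scan_stalled(const struct fake_work *works, int n,
				  unsigned long now)
{
	unsigned long max_wait = 0;
	int i;

	assert(works != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned long wait = now - works[i].queued_on;

		if (wait < WQ_CHECK_INTERVAL)
			continue;
		if (wait > max_wait)
			max_wait = wait;
		printf("XXX %s pending for %lu ticks\n", works[i].name, wait);
	}
	return max_wait;
}

/* Nonzero when the longest wait calls for a full task dump. */
static int should_dump(unsigned long max_wait)
{
	return max_wait > WQ_DUMP_THRESHOLD;
}
```

If nothing is ever printed, as in the follow-up report, no item has sat on a list longer than the reporting interval, which points away from a stalled work item.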

2010-08-30 14:02:36

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> On 08/30/2010 12:24 PM, CAI Qian wrote:
> > Can't see any difference with hangcheck timer enabled.
>
> Hmm, odd. So, here's the said debug patch. It will periodically
> check all works and report if any work is being delayed for too long.
> If the max wait goes over 30secs, it will dump all task states and
> disable itself. Can you please apply the patch on top of rc2 +
> wq#for-linus and report the output? It should tell us who's stuck
> where.
Nothing new was printed after around 10 minutes.

2010-08-30 14:21:53

by Tejun Heo

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35

Hello,

On 08/30/2010 04:02 PM, CAI Qian wrote:
>> Hmm, odd. So, here's the said debug patch. It will periodically
>> check all works and report if any work is being delayed for too long.
>> If the max wait goes over 30secs, it will dump all task states and
>> disable itself. Can you please apply the patch on top of rc2 +
>> wq#for-linus and report the output? It should tell us who's stuck
>> where.
>
> Nothing new was printed after around 10 minutes.

Eh... that's interesting. That means no work is stalled on
workqueues, so it at least is not a workqueue stall. Can you please
try the following then? Or if sysrq wasn't working because your
keyboard wasn't initialized at that point, you can set up a serial
console and trigger sysrq that way.

Thanks.

diff --git a/init/main.c b/init/main.c
index 94ab488..e156b8f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -423,11 +423,19 @@ static void __init setup_command_line(char *command_line)

static __initdata DECLARE_COMPLETION(kthreadd_done);

+static void show_state_timer_fn(unsigned long data)
+{
+ show_state();
+}
+static DEFINE_TIMER(show_state_timer, show_state_timer_fn, 0, 0);
+
static noinline void __init_refok rest_init(void)
__releases(kernel_lock)
{
int pid;

+ printk("XXX show_state_timer registered\n");
+ mod_timer(&show_state_timer, jiffies + 180 * HZ);
rcu_scheduler_starting();
/*
* We need to spawn init first so that it obtains pid 1, however
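
The patch above arms a one-shot timer at `jiffies + 180 * HZ`. A minimal userspace sketch of that arithmetic, assuming a tick rate of 1000 (the thread does not state the configured HZ). The helper names `ticks_after_eq`, `expiry_after`, and `timer_expired` are hypothetical; the comparison follows the same signed-difference idea as the kernel's `time_after_eq()`, so it stays correct when the tick counter wraps.

```c
#include <assert.h>
#include <limits.h>

#define HZ 1000UL   /* assumed tick rate; the real value is a build-time option */

/*
 * True if tick value @a is at or after @b. Valid while the two values are
 * less than half the counter range apart. The cast to long relies on the
 * usual two's-complement conversion, as the kernel macros do.
 */
static int ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Expiry tick for a one-shot timer armed at @now with a delay of @secs. */
static unsigned long expiry_after(unsigned long now, unsigned long secs)
{
	/* The delay must stay within half the counter range. */
	assert(secs <= (ULONG_MAX / 2) / HZ);
	return now + secs * HZ;   /* may wrap; that is intended */
}

/* Has a timer with expiry @expires fired by tick @now? */
static int timer_expired(unsigned long now, unsigned long expires)
{
	return ticks_after_eq(now, expires);
}
```

A plain `now >= expires` comparison would misfire when the counter wraps between arming and expiry, which is why the signed difference is used here.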

2010-08-30 14:47:43

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> Hello,
>
> On 08/30/2010 04:02 PM, CAI Qian wrote:
> >> Hmm, odd. So, here's the said debug patch. It will periodically
> >> check all works and report if any work is being delayed for too long.
> >> If the max wait goes over 30secs, it will dump all task states and
> >> disable itself. Can you please apply the patch on top of rc2 +
> >> wq#for-linus and report the output? It should tell us who's stuck
> >> where.
> >
> > Nothing new was printed after around 10 minutes.
>
> Eh... that's interesting. That means no work is stalled on
> workqueues, so it at least is not a workqueue stall. Can you please
> try the following then? Or if sysrq wasn't working because your
...
SMP alternatives: switching to UP code
ACPI: Core revision 20100702
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 18488 entries in 73 pages
XXX show_state_timer registered
Not enabling x2apic, Intr-remapping init failed.
Setting APIC routing to physical flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz stepping 06
Brought up 1 CPUs
Total of 1 processors activated (3990.10 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0

> keyboard wasn't initialized at that point, you can setup serial
> console and trigger sysrq that way.
Yes, it was triggered from a serial console.

2010-08-30 14:51:38

by Qian Cai

[permalink] [raw]
Subject: Re: kdump regression compared to v2.6.35


----- "CAI Qian" <[email protected]> wrote:

> SMP alternatives: switching to UP code
> ACPI: Core revision 20100702
> ftrace: converting mcount calls to 0f 1f 44 00 00
> ftrace: allocating 18488 entries in 73 pages
> XXX show_state_timer registered
> Not enabling x2apic, Intr-remapping init failed.
> Setting APIC routing to physical flat
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> CPU0: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz stepping 06
> Brought up 1 CPUs
> Total of 1 processors activated (3990.10 BogoMIPS).
> devtmpfs: initialized
> regulator: core version 0.5
> NET: Registered protocol family 16
> ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> ACPI: bus type pci registered
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> PCI: Using configuration type 1 for base access
> bio: create slab <bio-0> at 0
I hit return too quickly. Here is something new:

task PC stack pid father
swapper R running task 0 1 0 0x00000000
ffff88046dc1bcf0 0000000000000046 ffffffff81796812 ffffffff81a90700
ffffffff81c21420 0000000000015d00 ffff88106e5dc040 0000000000015d00
ffff88106e5dc5d0 ffff88046dc1bfd8 ffff88106e5dc5d8 ffff88046dc1bfd8
Call Trace:
[<ffffffff8105757a>] __cond_resched+0x2a/0x40
[<ffffffff8149c8a0>] _cond_resched+0x30/0x40
[<ffffffff81136cf0>] kmem_cache_alloc_node_notrace+0xa0/0x120
[<ffffffff810f6bd0>] ? mempool_alloc_slab+0x0/0x20
[<ffffffff810f6fb1>] mempool_create_node+0x41/0x1a0
[<ffffffff810f6bb0>] ? mempool_free_slab+0x0/0x20
[<ffffffff810f7124>] mempool_create+0x14/0x20
[<ffffffff81176d53>] bioset_create+0x1a3/0x2f0
[<ffffffff81c685a8>] ? init_bio+0x0/0x141
[<ffffffff81c68693>] init_bio+0xeb/0x141
[<ffffffff81c64ff4>] ? default_bdi_init+0x9b/0xa2
[<ffffffff81002053>] do_one_initcall+0x43/0x190
[<ffffffff81c418ab>] kernel_init+0x2a0/0x330
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff81c4160b>] ? kernel_init+0x0/0x330
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
kthreadd S 0000000000000000 0 2 0 0x00000000
ffff88046dc1fe90 0000000000000046 ffff88046dc1fe90 ffffffff81013596
0000000000000000 0000000000015d00 ffff88046dc1d4c0 0000000000015d00
ffff88046dc1da50 ffff88046dc1ffd8 ffff88046dc1da58 ffff88046dc1ffd8
Call Trace:
[<ffffffff81013596>] ? kernel_thread+0x76/0x80
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
[<ffffffff8107fe85>] kthreadd+0x275/0x280
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fc10>] ? kthreadd+0x0/0x280
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
ksoftirqd/0 S 0000000000000000 0 3 2 0x00000000
ffff88046dc43ea0 0000000000000046 ffffffff81a5a4a0 ffff880002415d00
ffffffff81a5a4a0 0000000000015d00 ffff88046dc1ca80 0000000000015d00
ffff88046dc1d010 ffff88046dc43fd8 ffff88046dc1d018 ffff88046dc43fd8
Call Trace:
[<ffffffff81065c35>] run_ksoftirqd+0xe5/0x140
[<ffffffff81065b50>] ? run_ksoftirqd+0x0/0x140
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
kworker/0:0 S 0000000000000000 0 4 2 0x00000000
ffff88046dc45e50 0000000000000046 ffff88046dc45db0 ffffffff8104ac80
ffff88046dc45e00 0000000000015d00 ffff88046dc1c040 0000000000015d00
ffff88046dc1c5d0 ffff88046dc45fd8 ffff88046dc1c5d8 ffff88046dc45fd8
Call Trace:
[<ffffffff8104ac80>] ? __dequeue_entity+0x30/0x50
[<ffffffff8107b949>] worker_thread+0x259/0x3c0
[<ffffffff8107b6f0>] ? worker_thread+0x0/0x3c0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
kworker/u:0 S 0000000000000000 0 5 2 0x00000000
ffff88046dc49e50 0000000000000046 ffff88046dc49db0 ffffffff8104ac80
ffff88046dc49e00 0000000000015d00 ffff88046dc47500 0000000000015d00
ffff88046dc47a90 ffff88046dc49fd8 ffff88046dc47a98 ffff88046dc49fd8
Call Trace:
[<ffffffff8104ac80>] ? __dequeue_entity+0x30/0x50
[<ffffffff8107b949>] worker_thread+0x259/0x3c0
[<ffffffff8107b6f0>] ? worker_thread+0x0/0x3c0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
migration/0 S 0000000000000000 0 6 2 0x00000000
ffff88046dc4dde0 0000000000000046 ffff88046dc4dd60 ffffffff8104a7e8
ffff88046dc4dd70 0000000000015d00 ffff88046dc46ac0 0000000000015d00
ffff88046dc47050 ffff88046dc4dfd8 ffff88046dc47058 ffff88046dc4dfd8
Call Trace:
[<ffffffff8104a7e8>] ? update_curr+0xf8/0x1e0
[<ffffffff81009630>] ? __switch_to+0xd0/0x320
[<ffffffff810b1ec0>] ? cpu_stopper_thread+0x0/0x1d0
[<ffffffff810b1ffd>] cpu_stopper_thread+0x13d/0x1d0
[<ffffffff810b1ec0>] ? cpu_stopper_thread+0x0/0x1d0
[<ffffffff810b1ec0>] ? cpu_stopper_thread+0x0/0x1d0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
cpuset S 0000000000000000 0 7 2 0x00000000
ffff88046dc6fe60 0000000000000046 ffff88046dc460b8 ffff88046dc460b8
0000000000000000 0000000000015d00 ffff88046dc46080 0000000000015d00
ffff88046dc46610 ffff88046dc6ffd8 ffff88046dc46618 ffff88046dc6ffd8
Call Trace:
[<ffffffff81050e39>] ? set_user_nice+0xc9/0x130
[<ffffffff81079789>] rescuer_thread+0x1a9/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
khelper S 0000000000000000 0 8 2 0x00000000
ffff88046dc73e60 0000000000000046 ffff88046dc71578 ffff88046dc71578
0000000000000000 0000000000015d00 ffff88046dc71540 0000000000015d00
ffff88046dc71ad0 ffff88046dc73fd8 ffff88046dc71ad8 ffff88046dc73fd8
Call Trace:
[<ffffffff81050e39>] ? set_user_nice+0xc9/0x130
[<ffffffff81079789>] rescuer_thread+0x1a9/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
netns S 0000000000000000 0 9 2 0x00000000
ffff88046dca7e60 0000000000000046 ffff88046dc70b38 ffff88046dc70b38
0000000000000000 0000000000015d00 ffff88046dc70b00 0000000000015d00
ffff88046dc71090 ffff88046dca7fd8 ffff88046dc71098 ffff88046dca7fd8
Call Trace:
[<ffffffff81050e39>] ? set_user_nice+0xc9/0x130
[<ffffffff81079789>] rescuer_thread+0x1a9/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
pm S 0000000000000000 0 10 2 0x00000000
ffff88046dca9e60 0000000000000046 ffff88046dc700f8 ffff88046dc700f8
0000000000000000 0000000000015d00 ffff88046dc700c0 0000000000015d00
ffff88046dc70650 ffff88046dca9fd8 ffff88046dc70658 ffff88046dca9fd8
Call Trace:
[<ffffffff81050e39>] ? set_user_nice+0xc9/0x130
[<ffffffff81079789>] rescuer_thread+0x1a9/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
sync_supers R running task 0 11 2 0x00000000
ffff88046dcebeb0 0000000000000046 ffffffff81a5a4a0 0000000000000000
ffff88106e5dc040 0000000000015d00 ffff88046dce9580 0000000000015d00
ffff88046dce9b10 ffff88046dcebfd8 ffff88046dce9b18 ffff88046dcebfd8
Call Trace:
[<ffffffff8110e780>] ? bdi_sync_supers+0x0/0x60
[<ffffffff8110e7c4>] bdi_sync_supers+0x44/0x60
[<ffffffff8110e780>] ? bdi_sync_supers+0x0/0x60
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
bdi-default S 0000000000000000 0 12 2 0x00000000
ffff88046dcefd40 0000000000000046 0000000000000000 0000000000000000
0000000000000000 0000000000015d00 ffff88046dce8b40 0000000000015d00
ffff88046dce90d0 ffff88046dceffd8 ffff88046dce90d8 ffff88046dceffd8
Call Trace:
[<ffffffff8149cd1c>] schedule_timeout+0x18c/0x2e0
[<ffffffff8106e260>] ? process_timeout+0x0/0x10
[<ffffffff8110f46d>] bdi_forker_thread+0x22d/0x500
[<ffffffff8110f240>] ? bdi_forker_thread+0x0/0x500
[<ffffffff8110f240>] ? bdi_forker_thread+0x0/0x500
[<ffffffff8107fc06>] kthread+0x96/0xa0
[<ffffffff8100be84>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
kintegrityd R running task 0 13 2 0x00000008
ffffffffffffff10 ffffffff81078e25 0000000000000010 0000000000000206
ffff88046dcf1e40 0000000000000018 ffff88046dc9a440 ffff88046dc9a440
ffff88088e231f00 ffff88088e239300 ffff88046dcf1ee0 ffffffff810796e8
Call Trace:
[<ffffffff81078e25>] ? worker_maybe_bind_and_lock+0x55/0xf0
[<ffffffff810796e8>] ? rescuer_thread+0x108/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
[<ffffffff8107fc06>] ? kthread+0x96/0xa0
[<ffffffff8100be84>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8107fb70>] ? kthread+0x0/0xa0
[<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
Sched Debug Version: v0.09, 2.6.36-rc2-wq+ #20
now at 180552.999849 msecs
.jiffies : 4294847851
.sysctl_sched_latency : 6.000000
.sysctl_sched_min_granularity : 2.000000
.sysctl_sched_wakeup_granularity : 1.000000
.sysctl_sched_child_runs_first : 0
.sysctl_sched_features : 15471
.sysctl_sched_tunable_scaling : 1 (logaritmic)

cpu#0, 1995.053 MHz
.nr_running : 3
.load : 90809
.nr_switches : 57
.nr_load_updates : 180566
.nr_uninterruptible : 0
.next_balance : 4294.668298
.curr->pid : 13
.clock : 181316.279544
.cpu_load[0] : 90809
.cpu_load[1] : 90809
.cpu_load[2] : 90809
.cpu_load[3] : 90809
.cpu_load[4] : 90809
.yld_count : 0
.sched_switch : 0
.sched_count : 59
.sched_goidle : 1
.avg_idle : 1000000
.ttwu_count : 34
.ttwu_local : 34
.bkl_count : 0

cfs_rq[0]:/
.exec_clock : 181045.961245
.MIN_vruntime : 293.512617
.min_vruntime : 296.512617
.max_vruntime : 296.512617
.spread : 3.000000
.spread0 : 0.000000
.nr_running : 3
.load : 90809
.nr_spread_over : 0
.shares : 0

rt_rq[0]:/
.rt_nr_running : 0
.rt_throttled : 0
.rt_time : 0.000000
.rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
swapper 1 296.512617 23 120 296.512617 282.090606 0.046441 /
sync_supers 11 293.512617 2 120 293.512617 0.002502 5354.887736 /
R kintegrityd 13 2382.377879 1 100 2382.377879 180846.751167 0.001076 /


> > keyboard wasn't initialized at that point, you can setup serial
> > console and trigger sysrq that way.
> Yes, it was triggered from a serial console.
> > Thanks.
> >
> > diff --git a/init/main.c b/init/main.c
> > index 94ab488..e156b8f 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -423,11 +423,19 @@ static void __init setup_command_line(char *command_line)
> >
> > static __initdata DECLARE_COMPLETION(kthreadd_done);
> >
> > +static void show_state_timer_fn(unsigned long data)
> > +{
> > + show_state();
> > +}
> > +static DEFINE_TIMER(show_state_timer, show_state_timer_fn, 0, 0);
> > +
> > static noinline void __init_refok rest_init(void)
> > __releases(kernel_lock)
> > {
> > int pid;
> >
> > + printk("XXX show_state_timer registered\n");
> > + mod_timer(&show_state_timer, jiffies + 180 * HZ);
> > rcu_scheduler_starting();
> > /*
> > * We need to spawn init first so that it obtains pid 1, however
> >
> > _______________________________________________
> > kexec mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/kexec

2010-08-30 14:56:18

by Tejun Heo

Subject: Re: kdump regression compared to v2.6.35

On 08/30/2010 04:51 PM, CAI Qian wrote:
> kintegrityd R running task 0 13 2 0x00000008
> ffffffffffffff10 ffffffff81078e25 0000000000000010 0000000000000206
> ffff88046dcf1e40 0000000000000018 ffff88046dc9a440 ffff88046dc9a440
> ffff88088e231f00 ffff88088e239300 ffff88046dcf1ee0 ffffffff810796e8
> Call Trace:
> [<ffffffff81078e25>] ? worker_maybe_bind_and_lock+0x55/0xf0
> [<ffffffff810796e8>] ? rescuer_thread+0x108/0x1d0
> [<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
> [<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
> [<ffffffff8107fc06>] ? kthread+0x96/0xa0
> [<ffffffff8100be84>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff8107fb70>] ? kthread+0x0/0xa0
> [<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
...
> runnable tasks:
> task PID tree-key switches prio exec-runtime sum-exec sum-sleep
> ----------------------------------------------------------------------------------------------------------
> swapper 1 296.512617 23 120 296.512617 282.090606 0.046441 /
> sync_supers 11 293.512617 2 120 293.512617 0.002502 5354.887736 /
> R kintegrityd 13 2382.377879 1 100 2382.377879 180846.751167 0.001076 /

Looks like kintegrityd went bonkers. Maybe it has some busy wait
which doesn't quite work well with how cmwq scheduling works. I'll
look into it. Can you please attach your .config?

Thanks.

--
tejun

2010-08-30 15:11:12

by Qian Cai

Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> On 08/30/2010 04:51 PM, CAI Qian wrote:
> > kintegrityd R running task 0 13 2 0x00000008
> > ffffffffffffff10 ffffffff81078e25 0000000000000010 0000000000000206
> > ffff88046dcf1e40 0000000000000018 ffff88046dc9a440 ffff88046dc9a440
> > ffff88088e231f00 ffff88088e239300 ffff88046dcf1ee0 ffffffff810796e8
> > Call Trace:
> > [<ffffffff81078e25>] ? worker_maybe_bind_and_lock+0x55/0xf0
> > [<ffffffff810796e8>] ? rescuer_thread+0x108/0x1d0
> > [<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
> > [<ffffffff810795e0>] ? rescuer_thread+0x0/0x1d0
> > [<ffffffff8107fc06>] ? kthread+0x96/0xa0
> > [<ffffffff8100be84>] ? kernel_thread_helper+0x4/0x10
> > [<ffffffff8107fb70>] ? kthread+0x0/0xa0
> > [<ffffffff8100be80>] ? kernel_thread_helper+0x0/0x10
> ...
> > runnable tasks:
> > task PID tree-key switches prio exec-runtime sum-exec sum-sleep
> > ----------------------------------------------------------------------------------------------------------
> > swapper 1 296.512617 23 120 296.512617 282.090606 0.046441 /
> > sync_supers 11 293.512617 2 120 293.512617 0.002502 5354.887736 /
> > R kintegrityd 13 2382.377879 1 100 2382.377879 180846.751167 0.001076 /
>
> Looks like kintegrityd went bonkers. Maybe it has some busy wait
> which doesn't quite work well with how cmwq scheduling works. I'll
> look into it. Can you please attach your .config?
Attached.
> Thanks.
>
> --
> tejun


Attachments:
config-2.6.36-rc2-wq+.x86_64 (103.60 kB)

2010-08-30 16:38:58

by Tejun Heo

Subject: Re: kdump regression compared to v2.6.35

Can you please try the following patch?

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a2dccfc..f57cd6e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1224,6 +1224,8 @@ __acquires(&gcwq->lock)
{
struct global_cwq *gcwq = worker->gcwq;
struct task_struct *task = worker->task;
+ static unsigned int cnt;
+ int rc;

while (true) {
/*
@@ -1232,8 +1234,11 @@ __acquires(&gcwq->lock)
* it races with cpu hotunplug operation. Verify
* against GCWQ_DISASSOCIATED.
*/
- if (!(gcwq->flags & GCWQ_DISASSOCIATED))
- set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
+ if (!(gcwq->flags & GCWQ_DISASSOCIATED)) {
+ rc = set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
+ if (rc && ++cnt < 10)
+ printk("XXX set_cpus_allowed_ptr() failed w/ %d\n", rc);
+ }

spin_lock_irq(&gcwq->lock);
if (gcwq->flags & GCWQ_DISASSOCIATED)
@@ -1985,13 +1990,16 @@ repeat:
struct cpu_workqueue_struct *cwq = get_cwq(tcpu, wq);
struct global_cwq *gcwq = cwq->gcwq;
struct work_struct *work, *n;
+ bool bound;

__set_current_state(TASK_RUNNING);
mayday_clear_cpu(cpu, wq->mayday_mask);

/* migrate to the target cpu if possible */
rescuer->gcwq = gcwq;
- worker_maybe_bind_and_lock(rescuer);
+ printk("XXX %s: rescuer dispatching to cpu%u\n", wq->name, gcwq->cpu);
+ bound = worker_maybe_bind_and_lock(rescuer);
+ printk("XXX %s: rescuer done binding, bound=%d\n", wq->name, bound);

/*
* Slurp in all works issued via this workqueue and
@@ -3558,8 +3566,7 @@ static int __init init_workqueues(void)
spin_lock_init(&gcwq->lock);
INIT_LIST_HEAD(&gcwq->worklist);
gcwq->cpu = cpu;
- if (cpu == WORK_CPU_UNBOUND)
- gcwq->flags |= GCWQ_DISASSOCIATED;
+ gcwq->flags |= GCWQ_DISASSOCIATED;

INIT_LIST_HEAD(&gcwq->idle_list);
for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++)
@@ -3583,6 +3590,7 @@ static int __init init_workqueues(void)
struct global_cwq *gcwq = get_gcwq(cpu);
struct worker *worker;

+ gcwq->flags &= ~GCWQ_DISASSOCIATED;
worker = create_worker(gcwq, true);
BUG_ON(!worker);
spin_lock_irq(&gcwq->lock);

2010-08-30 17:31:59

by Tejun Heo

Subject: Re: kdump regression compared to v2.6.35

Please try this one instead. Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a2dccfc..75cdbc2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1224,6 +1224,8 @@ __acquires(&gcwq->lock)
{
struct global_cwq *gcwq = worker->gcwq;
struct task_struct *task = worker->task;
+ static unsigned int cnt;
+ int rc;

while (true) {
/*
@@ -1232,8 +1234,11 @@ __acquires(&gcwq->lock)
* it races with cpu hotunplug operation. Verify
* against GCWQ_DISASSOCIATED.
*/
- if (!(gcwq->flags & GCWQ_DISASSOCIATED))
- set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
+ if (!(gcwq->flags & GCWQ_DISASSOCIATED)) {
+ rc = set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
+ if (rc && ++cnt < 10)
+ printk("XXX set_cpus_allowed_ptr() failed w/ %d\n", rc);
+ }

spin_lock_irq(&gcwq->lock);
if (gcwq->flags & GCWQ_DISASSOCIATED)
@@ -1985,13 +1990,16 @@ repeat:
struct cpu_workqueue_struct *cwq = get_cwq(tcpu, wq);
struct global_cwq *gcwq = cwq->gcwq;
struct work_struct *work, *n;
+ bool bound;

__set_current_state(TASK_RUNNING);
mayday_clear_cpu(cpu, wq->mayday_mask);

/* migrate to the target cpu if possible */
rescuer->gcwq = gcwq;
- worker_maybe_bind_and_lock(rescuer);
+ printk("XXX %s: rescuer dispatching to cpu%u\n", wq->name, gcwq->cpu);
+ bound = worker_maybe_bind_and_lock(rescuer);
+ printk("XXX %s: rescuer done binding, bound=%d\n", wq->name, bound);

/*
* Slurp in all works issued via this workqueue and
@@ -3558,8 +3566,7 @@ static int __init init_workqueues(void)
spin_lock_init(&gcwq->lock);
INIT_LIST_HEAD(&gcwq->worklist);
gcwq->cpu = cpu;
- if (cpu == WORK_CPU_UNBOUND)
- gcwq->flags |= GCWQ_DISASSOCIATED;
+ gcwq->flags |= GCWQ_DISASSOCIATED;

INIT_LIST_HEAD(&gcwq->idle_list);
for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++)
@@ -3583,6 +3590,8 @@ static int __init init_workqueues(void)
struct global_cwq *gcwq = get_gcwq(cpu);
struct worker *worker;

+ if (cpu != WORK_CPU_UNBOUND)
+ gcwq->flags &= ~GCWQ_DISASSOCIATED;
worker = create_worker(gcwq, true);
BUG_ON(!worker);
spin_lock_irq(&gcwq->lock);
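
A rough user-space model of what the last two hunks appear to change, offered as a sketch rather than the kernel's actual code. It assumes that the bind loop keeps retrying until the worker either lands on the target CPU or the gcwq is flagged GCWQ_DISASSOCIATED, and that the flag is cleared only for CPUs that are online at init time (the loop header is not visible in the hunk, so that part is a guess). All `model_*` names, `MODEL_NR_CPUS`, and the retry limit are hypothetical; only the flag name comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4                 /* hypothetical CPU count for the model */
#define GCWQ_DISASSOCIATED (1u << 0)    /* name from the patch; bit value arbitrary here */

struct gcwq_model {
	unsigned int cpu;
	unsigned int flags;
};

/* Stand-in for "is this CPU online?"; with maxcpus=1 only cpu0 would be. */
static bool model_cpu_online(unsigned int cpu)
{
	return cpu == 0;
}

/*
 * Simplified bind loop. Returns 1 if bound, 0 if the gcwq is disassociated,
 * and -1 if it is still retrying after max_tries. The retry limit exists
 * only so the model terminates.
 */
static int model_bind(const struct gcwq_model *g, int max_tries)
{
	assert(g->cpu < MODEL_NR_CPUS);

	for (int i = 0; i < max_tries; i++) {
		bool moved = false;

		/* Stands in for a migration attempt that only works for online CPUs. */
		if (!(g->flags & GCWQ_DISASSOCIATED))
			moved = model_cpu_online(g->cpu);

		if (g->flags & GCWQ_DISASSOCIATED)
			return 0;
		if (moved)
			return 1;
	}
	return -1;
}

/* Before the patch (as modeled): per-CPU gcwqs start with no flags set. */
static void model_init_old(struct gcwq_model *g, unsigned int cpu)
{
	g->cpu = cpu;
	g->flags = 0;
}

/* After the patch (as modeled): start disassociated, clear for online CPUs. */
static void model_init_new(struct gcwq_model *g, unsigned int cpu)
{
	g->cpu = cpu;
	g->flags = GCWQ_DISASSOCIATED;
	if (model_cpu_online(cpu))
		g->flags &= ~GCWQ_DISASSOCIATED;
}
```

Under these assumptions, an offline CPU initialized the old way never leaves the retry loop, while the new initialization makes the bind attempt give up immediately and report "not bound".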

2010-08-31 00:54:17

by Qian Cai

Subject: Re: kdump regression compared to v2.6.35


----- "Tejun Heo" <[email protected]> wrote:

> Please try this one instead. Thanks.
Unable to reproduce it any more after applying the patch:

NET: Registered protocol family 16
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
XXX kintegrityd: rescuer dispatching to cpu0
XXX kintegrityd: rescuer done binding, bound=1
XXX kintegrityd: rescuer dispatching to cpu5
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu6
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu10
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu13
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu14
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu18
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu21
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu22
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu30
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu32
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu33
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu34
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu35
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu37
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu42
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu45
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu46
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu48
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu50
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu53
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu54
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu57
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu58
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu60
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu61
XXX kintegrityd: rescuer done binding, bound=0
XXX kintegrityd: rescuer dispatching to cpu62
XXX kintegrityd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu0
XXX kblockd: rescuer done binding, bound=1
XXX kblockd: rescuer dispatching to cpu5
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu6
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu10
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu13
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu14
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu18
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu21
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu22
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu30
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu32
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu33
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu34
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu35
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu37
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu40
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu41
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu45
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu46
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu50
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu51
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu53
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu54
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu56
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu61
XXX kblockd: rescuer done binding, bound=0
XXX kblockd: rescuer dispatching to cpu62
XXX kblockd: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu0
XXX kacpid: rescuer done binding, bound=1
XXX kacpid: rescuer dispatching to cpu5
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu6
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu10
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu13
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu14
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu18
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu21
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu22
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu30
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu32
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu33
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu34
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu35
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu37
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu40
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu41
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu45
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu46
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu50
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu51
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu53
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu54
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu56
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu61
XXX kacpid: rescuer done binding, bound=0
XXX kacpid: rescuer dispatching to cpu62
XXX kacpid: rescuer done binding, bound=0
ACPI Error: Field [CPB3] at 96 exceeds Buffer [NULL] size 64 (bits) (20100702/dsopcode-597)
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_._OSC] (Node ffff880c6e58cf38), AE_AML_BUFFER_LIMIT
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: PCI Root Bridge [IOH0] (domain 0000 [bus 00-7f])
...

>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index a2dccfc..75cdbc2 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1224,6 +1224,8 @@ __acquires(&gcwq->lock)
> {
> struct global_cwq *gcwq = worker->gcwq;
> struct task_struct *task = worker->task;
> + static unsigned int cnt;
> + int rc;
>
> while (true) {
> /*
> @@ -1232,8 +1234,11 @@ __acquires(&gcwq->lock)
> * it races with cpu hotunplug operation. Verify
> * against GCWQ_DISASSOCIATED.
> */
> - if (!(gcwq->flags & GCWQ_DISASSOCIATED))
> - set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
> + if (!(gcwq->flags & GCWQ_DISASSOCIATED)) {
> + rc = set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
> + if (rc && ++cnt < 10)
> + printk("XXX set_cpus_allowed_ptr() failed w/ %d\n", rc);
> + }
>
> spin_lock_irq(&gcwq->lock);
> if (gcwq->flags & GCWQ_DISASSOCIATED)
> @@ -1985,13 +1990,16 @@ repeat:
> struct cpu_workqueue_struct *cwq = get_cwq(tcpu, wq);
> struct global_cwq *gcwq = cwq->gcwq;
> struct work_struct *work, *n;
> + bool bound;
>
> __set_current_state(TASK_RUNNING);
> mayday_clear_cpu(cpu, wq->mayday_mask);
>
> /* migrate to the target cpu if possible */
> rescuer->gcwq = gcwq;
> - worker_maybe_bind_and_lock(rescuer);
> + printk("XXX %s: rescuer dispatching to cpu%u\n", wq->name, gcwq->cpu);
> + bound = worker_maybe_bind_and_lock(rescuer);
> + printk("XXX %s: rescuer done binding, bound=%d\n", wq->name, bound);
>
> /*
> * Slurp in all works issued via this workqueue and
> @@ -3558,8 +3566,7 @@ static int __init init_workqueues(void)
> spin_lock_init(&gcwq->lock);
> INIT_LIST_HEAD(&gcwq->worklist);
> gcwq->cpu = cpu;
> - if (cpu == WORK_CPU_UNBOUND)
> - gcwq->flags |= GCWQ_DISASSOCIATED;
> + gcwq->flags |= GCWQ_DISASSOCIATED;
>
> INIT_LIST_HEAD(&gcwq->idle_list);
> for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++)
> @@ -3583,6 +3590,8 @@ static int __init init_workqueues(void)
> struct global_cwq *gcwq = get_gcwq(cpu);
> struct worker *worker;
>
> + if (cpu != WORK_CPU_UNBOUND)
> + gcwq->flags &= ~GCWQ_DISASSOCIATED;
> worker = create_worker(gcwq, true);
> BUG_ON(!worker);
> spin_lock_irq(&gcwq->lock);

2010-08-31 09:23:49

by Tejun Heo

Subject: Re: kdump regression compared to v2.6.35

Hello,

Thanks for verifying. Can you please apply this patch on top and
check that those debug messages go away?

Thanks.
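
For context on why a zeroing allocation might matter here: alloc_cpumask_var() does not clear the mask it hands back, whereas zalloc_cpumask_var() does. The user-space sketch below illustrates the difference with plain malloc()/calloc(). The `model_*` helpers are hypothetical, and the 0xA5 fill is only a stand-in for whatever stale bytes an unzeroed allocation might contain. Stray set bits of this kind would be consistent with the rescuer being dispatched to CPUs that never requested help, as in the debug output earlier in the thread, though that reading is an inference.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Number of unsigned longs needed to hold nbits bits. */
static size_t model_mask_words(size_t nbits)
{
	return (nbits + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;
}

/*
 * Models an allocation that is not zeroed. Reading truly uninitialized
 * memory is undefined in C, so a fixed fill pattern plays the role of
 * stale heap contents.
 */
static unsigned long *model_alloc_mask(size_t nbits)
{
	size_t bytes = model_mask_words(nbits) * sizeof(unsigned long);
	unsigned long *mask = malloc(bytes);

	if (mask)
		memset(mask, 0xA5, bytes);
	return mask;
}

/* Models a zeroing allocation: every bit starts out clear. */
static unsigned long *model_zalloc_mask(size_t nbits)
{
	return calloc(model_mask_words(nbits), sizeof(unsigned long));
}

/* Counts the bits set among the first nbits bits of the mask. */
static size_t model_count_set(const unsigned long *mask, size_t nbits)
{
	size_t n = 0;

	assert(mask != NULL);
	for (size_t i = 0; i < nbits; i++)
		if (mask[i / MODEL_BITS_PER_LONG] & (1UL << (i % MODEL_BITS_PER_LONG)))
			n++;
	return n;
}
```

A loop that visits every set bit of the first mask would do spurious work on each "CPU" it finds; with the zeroed mask it would visit none until a bit is set deliberately.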

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c8183b2..7855429 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -196,7 +196,7 @@ typedef cpumask_var_t mayday_mask_t;
cpumask_test_and_set_cpu((cpu), (mask))
#define mayday_clear_cpu(cpu, mask) cpumask_clear_cpu((cpu), (mask))
#define for_each_mayday_cpu(cpu, mask) for_each_cpu((cpu), (mask))
-#define alloc_mayday_mask(maskp, gfp) alloc_cpumask_var((maskp), (gfp))
+#define alloc_mayday_mask(maskp, gfp) zalloc_cpumask_var((maskp), (gfp))
#define free_mayday_mask(mask) free_cpumask_var((mask))
#else
typedef unsigned long mayday_mask_t;