Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
3.10-rc5 did not exhibit this (nor any other kernel recently tried,
including most -rc's). Does not seem to be reproducible.
[ 568.834221] ------------[ cut here ]------------
[ 568.894907] WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
[ 568.971594] Modules linked in: ipv6 tg3 ptp pps_core hwmon
[ 569.043635] CPU: 1 PID: 2952 Comm: aptitude Not tainted 3.10.0-rc6 #85
[ 569.129412] Call Trace:
[ 569.161440] [00000000004d811c] exit_mmap+0x13c/0x160
[ 569.227785] [000000000045680c] mmput.part.62+0xc/0xc0
[ 569.295258] [000000000045c25c] exit_mm+0x11c/0x180
[ 569.359301] [000000000045dc24] do_exit+0x244/0x340
[ 569.423354] [000000000045dea8] do_group_exit+0x28/0xc0
[ 569.491982] [0000000000469a08] get_signal_to_deliver+0x1c8/0x3a0
[ 569.572039] [0000000000447874] do_signal32+0x14/0x220
[ 569.639526] [000000000042c8e0] do_signal+0x2c0/0x520
[ 569.705854] [000000000042d340] do_notify_resume+0x40/0x60
[ 569.777907] [0000000000404b04] __handle_signal+0xc/0x2c
[ 569.847671] ---[ end trace 4acf84f71c8b5f1b ]---
[ 569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1
Full dmesg:
[ 0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.a 2010/01/06 14:48'
[ 0.000000] PROMLIB: Root node compatible:
[ 0.000000] Linux version 3.10.0-rc6 (mroos@v210) (gcc version 4.6.4 (Debian 4.6.4-2) ) #85 SMP Sun Jun 16 16:02:21 EEST 2013
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] bootconsole [earlyprom0] enabled
[ 0.000000] ARCH: SUN4U
[ 0.000000] Ethernet address: 00:03:ba:0a:f3:85
[ 0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[ 0.000000] Remapping the kernel... done.
[ 0.000000] OF stdout device is: /pci@1e,600000/isa@7/serial@0,3f8
[ 0.000000] PROM: Built device tree with 77761 bytes of memory.
[ 0.000000] Top of RAM: 0x10000000, Total RAM: 0x10000000
[ 0.000000] Memory hole size: 0MB
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x00000000-0x0fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00000000-0x0fffffff]
[ 0.000000] On node 0 totalpages: 32768
[ 0.000000] Normal zone: 256 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 32768 pages, LIFO batch:7
[ 0.000000] Booting Linux...
[ 0.000000] CPU CAPS: [flush,stbar,swap,muldiv,v9,ultra3,mul32,div32]
[ 0.000000] CPU CAPS: [v8plus,vis,vis2]
[ 0.000000] PERCPU: Embedded 6 pages/cpu @fffff8000f000000 s13440 r8192 d27520 u2097152
[ 0.000000] pcpu-alloc: s13440 r8192 d27520 u2097152 alloc=1*4194304
[ 0.000000] pcpu-alloc: [0] 0 1
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
[ 0.000000] Kernel command line: root=/dev/sda2 ro mem=256M debug ignore_loglevel
[ 0.000000] PID hash table entries: 1024 (order: 0, 8192 bytes)
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 262144 bytes)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 131072 bytes)
[ 0.000000] Sorting __ex_table...
[ 0.000000] Memory: 253216k available (3248k kernel code, 1032k data, 152k init) [fffff80000000000,0000000010000000]
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] Additional per-CPU info printed with stalls.
[ 0.000000] NR_IRQS:255
[ 0.000000] clocksource: mult[53555555] shift[24]
[ 0.000000] clockevent: mult[3126e98] shift[32]
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled, bootconsole disabled
[ 33.851595] Calibrating delay using timer specific routine.. 24.00 BogoMIPS (lpj=120048)
[ 33.851610] pid_max: default: 32768 minimum: 301
[ 33.851772] Mount-cache hash table entries: 512
[ 33.854162] CPU 0: synchronized TICK with master CPU (last diff 0 cycles, maxerr 6 cycles)
[ 33.854242] Brought up 2 CPUs
[ 33.854269] Testing NMI watchdog ... OK.
[ 34.055163] NET: Registered protocol family 16
[ 34.063864] /pci@1f,700000: TOMATILLO PCI Bus Module ver[4:0]
[ 34.063883] /pci@1f,700000: PCI IO[7f601000000] MEM[7f700000000]
[ 34.065866] PCI: Scanning PBM /pci@1f,700000
[ 34.066071] schizo f0069c00: PCI host bridge to bus 0000:00
[ 34.066093] pci_bus 0000:00: root bus resource [io 0x7f601000000-0x7f601ffffff] (bus address [0x0000-0xffffff])
[ 34.066111] pci_bus 0000:00: root bus resource [mem 0x7f700000000-0x7f7ffffffff] (bus address [0x00000000-0xffffffff])
[ 34.066127] pci_bus 0000:00: root bus resource [bus 00]
[ 34.066228] pci 0000:00:02.0: PME# supported from D3hot
[ 34.066453] pci 0000:00:02.1: PME# supported from D3hot
[ 34.066764] /pci@1e,600000: TOMATILLO PCI Bus Module ver[4:0]
[ 34.066778] /pci@1e,600000: PCI IO[7fe01000000] MEM[7ff00000000]
[ 34.068753] PCI: Scanning PBM /pci@1e,600000
[ 34.068943] schizo f00732d0: PCI host bridge to bus 0001:00
[ 34.068969] pci_bus 0001:00: root bus resource [io 0x7fe01000000-0x7fe01ffffff] (bus address [0x0000-0xffffff])
[ 34.068996] pci_bus 0001:00: root bus resource [mem 0x7ff00000000-0x7ffffffffff] (bus address [0x00000000-0xffffffff])
[ 34.069022] pci_bus 0001:00: root bus resource [bus 00]
[ 34.069281] pci 0001:00:06.0: quirk: [io 0x7fe01000800-0x7fe0100083f] claimed by ali7101 ACPI
[ 34.069310] pci 0001:00:06.0: quirk: [io 0x7fe01000600-0x7fe0100061f] claimed by ali7101 SMB
[ 34.069518] pci 0001:00:0a.0: PME# supported from D3cold
[ 34.070045] /pci@1c,600000: TOMATILLO PCI Bus Module ver[4:0]
[ 34.070065] /pci@1c,600000: PCI IO[7ce01000000] MEM[7cf00000000]
[ 34.072083] PCI: Scanning PBM /pci@1c,600000
[ 34.072281] schizo f007c6ac: PCI host bridge to bus 0002:00
[ 34.072306] pci_bus 0002:00: root bus resource [io 0x7ce01000000-0x7ce01ffffff] (bus address [0x0000-0xffffff])
[ 34.072332] pci_bus 0002:00: root bus resource [mem 0x7cf00000000-0x7cfffffffff] (bus address [0x00000000-0xffffffff])
[ 34.072358] pci_bus 0002:00: root bus resource [bus 00]
[ 34.072452] pci 0002:00:02.0: supports D1 D2
[ 34.072674] pci 0002:00:02.1: supports D1 D2
[ 34.072990] /pci@1d,700000: TOMATILLO PCI Bus Module ver[4:0]
[ 34.073010] /pci@1d,700000: PCI IO[7c601000000] MEM[7c700000000]
[ 34.075005] PCI: Scanning PBM /pci@1d,700000
[ 34.075213] schizo f00859d4: PCI host bridge to bus 0003:00
[ 34.075239] pci_bus 0003:00: root bus resource [io 0x7c601000000-0x7c601ffffff] (bus address [0x0000-0xffffff])
[ 34.075266] pci_bus 0003:00: root bus resource [mem 0x7c700000000-0x7c7ffffffff] (bus address [0x00000000-0xffffffff])
[ 34.075292] pci_bus 0003:00: root bus resource [bus 00]
[ 34.075413] pci 0003:00:02.0: PME# supported from D3hot
[ 34.075676] pci 0003:00:02.1: PME# supported from D3hot
[ 34.081421] bio: create slab <bio-0> at 0
[ 34.081959] vgaarb: loaded
[ 34.082430] SCSI subsystem initialized
[ 34.083594] /pci@1e,600000/isa@7/rtc@0,70: RTC regs at 0x7fe01000070
[ 34.084869] Switching to clocksource stick
[ 34.092652] NET: Registered protocol family 2
[ 34.093064] TCP established hash table entries: 2048 (order: 2, 32768 bytes)
[ 34.093187] TCP bind hash table entries: 2048 (order: 2, 32768 bytes)
[ 34.093303] TCP: Hash tables configured (established 2048 bind 2048)
[ 34.093390] TCP: reno registered
[ 34.093407] UDP hash table entries: 256 (order: 0, 8192 bytes)
[ 34.093451] UDP-Lite hash table entries: 256 (order: 0, 8192 bytes)
[ 34.093698] NET: Registered protocol family 1
[ 34.093757] pci 0001:00:07.0: Activating ISA DMA hang workarounds
[ 34.093789] PCI: Enabling device: (0001:00:0a.0), cmd 2
[ 34.154948] PCI: CLS 64 bytes, default 64
[ 34.155142] power: Control reg at 7fe01000800
[ 34.155530] chmc: UltraSPARC-IIIi memory controller at /memory-controller@0,0
[ 34.155563] chmc: UltraSPARC-IIIi memory controller at /memory-controller@1,0
[ 34.165715] msgmni has been set to 494
[ 34.166349] io scheduler noop registered
[ 34.166523] io scheduler cfq registered (default)
[ 34.167195] f00aba6c: ttyS0 at MMIO 0x7fe010003f8 (irq = 15) is a 16550A
[ 34.167215] Console: ttyS0 (SU)
[ 42.374594] console [ttyS0] enabled
[ 42.420665] f00ad5ec: ttyS1 at MMIO 0x7fe010002e8 (irq = 15) is a 16550A
[ 42.509515] PCI: Enabling device: (0002:00:02.0), cmd 147
[ 42.581037] sym0: <1010-66> rev 0x1 at pci 0002:00:02.0 irq 24
[ 42.659821] sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
[ 42.778199] sym0: SCSI BUS has been reset.
[ 42.831989] scsi0 : sym-2.2.3
[ 45.866159] scsi 0:0:0:0: Direct-Access FUJITSU MAW3073NCSUN72G 1703 PQ: 0 ANSI: 4
[ 45.972582] scsi target0:0:0: tagged command queuing enabled, command queue depth 16.
[ 46.075599] scsi target0:0:0: Beginning Domain Validation
[ 46.152355] scsi target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
[ 46.358085] scsi target0:0:0: Ending Domain Validation
[ 50.552347] PCI: Enabling device: (0002:00:02.1), cmd 147
[ 50.623892] sym1: <1010-66> rev 0x1 at pci 0002:00:02.1 irq 25
[ 50.702671] sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
[ 50.821133] sym1: SCSI BUS has been reset.
[ 50.874938] scsi1 : sym-2.2.3
[ 58.301097] mousedev: PS/2 mouse device common for all mice
[ 58.301843] sd 0:0:0:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 58.304753] sd 0:0:0:0: [sda] Write Protect is off
[ 58.304759] sd 0:0:0:0: [sda] Mode Sense: c7 00 00 08
[ 58.305897] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 58.317159] sda: sda1 sda2 sda3 sda4
[ 58.321798] sd 0:0:0:0: [sda] Attached SCSI disk
[ 58.833788] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[ 58.917383] rtc_cmos rtc_cmos: no alarms, 114 bytes nvram
[ 58.989328] TCP: cubic registered
[ 59.032880] NET: Registered protocol family 17
[ 59.091956] rtc_cmos rtc_cmos: setting system clock to 2013-06-16 13:05:00 UTC (1371387900)
[ 59.204672] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
[ 59.306482] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
[ 59.429442] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[ 59.530217] VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
[ 61.113258] pps_core: LinuxPPS API ver. 1 registered
[ 61.178571] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>
[ 61.299855] PTP clock support registered
[ 61.378830] tg3.c:v3.132 (May 21, 2013)
[ 61.429280] PCI: Enabling device: (0000:00:02.0), cmd 2
[ 61.679664] tg3 0000:00:02.0 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[ 62.136797] tg3 0000:00:02.0 eth0: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:85
[ 62.272986] tg3 0000:00:02.0 eth0: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[ 62.401073] tg3 0000:00:02.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 62.504008] tg3 0000:00:02.0 eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
[ 62.592101] PCI: Enabling device: (0000:00:02.1), cmd 2
[ 62.839311] tg3 0000:00:02.1 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[ 63.295874] tg3 0000:00:02.1 eth1: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:86
[ 63.431987] tg3 0000:00:02.1 eth1: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[ 63.560078] tg3 0000:00:02.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 63.663011] tg3 0000:00:02.1 eth1: dma_rwctrl[763f0000] dma_mask[32-bit]
[ 63.751109] PCI: Enabling device: (0003:00:02.0), cmd 2
[ 63.999292] tg3 0003:00:02.0 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[ 64.455848] tg3 0003:00:02.0 eth2: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:87
[ 64.592030] tg3 0003:00:02.0 eth2: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[ 64.720118] tg3 0003:00:02.0 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 64.823056] tg3 0003:00:02.0 eth2: dma_rwctrl[763f0000] dma_mask[32-bit]
[ 64.911140] PCI: Enabling device: (0003:00:02.1), cmd 2
[ 65.159296] tg3 0003:00:02.1 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[ 65.615808] tg3 0003:00:02.1 eth3: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:88
[ 65.751967] tg3 0003:00:02.1 eth3: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[ 65.880064] tg3 0003:00:02.1 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 65.982991] tg3 0003:00:02.1 eth3: dma_rwctrl[763f0000] dma_mask[32-bit]
[ 66.549023] Adding 3084472k swap on /dev/sda4. Priority:-1 extents:1 across:3084472k
[ 66.690487] EXT4-fs (sda2): re-mounted. Opts: (null)
[ 66.944966] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
[ 68.324377] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[ 68.447686] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
[ 69.338117] NET: Registered protocol family 10
[ 70.842065] tg3 0000:00:02.0 eth0: No firmware running
[ 71.285041] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 72.946419] tg3 0000:00:02.0 eth0: Link is up at 100 Mbps, full duplex
[ 73.039237] tg3 0000:00:02.0 eth0: Flow control is on for TX and on for RX
[ 73.146397] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 568.834221] ------------[ cut here ]------------
[ 568.894907] WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
[ 568.971594] Modules linked in: ipv6 tg3 ptp pps_core hwmon
[ 569.043635] CPU: 1 PID: 2952 Comm: aptitude Not tainted 3.10.0-rc6 #85
[ 569.129412] Call Trace:
[ 569.161440] [00000000004d811c] exit_mmap+0x13c/0x160
[ 569.227785] [000000000045680c] mmput.part.62+0xc/0xc0
[ 569.295258] [000000000045c25c] exit_mm+0x11c/0x180
[ 569.359301] [000000000045dc24] do_exit+0x244/0x340
[ 569.423354] [000000000045dea8] do_group_exit+0x28/0xc0
[ 569.491982] [0000000000469a08] get_signal_to_deliver+0x1c8/0x3a0
[ 569.572039] [0000000000447874] do_signal32+0x14/0x220
[ 569.639526] [000000000042c8e0] do_signal+0x2c0/0x520
[ 569.705854] [000000000042d340] do_notify_resume+0x40/0x60
[ 569.777907] [0000000000404b04] __handle_signal+0xc/0x2c
[ 569.847671] ---[ end trace 4acf84f71c8b5f1b ]---
[ 569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1
[ 569.994271] [sched_delayed] sched: RT throttling activated
--
Meelis Roos ([email protected])
Hi,
On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
> Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
> 3.10-rc5 did not exhibit this (nor any other kernel recently tried,
> including most -rc's). Does not seem to be reproducible.
I get this regularly on Ultrasparc during long compilations. It's been
there with all recent kernels (probably at least since 3.8). Latest I
saw with 3.10-rc5.
A.
Hi,
On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried,
> > including most -rc's). Does not seem to be reproducible.
>
> I get this regularly on Ultrasparc during long compilations. It's been
> there with all recent kernels (probably at least since 3.8). Latest I
> saw with 3.10-rc5.
Two examples:
[ 417.006586] ------------[ cut here ]------------
[ 417.065813] WARNING: at /home/aaro/los/work/shared/linux-v3.10-rc5/mm/mmap.c:2757 exit_mmap+0x134/0x160()
[ 417.189209] Modules linked in:
[ 417.229203] CPU: 0 PID: 1787 Comm: ld Not tainted 3.10.0-rc5-ultra #1
[ 417.310031] Call Trace:
[ 417.342875] [00000000004b5ef4] exit_mmap+0x134/0x160
[ 417.406941] [000000000044ef40] mmput+0x40/0xe0
[ 417.464591] [0000000000454b38] do_exit+0x1b8/0x800
[ 417.526429] [0000000000455d6c] do_group_exit+0x2c/0xa0
[ 417.592383] [0000000000455df4] SyS_exit_group+0x14/0x20
[ 417.659257] [0000000000406074] linux_sparc_syscall32+0x34/0x40
[ 417.733400] ---[ end trace b92a93fbf6d0204a ]---
[ 417.791913] BUG: Bad rss-counter state mm:fffff8001ebbb740 idx:1 val:1
[ 1674.164634] ------------[ cut here ]------------
[ 1674.218933] WARNING: at /home/aaro/los/work/shared/linux-v3.10-rc5/mm/mmap.c:2757 exit_mmap+0x134/0x160()
[ 1674.333505] Modules linked in:
[ 1674.369872] CPU: 0 PID: 26306 Comm: date Not tainted 3.10.0-rc5-ultra #1
[ 1674.450075] Call Trace:
[ 1674.479245] [00000000004b5ef4] exit_mmap+0x134/0x160
[ 1674.539661] [000000000044ef40] mmput+0x40/0xe0
[ 1674.593827] [0000000000454b38] do_exit+0x1b8/0x800
[ 1674.652140] [0000000000455d6c] do_group_exit+0x2c/0xa0
[ 1674.714630] [0000000000455df4] SyS_exit_group+0x14/0x20
[ 1674.778172] [0000000000406074] linux_sparc_syscall32+0x34/0x40
[ 1674.848993] ---[ end trace 77928f0ca6684101 ]---
[ 1674.904199] BUG: Bad rss-counter state mm:fffff8000f0c9d40 idx:0 val:1
A.
From: Aaro Koskinen <[email protected]>
Date: Mon, 17 Jun 2013 08:58:39 +0300
> On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
>> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
>> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
>> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried,
>> > including most -rc's). Does not seem to be reproducible.
>>
>> I get this regularly on Ultrasparc during long compilations. It's been
>> there with all recent kernels (probably at least since 3.8). Latest I
>> saw with 3.10-rc5.
>
> Two examples:
Thanks for the reports, I'm actively looking into this.
Hi,
On Sat, Aug 03, 2013 at 01:40:42PM -0700, David Miller wrote:
> > On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
> >> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
> >> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
> >> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried,
> >> > including most -rc's). Does not seem to be reproducible.
> >>
> >> I get this regularly on Ultrasparc during long compilations. It's been
> >> there with all recent kernels (probably at least since 3.8). Latest I
> >> saw with 3.10-rc5.
> >
> > Two examples:
>
> Thanks for the reports, I'm actively looking into this.
Got this again with 3.12-rc5 while doing GCC 4.8.2 bootstrap (during
make check phase):
[83998.025998] ------------[ cut here ]------------
[83998.080312] WARNING: CPU: 0 PID: 3983 at /home/aaro/los/work/shared/linux-v3.12-rc5/mm/mmap.c:2729 exit_mmap+0x138/0x160()
[83998.212541] Modules linked in:
[83998.248987] CPU: 0 PID: 3983 Comm: expect Not tainted 3.12.0-rc5-ultra-los.git-d7b26d7-dirty #1
[83998.353171] Call Trace:
[83998.382310] [00000000004b79d8] exit_mmap+0x138/0x160
[83998.442723] [00000000004503cc] mmput+0x2c/0xc0
[83998.496885] [00000000004cf338] flush_old_exec+0x418/0x520
[83998.562508] [000000000051294c] load_elf_binary+0x20c/0x1660
[83998.630194] [00000000004ce9f8] search_binary_handler+0x78/0x200
[83998.702061] [00000000004cfc1c] do_execve_common.isra.48+0x3dc/0x500
[83998.778105] [00000000004cffcc] compat_sys_execve+0x2c/0x60
[83998.844758] [0000000000406074] linux_sparc_syscall32+0x34/0x40
[83998.915565] ---[ end trace 1f66da8de6eddeb8 ]---
[83998.970773] BUG: Bad rss-counter state mm:fffff80019c4c000 idx:1 val:512
[85190.371621] ld[17707]: segfault at 58 ip 00000000f7d0d164 (rpc 00000000f7d0cf90) sp 00000000ffc36a20 error 30001 in libbfd-2.23.2.so[f7cc8000+ba000]
The box didn't die yet... Let's hope it lets the GCC testsuite finish.
A.
From: Aaro Koskinen <[email protected]>
Date: Tue, 22 Oct 2013 20:46:12 +0300
> Hi,
>
> On Sat, Aug 03, 2013 at 01:40:42PM -0700, David Miller wrote:
>> > On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
>> >> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
>> >> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
>> >> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried,
>> >> > including most -rc's). Does not seem to be reproducible.
>> >>
>> >> I get this regularly on Ultrasparc during long compilations. It's been
>> >> there with all recent kernels (probably at least since 3.8). Latest I
>> >> saw with 3.10-rc5.
>> >
>> > Two examples:
>>
>> Thanks for the reports, I'm actively looking into this.
>
> Got this again with 3.12-rc5 while doing GCC 4.8.2 bootstrap (during
> make check phase):
...
> The box didn't die yet... Let's hope it lets the GCC testsuite finish.
Thanks for reporting this again, it's still on my long TODO list to look
into.
Hi,
Just for the archives, I got one of these again with 3.14:
[68674.536190] ------------[ cut here ]------------
[68674.590467] WARNING: CPU: 0 PID: 14600 at /home/aaro/los/work/shared/linux-v3.14/mm/mmap.c:2738 exit_mmap+0x138/0x160()
[68674.719635] Modules linked in:
[68674.756022] CPU: 0 PID: 14600 Comm: rm Not tainted 3.14.0-ultra-los_0a2b #1
[68674.839349] Call Trace:
[68674.868507] [00000000004b9c78] exit_mmap+0x138/0x160
[68674.928931] [00000000004503cc] mmput+0x2c/0xc0
[68674.983103] [0000000000452e98] do_exit+0x1b8/0x800
[68675.041409] [000000000045406c] do_group_exit+0x2c/0xa0
[68675.103897] [00000000004540f4] SyS_exit_group+0x14/0x20
[68675.167439] [0000000000406074] linux_sparc_syscall32+0x34/0x60
[68675.238258] ---[ end trace 8a52741fbdb89d8e ]---
[68675.293470] BUG: Bad rss-counter state mm:ffffff001df3d900 idx:1 val:1
A.
From: Aaro Koskinen <[email protected]>
Date: Mon, 14 Apr 2014 21:43:53 +0300
> Just for the archives, I got one of these again with 3.14:
>
> [68674.536190] ------------[ cut here ]------------
> [68674.590467] WARNING: CPU: 0 PID: 14600 at /home/aaro/los/work/shared/linux-v3.14/mm/mmap.c:2738 exit_mmap+0x138/0x160()
> [68674.719635] Modules linked in:
> [68674.756022] CPU: 0 PID: 14600 Comm: rm Not tainted 3.14.0-ultra-los_0a2b #1
> [68674.839349] Call Trace:
> [68674.868507] [00000000004b9c78] exit_mmap+0x138/0x160
> [68674.928931] [00000000004503cc] mmput+0x2c/0xc0
> [68674.983103] [0000000000452e98] do_exit+0x1b8/0x800
> [68675.041409] [000000000045406c] do_group_exit+0x2c/0xa0
> [68675.103897] [00000000004540f4] SyS_exit_group+0x14/0x20
> [68675.167439] [0000000000406074] linux_sparc_syscall32+0x34/0x60
> [68675.238258] ---[ end trace 8a52741fbdb89d8e ]---
> [68675.293470] BUG: Bad rss-counter state mm:ffffff001df3d900 idx:1 val:1
Yes, I have reports of this going back several releases and I started trying to figure
out what causes this.
I suspect there is something that runs during exit_mmap() that indirectly faults in
new pages, and that's how the rss-counter ends up being non-zero at the end of
exit_mmap().
I'll let you know if I figure out exactly what the problem is.
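To see which counter is left non-zero across many reports, the BUG lines can be pulled out of a saved log. This is only a minimal sketch: the function name rss_reports is made up for illustration, and the idx meanings given in the comment are my recollection of the MM_* counter enum in kernels of this era, so treat them as an assumption rather than a reference.

```shell
# Read kernel log text on stdin and print one "mm idx val" line for every
# "Bad rss-counter state" report found.
# Assumption: idx 0 = file pages, 1 = anonymous pages, 2 = swap entries.
rss_reports() {
    sed -n 's/.*Bad rss-counter state mm:\([0-9a-f]*\) idx:\([0-9]*\) val:\(-\{0,1\}[0-9]*\).*/\1 \2 \3/p'
}
```

Typical use would be "dmesg | rss_reports", or feeding it a saved dmesg file on stdin. Lines that are not rss-counter reports are ignored.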
From: Aaro Koskinen <[email protected]>
Date: Mon, 14 Apr 2014 21:43:53 +0300
> Just for the archives, I got one of these again with 3.14:
Meelis and Aaro, thanks again for all of your reports.
After poring over a lot of the data and auditing some code I
suspect it's a problem with transparent huge pages.
One thing you two can do to help me further confirm this is to run
with THP disabled for a while and see if you still get the log
messages.
Simply, as root:
bash# echo "never" >/sys/kernel/mm/transparent_hugepage/enabled
And then do your gcc bootstraps or whatever else seems to usually
run when you trigger this problem.
Thanks!
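Before starting a long test run it may be worth confirming that the setting actually took effect. The sysfs file lists all modes and puts the active one in square brackets, e.g. "always madvise [never]". A small sketch, where the helper name thp_active_mode is hypothetical and the file path is a parameter so it can be tried on a saved copy:

```shell
# Print the active THP mode, i.e. the bracketed word in a line such as
# "always madvise [never]". With no argument it reads the sysfs file
# used in the command above.
thp_active_mode() {
    file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}
```

After the echo above, running thp_active_mode with no argument should print "never". On a kernel built without THP the file does not exist and sed will report an error instead.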
From: [email protected]
Date: Thu, 17 Apr 2014 01:22:17 +0300 (EEST)
>> > Just for the archives, I got one of these again with 3.14:
>>
>> Meelis and Aaro, thanks again for all of your reports.
>>
>> After poring over a lot of the data and auditing some code I
>> suspect it's a problem with transparent huge pages.
>>
>> One thing you two can do to help me further confirm this is to run
>> with THP disabled for a while and see if you still get the log
>> messages.
>
> I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers
> that had this problem (actually most of my sparc64 machines) and the 4th
> has
...
> and also has not had this problem since then. All 4 machines have been
> running through most -rc's of every kernel.
Thanks, this is a very useful datapoint.
> > Just for the archives, I got one of these again with 3.14:
>
> Meelis and Aaro, thanks again for all of your reports.
>
> After poring over a lot of the data and auditing some code I
> suspect it's a problem with transparent huge pages.
>
> One thing you two can do to help me further confirm this is to run
> with THP disabled for a while and see if you still get the log
> messages.
I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers
that had this problem (actually most of my sparc64 machines) and the 4th
has
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
and also has not had this problem since then. All 4 machines have been
running through most -rc's of every kernel.
--
Meelis Roos <[email protected]>
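When comparing machines like this, it can help to query each kernel config file the same way instead of reading it by eye. A minimal sketch; the helper name kconfig_state is hypothetical, and where the config lives (a build tree .config, or a distribution copy under /boot) is an assumption that varies per system:

```shell
# Print the value of a kernel config option ("y", "m", ...), or "not set"
# if the option is commented out or absent.
# Usage: kconfig_state CONFIG_TRANSPARENT_HUGEPAGE /path/to/.config
kconfig_state() {
    val=$(sed -n "s/^$1=//p" "$2" | head -n 1)
    echo "${val:-not set}"
}
```

Because the pattern requires "=" directly after the option name, asking for CONFIG_TRANSPARENT_HUGEPAGE does not accidentally match CONFIG_TRANSPARENT_HUGEPAGE_MADVISE.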
Hi,
On Wed, Apr 16, 2014 at 02:58:22PM -0400, David Miller wrote:
> One thing you two can do to help me further confirm this is to run
> with THP disabled for a while and see if you still get the log
> messages.
>
> Simply, as root:
>
> bash# echo "never" >/sys/kernel/mm/transparent_hugepage/enabled
>
> And then do your gcc bootstraps or whatever else seems to usually
> run when you trigger this problem.
I'm running my Ultras with "# CONFIG_TRANSPARENT_HUGEPAGE is not set"
and I still see the issue.
I tried reproducing the bug with the function tracer running. It works,
but reproducing the bug takes several days... This time it was an "expect"
segfault during the GCC testsuite that triggered the bug.
For the test I added tracing_off() after the "Bad rss-counter
state" printout. Now I see it should probably be done earlier, as the
warning/bug printouts are polluting the trace.
Anyway, the results are here:
http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace.txt
http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace-dmesg.txt
A.
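For anyone repeating this experiment, the function tracer is normally driven through the tracing directory in debugfs. The following is only a sketch under assumptions: debugfs is taken to be mounted at /sys/kernel/debug, and the helper names tracer_start and tracer_stop are made up for illustration. Writing 0 to tracing_on from userspace is the counterpart of the in-kernel tracing_off() call described above.

```shell
# The tracing directory is a parameter so the helpers can be tried against
# a scratch directory; the default assumes debugfs at /sys/kernel/debug.
tracer_start() {
    dir="${1:-/sys/kernel/debug/tracing}"
    echo function > "$dir/current_tracer" &&
    echo 1 > "$dir/tracing_on"
}

# Stop recording so the ring buffer keeps what led up to this point.
tracer_stop() {
    dir="${1:-/sys/kernel/debug/tracing}"
    echo 0 > "$dir/tracing_on"
}
```

Once tracing is stopped, the buffer can be saved by copying the "trace" file from the same directory.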
From: Aaro Koskinen <[email protected]>
Date: Fri, 25 Apr 2014 23:09:08 +0300
> On Wed, Apr 16, 2014 at 02:58:22PM -0400, David Miller wrote:
>> One thing you two can do to help me further confirm this is to run
>> with THP disabled for a while and see if you still get the log
>> messages.
>>
>> Simply, as root:
>>
>> bash# echo "never" >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> And then do your gcc bootstraps or whatever else seems to usually
>> run when you trigger this problem.
>
> I'm running my Ultras with "# CONFIG_TRANSPARENT_HUGEPAGE is not set"
> and I still see the issue.
Thanks, that's an important datapoint.
> I tried reproducing the bug with the function tracer running. It works,
> but reproducing the bug takes several days... This time it was an "expect"
> segfault during the GCC testsuite that triggered the bug.
>
> For the test I added tracing_off() after the "Bad rss-counter
> state" printout. Now I see it should probably be done earlier, as the
> warning/bug printouts are polluting the trace.
>
> Anyway, the results are here:
>
> http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace.txt
> http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace-dmesg.txt
Thanks a lot for doing this, I'll take a look.
> > This is today's fresh git with 3.15.0-rc6-00190-g1ee1cea on V210, THP
> > enabled & always on. Got this and a segfault on apt-spawned xz.
>
> Thanks a lot for the report.
>
> I've been bogged down with other things but I will come back to
> this stuff soon.
Just to document an oddity that does not seem to fit the pattern of an
UltraSPARC III-era split:
The V210 with USIII-family CPUs is still problematic: it cannot survive
repeated local git clones and hangs (the watchdog detects the hang) with
filesystem corruption. This still holds for the 3.15 release.
On the other hand, an E420R with 4 USII CPUs does not hang and is 100%
stable with git clones etc. A very similar E220R with a different config
hangs under that load.
This may be related to some other options in the kernel configs: each
config is unique and they vary intentionally.
--
Meelis Roos ([email protected])
From: [email protected]
Date: Thu, 17 Apr 2014 01:22:17 +0300 (EEST)
>> > Just for the archives, I got one of these again with 3.14:
>>
>> Meelis and Aaro, thanks again for all of your reports.
>>
>> After poring over a lot of the data and auditing some code I
>> suspect it's a problem with transparent huge pages.
>>
>> One thing you two can do to help me further confirm this is to run
>> with THP disabled for a while and see if you still get the log
>> messages.
>
> I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers
> that had this problem (actually most of my sparc64 machines) and the 4th
> has
>
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
> # CONFIG_HUGETLBFS is not set
> # CONFIG_HUGETLB_PAGE is not set
>
> and also has not had this problem since then. All 4 machines have been
> running through most -rc's of every kernel.
Here is something I'd like you guys to test.
Yesterday, Christopher (CC:'d) posted some fixes, and one of
them is very interesting.
Basically the update_mmu_cache() methods on sparc64 can insert an
invalid PTE into the TSB hash tables, causing livelocks and other
annoying issues.
The path where this can happen is via remove_migration_pte().
I had a discussion with Johannes Weiner about this and we determined
that it would make sense to mis-diagnose THP as being the root cause
in the RSS counter et al. problems if this bug here is the real
reason those things are happening.
That's because if you're not using THP there is less compaction going
on. Less compaction means less migration, and therefore a lower
likelihood of this code path triggering like this.
Could you guys please try this patch below? Thanks.
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 16b58ff..8e894e0 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -351,6 +351,10 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 	mm = vma->vm_mm;
+	/* Don't insert a non-valid PTE into the TSB, we'll deadlock. */
+	if (!pte_accessible(mm, pte))
+		return;
+
 	spin_lock_irqsave(&mm->context.lock, flags);
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -2617,6 +2621,10 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!pmd_large(entry) || !pmd_young(entry))
 		return;
+	/* Don't insert a non-valid PMD into the TSB, we'll deadlock. */
+	if (!(pte & _PAGE_VALID))
+		return;
+
 	pte = pmd_val(entry);
 	/* We are fabricating 8MB pages using 4MB real hw pages. */
From: Meelis Roos <[email protected]>
Date: Thu, 31 Jul 2014 01:02:53 +0300 (EEST)
>> Here is something I'd like you guys to test.
>
> Very interesting.
>
> [...]
>> Could you guys please try this patch below? Thanks.
>
> CC arch/sparc/mm/init_64.o
> arch/sparc/mm/init_64.c: In function 'update_mmu_cache_pmd':
> arch/sparc/mm/init_64.c:2625:6: error: 'pte' may be used uninitialized in this function [-Werror=uninitialized]
>
> gcc 4.6.4.
I'm very disappointed that gcc-4.6.3 didn't say anything to me about
this :-)
Here is a fixed patch, thanks.
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 16b58ff..db5ddde 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -351,6 +351,10 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 	mm = vma->vm_mm;
+	/* Don't insert a non-valid PTE into the TSB, we'll deadlock. */
+	if (!pte_accessible(mm, pte))
+		return;
+
 	spin_lock_irqsave(&mm->context.lock, flags);
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -2619,6 +2623,10 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 	pte = pmd_val(entry);
+	/* Don't insert a non-valid PMD into the TSB, we'll deadlock. */
+	if (!(pte & _PAGE_VALID))
+		return;
+
 	/* We are fabricating 8MB pages using 4MB real hw pages. */
 	pte |= (addr & (1UL << REAL_HPAGE_SHIFT));
> Here is something I'd like you guys to test.
Very interesting.
[...]
> Could you guys please try this patch below? Thanks.
CC arch/sparc/mm/init_64.o
arch/sparc/mm/init_64.c: In function 'update_mmu_cache_pmd':
arch/sparc/mm/init_64.c:2625:6: error: 'pte' may be used uninitialized in this function [-Werror=uninitialized]
gcc 4.6.4.
--
Meelis Roos ([email protected])