2015-11-15 22:28:41

by Kyle Sanderson

Subject: BUG: unable to handle kernel paging request at ffffe8ff7fc00001

[290371.835867] BUG: unable to handle kernel paging request at ffffe8ff7fc00001
[290371.835891] IP: [<ffffffff810a174f>] kstat_irqs+0x4f/0x90
[290371.835912] PGD 800000000172f063 PUD 0
[290371.835929] Oops: 0000 [#1] PREEMPT SMP
[290371.835950] Modules linked in: xt_hashlimit ts_kmp xt_set ts_bm
xt_string xt_length ipt_REJECT nf_reject_ipv4 xt_recent xt_tcpudp
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ip_set_hash_ip ip_set_hash_net ip_set nfnetlink iptable_filter
ip_tables x_tables ipv6 joydev binfmt_misc x86_pkg_temp_thermal
coretemp kvm_intel igb kvm ioatdma i2c_algo_bit i2c_i801 i2c_core
pcspkr xhci_pci dca rtc_cmos processor thermal_sys button xts gf128mul
aes_x86_64 cbc sha256_generic libiscsi scsi_transport_iscsi tg3 ptp
pps_core libphy e1000 fuse nfs lockd grace sunrpc jfs multipath linear
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor
async_tx raid6_pq raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror
dm_region_hash dm_log dm_mod hid_sunplus hid_sony led_class
hid_samsung hid_pl
[290371.836382] hid_petalynx hid_monterey hid_microsoft hid_logitech
hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin
hid_apple hid_a4tech sl811_hcd usbhid xhci_hcd ohci_hcd uhci_hcd
usb_storage ehci_pci ehci_hcd usbcore usb_common aic94xx libsas lpfc
crc_t10dif crct10dif_common qla2xxx megaraid_sas megaraid_mbox
megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas
scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase
atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth advansys
initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma
sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24
sata_sil sata_promise pata_sl82c105 pata_via pata_jmicron pata_marvell
pata_sis
[290371.836962] pata_netcell pata_pdc202xx_old pata_triflex
pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia
pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_artop
pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x
pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys
pata_pdc2027x pata_mpiix libata
[290371.841239] CPU: 2 PID: 1078 Comm: usage.pl Not tainted 4.1.7-hardened-r1 #1
[290371.841274] Hardware name: Supermicro Super Server/X10SRi-F, BIOS
1.0b 04/21/2015
[290371.841310] task: ffff88058394c9b0 ti: ffff88058394cf70 task.ti:
ffff88058394cf70
[290371.841362] RIP: 0010:[<ffffffff810a174f>] [<ffffffff810a174f>]
kstat_irqs+0x4f/0x90
[290371.841402] RSP: 0018:ffff8801020f3c28 EFLAGS: 00010293
[290371.841433] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffff88087fc00000
[290371.841468] RDX: 0000000000000000 RSI: 000060f700000001 RDI:
ffffffff81a312a0
[290371.841502] RBP: ffff8801020f3c48 R08: ffffffffffffffff R09:
0000000000000000
[290371.841537] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffffff81a312a0
[290371.841572] R13: ffff880855adf600 R14: 00000000094c1b2d R15:
ffff880857526b00
[290371.841607] FS: 00007f4f65348700(0000) GS:ffff88087fc80000(0000)
knlGS:0000000000000000
[290371.841644] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[290371.841676] CR2: ffffe8ff7fc00001 CR3: 0000000857750000 CR4:
00000000001606f0
[290371.841711] Stack:
[290371.841734] ffff8801020f3c48 0000000000000029 000000000000c580
0000000000000005
[290371.841781] ffff8801020f3c68 ffffffff810a17ae 00000000ffffffff
000000000000002a
[290371.841829] ffff8801020f3d78 ffffffff811f1cea ffff8801020f3d28
ffffffff8118e621
[290371.841876] Call Trace:
[290371.841902] [<ffffffff810a17ae>] kstat_irqs_usr+0x1e/0x40
[290371.841936] [<ffffffff811f1cea>] show_stat+0x5ca/0x690
[290371.841968] [<ffffffff8118e621>] ? do_last+0x151/0x13b0
[290371.842000] [<ffffffff811a5c2d>] seq_read+0xcd/0x3b0
[290371.842031] [<ffffffff811e8cc3>] proc_reg_read+0x43/0x70
[290371.842063] [<ffffffff8117e6a3>] __vfs_read+0x23/0xd0
[290371.842096] [<ffffffff8107d091>] ? get_parent_ip+0x11/0x50
[290371.842128] [<ffffffff8107d125>] ? preempt_count_add+0x55/0xb0
[290371.842161] [<ffffffff8117ed90>] vfs_read+0xc0/0x1d0
[290371.842191] [<ffffffff8117fc21>] SyS_read+0x41/0xb0
[290371.842224] [<ffffffff8171ad9f>] system_call_fastpath+0x16/0x72
[290371.842256] Code: 83 78 48 00 74 56 4c 8b 25 7f c4 6a 00 31 db ba
ff ff ff ff eb 18 66 0f 1f 44 00 00 48 63 c8 49 8b 75 48 48 8b 0c
cd 80 54 8d 81
<03> 1c 0e 83 c2 01 be 08 00 00 00 48 63 d2 4c 89 e7 e8 9b fd 44
[290371.842567] RIP [<ffffffff810a174f>] kstat_irqs+0x4f/0x90
[290371.842602] RSP <ffff8801020f3c28>
[290371.842629] CR2: ffffe8ff7fc00001
[290371.843126] ---[ end trace 65aad936a2936575 ]---


ncurses and friends seem to be wrecked. Any tips would be appreciated.
System is still live.

Thanks,
Kyle.
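
For readers following along: the `<03>` in the Code: line marks the
first byte of the instruction at RIP, so the faulting instruction starts
with the bytes `03 1c 0e`. The sketch below decodes just that one
encoding by hand (opcode 0x03, ADD r32, r/m32, with a SIB byte and no
REX prefix or displacement). The function name and its restrictions are
assumptions made for this note; the kernel tree's scripts/decodecode is
the usual tool for the full Code: line.

```python
# Decode "03 /r" (ADD r32, r/m32) for the mod=00, rm=100 (SIB) form only.
# This is a hand-rolled illustration, not a general x86 decoder.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_add_sib(code: bytes) -> str:
    """Return AT&T syntax for a 3-byte 'add (base,index,scale),reg32'."""
    opcode, modrm, sib = code[0], code[1], code[2]
    if opcode != 0x03:
        raise ValueError("only opcode 0x03 is handled")
    # ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm != 4:
        raise ValueError("only mod=00 with a SIB byte is handled")
    # SIB byte: scale (2 bits) | index (3 bits) | base (3 bits)
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if index == 4 or base == 5:
        raise ValueError("no-index and disp32 forms are not handled")
    return f"add (%{REG64[base]},%{REG64[index]},{scale}),%{REG32[reg]}"


print(decode_add_sib(bytes.fromhex("031c0e")))
```

The result, add (%rsi,%rcx,1),%ebx, agrees with the disassembly given
in the reply that follows.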


2015-11-16 00:59:59

by Linus Torvalds

Subject: Re: BUG: unable to handle kernel paging request at ffffe8ff7fc00001

On Sun, Nov 15, 2015 at 2:28 PM, Kyle Sanderson wrote:
> [] BUG: unable to handle kernel paging request at ffffe8ff7fc00001
> [] IP: [<ffffffff810a174f>] kstat_irqs+0x4f/0x90
> [] CPU: 2 PID: 1078 Comm: usage.pl Not tainted 4.1.7-hardened-r1 #1
> [] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
> RSI: 000060f700000001
> [] Call Trace:
> [] [<>] kstat_irqs_usr+0x1e/0x40
> [] [<>] show_stat+0x5ca/0x690
> [] [<>] seq_read+0xcd/0x3b0
> [] [<>] proc_reg_read+0x43/0x70
> [] [<>] __vfs_read+0x23/0xd0
> [] [<>] vfs_read+0xc0/0x1d0
> [] [<>] SyS_read+0x41/0xb0
> [] [<>] system_call_fastpath+0x16/0x72
> [] Code: 83 78 48 00 74 56 4c 8b ...

The code ends up being

mov 0x48(%r13),%rsi
mov __per_cpu_offset(,%rcx,8),%rcx
add (%rsi,%rcx,1),%ebx <-- trapping instruction

which is just the

sum += *per_cpu_ptr(desc->kstat_irqs, cpu);

part of kstat_irqs().

Your registers being

RSI: 000060f700000001
RCX: ffff88087fc00000

and it's RSI that makes no sense - RCX looks like a real kernel
pointer. So it looks like it's the "desc->kstat_irqs" thing that is
for some reason garbage.

I don't see any sane possible reason this would happen, though.
Thomas, does this look like anything you've seen before?

Linus
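
A quick check of the register values quoted above: the trapping
instruction adds RSI (desc->kstat_irqs) and RCX (the per-CPU offset) to
form the address it reads, so that sum should be the faulting address
in CR2. The sketch below uses Python only for the 64-bit arithmetic;
effective_address() is a helper made up for this note, not a kernel
function.

```python
# Sanity-check the oops: the trapping instruction is
#   add (%rsi,%rcx,1),%ebx
# so the effective address is RSI + RCX*1, truncated to 64 bits.

MASK64 = (1 << 64) - 1


def effective_address(base: int, index: int, scale: int = 1, disp: int = 0) -> int:
    """x86-64 base + index*scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64


RSI = 0x000060F700000001  # desc->kstat_irqs, loaded from 0x48(%r13)
RCX = 0xFFFF88087FC00000  # value loaded from __per_cpu_offset
CR2 = 0xFFFFE8FF7FC00001  # faulting address reported by the oops

fault = effective_address(RSI, RCX)
print(f"RSI + RCX = {fault:016x}")
print("matches CR2" if fault == CR2 else "does NOT match CR2")
```

The sum is ffffe8ff7fc00001, the address in the BUG line, which is
consistent with the fault coming from this per_cpu_ptr() dereference.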

2015-11-16 08:54:15

by Thomas Gleixner

Subject: Re: BUG: unable to handle kernel paging request at ffffe8ff7fc00001

On Sun, 15 Nov 2015, Linus Torvalds wrote:
> On Sun, Nov 15, 2015 at 2:28 PM, Kyle Sanderson wrote:
> > [] BUG: unable to handle kernel paging request at ffffe8ff7fc00001
> > [] IP: [<ffffffff810a174f>] kstat_irqs+0x4f/0x90
> > [] CPU: 2 PID: 1078 Comm: usage.pl Not tainted 4.1.7-hardened-r1 #1
> > [] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
> > RSI: 000060f700000001
> > [] Call Trace:
> > [] [<>] kstat_irqs_usr+0x1e/0x40

> The code ends up being
>
> mov 0x48(%r13),%rsi
> mov __per_cpu_offset(,%rcx,8),%rcx
> add (%rsi,%rcx,1),%ebx <-- trapping instruction
>
> which is just the
>
> sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
>
> part of kstat_irqs().
>
> Your registers being
>
> RSI: 000060f700000001
> RCX: ffff88087fc00000
>
> and it's RSI that makes no sense - RCX looks like a real kernel
> pointer. So it looks like it's the "desc->kstat_irqs" thing that is
> for some reason garbage.
>
> I don't see any sane possible reason this would happen, though.
> Thomas, does this look like anything you've seen before?

No. What's strange is that this does explode while reading
/proc/interrupts and it did not happen when interrupt accounting took
place.

Though this looks like memory corruption and it might be an interrupt
which fired only at boot time, i.e. before the corruption happened.

No idea how to decode that. Kyle, is that reproducible?

Thanks,

tglx

2015-11-17 05:23:00

by Kyle Sanderson

Subject: Re: BUG: unable to handle kernel paging request at ffffe8ff7fc00001

Looks like massive corruption, so the oops above probably isn't
anywhere near the cause. NFS was bouncing a bit, dmesg below...

I've downgraded back to 3.14.56, will see if it continues.

[523529.636418] nfs: server ftpback-bhs1-9.ip-198-100-151.net OK
[532541.586437] ntfs: driver 2.1.32 [Flags: R/O MODULE].
[532541.657558] REISERFS warning (device sda4): sh-2006
read_super_block: bread failed (dev sda4, block 8, size 1024)
[532541.657621] REISERFS warning (device sda4): sh-2006
read_super_block: bread failed (dev sda4, block 64, size 1024)
[532541.657682] REISERFS warning (device sda4): sh-2021
reiserfs_fill_super: can not find reiserfs on sda4
[532541.658379] EXT3-fs (sda4): error: unable to read superblock
[532541.659020] EXT2-fs (sda4): error: unable to read superblock
[532541.659664] EXT4-fs (sda4): unable to read superblock
[532541.660423] squashfs: SQUASHFS error: Can't find a SQUASHFS
superblock on sda4
[532541.661194] FAT-fs (sda4): bogus number of reserved sectors
[532541.661226] FAT-fs (sda4): Can't find a valid FAT filesystem
[532541.661901] isofs_fill_super: bread failed, dev=sda4,
iso_blknum=16, block=32
[532541.662588] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=256, location=256
[532541.662819] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532541.662879] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=512, location=512
[532541.662935] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532541.662994] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532541.663052] UDF-fs: warning (device sda4): udf_load_vrs: No anchor found
[532541.663086] UDF-fs: Rescanning with blocksize 2048
[532541.663124] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=256, location=256
[532541.663181] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=512, location=512
[532541.663238] UDF-fs: warning (device sda4): udf_load_vrs: No anchor found
[532541.663271] UDF-fs: warning (device sda4): udf_fill_super: No
partition found (1)
[532541.664182] XFS (sda4): Invalid superblock magic number
[532541.665014] (mount,17550,6):ocfs2_get_sector:1822 ERROR: status = -12
[532541.665048] (mount,17550,6):ocfs2_sb_probe:821 ERROR: status = -12
[532541.665081] (mount,17550,6):ocfs2_fill_super:1026 ERROR:
superblock probe failed!
[532541.665117] (mount,17550,6):ocfs2_fill_super:1217 ERROR: status = -12
[532541.665775] attempt to access beyond end of device
[532541.665806] sda4: rw=48, want=136, limit=2
[532541.665833] gfs2: error -5 reading superblock
[532541.666457] gfs2: gfs2 mount does not exist
[532541.667821] FAT-fs (sda4): bogus number of reserved sectors
[532541.667853] FAT-fs (sda4): Can't find a valid FAT filesystem
[532541.668621] ntfs: (device sda4): read_ntfs_boot_sector(): Primary
boot sector is invalid.
[532541.672594] ntfs: (device sda4): read_ntfs_boot_sector(): Mount
option errors=recover not used. Aborting without trying to recover.
[532541.672657] ntfs: (device sda4): ntfs_fill_super(): Not an NTFS volume.
[532603.588489] REISERFS warning (device sda4): sh-2006
read_super_block: bread failed (dev sda4, block 8, size 1024)
[532603.588552] REISERFS warning (device sda4): sh-2006
read_super_block: bread failed (dev sda4, block 64, size 1024)
[532603.588613] REISERFS warning (device sda4): sh-2021
reiserfs_fill_super: can not find reiserfs on sda4
[532603.589374] EXT3-fs (sda4): error: unable to read superblock
[532603.590054] EXT2-fs (sda4): error: unable to read superblock
[532603.590738] EXT4-fs (sda4): unable to read superblock
[532603.591524] squashfs: SQUASHFS error: Can't find a SQUASHFS
superblock on sda4
[532603.592590] FAT-fs (sda4): bogus number of reserved sectors
[532603.592622] FAT-fs (sda4): Can't find a valid FAT filesystem
[532603.593273] isofs_fill_super: bread failed, dev=sda4,
iso_blknum=16, block=32
[532603.593945] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=256, location=256
[532603.594247] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532603.594312] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=512, location=512
[532603.594371] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532603.594432] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532603.594492] UDF-fs: warning (device sda4): udf_load_vrs: No anchor found
[532603.594527] UDF-fs: Rescanning with blocksize 2048
[532603.594582] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=256, location=256
[532603.594642] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=512, location=512
[532603.594700] UDF-fs: warning (device sda4): udf_load_vrs: No anchor found
[532603.594746] UDF-fs: warning (device sda4): udf_fill_super: No
partition found (1)
[532603.595665] XFS (sda4): Invalid superblock magic number
[532603.596525] (mount,18660,7):ocfs2_get_sector:1822 ERROR: status = -12
[532603.596559] (mount,18660,7):ocfs2_sb_probe:821 ERROR: status = -12
[532603.596592] (mount,18660,7):ocfs2_fill_super:1026 ERROR:
superblock probe failed!
[532603.596627] (mount,18660,7):ocfs2_fill_super:1217 ERROR: status = -12
[532603.597271] attempt to access beyond end of device
[532603.597302] sda4: rw=48, want=136, limit=2
[532603.597329] gfs2: error -5 reading superblock
[532603.597944] gfs2: gfs2 mount does not exist
[532603.599274] FAT-fs (sda4): bogus number of reserved sectors
[532603.599306] FAT-fs (sda4): Can't find a valid FAT filesystem
[532603.600068] ntfs: (device sda4): read_ntfs_boot_sector(): Primary
boot sector is invalid.
[532603.600106] ntfs: (device sda4): read_ntfs_boot_sector(): Mount
option errors=recover not used. Aborting without trying to recover.
[532603.600169] ntfs: (device sda4): ntfs_fill_super(): Not an NTFS volume.
[532611.781265] REISERFS warning (device sda4): sh-2006
read_super_block: bread failed (dev sda4, block 8, size 1024)
[532611.781329] REISERFS warning (device sda4): sh-2006
read_super_block: bread failed (dev sda4, block 64, size 1024)
[532611.781389] REISERFS warning (device sda4): sh-2021
reiserfs_fill_super: can not find reiserfs on sda4
[532611.782171] EXT3-fs (sda4): error: unable to read superblock
[532611.782989] EXT2-fs (sda4): error: unable to read superblock
[532611.783768] EXT4-fs (sda4): unable to read superblock
[532611.784548] squashfs: SQUASHFS error: Can't find a SQUASHFS
superblock on sda4
[532611.785311] FAT-fs (sda4): bogus number of reserved sectors
[532611.785343] FAT-fs (sda4): Can't find a valid FAT filesystem
[532611.786003] isofs_fill_super: bread failed, dev=sda4,
iso_blknum=16, block=32
[532611.786658] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=256, location=256
[532611.786888] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532611.786957] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=512, location=512
[532611.787014] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532611.787072] UDF-fs: error (device sda4): udf_read_tagged: tag
version 0x0000 != 0x0002 || 0x0003, block 0
[532611.787131] UDF-fs: warning (device sda4): udf_load_vrs: No anchor found
[532611.787164] UDF-fs: Rescanning with blocksize 2048
[532611.787202] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=256, location=256
[532611.787259] UDF-fs: error (device sda4): udf_read_tagged: read
failed, block=512, location=512
[532611.787316] UDF-fs: warning (device sda4): udf_load_vrs: No anchor found
[532611.787349] UDF-fs: warning (device sda4): udf_fill_super: No
partition found (1)
[532611.788237] XFS (sda4): Invalid superblock magic number
[532611.789060] (mount,19654,5):ocfs2_get_sector:1822 ERROR: status = -12
[532611.789094] (mount,19654,5):ocfs2_sb_probe:821 ERROR: status = -12
[532611.789127] (mount,19654,5):ocfs2_fill_super:1026 ERROR:
superblock probe failed!
[532611.789162] (mount,19654,5):ocfs2_fill_super:1217 ERROR: status = -12
[532611.789800] attempt to access beyond end of device
[532611.789831] sda4: rw=48, want=136, limit=2
[532611.789858] gfs2: error -5 reading superblock
[532611.790487] gfs2: gfs2 mount does not exist
[532611.791860] FAT-fs (sda4): bogus number of reserved sectors
[532611.791892] FAT-fs (sda4): Can't find a valid FAT filesystem
[532611.792679] ntfs: (device sda4): read_ntfs_boot_sector(): Primary
boot sector is invalid.
[532611.792716] ntfs: (device sda4): read_ntfs_boot_sector(): Mount
option errors=recover not used. Aborting without trying to recover.
[532611.792780] ntfs: (device sda4): ntfs_fill_super(): Not an NTFS volume.

Thanks,
Kyle.


On Mon, Nov 16, 2015 at 5:53 AM, Kyle Sanderson wrote:
> I'll reboot the box tonight.
>
> I've been unable to run newer kernels (> 4.0) (gentoo/hardened-sources) on a
> variety of different systems (amd64: atom, ivy bridge, haswell); they usually
> panic after a few hours. The primary issue is that they're remote in a DC, so I
> can't hook up a serial console.
>
> Thanks guys,
> Kyle.