Message-ID: <53614CA2.4000707@linux.vnet.ibm.com>
Date: Thu, 01 May 2014 00:48:58 +0530
From: "Srivatsa S. Bhat"
To: Linus Torvalds
CC: Davidlohr Bueso, Hugh Dickins, Linux MM,
    "linux-kernel@vger.kernel.org", Rik van Riel, Michel Lespinasse,
    "akpm@linux-foundation.org", Oleg Nesterov, Dave Jones
Subject: Re: [BUG] kernel BUG at mm/vmacache.c:85!
References: <535EA976.1080402@linux.vnet.ibm.com>
    <1398723290.25549.20.camel@buesod1.americas.hpqcorp.net>
    <535F77E8.2040000@linux.vnet.ibm.com>
    <53614BFE.9090804@linux.vnet.ibm.com>
In-Reply-To: <53614BFE.9090804@linux.vnet.ibm.com>

On 05/01/2014 12:46 AM, Srivatsa S. Bhat wrote:
> On 04/29/2014 03:29 PM, Srivatsa S. Bhat wrote:
>> On 04/29/2014 03:55 AM, Linus Torvalds wrote:
>>> On Mon, Apr 28, 2014 at 3:14 PM, Davidlohr Bueso wrote:
>>>>
>>>> I think that returning some stale/bogus vma is causing those segfaults
>>>> in udev. It shouldn't occur in a normal scenario. What puzzles me is
>>>> that it's not always reproducible. This makes me wonder what else is
>>>> going on...
>>>
>>> I've replaced the BUG_ON() with a WARN_ON_ONCE(), and made it be
>>> unconditional (so you don't have to trigger the range check).
>>>
>>> That might make it show up earlier and easier (and hopefully closer to
>>> the place that causes it). Maybe that makes it easier for Srivatsa to
>>> reproduce this. It doesn't make *my* machine do anything different,
>>> though.
>>>
>>> Srivatsa? It's in current -git.
>>>
>>
>> I tried this, but still nothing so far. I rebooted 10-20 times, and also
>> tried multiple runs of multi-threaded ebizzy and kernel compilations,
>> but none of this hit the warning.
>>
>
> I tried to recall the *exact* steps that I had carried out when I first
> hit the bug. I realized that I had actually used kexec to boot the new
> kernel. I had originally booted into a 3.7.7 kernel that happens to be
> on that machine, and then kexec()'ed 3.15-rc3 on it. And that had caused
> the kernel crash. Fresh boots of 3.15-rc3, as well as kexec from 3.15+
> to itself, seems to be pretty robust and has never resulted in any bad
> behavior (this is why I couldn't reproduce the issue earlier, since I was
> doing fresh boots of 3.15-rc).
>
> So I tried the same recipe again (boot into 3.7.7 and kexec into 3.15-rc3+)
> and I got totally random crashes so far, once in sys_kill and two times in
> exit_mmap. So I guess the bug is in 3.7.x and probably 3.15-rc is fine after
> all...
>
>
> Here is the crash around sys_kill:
>
>

And here are the exit_mmap related ones:

1.
mpt2sas0: port enable: SUCCESS
scsi 0:1:0:0: Direct-Access LSI Logical Volume 3000 PQ: 0 ANSI: 6
scsi 0:1:0:0: RAID0: handle(0x00b5), wwid(0x02c5d3368a5aef06), pd_count(1), type(SSP)
scsi 0:1:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
scsi 0:0:0:0: Direct-Access IBM-ESXS ST9500620SS BD2C PQ: 0 ANSI: 6
scsi 0:0:0:0: SSP: handle(0x0005), sas_addr(0x5000c500559ffab5), phy(0), device_name(0x5000c500559ffab4)
scsi 0:0:0:0: SSP: enclosure_logical_id(0x5005076056434d90), slot(0)
scsi 0:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
sd 0:1:0:0: [sda] 974608384 512-byte logical blocks: (498 GB/464 GiB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Mode Sense: 03 00 00 08
sd 0:1:0:0: [sda] No Caching mode page found
sd 0:1:0:0: [sda] Assuming drive cache: write through
 sda: sda1 sda2 sda3 sda4
sd 0:1:0:0: [sda] Attached SCSI disk
EXT4-fs (sda2): INFO: recovery required on readonly filesystem
EXT4-fs (sda2): write access will be enabled during recovery
random: nonblocking pool is initialized
EXT4-fs (sda2): recovery complete
EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
dracut: Mounted root filesystem /dev/sda2
dracut: Switching root
Welcome to Red Hat Enterprise Linux Server
Starting udev: udev: starting version 147
WARNING! power/level is deprecated; use power/control instead
cat[11602]: segfault at 0 ip (null) sp 00007fff85a583f0 error 14traps: kdump[5307] general protection ip:3c8d22a8db sp:7fff03d1b418 error:0 in libc-2.12.so[3c8d200000+18a000] in cat[400000+b000]
udevd-work[1304]: '/etc/init.d/kdump restart' unexpected exit with status 0x000b
plymouth[12452]: segfault at 0 ip (null) sp 00007fff49a9e570 error 14 in plymouth[400000+7000]
------------[ cut here ]------------
WARNING: CPU: 13 PID: 12452 at mm/mmap.c:2741 exit_mmap+0x157/0x170()
Modules linked in: acpi_cpufreq(+) ext4(E) jbd2(E) mbcache(E) sd_mod(E) crc_t10dif(E) crct10dif_common(E) mpt2sas(E) scsi_transport_sas(E) raid_class(E)
CPU: 13 PID: 12452 Comm: plymouth Tainted: G E 3.15.0-rc3-mmdbg #1
Hardware name: IBM -[8737R2A]-/00AE502, BIOS -[B2E120QUS-1.20]- 11/14/2012
BUG: Bad page map in process kdump pte:1e00000005f98701 pmd:1031489067
addr:0000003c9e01f000 vm_flags:00100073 anon_vma:ffff88103a654550 mapping:ffff88103e91fb28 index:1f
vma->vm_ops->fault: filemap_fault+0x0/0x450
vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60 [ext4]
 0000000000000ab5 ffff88202fe45bb8 ffffffff815a9b38 0000000000000ab5BUG: Bad page map in process kdump pte:1e00000005f98701 pmd:103420b067
addr:0000003c9e01f000 vm_flags:00100073 anon_vma:ffff88102d55ce58 mapping:ffff88103e91fb28 index:1f
vma->vm_ops->fault: filemap_fault+0x0/0x450
vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60 [ext4]
 0000000000000000 ffff88202fe45bf8 ffffffff81050f2c 0000000000000000
 ffff881032ca0140 0000000000000039 ffff881032ca0140 ffff881032ca01d8
Call Trace:
 [] dump_stack+0x51/0x71
 [] warn_slowpath_common+0x8c/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] exit_mmap+0x157/0x170
 [] ? exit_aio+0xb0/0x100
 [] mmput+0x73/0x110
 [] exit_mm+0x164/0x1d0
 [] ? _raw_spin_unlock_irq+0x30/0x40
 [] do_exit+0x15b/0x490
 [] do_group_exit+0x5e/0xd0
 [] get_signal_to_deliver+0x22e/0x470
 [] ? finish_task_switch+0x48/0x120
 [] do_signal+0x4b/0x140
 [] ? trace_hardirqs_on_caller+0xfd/0x1c0
 [] ? printk+0x4d/0x4f
 [] ? finish_task_switch+0x85/0x120
 [] ? finish_task_switch+0x48/0x120
 [] ? retint_signal+0x11/0x84
 [] do_notify_resume+0x65/0x90
 [] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [] retint_signal+0x46/0x84
---[ end trace b8be17f8a0dd8372 ]---
CPU: 2 PID: 9649 Comm: kdump Tainted: G E 3.15.0-rc3-mmdbg #1
Hardware name: IBM -[8737R2A]-/00AE502, BIOS -[B2E120QUS-1.20]- 11/14/2012
 0000003c9e01f000 ffff881031705b40 ffffffff815a9b38 0000000000000001
 ffff8810335f28b8 ffff881031705b90 ffffffff811660b3 0000000000000000
 ffff88102fe67668 0000000000000000 0000003c9e01f000 0000000000000002
Call Trace:
 [] dump_stack+0x51/0x71
 [] print_bad_pte+0x193/0x260
 [] vm_normal_page+0x5e/0x70
 [] copy_pte_range+0x217/0x5d0
 [] copy_page_range+0x27a/0x4b0
 [] dup_mmap+0x24f/0x3f0
 [] dup_mm+0xcc/0x170
 [] copy_process+0x122c/0x1260
 [] ? __lock_release+0x84/0x180
 [] do_fork+0x61/0x220
 [] ? might_fault+0xaf/0xc0
 [] ? might_fault+0x66/0xc0
[ 8118] 0 8118 27161 807 17 0 -1000 kdump
/sbin/start_udev[14046] 0 14046 1011 65 7 0 -1000 logger
[ 1003] 0 1003 2854 556 10 0 -1000 udevd
: line 204: 116[ 6046] 0 6046 3084 796 10 0 -1000 udevd
6 Killed [14313] 0 14313 3612 1298 11 0 -1000 udevd
[ 1618] 0 1618 3018 716 10 0 -1000 udevd
/sbin/[13741] 0 13741 1014 65 8 0 -1000 logger
CPU: 12 PID: 8596 Comm: udevd Tainted: G B E 3.15.0-rc3-mmdbg #1
[ 1453] 0 1453 2985 685 10 0 -1000 udevd
Wait timeout.
W[14270] 0 14270 4338 159 20 0 -1000 multipath
ill continue in swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
the background.swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
 [] ? search_binary_handler+0x1c0/0x1c0
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000
swap_dup: Bad swap file entry 002a0000