Subject: Re: NFS + coredump OOPS
From: Trond Myklebust
To: NetArt - Grzegorz Nosek
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <20070919105353.GA7392@tech.serwery.pl>
References: <20070919105353.GA7392@tech.serwery.pl>
Date: Wed, 19 Sep 2007 14:07:13 -0400
Message-Id: <1190225233.6734.8.camel@heimdal.trondhjem.org>

On Wed, 2007-09-19 at 12:53 +0200, NetArt - Grzegorz Nosek wrote:
> Hello all,
>
> [please keep CC'd]
>
> This oops report comes from 2.6.18.5, so it may have been fixed in a
> newer release, but I'm reporting nevertheless. OTOH, the (possibly)
> relevant code looks unchanged.
>
> The background is _probably_ attempting a core dump of a process,
> whose backing binary file is accessible via NFS.
>
> My understanding of the issue follows.
>
> After creating a list of pages to read, __do_page_cache_readahead calls
> (indirectly) mapping->a_ops->readpages, which must empty the list of
> pages passed to it (as asserted by the BUG_ON).
> However, nfs_readpages may return early in a few cases:
>
>         if (NFS_STALE(inode))
>                 goto out;
>
>         if (filp == NULL) {
>                 desc.ctx = nfs_find_open_context(inode, NULL, FMODE_READ);
>                 if (desc.ctx == NULL)
>                         return -EBADF;
>         } else
>                 desc.ctx = get_nfs_open_context((struct nfs_open_context *)
>                                 filp->private_data);
>
> I'd guess that the inode had gone stale (the process ran for quite
> some time), so nfs_readpages returned without even touching the list.
> Boom.
>
> Taking a SWAG, I'd guess a missing
> file->f_dentry->d_op->d_revalidate() in
> fs/exec.c::do_core_dump(), but d_revalidate needs a nameidata
> structure, which do_core_dump() doesn't seem to have at hand.
>
> Best regards,
> Grzegorz Nosek
>
> [16249868.626066] ------------[ cut here ]------------
> [16249868.684345] kernel BUG at mm/readahead.c:314!
> [16249868.739565] invalid opcode: 0000 [#1]
> [16249868.786480] SMP
> [16249868.811703] Modules linked in: xt_tcpudp iptable_nat ip_nat smbfs cls_u32 sch_sfq sch_htb xt_mark ipt_account xt_helper iptable_mangle xt_MARK xt_multiport ipt_LOG xt_limit iptable_filter ip_conntrack_ftp ip_conntrack xfs dm_mod ipmi_devintf ipmi_si ipmi_watchdog ipmi_msghandler softdog ip_tables x_tables nfsd exportfs tg3
> [16249869.159639] CPU: 0
> [16249869.159640] EIP: 0060:[] Not tainted VLI
> [16249869.159641] EFLAGS: 00010212 (2.6.18.5-na1.4 #1)
> [16249869.318036] EIP is at __do_page_cache_readahead+0xb4/0x212
> [16249869.386750] eax: ffffff8c ebx: c01c5399 ecx: 00000000 edx: d2ac3c20
> [16249869.471039] esi: 00000003 edi: 00000002 ebp: d2ac3c34 esp: d2ac3bc0
> [16249869.555330] ds: 007b es: 007b ss: 0068
> [16249869.607439] Process clamscan (pid: 13406, ti=d2ac2000 task=e8ebf190 task.ti=d2ac2000)
> [16249869.702108] Stack: 00000002 c9c33c60 c9c33c54 00000126 f28e8900 c9c33c50 c3792904 000002db
> [16249869.805908] 00001000 00000000 d2ac3c68 c013f85b 00002000 00000000 d2ac3d1c 00000000
> [16249869.909706] 00001000 d0a67000 00000200 00000001 d2ac3c90 d2ac3c88 d2ac3cd4
> d2ac2000
> [16249870.013503] Call Trace:
> [16249870.047964] [] show_stack_log_lvl+0xa8/0xe5
> [16249870.112527] [] show_registers+0x19f/0x22f
> [16249870.175014] [] die+0x132/0x2de
> [16249870.226081] [] do_trap+0x76/0xa1
> [16249870.279225] [] do_invalid_op+0x97/0xa1
> [16249870.338596] [] error_code+0x39/0x40
> [16249870.394854] [] do_page_cache_readahead+0x3d/0x51
> [16249870.464711] [] filemap_nopage+0x15e/0x3a4
> [16249870.527197] [] __handle_mm_fault+0x198/0xb69
> [16249870.592799] [] get_user_pages+0xbf/0x31c
> [16249870.654248] [] elf_core_dump+0x9aa/0xcf0
> [16249870.715695] [] do_coredump+0x5c2/0x5f5
> [16249870.775070] [] get_signal_to_deliver+0x340/0x403
> [16249870.844925] [] do_notify_resume+0x19f/0x6c5
> [16249870.909489] [] work_notifysig+0x13/0x19
> [16249870.969897] Code: 4d a0 f0 ff 41 10 fb 85 ff 74 18 8b 41 38 8b 58 14 85 db 74 20 89 3c 24 8d 4d ec 8b 55 a0 8b 45 9c ff d3 8d 55 ec 3b 55 ec 74 9a <0f> 0b 3a 01 8b 95 37 c0 eb 90 c7 45 ac 00 00 00 00 c745 b0 00
> [16249871.205111] EIP: [] __do_page_cache_readahead+0xb4/0x212 SS:ESP 0068:d2ac3bc0

That bug should have been fixed in 2.6.19-rc5. See

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=029e332ea717810172e965ec50f942755ad0c58a

Cheers
  Trond
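
For readers following the analysis in the quoted report, here is a minimal userspace sketch of the contract it describes: a caller builds a list of pages, hands it to a readpages-style callback, and then checks that the list is empty. Everything named toy_* is hypothetical and exists only for illustration; none of it is kernel code, and the approach shown (the caller releases whatever the callback left behind before checking) is just one way to tolerate an early return, not necessarily what the commit referenced in the reply does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a page queued for readahead (hypothetical type). */
struct toy_page {
    struct toy_page *next;
};

/* Push up to n freshly allocated toy pages onto the list; returns the new head. */
static struct toy_page *toy_alloc_pages(struct toy_page *head, int n)
{
    for (int i = 0; i < n; i++) {
        struct toy_page *p = malloc(sizeof(*p));
        if (p == NULL)
            break;
        p->next = head;
        head = p;
    }
    return head;
}

/* Free every page still on the list; returns how many were released. */
static int toy_put_pages(struct toy_page **list)
{
    int freed = 0;

    while (*list != NULL) {
        struct toy_page *p = *list;
        *list = p->next;
        free(p);
        freed++;
    }
    return freed;
}

/*
 * Toy readpages callback. It mirrors the early-return shape from the
 * report: when the (hypothetical) stale flag is set, it bails out
 * without touching the list.
 */
static int toy_readpages(int stale, struct toy_page **list)
{
    if (stale)
        return -1;          /* early exit: pages stay on the list */
    toy_put_pages(list);    /* pretend to read each page, then consume it */
    return 0;
}

/*
 * Toy caller. Rather than asserting right after the callback returns,
 * it first releases any leftover pages, so the emptiness check holds
 * even when the callback returned early.
 */
static int toy_read_pages(int stale, struct toy_page **list, int *leftover)
{
    int ret = toy_readpages(stale, list);

    *leftover = toy_put_pages(list);
    assert(*list == NULL);  /* plays the role of the BUG_ON in the report */
    return ret;
}
```

Calling toy_read_pages() with stale set reports the callback's error and the number of pages it left behind, while the final assertion still passes. Moving the assertion directly after the toy_readpages() call would reproduce the failure mode described in the report.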