Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1759977AbYFOSKo (ORCPT ); Sun, 15 Jun 2008 14:10:44 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1758960AbYFOSKc (ORCPT ); Sun, 15 Jun 2008 14:10:32 -0400
Received: from wa-out-1112.google.com ([209.85.146.180]:32871
        "EHLO wa-out-1112.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1758942AbYFOSKa (ORCPT );
        Sun, 15 Jun 2008 14:10:30 -0400
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
        h=message-id:date:from:to:subject:cc:in-reply-to:mime-version
        :content-type:content-transfer-encoding:content-disposition
        :references;
        b=T9U43khmC5Ym9YHW2tOrEyN3G8BdZtqFZ32Du1niDY7+VIWWH3SlxU7KHwgSZbLL8T
        rGP4g7qgSDmaQp4NeXs3711Q22I18o9fnLzVcvAoPmC41GVzicgZ4X0lqfFZI0aUomHJ
        1qZoWBNyUhkGeOrSFSf3KtCDSnjIn2K+Qg5D0=
Message-ID: <6278d2220806151110x68ee91fej8cf8e6b591ce1319@mail.gmail.com>
Date: Sun, 15 Jun 2008 19:10:27 +0100
From: "Daniel J Blueman"
To: chucklever@gmail.com
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...
Cc: linux-nfs@vger.kernel.org, nfsv4@linux-nfs.org, "Linux Kernel",
        "J. Bruce Fields", "Trond Myklebust", "Jeff Layton"
In-Reply-To: <76bd70e30806041643j4d632a6exf64b29c34173d40f@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <6278d2220806041633n3bfe3dd2ke9602697697228b@mail.gmail.com>
        <76bd70e30806041643j4d632a6exf64b29c34173d40f@mail.gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2827
Lines: 68

On Thu, Jun 5, 2008 at 12:43 AM, Chuck Lever wrote:
> Hi Daniel-
>
> On Wed, Jun 4, 2008 at 7:33 PM, Daniel J Blueman
> wrote:
>> Having experienced 'mount.nfs4: internal error' when mounting nfsv4 in
>> the past, I have a minimal test-case I sometimes run:
>>
>> $ while :; do mount -t nfs4 filer:/store /store; umount /store; done
>>
>> After ~100 iterations, I saw the 'mount.nfs4: internal error',
>> followed by symptoms of memory corruption [1], a locking issue with
>> the reporting [2] and another (related?) memory-corruption issue
>> (off-by-1?) [3]. A little analysis shows memory being overwritten by
>> (likely) a poison value, which gets complicated if it's not
>> use-after-free...
>>
>> Anyone dare confirm this issue? NFSv4 server is x86-64 Ubuntu 8.04
>> 2.6.24-18, client U8.04 2.6.26-rc4; batteries included [4].
>
> We have some other reports of late model kernels with memory
> corruption issues during NFS mount. The problem is that by the time
> these canaries start singing, the evidence of what did the corrupting
> is long gone.
>
>> I'm happy to decode addresses, test patches etc.
>
> If these crashes are more or less reliably reproduced, it would be
> helpful if you could do a 'git bisect' on the client to figure out at
> what point in the kernel revision history this problem was introduced.
>
> Have you seen the problem on client kernels earlier than 2.6.25?

Firstly, I had omitted that I'd booted the kernel with debug_objects=1,
which provides the canary here.

The primary failure I see is 'mount.nfs4: internal error', and it always
occurs after 358 umount/mount cycles (plus 1 initial mount), which gives
us a clue: 'netstat' shows all these connections in the TIME_WAIT state,
so the bug appears to lie in the error path taken when a socket cannot
be allocated. I found that once the connection lifetime had expired, I
could mount again, which corroborates this theory.

In this case, the mount() syscall resulted in the mount.nfs4 process
being SEGV'd when the kernel was booted with 'debug_objects=1'; without
this option, we see:

# strace /sbin/mount.nfs4 x1:/ /store
...
mount("x1:/", "/store", "nfs4", 0, "addr=192.168.0.250,clientaddr=19"...) = -1 EIO (Input/output error)

So, it's impossible to tell when the corruption was introduced, as it
has only become detectable recently. The socket-allocation error path is
worth a look-over, if someone can check; the problem reproduces 100% of
the time with the 'debug_objects=1' param (available since 2.6.26-rc1)
and 359 mounts in quick succession.

Thanks!
  Daniel
-- 
Daniel J Blueman
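
For anyone wanting to watch the TIME_WAIT build-up described above while
running the mount/umount loop, here is a minimal sketch. It is not part
of the original report: the helper name count_time_wait is made up for
illustration, and it reads the kernel's /proc/net/tcp table directly
instead of parsing 'netstat' output. It assumes the usual layout of that
table, where the fourth column ("st") holds the socket state in hex and
06 means TIME_WAIT, and it only covers IPv4 sockets.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): count TCP sockets
# in TIME_WAIT from a /proc/net/tcp-style table.
#
# Assumed layout: the first line is a header; on every other line the
# fourth whitespace-separated field is the socket state in hex, and
# 06 is TIME_WAIT. Only IPv4 is covered (/proc/net/tcp6 is separate).
count_time_wait() {
    # $1: path to the table; defaults to the live /proc/net/tcp
    awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }' \
        "${1:-/proc/net/tcp}"
}
```

Calling count_time_wait after each iteration of the loop from the
original report should show the count climbing by roughly one per cycle
until the mount fails, if the theory above holds.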