Subject: Re: Regression, bisected: sqlite locking failure on nfs
From: Chuck Lever
Date: Mon, 1 Nov 2010 15:43:18 -0400
To: Trond Myklebust
Cc: Nick Bowler, LKML Kernel, "J. Bruce Fields", Linux NFS Mailing List

On Nov 1, 2010, at 3:22 PM, Trond Myklebust wrote:

> On Mon, 2010-11-01 at 14:30 -0400, Chuck Lever wrote:
>> On Nov 1, 2010, at 2:19 PM, Nick Bowler wrote:
>>
>>> On 2010-11-01 14:07 -0400, Chuck Lever wrote:
>>>> On Nov 1, 2010, at 1:58 PM, Nick Bowler wrote:
>>>>> After installing 2.6.37-rc1, attempting to use sqlite in any capacity on
>>>>> NFS gives a locking error:
>>>>>
>>>>> % echo 'select * from blah;' | sqlite3 blah.sqlite
>>>>> Error: near line 1: database is locked
>>>>>
>>>>> % echo 'create table blargh(INT);' | sqlite3 blargh.sqlite
>>>>> Error: near line 1: database is locked
>>>>>
>>>>> The result is that a lot of high-profile applications which make use of
>>>>> sqlite fail mysteriously. Bisection reveals the following, and
>>>>> reverting the implicated commit solves the issue:
>>>>
>>>> Nick, thanks for the report. Is 2.6.37-rc1 running on your clients or
>>>> on your server?
>>>
>>> Sorry for not being clear: the client is running 2.6.37-rc1. The
>>> server is running RHEL 5.5.
>>>
>>>> Does anything interesting appear in the kernel log when your test case
>>>> fails?
>>>
>>> There are no unusual messages on the client... but I just logged into
>>> the server and I see lots of messages of the following form:
>>>
>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>>
>>> (192.168.8.199 is the address of the failing client). I can only assume
>>> that these are a result of my recent issues, since I don't have access
>>> to the system log (with timestamps) on that machine.
>>
>> That's the problem this patch is supposed to prevent. I'll investigate further.
>>
>
> I suspect nlmclnt_lookup_host() is to blame. It appears to be the _only_
> thing in the kernel that actually sets this 'srcaddr' field, and it sets
> it to
>
>	const struct sockaddr source = {
>		.sa_family = AF_UNSPEC,
>	};
>
> You triggered the bug by removing the line
>
>	transport->srcaddr.ss_family = family;
>
> from xs_create_sock().

Thanks. Actually that line was added by Bruce very recently because Pavel's
patches changed xs_bind() so it can't tolerate an AF_UNSPEC address. My patch
attempts to replace the workaround with something more permanent... but looks
like I didn't find all the places that needed to be fixed.
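
For readers following along, the idea behind the workaround can be sketched in
user space. This is only an illustration under stated assumptions:
srcaddr_set_family() is a hypothetical helper, not a kernel function, and it
does not reproduce what xs_create_sock() or xs_bind() actually do. It shows
the general step of giving an AF_UNSPEC source address a concrete family
before the address is handed to a bind step:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical user-space helper (an assumption for illustration, not
 * kernel code): make sure a source address carries the same address
 * family as the socket it will be bound to.
 *
 * - If the family already matches, the address is left untouched.
 * - If the address is AF_UNSPEC, it is rewritten as the all-zero
 *   (wildcard) address of the requested family.
 * - Any other mismatch, or an unsupported family, is reported as failure.
 */
static bool srcaddr_set_family(struct sockaddr_storage *src, sa_family_t family)
{
	if (src->ss_family == family)
		return true;
	if (src->ss_family != AF_UNSPEC)
		return false;
	if (family != AF_INET && family != AF_INET6)
		return false;

	/* All-zero bytes are the wildcard address for both IPv4 and IPv6. */
	memset(src, 0, sizeof(*src));
	src->ss_family = family;
	return true;
}
```

A caller would run this once, right after deciding which family the socket
uses, so that later code never sees an AF_UNSPEC source address.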
-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com