Return-Path: linux-nfs-owner@vger.kernel.org
Received: from userp1040.oracle.com ([156.151.31.81]:39780 "EHLO
        userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751554Ab3KLQPv convert rfc822-to-8bit (ORCPT );
        Tue, 12 Nov 2013 11:15:51 -0500
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: [PATCH 0/2] sunrpc: more reliable detection of running gssd
From: Chuck Lever
In-Reply-To: <20131112110831.72234c64@tlielax.poochiereds.net>
Date: Tue, 12 Nov 2013 11:15:41 -0500
Cc: trond.myklebust@netapp.com, linux-nfs@vger.kernel.org, steved@redhat.com
Message-Id:
References: <1384261225-28559-1-git-send-email-jlayton@redhat.com>
        <20131112110831.72234c64@tlielax.poochiereds.net>
To: Jeff Layton
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Nov 12, 2013, at 11:08 AM, Jeff Layton wrote:

> On Tue, 12 Nov 2013 11:02:42 -0500
> Chuck Lever wrote:
>
>>
>> On Nov 12, 2013, at 8:00 AM, Jeff Layton wrote:
>>
>>> We've gotten a lot of complaints recently about the 15s delay when
>>> doing a sec=sys mount without gssd running.
>>>
>>> A large part of the problem is that the kernel isn't able to reliably
>>> detect when rpc.gssd is running. What we currently have is a
>>> gssd_running flag that is initially set to 1. When an upcall times out,
>>> that gets set to 0, and subsequent upcalls get a much shorter timeout
>>> (1/4s instead of 15s). It's reset back to '1' when a pipe is reopened.
>>>
>>> The approach of using a flag like this is pretty inadequate. First, it
>>> doesn't eliminate the long delay on the initial upcall attempt. Also,
>>> if gssd spontaneously dies, then the flag will still be set to 1 until
>>> the next upcall attempt times out. Finally, it currently requires that
>>> the pipe be reopened in order to reset the flag back to true.
>>>
>>> This patchset replaces that flag with a more reliable mechanism for
>>> detecting when gssd is running. When rpc_pipefs is mounted, it creates a
>>> new "dummy" pipe that gssd will naturally find and hold open. We'll
>>> never send an upcall down this pipe, and writing to it always fails.
>>> But, since we can detect when something is holding it open, we can use
>>> that to determine whether gssd is running.
>>>
>>> The current patch just uses this mechanism to replace the gssd_running
>>> flag with this new mechanism. This shortens the long delay when mounting
>>> without gssd running, but does not silence these warnings:
>>>
>>>     RPC: AUTH_GSS upcall timed out.
>>>     Please check user daemon is running.
>>>
>>> I'm willing to add a patch to do that, but I'm a little unclear on the
>>> best way to do so. Those messages are generated by the auth_gss code. We
>>> probably do want to print them if someone mounted with sec=krb5, but
>>> suppress them when mounting with sec=sys.
>>>
>>> Do we need to somehow pass down that intent to auth_gss? Another idea
>>> would be to call gssd_running() from the nfs mount code and use that to
>>> determine whether to try and use krb5 at all...
>>>
>>> Discuss!
>>
>> I'd like to pursue the module loading solution as well.
>>
>
> Sorry, I missed that part of the discussion.
>
> What's the module loading solution?

Load auth_rpcgss.ko only when rpc.gssd has been started. See the
"[PATCH] Adding the nfs4_secure_mounts bool" thread...

If auth_rpcgss.ko is not loaded, the kernel won't ever try to do an upcall.

Then, systemd can be used to restart rpc.gssd if it crashes, maybe?

Just a thought.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
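
Editorial illustration (not part of the original message): the two detection schemes described in the quoted cover letter can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not code from the patchset or from the kernel. Every struct and function name here is hypothetical; only the 15s and 1/4s timeouts and the initial flag value of 1 come from the thread. The sketch also assumes that the new check simply takes over the flag's role in choosing the upcall timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeouts quoted in the thread, expressed in milliseconds. */
#define UPCALL_TIMEOUT_LONG_MS  15000 /* 15s: gssd believed to be running */
#define UPCALL_TIMEOUT_SHORT_MS 250   /* 1/4s: gssd believed to be absent */

/* Old scheme (hypothetical model): a single flag, initially 1. */
struct flag_model {
    int gssd_running;
};

static void flag_init(struct flag_model *m)
{
    m->gssd_running = 1;
}

/* The flag only changes after an upcall has already timed out once. */
static void flag_upcall_timed_out(struct flag_model *m)
{
    m->gssd_running = 0;
}

/* The flag is only restored when a pipe is reopened. */
static void flag_pipe_reopened(struct flag_model *m)
{
    m->gssd_running = 1;
}

static int flag_upcall_timeout_ms(const struct flag_model *m)
{
    return m->gssd_running ? UPCALL_TIMEOUT_LONG_MS
                           : UPCALL_TIMEOUT_SHORT_MS;
}

/* New scheme (hypothetical model): count who holds the dummy pipe open. */
struct dummy_pipe_model {
    unsigned int openers;
};

static void dummy_pipe_init(struct dummy_pipe_model *p)
{
    p->openers = 0;
}

static void dummy_pipe_open(struct dummy_pipe_model *p)
{
    p->openers++;
}

static void dummy_pipe_release(struct dummy_pipe_model *p)
{
    assert(p->openers > 0); /* a release must match an earlier open */
    p->openers--;
}

/* gssd counts as running exactly while something holds the pipe open. */
static bool dummy_gssd_running(const struct dummy_pipe_model *p)
{
    return p->openers > 0;
}

static int dummy_upcall_timeout_ms(const struct dummy_pipe_model *p)
{
    return dummy_gssd_running(p) ? UPCALL_TIMEOUT_LONG_MS
                                 : UPCALL_TIMEOUT_SHORT_MS;
}
```

In this model the flag version always pays the long timeout on the first upcall, because it starts at 1. The opener-count version picks the short timeout straight away when nothing has opened the dummy pipe, and it follows the daemon's exit as soon as the pipe is released.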