Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mx1.redhat.com ([209.132.183.28]:45883 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1756329Ab3KLWhn (ORCPT ); Tue, 12 Nov 2013 17:37:43 -0500
Date: Tue, 12 Nov 2013 17:37:38 -0500
From: Jeff Layton
To: NeilBrown
Cc: Steve Dickson, trond.myklebust@netapp.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 0/2] sunrpc: more reliable detection of running gssd
Message-ID: <20131112173738.6e9f94cc@tlielax.poochiereds.net>
In-Reply-To: <20131113091537.059162de@notabene.brown>
References: <1384261225-28559-1-git-send-email-jlayton@redhat.com>
        <52824784.4080901@RedHat.com>
        <20131113091537.059162de@notabene.brown>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
        boundary="Sig_/Vc4MzzE.tvVQ_GCH9UOdKk1"; protocol="application/pgp-signature"
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

--Sig_/Vc4MzzE.tvVQ_GCH9UOdKk1
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

On Wed, 13 Nov 2013 09:15:37 +1100
NeilBrown wrote:

> On Tue, 12 Nov 2013 10:21:40 -0500 Steve Dickson wrote:
>
> > On 12/11/13 08:00, Jeff Layton wrote:
> > > We've gotten a lot of complaints recently about the 15s delay when
> > > doing a sec=sys mount without gssd running.
> > >
> > > A large part of the problem is that the kernel isn't able to reliably
> > > detect when rpc.gssd is running. What we currently have is a
> > > gssd_running flag that is initially set to 1. When an upcall times out,
> > > that gets set to 0, and subsequent upcalls get a much shorter timeout
> > > (1/4s instead of 15s). It's reset back to '1' when a pipe is reopened.
> > >
> > > The approach of using a flag like this is pretty inadequate. First, it
> > > doesn't eliminate the long delay on the initial upcall attempt. Also,
> > > if gssd spontaneously dies, then the flag will still be set to 1 until
> > > the next upcall attempt times out.
> > > Finally, it currently requires that
> > > the pipe be reopened in order to reset the flag back to true.
> > >
> > > This patchset replaces that flag with a more reliable mechanism for
> > > detecting when gssd is running. When rpc_pipefs is mounted, it creates a
> > > new "dummy" pipe that gssd will naturally find and hold open. We'll
> > > never send an upcall down this pipe, and writing to it always fails.
> > > But, since we can detect when something is holding it open, we can use
> > > that to determine whether gssd is running.
> > >
> > > The current patch just uses this mechanism to replace the gssd_running
> > > flag with this new mechanism. This shortens the long delay when mounting
> > > without gssd running, but does not silence these warnings:
> > >
> > > RPC: AUTH_GSS upcall timed out.
> > > Please check user daemon is running.
> > >
> > > I'm willing to add a patch to do that, but I'm a little unclear on the
> > > best way to do so. Those messages are generated by the auth_gss code. We
> > > probably do want to print them if someone mounted with sec=krb5, but
> > > suppress them when mounting with sec=sys.
> > >
> > > Do we need to somehow pass down that intent to auth_gss? Another idea
> > > would be to call gssd_running() from the nfs mount code and use that to
> > > determine whether to try and use krb5 at all...
> > >
> > > Discuss!
> > I've just verified that a mount, with these patches, takes about
> > 1.2 seconds when rpc.gssd is not running.... With rpc.gssd it
> > take about .2 seconds.
> >
> > Tested-by: Steve Dickson
> > >
> Still sounds like about one second too long.
>
> In that patch I see:
>
>     timeout = 15 * HZ;
> -   if (!sn->gssd_running)
> +   if (!gssd_running(sn))
>         timeout = HZ >> 2;
>

Yeah, it's not clear to me where the extra delay there comes from
either. I was sort of hoping Steve would track that down...
;)

> Given that "!gssd_running(sn)" is now certain knowledge rather than a hint,
> can't we just skip the upcall and any timeout?
> i.e.
>     timeout = 15 * HZ;
> -   if (!sn->gssd_running)
> +   if (!gssd_running(sn))
> -       timeout = HZ >> 2;
> +       return -EACCES;
>

Good point...I was trying to keep the semantic changes to a minimum,
but that does make sense.

One minor nit...with the above you'll never hit warn_gss(), so it
probably makes sense to put that in there too.

I've got a v2 of the patchset that I'm working on that fixes a couple
of bugs, makes the dir name change that Trond wants, and also has a
patch that makes nfs4_init_client skip trying krb5i if gssd isn't up.
I'll probably post that tomorrow...

-- 
Jeff Layton

--Sig_/Vc4MzzE.tvVQ_GCH9UOdKk1
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJSgq2yAAoJEAAOaEEZVoIVRNAQAJg67o321swADGI2qT3s3Xxi
I2vrV9qs65ikIeVrhzRwG64Fiiqo2QoYHfTnpMFxHqFVvhZspD8fHabwwaMgxxjL
dqwxKIYn9wNwCRhmjmMiGzzA1aw0D0SvG5xpSZSehuhhZLljamQBmygJuzkb64Dv
gZ9o/oHMtDPB/vDq3HoRAqzQZa6EKaUudt2K4UXp+MwJ47IjkpsN1xm7+RzHANcL
E9fUA35Bs4M7lp1FlMJkRDaZB/2sfynvZSH36eNTs7w6Os+LA7qs/tzR8iHLmiWF
2lhLvhuq6mu5nm1cQcscJYe2+jDqvEA5xH1h2iorThhNLeSu4UspAb+gfB1Waz9l
pgyoMdVUJnhT3Q87zv9hdQlsmHQteCSXKPnh66j4xNTZUyH5CPQPsX9IXVL0HB1I
eIKg6M0FmNaNmv2zB/S2HvvmiYbQcPMNSw6QOPmqr6JJ9E8KTSr8OlWDUsTeSDbc
j/jwrKWVEsajGZdIwUTgF8JmbdimqrZ/kmwtmrnDU5t6gZoXSna3dWoeE3YkfzjL
mQ65Smkp3sJC0mZPsPZooTG4Razf+QNbUFIprntz+3Hz+GmrN6id4AaEFQnQyVah
25SPWMz3onqcj58O3lCHcXYQAWxk9mOCYG6s25SiFO4kbRCutRMCHG+bhW4/y0Ut
ru8rBU3ITZ4Asus4wCx6
=OT+1
-----END PGP SIGNATURE-----

--Sig_/Vc4MzzE.tvVQ_GCH9UOdKk1--
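
The two variants discussed in this thread can be compared in a small
userspace sketch. This is only a model under stated assumptions, not the
kernel code: struct sunrpc_net_sketch, its dummy_pipe_openers field and
all three function names are hypothetical stand-ins, and the hz parameter
plays the role of the kernel's HZ (ticks per second). The first function
mirrors the hunk in the posted patch (shorter timeout when gssd is not
running); the second mirrors the suggestion to skip the upcall and
return -EACCES.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-net sunrpc state. Only the one piece
 * of information this sketch needs is modeled: how many processes hold
 * the "dummy" rpc_pipefs pipe open. */
struct sunrpc_net_sketch {
	int dummy_pipe_openers;
};

/* Models the idea behind gssd_running(sn): gssd is considered running
 * when something is holding the dummy pipe open. */
static bool gssd_running_sketch(const struct sunrpc_net_sketch *sn)
{
	return sn->dummy_pipe_openers > 0;
}

/* Variant from the posted patch: always upcall, but use a 1/4s timeout
 * instead of 15s when gssd is not running. Returns the timeout in ticks. */
static long upcall_timeout_v1(const struct sunrpc_net_sketch *sn, long hz)
{
	long timeout = 15 * hz;

	assert(hz > 0);
	if (!gssd_running_sketch(sn))
		timeout = hz >> 2;
	return timeout;
}

/* Variant suggested in the reply: since the check is now reliable, skip
 * the upcall entirely and fail with -EACCES when gssd is not running.
 * The warning path mentioned in the thread (warn_gss()) is not modeled.
 * Returns the timeout in ticks, or a negative errno. */
static long upcall_timeout_v2(const struct sunrpc_net_sketch *sn, long hz)
{
	assert(hz > 0);
	if (!gssd_running_sketch(sn))
		return -EACCES;
	return 15 * hz;
}
```

With hz = 1000 and no opener on the dummy pipe, the first variant yields
a 250-tick (1/4s) timeout while the second fails immediately, which is
the behavioural difference being discussed above.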