Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752112AbbEDHao (ORCPT ); Mon, 4 May 2015 03:30:44 -0400 Received: from cantor2.suse.de ([195.135.220.15]:32800 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751365AbbEDHag (ORCPT ); Mon, 4 May 2015 03:30:36 -0400 Date: Mon, 4 May 2015 17:30:25 +1000 From: NeilBrown To: Al Viro Cc: Christoph Hellwig , Linus Torvalds , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Message-ID: <20150504173025.50433fcb@notabene.brown> In-Reply-To: <20150504051129.GA3246@ZenIV.linux.org.uk> References: <553668C1.8030707@nod.at> <20150421154504.GT889@ZenIV.linux.org.uk> <20150421212007.GU889@ZenIV.linux.org.uk> <20150422180702.GA15209@ZenIV.linux.org.uk> <20150422201238.GW889@ZenIV.linux.org.uk> <20150422210553.GX889@ZenIV.linux.org.uk> <20150423174544.48ae3122@notabene.brown> <20150423180754.GA889@ZenIV.linux.org.uk> <20150424163534.6eb109eb@notabene.brown> <20150424134203.GC889@ZenIV.linux.org.uk> <20150504051129.GA3246@ZenIV.linux.org.uk> X-Mailer: Claws Mail 3.10.1-162-g4d0ed6 (GTK+ 2.24.25; x86_64-suse-linux-gnu) MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/ZZisg0FOgmXNUj_31Y.vjLZ"; protocol="application/pgp-signature" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5222 Lines: 114 --Sig_/ZZisg0FOgmXNUj_31Y.vjLZ Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable On Mon, 4 May 2015 06:11:29 +0100 Al Viro wrote: > On Fri, Apr 24, 2015 at 02:42:03PM +0100, Al Viro wrote: >=20 > > That avoids this spin_lock() on each absolute symlink at the price of e= xtra > > 32 bits in struct nameidata. It looks like doing on-demand reallocation > > of nd->stack is the right way to go anyway, so the pressure on nameidat= a size > > is going to be weaker and that might be the right way to go... >=20 > OK, on-demand reallocation is done. What I have right now is > * flat MAXSYMLINKS 40, no matter what kind of nesting there might > be. > * purely iterative link_path_walk(). > * no damn nameidata on stack for generic_readlink() > * stack footprint of the entire thing independent from the nesting > depth, and about on par with "no symlinks at all" case in mainline. > * some massage towards RCU follow_link done (in the end of queue), > but quite a bit more work remains. >=20 > What I've got so far is in vfs.git#link_path_walk; I'm not too happy about > posting a 70-chunk mailbomb, but it really will need review and testing. > It survives xfstests and LTP with no regressions, but it will need > serious profiling, etc., along with RTFS. I tried to keep it in reasonab= ly > small pieces, but there's a lot of them ;-/ >=20 > FWIW, I've a bit more reorganization plotted out, but it's not far from > where we need to be for RCU follow_link. Some notes: > * I don't believe we want to pass flags to ->follow_link() - it's > much simpler to give the damn thing NULL for dentry in RCU case. In *all* > cases where we might have a change to get the symlink body without blocki= ng > we can do that by inode alone. We obviously want to pass dentry and inode > separately (and in case of fast symlinks we don't hit the filesystem at > all), but that's it - flags isn't needed. > * terminate_walk() should do bulk put_link(). So should the > failure cases of complete_walk(). _Success_ of complete_walk() should > be careful about legitimizing links - it *can* be called with one link > on stack, and be followed by access to link body. Yes, really - do_last() > in O_CREAT case. > * do_last(), lookup_last() and mountpoint_last() ought to > have put_link() done when called on non-empty stack (thus turning the loo= ps > into something like > while ((err =3D lookup_last(nd)) > 0) { > err =3D trailing_symlink(nd); > if (err) > break; > } > _After_ the point where they don't need to look at the last component of > name, obviously. > * I think we should leave terminate_walk() to callers in failure > cases of walk_component() and handle_dots(), as well as get_link(). Makes > life simpler in callers, actually. I'll play with that a bit more. > * it might make sense to add the second flag to walk_component(), > in addition to LOOKUP_FOLLOW, meaning "do put_link() once you are done lo= oking > at the name". In principle, it promises simpler logics with unlazy_walk(= ), > but that's one area I'm still not entirely sure about. Will need to > experiment a bit... > * nd->seq clobbering will need to be dealt with, as discussed upthread. > * I _really_ hate your "let's use the LSB of struct page * to tell > if we need to kunmap()" approach. It's too damn ugly to live. _And_ it's > trivial to avoid - all we need is to have (non-lazy) page_follow_link_lig= ht() > and page_symlink() to remove __GFP_HIGHMEM from inode->i_mapping before > ever asking to allocate pages there. That'll suffice, and it makes sense > regardless of RCU work - that kmap/kunmap with potential for minutes in > between (while waiting for stuck NFS server in the middle of symlink trav= ersal) > is simply wrong. Thanks! I'll have another look and see about adding what is needed for RCU symlink support. NeilBrown --Sig_/ZZisg0FOgmXNUj_31Y.vjLZ Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIVAwUBVUcgETnsnt1WYoG5AQL5Qw//eWiLQ/SHXNEuvZrGJlT6ykKkh3ZKlCBm D6NmIt+QwCuXY9u8SjwaVhS4Yhi+FrmnrDNJc45MhcvBqhC1+4MJ2Y698k8UwLcf kuD0Y1IcJGhSquZ+wI/tBRFQ1DG9yCNj+8xgEHkMryMpHOCl2qx/hZi0j/LsELjI E2qFp1/2Y+/qXrxDurKVTp/TUM929dGpPvVix7n3vIl3NG9S8XW/52lOj9fQl/NA fv69Yw93sqP2Qy/VO8FFL/80lgRR4/4BNJqYleQGweIrZrqj7x5eku2DooU2UZ7z aSZsHiWokT7I/TiVw8czwVz/ndNzzOkjrSn7bed3P2PmrK0zRSyxxm1J7GrEr17L R4S4j6GI/vopYPp1XII1sjBLFk5Zt+2oLpI7VT8tDFQwuHmOvLWdkOfS1xmBeVvQ qRVhvKXFH/PIHwNT9xqsy/ripis6AJLy1nLTbVxWVJbjAOuMYgwSBv7+eY+HbaM4 c3n4LZR3LESDKUN3KKuA5PJe9Yjvy+I1WGEwk4DXJpOi79zvOxgwzpLtXMVB1dV1 z8QfDtIbCkyLIO177iPFgWj3n56sWi3AIcKWhKYPVSwGSJIWuG0fo2kJa/6prQ9E NdgM5qcy1GLHXD/PgYT7gN1ViBfFfSQ9i6Z/ETPrfOMTFojzZdLAQr1hVjdSi0Oa tkBaNAyf2Uk= =Xyyb -----END PGP SIGNATURE----- --Sig_/ZZisg0FOgmXNUj_31Y.vjLZ-- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/