From: "Steve French (smfltc)" Subject: Re: [RFC PATCH 0/9] sunrpc: teach the SUNRPC layer how to speak SMB Date: Mon, 28 Sep 2009 16:40:35 -0500 Message-ID: <4AC12D53.6060202@us.ibm.com> References: <4AC0CB04.9070502@us.ibm.com> <20090928115427.4103826e@tlielax.poochiereds.net> <4AC10317.5090403@us.ibm.com> <20090928153737.65c1ba3e@tlielax.poochiereds.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Cc: linux-cifs-client@lists.samba.org, linux-nfs@vger.kernel.org To: Jeff Layton Return-path: Received: from e3.ny.us.ibm.com ([32.97.182.143]:38660 "EHLO e3.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753109AbZI1Vlw (ORCPT ); Mon, 28 Sep 2009 17:41:52 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e3.ny.us.ibm.com (8.14.3/8.13.1) with ESMTP id n8SLYTHs007468 for ; Mon, 28 Sep 2009 17:34:29 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id n8SLftYF236576 for ; Mon, 28 Sep 2009 17:41:55 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n8SLftKl027182 for ; Mon, 28 Sep 2009 17:41:55 -0400 In-Reply-To: <20090928153737.65c1ba3e-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: Jeff Layton wrote: > On Mon, 28 Sep 2009 13:40:23 -0500 > "Steve French (smfltc)" wrote: > > >> Jeff Layton wrote: >> >>> On Mon, 28 Sep 2009 09:41:08 -0500 >>> "Steve French (smfltc)" wrote: >>> >>> >>> >>> >>> >> I don't think it changes the number of tasks sleeping. For sync >> operations it doesn't >> change how many tasks that are sleeping. For async operations, you no >> longer have a >> task sleeping in cifs_writepages or cifs_readpages, but do have the >> ability to dispatch >> a decode routine when the SMB Write or Read response is returned (may or >> may not have a pool to do this). Seems like about the same number of >> task (possibly >> less). Moving to all asynchronous operations under open, unlink, close >> doesn't reduce >> the number of sleeping tasks (it may increase it or stay the same). >> > > I think it does. Take the example of asynchronous writepages requests. > You want to send a bunch of write calls in quick succession and then > process the replies as they come in. If you rely on a SendReceive-type > interface, you have no choice but to spawn a separate slow_work task > for each call on the wire. No guarantee they'll run in parallel though > (depends on the rules for spawning new kslowd threads). > > Now, you could hack up a routine that just sends the calls, sprinkle in > a callback routine that's run by cifsd or something when the writes > come back (though you'll need to be wary of deadlock, of course). > > I was assuming the latter ... ie that cifsd processes the completion (or spawns a slow work task to process the response). For the case of write (actually writepages) there is not processing to be done (checking the rc, updating the status of the corresponding pages in the page cache). For read there are various choices (launching a slow work thread in readpages - but you only need one for what is potentially a very large number of reads from that readpages call) or processing in cifs or launching a slow work thread to process (my current preference since this is just cache updates/readahead anyway). >> >> That (splitting decode into a distinct helper) makes sense (at least for >> async capable ops, >> in particular write, read and directory change notification and byte >> range locks). The >> "smb_decode_write_response" case is a particularly simple and useful one >> to do and would >> be an easy one to do as a test. I think the prototype patches for >> async write that someone did >> for cifs a few years ago did that. >> > > Even for ops that are fundamentally synchronous, it makes sense to > split the encoding and decoding into separate routines. Maybe it's just > my personal bias, but I think it encourages a cleaner, more flexible > design. > I don't have a strong preference either way. There are a large number of SMBs which are basically "responseless" (we just look at the rc, but they don't include parameters) so they would not have interesting decode routines. Even for SMB2 although we probably have to look at the credits, the generic SMB response processing can probably deal with those. For those SMBs which have response processing, if it makes for smaller more readable functions to break out the decode routines, that seems fine. > SMB transport header is 32 bits and contains a length with a zeroed out > upper byte. The RPC over TCP transport header is 32 bits and contains a > 31 bit length + a "last fragment" flag. > > SMB has MID's, RPC has XID's > > RPC has the RPC auth info, SMB has the session UID and TID > > > There are some similar concepts but at different levels of the network stack. Having net/sunrpc assemble SMB headers (uid, mids) would be needed if you really want to leverage the similar concepts, and it seems like a strange idea to move much of fs/cifs into net/sunrpc helpers >>> The code I've proposed more or less has abstractions along those lines. >>> There's: >>> >>> transport class -- (net/sunrpc/xprtsmb.c) >>> header encoding/decoding -- (net/sunrpc/smb.c) >>> >>> ...the other parts will be implemented by the filesystem (similar to >>> how fs/nfs/nfs?xdr.c work). >>> >>> >>> >>>> I like the idea of the way SunRPC keeps task information, and it may >>>> make it easier >>>> to carry credentials around (although I think using Dave Howell's key >>>> management code >>>> might be ok instead to access Winbind). I am not sure how easy it >>>> would be to tie >>>> SunRPC credential mapping to Winbind but that could probably be done. I >>>> like the >>>> async scheduling capability of SunRPC although I suspect that it is a >>>> factor in >>>> a number of the (nfs client) performance problems we have seen so may >>>> need more work. >>>> I don't like adding (in effect) an extra transport and "encoding layer" >>>> though to >>>> protocols (cifs and smb2). NFS since it is built on SunRPC on the >>>> wire, required >>>> such a layer, and it makes sense for NFS to layer the code, like their >>>> protocol, >>>> over SunRPC. CIFS and SMB2 don't require (or even allow) XDR translation, >>>> variable encodings, and SunRPC encapsulation so the idea of abstracting the >>>> encoding of something that has a single defined encoding seems wrong. >>>> >>>> >>> I'm not sure I understand this last comment. CIFS/SMB2 and NFS are just >>> not that different in this regard. Either way you have to marshal up >>> the buffer correctly before you send it, and decode it properly. >>> >>> >>> >> As an example of one of the more complicated cases for cifs (setting a >> time field). >> The code looks something like this and is very straightforward to see where >> the "linux field" is being put into the packet - and to catch errors in >> size or >> endianness. Most cases are simpler for cifs. >> >> struct file_basic_info buf; /* the wire format of the file basic info >> SMB info level */ >> >> buf->LastAccessTime = 0; /* we can't set access time to Windows */ >> buf->LastWriteTime = cpu_to_le64(cifs_UnixTimeToNT(attrs->ia_atime)); >> (followed by a few similar assignment statements for the remaining fields) >> >> then send the SMB header followed by the buffer. There is no marshalling >> or xdr conversion needed. One thing I like about this approach is that >> "sparse" (make modules C=1) immediately catches any mismatch >> between the wire format (size or endianness) and the vfs data >> structure element being put in the frame. >> >> Looking at nfs4xdr.c as an example for comparison, it is much harder to >> see the actual line >> where a bigendian (or littlenedian) 64 bit quantity is being put into >> the request frame. >> >> > > Woah...I think we just have a difference in terminology here. When I > say "encoding" the packet, I just mean that we're stuffing the data > into the buffer in the format that the wire expects. > > Now, whether you do that with packed structures (like CIFS does and > which I tend to prefer), or via "hand-marshaling" (like the WRITE/READ > macros in the nfs?xdr.c files) is simply an implementation detail. > If we are stuffing the data into the buffer in the format the wire expects ie a series of assignment statements more or less as we do today (but broken into distinct encode and decode helpers) pSMB->SomeField = cpu_to_le64(inode->some_attribute_field); and simply using SunRPC as a convenient wrapper around the socket API for async dispatch that is probably fine (not sure if it is slower or faster ... and RPC has various performance limitations like the RPC_SLOT_COUNT) although not sure it is worth the trouble.