Hi,
I had me a little look at bdi usage in networked filesystems.
NFS, CIFS, (smbfs), AFS, CODA and NCP
And of those, NFS is the only one that I could find that creates
backing_dev_info structures. The rest seems to fall back to
default_backing_dev_info.
With my recent per bdi dirty limit patches the bdi has become more
important than it has been in the past. While falling back to the
default_backing_dev_info isn't wrong per se, it isn't right either.
Could I implore the various maintainers to look into this issue for
their respective filesystems. I'll try to come up with some patches to
address this, but feel free to beat me to it.
peterz
On Sat, 2007-10-27 at 11:22 -0400, Jan Harkes wrote:
> On Sat, Oct 27, 2007 at 11:34:26AM +0200, Peter Zijlstra wrote:
> > I had me a little look at bdi usage in networked filesystems.
> >
> > NFS, CIFS, (smbfs), AFS, CODA and NCP
> >
> > And of those, NFS is the only one that I could find that creates
> > backing_dev_info structures. The rest seems to fall back to
> > default_backing_dev_info.
>
> While a file is open in Coda, we associate the open file handle with a
> local cache file. All read and write operations are redirected to this
> local file and we even redirect inode->i_mapping. Actual reads and
> writes are completely handled by the underlying file system. We send the
> new file contents back to the servers only after all local references
> have been released (last-close semantics).
>
> As a result, there is no need for backing_dev_info structures in Coda;
> if any congestion control is needed, it will be handled by the underlying
> file system where our locally cached copies are stored.
Ok, that works. Thanks for this explanation!
On Sat, Oct 27, 2007 at 11:34:26AM +0200, Peter Zijlstra wrote:
> I had me a little look at bdi usage in networked filesystems.
>
> NFS, CIFS, (smbfs), AFS, CODA and NCP
>
> And of those, NFS is the only one that I could find that creates
> backing_dev_info structures. The rest seems to fall back to
> default_backing_dev_info.
While a file is open in Coda, we associate the open file handle with a
local cache file. All read and write operations are redirected to this
local file and we even redirect inode->i_mapping. Actual reads and
writes are completely handled by the underlying file system. We send the
new file contents back to the servers only after all local references
have been released (last-close semantics).
As a result, there is no need for backing_dev_info structures in Coda;
if any congestion control is needed, it will be handled by the underlying
file system where our locally cached copies are stored.
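For anyone not familiar with that trick, a simplified sketch of the idea is
below; it is not the literal fs/coda code, and example_open_container_file()
is just a hypothetical stand-in for however the local cache copy gets opened:

/*
 * Simplified illustration of the redirection described above (not the
 * actual fs/coda implementation): on open, point the Coda inode's page
 * cache mapping at the local container file, so that all read, write
 * and writeback activity is accounted to the underlying filesystem and
 * its bdi rather than to Coda itself.
 */
#include <linux/fs.h>
#include <linux/err.h>

/* hypothetical helper: opens the locally cached copy for this inode */
struct file *example_open_container_file(struct inode *inode);

static int example_coda_open(struct inode *coda_inode, struct file *coda_file)
{
	struct file *host_file;

	host_file = example_open_container_file(coda_inode);
	if (IS_ERR(host_file))
		return PTR_ERR(host_file);

	/* redirect the page cache to the host file's mapping */
	coda_inode->i_mapping = host_file->f_mapping;

	coda_file->private_data = host_file;
	return 0;
}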
Jan
On 10/27/07, Peter Zijlstra <[email protected]> wrote:
> Hi,
>
> I had me a little look at bdi usage in networked filesystems.
>
> NFS, CIFS, (smbfs), AFS, CODA and NCP
>
> And of those, NFS is the only one that I could find that creates
> backing_dev_info structures. The rest seems to fall back to
> default_backing_dev_info.
>
> With my recent per bdi dirty limit patches the bdi has become more
> important than it has been in the past. While falling back to the
> default_backing_dev_info isn't wrong per se, it isn't right either.
>
> Could I implore the various maintainers to look into this issue for
> their respective filesystems. I'll try to come up with some patches to
> address this, but feel free to beat me to it.
I would like to understand more about your patches to see what bdi
values make sense for CIFS and how to report possible congestion back
to the page manager. I had been thinking about setting bdi->ra_pages
so that we do more sensible readahead and writebehind - better
matching what is possible over the network and what the server
prefers. SMB/CIFS Servers typically allow a maximum of 50 requests
in parallel at one time from one client (although this is adjustable
for some). The CIFS client prefers to do writes of 14 pages (an iovec of
56K) at a time (although many servers can efficiently handle multiple
of these 56K writes in parallel). With minor changes CIFS could
handle even larger writes (to just under 64K for Windows and just
under 128K for Samba - the current CIFS Unix Extensions allow servers
to negotiate much larger writes, but lacking a "receivepage"
equivalent, Samba does not currently support writes larger than 128K).
Ideally, to improve large file copy utilization, I would like to see
from 3 to 10 writes of 56K (or larger in the future) in parallel. The
read path is harder since we only do 16K reads to Windows and Samba,
but we need to increase the number of these that are done in parallel
on the same inode. There is a large Google Summer of Code patch for
this, which needs more review.
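To put rough numbers on that in bdi terms (purely illustrative, not actual
cifs code, and assuming 4K pages):

/*
 * Illustrative only, not cifs code: the wire sizes above expressed as
 * a bdi-style readahead tunable on a machine with 4K pages.
 */
#include <linux/backing-dev.h>

#define EXAMPLE_CIFS_WSIZE_PAGES 14	/* 14 pages * 4K = 56K per SMB write */

static void example_cifs_tune_readahead(struct backing_dev_info *bdi)
{
	/* e.g. read ahead a few wire-sized chunks: 4 * 56K = 224K */
	bdi->ra_pages = 4 * EXAMPLE_CIFS_WSIZE_PAGES;
}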
--
Thanks,
Steve
On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote:
> On 10/27/07, Peter Zijlstra <[email protected]> wrote:
> > Hi,
> >
> > I had me a little look at bdi usage in networked filesystems.
> >
> > NFS, CIFS, (smbfs), AFS, CODA and NCP
> >
> > And of those, NFS is the only one that I could find that creates
> > backing_dev_info structures. The rest seems to fall back to
> > default_backing_dev_info.
> >
> > With my recent per bdi dirty limit patches the bdi has become more
> > important than it has been in the past. While falling back to the
> > default_backing_dev_info isn't wrong per se, it isn't right either.
> >
> > Could I implore the various maintainers to look into this issue for
> > their respective filesystems. I'll try to come up with some patches to
> > address this, but feel free to beat me to it.
>
> I would like to understand more about your patches to see what bdi
> values make sense for CIFS and how to report possible congestion back
> to the page manager.
So, what my recent patches do is carve up the total writeback cache
size, or dirty page limit as we call it, proportionally to a BDI's
writeout speed. So a fast device gets more than a slow device, but will
not starve it.
However, for this to work, each device, or remote backing store in the
case of networked filesystems, needs to have a BDI.
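Roughly, and glossing over the floating-average bookkeeping the actual
patches do, the split amounts to the following (illustration only, the
function name is made up):

/*
 * Illustration of the idea only, not the code from the patches: a BDI's
 * share of the global dirty limit follows the fraction of recent
 * writeout it was responsible for, so fast devices get a bigger share
 * without starving slow ones.
 */
static unsigned long example_bdi_dirty_limit(unsigned long global_limit,
					     unsigned long bdi_writeout,
					     unsigned long total_writeout)
{
	if (!total_writeout)
		return global_limit;

	/* overflow handling elided for clarity */
	return global_limit * bdi_writeout / total_writeout;
}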
> I had been thinking about setting bdi->ra_pages
> so that we do more sensible readahead and writebehind - better
> matching what is possible over the network and what the server
> prefers.
Well, you'd first have to create backing_dev_info instances before
setting that value :-)
> SMB/CIFS Servers typically allow a maximum of 50 requests
> in parallel at one time from one client (although this is adjustable
> for some).
That seems like a perfect point to set congestion.
So in short, stick a struct backing_dev_info into whatever represents a
client, initialize it using bdi_init(), destroy using bdi_destroy().
Mark it congested once you have 50 (or more) outstanding requests, clear
congestion when you drop below 50, and you should be set.
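Something like the sketch below, completely untested; struct smb_client, the
field names and the hook points are made up for illustration, but it shows
the shape:

/*
 * Untested sketch of the recipe above.  struct smb_client and the
 * request-counting hooks are hypothetical stand-ins for whatever the
 * real client structure and send/receive paths are.
 */
#include <linux/fs.h>
#include <linux/backing-dev.h>
#include <asm/atomic.h>

#define EXAMPLE_MAX_OUTSTANDING	50	/* typical SMB/CIFS server window */

struct smb_client {
	struct backing_dev_info	bdi;
	atomic_t		outstanding_reqs;
};

static int smb_client_bdi_init(struct smb_client *clnt)
{
	int err;

	err = bdi_init(&clnt->bdi);
	if (err)
		return err;

	/* this is also the natural place to tune clnt->bdi.ra_pages */
	atomic_set(&clnt->outstanding_reqs, 0);
	return 0;
}

static void smb_client_bdi_destroy(struct smb_client *clnt)
{
	bdi_destroy(&clnt->bdi);
}

/* call when a request is put on the wire */
static void smb_client_request_sent(struct smb_client *clnt)
{
	if (atomic_inc_return(&clnt->outstanding_reqs) >= EXAMPLE_MAX_OUTSTANDING)
		set_bdi_congested(&clnt->bdi, WRITE);
}

/* call when the matching response has been received */
static void smb_client_request_done(struct smb_client *clnt)
{
	if (atomic_dec_return(&clnt->outstanding_reqs) < EXAMPLE_MAX_OUTSTANDING)
		clear_bdi_congested(&clnt->bdi, WRITE);
}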
On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote:
> On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote:
> > On 10/27/07, Peter Zijlstra <[email protected]> wrote:
> > > Hi,
> > >
> > > I had me a little look at bdi usage in networked filesystems.
> > >
> > > NFS, CIFS, (smbfs), AFS, CODA and NCP
> > >
> > > And of those, NFS is the only one that I could find that creates
> > > backing_dev_info structures. The rest seems to fall back to
> > > default_backing_dev_info.
> > >
> > > With my recent per bdi dirty limit patches the bdi has become more
> > > important than it has been in the past. While falling back to the
> > > default_backing_dev_info isn't wrong per se, it isn't right either.
> > >
> > > Could I implore the various maintainers to look into this issue for
> > > their respective filesystems. I'll try to come up with some patches to
> > > address this, but feel free to beat me to it.
> >
> > I would like to understand more about your patches to see what bdi
> > values make sense for CIFS and how to report possible congestion back
> > to the page manager.
>
> So, what my recent patches do is carve up the total writeback cache
> size, or dirty page limit as we call it, proportionally to a BDI's
> writeout speed. So a fast device gets more than a slow device, but will
> not starve it.
>
> However, for this to work, each device, or remote backing store in the
> case of networked filesystems, needs to have a BDI.
>
> > I had been thinking about setting bdi->ra_pages
> > so that we do more sensible readahead and writebehind - better
> > matching what is possible over the network and what the server
> > prefers.
>
> Well, you'd first have to create backing_dev_info instances before
> setting that value :-)
>
> > SMB/CIFS Servers typically allow a maximum of 50 requests
> > in parallel at one time from one client (although this is adjustable
> > for some).
>
> That seems like a perfect point to set congestion.
>
> So in short, stick a struct backing_dev_info into whatever represents a
> client, initialize it using bdi_init(), destroy using bdi_destroy().
Oh, and the most important point: make your fresh I_NEW inodes point to
this bdi struct.
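Continuing the hypothetical smb_client sketch from my previous mail, that
amounts to something like:

/*
 * Hypothetical, continuing the earlier smb_client sketch: wherever a
 * fresh (I_NEW) inode is set up, point its mapping at the client's bdi
 * instead of the default one.
 */
static void smb_client_setup_inode(struct smb_client *clnt, struct inode *inode)
{
	inode->i_mapping->backing_dev_info = &clnt->bdi;
}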
> Mark it congested once you have 50 (or more) outstanding requests, clear
> congestion when you drop below 50, and you should be set.
>
Peter Zijlstra wrote:
> On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote:
>> So in short, stick a struct backing_dev_info into whatever represents a
>> client, initialize it using bdi_init(), destroy using bdi_destroy().
>
> Oh, and the most important point: make your fresh I_NEW inodes point to
> this bdi struct.
>
>> Mark it congested once you have 50 (or more) outstanding requests, clear
>> congestion when you drop below 50, and you should be set.
Thanks. Unfortunately I do not think that NCPFS will switch to
backing_dev_info - it uses the pagecache only for symlinks and directories,
and even if it did use the pagecache, most servers refuse concurrent
requests even when TCP is used as a transport, so there can be only one
request in flight...
Petr
P.S.: And if anyone wants to step in as ncpfs maintainer, feel free. I
have not seen a NetWare server for over a year now...