2022-03-23 15:25:09

by Bernd Schubert

Subject: Re: [RFC PATCH] getvalues(2) prototype

On 3/23/22 08:16, Greg KH wrote:
> On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
>> Add a new userspace API that allows getting multiple short values in a
>> single syscall.
>>
>> This would be useful for the following reasons:
>>
>> - Calling open/read/close for many small files is inefficient. E.g. on my
>> desktop invoking lsof(1) results in ~60k open + read + close calls under
>> /proc and 90% of those are 128 bytes or less.
>
> As I found out in testing readfile():
> https://lore.kernel.org/r/[email protected]
>
> microbenchmarks do show a tiny improvement in doing something like this,
> but that's not a real-world application.
>
> Do you have anything real that can use this that shows a speedup?

Add in network file systems. Demonstrating that this is useful locally
and with microbenchmarks, yeah, it helps a bit to make things locally
faster. But the real case is when thousands of clients are handled by a
few network servers. Even reducing wire latency for a single client
would make a difference here.
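The pattern in question is one open + read + close per small file. As a rough illustration, here is a minimal Python sketch that reads a list of short files that way and counts the calls it issues. `read_small_files` is a hypothetical helper, not part of the getvalues(2) or readfile() proposals, and it assumes each file fits in a single read. Locally each counted call is one syscall; on a network file system some of them may also cost a round trip to the server, which is the overhead at issue here.

```python
import os


def read_small_files(paths, bufsize=4096):
    """Read each file with one open + read + close, the per-file pattern
    that tools like lsof(1) follow under /proc.

    Returns (contents, syscalls): contents maps path -> bytes, and syscalls
    counts the open/read/close calls this function issued. Assumes every
    file fits in a single read of `bufsize` bytes, as short /proc files do.
    """
    contents = {}
    syscalls = 0
    for path in paths:
        fd = os.open(path, os.O_RDONLY)  # open(2)
        syscalls += 1
        try:
            contents[path] = os.read(fd, bufsize)  # read(2)
            syscalls += 1
        finally:
            os.close(fd)  # close(2)
            syscalls += 1
    return contents, syscalls


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    # Hypothetical sample: a few short per-process files (Linux only).
    paths = ["/proc/self/stat", "/proc/self/statm", "/proc/self/comm"]
    contents, n = read_small_files(paths)
    print(f"{len(paths)} files, {n} calls")
```

Every file costs three calls regardless of how few bytes it holds; a batched interface would aim to collapse those, but since its signature is still being debated in this thread, the sketch does not model it.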

There is a bit of a chicken-and-egg problem: it is a fair amount of
work to add this to file systems like NFS (or others that are not in
the kernel), but that work won't be done as long as there is no syscall
for it. To demonstrate it on NFS one would also first need an official
protocol change. And then applications also need to support the new
syscall.
I had a hard time explaining to weather physicists back in 2009 that it
is not a good idea to have millions of 512B files on Lustre. With
recent AI workloads this gets even worse.

This is in fact the same issue as with the fuse patches we are creating
(https://lwn.net/Articles/888877/). Miklos asked for benchmark numbers;
we can only demonstrate slight effects locally, but our goal is in fact
to reduce network latencies and server load.

- Bernd


2022-03-23 23:40:12

by J. Bruce Fields

Subject: Re: [RFC PATCH] getvalues(2) prototype

On Wed, Mar 23, 2022 at 11:26:11AM +0100, Bernd Schubert wrote:
> Add in network file systems. Demonstrating that this is useful
> locally and with microbenchmarks, yeah, it helps a bit to make things
> locally faster. But the real case is when thousands of clients are
> handled by a few network servers. Even reducing wire latency for a
> single client would make a difference here.
>
> There is a bit of a chicken-and-egg problem: it is a fair amount of
> work to add this to file systems like NFS (or others that are not in
> the kernel), but that work won't be done as long as there is no
> syscall for it. To demonstrate it on NFS one would also first need
> an official protocol change.

I wouldn't assume that. NFSv4 already supports compound rpc operations,
so you can do OPEN+READ+CLOSE in a single round trip. The client's
never done that, but there are pynfs tests that can testify to the fact
that our server supports it.

It's not something anyone's used much outside of artificial tests, so
there may well turn out to be issues, but the protocol's definitely
sufficient to prototype this at least.
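To put rough numbers on the compound point, here is a toy Python model comparing OPEN, READ and CLOSE sent as three separate requests against one compound request per file. `Compound`, `plan_requests` and `wire_time_us` are hypothetical names, not pynfs or kernel client code. The model assumes requests are issued back to back, each costing exactly one round trip, and it leaves out the PUTFH (and, for NFSv4.1, SEQUENCE) operations and the stateid handling a real NFSv4 compound needs.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    """One COMPOUND-style request: operations sent together and
    answered in a single round trip (simplified model)."""
    ops: list = field(default_factory=list)


def plan_requests(files, use_compound):
    """Build the requests needed to OPEN, READ and CLOSE every file.

    With use_compound=False each operation travels in its own request;
    with use_compound=True all three share one request per file.
    """
    requests = []
    for name in files:
        ops = [("OPEN", name), ("READ", name), ("CLOSE", name)]
        if use_compound:
            requests.append(Compound(ops))
        else:
            requests.extend(Compound([op]) for op in ops)
    return requests


def wire_time_us(requests, rtt_us):
    """Serial lower bound on wire latency: one round trip per request,
    ignoring server processing time and any parallelism."""
    return len(requests) * rtt_us


if __name__ == "__main__":
    files = [f"file{i}" for i in range(1000)]  # hypothetical workload
    for use_compound in (False, True):
        reqs = plan_requests(files, use_compound)
        print(use_compound, len(reqs), wire_time_us(reqs, rtt_us=200))
```

With a hypothetical 1000 files and a 200 µs round trip, the model gives 3000 requests (600000 µs) without compounds versus 1000 requests (200000 µs) with them. Real clients may issue requests for different files in parallel, which this serial model ignores.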

I'm not volunteering, but it doesn't seem too difficult in theory if
someone's interested.

--b.

> And then applications also need to support the new syscall.
> I had a hard time explaining to weather physicists back in 2009 that
> it is not a good idea to have millions of 512B files on Lustre. With
> recent AI workloads this gets even worse.
>
> This is in fact the same issue as with the fuse patches we are
> creating (https://lwn.net/Articles/888877/). Miklos asked for
> benchmark numbers; we can only demonstrate slight effects locally,
> but our goal is in fact to reduce network latencies and server load.
>
> - Bernd

2022-03-25 18:28:06

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH] getvalues(2) prototype

On Wed, Mar 23, 2022 at 11:26:11AM +0100, Bernd Schubert wrote:
> On 3/23/22 08:16, Greg KH wrote:
> > On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
> > > Add a new userspace API that allows getting multiple short values in a
> > > single syscall.
> > >
> > > This would be useful for the following reasons:
> > >
> > > - Calling open/read/close for many small files is inefficient. E.g. on my
> > > desktop invoking lsof(1) results in ~60k open + read + close calls under
> > > /proc and 90% of those are 128 bytes or less.
> >
> > As I found out in testing readfile():
> > https://lore.kernel.org/r/[email protected]
> >
> > microbenchmarks do show a tiny improvement in doing something like this,
> > but that's not a real-world application.
> >
> > Do you have anything real that can use this that shows a speedup?
>
> Add in network file systems. Demonstrating that this is useful locally and
> with microbenchmarks, yeah, it helps a bit to make things locally faster. But
> the real case is when thousands of clients are handled by a few network
> servers. Even reducing wire latency for a single client would make a
> difference here.

I think I tried running readfile on NFS. Didn't see any improvements.
But please, try it again. Also note that this proposal isn't for NFS,
or any other "real" filesystem :)

> There is a bit of a chicken-and-egg problem: it is a fair amount of work to
> add this to file systems like NFS (or others that are not in the kernel), but
> that work won't be done as long as there is no syscall for it. To demonstrate
> it on NFS one would also first need an official protocol change. And then
> applications also need to support the new syscall.
> I had a hard time explaining to weather physicists back in 2009 that it is
> not a good idea to have millions of 512B files on Lustre. With recent AI
> workloads this gets even worse.

Can you try using the readfile() patch to see if that helps you all out
on Lustre? If so, that's a good reason to consider it. But again, that
has nothing to do with this getvalues(2) API.

thanks,

greg k-h