2022-03-23 17:24:41

by Bernd Schubert

Subject: Re: [RFC PATCH] getvalues(2) prototype



On 3/23/22 12:42, Greg KH wrote:
> On Wed, Mar 23, 2022 at 11:26:11AM +0100, Bernd Schubert wrote:
>> On 3/23/22 08:16, Greg KH wrote:
>>> On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
>>>> Add a new userspace API that allows getting multiple short values in a
>>>> single syscall.
>>>>
>>>> This would be useful for the following reasons:
>>>>
>>>> - Calling open/read/close for many small files is inefficient. E.g. on my
>>>> desktop invoking lsof(1) results in ~60k open + read + close calls under
>>>> /proc and 90% of those are 128 bytes or less.
>>>
>>> As I found out in testing readfile():
>>> https://lore.kernel.org/r/[email protected]
>>>
>>> microbenchmarks do show a tiny improvement in doing something like this,
>>> but that's not a real-world application.
>>>
>>> Do you have anything real that can use this that shows a speedup?
>>
>> Add in network file systems. Demonstrating that this is useful locally and
>> with micro benchmarks - yeah, helps a bit to make it locally faster. But the
>> real case is when thousands of clients are handled by a few network servers.
>> Even reducing wire latency for a single client would make a difference here.
>
> I think I tried running readfile on NFS. Didn't see any improvements.
> But please, try it again. Also note that this proposal isn't for NFS,
> or any other "real" filesystem :)

How did you run it on NFS? To get a real benefit you would need to add a
READ_FILE RPC to the NFS protocol and implement it in the code. Just
having the syscall locally won't avoid the expensive wire calls.

>
>> There is a bit of a chicken-egg problem - it is a bit of work to add this to
>> file systems like NFS (or others that are not in the kernel), but the work
>> won't happen before there is a syscall for it. To demonstrate it on NFS one
>> also needs an official protocol change first. And then applications also
>> need to support the new syscall.
>> I had a hard time explaining to weather physicists back in 2009 why it is
>> not a good idea to have millions of 512B files on Lustre. With recent AI
>> workloads this gets even worse.
>
> Can you try using the readfile() patch to see if that helps you all out
> on Lustre? If so, that's a good reason to consider it. But again, has
> nothing to do with this getvalues(2) api.

I don't have a Lustre system to easily play with (I'm working on another
network file system). But unless Lustre implemented aggressive prefetch
of data on stat, I don't see how either approach would help without a
protocol addition. For Lustre it would probably help only when small
data are inlined into the inode.
In the end this is exactly the chicken-egg problem - Lustre (or anything
else) won't implement it before the kernel supports it. But the new
syscall won't be added before it is proven that it helps.


- Bernd



2022-03-24 17:09:20

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH] getvalues(2) prototype

On Wed, Mar 23, 2022 at 01:06:33PM +0100, Bernd Schubert wrote:
>
>
> On 3/23/22 12:42, Greg KH wrote:
> > On Wed, Mar 23, 2022 at 11:26:11AM +0100, Bernd Schubert wrote:
> > > On 3/23/22 08:16, Greg KH wrote:
> > > > On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
> > > > > Add a new userspace API that allows getting multiple short values in a
> > > > > single syscall.
> > > > >
> > > > > This would be useful for the following reasons:
> > > > >
> > > > > - Calling open/read/close for many small files is inefficient. E.g. on my
> > > > > desktop invoking lsof(1) results in ~60k open + read + close calls under
> > > > > /proc and 90% of those are 128 bytes or less.
> > > >
> > > > As I found out in testing readfile():
> > > > https://lore.kernel.org/r/[email protected]
> > > >
> > > > microbenchmarks do show a tiny improvement in doing something like this,
> > > > but that's not a real-world application.
> > > >
> > > > Do you have anything real that can use this that shows a speedup?
> > >
> > > Add in network file systems. Demonstrating that this is useful locally and
> > > with micro benchmarks - yeah, helps a bit to make it locally faster. But the
> > > real case is when thousands of clients are handled by a few network servers.
> > > Even reducing wire latency for a single client would make a difference here.
> >
> > I think I tried running readfile on NFS. Didn't see any improvements.
> > But please, try it again. Also note that this proposal isn't for NFS,
> > or any other "real" filesystem :)
>
> How did you run it on NFS? To get a real benefit you would need to add a
> READ_FILE RPC to the NFS protocol and implement it in the code. Just
> having the syscall locally won't avoid the expensive wire calls.

I did not touch anything related to NFS code, which is perhaps why I did
not notice any difference :)

thanks,

greg k-h