2017-04-22 10:39:50

by James Dutton

Subject: API for Vectorising IO

Hi,

I recently read this, which talks about vectorised IO for NFS.
https://www.fsl.cs.sunysb.edu/docs/nfs4perf/vnfs-fast17.pdf

Are there any ways to do this for ext4?
For example:
A single syscall that would list all the files in a directory,
i.e. a sort of list_all_files() as one system call.
Also, having the API as async and partial results:
I.e. send list_all_files() message.
receive a request_id.
send get_next_1000_files(request_id)
receive the next 1000 files in that directory.

So, a sort of partial results, but with batching; in this case,
1000 files at a time.

Similarly, a read_all_files(list_of_files) would then send back
all of the files' contents.

Surely, if the file system knows the full picture earlier, it can
better optimise the results.
So this lets the application request a list of many different file
operations ahead of time, get a "request_id" back, and then gather
the results later.

The important point here is finding ways to limit the round-trip
time and the number of requests/responses.
Returning a "request_id" means that the call can return without even
needing to access the disk, keeping the round-trip time to a
minimum.

A similar approach could be used to optimise file-find operations,
but I suspect that sort of operation is still better served by an
index in user space, e.g. Apache Lucene.

While vectorised IO can improve file access over the network
considerably, if local file systems also used the same API,
applications could be written against a single API and work
efficiently on both local and remote filesystems.

Kind Regards

James


2017-04-25 03:27:27

by Andreas Dilger

Subject: Re: API for Vectorising IO

On Apr 22, 2017, at 4:39 AM, James Courtier-Dutton <[email protected]> wrote:
>
> Hi,
>
> I recently read this, which talks about vectorised IO for NFS.
> https://www.fsl.cs.sunysb.edu/docs/nfs4perf/vnfs-fast17.pdf
>
> Are there any ways to do this for ext4?
> For example:
> A single syscall that would list all the files in a directory,
> i.e. a sort of list_all_files() as one system call.
> Also, having the API as async and partial results:
> I.e. send list_all_files() message.
> receive a request_id.
> send get_next_1000_files(request_id)
> receive the next 1000 files in that directory.
>
> So, a sort of partial results, but with batching; in this case,
> 1000 files at a time.
>
> Similarly, a read_all_files(list_of_files) would then send back
> all of the files' contents.
>
> Surely, if the file system knows the full picture earlier, it can
> better optimise the results. So this lets the application request
> a list of many different file operations ahead of time, get a
> "request_id" back, and then gather the results later.
>
> The important point here is finding ways to limit the round-trip time
> and the number of requests/responses. Returning a "request_id"
> means that the call can return without even needing to access the
> disk, keeping the round-trip time to a minimum.
>
> A similar approach could be used to optimise file-find operations, but
> I suspect that sort of operation is still better served by an
> index in user space, e.g. Apache Lucene.
>
> While vectorised IO can improve file access over the network
> considerably, if local file systems also used the same API,
> applications could be written against a single API and work
> efficiently on both local and remote filesystems.

One thing that has been discussed a bit in the past is to provide a
generic interface for doing bulk stat operations for multiple files.

One proposal is to make this similar to "readdirplus" for NFS, which
returns directory and stat information for all files in a directory.
This is useful if you are accessing all files in the directory.

Another proposal is to submit a list of inodes (file handles?) to the
kernel and then get the stat information back in an array for all
of the inodes at once. This is about the same as readdirplus if
you are accessing all of the inodes, but is potentially much better
if you are only accessing a subset of files (e.g. "ls -l *.txt" or
"find . -type d" or similar).

I don't know if there was ever a discussion to allow reading of
multiple files at once, though this could potentially help when there
are lots of smaller files being accessed from high-latency storage
(e.g. network and/or disk) since it would allow the storage to better
optimize the IO pattern.

Cheers, Andreas

Attachments:
signature.asc (195.00 B)
Message signed with OpenPGP