2023-11-19 17:38:35

by Cedric Blancher

Subject: How does READ_PLUS differ from READ?

Good evening!

How does READ_PLUS differ from READ? Has anyone made a simpler
presentation (PowerPoint slides) than the RFCs?

Ced
--
Cedric Blancher <[email protected]>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur


2023-11-19 17:56:08

by Anna Schumaker

Subject: Re: How does READ_PLUS differ from READ?

Hi,

On Sun, Nov 19, 2023 at 12:38 PM Cedric Blancher
<[email protected]> wrote:
>
> Good evening!
>
> How does READ_PLUS differ from READ? Has anyone made a simpler
> presentation (PowerPoint slides) than the RFCs?

No slides, but at a high level READ_PLUS can compress out long ranges
of zeroes in a read reply by returning a HOLE segment instead of the
actual zeroes. It's perfectly valid for the server to skip the zero
detection and return everything as a data segment, however.
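Here is a minimal Python sketch of the zero-detection side of this. It assumes a simplified segment representation that is loosely modeled on the DATA and HOLE content of a READ_PLUS reply. It is not the RFC 7862 XDR. `DataSegment`, `HoleSegment`, `encode_read_plus`, and the `min_hole` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DataSegment:
    offset: int   # file offset of the first byte in data
    data: bytes


@dataclass
class HoleSegment:
    offset: int   # file offset where the run of zeros starts
    length: int   # number of zero bytes this segment stands for


Segment = Union[DataSegment, HoleSegment]


def encode_read_plus(buf: bytes, file_offset: int, min_hole: int = 4096) -> List[Segment]:
    """Split a read buffer into DATA/HOLE segments by scanning for zero runs.

    Zero runs shorter than min_hole stay inside DATA segments. Passing a
    min_hole larger than len(buf) disables zero detection and yields one
    DATA segment, which is also a valid reply.
    """
    segments: List[Segment] = []
    data_start = 0  # start of the pending (not yet emitted) data run
    i = 0
    n = len(buf)
    while i < n:
        if buf[i] != 0:
            i += 1
            continue
        j = i
        while j < n and buf[j] == 0:  # find the end of this zero run
            j += 1
        if j - i >= min_hole:
            if data_start < i:
                segments.append(DataSegment(file_offset + data_start, buf[data_start:i]))
            segments.append(HoleSegment(file_offset + i, j - i))
            data_start = j
        i = j
    if data_start < n:
        segments.append(DataSegment(file_offset + data_start, buf[data_start:n]))
    return segments
```

For example, `encode_read_plus(b"abc" + bytes(8192) + b"xyz", 0)` yields a 3-byte DATA segment, an 8192-byte HOLE at offset 3, and a 3-byte DATA segment at offset 8195.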

Anna


2023-11-19 17:59:21

by Cedric Blancher

Subject: Re: How does READ_PLUS differ from READ?

On Sun, 19 Nov 2023 at 18:48, Anna Schumaker <[email protected]> wrote:
>
> Hi,
>
> On Sun, Nov 19, 2023 at 12:38 PM Cedric Blancher
> <[email protected]> wrote:
> >
> > Good evening!
> >
> > How does READ_PLUS differ from READ? Has anyone made a simpler
> > presentation (PowerPoint slides) than the RFCs?
>
> No slides, but at a high level READ_PLUS can compress out long ranges
> of zeroes in a read reply by returning a HOLE segment instead of the
> actual zeroes. It's perfectly valid for the server to skip the zero
> detection and return everything as a data segment, however.

So how do you distinguish between
1. a hole, aka no filesystem blocks allocated
2. a long sequence of valid data with all zero bytes in them

Ced

2023-11-19 18:03:01

by Anna Schumaker

Subject: Re: How does READ_PLUS differ from READ?

On Sun, Nov 19, 2023 at 12:59 PM Cedric Blancher
<[email protected]> wrote:
>
> So how do you differ between
> 1. a hole, aka no filesystem blocks allocated
> 2. a long sequence of valid data with all zero bytes in them

That's up to the server! It could use something like fiemap or lseek
with SEEK_HOLE or SEEK_DATA. It could also scan the data to see if
there are any zeroes that could be compressed out.
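The lseek approach can be sketched in Python as follows. This is an illustrative assumption and does not describe what any particular NFS server does. `data_extents` is a hypothetical helper, and it relies on `os.SEEK_DATA` and `os.SEEK_HOLE`, which Python only exposes on platforms that support them (for example, Linux). A file system that does not track holes will typically report the whole file as one data region, and that is still a correct answer.

```python
import errno
import os
from typing import List, Tuple


def data_extents(fd: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each region SEEK_DATA/SEEK_HOLE report as data."""
    size = os.fstat(fd).st_size
    extents: List[Tuple[int, int]] = []
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data after pos: the rest is a hole
                break
            raise
        # End of file counts as a hole, so this always finds an end.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end - start))
        pos = end
    return extents
```

Anything outside the returned extents is a hole as far as the local file system reports it. A server using this approach would send those ranges as HOLE segments and the extents as DATA.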

Anna


2023-11-22 22:48:07

by Cedric Blancher

Subject: Re: How does READ_PLUS differ from READ?

On Sun, 19 Nov 2023 at 19:02, Anna Schumaker <[email protected]> wrote:
>
> On Sun, Nov 19, 2023 at 12:59 PM Cedric Blancher
> <[email protected]> wrote:
> >
> > So how do you differ between
> > 1. a hole, aka no filesystem blocks allocated
> > 2. a long sequence of valid data with all zero bytes in them
>
> That's up to the server! It could use something like fiemap or lseek
> with SEEK_HOLE or SEEK_DATA. It could also scan the data to see if
> there are any zeroes that could be compressed out.

How can the client figure out whether the zeros in a READ_PLUS reply
come from data or from a hole?

Ced

2023-11-22 23:19:19

by Rick Macklem

Subject: Re: How does READ_PLUS differ from READ?

On Wed, Nov 22, 2023 at 2:48 PM Cedric Blancher
<[email protected]> wrote:
>
> How can the client figure out whether the data in a READ_PLUS reply
> are zeros of data, or zeros from a hole?

As I understand the RFC, it cannot. Put another way, "a hole is a
region that reads as all 0s, which may or may not have allocated blocks
on the server file system".
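The following Python sketch shows why the distinction is lost. It uses a hypothetical, simplified segment format of ("data", offset, bytes) or ("hole", offset, length), which is not the RFC 7862 XDR. It shows two replies that a server could legitimately send for the same file content. Both decode to identical bytes. The HOLE tag only tells the client that the server chose to report a run of zeros compactly. It says nothing about whether blocks are allocated on the server.

```python
from typing import List, Tuple, Union

# Hypothetical segment: ("data", offset, payload_bytes) or ("hole", offset, length).
Segment = Tuple[str, int, Union[bytes, int]]


def decode_read_plus(segments: List[Segment], req_offset: int) -> bytes:
    """Flatten reply segments into the bytes an application would read."""
    out = bytearray()
    for kind, offset, payload in segments:
        if offset != req_offset + len(out):
            raise ValueError("sketch assumes contiguous, in-order segments")
        if kind == "data":
            out += payload
        elif kind == "hole":
            out += bytes(payload)  # bytes(n) is n zero bytes
        else:
            raise ValueError(f"unknown segment type {kind!r}")
    return bytes(out)


# Two replies a server could send for the same 8 bytes of file content.
with_hole = [("data", 0, b"hi"), ("hole", 2, 6)]
all_data = [("data", 0, b"hi" + bytes(6))]
```

Both `decode_read_plus(with_hole, 0)` and `decode_read_plus(all_data, 0)` return b"hi" followed by six zero bytes.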

Although SEEK_HOLE typically returns the offset of an unallocated
region, I don't think either the POSIX draft (was it ever ratified?) or
RFC7862 actually defines a "hole" as an unallocated region.

In a similar vein, DEALLOCATE can simply write 0s to the region.
(It does not actually have to "deallocate data blocks".)

At least that is my understanding of POSIX and RFC7862, rick


2023-11-23 22:14:36

by Cedric Blancher

Subject: Re: How does READ_PLUS differ from READ?

On Thu, 23 Nov 2023 at 00:19, Rick Macklem <[email protected]> wrote:
>
> On Wed, Nov 22, 2023 at 2:48 PM Cedric Blancher
> <[email protected]> wrote:
> >
> > How can the client figure out whether the data in a READ_PLUS reply
> > are zeros of data, or zeros from a hole?
> As I understand the RFC, it cannot. Or put another way "a hole is a
> region that reads as all 0s, which may or may not have allocated blocks
> on the server file system".
>
> Although SEEK_HOLE typically returns the offset of an unallocated
> region, I don't think either the POSIX draft (was it ever ratified?) nor
> RFC7862 actually define a "hole" as an unallocated region.

The Open Group ratified that one. See https://austingroupbugs.net/view.php?id=415

>
> On a similar vein, Deallocate can simply write 0s to the region.
> (It does not actually have to "deallocate data blocks".)
>
> At least that is my understanding of POSIX and RFC7862, rick

Can anyone please confirm that RFC7862 and READ_PLUS cannot distinguish
between allocated and unallocated regions in a file?

Ced

2023-11-23 22:42:23

by Rick Macklem

Subject: Re: How does READ_PLUS differ from READ?

On Thu, Nov 23, 2023 at 2:14 PM Cedric Blancher
<[email protected]> wrote:
>
> On Thu, 23 Nov 2023 at 00:19, Rick Macklem <[email protected]> wrote:
> >
> > Although SEEK_HOLE typically returns the offset of an unallocated
> > region, I don't think either the POSIX draft (was it ever ratified?) nor
> > RFC7862 actually define a "hole" as an unallocated region.
>
> Opengroup ratified that one. See https://austingroupbugs.net/view.php?id=415
>
> >
> > On a similar vein, Deallocate can simply write 0s to the region.
> > (It does not actually have to "deallocate data blocks".)
> >
> > At least that is my understanding of POSIX and RFC7862, rick
>
> Can anyone please confirm that RFC7862 and READPLUS cannot distinguish
> between allocated and unallocated regions in a file?
The best place to ask this is the [email protected] mailing list.
Alternatively, you could just read the words yourself...
Having said that, here are a couple of snippets of RFC7862 (neither of
which is in the READ_PLUS section):
In the definitions section...
Hole: A byte range within a sparse file that contains all zeros. A
hole might or might not have space allocated or reserved to it.

And in the section on DEALLOCATE...
All further READs from
the region passed to DEALLOCATE MUST return zeros until overwritten.
[irrelevant stuff snipped]
Situations may arise where da_offset and/or da_offset + da_length
will not be aligned to a boundary for which the server does
allocations or deallocations. For most file systems, this is the
block size of the file system. In such a case, the server can
deallocate as many bytes as it can in the region. The blocks that
cannot be deallocated MUST be zeroed.
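The alignment rule quoted above can be worked through in Python. This is a minimal sketch that assumes a fixed file system block size. `plan_deallocate` is a hypothetical helper. It splits a DEALLOCATE range into whole blocks that could be released and unaligned edges that must be zeroed instead. As noted earlier in the thread, a server is equally free to zero the whole range.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length)


def plan_deallocate(offset: int, length: int, block_size: int) -> Tuple[List[Range], List[Range]]:
    """Return (ranges that can be deallocated, ranges that must be zeroed)."""
    if length <= 0:
        return [], []
    end = offset + length
    first_full = -(-offset // block_size) * block_size  # round offset up to a block boundary
    last_full = (end // block_size) * block_size        # round end down to a block boundary
    if first_full >= last_full:
        # No whole block lies inside the range, so all of it must be zeroed.
        return [], [(offset, length)]
    dealloc = [(first_full, last_full - first_full)]
    zero: List[Range] = []
    if offset < first_full:
        zero.append((offset, first_full - offset))      # unaligned head
    if last_full < end:
        zero.append((last_full, end - last_full))       # unaligned tail
    return dealloc, zero
```

With 4096-byte blocks, `plan_deallocate(1000, 10000, 4096)` can release only the block at offset 4096. The 3096 bytes before that block and the 2808 bytes after it must be zeroed.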

Now, if the above is not enough to convince you that "hole" does not
necessarily imply "unallocated", then I suggest you read it and then
ask on [email protected].
(Btw, the DEALLOCATE section uses the term "unreserved" and not
"unallocated".)

I'll also admit I do not understand why you care.
Is there a Windows API that specifically returns unallocated regions
of files?

rick
