2020-06-17 17:31:06

by Kanchan Joshi

Subject: [PATCH 0/3] zone-append support in aio and io-uring

This patchset enables issuing zone-append using aio and io-uring direct-io interface.

For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
of the zone to issue append. On completion 'res2' field is used to return
zone-relative offset.

For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset.
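
As a rough illustration of the convention described above (this sketch is not
part of the patchset): the application submits with the zone start, and the
completion carries an offset relative to that start, so the absolute location
is the sum of the two. The helper name append_landed_at() is hypothetical, and
the sketch assumes both values are expressed in the same unit, which the cover
letter does not spell out. The parameter is 32 bits wide because cqe->flags is
a __u32; aio's res2 is a 64-bit field.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patchset.
 * zone_start:   location the application used when submitting the append
 * zone_rel_off: zone-relative offset reported on completion
 * Assumption: both values use the same unit.
 */
static uint64_t append_landed_at(uint64_t zone_start, uint32_t zone_rel_off)
{
	/* Widen before adding so the sum cannot wrap at 32 bits. */
	return zone_start + (uint64_t)zone_rel_off;
}
```

For example, an append submitted at zone start 524288 that completes with a
relative offset of 8 would have landed at 524296 under these assumptions.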

Kanchan Joshi (1):
aio: add support for zone-append

Selvakumar S (2):
fs,block: Introduce IOCB_ZONE_APPEND and direct-io handling
io_uring: add support for zone-append

fs/aio.c | 8 +++++
fs/block_dev.c | 19 +++++++++++-
fs/io_uring.c | 72 +++++++++++++++++++++++++++++++++++++++++--
include/linux/fs.h | 1 +
include/uapi/linux/aio_abi.h | 1 +
include/uapi/linux/io_uring.h | 8 ++++-
6 files changed, 105 insertions(+), 4 deletions(-)

--
2.7.4


2020-06-17 17:32:05

by Kanchan Joshi

Subject: [PATCH 2/3] aio: add support for zone-append

Introduce IOCB_CMD_ZONE_APPEND opcode for zone-append. On append
completion zone-relative offset is returned using io_event->res2.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Arnav Dawn <[email protected]>
Signed-off-by: SelvaKumar S <[email protected]>
Signed-off-by: Nitesh Shetty <[email protected]>
Signed-off-by: Javier Gonzalez <[email protected]>
---
fs/aio.c | 8 ++++++++
include/uapi/linux/aio_abi.h | 1 +
2 files changed, 9 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 7ecddc2..8b10a55d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1579,6 +1579,10 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
 			__sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
 			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 		}
+#ifdef CONFIG_BLK_DEV_ZONED
+		if (iocb->aio_lio_opcode == IOCB_CMD_ZONE_APPEND)
+			req->ki_flags |= IOCB_ZONE_APPEND;
+#endif
 		req->ki_flags |= IOCB_WRITE;
 		aio_rw_done(req, call_write_iter(file, req, &iter));
 	}
@@ -1846,6 +1850,10 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		return aio_fsync(&req->fsync, iocb, true);
 	case IOCB_CMD_POLL:
 		return aio_poll(req, iocb);
+#ifdef CONFIG_BLK_DEV_ZONED
+	case IOCB_CMD_ZONE_APPEND:
+		return aio_write(&req->rw, iocb, false, compat);
+#endif
 	default:
 		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
 		return -EINVAL;
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 8387e0a..541d96a 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -43,6 +43,7 @@ enum {
 	IOCB_CMD_NOOP = 6,
 	IOCB_CMD_PREADV = 7,
 	IOCB_CMD_PWRITEV = 8,
+	IOCB_CMD_ZONE_APPEND = 9,
 };

/*
--
2.7.4

2020-06-17 17:45:12

by Matthew Wilcox

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On Wed, Jun 17, 2020 at 10:53:36PM +0530, Kanchan Joshi wrote:
> This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>
> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
> of the zone to issue append. On completion 'res2' field is used to return
> zone-relative offset.

Maybe it's obvious to everyone working with zoned drives on a daily
basis, but please explain in the commit message why you need to return
the zone-relative offset to the application.

2020-06-18 06:58:54

by Christoph Hellwig

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On Wed, Jun 17, 2020 at 10:53:36PM +0530, Kanchan Joshi wrote:
> This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>
> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
> of the zone to issue append. On completion 'res2' field is used to return
> zone-relative offset.
>
> For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
> Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset

And what exactly are the semantics supposed to be? Remember the
Unix file abstraction does not know about zones at all.

I really don't think squeezing low-level not quite block storage
protocol details into the Linux read/write path is a good idea.

What could be a useful addition is a way for O_APPEND/RWF_APPEND writes
to report where they actually wrote, as that comes close to Zone Append
while still making sense at our usual abstraction level for file I/O.
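
For context, a minimal sketch of what an application can do today with plain
O_APPEND and synchronous write(): the call returns only a byte count, so the
landing position has to be inferred from the file offset afterwards. The helper
name append_and_locate() is hypothetical. The sketch assumes a regular file
opened with O_APPEND and that no other thread or process is using the same open
file description at the same time. It does not carry over to aio or io_uring
submissions, where the file offset is not advanced in this way.

```c
#include <fcntl.h>      /* O_APPEND and friends, used by the caller's open() */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append len bytes to an O_APPEND descriptor and
 * return the file offset where the data started, or -1 on error.
 * After an O_APPEND write() the descriptor's offset sits just past the
 * written data, so subtracting the byte count gives the start.
 */
static off_t append_and_locate(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);
	if (n < 0)
		return -1;

	off_t end = lseek(fd, 0, SEEK_CUR);
	if (end < 0)
		return -1;

	/* A short write still started at end - n. */
	return end - n;
}
```

The extra lseek() and the single-user restriction are what a completion-time
position report would make unnecessary.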

2020-06-18 07:39:03

by Damien Le Moal

Subject: Re: [PATCH 2/3] aio: add support for zone-append

On 2020/06/18 2:27, Kanchan Joshi wrote:
> Introduce IOCB_CMD_ZONE_APPEND opcode for zone-append. On append
> completion zone-relative offset is returned using io_event->res2.
>
> Signed-off-by: Kanchan Joshi <[email protected]>
> Signed-off-by: Arnav Dawn <[email protected]>
> Signed-off-by: SelvaKumar S <[email protected]>
> Signed-off-by: Nitesh Shetty <[email protected]>
> Signed-off-by: Javier Gonzalez <[email protected]>
> ---
> fs/aio.c | 8 ++++++++
> include/uapi/linux/aio_abi.h | 1 +
> 2 files changed, 9 insertions(+)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 7ecddc2..8b10a55d 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1579,6 +1579,10 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
> __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
> __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
> }
> +#ifdef CONFIG_BLK_DEV_ZONED
> + if (iocb->aio_lio_opcode == IOCB_CMD_ZONE_APPEND)
> + req->ki_flags |= IOCB_ZONE_APPEND;
> +#endif
> req->ki_flags |= IOCB_WRITE;
> aio_rw_done(req, call_write_iter(file, req, &iter));
> }
> @@ -1846,6 +1850,10 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
> return aio_fsync(&req->fsync, iocb, true);
> case IOCB_CMD_POLL:
> return aio_poll(req, iocb);
> +#ifdef CONFIG_BLK_DEV_ZONED
> + case IOCB_CMD_ZONE_APPEND:
> + return aio_write(&req->rw, iocb, false, compat);
> +#endif
> default:
> pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
> return -EINVAL;
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index 8387e0a..541d96a 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -43,6 +43,7 @@ enum {
> IOCB_CMD_NOOP = 6,
> IOCB_CMD_PREADV = 7,
> IOCB_CMD_PWRITEV = 8,
> + IOCB_CMD_ZONE_APPEND = 9,
> };
>
> /*
>

No need for all the #ifdefs.

--
Damien Le Moal
Western Digital Research

2020-06-18 08:09:19

by Matias Bjørling

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 17/06/2020 19.23, Kanchan Joshi wrote:
> This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>
> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
> of the zone to issue append. On completion 'res2' field is used to return
> zone-relative offset.
>
> For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
> Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset

Please provide pointers to applications that are updated and ready to
take advantage of zone append.

I do not believe it's beneficial at this point to change the libaio API,
applications that would want to use this API, should anyway switch to
use io_uring.

Please also note that applications and libraries that want to take
advantage of zone append, can already use the zonefs file-system, as it
will use the zone append command when applicable.

> Kanchan Joshi (1):
> aio: add support for zone-append
>
> Selvakumar S (2):
> fs,block: Introduce IOCB_ZONE_APPEND and direct-io handling
> io_uring: add support for zone-append
>
> fs/aio.c | 8 +++++
> fs/block_dev.c | 19 +++++++++++-
> fs/io_uring.c | 72 +++++++++++++++++++++++++++++++++++++++++--
> include/linux/fs.h | 1 +
> include/uapi/linux/aio_abi.h | 1 +
> include/uapi/linux/io_uring.h | 8 ++++-
> 6 files changed, 105 insertions(+), 4 deletions(-)
>

2020-06-18 08:32:02

by Javier González

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 17.06.2020 23:56, Christoph Hellwig wrote:
>On Wed, Jun 17, 2020 at 10:53:36PM +0530, Kanchan Joshi wrote:
>> This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>>
>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
>> of the zone to issue append. On completion 'res2' field is used to return
>> zone-relative offset.
>>
>> For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>> Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset
>
>And what exactly are the semantics supposed to be? Remember the
>unix file abstractions does not know about zones at all.
>
>I really don't think squeezing low-level not quite block storage
>protocol details into the Linux read/write path is a good idea.
>
>What could be a useful addition is a way for O_APPEND/RWF_APPEND writes
>to report where they actually wrote, as that comes close to Zone Append
>while still making sense at our usual abstraction level for file I/O.

Makes sense. We will look into this for a V2.

Thanks,
Javier

2020-06-18 08:36:33

by Matias Bjørling

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 18/06/2020 10.27, Javier González wrote:
> On 18.06.2020 10:04, Matias Bjørling wrote:
>> On 17/06/2020 19.23, Kanchan Joshi wrote:
>>> This patchset enables issuing zone-append using aio and io-uring
>>> direct-io interface.
>>>
>>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application
>>> uses start LBA
>>> of the zone to issue append. On completion 'res2' field is used to
>>> return
>>> zone-relative offset.
>>>
>>> For io-uring, this introduces three opcodes:
>>> IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>> Since io_uring does not have aio-like res2, cqe->flags are
>>> repurposed to return zone-relative offset
>>
>> Please provide a pointers to applications that are updated and ready
>> to take advantage of zone append.
>
> Good point. We are posting a RFC with fio support for append. We wanted
> to start the conversation here before.
>
> We can post a fork for improve the reviews in V2.

Christoph's response points out that it is not exactly clear how this
matches with the POSIX API.

fio support is great - but I was thinking along the lines of
applications that not only benchmark performance. fio should be part of
the supported applications, but should not be the sole reason the API is
added.


2020-06-18 08:42:59

by Javier González

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 18.06.2020 10:32, Matias Bjørling wrote:
>On 18/06/2020 10.27, Javier González wrote:
>>On 18.06.2020 10:04, Matias Bjørling wrote:
>>>On 17/06/2020 19.23, Kanchan Joshi wrote:
>>>>This patchset enables issuing zone-append using aio and io-uring
>>>>direct-io interface.
>>>>
>>>>For aio, this introduces opcode IOCB_CMD_ZONE_APPEND.
>>>>Application uses start LBA
>>>>of the zone to issue append. On completion 'res2' field is used
>>>>to return
>>>>zone-relative offset.
>>>>
>>>>For io-uring, this introduces three opcodes:
>>>>IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>>>Since io_uring does not have aio-like res2, cqe->flags are
>>>>repurposed to return zone-relative offset
>>>
>>>Please provide a pointers to applications that are updated and
>>>ready to take advantage of zone append.
>>
>>Good point. We are posting a RFC with fio support for append. We wanted
>>to start the conversation here before.
>>
>>We can post a fork for improve the reviews in V2.
>
>Christoph's response points that it is not exactly clear how this
>matches with the POSIX API.

Yes. We will address this.
>
>fio support is great - but I was thinking along the lines of
>applications that not only benchmark performance. fio should be part
>of the supported applications, but should not be the sole reason the
>API is added.

Agree. It is a process with different steps. We definitely want to have
the right kernel interface before pushing any changes to libraries and /
or applications. These will come as the interface becomes more stable.

To start with xNVMe will be leveraging this new path. A number of
customers are leveraging the xNVMe API for their applications already.

Thanks,
Javier

2020-06-18 09:58:04

by Javier González

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 18.06.2020 10:04, Matias Bjørling wrote:
>On 17/06/2020 19.23, Kanchan Joshi wrote:
>>This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>>
>>For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
>>of the zone to issue append. On completion 'res2' field is used to return
>>zone-relative offset.
>>
>>For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset
>
>Please provide a pointers to applications that are updated and ready
>to take advantage of zone append.

Good point. We are posting a RFC with fio support for append. We wanted
to start the conversation here before.

We can post a fork to improve the reviews in V2.

>
>I do not believe it's beneficial at this point to change the libaio
>API, applications that would want to use this API, should anyway
>switch to use io_uring.

I can see why you say this, but isn't it too restrictive to directly
drop libaio support? We can split the patches and merge uring first, no
problem.

>
>Please also note that applications and libraries that want to take
>advantage of zone append, can already use the zonefs file-system, as
>it will use the zone append command when applicable.

Sure. There are different paths available already, which is great. We
have use cases for uring and would like to enable them too.

Thanks,
Javier

2020-06-18 09:58:38

by Matias Bjørling

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 18/06/2020 10.39, Javier González wrote:
> On 18.06.2020 10:32, Matias Bjørling wrote:
>> On 18/06/2020 10.27, Javier González wrote:
>>> On 18.06.2020 10:04, Matias Bjørling wrote:
>>>> On 17/06/2020 19.23, Kanchan Joshi wrote:
>>>>> This patchset enables issuing zone-append using aio and io-uring
>>>>> direct-io interface.
>>>>>
>>>>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application
>>>>> uses start LBA
>>>>> of the zone to issue append. On completion 'res2' field is used to
>>>>> return
>>>>> zone-relative offset.
>>>>>
>>>>> For io-uring, this introduces three opcodes:
>>>>> IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>>>> Since io_uring does not have aio-like res2, cqe->flags are
>>>>> repurposed to return zone-relative offset
>>>>
>>>> Please provide a pointers to applications that are updated and
>>>> ready to take advantage of zone append.
>>>
>>> Good point. We are posting a RFC with fio support for append. We wanted
>>> to start the conversation here before.
>>>
>>> We can post a fork for improve the reviews in V2.
>>
>> Christoph's response points that it is not exactly clear how this
>> matches with the POSIX API.
>
> Yes. We will address this.
>>
>> fio support is great - but I was thinking along the lines of
>> applications that not only benchmark performance. fio should be part
>> of the supported applications, but should not be the sole reason the
>> API is added.
>
> Agree. It is a process with different steps. We definitely want to have
> the right kernel interface before pushing any changes to libraries and /
> or applications. These will come as the interface becomes more stable.
>
> To start with xNVMe will be leveraging this new path. A number of
> customers are leveraging the xNVMe API for their applications already.

Heh, let me be even more specific - open-source applications, that is,
outside of fio (or any other benchmarking application) and libraries
that act as a mediator between two APIs.


2020-06-18 14:21:39

by Christoph Hellwig

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On Thu, Jun 18, 2020 at 10:04:32AM +0200, Matias Bjørling wrote:
> Please provide a pointers to applications that are updated and ready to take
> advantage of zone append.

That is a pretty high bar for kernel APIs that we don't otherwise
apply unless seriously in doubt.

> I do not believe it's beneficial at this point to change the libaio API,
> applications that would want to use this API, should anyway switch to use
> io_uring.

I think that really depends on the amount of churn required. We
absolutely can expose things like small additional flags or simple
new operations, as rewriting application to different APIs is not
exactly trivial. On the other hand we really shouldn't do huge
additions to the machinery.

> Please also note that applications and libraries that want to take advantage
> of zone append, can already use the zonefs file-system, as it will use the
> zone append command when applicable.

Not really. While we already use Zone Append in Zonefs for some cases,
we can't fully take advantage of the scalability of Zone Append. For
that we'd need a way to return the file position where an O_APPEND
write actually landed, as suggested in my earlier mail. Which I think
is a very useful addition, and Damien and I had looked into adding
it both for zonefs and normal file systems, but didn't get around to
doing the work yet.

2020-06-18 20:07:54

by Matias Bjørling

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 18/06/2020 21.21, Kanchan Joshi wrote:
> On Thu, Jun 18, 2020 at 10:04:32AM +0200, Matias Bjørling wrote:
>> On 17/06/2020 19.23, Kanchan Joshi wrote:
>>> This patchset enables issuing zone-append using aio and io-uring
>>> direct-io interface.
>>>
>>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application
>>> uses start LBA
>>> of the zone to issue append. On completion 'res2' field is used to
>>> return
>>> zone-relative offset.
>>>
>>> For io-uring, this introduces three opcodes:
>>> IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>> Since io_uring does not have aio-like res2, cqe->flags are
>>> repurposed to return zone-relative offset
>>
>> Please provide a pointers to applications that are updated and ready
>> to take advantage of zone append.
>>
>> I do not believe it's beneficial at this point to change the libaio
>> API, applications that would want to use this API, should anyway
>> switch to use io_uring.
>>
>> Please also note that applications and libraries that want to take
>> advantage of zone append, can already use the zonefs file-system, as
>> it will use the zone append command when applicable.
>
> AFAIK, zonefs uses append while serving synchronous I/O. And append bio
> is waited upon synchronously. That maybe serving some purpose I do
> not know currently. But it seems applications using zonefs file
> abstraction will get benefitted if they could use the append
> themselves to
> carry the I/O, asynchronously.
Yep, please see Christoph's comment regarding adding the support to zonefs.


2020-06-19 03:34:20

by Damien Le Moal

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 2020/06/19 2:55, Kanchan Joshi wrote:
> On Wed, Jun 17, 2020 at 11:56:34PM -0700, Christoph Hellwig wrote:
>> On Wed, Jun 17, 2020 at 10:53:36PM +0530, Kanchan Joshi wrote:
>>> This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>>>
>>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
>>> of the zone to issue append. On completion 'res2' field is used to return
>>> zone-relative offset.
>>>
>>> For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>> Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset
>>
>> And what exactly are the semantics supposed to be? Remember the
>> unix file abstractions does not know about zones at all.
>>
>> I really don't think squeezing low-level not quite block storage
>> protocol details into the Linux read/write path is a good idea.
>
> I was thinking of raw block-access to zone device rather than pristine file
> abstraction. And in that context, semantics, at this point, are unchanged
> (i.e. same as direct writes) while flexibility of async-interface gets
> added.

The aio->aio_offset use by the user and kernel differs for regular writes and
zone append writes. This is a significant enough change to say that semantic
changed. Yes both cases are direct IOs, but specification of the write location
by the user and where the data actually lands on disk are different.

There are a lot of subtle things that can happen that make mapping of zone
append operations to POSIX semantics difficult. E.g. for a regular file, using
zone append for any write issued to a file open with O_APPEND maps well to POSIX
only for blocking writes. For asynchronous writes, that is not true anymore
since the order of data defined by the automatic append after the previous async
write breaks: data can land anywhere in the zone regardless of the offset
specified on submission.
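
The ordering point above can be illustrated with a toy model (a sketch only,
with no kernel or device API involved). The names sim_zone, sim_zone_append()
and sim_process() are hypothetical, and the model assumes the device places
each append at its current write pointer in whatever order it happens to
process the requests.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy zone: write pointer and capacity are zone-relative. */
struct sim_zone {
	uint64_t start;     /* zone start, unused by the model itself */
	uint64_t wp;        /* next free zone-relative offset */
	uint64_t capacity;  /* zone size in the same unit as wp */
};

/* Place len units at the write pointer; UINT64_MAX means "zone full". */
static uint64_t sim_zone_append(struct sim_zone *z, uint64_t len)
{
	if (len > z->capacity - z->wp)
		return UINT64_MAX;

	uint64_t landed = z->wp;
	z->wp += len;
	return landed;
}

/*
 * Process n requests in the order given by exec_order[], which may
 * differ from submission order. landed[i] receives the zone-relative
 * offset assigned to the i-th submitted request.
 */
static void sim_process(struct sim_zone *z, const uint64_t *len,
			const size_t *exec_order, size_t n,
			uint64_t *landed)
{
	for (size_t i = 0; i < n; i++) {
		size_t req = exec_order[i];
		landed[req] = sim_zone_append(z, len[req]);
	}
}
```

With three requests of length 4, 8 and 2 processed in the order 2, 0, 1, the
first submitted request lands at offset 2 rather than 0, so the application can
only learn the location from the completion.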

> Synchronous-writes on single-zone sound fine, but synchronous-appends on
> single-zone do not sound that fine.

Why not? This is a perfectly valid use case that actually does not have any
semantic problem. It indeed may not be the most effective method to get high
performance but saying that it is "not fine" is not correct in my opinion.

>
>> What could be a useful addition is a way for O_APPEND/RWF_APPEND writes
>> to report where they actually wrote, as that comes close to Zone Append
>> while still making sense at our usual abstraction level for file I/O.
>
> Thanks for suggesting this. O and RWF_APPEND may not go well with block
> access as end-of-file will be picked from dev inode. But perhaps a new
> flag like RWF_ZONE_APPEND can help to transform writes (aio or uring)
> into append without introducing new opcodes.

Yes, RWF_ZONE_APPEND may be better if the semantics of RWF_APPEND cannot be
cleanly reused. But as Christoph said, RWF_ZONE_APPEND semantics need to be
clarified so that all reviewers can check the code against the intended behavior,
and comment on that intended behavior too.

> And, I think, this can fit fine on file-abstraction of ZoneFS as well.

Maybe. It depends on what semantics you are after for the user zone append interface.
Ideally, we should have at least the same for raw block device and zonefs. But
zonefs may be able to do a better job thanks to its real regular file
abstraction of zones. As Christoph said, we started looking into it but lacked
time to complete this work. This is still on-going.

--
Damien Le Moal
Western Digital Research

2020-06-19 04:04:27

by Damien Le Moal

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On 2020/06/19 5:04, Matias Bjørling wrote:
> On 18/06/2020 21.21, Kanchan Joshi wrote:
>> On Thu, Jun 18, 2020 at 10:04:32AM +0200, Matias Bjørling wrote:
>>> On 17/06/2020 19.23, Kanchan Joshi wrote:
>>>> This patchset enables issuing zone-append using aio and io-uring
>>>> direct-io interface.
>>>>
>>>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application
>>>> uses start LBA
>>>> of the zone to issue append. On completion 'res2' field is used to
>>>> return
>>>> zone-relative offset.
>>>>
>>>> For io-uring, this introduces three opcodes:
>>>> IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>>> Since io_uring does not have aio-like res2, cqe->flags are
>>>> repurposed to return zone-relative offset
>>>
>>> Please provide a pointers to applications that are updated and ready
>>> to take advantage of zone append.
>>>
>>> I do not believe it's beneficial at this point to change the libaio
>>> API, applications that would want to use this API, should anyway
>>> switch to use io_uring.
>>>
>>> Please also note that applications and libraries that want to take
>>> advantage of zone append, can already use the zonefs file-system, as
>>> it will use the zone append command when applicable.
>>
>> AFAIK, zonefs uses append while serving synchronous I/O. And append bio
>> is waited upon synchronously. That maybe serving some purpose I do
>> not know currently. But it seems applications using zonefs file
>> abstraction will get benefitted if they could use the append
>> themselves to
>> carry the I/O, asynchronously.
> Yep, please see Christoph's comment regarding adding the support to zonefs.

For the asynchronous processing of zone append in zonefs, we need to add
plumbing in the iomap code first. Since this is missing currently, zonefs can
only do synchronous/blocking zone append for now. Will be working on that, if we
can come up with a semantic that makes sense for posix system calls. zonefs is
not a posix compliant file system, so we are not strongly tied by posix
specifications. But we still want to make it as easy as possible to understand
and use by the user.


--
Damien Le Moal
Western Digital Research

2020-06-19 07:58:57

by Christoph Hellwig

Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring

On Thu, Jun 18, 2020 at 11:22:58PM +0530, Kanchan Joshi wrote:
> I was thinking of raw block-access to zone device rather than pristine file
> abstraction.

Why?

> And in that context, semantics, at this point, are unchanged
> (i.e. same as direct writes) while flexibility of async-interface gets
> added.
> Synchronous-writes on single-zone sound fine, but synchronous-appends on
> single-zone do not sound that fine.

Where does synchronous access come into play?

> > What could be a useful addition is a way for O_APPEND/RWF_APPEND writes
> > to report where they actually wrote, as that comes close to Zone Append
> > while still making sense at our usual abstraction level for file I/O.
>
> Thanks for suggesting this. O and RWF_APPEND may not go well with block
> access as end-of-file will be picked from dev inode.

No, but they go really well with zonefs.

> But perhaps a new
> flag like RWF_ZONE_APPEND can help to transform writes (aio or uring)
> into append without introducing new opcodes.

I don't think this is a good idea. Zones are a concept for a very
specific class of zoned devices. Trying to shoe-horn this into the
byte address files / whole device abstraction not only is ugly
conceptually but also adds the overhead for it to the VFS.

And O_APPEND that returns the written position OTOH makes total sense
at the file level as well and not just for raw zoned devices.