2023-11-30 13:53:57

by Ojaswin Mujoo

[permalink] [raw]
Subject: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

This patch series builds on top of John Gary's atomic direct write
patch series [1] and enables this support in ext4. This is a 2 step
process:

1. Enable aligned allocation in ext4 mballoc. This allows us to allocate
power-of-2 aligned physical blocks, which is needed for atomic writes.

2. Hook the direct IO path in ext4 to use aligned allocation to obtain
physical blocks at a given alignment, which is needed for atomic IO. If
for any reason we are not able to obtain blocks at given alignment we
fail the atomic write.

Currently this RFC does not impose any restrictions for atomic and non-atomic
allocations to any inode, which also leaves policy decisions to user-space
as much as possible. So, for example, the user space can:

* Do an atomic direct IO at any alignment and size provided it
satisfies underlying device constraints. The only restriction for now
is that it should be power of 2 len and atleast of FS block size.

* Do any combination of non atomic and atomic writes on the same file
in any order. As long as the user space is passing the RWF_ATOMIC flag
to pwritev2() it is guaranteed to do an atomic IO (or fail if not
possible).

There are some TODOs on the allocator side which are remaining like...

1. Fallback to original request size when normalized request size (due to
preallocation) allocation is not possible.
2. Testing some edge cases.

But since all the basic test scenarios were covered, hence we wanted to get
this RFC out for discussion on atomic write support for DIO in ext4.

Further points for discussion -

1. We might need an inode flag to identify that the inode has blocks/extents
atomically allocated. So that other userspace tools do not move the blocks of
the inode for e.g. during resize/fsck etc.
a. Should inode be marked as atomic similar to how we have IS_DAX(inode)
implementation? Any thoughts?

2. Should there be support for open flags like O_ATOMIC. So that in case if
user wants to do only atomic writes to an open fd, then all writes can be
considered atomic.

3. Do we need to have any feature compat flags for FS? (IMO) It doesn't look
like since say if there are block allocations done which were done atomically,
it should not matter to FS w.r.t compatibility.

4. Mostly aligned allocations are required when we don't have data=journal
mode. So should we return -EIO with data journalling mode for DIO request?

Script to test using pwritev2() can be found here:
https://gist.github.com/OjaswinM/e67accee3cbb7832bd3f1a9543c01da9

Regards,
ojaswin

[1] https://lore.kernel.org/linux-fsdevel/[email protected]


Ojaswin Mujoo (7):
iomap: Don't fall back to buffered write if the write is atomic
ext4: Factor out size and start prediction from
ext4_mb_normalize_request()
ext4: add aligned allocation support in mballoc
ext4: allow inode preallocation for aligned alloc
block: export blkdev_atomic_write_valid() and refactor api
ext4: Add aligned allocation support for atomic direct io
ext4: Support atomic write for statx

block/fops.c | 18 ++-
fs/ext4/ext4.h | 10 +-
fs/ext4/extents.c | 14 ++
fs/ext4/file.c | 49 ++++++
fs/ext4/inode.c | 142 ++++++++++++++++-
fs/ext4/mballoc.c | 302 +++++++++++++++++++++++++-----------
fs/iomap/direct-io.c | 8 +-
include/linux/blkdev.h | 2 +
include/trace/events/ext4.h | 2 +
9 files changed, 442 insertions(+), 105 deletions(-)

--
2.39.3



2023-11-30 13:54:03

by Ojaswin Mujoo

[permalink] [raw]
Subject: [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic

Currently, iomap only supports atomic writes for direct IOs and there is
no guarantees that a buffered IO will be atomic. Hence, if the user has
explicitly requested the direct write to be atomic and there's a
failure, return -EIO instead of falling back to buffered IO.

Signed-off-by: Ojaswin Mujoo <[email protected]>
---
fs/iomap/direct-io.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6ef25e26f1a1..3e7cd9bc8f4d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -662,7 +662,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret != -EAGAIN) {
trace_iomap_dio_invalidate_fail(inode, iomi.pos,
iomi.len);
- ret = -ENOTBLK;
+ /*
+ * if this write was supposed to be atomic,
+ * return the err rather than trying to fall
+ * back to buffered IO.
+ */
+ if (!atomic_write)
+ ret = -ENOTBLK;
}
goto out_free_dio;
}
--
2.39.3


2023-11-30 13:54:23

by Ojaswin Mujoo

[permalink] [raw]
Subject: [RFC 3/7] ext4: add aligned allocation support in mballoc

Add support in mballoc for allocating blocks that are aligned
to a certain power-of-2 offset.

1. We define a new flag EXT4_MB_ALIGNED_ALLOC to indicate that we want
an aligned allocation.

2. The alignment is determined by the length of the allocation, for
example if we ask for 8192 bytes, then the alignment of physical blocks
will also be 8192 bytes aligned (ie 2 blocks aligned on 4k blocksize).

3. We dont yet support arbitrary alignment. For aligned writes, the
length/alignment must be power of 2 blocks, ie for 4k blocksize we
can get 4k byte aligned, 8k byte aligned, 16k byte aligned ...
allocation but not 12k byte aligned.

4. We use CR_POWER2_ALIGNED criteria for aligned allocation which by
design allocates in an aligned manner. Since CR_POWER2_ALIGNED needs the
ac->ac_g_ex.fe_len to be power of 2, thats where the restriction in
point 3 above comes from. Since right now aligned allocation support is
added mainly for atomic writes use case, this restriction should be fine
since atomic write capable devices usually support only power of 2
alignments

5. For ease of review enabling inode preallocation support is done in
upcoming patches and is disabled in this patch.

6. In case we can't find anything in CR_POWER2_ALIGNED, we return ENOSPC.

Signed-off-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/ext4.h | 6 ++--
fs/ext4/mballoc.c | 69 ++++++++++++++++++++++++++++++++++---
include/trace/events/ext4.h | 1 +
3 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9418359b1d9d..38a77148b85c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -216,9 +216,11 @@ enum criteria {
/* Large fragment size list lookup succeeded at least once for cr = 0 */
#define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED 0x8000
/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
-#define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED 0x00010000
+#define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED 0x10000
/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */
-#define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED 0x00020000
+#define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED 0x20000
+/* The allocation must respect alignment requirements for physical blocks */
+#define EXT4_MB_ALIGNED_ALLOC 0x40000

struct ext4_allocation_request {
/* target inode for block we're allocating */
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 3eb7b639d36e..b1df531e6db3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2150,8 +2150,11 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
* user requested originally, we store allocated
* space in a special descriptor.
*/
- if (ac->ac_o_ex.fe_len < ac->ac_b_ex.fe_len)
+ if (ac->ac_o_ex.fe_len < ac->ac_b_ex.fe_len) {
+ /* Aligned allocation doesn't have preallocation support */
+ WARN_ON(ac->ac_flags & EXT4_MB_ALIGNED_ALLOC);
ext4_mb_new_preallocation(ac);
+ }

}

@@ -2784,10 +2787,15 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)

BUG_ON(ac->ac_status == AC_STATUS_FOUND);

- /* first, try the goal */
- err = ext4_mb_find_by_goal(ac, &e4b);
- if (err || ac->ac_status == AC_STATUS_FOUND)
- goto out;
+ /*
+ * first, try the goal. Skip trying goal for aligned allocations since
+ * goal determination logic is not alignment aware (yet)
+ */
+ if (!(ac->ac_flags & EXT4_MB_ALIGNED_ALLOC)) {
+ err = ext4_mb_find_by_goal(ac, &e4b);
+ if (err || ac->ac_status == AC_STATUS_FOUND)
+ goto out;
+ }

if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
goto out;
@@ -2828,9 +2836,26 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
*/
if (ac->ac_2order)
cr = CR_POWER2_ALIGNED;
+ else
+ WARN_ON(ac->ac_flags & EXT4_MB_ALIGNED_ALLOC &&
+ ac->ac_g_ex.fe_len > 1);
repeat:
for (; cr < EXT4_MB_NUM_CRS && ac->ac_status == AC_STATUS_CONTINUE; cr++) {
ac->ac_criteria = cr;
+
+ if (ac->ac_criteria > CR_POWER2_ALIGNED &&
+ ac->ac_flags & EXT4_MB_ALIGNED_ALLOC &&
+ ac->ac_g_ex.fe_len > 1
+ ) {
+ /*
+ * Aligned allocation only supports power 2 alignment
+ * values which can only be satisfied by
+ * CR_POWER2_ALIGNED. The exception being allocations of
+ * 1 block which can be done via any criteria
+ */
+ break;
+ }
+
/*
* searching for the right group start
* from the goal value specified
@@ -2955,6 +2980,23 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
if (!err && ac->ac_status != AC_STATUS_FOUND && first_err)
err = first_err;

+ if (ac->ac_flags & EXT4_MB_ALIGNED_ALLOC && ac->ac_status == AC_STATUS_FOUND) {
+ ext4_fsblk_t start = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
+ ext4_grpblk_t len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+
+ if (!len) {
+ ext4_warning(sb, "Expected a non zero len extent");
+ ac->ac_status = AC_STATUS_BREAK;
+ goto exit;
+ }
+
+ WARN_ON(!is_power_of_2(len));
+ WARN_ON(start % len);
+ /* We don't support preallocation yet */
+ WARN_ON(ac->ac_b_ex.fe_len != ac->ac_o_ex.fe_len);
+ }
+
+ exit:
mb_debug(sb, "Best len %d, origin len %d, ac_status %u, ac_flags 0x%x, cr %d ret %d\n",
ac->ac_b_ex.fe_len, ac->ac_o_ex.fe_len, ac->ac_status,
ac->ac_flags, cr, err);
@@ -4475,6 +4517,13 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
if (ac->ac_flags & EXT4_MB_HINT_NOPREALLOC)
return;

+ /*
+ * caller may have strict alignment requirements. In this case, avoid
+ * normalization since it is not alignment aware.
+ */
+ if (ac->ac_flags & EXT4_MB_ALIGNED_ALLOC)
+ return;
+
if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC) {
ext4_mb_normalize_group_request(ac);
return ;
@@ -4790,6 +4839,10 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
return false;

+ /* using preallocated blocks is not alignment aware. */
+ if (ac->ac_flags & EXT4_MB_ALIGNED_ALLOC)
+ return false;
+
/*
* first, try per-file preallocation by searching the inode pa rbtree.
*
@@ -6069,6 +6122,12 @@ static bool ext4_mb_discard_preallocations_should_retry(struct super_block *sb,
u64 seq_retry = 0;
bool ret = false;

+ /* No need to retry for aligned allocations */
+ if (ac->ac_flags & EXT4_MB_ALIGNED_ALLOC) {
+ ret = false;
+ goto out_dbg;
+ }
+
freed = ext4_mb_discard_preallocations(sb, ac->ac_o_ex.fe_len);
if (freed) {
ret = true;
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 65029dfb92fb..56895cfb5781 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -36,6 +36,7 @@ struct partial_cluster;
{ EXT4_MB_STREAM_ALLOC, "STREAM_ALLOC" }, \
{ EXT4_MB_USE_ROOT_BLOCKS, "USE_ROOT_BLKS" }, \
{ EXT4_MB_USE_RESERVED, "USE_RESV" }, \
+ { EXT4_MB_ALIGNED_ALLOC, "ALIGNED_ALLOC" }, \
{ EXT4_MB_STRICT_CHECK, "STRICT_CHECK" })

#define show_map_flags(flags) __print_flags(flags, "|", \
--
2.39.3


2023-12-04 10:43:31

by John Garry

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On 30/11/2023 13:53, Ojaswin Mujoo wrote:

Thanks for putting this together.

> This patch series builds on top of John Gary's atomic direct write
> patch series [1] and enables this support in ext4. This is a 2 step
> process:
>
> 1. Enable aligned allocation in ext4 mballoc. This allows us to allocate
> power-of-2 aligned physical blocks, which is needed for atomic writes.
>
> 2. Hook the direct IO path in ext4 to use aligned allocation to obtain
> physical blocks at a given alignment, which is needed for atomic IO. If
> for any reason we are not able to obtain blocks at given alignment we
> fail the atomic write.

So are we supposed to be doing atomic writes on unwritten ranges only in
the file to get the aligned allocations?

I actually tried that, and I got a WARN triggered:

# mkfs.ext4 /dev/sda
mke2fs 1.46.5 (30-Dec-2021)
Creating filesystem with 358400 1k blocks and 89760 inodes
Filesystem UUID: 7543a44b-2957-4ddc-9d4a-db3a5fd019c9
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

[ 12.745889] mkfs.ext4 (150) used greatest stack depth: 13304 bytes left
# mount /dev/sda mnt
[ 12.798804] EXT4-fs (sda): mounted filesystem
7543a44b-2957-4ddc-9d4a-db3a5fd019c9 r/w with ordered data mode. Quota
mode: none.
# touch mnt/file
#
# /test-statx -a /root/mnt/file
statx(/root/mnt/file) = 0
dump_statx results=5fff
Size: 0 Blocks: 0 IO Block: 1024 regular file
Device: 08:00 Inode: 12 Links: 1
Access: (0644/-rw-r--r--) Uid: 0 Gid: 0
Access: 2023-12-04 10:27:40.002848720+0000
Modify: 2023-12-04 10:27:40.002848720+0000
Change: 2023-12-04 10:27:40.002848720+0000
Birth: 2023-12-04 10:27:40.002848720+0000
stx_attributes_mask=0x703874
STATX_ATTR_WRITE_ATOMIC set
unit min: 1024
uunit max: 524288
Attributes: 0000000000400000 (........ ........ ........ ........
........ .?--.... ..---... .---.-..)
#



looks ok so far, then write 4KB at offset 0:

# /test-pwritev2 -a -d -p 0 -l 4096 /root/mnt/file
file=/root/mnt/file write_size=4096 offset=0 o_flags=0x4002 wr_flags=0x24
[ 46.813720] ------------[ cut here ]------------
[ 46.814934] WARNING: CPU: 1 PID: 158 at fs/ext4/mballoc.c:2991
ext4_mb_regular_allocator+0xeca/0xf20
[ 46.816344] Modules linked in:
[ 46.816831] CPU: 1 PID: 158 Comm: test-pwritev2 Not tainted
6.7.0-rc1-00038-gae3807f27e7d-dirty #968
[ 46.818220] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 46.819886] RIP: 0010:ext4_mb_regular_allocator+0xeca/0xf20
[ 46.820734] Code: fd ff ff f0 41 ff 81 e4 03 00 00 e9 63 fd ff ff
90 0f 0b 90 e9 fe f3 ff ff 90 48 c7 c7 e2 7a b2 86 44 89 ca e8 f7 f1
d2 ff 90 <0f> 0b 90 90 45 8b 44 24 3c e9 d4 f3 ff ff 4d 8b 45 08 4c 89
c2 4d
[ 46.823577] RSP: 0018:ffffb77dc056b7c0 EFLAGS: 00010286
[ 46.824379] RAX: 0000000000000000 RBX: ffff9b2ad77dea80 RCX:
0000000000000000
[ 46.825458] RDX: 0000000000000001 RSI: ffff9b2b3491b5c0 RDI:
ffff9b2b3491b5c0
[ 46.826557] RBP: ffff9b2adc7cd000 R08: 0000000000000000 R09:
c0000000ffffdfff
[ 46.827634] R10: ffff9b2adcb9d780 R11: ffffb77dc056b648 R12:
ffff9b2ac6778000
[ 46.828714] R13: ffff9b2adc7cd000 R14: ffff9b2adc7d0000 R15:
000000000000002a
[ 46.829796] FS: 00007f726dece740(0000) GS:ffff9b2b34900000(0000)
knlGS:0000000000000000
[ 46.830706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 46.831299] CR2: 0000000001ed72b8 CR3: 000000001c794006 CR4:
0000000000370ef0
[ 46.832041] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 46.832813] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 46.833546] Call Trace:
[ 46.833901] <TASK>
[ 46.834163] ? __warn+0x78/0x130
[ 46.834504] ? ext4_mb_regular_allocator+0xeca/0xf20
[ 46.835037] ? report_bug+0xf8/0x1e0
[ 46.835527] ? console_unlock+0x45/0xd0
[ 46.835963] ? handle_bug+0x40/0x70
[ 46.836419] ? exc_invalid_op+0x13/0x70
[ 46.836865] ? asm_exc_invalid_op+0x16/0x20
[ 46.837329] ? ext4_mb_regular_allocator+0xeca/0xf20
[ 46.837852] ext4_mb_new_blocks+0x7e8/0xe60
[ 46.838382] ? __kmalloc+0x4b/0x130
[ 46.838824] ? __kmalloc+0x4b/0x130
[ 46.839243] ? ext4_find_extent+0x347/0x360
[ 46.839743] ext4_ext_map_blocks+0xc44/0xff0
[ 46.840395] ext4_map_blocks+0x162/0x5b0
[ 46.841010] ? jbd2__journal_start+0x84/0x1f0
[ 46.841694] ext4_map_blocks_aligned+0x20/0xa0
[ 46.842382] ext4_iomap_begin+0x1e9/0x320
[ 46.843006] iomap_iter+0x16d/0x350
[ 46.843554] __iomap_dio_rw+0x3be/0x830
[ 46.844150] iomap_dio_rw+0x9/0x30
[ 46.844680] ext4_file_write_iter+0x597/0x800
[ 46.845346] do_iter_readv_writev+0xe1/0x150
[ 46.846029] do_iter_write+0x86/0x1f0
[ 46.846638] vfs_writev+0x96/0x190
[ 46.847176] ? do_pwritev+0x98/0xd0
[ 46.847721] do_pwritev+0x98/0xd0
[ 46.848230] ? syscall_trace_enter.isra.19+0x130/0x1b0
[ 46.849028] do_syscall_64+0x42/0xf0
[ 46.849590] entry_SYSCALL_64_after_hwframe+0x6f/0x77
[ 46.850405] RIP: 0033:0x7f726df9666f
[ 46.850964] Code: d5 41 54 49 89 f4 55 89 fd 53 44 89 c3 48 83 ec
18 80 3d bb fd 0b 00 00 74 2a 45 89 c1 49 89 ca 45 31 c0 b8 48 01 00
00 0f 05 <48> 3d 00 f0 ff ff 76 5c 48 8b 15 7a 77 0b 00 f7 d8 64 89 02
48 83
[ 46.854020] RSP: 002b:00007fff28b9bff0 EFLAGS: 00000246 ORIG_RAX:
0000000000000148
[ 46.855178] RAX: ffffffffffffffda RBX: 0000000000000024 RCX:
00007f726df9666f
[ 46.856248] RDX: 0000000000000001 RSI: 00007fff28b9c050 RDI:
0000000000000003
[ 46.857303] RBP: 0000000000000003 R08: 0000000000000000 R09:
0000000000000024
[ 46.858365] R10: 0000000000000000 R11: 0000000000000246 R12:
00007fff28b9c050
[ 46.859407] R13: 0000000000000001 R14: 0000000000000000 R15:
00007f726e08aa60
[ 46.860448] </TASK>
[ 46.860797] ---[ end trace 0000000000000000 ]---
[ 46.861497] EXT4-fs warning (device sda):
ext4_map_blocks_aligned:520: Returned extent couldn't satisfy
alignment requirements
main wrote -1 bytes at offset 0
[ 46.863855] test-pwritev2 (158) used greatest stack depth: 11920
bytes left
#

Please note that I tested on my own dev branch, which contains changes
over [1], but I expect it would not make a difference for this test.


>
> Currently this RFC does not impose any restrictions for atomic and non-atomic
> allocations to any inode, which also leaves policy decisions to user-space
> as much as possible. So, for example, the user space can:
>
> * Do an atomic direct IO at any alignment and size provided it
> satisfies underlying device constraints. The only restriction for now
> is that it should be power of 2 len and atleast of FS block size.
>
> * Do any combination of non atomic and atomic writes on the same file
> in any order. As long as the user space is passing the RWF_ATOMIC flag
> to pwritev2() it is guaranteed to do an atomic IO (or fail if not
> possible).
>
> There are some TODOs on the allocator side which are remaining like...
>
> 1. Fallback to original request size when normalized request size (due to
> preallocation) allocation is not possible.
> 2. Testing some edge cases.
>
> But since all the basic test scenarios were covered, hence we wanted to get
> this RFC out for discussion on atomic write support for DIO in ext4.
>
> Further points for discussion -
>
> 1. We might need an inode flag to identify that the inode has blocks/extents
> atomically allocated. So that other userspace tools do not move the blocks of
> the inode for e.g. during resize/fsck etc.
> a. Should inode be marked as atomic similar to how we have IS_DAX(inode)
> implementation? Any thoughts?
>
> 2. Should there be support for open flags like O_ATOMIC. So that in case if
> user wants to do only atomic writes to an open fd, then all writes can be
> considered atomic.
>
> 3. Do we need to have any feature compat flags for FS? (IMO) It doesn't look
> like since say if there are block allocations done which were done atomically,
> it should not matter to FS w.r.t compatibility.
>
> 4. Mostly aligned allocations are required when we don't have data=journal
> mode. So should we return -EIO with data journalling mode for DIO request?
>
> Script to test using pwritev2() can be found here:
> https://urldefense.com/v3/__https://gist.github.com/OjaswinM/e67accee3cbb7832bd3f1a9543c01da9__;!!ACWV5N9M2RV99hQ!LbVSb-43597CLDmYnhgOwH6MAcikusRh75-4fUbUrA_8go3B6JL1lWJPmhij8siPJE031qtQb6-bpdLEa1qrVA$

Please note that the posix_memalign() call in the program should PAGE align.

Thanks,
John

2023-12-04 13:39:26

by Ojaswin Mujoo

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

Hi John,

Thanks for the review.

On Mon, Dec 04, 2023 at 10:36:01AM +0000, John Garry wrote:
> On 30/11/2023 13:53, Ojaswin Mujoo wrote:
>
> Thanks for putting this together.
>
> > This patch series builds on top of John Gary's atomic direct write
> > patch series [1] and enables this support in ext4. This is a 2 step
> > process:
> >
> > 1. Enable aligned allocation in ext4 mballoc. This allows us to allocate
> > power-of-2 aligned physical blocks, which is needed for atomic writes.
> >
> > 2. Hook the direct IO path in ext4 to use aligned allocation to obtain
> > physical blocks at a given alignment, which is needed for atomic IO. If
> > for any reason we are not able to obtain blocks at given alignment we
> > fail the atomic write.
>
> So are we supposed to be doing atomic writes on unwritten ranges only in the
> file to get the aligned allocations?

If we do an atomic write on a hole, ext4 will give us an aligned extent
provided the hole is big enough to accomodate it.

However, if we do an atomic write on a existing extent (written or
unwritten) ext4 would check if it satisfies the alignment and length
requirement and returns an error if it doesn't. Since we don't have cow
like functionality afaik the only way we could let this kind of write go
through is by punching the pre-existing extent which is not something we
can directly do in the same write operation, hence we return error.

>
> I actually tried that, and I got a WARN triggered:
>
> # mkfs.ext4 /dev/sda
> mke2fs 1.46.5 (30-Dec-2021)
> Creating filesystem with 358400 1k blocks and 89760 inodes
> Filesystem UUID: 7543a44b-2957-4ddc-9d4a-db3a5fd019c9
> Superblock backups stored on blocks:
> 8193, 24577, 40961, 57345, 73729, 204801, 221185
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (8192 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> [ 12.745889] mkfs.ext4 (150) used greatest stack depth: 13304 bytes left
> # mount /dev/sda mnt
> [ 12.798804] EXT4-fs (sda): mounted filesystem
> 7543a44b-2957-4ddc-9d4a-db3a5fd019c9 r/w with ordered data mode. Quota
> mode: none.
> # touch mnt/file
> #
> # /test-statx -a /root/mnt/file
> statx(/root/mnt/file) = 0
> dump_statx results=5fff
> Size: 0 Blocks: 0 IO Block: 1024 regular file
> Device: 08:00 Inode: 12 Links: 1
> Access: (0644/-rw-r--r--) Uid: 0 Gid: 0
> Access: 2023-12-04 10:27:40.002848720+0000
> Modify: 2023-12-04 10:27:40.002848720+0000
> Change: 2023-12-04 10:27:40.002848720+0000
> Birth: 2023-12-04 10:27:40.002848720+0000
> stx_attributes_mask=0x703874
> STATX_ATTR_WRITE_ATOMIC set
> unit min: 1024
> uunit max: 524288
> Attributes: 0000000000400000 (........ ........ ........ ........
> ........ .?--.... ..---... .---.-..)
> #
>
>
>
> looks ok so far, then write 4KB at offset 0:
>
> # /test-pwritev2 -a -d -p 0 -l 4096 /root/mnt/file
> file=/root/mnt/file write_size=4096 offset=0 o_flags=0x4002 wr_flags=0x24
> [ 46.813720] ------------[ cut here ]------------
> [ 46.814934] WARNING: CPU: 1 PID: 158 at fs/ext4/mballoc.c:2991
> ext4_mb_regular_allocator+0xeca/0xf20
> [ 46.816344] Modules linked in:
> [ 46.816831] CPU: 1 PID: 158 Comm: test-pwritev2 Not tainted
> 6.7.0-rc1-00038-gae3807f27e7d-dirty #968
> [ 46.818220] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
> [ 46.819886] RIP: 0010:ext4_mb_regular_allocator+0xeca/0xf20
> [ 46.820734] Code: fd ff ff f0 41 ff 81 e4 03 00 00 e9 63 fd ff ff
> 90 0f 0b 90 e9 fe f3 ff ff 90 48 c7 c7 e2 7a b2 86 44 89 ca e8 f7 f1
> d2 ff 90 <0f> 0b 90 90 45 8b 44 24 3c e9 d4 f3 ff ff 4d 8b 45 08 4c 89
> c2 4d
> [ 46.823577] RSP: 0018:ffffb77dc056b7c0 EFLAGS: 00010286
> [ 46.824379] RAX: 0000000000000000 RBX: ffff9b2ad77dea80 RCX:
> 0000000000000000
> [ 46.825458] RDX: 0000000000000001 RSI: ffff9b2b3491b5c0 RDI:
> ffff9b2b3491b5c0
> [ 46.826557] RBP: ffff9b2adc7cd000 R08: 0000000000000000 R09:
> c0000000ffffdfff
> [ 46.827634] R10: ffff9b2adcb9d780 R11: ffffb77dc056b648 R12:
> ffff9b2ac6778000
> [ 46.828714] R13: ffff9b2adc7cd000 R14: ffff9b2adc7d0000 R15:
> 000000000000002a
> [ 46.829796] FS: 00007f726dece740(0000) GS:ffff9b2b34900000(0000)
> knlGS:0000000000000000
> [ 46.830706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 46.831299] CR2: 0000000001ed72b8 CR3: 000000001c794006 CR4:
> 0000000000370ef0
> [ 46.832041] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 46.832813] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 46.833546] Call Trace:
> [ 46.833901] <TASK>
> [ 46.834163] ? __warn+0x78/0x130
> [ 46.834504] ? ext4_mb_regular_allocator+0xeca/0xf20
> [ 46.835037] ? report_bug+0xf8/0x1e0
> [ 46.835527] ? console_unlock+0x45/0xd0
> [ 46.835963] ? handle_bug+0x40/0x70
> [ 46.836419] ? exc_invalid_op+0x13/0x70
> [ 46.836865] ? asm_exc_invalid_op+0x16/0x20
> [ 46.837329] ? ext4_mb_regular_allocator+0xeca/0xf20
> [ 46.837852] ext4_mb_new_blocks+0x7e8/0xe60
> [ 46.838382] ? __kmalloc+0x4b/0x130
> [ 46.838824] ? __kmalloc+0x4b/0x130
> [ 46.839243] ? ext4_find_extent+0x347/0x360
> [ 46.839743] ext4_ext_map_blocks+0xc44/0xff0
> [ 46.840395] ext4_map_blocks+0x162/0x5b0
> [ 46.841010] ? jbd2__journal_start+0x84/0x1f0
> [ 46.841694] ext4_map_blocks_aligned+0x20/0xa0
> [ 46.842382] ext4_iomap_begin+0x1e9/0x320
> [ 46.843006] iomap_iter+0x16d/0x350
> [ 46.843554] __iomap_dio_rw+0x3be/0x830
> [ 46.844150] iomap_dio_rw+0x9/0x30
> [ 46.844680] ext4_file_write_iter+0x597/0x800
> [ 46.845346] do_iter_readv_writev+0xe1/0x150
> [ 46.846029] do_iter_write+0x86/0x1f0
> [ 46.846638] vfs_writev+0x96/0x190
> [ 46.847176] ? do_pwritev+0x98/0xd0
> [ 46.847721] do_pwritev+0x98/0xd0
> [ 46.848230] ? syscall_trace_enter.isra.19+0x130/0x1b0
> [ 46.849028] do_syscall_64+0x42/0xf0
> [ 46.849590] entry_SYSCALL_64_after_hwframe+0x6f/0x77
> [ 46.850405] RIP: 0033:0x7f726df9666f
> [ 46.850964] Code: d5 41 54 49 89 f4 55 89 fd 53 44 89 c3 48 83 ec
> 18 80 3d bb fd 0b 00 00 74 2a 45 89 c1 49 89 ca 45 31 c0 b8 48 01 00
> 00 0f 05 <48> 3d 00 f0 ff ff 76 5c 48 8b 15 7a 77 0b 00 f7 d8 64 89 02
> 48 83
> [ 46.854020] RSP: 002b:00007fff28b9bff0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000148
> [ 46.855178] RAX: ffffffffffffffda RBX: 0000000000000024 RCX:
> 00007f726df9666f
> [ 46.856248] RDX: 0000000000000001 RSI: 00007fff28b9c050 RDI:
> 0000000000000003
> [ 46.857303] RBP: 0000000000000003 R08: 0000000000000000 R09:
> 0000000000000024
> [ 46.858365] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00007fff28b9c050
> [ 46.859407] R13: 0000000000000001 R14: 0000000000000000 R15:
> 00007f726e08aa60
> [ 46.860448] </TASK>
> [ 46.860797] ---[ end trace 0000000000000000 ]---
> [ 46.861497] EXT4-fs warning (device sda):
> ext4_map_blocks_aligned:520: Returned extent couldn't satisfy
> alignment requirements
> main wrote -1 bytes at offset 0
> [ 46.863855] test-pwritev2 (158) used greatest stack depth: 11920 bytes
> left
> #
>
> Please note that I tested on my own dev branch, which contains changes over
> [1], but I expect it would not make a difference for this test.

Hmm this should not ideally happen, can you please share your test
script with me if possible?

>
>
> >
> > Currently this RFC does not impose any restrictions for atomic and non-atomic
> > allocations to any inode, which also leaves policy decisions to user-space
> > as much as possible. So, for example, the user space can:
> >
> > * Do an atomic direct IO at any alignment and size provided it
> > satisfies underlying device constraints. The only restriction for now
> > is that it should be power of 2 len and atleast of FS block size.
> >
> > * Do any combination of non atomic and atomic writes on the same file
> > in any order. As long as the user space is passing the RWF_ATOMIC flag
> > to pwritev2() it is guaranteed to do an atomic IO (or fail if not
> > possible).
> >
> > There are some TODOs on the allocator side which are remaining like...
> >
> > 1. Fallback to original request size when normalized request size (due to
> > preallocation) allocation is not possible.
> > 2. Testing some edge cases.
> >
> > But since all the basic test scenarios were covered, hence we wanted to get
> > this RFC out for discussion on atomic write support for DIO in ext4.
> >
> > Further points for discussion -
> >
> > 1. We might need an inode flag to identify that the inode has blocks/extents
> > atomically allocated. So that other userspace tools do not move the blocks of
> > the inode for e.g. during resize/fsck etc.
> > a. Should inode be marked as atomic similar to how we have IS_DAX(inode)
> > implementation? Any thoughts?
> >
> > 2. Should there be support for open flags like O_ATOMIC. So that in case if
> > user wants to do only atomic writes to an open fd, then all writes can be
> > considered atomic.
> >
> > 3. Do we need to have any feature compat flags for FS? (IMO) It doesn't look
> > like since say if there are block allocations done which were done atomically,
> > it should not matter to FS w.r.t compatibility.
> >
> > 4. Mostly aligned allocations are required when we don't have data=journal
> > mode. So should we return -EIO with data journalling mode for DIO request?
> >
> > Script to test using pwritev2() can be found here:
> > https://gist.github.com/OjaswinM/e67accee3cbb7832bd3f1a9543c01da9
>
> Please note that the posix_memalign() call in the program should PAGE align.

Why do you say that? direct IO seems to be working when the userspace
buffer is 512 byte aligned, am I missing something?

Regards,
ojaswin

PS: I'm on vacation this week so might be a bit slow to address all the
review comments.

>
> Thanks,
> John

2023-12-04 14:50:10

by John Garry

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On 04/12/2023 13:38, Ojaswin Mujoo wrote:
>> So are we supposed to be doing atomic writes on unwritten ranges only in the
>> file to get the aligned allocations?
> If we do an atomic write on a hole, ext4 will give us an aligned extent
> provided the hole is big enough to accomodate it.
>
> However, if we do an atomic write on a existing extent (written or
> unwritten) ext4 would check if it satisfies the alignment and length
> requirement and returns an error if it doesn't.

This seems a rather big drawback.

> Since we don't have cow
> like functionality afaik the only way we could let this kind of write go
> through is by punching the pre-existing extent which is not something we
> can directly do in the same write operation, hence we return error.

Well, as you prob saw, for XFS we were relying on forcing extent
alignment, and not CoW (yet).

>
>> I actually tried that, and I got a WARN triggered:
>>
>> # mkfs.ext4 /dev/sda
>> mke2fs 1.46.5 (30-Dec-2021)
>> Creating filesystem with 358400 1k blocks and 89760 inodes
>> Filesystem UUID: 7543a44b-2957-4ddc-9d4a-db3a5fd019c9
>> Superblock backups stored on blocks:
>> 8193, 24577, 40961, 57345, 73729, 204801, 221185
>>
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (8192 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> [ 12.745889] mkfs.ext4 (150) used greatest stack depth: 13304 bytes left
>> # mount /dev/sda mnt
>> [ 12.798804] EXT4-fs (sda): mounted filesystem
>> 7543a44b-2957-4ddc-9d4a-db3a5fd019c9 r/w with ordered data mode. Quota
>> mode: none.
>> # touch mnt/file
>> #
>> # /test-statx -a /root/mnt/file
>> statx(/root/mnt/file) = 0
>> dump_statx results=5fff
>> Size: 0 Blocks: 0 IO Block: 1024 regular file
>> Device: 08:00 Inode: 12 Links: 1
>> Access: (0644/-rw-r--r--) Uid: 0 Gid: 0
>> Access: 2023-12-04 10:27:40.002848720+0000
>> Modify: 2023-12-04 10:27:40.002848720+0000
>> Change: 2023-12-04 10:27:40.002848720+0000
>> Birth: 2023-12-04 10:27:40.002848720+0000
>> stx_attributes_mask=0x703874
>> STATX_ATTR_WRITE_ATOMIC set
>> unit min: 1024
>> uunit max: 524288
>> Attributes: 0000000000400000 (........ ........ ........ ........
>> ........ .?--.... ..---... .---.-..)
>> #
>>
>>
>>
>> looks ok so far, then write 4KB at offset 0:
>>
>> # /test-pwritev2 -a -d -p 0 -l 4096 /root/mnt/file
>> file=/root/mnt/file write_size=4096 offset=0 o_flags=0x4002 wr_flags=0x24

...

>> Please note that I tested on my own dev branch, which contains changes over
>> [1], but I expect it would not make a difference for this test.
> Hmm this should not ideally happen, can you please share your test
> script with me if possible?

It's doing nothing special, just RWF_ATOMIC flag is set for DIO write:

https://github.com/johnpgarry/linux/blob/236870d48ecb19c1cf89dc439e188182a0524cd4/samples/vfs/test-pwritev2.c

>>>
>>> Script to test using pwritev2() can be found here:
>>> https://urldefense.com/v3/__https://gist.github.com/OjaswinM/e67accee3cbb7832bd3f1a9543c01da9__;!!ACWV5N9M2RV99hQ!Lr0j4iDHrfXisXOGZ82HNPefBtVDDGe9zbLhey7rRDfPE7A_tsrrQ9Dw_4Ng_qS7xTGCZaEWBKtd6pqA_LIBfA$
>> Please note that the posix_memalign() call in the program should PAGE align.
> Why do you say that? direct IO seems to be working when the userspace
> buffer is 512 byte aligned, am I missing something?

ah, sorry, if you just use 1x IOV vector then no special alignment are
required, so ignore this. Indeed, I need to improve kernel checks for
alignment anyway.

Thanks,
John


2023-12-04 18:17:36

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic

On Mon, Dec 04, 2023 at 09:02:56AM +0000, John Garry wrote:
> On 01/12/2023 22:07, Dave Chinner wrote:
> > > Sure, and I think that we need a better story for supporting buffered IO for
> > > atomic writes.
> > >
> > > Currently we have:
> > > - man pages tell us RWF_ATOMIC is only supported for direct IO
> > > - statx gives atomic write unit min/max, not explicitly telling us it's for
> > > direct IO
> > > - RWF_ATOMIC is ignored for !O_DIRECT
> > >
> > > So I am thinking of expanding statx support to enable querying of atomic
> > > write capabilities for buffered IO and direct IO separately.
> > You're over complicating this way too much by trying to restrict the
> > functionality down to just what you want to implement right now.
> >
> > RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
> > what can be supported - the filesystems themselves decide what part
> > of the API they can support and implement those pieces.
>
> Sure, but for RWF_ATOMIC we still have the associated statx call to tell us
> whether atomic writes are supported for a file and the specific range
> capability.
>
> >
> > TO go back to RWF_NOWAIT, for a long time we (XFS) only supported
> > RWF_NOWAIT on DIO, and buffered reads and writes were given
> > -EOPNOTSUPP by the filesystem. Then other filesystems started
> > supporting DIO with RWF_NOWAIT. Then buffered read support was added
> > to the page cache and XFS, and as other filesystems were converted
> > they removed the RWF_NOWAIT exclusion check from their read IO
> > paths.
> >
> > We are now in the same place with buffered write support for
> > RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
> > RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
> > because they don't support non-blocking buffered writes yet.
> >
> > This is the same model we should be applying with RWF_ATOMIC - we
> > know that over time we'll be able to expand support for atomic
> > writes across both direct and buffered IO, so we should not be
> > restricting the API or infrastructure to only allow RWF_ATOMIC w/
> > DIO.
>
> Agreed.
>
> > Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
> > they don't support it,
>
> Yes, I was going to add this regardless.
>
> > and for those that do it is conditional on
> > whther the filesystem supports it for the given type of IO being
> > done.
> >
> > Seriously - an application can easily probe for RWF_ATOMIC support
> > without needing information to be directly exposed in statx() - just
> > open a O_TMPFILE, issue the type of RWF_ATOMIC IO you require to be
> > supported, and if it returns -EOPNOTSUPP then it you can't use
> > RWF_ATOMIC optimisations in the application....
>
> ok, if that is the done thing.
>
> So I can't imagine that atomic write unit range will be different for direct
> IO and buffered IO (ignoring for a moment Christoph's idea for CoW always
> for no HW offload) when supported. But it seems that we may have a scenario
> where statx tells is that atomic writes are supported for a file, and a DIO
> write succeeds and a buffered IO write may return -EOPNOTSUPP. If that's
> acceptable then I'll work towards that.
>
> If we could just run statx on a file descriptor here then that would be
> simpler...

statx(fd, "", AT_EMPTY_PATH, ...); ?

--D

> Thanks,
> John
>
>
>

2023-12-12 13:11:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On Tue, Dec 12, 2023 at 07:46:51AM +0000, John Garry wrote:
> It is assumed that the user will fallocate/dd the complete file before
> issuing atomic writes, and we will have extent alignment and length as
> required.

I don't think that's a long time maintainable usage model.

2023-12-12 15:18:51

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On Tue, Dec 12, 2023 at 05:10:42AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 12, 2023 at 07:46:51AM +0000, John Garry wrote:
> > It is assumed that the user will fallocate/dd the complete file before
> > issuing atomic writes, and we will have extent alignment and length as
> > required.
>
> I don't think that's a long time maintainable usage model.

For databases that are trying to use this to significantly improve
their performance by eliminating double writes, the allocation and
writes are being done by a single process. So for *that* use case, it
is quite maintainable.

Cheers,

- Ted

2023-12-12 16:11:33

by John Garry

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On 12/12/2023 13:10, Christoph Hellwig wrote:
> On Tue, Dec 12, 2023 at 07:46:51AM +0000, John Garry wrote:
>> It is assumed that the user will fallocate/dd the complete file before
>> issuing atomic writes, and we will have extent alignment and length as
>> required.
> I don't think that's a long time maintainable usage model.

ok, sure - as I may have mentioned, this was even causing issues in porting.

I sent the v2 series without XFS support, but the same API proposal as
before. Let's discuss any potential API changes there.

Thanks,
John

2023-12-13 06:00:31

by Ojaswin Mujoo

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On Tue, Dec 12, 2023 at 07:46:51AM +0000, John Garry wrote:
> On 11/12/2023 10:54, Ojaswin Mujoo wrote:
> > > This seems a rather big drawback.
> > So if I'm not wrong, we force the extent alignment as well as the size
> > of the extent in xfs right.
>
> For XFS in my v1 patchset, we only force alignment (but not size).
>
> It is assumed that the user will fallocate/dd the complete file before
> issuing atomic writes, and we will have extent alignment and length as
> required.
>
> However - as we have seen with a trial user - it can create a problem if we
> don't do that and we write 4K and then overwrite with a 16K atomic write to
> a file, as 2x extents may be allocated for the complete 16K and it cannot be
> issued as a single BIO.

So currently, if we don't fallocate beforehand in xfs and the user
tries to do the 16k overwrite to an offset having a 4k extent, how are
we handling it?

Here ext4 will return an error indicating atomic write can't happen at
this particular offset. The way I see it is if the user passes atomic
flag to pwritev2 and we are unable to ensure atomicity for any reason we
return error, which seems like a fair approach for a generic interface.

>
> >
> > We didn't want to overly restrict the users of atomic writes by
> > forcing
> > the extents to be of a certain alignment/size irrespective of the
> > size
> > of write. The design in this patchset provides this flexibility at
> > the
> > cost of some added precautions that the user should take (eg not
> > doing
> > an atomic write on a pre existing unaligned extent etc).
>
> Doesn't bigalloc already give you what you require here?

Yes, but its an mkfs time feature and it also applies to each an every
file which might not be desirable for all use cases.

Regards,
ojaswin

2023-12-13 06:43:01

by Ojaswin Mujoo

[permalink] [raw]
Subject: Re: [RFC 0/7] ext4: Allocator changes for atomic write support with DIO

On Mon, Dec 11, 2023 at 04:24:14PM +0530, Ojaswin Mujoo wrote:
> > > > looks ok so far, then write 4KB at offset 0:
> > > >
> > > > # /test-pwritev2 -a -d -p 0 -l 4096 /root/mnt/file
> > > > file=/root/mnt/file write_size=4096 offset=0 o_flags=0x4002 wr_flags=0x24
> >
> > ...
> >
> > > > Please note that I tested on my own dev branch, which contains changes over
> > > > [1], but I expect it would not make a difference for this test.
> > > Hmm this should not ideally happen, can you please share your test
> > > script with me if possible?
> >
> > It's doing nothing special, just RWF_ATOMIC flag is set for DIO write:
> >
> > https://github.com/johnpgarry/linux/blob/236870d48ecb19c1cf89dc439e188182a0524cd4/samples/vfs/test-pwritev2.c
>
> Thanks for the script, will try to replicate this today and get back to
> you.
>

Hi John,

So I don't seem to be able to hit the warn on:

$ touch mnt/testfile
$ ./test-pwritev2 -a -d -p 0 -l 4096 mnt/testfile

file=mnt/testfile write_size=4096 offset=0 o_flags=0x4002 wr_flags=0x24
main wrote 4096 bytes at offset 0

$ filefrag -v mnt/testfile

Filesystem type is: ef53
File size of testfile is 4096 (1 block of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 32900.. 32900: 1: last,eof

$ ./test-pwritev2 -a -d -p 8192 -l 8192 mnt/testfile

file=mnt/testfile write_size=8192 offset=8192 o_flags=0x4002 wr_flags=0x24
main wrote 8192 bytes at offset 8192

$ filefrag -v mnt/testfile

Filesystem type is: ef53
File size of mnt/testfile is 16384 (4 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 32900.. 32900: 1:
1: 2.. 3: 33288.. 33289: 2: 32902: last,eof
mnt/testfile: 2 extents found

Not sure why you are hitting the WARN_ON. The tree I used is:

Latest ted/dev + your atomic patchset v1 + this patchset

Regards,
ojaswin