LinuxLists.cc - [PATCH] loop: Disable fallocate() zero and discard if not supported

2024-06-07 09:16:40

Subject: [PATCH] loop: Disable fallocate() zero and discard if not supported

If fallcate is implemented but zero and discard operations are not
supported by the filesystem the backing file is on we continue to fill
dmesg with errors from the blk_mq_end_request() since each time we call
fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
ends up propagated into the block layer. In the end syscall succeeds
since the blkdev_issue_zeroout() falls back to writing zeroes which
makes the errors even more misleading and confusing.

How to reproduce:

1. make sure /tmp is mounted as tmpfs
2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
3. losetup /dev/loop0 /tmp/disk.img
4. mkfs.ext2 /dev/loop0
5. dmesg |tail

[710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0

This patch changes the lo_fallocate() to clear the flags for zero and
discard operations if we get EOPNOTSUPP from the backing file fallocate
callback, that way we at least stop spewing errors after the first
unsuccessful try.

CC: Jan Kara <[email protected]>
Signed-off-by: Cyril Hrubis <[email protected]>
---
drivers/block/loop.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 93780f41646b..315c76e3ef4a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -320,6 +320,21 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq));
if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
return -EIO;
+
+ if (ret == -EOPNOTSUPP) {
+ struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
+
+ if (mode & FALLOC_FL_ZERO_RANGE)
+ lim.max_write_zeroes_sectors = 0;
+
+ if (mode & FALLOC_FL_PUNCH_HOLE) {
+ lim.max_hw_discard_sectors = 0;
+ lim.discard_granularity = 0;
+ }
+
+ queue_limits_commit_update(lo->lo_queue, &lim);
+ }
+
return ret;
}

--
2.44.2

2024-06-08 08:47:10

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH] loop: Disable fallocate() zero and discard if not supported

On Fri, Jun 07, 2024 at 11:15:55AM +0200, Cyril Hrubis wrote:
> If fallcate is implemented but zero and discard operations are not
> supported by the filesystem the backing file is on we continue to fill
> dmesg with errors from the blk_mq_end_request() since each time we call
> fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
> ends up propagated into the block layer. In the end syscall succeeds
> since the blkdev_issue_zeroout() falls back to writing zeroes which
> makes the errors even more misleading and confusing.
>
> How to reproduce:
>
> 1. make sure /tmp is mounted as tmpfs
> 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
> 3. losetup /dev/loop0 /tmp/disk.img
> 4. mkfs.ext2 /dev/loop0
> 5. dmesg |tail

Can you wire this up for blktests?

> + if (ret == -EOPNOTSUPP) {
> + struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
> +
> + if (mode & FALLOC_FL_ZERO_RANGE)
> + lim.max_write_zeroes_sectors = 0;
> +
> + if (mode & FALLOC_FL_PUNCH_HOLE) {
> + lim.max_hw_discard_sectors = 0;
> + lim.discard_granularity = 0;
> + }
> +
> + queue_limits_commit_update(lo->lo_queue, &lim);

Please split this out into a separate helper to keep it out of the
main fast path I/O handling. A little comment that we are
optimistically trying these if ->fallocate is support and might have
to paddle back here would also be useful.

(and maybe one day we figure out a way for the file system to
advertise what fallocate modes it actually supports..)

2024-06-10 02:47:44

by Chaitanya Kulkarni

[permalink] [raw]

Subject: Re: [PATCH] loop: Disable fallocate() zero and discard if not supported

On 6/7/2024 10:42 PM, Christoph Hellwig wrote:
> On Fri, Jun 07, 2024 at 11:15:55AM +0200, Cyril Hrubis wrote:
>> If fallcate is implemented but zero and discard operations are not
>> supported by the filesystem the backing file is on we continue to fill
>> dmesg with errors from the blk_mq_end_request() since each time we call
>> fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
>> ends up propagated into the block layer. In the end syscall succeeds
>> since the blkdev_issue_zeroout() falls back to writing zeroes which
>> makes the errors even more misleading and confusing.
>>
>> How to reproduce:
>>
>> 1. make sure /tmp is mounted as tmpfs
>> 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
>> 3. losetup /dev/loop0 /tmp/disk.img
>> 4. mkfs.ext2 /dev/loop0
>> 5. dmesg |tail
> Can you wire this up for blktests?

Please CC Shin'ichiro Kawasaki <[email protected]> and me when
you do that ..

-ck

2024-06-10 14:14:19

by Cyril Hrubis

[permalink] [raw]

Subject: Re: [PATCH] loop: Disable fallocate() zero and discard if not supported

Hi!
> > How to reproduce:
> >
> > 1. make sure /tmp is mounted as tmpfs
> > 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
> > 3. losetup /dev/loop0 /tmp/disk.img
> > 4. mkfs.ext2 /dev/loop0
> > 5. dmesg |tail
>
> Can you wire this up for blktests?

Will try.

> > + if (ret == -EOPNOTSUPP) {
> > + struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
> > +
> > + if (mode & FALLOC_FL_ZERO_RANGE)
> > + lim.max_write_zeroes_sectors = 0;
> > +
> > + if (mode & FALLOC_FL_PUNCH_HOLE) {
> > + lim.max_hw_discard_sectors = 0;
> > + lim.discard_granularity = 0;
> > + }
> > +
> > + queue_limits_commit_update(lo->lo_queue, &lim);
>
> Please split this out into a separate helper to keep it out of the
> main fast path I/O handling. A little comment that we are
> optimistically trying these if ->fallocate is support and might have
> to paddle back here would also be useful.

Will do.

Do we need noinline attribute for the function as well or unlikely() in
the if condition?

> (and maybe one day we figure out a way for the file system to
> advertise what fallocate modes it actually supports..)

One of my ideas was to try fallocate() with zero size in the
loop_reconfigure_limits() to see if we get EOPNOTSUPP but for that to
work we would have to make sure that we do not bail early on zero size
in the vfs layer...

--
Cyril Hrubis
[email protected]

2024-06-10 14:15:25

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH] loop: Disable fallocate() zero and discard if not supported

On Mon, Jun 10, 2024 at 04:13:23PM +0200, Cyril Hrubis wrote:
> Do we need noinline attribute for the function as well or unlikely() in
> the if condition?

unlikely sounds like the right thing here.

> > (and maybe one day we figure out a way for the file system to
> > advertise what fallocate modes it actually supports..)
>
> One of my ideas was to try fallocate() with zero size in the
> loop_reconfigure_limits() to see if we get EOPNOTSUPP but for that to
> work we would have to make sure that we do not bail early on zero size
> in the vfs layer...

And the VFS layer doesn't know, it has to go all the way into the
file system..