2021-02-12 09:42:00

by Lukas Czerner

Subject: [PATCH] mmp: do not use O_DIRECT when working with regular file

Currently the mmp block is read using O_DIRECT to avoid any caching that
may be done by the VM. However when working with regular files this
creates alignment issues when the device of the host file system has
sector size smaller than the blocksize of the file system in the file
we're working with.

This can be reproduced with t_mmp_fail test when run on the device with
4k sector size because the mke2fs fails when trying to read the mmp
block.

Fix it by disabling O_DIRECT when working with regular file. I don't
think there is any risk of doing so since the file system layer, unlike
shared block device, should guarantee cache consistency.

Signed-off-by: Lukas Czerner <[email protected]>
---
lib/ext2fs/mmp.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
index c21ae272..1ac22194 100644
--- a/lib/ext2fs/mmp.c
+++ b/lib/ext2fs/mmp.c
@@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
* regardless of how the io_manager is doing reads, to avoid caching of
* the MMP block by the io_manager or the VM. It needs to be fresh. */
if (fs->mmp_fd <= 0) {
+ struct stat st;
int flags = O_RDWR | O_DIRECT;

-retry:
+ /*
+ * There is no reason for using O_DIRECT if we're working with
+ * regular file. Disabling it also avoids problems with
+ * alignment when the device of the host file system has sector
+ * size smaller than blocksize of the fs we're working with.
+ */
+ if (stat(fs->device_name, &st) == 0 &&
+ S_ISREG(st.st_mode))
+ flags &= ~O_DIRECT;
+
fs->mmp_fd = open(fs->device_name, flags);
if (fs->mmp_fd < 0) {
- struct stat st;
-
- /* Avoid O_DIRECT for filesystem image files if open
- * fails, since it breaks when running on tmpfs. */
- if (errno == EINVAL && (flags & O_DIRECT) &&
- stat(fs->device_name, &st) == 0 &&
- S_ISREG(st.st_mode)) {
- flags &= ~O_DIRECT;
- goto retry;
- }
retval = EXT2_ET_MMP_OPEN_DIRECT;
goto out;
}
--
2.26.2


2021-02-16 21:25:40

by Eric Sandeen

Subject: Re: [PATCH] mmp: do not use O_DIRECT when working with regular file

On 2/12/21 3:37 AM, Lukas Czerner wrote:
> Currently the mmp block is read using O_DIRECT to avoid any caching that
> may be done by the VM. However when working with regular files this
> creates alignment issues when the device of the host file system has
> sector size smaller than the blocksize of the file system in the file
> we're working with.
>
> This can be reproduced with t_mmp_fail test when run on the device with
> 4k sector size because the mke2fs fails when trying to read the mmp
> block.
>
> Fix it by disabling O_DIRECT when working with regular file. I don't
> think there is any risk of doing so since the file system layer, unlike
> shared block device, should guarantee cache consistency.
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> lib/ext2fs/mmp.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
> index c21ae272..1ac22194 100644
> --- a/lib/ext2fs/mmp.c
> +++ b/lib/ext2fs/mmp.c
> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
> * regardless of how the io_manager is doing reads, to avoid caching of
> * the MMP block by the io_manager or the VM. It needs to be fresh. */
> if (fs->mmp_fd <= 0) {
> + struct stat st;
> int flags = O_RDWR | O_DIRECT;
>
> -retry:
> + /*
> + * There is no reason for using O_DIRECT if we're working with
> + * regular file. Disabling it also avoids problems with
> + * alignment when the device of the host file system has sector
> + * size smaller than blocksize of the fs we're working with.

I think the problem is when the host filesystem that contains the image is on
a device with a logical sector size which is /larger/ than the image filesystem's
block size, right? Not smaller?

Because then you might not be able to do an image-filesystem-block-aligned direct
IO on it, if it's sub-logical-block-size for the host filesystem/device, and lands
within the larger host sector at an offset?
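The alignment rule described above can be sketched as a small predicate. This is only an illustration, not code from e2fsprogs: the name dio_request_aligned is hypothetical, and the check covers file offset and length only (O_DIRECT usually also constrains the alignment of the user buffer, which is not modelled here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a direct I/O request of `len`
 * bytes at byte offset `off` both starts and ends on a logical sector
 * boundary of the underlying device.
 */
bool dio_request_aligned(uint64_t off, uint64_t len, uint32_t sector_size)
{
	assert(sector_size != 0);
	return (off % sector_size == 0) && (len % sector_size == 0);
}
```

With a 1k image block size on a host device with 4k logical sectors, image block 1 sits at byte offset 1024 and fails this check, while the same request passes on a 512-byte-sector device.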

otherwise, this seems at least as reasonable to me as the previous tmpfs work
around, so other than the question about the comment,

Reviewed-by: Eric Sandeen <[email protected]>


> + */
> + if (stat(fs->device_name, &st) == 0 &&
> + S_ISREG(st.st_mode))
> + flags &= ~O_DIRECT;
> +
> fs->mmp_fd = open(fs->device_name, flags);
> if (fs->mmp_fd < 0) {
> - struct stat st;
> -
> - /* Avoid O_DIRECT for filesystem image files if open
> - * fails, since it breaks when running on tmpfs. */
> - if (errno == EINVAL && (flags & O_DIRECT) &&
> - stat(fs->device_name, &st) == 0 &&
> - S_ISREG(st.st_mode)) {
> - flags &= ~O_DIRECT;
> - goto retry;
> - }
> retval = EXT2_ET_MMP_OPEN_DIRECT;
> goto out;
> }
>

2021-02-16 21:58:00

by Lukas Czerner

Subject: Re: [PATCH] mmp: do not use O_DIRECT when working with regular file

On Tue, Feb 16, 2021 at 03:24:00PM -0600, Eric Sandeen wrote:
> On 2/12/21 3:37 AM, Lukas Czerner wrote:
> > Currently the mmp block is read using O_DIRECT to avoid any caching that
> > may be done by the VM. However when working with regular files this
> > creates alignment issues when the device of the host file system has
> > sector size smaller than the blocksize of the file system in the file
> > we're working with.
> >
> > This can be reproduced with t_mmp_fail test when run on the device with
> > 4k sector size because the mke2fs fails when trying to read the mmp
> > block.
> >
> > Fix it by disabling O_DIRECT when working with regular file. I don't
> > think there is any risk of doing so since the file system layer, unlike
> > shared block device, should guarantee cache consistency.
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > ---
> > lib/ext2fs/mmp.c | 22 +++++++++++-----------
> > 1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
> > index c21ae272..1ac22194 100644
> > --- a/lib/ext2fs/mmp.c
> > +++ b/lib/ext2fs/mmp.c
> > @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
> > * regardless of how the io_manager is doing reads, to avoid caching of
> > * the MMP block by the io_manager or the VM. It needs to be fresh. */
> > if (fs->mmp_fd <= 0) {
> > + struct stat st;
> > int flags = O_RDWR | O_DIRECT;
> >
> > -retry:
> > + /*
> > + * There is no reason for using O_DIRECT if we're working with
> > + * regular file. Disabling it also avoids problems with
> > + * alignment when the device of the host file system has sector
> > + * size smaller than blocksize of the fs we're working with.
>
> I think the problem is when the host filesystem that contains the image is on
> a device with a logical sector size which is /larger/ than the image filesystem's
> block size, right? Not smaller?

Yeah, it is supposed to be *larger*, of course. If it is smaller, then
there is no problem. Thanks for pointing this out; I'll change the
comment and the description.

>
> Because then you might not be able to do an image-filesystem-block-aligned direct
> IO on it, if it's sub-logical-block-size for the host filesystem/device, and lands
> within the larger host sector at an offset?
>
> otherwise, this seems at least as reasonable to me as the previous tmpfs work
> around, so other than the question about the comment,
>
> Reviewed-by: Eric Sandeen <[email protected]>

Thanks!
-Lukas

>
>
> > + */
> > + if (stat(fs->device_name, &st) == 0 &&
> > + S_ISREG(st.st_mode))
> > + flags &= ~O_DIRECT;
> > +
> > fs->mmp_fd = open(fs->device_name, flags);
> > if (fs->mmp_fd < 0) {
> > - struct stat st;
> > -
> > - /* Avoid O_DIRECT for filesystem image files if open
> > - * fails, since it breaks when running on tmpfs. */
> > - if (errno == EINVAL && (flags & O_DIRECT) &&
> > - stat(fs->device_name, &st) == 0 &&
> > - S_ISREG(st.st_mode)) {
> > - flags &= ~O_DIRECT;
> > - goto retry;
> > - }
> > retval = EXT2_ET_MMP_OPEN_DIRECT;
> > goto out;
> > }
> >
>

2021-02-18 11:36:45

by Lukas Czerner

Subject: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

Currently the mmp block is read using O_DIRECT to avoid any caching that
may be done by the VM. However when working with regular files this
creates alignment issues when the device of the host file system has
sector size larger than the blocksize of the file system in the file
we're working with.

This can be reproduced with t_mmp_fail test when run on the device with
4k sector size because the mke2fs fails when trying to read the mmp
block.

Fix it by disabling O_DIRECT when working with regular files. I don't
think there is any risk of doing so since the file system layer, unlike
shared block device, should guarantee cache consistency.

Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Eric Sandeen <[email protected]>
---
v2: Fix comment - it avoids problems when the sector size is larger not
smaller than blocksize

lib/ext2fs/mmp.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
index c21ae272..cca2873b 100644
--- a/lib/ext2fs/mmp.c
+++ b/lib/ext2fs/mmp.c
@@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
* regardless of how the io_manager is doing reads, to avoid caching of
* the MMP block by the io_manager or the VM. It needs to be fresh. */
if (fs->mmp_fd <= 0) {
+ struct stat st;
int flags = O_RDWR | O_DIRECT;

-retry:
+ /*
+ * There is no reason for using O_DIRECT if we're working with
+ * regular file. Disabling it also avoids problems with
+ * alignment when the device of the host file system has sector
+ * size larger than blocksize of the fs we're working with.
+ */
+ if (stat(fs->device_name, &st) == 0 &&
+ S_ISREG(st.st_mode))
+ flags &= ~O_DIRECT;
+
fs->mmp_fd = open(fs->device_name, flags);
if (fs->mmp_fd < 0) {
- struct stat st;
-
- /* Avoid O_DIRECT for filesystem image files if open
- * fails, since it breaks when running on tmpfs. */
- if (errno == EINVAL && (flags & O_DIRECT) &&
- stat(fs->device_name, &st) == 0 &&
- S_ISREG(st.st_mode)) {
- flags &= ~O_DIRECT;
- goto retry;
- }
retval = EXT2_ET_MMP_OPEN_DIRECT;
goto out;
}
--
2.26.2

2021-02-18 22:21:52

by Andreas Dilger

Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

On Feb 18, 2021, at 2:51 AM, Lukas Czerner <[email protected]> wrote:
>
> Currently the mmp block is read using O_DIRECT to avoid any caching that
> may be done by the VM. However when working with regular files this
> creates alignment issues when the device of the host file system has
> sector size larger than the blocksize of the file system in the file
> we're working with.
>
> This can be reproduced with t_mmp_fail test when run on the device with
> 4k sector size because the mke2fs fails when trying to read the mmp
> block.
>
> Fix it by disabling O_DIRECT when working with regular files. I don't
> think there is any risk of doing so since the file system layer, unlike
> shared block device, should guarantee cache consistency.
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Reviewed-by: Eric Sandeen <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
> v2: Fix comment - it avoids problems when the sector size is larger not
> smaller than blocksize
>
> lib/ext2fs/mmp.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
> index c21ae272..cca2873b 100644
> --- a/lib/ext2fs/mmp.c
> +++ b/lib/ext2fs/mmp.c
> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
> * regardless of how the io_manager is doing reads, to avoid caching of
> * the MMP block by the io_manager or the VM. It needs to be fresh. */
> if (fs->mmp_fd <= 0) {
> + struct stat st;
> int flags = O_RDWR | O_DIRECT;
>
> -retry:
> + /*
> + * There is no reason for using O_DIRECT if we're working with
> + * regular file. Disabling it also avoids problems with
> + * alignment when the device of the host file system has sector
> + * size larger than blocksize of the fs we're working with.
> + */
> + if (stat(fs->device_name, &st) == 0 &&
> + S_ISREG(st.st_mode))
> + flags &= ~O_DIRECT;
> +
> fs->mmp_fd = open(fs->device_name, flags);
> if (fs->mmp_fd < 0) {
> - struct stat st;
> -
> - /* Avoid O_DIRECT for filesystem image files if open
> - * fails, since it breaks when running on tmpfs. */
> - if (errno == EINVAL && (flags & O_DIRECT) &&
> - stat(fs->device_name, &st) == 0 &&
> - S_ISREG(st.st_mode)) {
> - flags &= ~O_DIRECT;
> - goto retry;
> - }
> retval = EXT2_ET_MMP_OPEN_DIRECT;
> goto out;
> }
> --
> 2.26.2
>


Cheers, Andreas







2021-02-19 10:09:47

by Alexey Lyahkov

Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

Andreas,

What about disabling O_DIRECT globally for all block devices in the e2fsprogs library, since it doesn't work on 4k disk drives at all,
instead of fixing O_DIRECT access with the patches sent earlier?


Alex


2021-02-19 11:01:46

by Lukas Czerner

Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

On Fri, Feb 19, 2021 at 01:08:17PM +0300, Alexey Lyashkov wrote:
> Andreas,
>
> What about to disable a O_DIRECT global on any block devices in the e2fsprogs library as this don’t work on 4k disk drives at all ?
> Instead of fixing an O_DIRECT access with patches sends early.

Why would it not work at all? This is a fix for a specific problem and
I am not currently aware of any other problems e2fsprogs should have
with 4k sector size drives. Do you have a specific problem in mind?

Thanks!
-Lukas


2021-02-19 11:50:40

by Alexey Lyahkov

Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

Lukas,

It is because e2fsprogs makes a bad assumption about the I/O size in the O_DIRECT case.
The library uses code like:

    set_block_size(1k);
    seek(fs, 1);
    read_block();

This causes a 1k read inside a 4k disk block that is not aligned to the disk block size, which is prohibited and results in an error.

Reference to patch.
https://patchwork.ozlabs.org/project/linux-ext4/patch/[email protected]/

Alex


2021-02-19 13:37:25

by Lukas Czerner

Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote:
> Lukas,
>
> because e2fsprogs have an bad assumption about IO size for the O_DIRECT case.
> and because library uses a code like
> >>
> set_block_size(1k);
> seek(fs, 1);
> read_block();
> >>>
> which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report.
>
> Reference to patch.
> https://patchwork.ozlabs.org/project/linux-ext4/patch/[email protected]/

Alright, I skimmed through your patch proposal and I am not sure I
completely understand the problem because you have not provided the code
adding O_DIRECT support for e2image.

However I think that it is a reasonable assumption to make that there is
not going to be a file system on a block device such that the fs blocksize
is smaller than device sector size. You can't create such fs with mke2fs
and you can't mount such file system either.

All that said I can now see that there is a problem in case of mke2fs
and debugfs when used with O_DIRECT (-D) on a file system image with 1k
block size stored on a file in the host file system on the block device
with sector size larger than 1k (...I am getting Inception flashbacks now)

In fact I can confirm that indeed, both mke2fs and debugfs will fail in
such a scenario. The question is whether we care enough to support
O_DIRECT in such situations. Personally I don't care enough about this.
However it would be nice to at least have a check (probably in
ext2fs_open2, unix_open_channel or such) and notify the user about the
problem.
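A rough idea of what such a check could look like is sketched below. Nothing here is existing e2fsprogs code: logical_sector_size and direct_io_blocksize_ok are hypothetical names, the sketch is Linux-specific (it relies on the BLKSSZGET ioctl), and it only handles the case where the fd refers to a block device. For an image file, the sector size of the device under the host file system would have to be found some other way.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>		/* BLKSSZGET */

/*
 * Hypothetical helper: logical sector size of the block device behind
 * `fd`, or 0 when `fd` is not a block device or the size is unknown.
 */
unsigned int logical_sector_size(int fd)
{
	struct stat st;
	int ssz = 0;

	if (fstat(fd, &st) != 0 || !S_ISBLK(st.st_mode))
		return 0;
	if (ioctl(fd, BLKSSZGET, &ssz) != 0 || ssz <= 0)
		return 0;
	return (unsigned int)ssz;
}

/*
 * Hypothetical check: direct I/O in units of `fs_blocksize` can only be
 * sector-aligned when the block size is a multiple of the sector size.
 * An unknown sector size (0) is treated as "cannot tell, allow".
 */
bool direct_io_blocksize_ok(unsigned int fs_blocksize,
			    unsigned int sector_size)
{
	assert(fs_blocksize != 0);
	if (sector_size == 0)
		return true;
	return fs_blocksize % sector_size == 0;
}
```

A caller opening a device with direct I/O requested could warn the user, or fall back to buffered I/O, when direct_io_blocksize_ok() returns false.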

Note that this conversation does not affect my patch since
ext2fs_mmp_read() does not use the unix_io infrastructure.

-Lukas


2021-02-19 13:54:26

by Alexey Lyahkov

Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file



> On Feb 19, 2021, at 16:34, Lukas Czerner <[email protected]> wrote:
>
> On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote:
>> Lukas,
>>
>> because e2fsprogs have an bad assumption about IO size for the O_DIRECT case.
>> and because library uses a code like
>>>>
>> set_block_size(1k);
>> seek(fs, 1);
>> read_block();
>>>>>
>> which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report.
>>
>> Reference to patch.
>> https://patchwork.ozlabs.org/project/linux-ext4/patch/[email protected]/
>
> Alright, I skimmed through your patch proposal and I am not sure I
> completely understand the problem because you have not provided the code
> adding O_DIRECT support for e2image.

debugfs -D … will hit the same problem.

>
> However I think that it is a reasonable assumption to make that there is
> not going to be a file system on a block device such that the fs blocksize
> is smaller than device sector size. You can't create such fs with mke2fs
> and you can't mount such file system either.
>
You don't need to create such a FS; calling ext2fs_open2 is enough.


> All that said I can now see that there is a problem in case of mke2fs
> and debugfs when used with O_DIRECT (-D) on a file system image with 1k
> block size stored on a file in the host file system on the block device
> with sector size larger than 1k (...I am getting Inception flashbacks now)

If you open a large (>256T) device with debugfs without -D, you will see heavy swapping,
since such a FS wants to consume ~10G for bitmaps and other structures.
With cached reads, your memory consumption doubles.

By the way, you can easily reproduce it with losetup, which lets you specify a block size.

>
> In fact I can confirm that indeed, both mke2fs and debugfs will fail in
> such scenario. The question is whether we care enough to support
> O_DIRECT in such situations. Personally I don't care enough about this.
> However it would be nice to at least have a check (probably in
> ext2fs_open2, unix_open_channel or such) and notify user about the
> problem.

It's not a tools problem. It's a problem in the e2fsprogs library, as ext2fs_open2 is affected by this bug.
And that is not the only function where the bug lives.


>
> Note that this conversation does not affect my patch since
> ext2fs_mmp_read() does not use the unix_io infrastructure.
>
It would be good to convert it to use that infrastructure.

Alex


> -Lukas
>
>>
>> Alex
>>
>>> On Feb 19, 2021, at 13:57, Lukas Czerner <[email protected]> wrote:
>>>
>>> On Fri, Feb 19, 2021 at 01:08:17PM +0300, Alexey Lyashkov wrote:
>>>> Andreas,
>>>>
>>>> What about to disable a O_DIRECT global on any block devices in the e2fsprogs library as this don’t work on 4k disk drives at all ?
>>>> Instead of fixing an O_DIRECT access with patches sends early.
>>>
>>> Why would it not work at all ? This is a fix for a specific problem and
>>> I am not currently aware of ony other problems e2fsprogs should have
>>> with 4k sector size drives. Do you have a specific problem in mind ?
>>>
>>> Thanks!
>>> -Lukas
>>>
>>>>
>>>>
>>>> Alex
>>>>
>>>>> On Feb 19, 2021, at 1:20, Andreas Dilger <[email protected]> wrote:
>>>>>
>>>>> On Feb 18, 2021, at 2:51 AM, Lukas Czerner <[email protected]> wrote:
>>>>>>
>>>>>> Currently the mmp block is read using O_DIRECT to avoid any caching that
>>>>>> may be done by the VM. However when working with regular files this
>>>>>> creates alignment issues when the device of the host file system has
>>>>>> sector size larger than the blocksize of the file system in the file
>>>>>> we're working with.
>>>>>>
>>>>>> This can be reproduced with t_mmp_fail test when run on the device with
>>>>>> 4k sector size because the mke2fs fails when trying to read the mmp
>>>>>> block.
>>>>>>
>>>>>> Fix it by disabling O_DIRECT when working with regular files. I don't
>>>>>> think there is any risk of doing so since the file system layer, unlike
>>>>>> shared block device, should guarantee cache consistency.
>>>>>>
>>>>>> Signed-off-by: Lukas Czerner <[email protected]>
>>>>>> Reviewed-by: Eric Sandeen <[email protected]>
>>>>>
>>>>> Reviewed-by: Andreas Dilger <[email protected]>
>>>>>
>>>>>> ---
>>>>>> v2: Fix comment - it avoids problems when the sector size is larger not
>>>>>> smaller than blocksize
>>>>>>
>>>>>> lib/ext2fs/mmp.c | 22 +++++++++++-----------
>>>>>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
>>>>>> index c21ae272..cca2873b 100644
>>>>>> --- a/lib/ext2fs/mmp.c
>>>>>> +++ b/lib/ext2fs/mmp.c
>>>>>> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
>>>>>> * regardless of how the io_manager is doing reads, to avoid caching of
>>>>>> * the MMP block by the io_manager or the VM. It needs to be fresh. */
>>>>>> if (fs->mmp_fd <= 0) {
>>>>>> + struct stat st;
>>>>>> int flags = O_RDWR | O_DIRECT;
>>>>>>
>>>>>> -retry:
>>>>>> + /*
>>>>>> + * There is no reason for using O_DIRECT if we're working with
>>>>>> + * regular file. Disabling it also avoids problems with
>>>>>> + * alignment when the device of the host file system has sector
>>>>>> + * size larger than blocksize of the fs we're working with.
>>>>>> + */
>>>>>> + if (stat(fs->device_name, &st) == 0 &&
>>>>>> + S_ISREG(st.st_mode))
>>>>>> + flags &= ~O_DIRECT;
>>>>>> +
>>>>>> fs->mmp_fd = open(fs->device_name, flags);
>>>>>> if (fs->mmp_fd < 0) {
>>>>>> - struct stat st;
>>>>>> -
>>>>>> - /* Avoid O_DIRECT for filesystem image files if open
>>>>>> - * fails, since it breaks when running on tmpfs. */
>>>>>> - if (errno == EINVAL && (flags & O_DIRECT) &&
>>>>>> - stat(fs->device_name, &st) == 0 &&
>>>>>> - S_ISREG(st.st_mode)) {
>>>>>> - flags &= ~O_DIRECT;
>>>>>> - goto retry;
>>>>>> - }
>>>>>> retval = EXT2_ET_MMP_OPEN_DIRECT;
>>>>>> goto out;
>>>>>> }
>>>>>> --
>>>>>> 2.26.2
>>>>>>
>>>>>
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

2021-02-19 14:44:29

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

On Fri, Feb 19, 2021 at 04:53:05PM +0300, Alexey Lyashkov wrote:
>
>
> > On Feb 19, 2021, at 16:34, Lukas Czerner <[email protected]> wrote:
> >
> > On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote:
> >> Lukas,
> >>
> >> because e2fsprogs have an bad assumption about IO size for the O_DIRECT case.
> >> and because library uses a code like
> >>>>
> >> set_block_size(1k);
> >> seek(fs, 1);
> >> read_block();
> >>>>>
> >> which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report.
> >>
> >> Reference to patch.
> >> https://patchwork.ozlabs.org/project/linux-ext4/patch/[email protected]/
> >
> > Alright, I skimmed through your patch proposal and I am not sure I
> > completely understand the problem because you have not provided the code
> > adding O_DIRECT support for e2image.
>
> debugfs -D … will hit same problem.
>
> >
> > However I think that it is a reasonable assumption to make that there is
> > not going to be a file system on a block device such that the fs blocksize
> > is smaller than device sector size. You can't create such fs with mke2fs
> > and you can't mount such file system either.
> >
> This is don’t need to be create this FS, calling an ext2_open2 is enough.
>
>
> > All that said I can now see that there is a problem in case of mke2fs
> > and debugfs when used with O_DIRECT (-D) on a file system image with 1k
> > block size stored on a file in the host file system on the block device
> > with sector size larger than 1k (...I am getting Inception flashbacks now)
>
> if you have open a large (>256T) device with debugfs without -D you will be see a large swap.
> once this FS want to consume a ~10G for bitmaps and other parts.
> with cached read you memory consumption increased by two.
>
> btw.
> you can easy replicate it with losetup which able to specify a block size.
>
> >
> > In fact I can confirm that indeed, both mke2fs and debugfs will fail in
> > such scenario. The question is whether we care enough to support
> > O_DIRECT in such situations. Personally I don't care enough about this.
> > However it would be nice to at least have a check (probably in
> > ext2fs_open2, unix_open_channel or such) and notify user about the
> > problem.
>
> it’s not a tools problem. It’s problem of e2fsprogs library as ext2_open2 affected by this bug.
> But this is not a single function where bug lives.

Sure, I am aware of what the problem is. All I am saying is that the
situation where it is manifesting itself is marginal enough for me that I
personally would be fine with just not supporting O_DIRECT in that case.

However your patch does seem to fix this particular problem on e2fsprogs
v1.45.7. It no longer applies cleanly on the current code so maybe
you should resend it? Preferably along with added tests exercising it.

-Lukas
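
For reference, the alignment rule behind this failure can be sketched in C. This is only an illustration under stated assumptions: dio_request_aligned() and block_to_offset() are hypothetical helpers, not e2fsprogs functions, and the check covers only the file offset and length of a request (the alignment of the memory buffer is ignored here).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of e2fsprogs): returns true when a request
 * of `len` bytes at byte `offset` both starts on and spans a whole multiple
 * of `sector_size`. This is the offset/length constraint that O_DIRECT I/O
 * is normally subject to on a block device.
 */
static bool dio_request_aligned(uint64_t offset, uint64_t len,
				uint32_t sector_size)
{
	if (sector_size == 0)
		return false;
	return (offset % sector_size) == 0 && (len % sector_size) == 0;
}

/* Byte offset of block number `blk` for a given I/O block size. */
static uint64_t block_to_offset(uint64_t blk, uint32_t blocksize)
{
	return blk * (uint64_t)blocksize;
}
```

With a 1k I/O block size on 4k sectors, dio_request_aligned(block_to_offset(1, 1024), 1024, 4096) is false, which is the "1k read inside a 4k sector" case from the quoted text. The same request passes with 512-byte sectors. A check along these lines is one possible shape for the user notification mentioned above, not a description of any existing code.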


2021-02-19 16:20:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

Alexey,

It'd be helpful to me to understand _why_ this use case is important
for your workloads. O_DIRECT support is rarely used as far as I know,
and fs blocksize != page size is rare as well. The main use case I
know of for fs blocksize != page size is on architectures (not terribly
common) with 16k or 64k page sizes, that want to use 4k file system
blocksizes for interoperability reasons.

(And I suppose because mke2fs uses a 4k block size by default. Perhaps
we should change this so that the default is that mke2fs will use a
block size == page size, unless for some reason the page size is not
one supported by ext4 (although I'm not aware of any architecture
wanting page sizes > 64k), or the user explicitly specifies the block
size using "mke2fs -b".)

Are you trying to make O_DIRECT support in e2fsprogs a first-class
feature out of a concern for completeness? Or is this a use case which is
important in production workloads that you are familiar with?

Thanks,

- Ted

2021-02-20 13:23:01

by Alexey Lyahkov

[permalink] [raw]
Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

Theodore,

This is important for several reasons.
Metadata for large devices (>400T without bigalloc enabled) is very large.
With buffered IO enabled, this results in very large memory consumption
(12G+ for the metadata itself in the page cache, and 12G+ in user memory).
I don't think half of that is useful.
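
As a rough cross-check of the order of magnitude, here is a small C sketch of the size of the block bitmap alone. It assumes one bit per block and a 4k block size with no bigalloc. block_bitmap_bytes() is a hypothetical helper written for this illustration, and the result says nothing about how e2fsprogs actually represents bitmaps in memory.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a bitmap holding one bit per
 * file system block, rounded up to a whole byte.
 * Assumes blocksize is non-zero.
 */
static uint64_t block_bitmap_bytes(uint64_t device_bytes, uint32_t blocksize)
{
	uint64_t blocks = device_bytes / blocksize;

	return (blocks + 7) / 8;
}
```

For a 400 TiB device with 4k blocks, block_bitmap_bytes(400ULL << 40, 4096) gives 13421772800 bytes, i.e. 12.5 GiB, which is at least in the same range as the 12G+ figure above. Holding that once in the page cache and once in user memory is where the doubling comes from.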



> On Feb 19, 2021, at 19:18, Theodore Ts'o <[email protected]> wrote:
>
> Alexey,
>
> It'd be helpful to me to understand _why_ this use case is important
> for your workloads. O_DIRECT support is rarely used as far as I know,
> and fs blocksize != page size is rare as well. The main use cases I
> know of fs blocksize != page size is on architectures (not terribly
> common) with 16k or 64k page sizes, that want to use 4k file system
> blocksizes for interoperability reasons.
>
As I pointed out earlier, e2fsprogs _FORCES_ a 1k block size in some places.
For example:
blk64_t ext2fs_first_backup_sb(blk64_t *superblock, unsigned int *block_size,
..
for (try_blocksize = EXT2_MIN_BLOCK_SIZE;
try_blocksize <= EXT2_MAX_BLOCK_SIZE ; try_blocksize *= 2) {
..
errcode_t ext2fs_open2(const char *name, const char *io_options,
io_channel_set_blksize(fs->io, SUPERBLOCK_OFFSET);


Both cases generate accesses that are unaligned (from the block device's point of view),
without any idea of what the real block size is.
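
To make the probing case concrete, here is a minimal C sketch, not the library code. The SKETCH_* constants are local stand-ins assumed to match the 1k..64k range walked by the loop quoted above, and unaligned_probe_sizes() is a hypothetical helper that reports which candidate block sizes are not a multiple of the device sector size.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins, assumed to match the range of the probe loop above. */
#define SKETCH_MIN_BLOCK_SIZE 1024u
#define SKETCH_MAX_BLOCK_SIZE 65536u

/*
 * Hypothetical helper: walk the power-of-two candidate block sizes and
 * store in out[] (up to out_len entries) those that are not a whole
 * multiple of sector_size. Returns the number of entries stored.
 */
static size_t unaligned_probe_sizes(uint32_t sector_size, uint32_t *out,
				    size_t out_len)
{
	size_t n = 0;
	uint32_t bs;

	if (sector_size == 0)
		return 0;

	for (bs = SKETCH_MIN_BLOCK_SIZE; bs <= SKETCH_MAX_BLOCK_SIZE; bs *= 2) {
		if (bs % sector_size != 0 && n < out_len)
			out[n++] = bs;
	}
	return n;
}
```

With a 4096-byte sector size this reports 1024 and 2048, so any single-block read issued at those sizes cannot satisfy the O_DIRECT length constraint. With 512-byte sectors nothing is reported.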

> (And I suppose because mke2fs uses a 4k block size by default. Perhaps
> we should change this so that the default is that mke2fs will use a
> block size == page size, unless for some reason the page size is not
> one supported by ext4 (although I'm not aware of any architecture
> wanting page sizes > 64k), or the user explicitly specifies the block
> size using "mke2fs -b".)
Nice. AARCH64 on RHEL8 uses 64k pages, so what about interoperability?
Should AARCH64 be able to read devices that were created on x86_64 with a 4k page size?

>
> Are you trying to make O_DIRECT support in e2fsprogs a first class
> reason out of completeness concern? Or is this a use case which is
> important in production workloads that you are familiar with?
>
The primary goal is debugfs -D and e2image, both of which are used in production on large storage systems.
I am also looking at e2fsck because of its large memory consumption.

If you think O_DIRECT doesn't need to be supported, let's drop this code instead of leaving it completely broken as it is now.


Thanks,
Alex.

> Thanks,
>
> - Ted