2015-08-14 20:15:42

by Jeff Moyer

[permalink] [raw]
Subject: [PATCH 0/2] fix regression in direct I/O to pmem devices

Linda Knippers noticed that commit bbab37ddc20b (block: Add support
for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
Further investigation also uncovered issues related to misaligned
partitions. This patch set addresses the two issues.

[PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
[PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions

fs/block_dev.c | 7 +++++++
fs/dax.c | 3 ++-
2 files changed, 9 insertions(+), 1 deletion(-)


2015-08-14 20:15:44

by Jeff Moyer

[permalink] [raw]
Subject: [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev

commit bbab37ddc20b (block: Add support for DAX reads/writes to
block devices) caused a regression in mkfs.xfs. That utility
sets the block size of the device to the logical block size
using the BLKBSZSET ioctl, and then issues a single sector read
from the last sector of the device. This results in the dax_io
code trying to do a page-sized read from 512 bytes from the end
of the device. The result is -ERANGE being returned to userspace.

The fix is to align the block to the page size before calling
get_block.

Thanks to willy for simplifying my original patch.

Signed-off-by: Jeff Moyer <[email protected]>
---
fs/dax.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index a7f77e1..ef35a20 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -116,7 +116,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
unsigned len;
if (pos == max) {
unsigned blkbits = inode->i_blkbits;
- sector_t block = pos >> blkbits;
+ long page = pos >> PAGE_SHIFT;
+ sector_t block = page << (PAGE_SHIFT - blkbits);
unsigned first = pos - (block << blkbits);
long size;

--
1.8.3.1

2015-08-14 20:15:41

by Jeff Moyer

[permalink] [raw]
Subject: [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions

The dax code doesn't currently support misaligned partitions,
so disable O_DIRECT via dax until such time as that support
materializes.

Suggested-by: Boaz Harrosh <[email protected]>
Signed-off-by: Jeff Moyer <[email protected]>
---
fs/block_dev.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1982437..1170f8c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
goto out_clear;
}
bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
+ /*
+ * If the partition is not aligned on a page
+ * boundary, we can't do dax I/O to it.
+ */
+ if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
+ (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
+ bdev->bd_inode->i_flags &= ~S_DAX;
}
} else {
if (bdev->bd_contains == bdev) {
--
1.8.3.1

2015-08-14 20:22:37

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix regression in direct I/O to pmem devices

On Fri, Aug 14, 2015 at 1:15 PM, Jeff Moyer <[email protected]> wrote:
> Linda Knippers noticed that commit bbab37ddc20b (block: Add support
> for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
> Further investigation also uncovered issues related to misaligned
> partitions. This patch set addresses the two issues.
>
> [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
> [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions

Should these be "Cc: <[email protected]>" given the regression
fix is applicable back to 4.0?

2015-08-14 20:23:36

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix regression in direct I/O to pmem devices

On Fri, Aug 14, 2015 at 1:22 PM, Dan Williams <[email protected]> wrote:
> On Fri, Aug 14, 2015 at 1:15 PM, Jeff Moyer <[email protected]> wrote:
>> Linda Knippers noticed that commit bbab37ddc20b (block: Add support
>> for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
>> Further investigation also uncovered issues related to misaligned
>> partitions. This patch set addresses the two issues.
>>
>> [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
>> [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions
>
> Should these be "Cc: <[email protected]>" given the regression
> fix is applicable back to 4.0?

Sorry, regression *since* 4.0.

2015-08-14 20:27:52

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix regression in direct I/O to pmem devices

Dan Williams <[email protected]> writes:

> On Fri, Aug 14, 2015 at 1:22 PM, Dan Williams <[email protected]> wrote:
>> On Fri, Aug 14, 2015 at 1:15 PM, Jeff Moyer <[email protected]> wrote:
>>> Linda Knippers noticed that commit bbab37ddc20b (block: Add support
>>> for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
>>> Further investigation also uncovered issues related to misaligned
>>> partitions. This patch set addresses the two issues.
>>>
>>> [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
>>> [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions
>>
>> Should these be "Cc: <[email protected]>" given the regression
>> fix is applicable back to 4.0?
>
> Sorry, regression *since* 4.0.

$ git describe --contains bbab37ddc20b
v4.2-rc1~2^2~4

Looks like this was not sprung on the public yet.

-Jeff

2015-08-14 20:46:10

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions

On Aug 14, 2015, at 2:15 PM, Jeff Moyer <[email protected]> wrote:
>
> The dax code doesn't currently support misaligned partitions,
> so disable O_DIRECT via dax until such time as that support
> materializes.
>
> Suggested-by: Boaz Harrosh <[email protected]>
> Signed-off-by: Jeff Moyer <[email protected]>
> ---
> fs/block_dev.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 1982437..1170f8c 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
> goto out_clear;
> }
> bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
> + /*
> + * If the partition is not aligned on a page
> + * boundary, we can't do dax I/O to it.
> + */
> + if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
> + (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))

Maybe I'm missing something, but doesn't the second condition above
disable DAX for the case that the 1/2 patch is fixing (i.e. the last
sectors at the end of a non-PAGE_SIZE-multiple device)? It seems a
shame to disable DAX for the whole device because of the last sector.

> + bdev->bd_inode->i_flags &= ~S_DAX;
> }
> } else {
> if (bdev->bd_contains == bdev) {
> --
> 1.8.3.1


Cheers, Andreas




2015-08-14 20:53:28

by Linda Knippers

[permalink] [raw]
Subject: Re: [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev

On 8/14/2015 4:15 PM, Jeff Moyer wrote:
> commit bbab37ddc20b (block: Add support for DAX reads/writes to
> block devices) caused a regression in mkfs.xfs. That utility
> sets the block size of the device to the logical block size
> using the BLKBSZSET ioctl, and then issues a single sector read
> from the last sector of the device. This results in the dax_io
> code trying to do a page-sized read from 512 bytes from the end
> of the device. The result is -ERANGE being returned to userspace.
>
> The fix is to align the block to the page size before calling
> get_block.
>
> Thanks to willy for simplifying my original patch.
>
> Signed-off-by: Jeff Moyer <[email protected]>

Tested-by: Linda Knippers <[email protected]>

> ---
> fs/dax.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index a7f77e1..ef35a20 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -116,7 +116,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
> unsigned len;
> if (pos == max) {
> unsigned blkbits = inode->i_blkbits;
> - sector_t block = pos >> blkbits;
> + long page = pos >> PAGE_SHIFT;
> + sector_t block = page << (PAGE_SHIFT - blkbits);
> unsigned first = pos - (block << blkbits);
> long size;
>
>

2015-08-14 20:55:31

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions

Andreas Dilger <[email protected]> writes:

> On Aug 14, 2015, at 2:15 PM, Jeff Moyer <[email protected]> wrote:
>>
>> The dax code doesn't currently support misaligned partitions,
>> so disable O_DIRECT via dax until such time as that support
>> materializes.
>>
>> Suggested-by: Boaz Harrosh <[email protected]>
>> Signed-off-by: Jeff Moyer <[email protected]>
>> ---
>> fs/block_dev.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 1982437..1170f8c 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
>> goto out_clear;
>> }
>> bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
>> + /*
>> + * If the partition is not aligned on a page
>> + * boundary, we can't do dax I/O to it.
>> + */
>> + if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
>> + (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
>
> Maybe I'm missing something, but doesn't the second condition above
> disable DAX for the case that the 1/2 patch is fixing (i.e. the last
> sectors at the end of a non-PAGE_SIZE-multiple device)? It seems a
> shame to disable DAX for the whole device because of the last sector.

No. Patch 1/2 fixes a 512 byte read of the last sector of a properly
aligned partiton. The goal is to eventually fix things so we can enable
the dax path for misaligned partitions, but it's not going to happen in
time for 4.2. Also, keep in mind that this is just for opening the
block device itself with O_DIRECT.

Thanks for taking a look.

Cheers,
Jeff