Linda Knippers noticed that commit bbab37ddc20b (block: Add support
for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
Further investigation also uncovered issues related to misaligned
partitions. This patch set addresses the two issues.
[PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
[PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions
fs/block_dev.c | 7 +++++++
fs/dax.c | 3 ++-
2 files changed, 9 insertions(+), 1 deletion(-)
commit bbab37ddc20b (block: Add support for DAX reads/writes to
block devices) caused a regression in mkfs.xfs. That utility
sets the block size of the device to the logical block size
using the BLKBSZSET ioctl, and then issues a single sector read
from the last sector of the device. This results in the dax_io
code trying to do a page-sized read from 512 bytes from the end
of the device. The result is -ERANGE being returned to userspace.
The fix is to align the block to the page size before calling
get_block.
Thanks to willy for simplifying my original patch.
Signed-off-by: Jeff Moyer <[email protected]>
---
fs/dax.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/dax.c b/fs/dax.c
index a7f77e1..ef35a20 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -116,7 +116,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
unsigned len;
if (pos == max) {
unsigned blkbits = inode->i_blkbits;
- sector_t block = pos >> blkbits;
+ long page = pos >> PAGE_SHIFT;
+ sector_t block = page << (PAGE_SHIFT - blkbits);
unsigned first = pos - (block << blkbits);
long size;
--
1.8.3.1
The dax code doesn't currently support misaligned partitions,
so disable O_DIRECT via dax until such time as that support
materializes.
Suggested-by: Boaz Harrosh <[email protected]>
Signed-off-by: Jeff Moyer <[email protected]>
---
fs/block_dev.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1982437..1170f8c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
goto out_clear;
}
bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
+ /*
+ * If the partition is not aligned on a page
+ * boundary, we can't do dax I/O to it.
+ */
+ if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
+ (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
+ bdev->bd_inode->i_flags &= ~S_DAX;
}
} else {
if (bdev->bd_contains == bdev) {
--
1.8.3.1
On Fri, Aug 14, 2015 at 1:15 PM, Jeff Moyer <[email protected]> wrote:
> Linda Knippers noticed that commit bbab37ddc20b (block: Add support
> for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
> Further investigation also uncovered issues related to misaligned
> partitions. This patch set addresses the two issues.
>
> [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
> [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions
Should these be "Cc: <[email protected]>" given the regression
fix is applicable back to 4.0?
On Fri, Aug 14, 2015 at 1:22 PM, Dan Williams <[email protected]> wrote:
> On Fri, Aug 14, 2015 at 1:15 PM, Jeff Moyer <[email protected]> wrote:
>> Linda Knippers noticed that commit bbab37ddc20b (block: Add support
>> for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
>> Further investigation also uncovered issues related to misaligned
>> partitions. This patch set addresses the two issues.
>>
>> [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
>> [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions
>
> Should these be "Cc: <[email protected]>" given the regression
> fix is applicable back to 4.0?
Sorry, regression *since* 4.0.
Dan Williams <[email protected]> writes:
> On Fri, Aug 14, 2015 at 1:22 PM, Dan Williams <[email protected]> wrote:
>> On Fri, Aug 14, 2015 at 1:15 PM, Jeff Moyer <[email protected]> wrote:
>>> Linda Knippers noticed that commit bbab37ddc20b (block: Add support
>>> for DAX reads/writes to block devices) caused a regression in mkfs.xfs.
>>> Further investigation also uncovered issues related to misaligned
>>> partitions. This patch set addresses the two issues.
>>>
>>> [PATCH 1/2] dax: fix O_DIRECT I/O to the last block of a blockdev
>>> [PATCH 2/2] blockdev: don't set S_DAX for misaligned partitions
>>
>> Should these be "Cc: <[email protected]>" given the regression
>> fix is applicable back to 4.0?
>
> Sorry, regression *since* 4.0.
$ git describe --contains bbab37ddc20b
v4.2-rc1~2^2~4
Looks like this was not sprung on the public yet.
-Jeff
On Aug 14, 2015, at 2:15 PM, Jeff Moyer <[email protected]> wrote:
>
> The dax code doesn't currently support misaligned partitions,
> so disable O_DIRECT via dax until such time as that support
> materializes.
>
> Suggested-by: Boaz Harrosh <[email protected]>
> Signed-off-by: Jeff Moyer <[email protected]>
> ---
> fs/block_dev.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 1982437..1170f8c 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
> goto out_clear;
> }
> bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
> + /*
> + * If the partition is not aligned on a page
> + * boundary, we can't do dax I/O to it.
> + */
> + if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
> + (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
Maybe I'm missing something, but doesn't the second condition above
disable DAX for the case that the 1/2 patch is fixing (i.e. the last
sectors at the end of a non-PAGE_SIZE-multiple device)? It seems a
shame to disable DAX for the whole device because of the last sector.
> + bdev->bd_inode->i_flags &= ~S_DAX;
> }
> } else {
> if (bdev->bd_contains == bdev) {
> --
> 1.8.3.1
Cheers, Andreas
On 8/14/2015 4:15 PM, Jeff Moyer wrote:
> commit bbab37ddc20b (block: Add support for DAX reads/writes to
> block devices) caused a regression in mkfs.xfs. That utility
> sets the block size of the device to the logical block size
> using the BLKBSZSET ioctl, and then issues a single sector read
> from the last sector of the device. This results in the dax_io
> code trying to do a page-sized read from 512 bytes from the end
> of the device. The result is -ERANGE being returned to userspace.
>
> The fix is to align the block to the page size before calling
> get_block.
>
> Thanks to willy for simplifying my original patch.
>
> Signed-off-by: Jeff Moyer <[email protected]>
Tested-by: Linda Knippers <[email protected]>
> ---
> fs/dax.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index a7f77e1..ef35a20 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -116,7 +116,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
> unsigned len;
> if (pos == max) {
> unsigned blkbits = inode->i_blkbits;
> - sector_t block = pos >> blkbits;
> + long page = pos >> PAGE_SHIFT;
> + sector_t block = page << (PAGE_SHIFT - blkbits);
> unsigned first = pos - (block << blkbits);
> long size;
>
>
Andreas Dilger <[email protected]> writes:
> On Aug 14, 2015, at 2:15 PM, Jeff Moyer <[email protected]> wrote:
>>
>> The dax code doesn't currently support misaligned partitions,
>> so disable O_DIRECT via dax until such time as that support
>> materializes.
>>
>> Suggested-by: Boaz Harrosh <[email protected]>
>> Signed-off-by: Jeff Moyer <[email protected]>
>> ---
>> fs/block_dev.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 1982437..1170f8c 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
>> goto out_clear;
>> }
>> bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
>> + /*
>> + * If the partition is not aligned on a page
>> + * boundary, we can't do dax I/O to it.
>> + */
>> + if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
>> + (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
>
> Maybe I'm missing something, but doesn't the second condition above
> disable DAX for the case that the 1/2 patch is fixing (i.e. the last
> sectors at the end of a non-PAGE_SIZE-multiple device)? It seems a
> shame to disable DAX for the whole device because of the last sector.
No. Patch 1/2 fixes a 512 byte read of the last sector of a properly
aligned partiton. The goal is to eventually fix things so we can enable
the dax path for misaligned partitions, but it's not going to happen in
time for 4.2. Also, keep in mind that this is just for opening the
block device itself with O_DIRECT.
Thanks for taking a look.
Cheers,
Jeff