When a bio has an encryption context, its size must be aligned to its
crypto data unit size. A bio must not be split in the middle of a data
unit. Currently, bios are split at logical block boundaries, but a crypto
data unit size might be larger than the logical block size - e.g. a machine
could be using fscrypt (which uses 4K crypto data units) with an eMMC block
device with inline encryption hardware that has a logical block size of
512 bytes. So we need to support cases where the data unit size is larger
than the logical block size.
Patch 1 allows blk_bio_segment_split() in blk-merge.c to fail and return an
error code. This functionality is used by later patches in this series.
Patch 1 also updates all callers to handle/propagate any returned error.
Patch 2 introduces blk_crypto_bio_sectors_alignment() which returns the
required alignment for the number of sectors in any bio.
Patches 3, 4 and 5 update bounce.c, blk-crypto-fallback.c and blk-merge.c
respectively to respect blk_crypto_bio_sectors_alignment() when calling
bio_split(), so that any split bio's size has the required alignment.
Patch 5 (the patch updating blk-merge.c) updates blk_bio_segment_split() by
rounding down the number of sectors to the required alignment (aligned
sectors) just before the call to bio_split(). It may be the case that due
to the other restrictions on aligned sectors by blk_bio_segment_split(),
aligned sectors ends up being rounded down to 0. An error is returned in
that case, using the functionality introduced in Patch 1.
Since all callers to bio_split() should have been updated by the previous
patches, Patch 6 adds a WARN_ON() to bio_split() when sectors isn't aligned
to blk_crypto_bio_sectors_alignment().
As a result of the simplistic rounding down in Patch 5, it might also be
the case that nsegs in blk_bio_segment_split() is overestimated. Patch 7
calculates nsegs accurately, while being a lot more complicated than Patch
5. If the accuracy isn't really necessary, Patch 7 can be dropped
completely.
This patch series was tested by running android xfstests on the SDM630
chipset (which has eMMC inline encryption hardware with logical block size
512 bytes) with test_dummy_encryption with and without the 'inlinecrypt'
mount option.
Satya Tangirala (7):
block: make blk_bio_segment_split() able to fail and return error
block: blk-crypto: Introduce blk_crypto_bio_sectors_alignment()
block: respect blk_crypto_bio_sectors_alignment() in bounce.c
block: respect blk_crypto_bio_sectors_alignment() in
blk-crypto-fallback
block: respect blk_crypto_bio_sectors_alignment() in blk-merge
block: add WARN() in bio_split() for sector alignment
block: compute nsegs more accurately in blk_bio_segment_split()
block/bio.c | 1 +
block/blk-crypto-fallback.c | 2 +
block/blk-crypto-internal.h | 18 +++++
block/blk-merge.c | 132 +++++++++++++++++++++++++++-------
block/blk-mq.c | 5 +-
block/blk.h | 2 +-
block/bounce.c | 3 +
drivers/block/drbd/drbd_req.c | 5 +-
drivers/block/pktcdvd.c | 3 +-
drivers/block/ps3vram.c | 5 +-
drivers/block/rsxx/dev.c | 3 +-
drivers/block/umem.c | 5 +-
drivers/lightnvm/pblk-init.c | 13 +++-
drivers/md/dm.c | 8 ++-
drivers/md/md.c | 5 +-
drivers/nvme/host/multipath.c | 5 +-
drivers/s390/block/dcssblk.c | 3 +-
drivers/s390/block/xpram.c | 3 +-
include/linux/blkdev.h | 2 +-
19 files changed, 182 insertions(+), 41 deletions(-)
--
2.30.0.284.gd98b1dd5eaa7-goog
The size of any bio must be aligned to the data unit size of the bio crypt
context (if it exists) of that bio. This must also be ensured whenever a
bio is split. Introduce blk_crypto_bio_sectors_alignment() that returns
the required alignment in sectors. The number of sectors passed to
any call of bio_split() should be aligned to
blk_crypto_bio_sectors_alignment().
Signed-off-by: Satya Tangirala <[email protected]>
---
block/blk-crypto-internal.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/block/blk-crypto-internal.h b/block/blk-crypto-internal.h
index 0d36aae538d7..304e90ed99f5 100644
--- a/block/blk-crypto-internal.h
+++ b/block/blk-crypto-internal.h
@@ -60,6 +60,19 @@ static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
return rq->crypt_ctx;
}
+/*
+ * Returns the alignment requirement for the number of sectors in this bio based
+ * on its bi_crypt_context. Any bios split from this bio must follow this
+ * alignment requirement as well.
+ */
+static inline unsigned int blk_crypto_bio_sectors_alignment(struct bio *bio)
+{
+ if (!bio_has_crypt_ctx(bio))
+ return 1;
+ return bio->bi_crypt_context->bc_key->crypto_cfg.data_unit_size >>
+ SECTOR_SHIFT;
+}
+
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
static inline bool bio_crypt_rq_ctx_compatible(struct request *rq,
@@ -93,6 +106,11 @@ static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
return false;
}
+static inline unsigned int blk_crypto_bio_sectors_alignment(struct bio *bio)
+{
+ return 1;
+}
+
#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
void __bio_crypt_advance(struct bio *bio, unsigned int bytes);
--
2.30.0.284.gd98b1dd5eaa7-goog
Make blk_bio_segment_split() respect blk_crypto_bio_sectors_alignment()
when calling bio_split(). The number of sectors is rounded down to the
required alignment just before the call to bio_split(). This makes it
possible for nsegs to be overestimated, but this solution is a lot
simpler than trying to calculate the exact number of nsegs required
for the aligned number of sectors. A future patch will attempt to
calculate nsegs more accurately.
Signed-off-by: Satya Tangirala <[email protected]>
---
block/blk-merge.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index a23a91e12e24..45cda45c1066 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -236,6 +236,8 @@ static bool bvec_split_segs(const struct request_queue *q,
* following is guaranteed for the cloned bio:
* - That it has at most get_max_io_size(@q, @bio) sectors.
* - That it has at most queue_max_segments(@q) segments.
+ * - That the number of sectors in the returned bio is aligned to
+ * blk_crypto_bio_sectors_alignment(@bio)
*
* Except for discard requests the cloned bio will point at the bi_io_vec of
* the original bio. It is the responsibility of the caller to ensure that the
@@ -292,6 +294,9 @@ static int blk_bio_segment_split(struct request_queue *q,
*/
bio->bi_opf &= ~REQ_HIPRI;
+ sectors = round_down(sectors, blk_crypto_bio_sectors_alignment(bio));
+ if (WARN_ON(sectors == 0))
+ return -EIO;
*split = bio_split(bio, sectors, GFP_NOIO, bs);
return 0;
}
--
2.30.0.284.gd98b1dd5eaa7-goog
The number of sectors passed to bio_split() should be aligned to
blk_crypto_bio_sectors_alignment(). All callers have been updated to ensure
this, so add a WARN() if the number of sectors is not aligned.
Signed-off-by: Satya Tangirala <[email protected]>
---
block/bio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..c5f577ee6b8d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1472,6 +1472,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
BUG_ON(sectors <= 0);
BUG_ON(sectors >= bio_sectors(bio));
+ WARN_ON(!IS_ALIGNED(sectors, blk_crypto_bio_sectors_alignment(bio)));
/* Zone append commands cannot be split */
if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
--
2.30.0.284.gd98b1dd5eaa7-goog
Till now, blk_bio_segment_split() always succeeded and returned a split
bio if necessary. Instead, allow it to return an error code to indicate
errors (and pass a pointer to the split bio as an argument instead).
blk_bio_segment_split() is only called by __blk_queue_split(), which has
been updated to return the error code from blk_bio_segment_split().
This patch also updates all callers of __blk_queue_split() and
blk_queue_split() to handle any error returned by those functions.
blk_bio_segment_split() needs to be able to fail because future patches
will ensure that the size of the split bio is aligned to the data unit
size of the bio crypt context of the bio (if it exists). It's possible
that the largest aligned size that satisfies all the requirements of
blk_bio_segment_split() is 0, at which point we need error out.
Signed-off-by: Satya Tangirala <[email protected]>
---
block/blk-merge.c | 36 +++++++++++++++++++++++++----------
block/blk-mq.c | 5 ++++-
block/blk.h | 2 +-
drivers/block/drbd/drbd_req.c | 5 ++++-
drivers/block/pktcdvd.c | 3 ++-
drivers/block/ps3vram.c | 5 ++++-
drivers/block/rsxx/dev.c | 3 ++-
drivers/block/umem.c | 5 ++++-
drivers/lightnvm/pblk-init.c | 13 ++++++++++---
drivers/md/dm.c | 8 ++++++--
drivers/md/md.c | 5 ++++-
drivers/nvme/host/multipath.c | 5 ++++-
drivers/s390/block/dcssblk.c | 3 ++-
drivers/s390/block/xpram.c | 3 ++-
include/linux/blkdev.h | 2 +-
15 files changed, 76 insertions(+), 27 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 808768f6b174..a23a91e12e24 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -229,6 +229,7 @@ static bool bvec_split_segs(const struct request_queue *q,
* @bio: [in] bio to be split
* @bs: [in] bio set to allocate the clone from
* @segs: [out] number of segments in the bio with the first half of the sectors
+ * @split: [out] The split bio, if @bio is split
*
* Clone @bio, update the bi_iter of the clone to represent the first sectors
* of @bio and update @bio->bi_iter to represent the remaining sectors. The
@@ -241,11 +242,14 @@ static bool bvec_split_segs(const struct request_queue *q,
* original bio is not freed before the cloned bio. The caller is also
* responsible for ensuring that @bs is only destroyed after processing of the
* split bio has finished.
+ *
+ * Return: 0 on success, negative on error
*/
-static struct bio *blk_bio_segment_split(struct request_queue *q,
- struct bio *bio,
- struct bio_set *bs,
- unsigned *segs)
+static int blk_bio_segment_split(struct request_queue *q,
+ struct bio *bio,
+ struct bio_set *bs,
+ unsigned *segs,
+ struct bio **split)
{
struct bio_vec bv, bvprv, *bvprvp = NULL;
struct bvec_iter iter;
@@ -276,7 +280,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
}
*segs = nsegs;
- return NULL;
+ *split = NULL;
+ return 0;
split:
*segs = nsegs;
@@ -287,7 +292,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
*/
bio->bi_opf &= ~REQ_HIPRI;
- return bio_split(bio, sectors, GFP_NOIO, bs);
+ *split = bio_split(bio, sectors, GFP_NOIO, bs);
+ return 0;
}
/**
@@ -302,11 +308,14 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
* the responsibility of the caller to ensure that
* @bio->bi_disk->queue->bio_split is only released after processing of the
* split bio has finished.
+ *
+ * Return: 0 on succes, negative on error
*/
-void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
+int __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
{
struct request_queue *q = (*bio)->bi_disk->queue;
struct bio *split = NULL;
+ int err;
switch (bio_op(*bio)) {
case REQ_OP_DISCARD:
@@ -337,7 +346,10 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
*nr_segs = 1;
break;
}
- split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
+ err = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs,
+ &split);
+ if (err)
+ return err;
break;
}
@@ -350,6 +362,8 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
submit_bio_noacct(*bio);
*bio = split;
}
+
+ return 0;
}
/**
@@ -361,12 +375,14 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
* a new bio from @bio->bi_disk->queue->bio_split, it is the responsibility of
* the caller to ensure that @bio->bi_disk->queue->bio_split is only released
* after processing of the split bio has finished.
+ *
+ * Return: 0 on success, negative on error
*/
-void blk_queue_split(struct bio **bio)
+int blk_queue_split(struct bio **bio)
{
unsigned int nr_segs;
- __blk_queue_split(bio, &nr_segs);
+ return __blk_queue_split(bio, &nr_segs);
}
EXPORT_SYMBOL(blk_queue_split);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f285a9123a8b..43fe5be6bbb7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2143,7 +2143,10 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
bool hipri;
blk_queue_bounce(q, &bio);
- __blk_queue_split(&bio, &nr_segs);
+ if (__blk_queue_split(&bio, &nr_segs)) {
+ bio_io_error(bio);
+ goto queue_exit;
+ }
if (!bio_integrity_prep(bio))
goto queue_exit;
diff --git a/block/blk.h b/block/blk.h
index 7550364c326c..22096c8272cb 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -218,7 +218,7 @@ ssize_t part_timeout_show(struct device *, struct device_attribute *, char *);
ssize_t part_timeout_store(struct device *, struct device_attribute *,
const char *, size_t);
-void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
+int __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
int ll_back_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs);
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 330f851cb8f0..1baaaad22bff 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1598,7 +1598,10 @@ blk_qc_t drbd_submit_bio(struct bio *bio)
struct drbd_device *device = bio->bi_disk->private_data;
unsigned long start_jif;
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
start_jif = jiffies;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index b8bb8ec7538d..702f7e5564ff 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2372,7 +2372,8 @@ static blk_qc_t pkt_submit_bio(struct bio *bio)
char b[BDEVNAME_SIZE];
struct bio *split;
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio))
+ goto end_io;
pd = bio->bi_disk->queue->queuedata;
if (!pd) {
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index b71d28372ef3..772e0c3e7036 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -587,7 +587,10 @@ static blk_qc_t ps3vram_submit_bio(struct bio *bio)
dev_dbg(&dev->core, "%s\n", __func__);
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
spin_lock_irq(&priv->lock);
busy = !bio_list_empty(&priv->list);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index edacefff6e35..e9d3538a2625 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -126,7 +126,8 @@ static blk_qc_t rsxx_submit_bio(struct bio *bio)
struct rsxx_bio_meta *bio_meta;
blk_status_t st = BLK_STS_IOERR;
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio))
+ goto req_err;
might_sleep();
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index 2b95d7b33b91..ac1e8a0750a9 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -527,7 +527,10 @@ static blk_qc_t mm_submit_bio(struct bio *bio)
(unsigned long long)bio->bi_iter.bi_sector,
bio->bi_iter.bi_size);
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
spin_lock_irq(&card->lock);
*card->biotail = bio;
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b6246f73895c..42873e04edf4 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -63,15 +63,22 @@ static blk_qc_t pblk_submit_bio(struct bio *bio)
* constraint. Writes can be of arbitrary size.
*/
if (bio_data_dir(bio) == READ) {
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
pblk_submit_read(pblk, bio);
} else {
/* Prevent deadlock in the case of a modest LUN configuration
* and large user I/Os. Unless stalled, the rate limiter
* leaves at least 256KB available for user I/O.
*/
- if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
- blk_queue_split(&bio);
+ if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl)) {
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
+ }
pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b3c3c8b4cb42..f304cb017176 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1654,8 +1654,12 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
* Use blk_queue_split() for abnormal IO (e.g. discard, writesame, etc)
* otherwise associated queue_limits won't be imposed.
*/
- if (is_abnormal_io(bio))
- blk_queue_split(&bio);
+ if (is_abnormal_io(bio)) {
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ goto out;
+ }
+ }
ret = __split_and_process_bio(md, map, bio);
out:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca409428b4fc..66c9cc95d14d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -498,7 +498,10 @@ static blk_qc_t md_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
if (mddev->ro == 1 && unlikely(rw == WRITE)) {
if (bio_sectors(bio) != 0)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..34874dc5258f 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -307,7 +307,10 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
* different queue via blk_steal_bios(), so we need to use the bio_split
* pool from the original queue to allocate the bvecs from.
*/
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio)) {
+ bio_io_error(bio);
+ return ret;
+ }
srcu_idx = srcu_read_lock(&head->srcu);
ns = nvme_find_path(head);
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 299e77ec2c41..33904f527f62 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -876,7 +876,8 @@ dcssblk_submit_bio(struct bio *bio)
unsigned long source_addr;
unsigned long bytes_done;
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio))
+ goto fail;
bytes_done = 0;
dev_info = bio->bi_disk->private_data;
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index c2536f7767b3..6f2f772e81e8 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -191,7 +191,8 @@ static blk_qc_t xpram_submit_bio(struct bio *bio)
unsigned long page_addr;
unsigned long bytes;
- blk_queue_split(&bio);
+ if (blk_queue_split(&bio))
+ goto fail;
if ((bio->bi_iter.bi_sector & 7) != 0 ||
(bio->bi_iter.bi_size & 4095) != 0)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..b4b44d7262e5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -926,7 +926,7 @@ extern void blk_rq_unprep_clone(struct request *rq);
extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
struct request *rq);
extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
-extern void blk_queue_split(struct bio **);
+extern int blk_queue_split(struct bio **);
extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
unsigned int, void __user *);
--
2.30.0.284.gd98b1dd5eaa7-goog
Previously, we rounded down the number of sectors just before calling
bio_split() in blk_bio_segment_split(). While this ensures that bios
are not split in the middle of a data unit, it makes it possible
for nsegs to be overestimated. This patch calculates nsegs accurately (it
calculates the smallest number of segments required for the aligned number
of sectors in the split bio).
Signed-off-by: Satya Tangirala <[email protected]>
---
block/blk-merge.c | 97 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 80 insertions(+), 17 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 45cda45c1066..58428d348661 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -145,17 +145,17 @@ static inline unsigned get_max_io_size(struct request_queue *q,
struct bio *bio)
{
unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector, 0);
- unsigned max_sectors = sectors;
unsigned pbs = queue_physical_block_size(q) >> SECTOR_SHIFT;
unsigned lbs = queue_logical_block_size(q) >> SECTOR_SHIFT;
- unsigned start_offset = bio->bi_iter.bi_sector & (pbs - 1);
+ unsigned pbs_aligned_sector =
+ round_down(sectors + bio->bi_iter.bi_sector, pbs);
- max_sectors += start_offset;
- max_sectors &= ~(pbs - 1);
- if (max_sectors > start_offset)
- return max_sectors - start_offset;
+ lbs = max(lbs, blk_crypto_bio_sectors_alignment(bio));
- return sectors & ~(lbs - 1);
+ if (pbs_aligned_sector >= bio->bi_iter.bi_sector + lbs)
+ sectors = pbs_aligned_sector;
+
+ return round_down(sectors, lbs);
}
static inline unsigned get_max_segment_size(const struct request_queue *q,
@@ -174,6 +174,41 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
(unsigned long)queue_max_segment_size(q));
}
+/**
+ * update_aligned_sectors_and_segs() - Ensures that *@aligned_sectors is aligned
+ * to @bio_sectors_alignment, and that
+ * *@aligned_segs is the value of nsegs
+ * when sectors reached/first exceeded that
+ * value of *@aligned_sectors.
+ *
+ * @nsegs: [in] The current number of segs
+ * @sectors: [in] The current number of sectors
+ * @aligned_segs: [in,out] The number of segments that make up @aligned_sectors
+ * @aligned_sectors: [in,out] The largest number of sectors <= @sectors that is
+ * aligned to @sectors
+ * @bio_sectors_alignment: [in] The alignment requirement for the number of
+ * sectors
+ *
+ * Updates *@aligned_sectors to the largest number <= @sectors that is also a
+ * multiple of @bio_sectors_alignment. This is done by updating *@aligned_sectors
+ * whenever @sectors is at least @bio_sectors_alignment more than
+ * *@aligned_sectors, since that means we can increment *@aligned_sectors while
+ * still keeping it aligned to @bio_sectors_alignment and also keeping it <=
+ * @sectors. *@aligned_segs is updated to the value of nsegs when @sectors first
+ * reaches/exceeds any value that causes *@aligned_sectors to be updated.
+ */
+static inline void update_aligned_sectors_and_segs(const unsigned int nsegs,
+ const unsigned int sectors,
+ unsigned int *aligned_segs,
+ unsigned int *aligned_sectors,
+ const unsigned int bio_sectors_alignment)
+{
+ if (sectors - *aligned_sectors < bio_sectors_alignment)
+ return;
+ *aligned_sectors = round_down(sectors, bio_sectors_alignment);
+ *aligned_segs = nsegs;
+}
+
/**
* bvec_split_segs - verify whether or not a bvec should be split in the middle
* @q: [in] request queue associated with the bio associated with @bv
@@ -195,9 +230,12 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
* the block driver.
*/
static bool bvec_split_segs(const struct request_queue *q,
- const struct bio_vec *bv, unsigned *nsegs,
- unsigned *sectors, unsigned max_segs,
- unsigned max_sectors)
+ const struct bio_vec *bv, unsigned int *nsegs,
+ unsigned int *sectors, unsigned int *aligned_segs,
+ unsigned int *aligned_sectors,
+ unsigned int bio_sectors_alignment,
+ unsigned int max_segs,
+ unsigned int max_sectors)
{
unsigned max_len = (min(max_sectors, UINT_MAX >> 9) - *sectors) << 9;
unsigned len = min(bv->bv_len, max_len);
@@ -211,6 +249,11 @@ static bool bvec_split_segs(const struct request_queue *q,
(*nsegs)++;
total_len += seg_size;
+ update_aligned_sectors_and_segs(*nsegs,
+ *sectors + (total_len >> 9),
+ aligned_segs,
+ aligned_sectors,
+ bio_sectors_alignment);
len -= seg_size;
if ((bv->bv_offset + total_len) & queue_virt_boundary(q))
@@ -258,6 +301,9 @@ static int blk_bio_segment_split(struct request_queue *q,
unsigned nsegs = 0, sectors = 0;
const unsigned max_sectors = get_max_io_size(q, bio);
const unsigned max_segs = queue_max_segments(q);
+ const unsigned int bio_sectors_alignment =
+ blk_crypto_bio_sectors_alignment(bio);
+ unsigned int aligned_segs = 0, aligned_sectors = 0;
bio_for_each_bvec(bv, bio, iter) {
/*
@@ -272,8 +318,14 @@ static int blk_bio_segment_split(struct request_queue *q,
bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
nsegs++;
sectors += bv.bv_len >> 9;
- } else if (bvec_split_segs(q, &bv, &nsegs, §ors, max_segs,
- max_sectors)) {
+ update_aligned_sectors_and_segs(nsegs, sectors,
+ &aligned_segs,
+ &aligned_sectors,
+ bio_sectors_alignment);
+ } else if (bvec_split_segs(q, &bv, &nsegs, §ors,
+ &aligned_segs, &aligned_sectors,
+ bio_sectors_alignment, max_segs,
+ max_sectors)) {
goto split;
}
@@ -281,11 +333,18 @@ static int blk_bio_segment_split(struct request_queue *q,
bvprvp = &bvprv;
}
+ /*
+ * The input bio's number of sectors is assumed to be aligned to
+ * bio_sectors_alignment. If that's the case, then this function should
+ * ensure that aligned_segs == nsegs and aligned_sectors == sectors if
+ * the bio is not going to be split.
+ */
+ WARN_ON(aligned_segs != nsegs || aligned_sectors != sectors);
*segs = nsegs;
*split = NULL;
return 0;
split:
- *segs = nsegs;
+ *segs = aligned_segs;
/*
* Bio splitting may cause subtle trouble such as hang when doing sync
@@ -294,10 +353,9 @@ static int blk_bio_segment_split(struct request_queue *q,
*/
bio->bi_opf &= ~REQ_HIPRI;
- sectors = round_down(sectors, blk_crypto_bio_sectors_alignment(bio));
- if (WARN_ON(sectors == 0))
+ if (WARN_ON(aligned_sectors == 0))
return -EIO;
- *split = bio_split(bio, sectors, GFP_NOIO, bs);
+ *split = bio_split(bio, aligned_sectors, GFP_NOIO, bs);
return 0;
}
@@ -395,6 +453,9 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
{
unsigned int nr_phys_segs = 0;
unsigned int nr_sectors = 0;
+ unsigned int nr_aligned_phys_segs = 0;
+ unsigned int nr_aligned_sectors = 0;
+ unsigned int bio_sectors_alignment;
struct req_iterator iter;
struct bio_vec bv;
@@ -410,9 +471,11 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
return 1;
}
+ bio_sectors_alignment = blk_crypto_bio_sectors_alignment(rq->bio);
rq_for_each_bvec(bv, rq, iter)
bvec_split_segs(rq->q, &bv, &nr_phys_segs, &nr_sectors,
- UINT_MAX, UINT_MAX);
+ &nr_aligned_phys_segs, &nr_aligned_sectors,
+ bio_sectors_alignment, UINT_MAX, UINT_MAX);
return nr_phys_segs;
}
--
2.30.0.284.gd98b1dd5eaa7-goog
Make blk_crypto_split_bio_if_needed() respect
blk_crypto_bio_sectors_alignment() when calling bio_split().
Signed-off-by: Satya Tangirala <[email protected]>
---
block/blk-crypto-fallback.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index c162b754efbd..77e20175df40 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -222,6 +222,8 @@ static bool blk_crypto_split_bio_if_needed(struct bio **bio_ptr)
if (num_sectors < bio_sectors(bio)) {
struct bio *split_bio;
+ num_sectors = round_down(len,
+ blk_crypto_bio_sectors_alignment(bio));
split_bio = bio_split(bio, num_sectors, GFP_NOIO, NULL);
if (!split_bio) {
bio->bi_status = BLK_STS_RESOURCE;
--
2.30.0.284.gd98b1dd5eaa7-goog
On Thu, Jan 14, 2021 at 03:47:16PM +0000, Satya Tangirala wrote:
> When a bio has an encryption context, its size must be aligned to its
> crypto data unit size. A bio must not be split in the middle of a data
> unit. Currently, bios are split at logical block boundaries, but a crypto
> data unit size might be larger than the logical block size - e.g. a machine
> could be using fscrypt (which uses 4K crypto data units) with an eMMC block
> device with inline encryption hardware that has a logical block size of
> 512 bytes. So we need to support cases where the data unit size is larger
> than the logical block size.
I think this model is rather broken. Instead of creating an -EIO path
we can't handle anywhere make sure that the size limits exposed by the
driver that wants to split always align to the crypto data units to
avoid this issue to start with.
On Thu, Jan 21, 2021 at 05:11:29PM +0000, Christoph Hellwig wrote:
> On Thu, Jan 14, 2021 at 03:47:16PM +0000, Satya Tangirala wrote:
> > When a bio has an encryption context, its size must be aligned to its
> > crypto data unit size. A bio must not be split in the middle of a data
> > unit. Currently, bios are split at logical block boundaries, but a crypto
> > data unit size might be larger than the logical block size - e.g. a machine
> > could be using fscrypt (which uses 4K crypto data units) with an eMMC block
> > device with inline encryption hardware that has a logical block size of
> > 512 bytes. So we need to support cases where the data unit size is larger
> > than the logical block size.
>
> I think this model is rather broken. Instead of creating an -EIO path
> we can't handle anywhere make sure that the size limits exposed by the
> driver that wants to split always align to the crypto data units to
> avoid this issue to start with.
Hey Christoph,
Thanks for the suggestion. I finally sent out v2 for this patch at
https://lore.kernel.org/linux-block/[email protected]/
I tried doing something similar to what you suggested to avoid
creating an -EIO path, but instead of changing the size limits exposed
by the driver, I changed the allowed data unit sizes based on the
exposed size limits. I did it that way because the limits that
interfere with inline encryption happen to be "hard limits" a driver
can't lie about, like having support for SG gaps or not requiring
chunk sectors. Another reason for doing it this way is so that we
don't interfere with regular unencrypted I/O by changing driver
exposed limits unconditionally (and I didn't think it was
straightforward to expose two different sets of limits of encrypted
and unencrypted I/O respectively). Please take a look at the new patch
series if you're able to. Thanks!