This patch set fixes various problems spotted during testing of the T10/DIF integrity machinery.
TOC:
## Fix various bugs in T10/DIF/DIX infrastructure
0001-bio-integrity-Do-not-allocate-integrity-context-for-fsync
0002-bio-integrity-save-original-iterator-for-verify-stage
0003-bio-integrity-bio_trim-should-truncate-integrity-vec
0004-bio-integrity-fix-interface-for-bio_integrity_trim
## Cleanup T10/DIF/DIX infrastructure
0005-bio-integrity-add-bio_integrity_setup-helper
0006-T10-Move-opencoded-contants-to-common-header
## General bulletproof protection for block layer
0007-Guard-bvec-iteration-logic-v2
Changes since V1:
- Fix issues spotted by the kbuild bot
- Replace BUG_ON with error logic in the 7th patch
Testcase: xfstest blockdev/003
https://github.com/dmonakhov/xfstests/commit/3c6509eaa83b9c17cd0bc95d73fcdd76e1c54a85
In order to perform verification we need to know the original data vector.
But after a bio traverses the io-stack it may be advanced, split and relocated
many times, so it is hard to guess the original data vector.
In fact, currently ->verify_fn does not work at all, because at the moment
it is called bio->bi_iter.bi_size == 0.
The simplest way to fix that is to save the original data vector and treat it
as immutable.
Signed-off-by: Dmitry Monakhov <[email protected]>
---
block/bio-integrity.c | 6 ++++--
include/linux/bio.h | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index b5009a8..43a4476 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -238,10 +238,10 @@ static int bio_integrity_process(struct bio *bio,
iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
iter.interval = 1 << bi->interval_exp;
- iter.seed = bip_get_seed(bip);
+ iter.seed = bip->bip_verify_iter.bi_sector;
iter.prot_buf = prot_buf;
- bio_for_each_segment(bv, bio, bviter) {
+ __bio_for_each_segment(bv, bio, bviter, bip->bip_verify_iter) {
void *kaddr = kmap_atomic(bv.bv_page);
iter.data_buf = kaddr + bv.bv_offset;
@@ -310,6 +310,7 @@ int bio_integrity_prep(struct bio *bio)
bip->bip_flags |= BIP_BLOCK_INTEGRITY;
bip->bip_iter.bi_size = len;
bip_set_seed(bip, bio->bi_iter.bi_sector);
+ bip->bip_verify_iter = bio->bi_iter;
if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM)
bip->bip_flags |= BIP_IP_CHECKSUM;
@@ -476,6 +477,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
bip->bip_vcnt = bip_src->bip_vcnt;
bip->bip_iter = bip_src->bip_iter;
+ bip->bip_verify_iter = bip_src->bip_verify_iter;
return 0;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e52119..00b086a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -308,6 +308,7 @@ struct bio_integrity_payload {
struct bio *bip_bio; /* parent bio */
struct bvec_iter bip_iter;
+ struct bvec_iter bip_verify_iter;/* saved orig data iterator */
bio_end_io_t *bip_end_io; /* saved I/O completion fn */
--
2.9.3
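For readers following along, the idea behind patch 0002 can be modelled in
userspace. This is only a minimal sketch, not the kernel code: struct
snap_iter, struct snap_payload and the snap_* helpers are hypothetical
stand-ins for struct bvec_iter, struct bio_integrity_payload and the
prep/advance paths, and a 512-byte protection interval is assumed.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct bvec_iter. */
struct snap_iter {
	unsigned long long sector; /* first sector covered */
	unsigned int size;         /* bytes remaining */
};

/* Hypothetical stand-in for the integrity payload. */
struct snap_payload {
	struct snap_iter verify_iter; /* snapshot taken at prep time, never advanced */
};

/* Prep: snapshot the live iterator before the bio travels down the stack. */
static void snap_prep(struct snap_payload *p, const struct snap_iter *live)
{
	p->verify_iter = *live;
}

/* The stack consumes the live iterator as the I/O is split and completed. */
static void snap_advance(struct snap_iter *live, unsigned int bytes)
{
	assert(bytes <= live->size);
	live->sector += bytes >> 9;
	live->size -= bytes;
}

/* Number of 512-byte intervals a verify pass would walk from an iterator. */
static unsigned int snap_intervals(const struct snap_iter *it)
{
	return it->size >> 9;
}
```

Once the live iterator has been fully consumed, snap_intervals() on it
yields zero, which mirrors why verification sees nothing at completion time,
while the snapshot still describes the whole original range.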
Currently, if someone tries to advance a bvec beyond its size, we simply
dump WARN_ONCE and continue to iterate beyond the bvec array boundaries.
This simply means that we end up dereferencing/corrupting a random memory
region.
The sane reaction would be to propagate the error back to the calling context,
but bvec_iter_advance's calling context is not always good for error
handling. For safety reasons, let's truncate the iterator size to zero, which
will break the external iteration loop and so protect us from unpredictable
memory range corruption. And even if the caller ignores the error, it will
corrupt its own bvecs, not others'.
This patch does:
- Return the error back to the caller in the hope that it will react to it
- Truncate the iterator size
The code was added long ago in 4550dd6c; luckily no one has hit it
in real life :)
Changes since V1:
- Replace BUG_ON with error logic.
Signed-off-by: Dmitry Monakhov <[email protected]>
---
drivers/nvdimm/blk.c | 4 +++-
drivers/nvdimm/btt.c | 4 +++-
include/linux/bio.h | 8 ++++++--
include/linux/bvec.h | 11 ++++++++---
4 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 1edb3f3..04c3075 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -106,7 +106,9 @@ static int nd_blk_rw_integrity(struct nd_namespace_blk *nsblk,
len -= cur_len;
dev_offset += cur_len;
- bvec_iter_advance(bip->bip_vec, &bip->bip_iter, cur_len);
+ err = bvec_iter_advance(bip->bip_vec, &bip->bip_iter, cur_len);
+ if (err)
+ return err;
}
return err;
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 03ded8d..3f3aa7b 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -942,7 +942,9 @@ static int btt_rw_integrity(struct btt *btt, struct bio_integrity_payload *bip,
len -= cur_len;
meta_nsoff += cur_len;
- bvec_iter_advance(bip->bip_vec, &bip->bip_iter, cur_len);
+ ret = bvec_iter_advance(bip->bip_vec, &bip->bip_iter, cur_len);
+ if (ret)
+ return ret;
}
return ret;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0c1c95c..8bf1564 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -168,8 +168,12 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
if (bio_no_advance_iter(bio))
iter->bi_size -= bytes;
- else
- bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+ else {
+ int err;
+ err = bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+ if (unlikely(err))
+ bio->bi_error = err;
+ }
}
#define __bio_for_each_segment(bvl, bio, iter, start) \
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b8..c117f1a 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -22,6 +22,7 @@
#include <linux/kernel.h>
#include <linux/bug.h>
+#include <linux/errno.h>
/*
* was unsigned short, but we might as well be ready for > 64kB I/O pages
@@ -66,12 +67,15 @@ struct bvec_iter {
.bv_offset = bvec_iter_offset((bvec), (iter)), \
})
-static inline void bvec_iter_advance(const struct bio_vec *bv,
+static inline int bvec_iter_advance(const struct bio_vec *bv,
struct bvec_iter *iter,
unsigned bytes)
{
- WARN_ONCE(bytes > iter->bi_size,
- "Attempted to advance past end of bvec iter\n");
+ if(unlikely(bytes > iter->bi_size)) {
+ WARN(1, "Attempted to advance past end of bvec iter\n");
+ iter->bi_size = 0;
+ return -EINVAL;
+ }
while (bytes) {
unsigned iter_len = bvec_iter_len(bv, *iter);
@@ -86,6 +90,7 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
iter->bi_idx++;
}
}
+ return 0;
}
#define for_each_bvec(bvl, bio_vec, iter, start) \
--
2.9.3
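The guard added to bvec_iter_advance() in patch 0007 can be tried out in
userspace with a small model. This is a sketch under assumptions, not the
kernel implementation: struct guard_vec, struct guard_iter and
guard_iter_advance() are hypothetical names, and the vector lengths are
assumed to be consistent with the iterator size.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for struct bio_vec and struct bvec_iter. */
struct guard_vec {
	unsigned int len;
};

struct guard_iter {
	unsigned int size; /* bytes remaining in the whole iterator */
	unsigned int idx;  /* index of the current vector */
	unsigned int done; /* bytes already consumed in the current vector */
};

/*
 * Advance by 'bytes'. On overrun, clamp the iterator to zero so that any
 * "while (iter.size)" style loop in the caller stops, and report the error.
 */
static int guard_iter_advance(const struct guard_vec *bv,
			      struct guard_iter *it, unsigned int bytes)
{
	assert(bv && it);

	if (bytes > it->size) {
		it->size = 0;
		return -EINVAL;
	}

	while (bytes) {
		unsigned int left = bv[it->idx].len - it->done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->size -= step;
		it->done += step;
		if (it->done == bv[it->idx].len) {
			it->done = 0;
			it->idx++;
		}
	}
	return 0;
}
```

A caller that ignores the return value still ends up with size == 0, so its
next loop condition fails instead of walking past the end of the array.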
Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore its error code and fail the bio with EIO. Let's return the real
error to the upper layer, so that the caller may later react accordingly,
for example by retrying in case of ENOMEM.
Signed-off-by: Dmitry Monakhov <[email protected]>
---
block/blk-core.c | 5 +----
block/blk-mq.c | 8 ++------
drivers/nvdimm/blk.c | 13 ++-----------
drivers/nvdimm/btt.c | 13 ++-----------
include/linux/bio.h | 25 +++++++++++++++++++++++++
5 files changed, 32 insertions(+), 32 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index d772c22..071a998 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1637,11 +1637,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
blk_queue_split(q, &bio, q->bio_split);
- if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
- bio->bi_error = -EIO;
- bio_endio(bio);
+ if (bio_integrity_setup(bio))
return BLK_QC_T_NONE;
- }
if (op_is_flush(bio->bi_opf)) {
spin_lock_irq(q->queue_lock);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 08a49c6..a9931ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1489,10 +1489,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_queue_bounce(q, &bio);
- if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
- bio_io_error(bio);
+ if (bio_integrity_setup(bio))
return BLK_QC_T_NONE;
- }
blk_queue_split(q, &bio, q->bio_split);
@@ -1611,10 +1609,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
blk_queue_bounce(q, &bio);
- if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
- bio_io_error(bio);
+ if (bio_integrity_setup(bio))
return BLK_QC_T_NONE;
- }
blk_queue_split(q, &bio, q->bio_split);
diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 9faaa96..1edb3f3 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -179,16 +179,8 @@ static blk_qc_t nd_blk_make_request(struct request_queue *q, struct bio *bio)
int err = 0, rw;
bool do_acct;
- /*
- * bio_integrity_enabled also checks if the bio already has an
- * integrity payload attached. If it does, we *don't* do a
- * bio_integrity_prep here - the payload has been generated by
- * another kernel subsystem, and we just pass it through.
- */
- if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
- bio->bi_error = -EIO;
- goto out;
- }
+ if (bio_integrity_setup(bio))
+ return BLK_QC_T_NONE;
bip = bio_integrity(bio);
nsblk = q->queuedata;
@@ -212,7 +204,6 @@ static blk_qc_t nd_blk_make_request(struct request_queue *q, struct bio *bio)
if (do_acct)
nd_iostat_end(bio, start);
- out:
bio_endio(bio);
return BLK_QC_T_NONE;
}
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 368795a..03ded8d 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1158,16 +1158,8 @@ static blk_qc_t btt_make_request(struct request_queue *q, struct bio *bio)
int err = 0;
bool do_acct;
- /*
- * bio_integrity_enabled also checks if the bio already has an
- * integrity payload attached. If it does, we *don't* do a
- * bio_integrity_prep here - the payload has been generated by
- * another kernel subsystem, and we just pass it through.
- */
- if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
- bio->bi_error = -EIO;
- goto out;
- }
+ if (bio_integrity_setup(bio))
+ return BLK_QC_T_NONE;
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -1194,7 +1186,6 @@ static blk_qc_t btt_make_request(struct request_queue *q, struct bio *bio)
if (do_acct)
nd_iostat_end(bio, start);
-out:
bio_endio(bio);
return BLK_QC_T_NONE;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 350f71d..0c1c95c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -738,6 +738,26 @@ extern int bioset_integrity_create(struct bio_set *, int);
extern void bioset_integrity_free(struct bio_set *);
extern void bio_integrity_init(void);
+static inline int bio_integrity_setup(struct bio *bio)
+{
+ int err = 0;
+
+ /*
+ * bio_integrity_enabled also checks if the bio already has an
+ * integrity payload attached. If it does, we *don't* do a
+ * bio_integrity_prep here - the payload has been generated by
+ * another kernel subsystem, and we just pass it through.
+ */
+ if (bio_integrity_enabled(bio)) {
+ err = bio_integrity_prep(bio);
+ if (err) {
+ bio->bi_error = err;
+ bio_endio(bio);
+ }
+ }
+ return err;
+}
+
#else /* CONFIG_BLK_DEV_INTEGRITY */
static inline void *bio_integrity(struct bio *bio)
@@ -765,6 +785,11 @@ static inline int bio_integrity_prep(struct bio *bio)
return 0;
}
+static inline int bio_integrity_setup(struct bio *bio)
+{
+ return 0;
+}
+
static inline void bio_integrity_free(struct bio *bio)
{
return;
--
2.9.3
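The control flow of the bio_integrity_setup() helper from patch 0005 can be
exercised outside the kernel with a toy model. Everything below is a
hypothetical stand-in (struct toy_bio and the toy_* functions are not kernel
APIs); the sketch only shows that the real error code from prep is stored in
the bio and returned, rather than being replaced with -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy bio: just enough state to observe the helper's effects. */
struct toy_bio {
	bool wants_integrity; /* what an "enabled" check would report */
	int prep_result;      /* what prep would return (0 or -errno) */
	int error;            /* models bio->bi_error */
	bool ended;           /* set once the bio has been completed */
};

static bool toy_integrity_enabled(const struct toy_bio *bio)
{
	return bio->wants_integrity;
}

static int toy_integrity_prep(struct toy_bio *bio)
{
	return bio->prep_result;
}

static void toy_endio(struct toy_bio *bio)
{
	bio->ended = true;
}

/* Mirrors the helper: on failure, record the real error and end the bio. */
static int toy_integrity_setup(struct toy_bio *bio)
{
	int err = 0;

	assert(bio);
	if (toy_integrity_enabled(bio)) {
		err = toy_integrity_prep(bio);
		if (err) {
			bio->error = err;
			toy_endio(bio);
		}
	}
	return err;
}
```

With this shape the caller only has to test the return value and bail out,
since the bio has already been completed on the error path.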
Signed-off-by: Dmitry Monakhov <[email protected]>
---
block/t10-pi.c | 9 +++------
drivers/scsi/lpfc/lpfc_scsi.c | 5 +++--
drivers/scsi/qla2xxx/qla_isr.c | 8 ++++----
drivers/target/target_core_sbc.c | 2 +-
include/linux/t10-pi.h | 3 +++
5 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 2c97912..485cecd 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -28,9 +28,6 @@
typedef __be16 (csum_fn) (void *, unsigned int);
-static const __be16 APP_ESCAPE = (__force __be16) 0xffff;
-static const __be32 REF_ESCAPE = (__force __be32) 0xffffffff;
-
static __be16 t10_pi_crc_fn(void *data, unsigned int len)
{
return cpu_to_be16(crc_t10dif(data, len));
@@ -82,7 +79,7 @@ static int t10_pi_verify(struct blk_integrity_iter *iter, csum_fn *fn,
switch (type) {
case 1:
case 2:
- if (pi->app_tag == APP_ESCAPE)
+ if (pi->app_tag == T10_APP_ESCAPE)
goto next;
if (be32_to_cpu(pi->ref_tag) !=
@@ -95,8 +92,8 @@ static int t10_pi_verify(struct blk_integrity_iter *iter, csum_fn *fn,
}
break;
case 3:
- if (pi->app_tag == APP_ESCAPE &&
- pi->ref_tag == REF_ESCAPE)
+ if (pi->app_tag == T10_APP_ESCAPE &&
+ pi->ref_tag == T10_REF_ESCAPE)
goto next;
break;
}
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 54fd0c8..6f6b40e 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -26,6 +26,7 @@
#include <linux/export.h>
#include <linux/delay.h>
#include <asm/unaligned.h>
+#include <linux/t10-pi.h>
#include <linux/crc-t10dif.h>
#include <net/checksum.h>
@@ -2934,8 +2935,8 @@ lpfc_calc_bg_err(struct lpfc_hba *phba, struct lpfc_scsi_buf *lpfc_cmd)
* First check to see if a protection data
* check is valid
*/
- if ((src->ref_tag == 0xffffffff) ||
- (src->app_tag == 0xffff)) {
+ if ((src->ref_tag == T10_REF_ESCAPE) ||
+ (src->app_tag == T10_APP_ESCAPE)) {
start_ref_tag++;
goto skipit;
}
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 3203367..ed4b302 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1950,9 +1950,9 @@ qla2x00_handle_dif_error(srb_t *sp, struct sts_entry_24xx *sts24)
* For type 3: ref & app tag is all 'f's
* For type 0,1,2: app tag is all 'f's
*/
- if ((a_app_tag == 0xffff) &&
+ if ((a_app_tag == T10_APP_ESCAPE) &&
((scsi_get_prot_type(cmd) != SCSI_PROT_DIF_TYPE3) ||
- (a_ref_tag == 0xffffffff))) {
+ (a_ref_tag == T10_REF_ESCAPE))) {
uint32_t blocks_done, resid;
sector_t lba_s = scsi_get_lba(cmd);
@@ -1994,9 +1994,9 @@ qla2x00_handle_dif_error(srb_t *sp, struct sts_entry_24xx *sts24)
spt = page_address(sg_page(sg)) + sg->offset;
spt += j;
- spt->app_tag = 0xffff;
+ spt->app_tag = T10_APP_ESCAPE;
if (scsi_get_prot_type(cmd) == SCSI_PROT_DIF_TYPE3)
- spt->ref_tag = 0xffffffff;
+ spt->ref_tag = T10_REF_ESCAPE;
}
return 0;
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index c194063..927ef44 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -1446,7 +1446,7 @@ sbc_dif_verify(struct se_cmd *cmd, sector_t start, unsigned int sectors,
(unsigned long long)sector, sdt->guard_tag,
sdt->app_tag, be32_to_cpu(sdt->ref_tag));
- if (sdt->app_tag == cpu_to_be16(0xffff)) {
+ if (sdt->app_tag == T10_APP_ESCAPE) {
dsg_off += block_size;
goto next;
}
diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h
index 9fba9dd..c96845c 100644
--- a/include/linux/t10-pi.h
+++ b/include/linux/t10-pi.h
@@ -24,6 +24,9 @@ enum t10_dif_type {
T10_PI_TYPE3_PROTECTION = 0x3,
};
+static const __be16 T10_APP_ESCAPE = (__force __be16) 0xffff;
+static const __be32 T10_REF_ESCAPE = (__force __be32) 0xffffffff;
+
/*
* T10 Protection Information tuple.
*/
--
2.9.3
bio_integrity_trim inherited its interface from bio_trim and accepts
an offset and a size, but this API is error-prone because the data offset
must always be in sync with the bio's data offset. That is why we have
the integrity update hook in bio_advance().
So the only meaningful offset is 0. Let's just remove it completely.
TODO: add xfstests testcase here
Signed-off-by: Dmitry Monakhov <[email protected]>
---
block/bio-integrity.c | 8 +-------
block/bio.c | 4 ++--
drivers/md/dm.c | 2 +-
include/linux/bio.h | 5 ++---
4 files changed, 6 insertions(+), 13 deletions(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 43a4476..43895a0 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -433,21 +433,15 @@ EXPORT_SYMBOL(bio_integrity_advance);
/**
* bio_integrity_trim - Trim integrity vector
* @bio: bio whose integrity vector to update
- * @offset: offset to first data sector
* @sectors: number of data sectors
*
* Description: Used to trim the integrity vector in a cloned bio.
- * The ivec will be advanced corresponding to 'offset' data sectors
- * and the length will be truncated corresponding to 'len' data
- * sectors.
*/
-void bio_integrity_trim(struct bio *bio, unsigned int offset,
- unsigned int sectors)
+void bio_integrity_trim(struct bio *bio, unsigned int sectors)
{
struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
- bio_integrity_advance(bio, offset << 9);
bip->bip_iter.bi_size = bio_integrity_bytes(bi, sectors);
}
EXPORT_SYMBOL(bio_integrity_trim);
diff --git a/block/bio.c b/block/bio.c
index fa84323..6895986 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1878,7 +1878,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
split->bi_iter.bi_size = sectors << 9;
if (bio_integrity(split))
- bio_integrity_trim(split, 0, sectors);
+ bio_integrity_trim(split, sectors);
bio_advance(bio, split->bi_iter.bi_size);
@@ -1909,7 +1909,7 @@ void bio_trim(struct bio *bio, int offset, int size)
bio->bi_iter.bi_size = size;
if (bio_integrity(bio))
- bio_integrity_trim(bio, 0, size);
+ bio_integrity_trim(bio, size);
}
EXPORT_SYMBOL_GPL(bio_trim);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dfb7597..e54ecdd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1102,7 +1102,7 @@ static int clone_bio(struct dm_target_io *tio, struct bio *bio,
clone->bi_iter.bi_size = to_bytes(len);
if (bio_integrity(bio))
- bio_integrity_trim(clone, 0, len);
+ bio_integrity_trim(clone, len);
return 0;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 00b086a..350f71d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -732,7 +732,7 @@ extern bool bio_integrity_enabled(struct bio *bio);
extern int bio_integrity_prep(struct bio *);
extern void bio_integrity_endio(struct bio *);
extern void bio_integrity_advance(struct bio *, unsigned int);
-extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
+extern void bio_integrity_trim(struct bio *, unsigned int);
extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
extern int bioset_integrity_create(struct bio_set *, int);
extern void bioset_integrity_free(struct bio_set *);
@@ -782,8 +782,7 @@ static inline void bio_integrity_advance(struct bio *bio,
return;
}
-static inline void bio_integrity_trim(struct bio *bio, unsigned int offset,
- unsigned int sectors)
+static inline void bio_integrity_trim(struct bio *bio, unsigned int sectors)
{
return;
}
--
2.9.3
Signed-off-by: Dmitry Monakhov <[email protected]>
---
block/bio.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index e75878f..fa84323 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1907,6 +1907,10 @@ void bio_trim(struct bio *bio, int offset, int size)
bio_advance(bio, offset << 9);
bio->bi_iter.bi_size = size;
+
+ if (bio_integrity(bio))
+ bio_integrity_trim(bio, 0, size);
+
}
EXPORT_SYMBOL_GPL(bio_trim);
--
2.9.3
If a bio has no data, such as the ones from blkdev_issue_flush(),
then we have nothing to protect.
This patch prevents a BUG_ON like the following:
kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
kernel BUG at mm/slab.c:2773!
invalid opcode: 0000 [#1] SMP
Modules linked in: bcache
CPU: 0 PID: 4428 Comm: xfs_io Tainted: G W 4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
task: ffff880137786440 task.stack: ffffc90000ba8000
RIP: 0010:kfree_debugcheck+0x25/0x2a
RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
FS: 00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
Call Trace:
kfree+0xc8/0x1b3
bio_integrity_free+0xc3/0x16b
bio_free+0x25/0x66
bio_put+0x14/0x26
blkdev_issue_flush+0x7a/0x85
blkdev_fsync+0x35/0x42
vfs_fsync_range+0x8e/0x9f
vfs_fsync+0x1c/0x1e
do_fsync+0x31/0x4a
SyS_fsync+0x10/0x14
entry_SYSCALL_64_fastpath+0x1f/0xc2
Signed-off-by: Dmitry Monakhov <[email protected]>
---
block/bio-integrity.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 5384713..b5009a8 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -175,6 +175,9 @@ bool bio_integrity_enabled(struct bio *bio)
if (bio_op(bio) != REQ_OP_READ && bio_op(bio) != REQ_OP_WRITE)
return false;
+ if (!bio_sectors(bio))
+ return false;
+
/* Already protected? */
if (bio_integrity(bio))
return false;
--
2.9.3
On 04/03/2017 01:23 AM, Dmitry Monakhov wrote:
> @@ -66,12 +67,15 @@ struct bvec_iter {
> .bv_offset = bvec_iter_offset((bvec), (iter)), \
> })
>
> -static inline void bvec_iter_advance(const struct bio_vec *bv,
> +static inline int bvec_iter_advance(const struct bio_vec *bv,
> struct bvec_iter *iter,
> unsigned bytes)
> {
> - WARN_ONCE(bytes > iter->bi_size,
> - "Attempted to advance past end of bvec iter\n");
> + if(unlikely(bytes > iter->bi_size)) {
> + WARN(1, "Attempted to advance past end of bvec iter\n");
> + iter->bi_size = 0;
> + return -EINVAL;
> + }
if (WARN_ONCE(bytes > iter->bi_size,
"Attempted to advance past end of bvec iter\n")) {
...
would be cleaner.
--
Jens Axboe
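The pattern suggested above relies on WARN_ONCE() evaluating to its
condition, so it can sit directly inside the if. A userspace sketch of that
shape follows; warn_once(), warn_count and warn_iter_advance() are
hypothetical helpers written for this illustration, and only the size
bookkeeping of the iterator is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter so the sketch can show the warning fires only once. */
static int warn_count;

/*
 * Userspace stand-in for WARN_ONCE(): reports the first hit at a call site
 * (tracked by *warned) and always returns the condition.
 */
static int warn_once(int cond, int *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = 1;
		warn_count++;
		fputs(msg, stderr);
	}
	return cond;
}

/* Simplified advance: the guard and the error path share one condition. */
static int warn_iter_advance(unsigned int *size, unsigned int bytes)
{
	static int warned; /* one flag per call site, like WARN_ONCE */

	assert(size);
	if (warn_once(bytes > *size, &warned,
		      "Attempted to advance past end of bvec iter\n")) {
		*size = 0;
		return -EINVAL;
	}
	*size -= bytes;
	return 0;
}
```

Every overrun still takes the error path, but the log is only written for
the first one.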
Dmitry Monakhov <[email protected]> writes:
Dmitry,
> This patch set fixes various problems spotted during testing of the
> T10/DIF integrity machinery.
>
> TOC:
> ## Fix various bugs in T10/DIF/DIX infrastructure
> 0001-bio-integrity-Do-not-allocate-integrity-context-for-fsync
> 0002-bio-integrity-save-original-iterator-for-verify-stage
> 0003-bio-integrity-bio_trim-should-truncate-integrity-vec
> 0004-bio-integrity-fix-interface-for-bio_integrity_trim
> ## Cleanup T10/DIF/DIX infrastructure
> 0005-bio-integrity-add-bio_integrity_setup-helper
> 0006-T10-Move-opencoded-contants-to-common-header
> ## General bulletproof protection for block layer
> 0007-Guard-bvec-iteration-logic-v2
Looks like a nice cleanup of some of the things that have rotted a bit
as a result of the immutable bvec efforts. No major objections from
here. I'll try your series on my qual setup tomorrow to make sure
everything is working correctly.
--
Martin K. Petersen Oracle Linux Engineering
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
This is a pretty big increase in the bio_integrity_payload size,
but I guess we can't get around it..
Reviewed-by: Christoph Hellwig <[email protected]>
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Mon, Apr 03, 2017 at 11:23:29AM +0400, Dmitry Monakhov wrote:
> bio_integrity_trim inherited its interface from bio_trim and accepts
> an offset and a size, but this API is error-prone because the data offset
> must always be in sync with the bio's data offset. That is why we have
> the integrity update hook in bio_advance().
>
> So the only meaningful offset is 0. Let's just remove it completely.
I think we can get rid of size as well and derive it from the bio,
can't we?
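One way to read this suggestion, sketched in userspace: the trim helper
takes only the bio and derives the integrity length from the bio's current
data size. The names below (struct trim_profile, struct trim_bio,
trim_integrity()) are hypothetical, and the interval/tuple arithmetic is my
reading of what bio_integrity_bytes() computes, so treat it as an
assumption rather than the kernel's definition.

```c
#include <assert.h>

/* Hypothetical stand-in for the integrity profile of a device. */
struct trim_profile {
	unsigned int interval_exp; /* log2 of the protection interval in bytes */
	unsigned int tuple_size;   /* integrity bytes per interval */
};

/* Hypothetical stand-in for a bio with an integrity payload. */
struct trim_bio {
	unsigned int data_bytes;      /* models bio->bi_iter.bi_size */
	unsigned int integrity_bytes; /* models bip->bip_iter.bi_size */
};

/* Integrity bytes needed to cover 'sectors' 512-byte data sectors. */
static unsigned int trim_integrity_bytes(const struct trim_profile *bi,
					 unsigned int sectors)
{
	return (sectors >> (bi->interval_exp - 9)) * bi->tuple_size;
}

/* No offset and no size argument: both follow from the bio itself. */
static void trim_integrity(struct trim_bio *bio, const struct trim_profile *bi)
{
	assert(bi->interval_exp >= 9);
	bio->integrity_bytes = trim_integrity_bytes(bi, bio->data_bytes >> 9);
}
```

With no explicit length there is nothing for a caller to get out of sync,
which is the same argument the patch makes for dropping the offset.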
On Mon, Apr 03, 2017 at 11:23:30AM +0400, Dmitry Monakhov wrote:
> Currently all integrity prep hooks are open-coded, and if prepare fails
> we ignore its error code and fail the bio with EIO. Let's return the real
> error to the upper layer, so that the caller may later react accordingly,
> for example by retrying in case of ENOMEM.
bio_integrity_enabled and bio_integrity_prep seem to be unused outside
of bio_integrity_setup, so they can be removed / folded into
bio_integrity_setup. Which at this point might just keep the
bio_integrity_prep name to fit into the blocking traditions :)
Also please update Documentation/block/data-integrity.txt for your
changes and add a kerneldoc comment for the new function.
> - if ((src->ref_tag == 0xffffffff) ||
> - (src->app_tag == 0xffff)) {
> + if ((src->ref_tag == T10_REF_ESCAPE) ||
> + (src->app_tag == T10_APP_ESCAPE)) {
Please remove the inner braces while you're at it (also later in the
patch).
> index 9fba9dd..c96845c 100644
> --- a/include/linux/t10-pi.h
> +++ b/include/linux/t10-pi.h
> @@ -24,6 +24,9 @@ enum t10_dif_type {
> T10_PI_TYPE3_PROTECTION = 0x3,
> };
>
> +static const __be16 T10_APP_ESCAPE = (__force __be16) 0xffff;
> +static const __be32 T10_REF_ESCAPE = (__force __be32) 0xffffffff;
I'd do this as:
#define T10_APP_ESCAPE cpu_to_be16(0xffff)
#define T10_REF_ESCAPE cpu_to_be32(0xffffffff)
This avoids relying on the compiler to merge constants, and also gets
the endianness annotation right instead of force escaping it.
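A userspace analogue of the macro form, for illustration only: htons() and
htonl() from <arpa/inet.h> stand in for cpu_to_be16() and cpu_to_be32(), and
the ESC_* names and struct esc_tuple are hypothetical. Note that the macro
bodies here carry no trailing semicolon, since they are used inside
expressions.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Escape values expressed through the byte-order helpers, not casts. */
#define ESC_APP_TAG htons(0xffff)
#define ESC_REF_TAG htonl(0xffffffff)

/* Hypothetical 8-byte protection tuple with big-endian fields. */
struct esc_tuple {
	uint16_t guard_tag;
	uint16_t app_tag;
	uint32_t ref_tag;
};

/* Type 3 style check: skip verification when both tags are all ones. */
static bool esc_type3_skip(const struct esc_tuple *pi)
{
	assert(pi);
	return pi->app_tag == ESC_APP_TAG && pi->ref_tag == ESC_REF_TAG;
}
```

An all-ones value reads the same in either byte order, so the conversion
changes nothing at run time; what it buys is the correct type annotation.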
Christoph Hellwig <[email protected]> writes:
> This is a pretty big increase in the bio_integrity_payload size,
> but I guess we can't get around it..
Yes, everybody hates this solution, me too, but I started with
another approach and it turned out to be very ugly.
My idea was that we have two types of iterator incrementors: bio_advance()
and bio_xx_complete. The first one is called during split, the latter is called
on completion ( req_bio_endio() ). So we can add a new field "bi_done" to
the iterator which has a similar meaning to bi_bvec_done, but at full iterator
scope. It is incremented during completion, but before end_io.
Chained bios will propagate bi_done to the parent bio.
In ->verify_fn() the iterator will be rewound (the counterpart of bvec_advance)
by iter->bi_done bytes, so we will get the original iterator.
I even prepared a patch for this idea and it looks big and awful.
Even worse, it does not work if chained bios overlap (raid1, raid10,
etc).
But... while writing this email I realized that I do not
care about what happens with chained bios. The only thing that matters
is the parent bio and how far it was advanced. If bi_done is incremented
inside bvec_iter_advance() I can be sure that at the moment of
->bi_end_io()
the original position can be restored by rewinding back by bi_done bytes.
I'll try to implement this.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
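The bi_done idea described above can be sketched in userspace as an
advance/rewind pair. This is only an illustration of the approach, not the
patch Dmitry mentions: struct rw_vec, struct rw_iter, rw_advance() and
rw_rewind() are hypothetical names, and the 'done' field plays the role of
the proposed bi_done counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct bio_vec. */
struct rw_vec {
	unsigned int len;
};

/* Hypothetical stand-in for struct bvec_iter with the proposed counter. */
struct rw_iter {
	unsigned int size;      /* bytes remaining */
	unsigned int idx;       /* current vector index */
	unsigned int bvec_done; /* bytes consumed in the current vector */
	unsigned int done;      /* total bytes advanced so far ("bi_done") */
};

/* Advance and remember how far we went. */
static int rw_advance(const struct rw_vec *bv, struct rw_iter *it,
		      unsigned int bytes)
{
	assert(bv && it);
	if (bytes > it->size)
		return -EINVAL;
	it->done += bytes;
	it->size -= bytes;
	while (bytes) {
		unsigned int left = bv[it->idx].len - it->bvec_done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->bvec_done += step;
		if (it->bvec_done == bv[it->idx].len) {
			it->bvec_done = 0;
			it->idx++;
		}
	}
	return 0;
}

/* Counterpart of advance: walk back 'bytes', at most 'done' in total. */
static int rw_rewind(const struct rw_vec *bv, struct rw_iter *it,
		     unsigned int bytes)
{
	struct rw_iter tmp;

	assert(bv && it);
	tmp = *it; /* work on a copy, commit only on success */
	if (bytes > tmp.done)
		return -EINVAL;
	tmp.done -= bytes;
	tmp.size += bytes;
	while (bytes) {
		unsigned int step;

		if (tmp.bvec_done == 0) {
			if (tmp.idx == 0)
				return -EINVAL;
			tmp.idx--;
			tmp.bvec_done = bv[tmp.idx].len;
		}
		step = bytes < tmp.bvec_done ? bytes : tmp.bvec_done;
		bytes -= step;
		tmp.bvec_done -= step;
	}
	*it = tmp;
	return 0;
}
```

Rewinding by the full 'done' count restores the starting position without
keeping a second copy of the iterator, which is the size saving being
discussed.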
On Mon, Apr 3, 2017 at 3:23 PM, Dmitry Monakhov <[email protected]> wrote:
> Currently, if someone tries to advance a bvec beyond its size, we simply
> dump WARN_ONCE and continue to iterate beyond the bvec array boundaries.
> This simply means that we end up dereferencing/corrupting a random memory
> region.
>
> The sane reaction would be to propagate the error back to the calling context,
> but bvec_iter_advance's calling context is not always good for error
> handling. For safety reasons, let's truncate the iterator size to zero, which
IMO, we can avoid continuing to iterate by checking the return value,
and it looks rude to just set the iterator size to 0.
> will break the external iteration loop and so protect us from unpredictable
> memory range corruption. And even if the caller ignores the error, it will
> corrupt its own bvecs, not others'.
>
--
Ming Lei
Ming Lei <[email protected]> writes:
> On Mon, Apr 3, 2017 at 3:23 PM, Dmitry Monakhov <[email protected]> wrote:
>> Currently, if someone tries to advance a bvec beyond its size, we simply
>> dump WARN_ONCE and continue to iterate beyond the bvec array boundaries.
>> This simply means that we end up dereferencing/corrupting a random memory
>> region.
>>
>> The sane reaction would be to propagate the error back to the calling context,
>> but bvec_iter_advance's calling context is not always good for error
>> handling. For safety reasons, let's truncate the iterator size to zero, which
>
> IMO, we can avoid continuing to iterate by checking the return value,
> and it looks rude to just set the iterator size to 0.
But the situation itself is already horrible. IMHO this is a BUG_ON situation,
but since Linus hates BUG_ONs, I am trying to replace it with something loud
but very safe. Since there is no guarantee that a caller will not ignore the
error and try to dereference the bvec, the only safe thing we can do is to
clamp the iterator to zero to prevent any possible future use.
>
>> will break the external iteration loop and so protect us from unpredictable
>> memory range corruption. And even if the caller ignores the error, it will
>> corrupt its own bvecs, not others'.
>>
>> + }
>>
>> while (bytes) {
>> unsigned iter_len = bvec_iter_len(bv, *iter);
>> @@ -86,6 +90,7 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
>> iter->bi_idx++;
>> }
>> }
>> + return 0;
>> }
>>
>> #define for_each_bvec(bvl, bio_vec, iter, start) \
>> --
>> 2.9.3
>>
>
>
>
> --
> Ming Lei
On Tue, Apr 4, 2017 at 11:19 PM, Dmitry Monakhov <[email protected]> wrote:
> Ming Lei <[email protected]> writes:
>
>> On Mon, Apr 3, 2017 at 3:23 PM, Dmitry Monakhov <[email protected]> wrote:
>>> Currently if someone tries to advance a bvec beyond its size we simply
>>> dump WARN_ONCE and continue to iterate beyond the bvec array boundaries.
>>> This simply means that we end up dereferencing/corrupting a random memory
>>> region.
>>>
>>> A sane reaction would be to propagate the error back to the calling context.
>>> But bvec_iter_advance's calling context is not always good for error
>>> handling. For safety reasons, let's truncate the iterator size to zero, which
>>
>> IMO, we can avoid continuing to iterate by checking the return value,
>> and it looks rude to just set the iterator size to 0.
> But the situation itself is horrible already. IMHO this is a BUG_ON situation,
Not sure it is a real BUG_ON() case, since a corrupt bvec array shouldn't happen:
we usually don't modify the bvec array directly and just copy entries to a local
variable for further access. But dereferencing random memory might happen.
> but since Linus hates BUG_ONs, I tried to replace it with something loud but
> very safe. Since there is no guarantee that the caller will not ignore the error
> and try to dereference the bvec, the only safe thing we can do is to clamp
> the iterator to zero to prevent any possible usage in the future.
Since you prevent the case from happening, no dereference of an invalid bvec
can happen any more, but it may cause an infinite loop. Setting the iterator
size to zero can break the infinite loop, but the driver still may not know
about this error and may continue with its following work.
I don't have a better idea now, and it looks fine to set iter.bi_size to
zero at the beginning of guarding bvec iteration.
Thanks,
Ming Lei