Hello, all.
These patches are available in the git tree below, but it sits on top
of the previous blk-map-related-fixes patchset, which needs some
updating, so this posting is just for review and comments. This
patchset needs to spend quite some time in RCs even if it eventually
gets acked, so it's definitely not .30 material. Please also note
that I haven't updated the comment about bio chaining violating queue
limits, so please ignore that part.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-map
http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=blk-map
The goals of this patchset are
* Clean up and refactor blk/bio PC map API interface and
implementation.
* Add blk_rq_map_kern_sg() so that in-kernel users can issue
multi-segment PC requests. scsi_lib.c currently does this directly
by building the rq itself.
* This is also in the general direction of making the block layer
interface stricter. For PC requests, users only get to deal with
requests and buffers.
This patchset uses an sg list to pass multi-segment memory areas into
blk_rq_map_kern_sg() and also uses sgl to simplify the internal
implementation.
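As a rough usage sketch, an in-kernel user would describe its
segments with an sgl and hand that to the new helper. The
blk_rq_map_kern_sg() signature below is assumed by analogy with
blk_rq_map_kern() and the surrounding names are made up, so treat
this as illustration only, not code from the series.

#include <linux/blkdev.h>
#include <linux/scatterlist.h>

/* hypothetical caller issuing a two-segment PC write */
static int issue_pc_write(struct request_queue *q, void *hdr, size_t hdr_len,
			  void *data, size_t data_len)
{
	struct scatterlist sgl[2];
	struct request *rq;
	int err;

	/* describe the two kernel segments with an sgl */
	sg_init_table(sgl, 2);
	sg_set_buf(&sgl[0], hdr, hdr_len);
	sg_set_buf(&sgl[1], data, data_len);

	rq = blk_get_request(q, WRITE, GFP_KERNEL);
	if (!rq)
		return -ENOMEM;
	rq->cmd_type = REQ_TYPE_BLOCK_PC;

	/* assumed signature: (q, rq, sgl, nents, gfp_mask) */
	err = blk_rq_map_kern_sg(q, rq, sgl, 2, GFP_KERNEL);
	if (err)
		goto out;

	/* the caller would fill in rq->cmd[] / rq->cmd_len here */
	err = blk_execute_rq(q, NULL, rq, 0);
out:
	blk_put_request(rq);
	return err;
}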
Boaz Harrosh has some objections against both of these. For the
former, I can't really think of using anything other than sgl for the
API. For the latter, the use of sgl inside bio for the map
implementation does add overhead to blk_rq_map_user*() calls because
the data area is first mapped to an sgl and then translated into a
bvec. The reason sgl is used for the implementation is that it has a
richer API, mainly sg_mapping_iter.
So, it basically comes down to whether reducing the overhead of
allocating, copying and freeing an sgl one more time for each SG_IO
call is worth the added complexity. I don't really think it will show
up anywhere, but if it turns out to be a problem, switching over to
bvec for the internal implementation wouldn't be too difficult,
although copying back and forth would then require some hairy code.
Also, please note that the new code is more generic in that it can
handle IO to high pages and bounce buffers without adding any special
provisions for them.
I think the root problem is that we really don't have a proper data
structure and API for handling in-kernel segments. iovec can't cover
high pages. sgl is currently the closest thing but it's fat due to
the DMA mapping addresses. I think what we need to do is improve sgl
(sgt) such that it maintains separate lists for kernel segments and
DMA mappings and enable it to be linked, spliced, cloned or whatever,
so we don't have to re-map back and forth. Well, that's for another
time.
This patchset contains the following 17 patches.
0001-blk-map-move-blk_rq_map_user-below-blk_rq_map_use.patch
0002-scatterlist-improve-atomic-mapping-handling-in-mapp.patch
0003-blk-map-improve-alignment-checking-for-blk_rq_map_u.patch
0004-bio-bio.h-cleanup.patch
0005-bio-cleanup-rw-usage.patch
0006-blk-map-bio-use-struct-iovec-instead-of-sg_iovec.patch
0007-blk-map-bio-rename-stuff.patch
0008-bio-reimplement-bio_copy_user_iov.patch
0009-bio-collapse-__bio_map_user_iov-__bio_unmap_user.patch
0010-bio-use-bio_create_from_sgl-in-bio_map_user_iov.patch
0011-bio-add-sgl-source-support-to-bci-and-implement-bio.patch
0012-bio-implement-bio_-map-copy-_kern_sgl.patch
0013-blk-map-implement-blk_rq_map_kern_sgl.patch
0014-scsi-replace-custom-rq-mapping-with-blk_rq_map_kern.patch
0015-bio-blk-map-kill-unused-stuff-and-un-export-interna.patch
0016-blk-map-bio-remove-superflous-len-parameter-from-b.patch
0017-blk-map-bio-remove-superflous-q-from-blk_rq_map_-u.patch
0001 is misc prep. 0002 improves sg_miter so that it can be used for
copying between sgls. 0003 updates alignment checking during direct
mapping (I think this is correct because there's no other way for
low-level drivers to express segment length requirements). 0004-0007
clean up the code for future updates.
0008-0013 refactor the map code and add blk_rq_map_kern_sgl(). 0014
drops the custom sg mapping in scsi_lib.c and makes it use
blk_rq_map_kern_sgl(). 0015 kills now-unused stuff, including
blk_rq_append_bio().
0016-0017 remove duplicate parameters from the API.
This patchset is on top of the current linux-2.6-block#for-linus[1] +
blk-map-related-fixes patchset[2].
block/blk-map.c | 216 ++----
block/bsg.c | 5
block/scsi_ioctl.c | 21
drivers/block/pktcdvd.c | 2
drivers/cdrom/cdrom.c | 2
drivers/mmc/host/sdhci.c | 4
drivers/scsi/device_handler/scsi_dh_alua.c | 2
drivers/scsi/device_handler/scsi_dh_emc.c | 2
drivers/scsi/device_handler/scsi_dh_rdac.c | 2
drivers/scsi/scsi_lib.c | 113 ---
drivers/scsi/scsi_tgt_lib.c | 3
drivers/scsi/sg.c | 8
drivers/scsi/st.c | 3
fs/bio.c | 980 +++++++++++++++--------------
include/linux/bio.h | 170 ++---
include/linux/blkdev.h | 23
include/linux/scatterlist.h | 11
lib/scatterlist.c | 49 +
18 files changed, 806 insertions(+), 810 deletions(-)
Thanks.
--
tejun
[1] 714ed0cf62319b14dc327273a7339a9a199fe046
[2] http://thread.gmane.org/gmane.linux.kernel/815647
Impact: code reorganization
blk_rq_map_user() is about to be reimplemented using
blk_rq_map_user_iov(). Move it below blk_rq_map_user_iov() so that
the actual conversion patch is easier to read.
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 70 +++++++++++++++++++++++++++---------------------------
1 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index a43c93c..fdef591 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -41,41 +41,6 @@ static int __blk_rq_unmap_user(struct bio *bio)
}
/**
- * blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage
- * @q: request queue where request should be inserted
- * @rq: request structure to fill
- * @map_data: pointer to the rq_map_data holding pages (if necessary)
- * @ubuf: the user buffer
- * @len: length of user data
- * @gfp_mask: memory allocation flags
- *
- * Description:
- * Data will be mapped directly for zero copy I/O, if possible. Otherwise
- * a kernel bounce buffer is used.
- *
- * A matching blk_rq_unmap_user() must be issued at the end of I/O, while
- * still in process context.
- *
- * Note: The mapped bio may need to be bounced through blk_queue_bounce()
- * before being submitted to the device, as pages mapped may be out of
- * reach. It's the callers responsibility to make sure this happens. The
- * original bio must be passed back in to blk_rq_unmap_user() for proper
- * unmapping.
- */
-int blk_rq_map_user(struct request_queue *q, struct request *rq,
- struct rq_map_data *map_data, void __user *ubuf,
- unsigned long len, gfp_t gfp_mask)
-{
- struct sg_iovec iov;
-
- iov.iov_base = ubuf;
- iov.iov_len = len;
-
- return blk_rq_map_user_iov(q, rq, map_data, &iov, 1, len, gfp_mask);
-}
-EXPORT_SYMBOL(blk_rq_map_user);
-
-/**
* blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request to map data to
@@ -151,6 +116,41 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
EXPORT_SYMBOL(blk_rq_map_user_iov);
/**
+ * blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage
+ * @q: request queue where request should be inserted
+ * @rq: request structure to fill
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
+ * @ubuf: the user buffer
+ * @len: length of user data
+ * @gfp_mask: memory allocation flags
+ *
+ * Description:
+ * Data will be mapped directly for zero copy I/O, if possible. Otherwise
+ * a kernel bounce buffer is used.
+ *
+ * A matching blk_rq_unmap_user() must be issued at the end of I/O, while
+ * still in process context.
+ *
+ * Note: The mapped bio may need to be bounced through blk_queue_bounce()
+ * before being submitted to the device, as pages mapped may be out of
+ * reach. It's the callers responsibility to make sure this happens. The
+ * original bio must be passed back in to blk_rq_unmap_user() for proper
+ * unmapping.
+ */
+int blk_rq_map_user(struct request_queue *q, struct request *rq,
+ struct rq_map_data *map_data, void __user *ubuf,
+ unsigned long len, gfp_t gfp_mask)
+{
+ struct sg_iovec iov;
+
+ iov.iov_base = ubuf;
+ iov.iov_len = len;
+
+ return blk_rq_map_user_iov(q, rq, map_data, &iov, 1, len, gfp_mask);
+}
+EXPORT_SYMBOL(blk_rq_map_user);
+
+/**
* blk_rq_unmap_user - unmap a request with user data
* @bio: start of bio list
*
--
1.6.0.2
Impact: stricter, more consistent alignment checking
Move all alignment checks in blk_rq_map_user_iov() to
__bio_map_user_iov(), which has to walk the iov at the beginning
anyway. Improve the alignment check so that it covers both the start
address and the length of each segment.
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 21 +++++----------------
fs/bio.c | 20 +++++++++++++-------
2 files changed, 18 insertions(+), 23 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index fdef591..b0b65ef 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -67,28 +67,17 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
struct rq_map_data *map_data, struct sg_iovec *iov,
int iov_count, unsigned int len, gfp_t gfp_mask)
{
- struct bio *bio;
- int i, read = rq_data_dir(rq) == READ;
- int unaligned = 0;
+ struct bio *bio = ERR_PTR(-EINVAL);
+ int read = rq_data_dir(rq) == READ;
if (!iov || iov_count <= 0)
return -EINVAL;
- for (i = 0; i < iov_count; i++) {
- unsigned long uaddr = (unsigned long)iov[i].iov_base;
-
- if (uaddr & queue_dma_alignment(q)) {
- unaligned = 1;
- break;
- }
- }
-
- if (unaligned || (q->dma_pad_mask & len) || map_data)
+ if (!map_data)
+ bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
+ if (bio == ERR_PTR(-EINVAL))
bio = bio_copy_user_iov(q, map_data, iov, iov_count, read,
gfp_mask);
- else
- bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
-
if (IS_ERR(bio))
return PTR_ERR(bio);
diff --git a/fs/bio.c b/fs/bio.c
index 728bef9..80f61ed 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -26,6 +26,7 @@
#include <linux/mempool.h>
#include <linux/workqueue.h>
#include <linux/blktrace_api.h>
+#include <linux/pfn.h>
#include <trace/block.h>
#include <scsi/sg.h> /* for struct sg_iovec */
@@ -921,6 +922,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
int write_to_vm, gfp_t gfp_mask)
{
int i, j;
+ size_t tot_len = 0;
int nr_pages = 0;
struct page **pages;
struct bio *bio;
@@ -930,18 +932,22 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
for (i = 0; i < iov_count; i++) {
unsigned long uaddr = (unsigned long)iov[i].iov_base;
unsigned long len = iov[i].iov_len;
- unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = uaddr >> PAGE_SHIFT;
- nr_pages += end - start;
+ nr_pages += PFN_UP(uaddr + len) - PFN_DOWN(uaddr);
+ tot_len += len;
+
/*
- * buffer must be aligned to at least hardsector size for now
+ * Each segment must be aligned on DMA boundary. The
+ * last one may have unaligned length as long as the
+ * total length is aligned to DMA padding alignment.
*/
- if (uaddr & queue_dma_alignment(q))
+ if (i == iov_count - 1)
+ len = 0;
+ if ((uaddr | len) & queue_dma_alignment(q))
return ERR_PTR(-EINVAL);
}
-
- if (!nr_pages)
+ /* and total length on DMA padding alignment */
+ if (!nr_pages || tot_len & q->dma_pad_mask)
return ERR_PTR(-EINVAL);
bio = bio_kmalloc(gfp_mask, nr_pages);
--
1.6.0.2
Impact: cleanup
bio is about to go through a major update. Take the chance and clean
up bio.h such that
* forward declarations of structs are in one place.
* collect bio_copy/map*() prototypes in one place.
* function prototypes don't have unnecessary extern in front of them
and have their parameters named. (dropping extern makes it much
easier to have named parameters)
* dummy integrity APIs are inline functions instead of macros so that
type check still occurs and unused variable warnings aren't
triggered.
* fix return values of dummy bio_integrity_set/get_tag(),
bio_integrity_prep() and bio_integrity_clone().
Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/bio.h | 178 +++++++++++++++++++++++++++++++--------------------
1 files changed, 108 insertions(+), 70 deletions(-)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8647dd9..4bf7442 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -354,56 +354,64 @@ struct bio_pair {
atomic_t cnt;
int error;
};
-extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
-extern void bio_pair_release(struct bio_pair *dbio);
-extern struct bio_set *bioset_create(unsigned int, unsigned int);
-extern void bioset_free(struct bio_set *);
-
-extern struct bio *bio_alloc(gfp_t, int);
-extern struct bio *bio_kmalloc(gfp_t, int);
-extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
-extern void bio_put(struct bio *);
-extern void bio_free(struct bio *, struct bio_set *);
-
-extern void bio_endio(struct bio *, int);
struct request_queue;
-extern int bio_phys_segments(struct request_queue *, struct bio *);
-
-extern void __bio_clone(struct bio *, struct bio *);
-extern struct bio *bio_clone(struct bio *, gfp_t);
-
-extern void bio_init(struct bio *);
-
-extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
-extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
- unsigned int, unsigned int);
-extern int bio_get_nr_vecs(struct block_device *);
-extern sector_t bio_sector_offset(struct bio *, unsigned short, unsigned int);
-extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
- unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
struct rq_map_data;
-extern struct bio *bio_map_user_iov(struct request_queue *,
- struct block_device *,
- struct sg_iovec *, int, int, gfp_t);
-extern void bio_unmap_user(struct bio *);
-extern struct bio *bio_map_kern(struct request_queue *, void *, unsigned int,
- gfp_t);
-extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
- gfp_t, int);
-extern void bio_set_pages_dirty(struct bio *bio);
-extern void bio_check_pages_dirty(struct bio *bio);
-extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
- unsigned long, unsigned int, int, gfp_t);
-extern struct bio *bio_copy_user_iov(struct request_queue *,
- struct rq_map_data *, struct sg_iovec *,
- int, int, gfp_t);
-extern int bio_uncopy_user(struct bio *);
+
+struct bio_pair *bio_split(struct bio *bi, int first_sectors);
+void bio_pair_release(struct bio_pair *dbio);
+
+struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad);
+void bioset_free(struct bio_set *bs);
+
+struct bio *bio_alloc(gfp_t gfp_mask, int nr_iovecs);
+struct bio *bio_kmalloc(gfp_t gfp_mask, int nr_iovecs);
+struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs);
+void bio_put(struct bio *bio);
+void bio_free(struct bio *bio, struct bio_set *bs);
+
+void bio_endio(struct bio *bio, int error);
+int bio_phys_segments(struct request_queue *q, struct bio *bio);
+
+void __bio_clone(struct bio *bio, struct bio *bio_src);
+struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask);
+
+void bio_init(struct bio *bio);
+
+int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
+ unsigned int offset);
+int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page,
+ unsigned int len, unsigned int offset);
+int bio_get_nr_vecs(struct block_device *bdev);
+sector_t bio_sector_offset(struct bio *bio, unsigned short index,
+ unsigned int offset);
+struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
+ unsigned long uaddr, unsigned int len,
+ int write_to_vm, gfp_t gfp_mask);
+struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
+ struct sg_iovec *iov, int iov_count,
+ int write_to_vm, gfp_t gfp_mask);
+void bio_unmap_user(struct bio *bio);
+struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
+ unsigned long uaddr, unsigned int len,
+ int write_to_vm, gfp_t gfp_mask);
+struct bio *bio_copy_user_iov(struct request_queue *q,
+ struct rq_map_data *map_data,
+ struct sg_iovec *iov, int iov_count,
+ int write_to_vm, gfp_t gfp_mask);
+int bio_uncopy_user(struct bio *bio);
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+ gfp_t gfp_mask);
+struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
+ gfp_t gfp_mask, int reading);
+void bio_set_pages_dirty(struct bio *bio);
+void bio_check_pages_dirty(struct bio *bio);
void zero_fill_bio(struct bio *bio);
-extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, struct bio_set *);
-extern void bvec_free_bs(struct bio_set *, struct bio_vec *, unsigned int);
-extern unsigned int bvec_nr_vecs(unsigned short idx);
+struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
+ struct bio_set *bs);
+void bvec_free_bs(struct bio_set *bs, struct bio_vec *bv, unsigned int idx);
+unsigned int bvec_nr_vecs(unsigned short idx);
/*
* Allow queuer to specify a completion CPU for this bio
@@ -516,34 +524,64 @@ static inline int bio_has_data(struct bio *bio)
#define bip_for_each_vec(bvl, bip, i) \
__bip_for_each_vec(bvl, bip, i, (bip)->bip_idx)
-#define bio_integrity(bio) (bio->bi_integrity != NULL)
-
-extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
-extern void bio_integrity_free(struct bio *);
-extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
-extern int bio_integrity_enabled(struct bio *bio);
-extern int bio_integrity_set_tag(struct bio *, void *, unsigned int);
-extern int bio_integrity_get_tag(struct bio *, void *, unsigned int);
-extern int bio_integrity_prep(struct bio *);
-extern void bio_integrity_endio(struct bio *, int);
-extern void bio_integrity_advance(struct bio *, unsigned int);
-extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
-extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
-extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
+static inline bool bio_integrity(struct bio *bio)
+{
+ return bio->bi_integrity != NULL;
+}
+
+struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
+ gfp_t gfp_mask, unsigned int nr_vecs);
+void bio_integrity_free(struct bio *bio);
+int bio_integrity_add_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int offset);
+int bio_integrity_enabled(struct bio *bio);
+int bio_integrity_set_tag(struct bio *bio, void *tag_buf, unsigned int len);
+int bio_integrity_get_tag(struct bio *bio, void *tag_buf, unsigned int len);
+int bio_integrity_prep(struct bio *bio);
+void bio_integrity_endio(struct bio *bio, int error);
+void bio_integrity_advance(struct bio *bio, unsigned int bytes_done);
+void bio_integrity_trim(struct bio *bio, unsigned int offset,
+ unsigned int sectors);
+void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors);
+int bio_integrity_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp_mask);
#else /* CONFIG_BLK_DEV_INTEGRITY */
-#define bio_integrity(a) (0)
-#define bio_integrity_prep(a) (0)
-#define bio_integrity_enabled(a) (0)
-#define bio_integrity_clone(a, b, c) (0)
-#define bio_integrity_free(a) do { } while (0)
-#define bio_integrity_endio(a, b) do { } while (0)
-#define bio_integrity_advance(a, b) do { } while (0)
-#define bio_integrity_trim(a, b, c) do { } while (0)
-#define bio_integrity_split(a, b, c) do { } while (0)
-#define bio_integrity_set_tag(a, b, c) do { } while (0)
-#define bio_integrity_get_tag(a, b, c) do { } while (0)
+static inline bool bio_integrity(struct bio *bio)
+{ return false; }
+static inline struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
+ gfp_t gfp_mask, unsigned int nr_vecs)
+{ return NULL; }
+static inline void bio_integrity_free(struct bio *bio)
+{ }
+static inline int bio_integrity_add_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int offset)
+{ return 0; }
+static inline int bio_integrity_enabled(struct bio *bio)
+{ return 0; }
+static inline int bio_integrity_set_tag(struct bio *bio, void *tag_buf,
+ unsigned int len)
+{ return -1; }
+static inline int bio_integrity_get_tag(struct bio *bio, void *tag_buf,
+ unsigned int len)
+{ return -1; }
+static inline int bio_integrity_prep(struct bio *bio)
+{ return -EIO; }
+static inline void bio_integrity_endio(struct bio *bio, int error)
+{ }
+static inline void bio_integrity_advance(struct bio *bio,
+ unsigned int bytes_done)
+{ }
+static inline void bio_integrity_trim(struct bio *bio, unsigned int offset,
+ unsigned int sectors)
+{ }
+static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
+ int sectors)
+{ }
+
+static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
+ gfp_t gfp_mask)
+{ return -EIO; }
#endif /* CONFIG_BLK_DEV_INTEGRITY */
--
1.6.0.2
Impact: cleanup
blk-map and bio use sg_iovec for addr-len segments although there
isn't anything sg-specific about the API. This is mostly for
historical reasons. sg_iovec is by definition identical to iovec.
Use iovec instead. This removes the bogus dependency on scsi sg and
will allow the use of iovec helpers.
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 5 ++---
block/scsi_ioctl.c | 8 +++-----
fs/bio.c | 23 +++++++++++------------
include/linux/bio.h | 6 +++---
include/linux/blkdev.h | 8 ++++----
5 files changed, 23 insertions(+), 27 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index 29aa60d..4f0221a 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -5,7 +5,6 @@
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
-#include <scsi/sg.h> /* for struct sg_iovec */
#include "blk.h"
@@ -64,7 +63,7 @@ static int __blk_rq_unmap_user(struct bio *bio)
* unmapping.
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct rq_map_data *map_data, struct sg_iovec *iov,
+ struct rq_map_data *map_data, struct iovec *iov,
int iov_count, unsigned int len, gfp_t gfp_mask)
{
struct bio *bio = ERR_PTR(-EINVAL);
@@ -130,7 +129,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
struct rq_map_data *map_data, void __user *ubuf,
unsigned long len, gfp_t gfp_mask)
{
- struct sg_iovec iov;
+ struct iovec iov;
iov.iov_base = ubuf;
iov.iov_len = len;
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index c8e8868..73cfd91 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -289,7 +289,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
if (hdr->iovec_count) {
const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
size_t iov_data_len;
- struct sg_iovec *iov;
+ struct iovec *iov;
iov = kmalloc(size, GFP_KERNEL);
if (!iov) {
@@ -304,11 +304,9 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
}
/* SG_IO howto says that the shorter of the two wins */
- iov_data_len = iov_length((struct iovec *)iov,
- hdr->iovec_count);
+ iov_data_len = iov_length(iov, hdr->iovec_count);
if (hdr->dxfer_len < iov_data_len) {
- hdr->iovec_count = iov_shorten((struct iovec *)iov,
- hdr->iovec_count,
+ hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
hdr->dxfer_len);
iov_data_len = hdr->dxfer_len;
}
diff --git a/fs/bio.c b/fs/bio.c
index 70e5153..9d13f21 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -28,7 +28,6 @@
#include <linux/blktrace_api.h>
#include <linux/pfn.h>
#include <trace/block.h>
-#include <scsi/sg.h> /* for struct sg_iovec */
DEFINE_TRACE(block_split);
@@ -656,17 +655,17 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
struct bio_map_data {
struct bio_vec *iovecs;
- struct sg_iovec *sgvecs;
+ struct iovec *sgvecs;
int nr_sgvecs;
int is_our_pages;
};
static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
- struct sg_iovec *iov, int iov_count,
+ struct iovec *iov, int iov_count,
int is_our_pages)
{
memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
- memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
+ memcpy(bmd->sgvecs, iov, sizeof(struct iovec) * iov_count);
bmd->nr_sgvecs = iov_count;
bmd->is_our_pages = is_our_pages;
bio->bi_private = bmd;
@@ -693,7 +692,7 @@ static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
return NULL;
}
- bmd->sgvecs = kmalloc(sizeof(struct sg_iovec) * iov_count, gfp_mask);
+ bmd->sgvecs = kmalloc(sizeof(struct iovec) * iov_count, gfp_mask);
if (bmd->sgvecs)
return bmd;
@@ -703,7 +702,7 @@ static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
}
static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct sg_iovec *iov, int iov_count, int uncopy,
+ struct iovec *iov, int iov_count, int uncopy,
int do_free_page)
{
int ret = 0, i;
@@ -789,7 +788,7 @@ int bio_uncopy_user(struct bio *bio)
*/
struct bio *bio_copy_user_iov(struct request_queue *q,
struct rq_map_data *map_data,
- struct sg_iovec *iov, int iov_count, int rw,
+ struct iovec *iov, int iov_count, int rw,
gfp_t gfp_mask)
{
struct bio_map_data *bmd;
@@ -909,7 +908,7 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
unsigned long uaddr, unsigned int len, int rw,
gfp_t gfp_mask)
{
- struct sg_iovec iov;
+ struct iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
@@ -919,7 +918,7 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
static struct bio *__bio_map_user_iov(struct request_queue *q,
struct block_device *bdev,
- struct sg_iovec *iov, int iov_count,
+ struct iovec *iov, int iov_count,
int rw, gfp_t gfp_mask)
{
int i, j;
@@ -1044,7 +1043,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
unsigned long uaddr, unsigned int len, int rw,
gfp_t gfp_mask)
{
- struct sg_iovec iov;
+ struct iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
@@ -1053,7 +1052,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
}
/**
- * bio_map_user_iov - map user sg_iovec table into bio
+ * bio_map_user_iov - map user iovec table into bio
* @q: the struct request_queue for the bio
* @bdev: destination block device
* @iov: the iovec.
@@ -1065,7 +1064,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct sg_iovec *iov, int iov_count, int rw,
+ struct iovec *iov, int iov_count, int rw,
gfp_t gfp_mask)
{
struct bio *bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 45f56d2..8215ded 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -23,6 +23,7 @@
#include <linux/highmem.h>
#include <linux/mempool.h>
#include <linux/ioprio.h>
+#include <linux/uio.h>
#ifdef CONFIG_BLOCK
@@ -356,7 +357,6 @@ struct bio_pair {
};
struct request_queue;
-struct sg_iovec;
struct rq_map_data;
struct bio_pair *bio_split(struct bio *bi, int first_sectors);
@@ -390,7 +390,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
unsigned long uaddr, unsigned int len, int rw,
gfp_t gfp_mask);
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct sg_iovec *iov, int iov_count, int rw,
+ struct iovec *iov, int iov_count, int rw,
gfp_t gfp_mask);
void bio_unmap_user(struct bio *bio);
struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
@@ -398,7 +398,7 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
gfp_t gfp_mask);
struct bio *bio_copy_user_iov(struct request_queue *q,
struct rq_map_data *map_data,
- struct sg_iovec *iov, int iov_count, int rw,
+ struct iovec *iov, int iov_count, int rw,
gfp_t gfp_mask);
int bio_uncopy_user(struct bio *bio);
struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 465d6ba..d7bb20c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -19,6 +19,7 @@
#include <linux/gfp.h>
#include <linux/bsg.h>
#include <linux/smp.h>
+#include <linux/uio.h>
#include <asm/scatterlist.h>
@@ -29,7 +30,6 @@ struct elevator_queue;
struct request_pm_state;
struct blk_trace;
struct request;
-struct sg_io_hdr;
#define BLKDEV_MIN_RQ 4
#define BLKDEV_MAX_RQ 128 /* Default maximum */
@@ -781,9 +781,9 @@ extern int blk_rq_map_user(struct request_queue *, struct request *,
gfp_t);
extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
-extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
- struct rq_map_data *, struct sg_iovec *, int,
- unsigned int, gfp_t);
+extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
+ struct rq_map_data *map_data, struct iovec *iov,
+ int iov_count, unsigned int len, gfp_t gfp_mask);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.6.0.2
Impact: cleanup
bio confusingly uses @write_to_vm and @reading for data directions,
both of which mean the opposite of the usual block/bio convention of
using READ and WRITE w.r.t. IO devices. The only place where the
inversion is actually necessary is when calling get_user_pages_fast()
in __bio_map_user_iov(), as gup uses the VM convention of read/write
w.r.t. the VM.
This patch converts all bio functions to take a READ/WRITE @rw
parameter and makes the one place where the inversion is necessary
test for rw == READ.
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 10 +++++-----
fs/bio.c | 50 +++++++++++++++++++++++++-------------------------
include/linux/bio.h | 18 +++++++++---------
3 files changed, 39 insertions(+), 39 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index b0b65ef..29aa60d 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -68,15 +68,15 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
int iov_count, unsigned int len, gfp_t gfp_mask)
{
struct bio *bio = ERR_PTR(-EINVAL);
- int read = rq_data_dir(rq) == READ;
+ int rw = rq_data_dir(rq);
if (!iov || iov_count <= 0)
return -EINVAL;
if (!map_data)
- bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
+ bio = bio_map_user_iov(q, NULL, iov, iov_count, rw, gfp_mask);
if (bio == ERR_PTR(-EINVAL))
- bio = bio_copy_user_iov(q, map_data, iov, iov_count, read,
+ bio = bio_copy_user_iov(q, map_data, iov, iov_count, rw,
gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -177,7 +177,7 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
unsigned int len, gfp_t gfp_mask)
{
- int reading = rq_data_dir(rq) == READ;
+ int rw = rq_data_dir(rq);
int do_copy = 0;
struct bio *bio;
@@ -188,7 +188,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
do_copy = !blk_rq_aligned(q, kbuf, len) || object_is_on_stack(kbuf);
if (do_copy)
- bio = bio_copy_kern(q, kbuf, len, gfp_mask, reading);
+ bio = bio_copy_kern(q, kbuf, len, gfp_mask, rw);
else
bio = bio_map_kern(q, kbuf, len, gfp_mask);
diff --git a/fs/bio.c b/fs/bio.c
index 80f61ed..70e5153 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -780,7 +780,7 @@ int bio_uncopy_user(struct bio *bio)
* @map_data: pointer to the rq_map_data holding pages (if necessary)
* @iov: the iovec.
* @iov_count: number of elements in the iovec
- * @write_to_vm: bool indicating writing to pages or not
+ * @rw: READ or WRITE
* @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
@@ -789,8 +789,8 @@ int bio_uncopy_user(struct bio *bio)
*/
struct bio *bio_copy_user_iov(struct request_queue *q,
struct rq_map_data *map_data,
- struct sg_iovec *iov, int iov_count,
- int write_to_vm, gfp_t gfp_mask)
+ struct sg_iovec *iov, int iov_count, int rw,
+ gfp_t gfp_mask)
{
struct bio_map_data *bmd;
struct bio_vec *bvec;
@@ -823,7 +823,8 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
if (!bio)
goto out_bmd;
- bio->bi_rw |= (!write_to_vm << BIO_RW);
+ if (rw == WRITE)
+ bio->bi_rw |= 1 << BIO_RW;
ret = 0;
@@ -872,7 +873,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
*/
if (unlikely(map_data && map_data->null_mapped))
bio->bi_flags |= (1 << BIO_NULL_MAPPED);
- else if (!write_to_vm) {
+ else if (rw == WRITE) {
ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 0);
if (ret)
goto cleanup;
@@ -897,7 +898,7 @@ out_bmd:
* @map_data: pointer to the rq_map_data holding pages (if necessary)
* @uaddr: start of user address
* @len: length in bytes
- * @write_to_vm: bool indicating writing to pages or not
+ * @rw: READ or WRITE
* @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
@@ -905,21 +906,21 @@ out_bmd:
* call bio_uncopy_user() on io completion.
*/
struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
- unsigned long uaddr, unsigned int len,
- int write_to_vm, gfp_t gfp_mask)
+ unsigned long uaddr, unsigned int len, int rw,
+ gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_copy_user_iov(q, map_data, &iov, 1, write_to_vm, gfp_mask);
+ return bio_copy_user_iov(q, map_data, &iov, 1, rw, gfp_mask);
}
static struct bio *__bio_map_user_iov(struct request_queue *q,
struct block_device *bdev,
struct sg_iovec *iov, int iov_count,
- int write_to_vm, gfp_t gfp_mask)
+ int rw, gfp_t gfp_mask)
{
int i, j;
size_t tot_len = 0;
@@ -967,8 +968,8 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
const int local_nr_pages = end - start;
const int page_limit = cur_page + local_nr_pages;
- ret = get_user_pages_fast(uaddr, local_nr_pages,
- write_to_vm, &pages[cur_page]);
+ ret = get_user_pages_fast(uaddr, local_nr_pages, rw == READ,
+ &pages[cur_page]);
if (ret < local_nr_pages) {
ret = -EFAULT;
goto out_unmap;
@@ -1008,7 +1009,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
/*
* set data direction, and check if mapped pages need bouncing
*/
- if (!write_to_vm)
+ if (rw == WRITE)
bio->bi_rw |= (1 << BIO_RW);
bio->bi_bdev = bdev;
@@ -1033,14 +1034,14 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
* @bdev: destination block device
* @uaddr: start of user address
* @len: length in bytes
- * @write_to_vm: bool indicating writing to pages or not
+ * @rw: READ or WRITE
* @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
- unsigned long uaddr, unsigned int len, int write_to_vm,
+ unsigned long uaddr, unsigned int len, int rw,
gfp_t gfp_mask)
{
struct sg_iovec iov;
@@ -1048,7 +1049,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm, gfp_mask);
+ return bio_map_user_iov(q, bdev, &iov, 1, rw, gfp_mask);
}
/**
@@ -1057,20 +1058,19 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
* @bdev: destination block device
* @iov: the iovec.
* @iov_count: number of elements in the iovec
- * @write_to_vm: bool indicating writing to pages or not
+ * @rw: READ or WRITE
* @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct sg_iovec *iov, int iov_count,
- int write_to_vm, gfp_t gfp_mask)
+ struct sg_iovec *iov, int iov_count, int rw,
+ gfp_t gfp_mask)
{
struct bio *bio;
- bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm,
- gfp_mask);
+ bio = __bio_map_user_iov(q, bdev, iov, iov_count, rw, gfp_mask);
if (IS_ERR(bio))
return bio;
@@ -1219,23 +1219,23 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
* @data: pointer to buffer to copy
* @len: length in bytes
* @gfp_mask: allocation flags for bio and page allocation
- * @reading: data direction is READ
+ * @rw: READ or WRITE
*
* copy the kernel address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp_mask, int reading)
+ gfp_t gfp_mask, int rw)
{
struct bio *bio;
struct bio_vec *bvec;
int i;
- bio = bio_copy_user(q, NULL, (unsigned long)data, len, 1, gfp_mask);
+ bio = bio_copy_user(q, NULL, (unsigned long)data, len, READ, gfp_mask);
if (IS_ERR(bio))
return bio;
- if (!reading) {
+ if (rw == WRITE) {
void *p = data;
bio_for_each_segment(bvec, bio, i) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4bf7442..45f56d2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -387,24 +387,24 @@ int bio_get_nr_vecs(struct block_device *bdev);
sector_t bio_sector_offset(struct bio *bio, unsigned short index,
unsigned int offset);
struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
- unsigned long uaddr, unsigned int len,
- int write_to_vm, gfp_t gfp_mask);
+ unsigned long uaddr, unsigned int len, int rw,
+ gfp_t gfp_mask);
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct sg_iovec *iov, int iov_count,
- int write_to_vm, gfp_t gfp_mask);
+ struct sg_iovec *iov, int iov_count, int rw,
+ gfp_t gfp_mask);
void bio_unmap_user(struct bio *bio);
struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
- unsigned long uaddr, unsigned int len,
- int write_to_vm, gfp_t gfp_mask);
+ unsigned long uaddr, unsigned int len, int rw,
+ gfp_t gfp_mask);
struct bio *bio_copy_user_iov(struct request_queue *q,
struct rq_map_data *map_data,
- struct sg_iovec *iov, int iov_count,
- int write_to_vm, gfp_t gfp_mask);
+ struct sg_iovec *iov, int iov_count, int rw,
+ gfp_t gfp_mask);
int bio_uncopy_user(struct bio *bio);
struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
gfp_t gfp_mask);
struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp_mask, int reading);
+ gfp_t gfp_mask, int rw);
void bio_set_pages_dirty(struct bio *bio);
void bio_check_pages_dirty(struct bio *bio);
void zero_fill_bio(struct bio *bio);
--
1.6.0.2
Impact: better atomic mapping handling
sg_miter supported atomic mapping via a single flag, SG_MITER_ATOMIC.
It implicitly used KM_BIO_SRC_IRQ and required irqs to be disabled for
each iteration to protect the kernel mapping. This was too limiting
and didn't allow multiple iterators to be used concurrently (e.g. for
sgl -> sgl copying).
This patch adds a new interface, sg_miter_start_atomic(), which takes
a km_type argument, and drops @flags from sg_miter_start() so that the
km_type can be specified by the caller for atomic iterators.
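For example, with per-iterator km_types an sgl -> sgl copy can keep
both mappings live at the same time. A rough sketch follows
(illustrative only, not code from this patch; the caller is assumed
to satisfy the context requirements of the chosen kmap slots, e.g.
irqs disabled for the *_IRQ types):

static void copy_sgl_to_sgl(struct scatterlist *dsgl, int dnents,
			    struct scatterlist *ssgl, int snents)
{
	struct sg_mapping_iter di, si;

	/* two concurrent atomic iterators using distinct kmap slots */
	sg_miter_start_atomic(&di, dsgl, dnents, KM_BIO_DST_IRQ);
	sg_miter_start_atomic(&si, ssgl, snents, KM_BIO_SRC_IRQ);

	while (sg_miter_next(&di) && sg_miter_next(&si)) {
		size_t len = min(di.length, si.length);

		memcpy(di.addr, si.addr, len);
		/* partially consumed chunks are revisited next time */
		di.consumed = len;
		si.consumed = len;
	}

	sg_miter_stop(&di);
	sg_miter_stop(&si);
}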
Signed-off-by: Tejun Heo <[email protected]>
Cc: Pierre Ossman <[email protected]>
---
drivers/mmc/host/sdhci.c | 4 +-
include/linux/scatterlist.h | 11 +++++++--
lib/scatterlist.c | 49 ++++++++++++++++++++++++++++++++----------
3 files changed, 47 insertions(+), 17 deletions(-)
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index accb592..559aca7 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -704,8 +704,8 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_data *data)
}
if (!(host->flags & SDHCI_REQ_USE_DMA)) {
- sg_miter_start(&host->sg_miter,
- data->sg, data->sg_len, SG_MITER_ATOMIC);
+ sg_miter_start_atomic(&host->sg_miter,
+ data->sg, data->sg_len, KM_BIO_SRC_IRQ);
host->blocks = data->blocks;
}
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index e599698..2249caa 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -6,6 +6,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <asm/io.h>
+#include <asm/kmap_types.h>
struct sg_table {
struct scatterlist *sgl; /* the list */
@@ -254,11 +255,15 @@ struct sg_mapping_iter {
struct scatterlist *__sg; /* current entry */
unsigned int __nents; /* nr of remaining entries */
unsigned int __offset; /* offset within sg */
- unsigned int __flags;
+ unsigned int __flags; /* internal flags */
+ enum km_type __km_type; /* atomic kmap type to use */
};
-void sg_miter_start(struct sg_mapping_iter *miter, struct scatterlist *sgl,
- unsigned int nents, unsigned int flags);
+void sg_miter_start(struct sg_mapping_iter *miter,
+ struct scatterlist *sgl, unsigned int nents);
+void sg_miter_start_atomic(struct sg_mapping_iter *miter,
+ struct scatterlist *sgl, unsigned int nents,
+ enum km_type km_type);
bool sg_miter_next(struct sg_mapping_iter *miter);
void sg_miter_stop(struct sg_mapping_iter *miter);
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index a295e40..4158a7c 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -306,19 +306,41 @@ EXPORT_SYMBOL(sg_alloc_table);
* Context:
* Don't care.
*/
-void sg_miter_start(struct sg_mapping_iter *miter, struct scatterlist *sgl,
- unsigned int nents, unsigned int flags)
+void sg_miter_start(struct sg_mapping_iter *miter,
+ struct scatterlist *sgl, unsigned int nents)
{
memset(miter, 0, sizeof(struct sg_mapping_iter));
miter->__sg = sgl;
miter->__nents = nents;
miter->__offset = 0;
- miter->__flags = flags;
}
EXPORT_SYMBOL(sg_miter_start);
/**
+ * sg_miter_start_atomic - start atomic mapping iteration over a sg list
+ * @miter: sg mapping iter to be started
+ * @sgl: sg list to iterate over
+ * @nents: number of sg entries
+ * @km_type: kmap type to use for atomic mapping
+ *
+ * Description:
+ * Starts atomic mapping iterator @miter.
+ *
+ * Context:
+ * Don't care.
+ */
+void sg_miter_start_atomic(struct sg_mapping_iter *miter,
+ struct scatterlist *sgl, unsigned int nents,
+ enum km_type km_type)
+{
+ sg_miter_start(miter, sgl, nents);
+ miter->__flags |= SG_MITER_ATOMIC;
+ miter->__km_type = km_type;
+}
+EXPORT_SYMBOL(sg_miter_start_atomic);
+
+/**
* sg_miter_next - proceed mapping iterator to the next mapping
* @miter: sg mapping iter to proceed
*
@@ -329,8 +351,11 @@ EXPORT_SYMBOL(sg_miter_start);
* current mapping.
*
* Context:
- * IRQ disabled if SG_MITER_ATOMIC. IRQ must stay disabled till
- * @miter@ is stopped. May sleep if !SG_MITER_ATOMIC.
+ * Atomic for atomic miters. Atomic state must be maintained till
+ * @miter@ is stopped. Note that the selected KM type limits which
+ * atomic (preempt, softirq or hardirq) contexts are allowed. The
+ * rules are identical to those of kmap_atomic(). May sleep for
+ * non-atomic miters.
*
* Returns:
* true if @miter contains the next mapping. false if end of sg
@@ -365,7 +390,7 @@ bool sg_miter_next(struct sg_mapping_iter *miter)
miter->consumed = miter->length;
if (miter->__flags & SG_MITER_ATOMIC)
- miter->addr = kmap_atomic(miter->page, KM_BIO_SRC_IRQ) + off;
+ miter->addr = kmap_atomic(miter->page, miter->__km_type) + off;
else
miter->addr = kmap(miter->page) + off;
@@ -384,7 +409,7 @@ EXPORT_SYMBOL(sg_miter_next);
* resources (kmap) need to be released during iteration.
*
* Context:
- * IRQ disabled if the SG_MITER_ATOMIC is set. Don't care otherwise.
+ * Atomic if the SG_MITER_ATOMIC is set. Don't care otherwise.
*/
void sg_miter_stop(struct sg_mapping_iter *miter)
{
@@ -394,10 +419,9 @@ void sg_miter_stop(struct sg_mapping_iter *miter)
if (miter->addr) {
miter->__offset += miter->consumed;
- if (miter->__flags & SG_MITER_ATOMIC) {
- WARN_ON(!irqs_disabled());
- kunmap_atomic(miter->addr, KM_BIO_SRC_IRQ);
- } else
+ if (miter->__flags & SG_MITER_ATOMIC)
+ kunmap_atomic(miter->addr, miter->__km_type);
+ else
kunmap(miter->page);
miter->page = NULL;
@@ -427,8 +451,9 @@ static size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents,
struct sg_mapping_iter miter;
unsigned long flags;
- sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC);
+ sg_miter_start_atomic(&miter, sgl, nents, KM_BIO_SRC_IRQ);
+ /* dunno the context we're being called from, plug IRQ */
local_irq_save(flags);
while (sg_miter_next(&miter) && offset < buflen) {
--
1.6.0.2
Impact: more modular implementation
Break down bio_copy_user_iov() into the following steps.
1. bci and page allocation
2. copying data if WRITE
3. create bio accordingly
bci is now responsible for managing any copy-related resources. Given
the source iov, bci_create() allocates a bci and fills it with enough
pages to cover the source iov. The allocated pages are described by
an sgl.
Note that the new allocator always rounds rq_map_data->offset up to a
page boundary to simplify the implementation and to guarantee enough
DMA padding area at the end. As the only user, scsi sg, always passes
in a zero offset, this doesn't cause any actual behavior difference.
Also, nth_page() is used to walk to the next page rather than
directly incrementing the struct page pointer.
Copying back and forth is done using bio_memcpy_sgl_uiov() which is
implemented using sg mapping iterator and iov iterator.
The last step is done using bio_create_from_sgl().
This patch by itself adds one more level of indirection via the sgl
and more code, but the components factored out here will be used in
future code refactoring.
Signed-off-by: Tejun Heo <[email protected]>
---
fs/bio.c | 465 +++++++++++++++++++++++++++++++++++++++-----------------------
1 files changed, 293 insertions(+), 172 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index 1cd97e3..f13aef0 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -26,6 +26,7 @@
#include <linux/mempool.h>
#include <linux/workqueue.h>
#include <linux/blktrace_api.h>
+#include <linux/scatterlist.h>
#include <linux/pfn.h>
#include <trace/block.h>
@@ -653,102 +654,303 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
}
-struct bio_copy_info {
- struct bio_vec *iovecs;
- struct iovec *src_iov;
- int src_count;
- int is_our_pages;
-};
+/**
+ * bio_sgl_free_pages - free pages from sgl
+ * @sgl: target sgl
+ * @nents: nents for @sgl
+ *
+ * Free all pages pointed to by @sgl. Note that this function
+ * assumes that each sg entry is pointing to single page.
+ */
+static void bio_sgl_free_pages(struct scatterlist *sgl, int nents)
+{
+ struct scatterlist *sg;
+ int i;
-static void bci_set(struct bio_copy_info *bci, struct bio *bio,
- struct iovec *iov, int count, int is_our_pages)
+ for_each_sg(sgl, sg, nents, i)
+ __free_page(sg_page(sg));
+}
+
+/**
+ * bio_alloc_sgl_with_pages - allocate sgl and populate with pages
+ * @len: target length
+ * @gfp: gfp for data structure allocations
+ * @page_gfp: gfp for page allocations
+ * @md: optional preallocated page pool
+ * @nentsp: out parameter for nents of the result sgl
+ *
+ * Allocate sgl to cover @len bytes and populate it with pages.
+ * The area the last sg points to is guaranteed to have enough
+ * room to page align @len. This is used by blk_rq_map_sg() to
+ * honor q->dma_pad_mask. Each entry in the resulting sgl always
+ * points to at most a single page.
+ *
+ * RETURNS:
+ * Allocated sgl on success, NULL on failure.
+ */
+static struct scatterlist *bio_alloc_sgl_with_pages(size_t len,
+ gfp_t gfp, gfp_t page_gfp,
+ struct rq_map_data *md,
+ int *nentsp)
{
- memcpy(bci->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
- memcpy(bci->src_iov, iov, sizeof(struct iovec) * count);
- bci->src_count = count;
- bci->is_our_pages = is_our_pages;
- bio->bi_private = bci;
+ int nr_pages = PFN_UP(len);
+ int nents = 0;
+ struct scatterlist *sgl;
+ unsigned int md_pages;
+ unsigned int md_pgidx;
+
+ sgl = kmalloc(sizeof(sgl[0]) * nr_pages, gfp);
+ if (!sgl)
+ return NULL;
+ sg_init_table(sgl, nr_pages);
+
+ if (md) {
+ * The area might need to be extended up to the next
+ * The area might need to be extended upto the next
+ * PAGE_SIZE alignment for DMA padding. Always start
+ * page aligned.
+ */
+ md_pgidx = PFN_UP(md->offset);
+ md_pages = 1 << md->page_order;
+ }
+ while (len) {
+ unsigned int bytes = min(PAGE_SIZE, len);
+ struct page *page = NULL;
+
+ if (md) {
+ if (md_pgidx < md->nr_entries * md_pages) {
+ page = md->pages[md_pgidx / md_pages];
+ page = nth_page(page, md_pgidx % md_pages);
+ md_pgidx++;
+ }
+ } else
+ page = alloc_page(page_gfp);
+
+ if (!page)
+ goto err;
+
+ sg_set_page(&sgl[nents++], page, bytes, 0);
+
+ len -= bytes;
+ }
+ BUG_ON(nents != nr_pages);
+
+ *nentsp = nents;
+ return sgl;
+
+err:
+ if (!md)
+ bio_sgl_free_pages(sgl, nents);
+ kfree(sgl);
+ return NULL;
}
-static void bci_free(struct bio_copy_info *bci)
+/*
+ * bio_copy_info
+ *
+ * A bci carries information about copy-mapped bio so that data can be
+ * copied back to the original bio for READ and resources can be
+ * released after completion.
+ *
+ * bci_create() allocates all resources necessary for copying and
+ * bci_destroy() does the opposite.
+ */
+struct bio_copy_info {
+ struct scatterlist *copy_sgl; /* one sg per page */
+ struct iovec *src_iov; /* source iovec from userland */
+ int copy_nents; /* #entries in copy_sgl */
+ int src_count; /* #entries in src_iov */
+ size_t len; /* total length in bytes */
+ int is_our_pages; /* do we own copied pages? */
+};
+
+/**
+ * bci_destroy - destroy a bci
+ * @bci: bci to destroy
+ *
+ * Release all resources @bci is holding and free it.
+ */
+static void bci_destroy(struct bio_copy_info *bci)
{
- kfree(bci->iovecs);
+ if (bci->is_our_pages && bci->copy_sgl)
+ bio_sgl_free_pages(bci->copy_sgl, bci->copy_nents);
+ kfree(bci->copy_sgl);
kfree(bci->src_iov);
kfree(bci);
}
-static struct bio_copy_info *bci_alloc(int nr_segs, int count, gfp_t gfp)
+/**
+ * bci_create - create a bci
+ * @src_iov: source iovec
+ * @src_count: number of entries in @src_iov
+ * @gfp: gfp for data structure allocations
+ * @page_gfp: gfp for page allocations
+ * @md: optional preallocated page pool
+ *
+ * Create a bci. A bci is allocated and populated with pages so
+ * that the src area can be copied. The src pair is copied for
+ * later reference.
+ *
+ * RETURNS:
+ * Pointer to the new bci on success, NULL on failure.
+ */
+static struct bio_copy_info *bci_create(struct iovec *src_iov, int src_count,
+ gfp_t gfp, gfp_t page_gfp,
+ struct rq_map_data *md)
{
- struct bio_copy_info *bci = kmalloc(sizeof(*bci), gfp);
+ struct bio_copy_info *bci;
+ bci = kzalloc(sizeof(*bci), gfp);
if (!bci)
return NULL;
- bci->iovecs = kmalloc(sizeof(struct bio_vec) * nr_segs, gfp);
- if (!bci->iovecs) {
- kfree(bci);
- return NULL;
- }
+ bci->src_count = src_count;
+ bci->len = iov_length(src_iov, src_count);
+ bci->is_our_pages = md == NULL;
- bci->src_iov = kmalloc(sizeof(struct iovec) * count, gfp);
- if (bci->src_count)
- return bci;
+ bci->copy_sgl = bio_alloc_sgl_with_pages(bci->len, gfp, page_gfp, md,
+ &bci->copy_nents);
+ if (!bci->copy_sgl)
+ goto err;
- kfree(bci->iovecs);
- kfree(bci);
+ bci->src_iov = kmalloc(sizeof(src_iov[0]) * src_count, gfp);
+ if (!bci->src_iov)
+ goto err;
+ memcpy(bci->src_iov, src_iov, sizeof(src_iov[0]) * src_count);
+
+ return bci;
+
+err:
+ bci_destroy(bci);
return NULL;
}
-static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct iovec *iov, int count, int uncopy,
- int do_free_page)
+/**
+ * bio_memcpy_sgl_uiov - copy data between sgl and user iov
+ * @sgl: sgl to copy to/from
+ * @nents: number of entries in @sgl
+ * @iov: user iov to copy from/to
+ * @count: number of entries in @iov
+ * @to_sgl: copy direction, true for @iov->@sgl, false for @sgl->@iov
+ *
+ * Copy data between kernel area described by @sgl/@nents and
+ * user area described by @iov/@count. Copy direction is
+ * determined by @to_sgl. The areas must be of the same size.
+ *
+ * RETURNS:
+ * 0 on success, -EFAULT on user area access error.
+ */
+static int bio_memcpy_sgl_uiov(struct scatterlist *sgl, int nents,
+ struct iovec *iov, int count, bool to_sgl)
{
- int ret = 0, i;
- struct bio_vec *bvec;
- int iov_idx = 0;
- unsigned int iov_off = 0;
- int read = bio_data_dir(bio) == READ;
+ struct sg_mapping_iter si;
+ struct iov_iter ii;
+ int ret = 0;
- __bio_for_each_segment(bvec, bio, i, 0) {
- char *bv_addr = page_address(bvec->bv_page);
- unsigned int bv_len = iovecs[i].bv_len;
-
- while (bv_len && iov_idx < count) {
- unsigned int bytes;
- char *iov_addr;
-
- bytes = min_t(unsigned int,
- iov[iov_idx].iov_len - iov_off, bv_len);
- iov_addr = iov[iov_idx].iov_base + iov_off;
-
- if (!ret) {
- if (!read && !uncopy)
- ret = copy_from_user(bv_addr, iov_addr,
- bytes);
- if (read && uncopy)
- ret = copy_to_user(iov_addr, bv_addr,
- bytes);
-
- if (ret)
- ret = -EFAULT;
- }
+ sg_miter_start(&si, sgl, nents);
+ iov_iter_init(&ii, iov, count, iov_length(iov, count), 0);
+
+ while (sg_miter_next(&si)) {
+ void *saddr = si.addr;
+ size_t slen = si.length;
+
+ while (slen && iov_iter_count(&ii)) {
+ void __user *iaddr = ii.iov->iov_base + ii.iov_offset;
+ size_t copy = min(slen, iov_iter_single_seg_count(&ii));
+ size_t left;
- bv_len -= bytes;
- bv_addr += bytes;
- iov_addr += bytes;
- iov_off += bytes;
+ if (to_sgl)
+ left = copy_from_user(saddr, iaddr, copy);
+ else
+ left = copy_to_user(iaddr, saddr, copy);
- if (iov[iov_idx].iov_len == iov_off) {
- iov_idx++;
- iov_off = 0;
+ if (unlikely(left)) {
+ ret = -EFAULT;
+ goto done;
}
+
+ saddr += copy;
+ slen -= copy;
+ iov_iter_advance(&ii, copy);
}
+ WARN_ON_ONCE(slen); /* iov too short */
+ }
+ WARN_ON_ONCE(iov_iter_count(&ii)); /* bio too short */
+done:
+ sg_miter_stop(&si);
+ return ret;
+}
+
+/**
+ * bio_init_from_sgl - initialize bio from sgl
+ * @bio: bio to initialize
+ * @q: request_queue new bio belongs to
+ * @sgl: sgl describing the data area
+ * @nents: number of entries in @sgl
+ * @nr_pages: number of pages in @sgl
+ * @rw: the data direction of new bio
+ *
+ * Populate @bio with the data area described by @sgl. Note that
+ * the resulting bio might not contain the whole @sgl area. This
+ * can be checked by testing bio->bi_size against total area
+ * length.
+ */
+static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
+ struct scatterlist *sgl, int nents,
+ int nr_pages, int rw)
+{
+ struct scatterlist *sg;
+ int i;
+
+ if (rw == WRITE)
+ bio->bi_rw |= 1 << BIO_RW;
+
+ for_each_sg(sgl, sg, nents, i) {
+ struct page *page = sg_page(sg);
+ size_t offset = sg->offset;
+ size_t len = sg->length;
- if (do_free_page)
- __free_page(bvec->bv_page);
+ while (len) {
+ size_t bytes = min_t(size_t, len, PAGE_SIZE - offset);
+
+ if (unlikely(bio_add_pc_page(q, bio, page,
+ bytes, offset) < bytes))
+ break;
+
+ offset = 0;
+ len -= bytes;
+ page = nth_page(page, 1);
+ }
}
+}
- return ret;
+/**
+ * bio_create_from_sgl - create bio from sgl
+ * @q: request_queue new bio belongs to
+ * @sgl: sgl describing the data area
+ * @nents: number of entries in @sgl
+ * @nr_pages: number of pages in @sgl
+ * @rw: the data direction of new bio
+ * @gfp: gfp for data structure allocations
+ *
+ * Allocate a bio and populate it using bio_init_from_sgl().
+ *
+ * RETURNS:
+ * Pointer to the new bio on success, ERR_PTR(-errno) on error.
+ */
+static struct bio *bio_create_from_sgl(struct request_queue *q,
+ struct scatterlist *sgl, int nents,
+ int nr_pages, int rw, gfp_t gfp)
+{
+ struct bio *bio;
+
+ bio = bio_kmalloc(gfp, nr_pages);
+ if (!bio)
+ return ERR_PTR(-ENOMEM);
+
+ bio_init_from_sgl(bio, q, sgl, nents, nr_pages, rw);
+
+ return bio;
}
/**
@@ -763,10 +965,10 @@ int bio_uncopy_user(struct bio *bio)
struct bio_copy_info *bci = bio->bi_private;
int ret = 0;
- if (!bio_flagged(bio, BIO_NULL_MAPPED))
- ret = __bio_copy_iov(bio, bci->iovecs, bci->src_iov,
- bci->src_count, 1, bci->is_our_pages);
- bci_free(bci);
+ if (!bio_flagged(bio, BIO_NULL_MAPPED) && bio_data_dir(bio) == READ)
+ ret = bio_memcpy_sgl_uiov(bci->copy_sgl, bci->copy_nents,
+ bci->src_iov, bci->src_count, false);
+ bci_destroy(bci);
bio_put(bio);
return ret;
}
@@ -788,102 +990,32 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
struct iovec *iov, int count, int rw, gfp_t gfp)
{
struct bio_copy_info *bci;
- struct bio_vec *bvec;
- struct page *page;
struct bio *bio;
- int i, ret;
- int nr_pages = 0;
- unsigned int len = 0;
- unsigned int offset = md ? md->offset & ~PAGE_MASK : 0;
+ int ret;
- for (i = 0; i < count; i++) {
- unsigned long uaddr;
- unsigned long end;
- unsigned long start;
-
- uaddr = (unsigned long)iov[i].iov_base;
- end = (uaddr + iov[i].iov_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- start = uaddr >> PAGE_SHIFT;
-
- nr_pages += end - start;
- len += iov[i].iov_len;
- }
-
- bci = bci_alloc(nr_pages, count, gfp);
+ bci = bci_create(iov, count, gfp, q->bounce_gfp | gfp, md);
if (!bci)
return ERR_PTR(-ENOMEM);
- ret = -ENOMEM;
- bio = bio_kmalloc(gfp, nr_pages);
- if (!bio)
- goto out_bci;
-
- if (rw == WRITE)
- bio->bi_rw |= 1 << BIO_RW;
-
- ret = 0;
-
- if (md) {
- nr_pages = 1 << md->page_order;
- i = md->offset / PAGE_SIZE;
- }
- while (len) {
- unsigned int bytes = PAGE_SIZE;
-
- bytes -= offset;
-
- if (bytes > len)
- bytes = len;
-
- if (md) {
- if (i == md->nr_entries * nr_pages) {
- ret = -ENOMEM;
- break;
- }
-
- page = md->pages[i / nr_pages];
- page += (i % nr_pages);
-
- i++;
- } else {
- page = alloc_page(q->bounce_gfp | gfp);
- if (!page) {
- ret = -ENOMEM;
- break;
- }
- }
-
- if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes)
- break;
-
- len -= bytes;
- offset = 0;
+ if (rw == WRITE && (!md || !md->null_mapped)) {
+ ret = bio_memcpy_sgl_uiov(bci->copy_sgl, bci->copy_nents,
+ iov, count, true);
+ if (ret)
+ goto err_bci;
}
- if (ret)
- goto cleanup;
-
- /*
- * success
- */
- if (unlikely(md && md->null_mapped))
- bio->bi_flags |= (1 << BIO_NULL_MAPPED);
- else if (rw == WRITE) {
- ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, count, 0, 0);
- if (ret)
- goto cleanup;
+ bio = bio_create_from_sgl(q, bci->copy_sgl, bci->copy_nents,
+ bci->copy_nents, rw, gfp);
+ if (IS_ERR(bio)) {
+ ret = PTR_ERR(bio);
+ goto err_bci;
}
- bci_set(bci, bio, iov, count, md ? 0 : 1);
+ bio->bi_private = bci;
return bio;
-cleanup:
- if (!md)
- bio_for_each_segment(bvec, bio, i)
- __free_page(bvec->bv_page);
- bio_put(bio);
-out_bci:
- bci_free(bci);
+err_bci:
+ bci_destroy(bci);
return ERR_PTR(ret);
}
@@ -1186,24 +1318,13 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
static void bio_copy_kern_endio(struct bio *bio, int err)
{
- struct bio_vec *bvec;
- const int read = bio_data_dir(bio) == READ;
struct bio_copy_info *bci = bio->bi_private;
- int i;
- char *p = bci->src_iov[0].iov_base;
- __bio_for_each_segment(bvec, bio, i, 0) {
- char *addr = page_address(bvec->bv_page);
- int len = bci->iovecs[i].bv_len;
-
- if (read && !err)
- memcpy(p, addr, len);
-
- __free_page(bvec->bv_page);
- p += len;
- }
+ if (bio_data_dir(bio) == READ)
+ bio_memcpy_sgl_uiov(bci->copy_sgl, bci->copy_nents,
+ bci->src_iov, bci->src_count, false);
- bci_free(bci);
+ bci_destroy(bci);
bio_put(bio);
}
--
1.6.0.2
Impact: add multi-segment support to kernel bio mapping
Implement bio_{map|copy}_kern_sgl() and reimplement
bio_{map|copy}_kern() in terms of them. As all map/copy helpers now
support sgl, this is quite simple. The sgl versions will be used to
extend the blk_rq_map_kern*() interface.
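For illustration, a caller with a multi-segment kernel buffer would look
roughly like the sketch below. This is not part of the patch; the helper
name, the buffers and the WRITE direction are made up, and error
handling is abbreviated.
static struct bio *example_map_two_bufs(struct request_queue *q,
					void *hdr, unsigned int hdr_len,
					void *data, unsigned int data_len)
{
	struct scatterlist sgl[2];
	struct bio *bio;
	sg_init_table(sgl, 2);
	sg_set_buf(&sgl[0], hdr, hdr_len);
	sg_set_buf(&sgl[1], data, data_len);
	/* direct mapping needs properly aligned, non-stack segments */
	bio = bio_map_kern_sg(q, sgl, 2, WRITE, GFP_KERNEL);
	if (IS_ERR(bio))
		/* fall back to the bouncing variant */
		bio = bio_copy_kern_sg(q, sgl, 2, WRITE, GFP_KERNEL);
	return bio;
}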
Signed-off-by: Tejun Heo <[email protected]>
---
fs/bio.c | 254 +++++++++++++++++++++++++++++++-------------------
include/linux/bio.h | 5 +
2 files changed, 162 insertions(+), 97 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index 04bc5c2..9c921f9 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -26,7 +26,6 @@
#include <linux/mempool.h>
#include <linux/workqueue.h>
#include <linux/blktrace_api.h>
-#include <linux/scatterlist.h>
#include <linux/pfn.h>
#include <trace/block.h>
@@ -899,13 +898,55 @@ static int bio_memcpy_sgl_uiov(struct scatterlist *sgl, int nents,
}
WARN_ON_ONCE(slen); /* iov too short */
}
- WARN_ON_ONCE(iov_iter_count(&ii)); /* bio too short */
+ WARN_ON_ONCE(iov_iter_count(&ii)); /* sgl too short */
done:
sg_miter_stop(&si);
return ret;
}
/**
+ * bio_memcpy_sgl_sgl - copy data between two sgls
+ * @dsgl: destination sgl
+ * @dnents: number of entries in @dsgl
+ * @ssgl: source sgl
+ * @snents: number of entries in @ssgl
+ * @d_km_type: km_type to use for mapping @dsgl
+ * @s_km_type: km_type to use for mapping @ssgl
+ *
+ * Copy data from @ssgl to @dsgl. The areas should be of the
+ * same size.
+ */
+static void bio_memcpy_sgl_sgl(struct scatterlist *dsgl, int dnents,
+ struct scatterlist *ssgl, int snents,
+ enum km_type d_km_type, enum km_type s_km_type)
+{
+ struct sg_mapping_iter si, di;
+
+ sg_miter_start_atomic(&di, dsgl, dnents, d_km_type);
+ sg_miter_start_atomic(&si, ssgl, snents, s_km_type);
+
+ while (sg_miter_next(&di)) {
+ void *daddr = di.addr;
+ size_t dlen = di.length;
+
+ while (dlen && sg_miter_next(&si)) {
+ size_t copy = min(dlen, si.length);
+
+ memcpy(daddr, si.addr, copy);
+
+ daddr += copy;
+ dlen -= copy;
+ si.consumed = copy;
+ }
+ WARN_ON_ONCE(dlen); /* ssgl too short */
+ }
+ WARN_ON_ONCE(sg_miter_next(&si)); /* dsgl too short */
+
+ sg_miter_stop(&di);
+ sg_miter_stop(&si);
+}
+
+/**
* bio_init_from_sgl - initialize bio from sgl
* @bio: bio to initialize
* @q: request_queue new bio belongs to
@@ -949,48 +990,6 @@ static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
}
/**
- * bio_memcpy_sgl_sgl - copy data between two sgls
- * @dsgl: destination sgl
- * @dnents: number of entries in @dsgl
- * @ssgl: source sgl
- * @snents: number of entries in @ssgl
- *
- * Copy data from @ssgl to @dsgl. The areas should be of the
- * same size.
- */
-static void bio_memcpy_sgl_sgl(struct scatterlist *dsgl, int dnents,
- struct scatterlist *ssgl, int snents)
-{
- struct sg_mapping_iter si, di;
-
- /*
- * si will be nested inside di, use atomic mapping for it to
- * avoid (mostly theoretical) possibility of deadlock.
- */
- sg_miter_start(&di, dsgl, dnents, 0);
- sg_miter_start(&si, ssgl, snents, SG_MITER_ATOMIC);
-
- while (sg_miter_next(&di)) {
- void *daddr = di.addr;
- size_t dlen = di.length;
-
- while (dlen && sg_miter_next(&si)) {
- size_t copy = min(dlen, si.length);
-
- memcpy(daddr, si.addr, copy);
-
- daddr += copy;
- dlen -= copy;
- si.consumed = copy;
- }
- WARN_ON_ONCE(dlen); /* ssgl too short */
- sg_miter_stop(&si); /* advancing di might sleep, stop si */
- }
- WARN_ON_ONCE(sg_miter_next(&si)); /* dsgl too short */
- sg_miter_stop(&di);
-}
-
-/**
* bio_create_from_sgl - create bio from sgl
* @q: request_queue new bio belongs to
* @sgl: sgl describing the data area
@@ -1286,50 +1285,54 @@ static void bio_map_kern_endio(struct bio *bio, int err)
}
/**
- * bio_map_kern - map kernel address into bio
+ * bio_map_kern_sg - map kernel data into bio
* @q: the struct request_queue for the bio
- * @data: pointer to buffer to map
- * @len: length in bytes
+ * @sgl: the sglist
+ * @nents: number of elements in the sgl
+ * @rw: READ or WRITE
* @gfp: allocation flags for bio allocation
*
* Map the kernel address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
-struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp)
+struct bio *bio_map_kern_sg(struct request_queue *q, struct scatterlist *sgl,
+ int nents, int rw, gfp_t gfp)
{
- unsigned long kaddr = (unsigned long)data;
- unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = kaddr >> PAGE_SHIFT;
- const int nr_pages = end - start;
- int offset, i;
+ size_t tot_len = 0;
+ int nr_pages = 0;
+ struct scatterlist *sg;
struct bio *bio;
+ int i;
- bio = bio_kmalloc(gfp, nr_pages);
- if (!bio)
- return ERR_PTR(-ENOMEM);
-
- offset = offset_in_page(kaddr);
- for (i = 0; i < nr_pages; i++) {
- unsigned int bytes = PAGE_SIZE - offset;
-
- if (len <= 0)
- break;
-
- if (bytes > len)
- bytes = len;
+ for_each_sg(sgl, sg, nents, i) {
+ void *page_addr = page_address(sg_page(sg));
+ unsigned int len = sg->length;
- if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
- offset) < bytes)
- break;
+ nr_pages += PFN_UP(sg->offset + len);
+ tot_len += len;
- data += bytes;
- len -= bytes;
- offset = 0;
+ /*
+ * Each segment must be aligned on DMA boundary and
+ * not on stack. The last one may have unaligned
+ * length as long as the total length is aligned to
+ * DMA padding alignment.
+ */
+ if (i == nents - 1)
+ len = 0;
+ if (((sg->offset | len) & queue_dma_alignment(q)) ||
+ (page_addr && object_is_on_stack(page_addr + sg->offset)))
+ return ERR_PTR(-EINVAL);
}
+ /* and total length on DMA padding alignment */
+ if (!nr_pages || tot_len & q->dma_pad_mask)
+ return ERR_PTR(-EINVAL);
+
+ bio = bio_create_from_sgl(q, sgl, nents, nr_pages, rw, gfp);
+ if (IS_ERR(bio))
+ return bio;
/* doesn't support partial mappings */
- if (unlikely(bio->bi_size != len)) {
+ if (bio->bi_size != tot_len) {
bio_put(bio);
return ERR_PTR(-EINVAL);
}
@@ -1338,54 +1341,111 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
return bio;
}
+/**
+ * bio_map_kern - map kernel address into bio
+ * @q: the struct request_queue for the bio
+ * @data: pointer to buffer to map
+ * @len: length in bytes
+ * @gfp: allocation flags for bio allocation
+ *
+ * Map the kernel address into a bio suitable for io to a block
+ * device. Returns an error pointer in case of error.
+ */
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+ gfp_t gfp)
+{
+ struct scatterlist sg;
+
+ sg_init_one(&sg, data, len);
+
+ return bio_map_kern_sg(q, &sg, 1, READ, gfp);
+}
+
static void bio_copy_kern_endio(struct bio *bio, int err)
{
struct bio_copy_info *bci = bio->bi_private;
- if (bio_data_dir(bio) == READ)
- bio_memcpy_sgl_uiov(bci->copy_sgl, bci->copy_nents,
- bci->src_iov, bci->src_count, false);
+ if (bio_data_dir(bio) == READ) {
+ unsigned long flags;
+ local_irq_save(flags); /* to protect KMs */
+ bio_memcpy_sgl_sgl(bci->src_sgl, bci->src_nents,
+ bci->copy_sgl, bci->copy_nents,
+ KM_BIO_DST_IRQ, KM_BIO_SRC_IRQ);
+ local_irq_restore(flags);
+ }
bci_destroy(bci);
bio_put(bio);
}
/**
- * bio_copy_kern - copy kernel address into bio
+ * bio_copy_kern_sg - copy kernel data into bio
* @q: the struct request_queue for the bio
- * @data: pointer to buffer to copy
- * @len: length in bytes
- * @gfp: allocation flags for bio and page allocation
+ * @sgl: the sglist
+ * @nents: number of elements in the sgl
* @rw: READ or WRITE
+ * @gfp: allocation flags for bio and page allocation
*
* copy the kernel address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
-struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp, int rw)
+struct bio *bio_copy_kern_sg(struct request_queue *q, struct scatterlist *sgl,
+ int nents, int rw, gfp_t gfp)
{
+ struct bio_copy_info *bci;
struct bio *bio;
- struct bio_vec *bvec;
- int i;
-
- bio = bio_copy_user(q, NULL, (unsigned long)data, len, READ, gfp);
- if (IS_ERR(bio))
- return bio;
+ int ret;
- if (rw == WRITE) {
- void *p = data;
+ bci = bci_create(NULL, 0, sgl, nents, gfp, q->bounce_gfp | gfp, NULL);
+ if (!bci)
+ return ERR_PTR(-ENOMEM);
- bio_for_each_segment(bvec, bio, i) {
- char *addr = page_address(bvec->bv_page);
+ if (rw == WRITE)
+ bio_memcpy_sgl_sgl(bci->copy_sgl, bci->copy_nents, sgl, nents,
+ KM_USER1, KM_USER0);
- memcpy(addr, p, bvec->bv_len);
- p += bvec->bv_len;
- }
+ bio = bio_create_from_sgl(q, bci->copy_sgl, bci->copy_nents,
+ bci->copy_nents, rw, gfp);
+ if (IS_ERR(bio)) {
+ ret = PTR_ERR(bio);
+ goto err_bci;
}
- bio->bi_end_io = bio_copy_kern_endio;
+ /* doesn't support partial mappings */
+ ret = -EINVAL;
+ if (bio->bi_size != bci->len)
+ goto err_bio;
+ bio->bi_end_io = bio_copy_kern_endio;
+ bio->bi_private = bci;
return bio;
+
+err_bio:
+ bio_put(bio);
+err_bci:
+ bci_destroy(bci);
+ return ERR_PTR(ret);
+}
+
+/**
+ * bio_copy_kern - copy kernel address into bio
+ * @q: the struct request_queue for the bio
+ * @data: pointer to buffer to copy
+ * @len: length in bytes
+ * @gfp: allocation flags for bio and page allocation
+ * @rw: READ or WRITE
+ *
+ * copy the kernel address into a bio suitable for io to a block
+ * device. Returns an error pointer in case of error.
+ */
+struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
+ gfp_t gfp, int rw)
+{
+ struct scatterlist sg;
+
+ sg_init_one(&sg, data, len);
+
+ return bio_copy_kern_sg(q, &sg, 1, rw, gfp);
}
/*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1c21e59..1c28c5c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -24,6 +24,7 @@
#include <linux/mempool.h>
#include <linux/ioprio.h>
#include <linux/uio.h>
+#include <linux/scatterlist.h>
#ifdef CONFIG_BLOCK
@@ -400,8 +401,12 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
int bio_uncopy_user(struct bio *bio);
struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
gfp_t gfp);
+struct bio *bio_map_kern_sg(struct request_queue *q, struct scatterlist *sgl,
+ int nents, int rw, gfp_t gfp);
struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
gfp_t gfp, int rw);
+struct bio *bio_copy_kern_sg(struct request_queue *q, struct scatterlist *sgl,
+ int nents, int rw, gfp_t gfp_mask);
void bio_set_pages_dirty(struct bio *bio);
void bio_check_pages_dirty(struct bio *bio);
void zero_fill_bio(struct bio *bio);
--
1.6.0.2
Impact: cleanup
Collapse double-underbar-prefixed internal functions which have only a
single user into their respective call sites. This prepares for
reimplementation.
Signed-off-by: Tejun Heo <[email protected]>
---
fs/bio.c | 129 +++++++++++++++++++++++--------------------------------------
1 files changed, 49 insertions(+), 80 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index f13aef0..4540afc 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1044,10 +1044,20 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
return bio_copy_user_iov(q, md, &iov, 1, rw, gfp);
}
-static struct bio *__bio_map_user_iov(struct request_queue *q,
- struct block_device *bdev,
- struct iovec *iov, int count, int rw,
- gfp_t gfp)
+/**
+ * bio_map_user_iov - map user iovec table into bio
+ * @q: the struct request_queue for the bio
+ * @bdev: destination block device
+ * @iov: the iovec.
+ * @count: number of elements in the iovec
+ * @rw: READ or WRITE
+ * @gfp: memory allocation flags
+ *
+ * Map the user space address into a bio suitable for io to a block
+ * device. Returns an error pointer in case of error.
+ */
+struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
+ struct iovec *iov, int count, int rw, gfp_t gfp)
{
int i, j;
size_t tot_len = 0;
@@ -1141,6 +1151,15 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
bio->bi_bdev = bdev;
bio->bi_flags |= (1 << BIO_USER_MAPPED);
+
+ /*
+ * subtle -- if __bio_map_user() ended up bouncing a bio,
+ * it would normally disappear when its bi_end_io is run.
+ * however, we need it for the unmap, so grab an extra
+ * reference to it
+ */
+ bio_get(bio);
+
return bio;
out_unmap:
@@ -1180,38 +1199,15 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
}
/**
- * bio_map_user_iov - map user iovec table into bio
- * @q: the struct request_queue for the bio
- * @bdev: destination block device
- * @iov: the iovec.
- * @count: number of elements in the iovec
- * @rw: READ or WRITE
- * @gfp: memory allocation flags
+ * bio_unmap_user - unmap a bio
+ * @bio: the bio being unmapped
*
- * Map the user space address into a bio suitable for io to a block
- * device. Returns an error pointer in case of error.
+ * Unmap a bio previously mapped by bio_map_user(). Must be called with
+ * a process context.
+ *
+ * bio_unmap_user() may sleep.
*/
-struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct iovec *iov, int count, int rw, gfp_t gfp)
-{
- struct bio *bio;
-
- bio = __bio_map_user_iov(q, bdev, iov, count, rw, gfp);
- if (IS_ERR(bio))
- return bio;
-
- /*
- * subtle -- if __bio_map_user() ended up bouncing a bio,
- * it would normally disappear when its bi_end_io is run.
- * however, we need it for the unmap, so grab an extra
- * reference to it
- */
- bio_get(bio);
-
- return bio;
-}
-
-static void __bio_unmap_user(struct bio *bio)
+void bio_unmap_user(struct bio *bio)
{
struct bio_vec *bvec;
int i;
@@ -1226,21 +1222,8 @@ static void __bio_unmap_user(struct bio *bio)
page_cache_release(bvec->bv_page);
}
+ /* see comment on the success return path of bio_map_user_iov() */
bio_put(bio);
-}
-
-/**
- * bio_unmap_user - unmap a bio
- * @bio: the bio being unmapped
- *
- * Unmap a bio previously mapped by bio_map_user(). Must be called with
- * a process context.
- *
- * bio_unmap_user() may sleep.
- */
-void bio_unmap_user(struct bio *bio)
-{
- __bio_unmap_user(bio);
bio_put(bio);
}
@@ -1249,9 +1232,18 @@ static void bio_map_kern_endio(struct bio *bio, int err)
bio_put(bio);
}
-
-static struct bio *__bio_map_kern(struct request_queue *q, void *data,
- unsigned int len, gfp_t gfp)
+/**
+ * bio_map_kern - map kernel address into bio
+ * @q: the struct request_queue for the bio
+ * @data: pointer to buffer to map
+ * @len: length in bytes
+ * @gfp: allocation flags for bio allocation
+ *
+ * Map the kernel address into a bio suitable for io to a block
+ * device. Returns an error pointer in case of error.
+ */
+struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
+ gfp_t gfp)
{
unsigned long kaddr = (unsigned long)data;
unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1283,39 +1275,16 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
offset = 0;
}
+ /* doesn't support partial mappings */
+ if (unlikely(bio->bi_size != len)) {
+ bio_put(bio);
+ return ERR_PTR(-EINVAL);
+ }
+
bio->bi_end_io = bio_map_kern_endio;
return bio;
}
-/**
- * bio_map_kern - map kernel address into bio
- * @q: the struct request_queue for the bio
- * @data: pointer to buffer to map
- * @len: length in bytes
- * @gfp: allocation flags for bio allocation
- *
- * Map the kernel address into a bio suitable for io to a block
- * device. Returns an error pointer in case of error.
- */
-struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp)
-{
- struct bio *bio;
-
- bio = __bio_map_kern(q, data, len, gfp);
- if (IS_ERR(bio))
- return bio;
-
- if (bio->bi_size == len)
- return bio;
-
- /*
- * Don't support partial mappings.
- */
- bio_put(bio);
- return ERR_PTR(-EINVAL);
-}
-
static void bio_copy_kern_endio(struct bio *bio, int err)
{
struct bio_copy_info *bci = bio->bi_private;
--
1.6.0.2
Impact: add new features to internal functions to prepare for future changes
Expand bci so that it can take an sgl as its copy source and implement
bio_memcpy_sgl_sgl(), which copies data between two sgls. These will
be used to implement blk_rq_copy/map_kern_iov().
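As a rough sketch of how the sgl-source mode is meant to be used (this
mirrors what bio_copy_kern_sg() does once it is implemented on top of
this patch; the wrapper name here is made up and both helpers are
static to fs/bio.c):
static struct bio_copy_info *example_bci_from_sgl(struct request_queue *q,
						  struct scatterlist *sgl,
						  int nents, int rw, gfp_t gfp)
{
	struct bio_copy_info *bci;
	/* NULL iovec + sgl selects the new src_sgl mode */
	bci = bci_create(NULL, 0, sgl, nents, gfp, q->bounce_gfp | gfp, NULL);
	if (!bci)
		return NULL;
	/* for WRITEs, pre-fill the bounce pages from the source sgl */
	if (rw == WRITE)
		bio_memcpy_sgl_sgl(bci->copy_sgl, bci->copy_nents, sgl, nents);
	return bci;
}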
Signed-off-by: Tejun Heo <[email protected]>
---
fs/bio.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 74 insertions(+), 8 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index 1ca8b16..04bc5c2 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -758,8 +758,10 @@ err:
struct bio_copy_info {
struct scatterlist *copy_sgl; /* one sg per page */
struct iovec *src_iov; /* source iovec from userland */
+ struct scatterlist *src_sgl; /* source sgl */
int copy_nents; /* #entries in copy_sgl */
int src_count; /* #entries in src_iov */
+ int src_nents; /* #entries in src_sgl */
size_t len; /* total length in bytes */
int is_our_pages; /* do we own copied pages? */
};
@@ -776,6 +778,7 @@ static void bci_destroy(struct bio_copy_info *bci)
bio_sgl_free_pages(bci->copy_sgl, bci->copy_nents);
kfree(bci->copy_sgl);
kfree(bci->src_iov);
+ kfree(bci->src_sgl);
kfree(bci);
}
@@ -783,6 +786,8 @@ static void bci_destroy(struct bio_copy_info *bci)
* bci_create - create a bci
* @src_iov: source iovec
* @src_count: number of entries in @src_iov
+ * @src_sgl: source sgl
+ * @src_nents: number of entries in @src_sgl
* @gfp: gfp for data structure allocations
* @page_gfp: gfp for page allocations
* @md: optional preallocated page pool
@@ -795,28 +800,47 @@ static void bci_destroy(struct bio_copy_info *bci)
* Pointer to the new bci on success, NULL on failure.
*/
static struct bio_copy_info *bci_create(struct iovec *src_iov, int src_count,
- gfp_t gfp, gfp_t page_gfp,
- struct rq_map_data *md)
+ struct scatterlist *src_sgl, int src_nents,
+ gfp_t gfp, gfp_t page_gfp,
+ struct rq_map_data *md)
{
struct bio_copy_info *bci;
+ struct scatterlist *sg;
+ int i;
+
+ BUG_ON(!src_iov == !src_sgl);
bci = kzalloc(sizeof(*bci), gfp);
if (!bci)
return NULL;
bci->src_count = src_count;
- bci->len = iov_length(src_iov, src_count);
+ bci->src_nents = src_nents;
bci->is_our_pages = md == NULL;
+ if (src_iov)
+ bci->len = iov_length(src_iov, src_count);
+ else
+ for_each_sg(src_sgl, sg, src_nents, i)
+ bci->len += sg->length;
+
bci->copy_sgl = bio_alloc_sgl_with_pages(bci->len, gfp, page_gfp, md,
&bci->copy_nents);
if (!bci->copy_sgl)
goto err;
- bci->src_iov = kmalloc(sizeof(src_iov[0]) * src_count, gfp);
- if (!bci->src_iov)
- goto err;
- memcpy(bci->src_iov, src_iov, sizeof(src_iov[0]) * src_count);
+ if (src_iov) {
+ bci->src_iov = kmalloc(sizeof(src_iov[0]) * src_count, gfp);
+ if (!bci->src_iov)
+ goto err;
+ memcpy(bci->src_iov, src_iov, sizeof(src_iov[0]) * src_count);
+ } else {
+ bci->src_sgl = kmalloc(sizeof(src_sgl[0]) * src_nents, gfp);
+ if (!bci->src_sgl)
+ goto err;
+ for_each_sg(src_sgl, sg, src_nents, i)
+ bci->src_sgl[i] = *sg;
+ }
return bci;
@@ -925,6 +949,48 @@ static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
}
/**
+ * bio_memcpy_sgl_sgl - copy data between two sgls
+ * @dsgl: destination sgl
+ * @dnents: number of entries in @dsgl
+ * @ssgl: source sgl
+ * @snents: number of entries in @ssgl
+ *
+ * Copy data from @ssgl to @dsgl. The areas should be of the
+ * same size.
+ */
+static void bio_memcpy_sgl_sgl(struct scatterlist *dsgl, int dnents,
+ struct scatterlist *ssgl, int snents)
+{
+ struct sg_mapping_iter si, di;
+
+ /*
+ * si will be nested inside di, use atomic mapping for it to
+ * avoid (mostly theoretical) possibility of deadlock.
+ */
+ sg_miter_start(&di, dsgl, dnents, 0);
+ sg_miter_start(&si, ssgl, snents, SG_MITER_ATOMIC);
+
+ while (sg_miter_next(&di)) {
+ void *daddr = di.addr;
+ size_t dlen = di.length;
+
+ while (dlen && sg_miter_next(&si)) {
+ size_t copy = min(dlen, si.length);
+
+ memcpy(daddr, si.addr, copy);
+
+ daddr += copy;
+ dlen -= copy;
+ si.consumed = copy;
+ }
+ WARN_ON_ONCE(dlen); /* ssgl too short */
+ sg_miter_stop(&si); /* advancing di might sleep, stop si */
+ }
+ WARN_ON_ONCE(sg_miter_next(&si)); /* dsgl too short */
+ sg_miter_stop(&di);
+}
+
+/**
* bio_create_from_sgl - create bio from sgl
* @q: request_queue new bio belongs to
* @sgl: sgl describing the data area
@@ -993,7 +1059,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
struct bio *bio;
int ret;
- bci = bci_create(iov, count, gfp, q->bounce_gfp | gfp, md);
+ bci = bci_create(iov, count, NULL, 0, gfp, q->bounce_gfp | gfp, md);
if (!bci)
return ERR_PTR(-ENOMEM);
--
1.6.0.2
Impact: hack removal
SCSI needs to map an sgl into a request for kernel PC requests;
however, the block API didn't have such a feature, so SCSI used its
own rq mapping function, which hooked into block/bio internals and is
generally considered an ugly hack. The private function may also
produce requests which are bigger than the queue's per-rq limits.
The block layer now provides blk_rq_map_kern_sgl(). Kill the private
implementation and use it instead.
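For reference, a hypothetical in-kernel user of the new interface would
now go through the block layer roughly like this. The function and
variable names are illustrative only and the request setup is
abbreviated; the blk_rq_map_kern_sg() call itself is taken from the
scsi_lib.c hunk below.
static int example_issue_pc_sgl(struct scsi_device *sdev, unsigned char *cmd,
				int cmd_len, struct scatterlist *sgl,
				int nents, int rw, gfp_t gfp)
{
	struct request *req;
	int err;
	req = blk_get_request(sdev->request_queue, rw, gfp);
	if (!req)
		return -ENOMEM;
	req->cmd_type = REQ_TYPE_BLOCK_PC;
	/* maps @sgl within queue limits instead of hand-building bios */
	err = blk_rq_map_kern_sg(req->q, req, sgl, nents, gfp);
	if (err)
		goto out;
	memcpy(req->cmd, cmd, cmd_len);
	req->cmd_len = cmd_len;
	blk_execute_rq(req->q, NULL, req, 0);
	err = req->errors ? -EIO : 0;
out:
	blk_put_request(req);
	return err;
}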
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/scsi/scsi_lib.c | 108 +----------------------------------------------
1 files changed, 1 insertions(+), 107 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3196c83..3fa5589 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -296,112 +296,6 @@ static void scsi_end_async(struct request *req, int uptodate)
__blk_put_request(req->q, req);
}
-static int scsi_merge_bio(struct request *rq, struct bio *bio)
-{
- struct request_queue *q = rq->q;
-
- bio->bi_flags &= ~(1 << BIO_SEG_VALID);
- if (rq_data_dir(rq) == WRITE)
- bio->bi_rw |= (1 << BIO_RW);
- blk_queue_bounce(q, &bio);
-
- return blk_rq_append_bio(q, rq, bio);
-}
-
-static void scsi_bi_endio(struct bio *bio, int error)
-{
- bio_put(bio);
-}
-
-/**
- * scsi_req_map_sg - map a scatterlist into a request
- * @rq: request to fill
- * @sgl: scatterlist
- * @nsegs: number of elements
- * @bufflen: len of buffer
- * @gfp: memory allocation flags
- *
- * scsi_req_map_sg maps a scatterlist into a request so that the
- * request can be sent to the block layer. We do not trust the scatterlist
- * sent to use, as some ULDs use that struct to only organize the pages.
- */
-static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
- int nsegs, unsigned bufflen, gfp_t gfp)
-{
- struct request_queue *q = rq->q;
- int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned int data_len = bufflen, len, bytes, off;
- struct scatterlist *sg;
- struct page *page;
- struct bio *bio = NULL;
- int i, err, nr_vecs = 0;
-
- for_each_sg(sgl, sg, nsegs, i) {
- page = sg_page(sg);
- off = sg->offset;
- len = sg->length;
-
- while (len > 0 && data_len > 0) {
- /*
- * sg sends a scatterlist that is larger than
- * the data_len it wants transferred for certain
- * IO sizes
- */
- bytes = min_t(unsigned int, len, PAGE_SIZE - off);
- bytes = min(bytes, data_len);
-
- if (!bio) {
- nr_vecs = min_t(int, BIO_GUARANTEED_PAGES,
- nr_pages);
- nr_pages -= nr_vecs;
-
- bio = bio_alloc(gfp, nr_vecs);
- if (!bio) {
- err = -ENOMEM;
- goto free_bios;
- }
- bio->bi_end_io = scsi_bi_endio;
- }
-
- if (bio_add_pc_page(q, bio, page, bytes, off) !=
- bytes) {
- bio_put(bio);
- err = -EINVAL;
- goto free_bios;
- }
-
- if (bio->bi_vcnt >= nr_vecs) {
- err = scsi_merge_bio(rq, bio);
- if (err) {
- bio_endio(bio, 0);
- goto free_bios;
- }
- bio = NULL;
- }
-
- page++;
- len -= bytes;
- data_len -=bytes;
- off = 0;
- }
- }
-
- rq->buffer = rq->data = NULL;
- rq->data_len = bufflen;
- return 0;
-
-free_bios:
- while ((bio = rq->bio) != NULL) {
- rq->bio = bio->bi_next;
- /*
- * call endio instead of bio_put incase it was bounced
- */
- bio_endio(bio, 0);
- }
-
- return err;
-}
-
/**
* scsi_execute_async - insert request
* @sdev: scsi device
@@ -438,7 +332,7 @@ int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd,
req->cmd_flags |= REQ_QUIET;
if (use_sg)
- err = scsi_req_map_sg(req, buffer, use_sg, bufflen, gfp);
+ err = blk_rq_map_kern_sg(req->q, req, buffer, use_sg, gfp);
else if (bufflen)
err = blk_rq_map_kern(req->q, req, buffer, bufflen, gfp);
--
1.6.0.2
Impact: cleanup
blk-map and bio are about to go through a major update. Make the
following renames to ease future changes.
* more concise/wieldy names: s/gfp_mask/gfp/, s/map_data/md/,
s/iov_count/count/. Note that count is and will continue to be used
only for the number of entries in an iovec; similarly, nents will be
used for the number of entries in an sgl.
* less confusing names: bio_map_data doesn't have much to do with
mapping per se but is aux info for copying. It's also easily confused
with rq_map_data, which, by the way, is a reserved pool of pages for
copying. Rename bio_map_data to bio_copy_info and everything related
to bci_*(). This part of the API is going to receive a major
overhaul; the names and semantics will get clearer with future
changes.
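For example, the bio_copy_user_iov() prototype ends up as (taken from
the include/linux/bio.h hunk below):
struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
			      struct iovec *iov, int count, int rw, gfp_t gfp);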
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 37 ++++----
fs/bio.c | 231 +++++++++++++++++++++++------------------------
include/linux/bio.h | 37 ++++----
include/linux/blkdev.h | 4 +-
4 files changed, 150 insertions(+), 159 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index 4f0221a..eb206df 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -43,11 +43,11 @@ static int __blk_rq_unmap_user(struct bio *bio)
* blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request to map data to
- * @map_data: pointer to the rq_map_data holding pages (if necessary)
+ * @md: pointer to the rq_map_data holding pages (if necessary)
* @iov: pointer to the iovec
- * @iov_count: number of elements in the iovec
+ * @count: number of elements in the iovec
* @len: I/O byte count
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Description:
* Data will be mapped directly for zero copy I/O, if possible. Otherwise
@@ -63,20 +63,19 @@ static int __blk_rq_unmap_user(struct bio *bio)
* unmapping.
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct rq_map_data *map_data, struct iovec *iov,
- int iov_count, unsigned int len, gfp_t gfp_mask)
+ struct rq_map_data *md, struct iovec *iov, int count,
+ unsigned int len, gfp_t gfp)
{
struct bio *bio = ERR_PTR(-EINVAL);
int rw = rq_data_dir(rq);
- if (!iov || iov_count <= 0)
+ if (!iov || count <= 0)
return -EINVAL;
- if (!map_data)
- bio = bio_map_user_iov(q, NULL, iov, iov_count, rw, gfp_mask);
+ if (!md)
+ bio = bio_map_user_iov(q, NULL, iov, count, rw, gfp);
if (bio == ERR_PTR(-EINVAL))
- bio = bio_copy_user_iov(q, map_data, iov, iov_count, rw,
- gfp_mask);
+ bio = bio_copy_user_iov(q, md, iov, count, rw, gfp);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -107,10 +106,10 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);
* blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request structure to fill
- * @map_data: pointer to the rq_map_data holding pages (if necessary)
+ * @md: pointer to the rq_map_data holding pages (if necessary)
* @ubuf: the user buffer
* @len: length of user data
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Description:
* Data will be mapped directly for zero copy I/O, if possible. Otherwise
@@ -126,15 +125,15 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);
* unmapping.
*/
int blk_rq_map_user(struct request_queue *q, struct request *rq,
- struct rq_map_data *map_data, void __user *ubuf,
- unsigned long len, gfp_t gfp_mask)
+ struct rq_map_data *md, void __user *ubuf,
+ unsigned long len, gfp_t gfp)
{
struct iovec iov;
iov.iov_base = ubuf;
iov.iov_len = len;
- return blk_rq_map_user_iov(q, rq, map_data, &iov, 1, len, gfp_mask);
+ return blk_rq_map_user_iov(q, rq, md, &iov, 1, len, gfp);
}
EXPORT_SYMBOL(blk_rq_map_user);
@@ -167,14 +166,14 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
* @rq: request to fill
* @kbuf: the kernel buffer
* @len: length of user data
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Description:
* Data will be mapped directly if possible. Otherwise a bounce
* buffer is used.
*/
int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
- unsigned int len, gfp_t gfp_mask)
+ unsigned int len, gfp_t gfp)
{
int rw = rq_data_dir(rq);
int do_copy = 0;
@@ -187,9 +186,9 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
do_copy = !blk_rq_aligned(q, kbuf, len) || object_is_on_stack(kbuf);
if (do_copy)
- bio = bio_copy_kern(q, kbuf, len, gfp_mask, rw);
+ bio = bio_copy_kern(q, kbuf, len, gfp, rw);
else
- bio = bio_map_kern(q, kbuf, len, gfp_mask);
+ bio = bio_map_kern(q, kbuf, len, gfp);
if (IS_ERR(bio))
return PTR_ERR(bio);
diff --git a/fs/bio.c b/fs/bio.c
index 9d13f21..1cd97e3 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -163,7 +163,7 @@ void bvec_free_bs(struct bio_set *bs, struct bio_vec *bv, unsigned int idx)
}
}
-struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
+struct bio_vec *bvec_alloc_bs(gfp_t gfp, int nr, unsigned long *idx,
struct bio_set *bs)
{
struct bio_vec *bvl;
@@ -200,24 +200,24 @@ struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
*/
if (*idx == BIOVEC_MAX_IDX) {
fallback:
- bvl = mempool_alloc(bs->bvec_pool, gfp_mask);
+ bvl = mempool_alloc(bs->bvec_pool, gfp);
} else {
struct biovec_slab *bvs = bvec_slabs + *idx;
- gfp_t __gfp_mask = gfp_mask & ~(__GFP_WAIT | __GFP_IO);
+ gfp_t __gfp = gfp & ~(__GFP_WAIT | __GFP_IO);
/*
* Make this allocation restricted and don't dump info on
* allocation failures, since we'll fallback to the mempool
* in case of failure.
*/
- __gfp_mask |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
+ __gfp |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
/*
* Try a slab allocation. If this fails and __GFP_WAIT
* is set, retry with the 1-entry mempool
*/
- bvl = kmem_cache_alloc(bvs->slab, __gfp_mask);
- if (unlikely(!bvl && (gfp_mask & __GFP_WAIT))) {
+ bvl = kmem_cache_alloc(bvs->slab, __gfp);
+ if (unlikely(!bvl && (gfp & __GFP_WAIT))) {
*idx = BIOVEC_MAX_IDX;
goto fallback;
}
@@ -256,7 +256,7 @@ void bio_init(struct bio *bio)
/**
* bio_alloc_bioset - allocate a bio for I/O
- * @gfp_mask: the GFP_ mask given to the slab allocator
+ * @gfp: the GFP_ mask given to the slab allocator
* @nr_iovecs: number of iovecs to pre-allocate
* @bs: the bio_set to allocate from. If %NULL, just use kmalloc
*
@@ -270,7 +270,7 @@ void bio_init(struct bio *bio)
* of a bio, to do the appropriate freeing of the bio once the reference
* count drops to zero.
**/
-struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
+struct bio *bio_alloc_bioset(gfp_t gfp, int nr_iovecs, struct bio_set *bs)
{
struct bio_vec *bvl = NULL;
struct bio *bio = NULL;
@@ -278,12 +278,12 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
void *p = NULL;
if (bs) {
- p = mempool_alloc(bs->bio_pool, gfp_mask);
+ p = mempool_alloc(bs->bio_pool, gfp);
if (!p)
goto err;
bio = p + bs->front_pad;
} else {
- bio = kmalloc(sizeof(*bio), gfp_mask);
+ bio = kmalloc(sizeof(*bio), gfp);
if (!bio)
goto err;
}
@@ -297,12 +297,12 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
bvl = bio->bi_inline_vecs;
nr_iovecs = BIO_INLINE_VECS;
} else if (bs) {
- bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
+ bvl = bvec_alloc_bs(gfp, nr_iovecs, &idx, bs);
if (unlikely(!bvl))
goto err_free;
nr_iovecs = bvec_nr_vecs(idx);
} else {
- bvl = kmalloc(nr_iovecs * sizeof(struct bio_vec), gfp_mask);
+ bvl = kmalloc(nr_iovecs * sizeof(struct bio_vec), gfp);
if (unlikely(!bvl))
goto err_free;
}
@@ -329,18 +329,18 @@ static void bio_fs_destructor(struct bio *bio)
/**
* bio_alloc - allocate a new bio, memory pool backed
- * @gfp_mask: allocation mask to use
+ * @gfp: allocation mask to use
* @nr_iovecs: number of iovecs
*
- * Allocate a new bio with @nr_iovecs bvecs. If @gfp_mask
+ * Allocate a new bio with @nr_iovecs bvecs. If @gfp
* contains __GFP_WAIT, the allocation is guaranteed to succeed.
*
* RETURNS:
* Pointer to new bio on success, NULL on failure.
*/
-struct bio *bio_alloc(gfp_t gfp_mask, int nr_iovecs)
+struct bio *bio_alloc(gfp_t gfp, int nr_iovecs)
{
- struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
+ struct bio *bio = bio_alloc_bioset(gfp, nr_iovecs, fs_bio_set);
if (bio)
bio->bi_destructor = bio_fs_destructor;
@@ -359,19 +359,19 @@ static void bio_kmalloc_destructor(struct bio *bio)
/**
* bio_kmalloc - allocate a new bio
- * @gfp_mask: allocation mask to use
+ * @gfp: allocation mask to use
* @nr_iovecs: number of iovecs
*
* Similar to bio_alloc() but uses regular kmalloc for allocation
- * and can fail unless __GFP_NOFAIL is set in @gfp_mask. This is
+ * and can fail unless __GFP_NOFAIL is set in @gfp. This is
* useful for more permanant or over-sized bio allocations.
*
* RETURNS:
* Poitner to new bio on success, NULL on failure.
*/
-struct bio *bio_kmalloc(gfp_t gfp_mask, int nr_iovecs)
+struct bio *bio_kmalloc(gfp_t gfp, int nr_iovecs)
{
- struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, NULL);
+ struct bio *bio = bio_alloc_bioset(gfp, nr_iovecs, NULL);
if (bio)
bio->bi_destructor = bio_kmalloc_destructor;
@@ -453,13 +453,13 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
/**
* bio_clone - clone a bio
* @bio: bio to clone
- * @gfp_mask: allocation priority
+ * @gfp: allocation priority
*
* Like __bio_clone, only also allocates the returned bio
*/
-struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+struct bio *bio_clone(struct bio *bio, gfp_t gfp)
{
- struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, fs_bio_set);
+ struct bio *b = bio_alloc_bioset(gfp, bio->bi_max_vecs, fs_bio_set);
if (!b)
return NULL;
@@ -470,7 +470,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
if (bio_integrity(bio)) {
int ret;
- ret = bio_integrity_clone(b, bio, gfp_mask);
+ ret = bio_integrity_clone(b, bio, gfp);
if (ret < 0) {
bio_put(b);
@@ -653,56 +653,54 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
}
-struct bio_map_data {
- struct bio_vec *iovecs;
- struct iovec *sgvecs;
- int nr_sgvecs;
- int is_our_pages;
+struct bio_copy_info {
+ struct bio_vec *iovecs;
+ struct iovec *src_iov;
+ int src_count;
+ int is_our_pages;
};
-static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
- struct iovec *iov, int iov_count,
- int is_our_pages)
+static void bci_set(struct bio_copy_info *bci, struct bio *bio,
+ struct iovec *iov, int count, int is_our_pages)
{
- memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
- memcpy(bmd->sgvecs, iov, sizeof(struct iovec) * iov_count);
- bmd->nr_sgvecs = iov_count;
- bmd->is_our_pages = is_our_pages;
- bio->bi_private = bmd;
+ memcpy(bci->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
+ memcpy(bci->src_iov, iov, sizeof(struct iovec) * count);
+ bci->src_count = count;
+ bci->is_our_pages = is_our_pages;
+ bio->bi_private = bci;
}
-static void bio_free_map_data(struct bio_map_data *bmd)
+static void bci_free(struct bio_copy_info *bci)
{
- kfree(bmd->iovecs);
- kfree(bmd->sgvecs);
- kfree(bmd);
+ kfree(bci->iovecs);
+ kfree(bci->src_iov);
+ kfree(bci);
}
-static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
- gfp_t gfp_mask)
+static struct bio_copy_info *bci_alloc(int nr_segs, int count, gfp_t gfp)
{
- struct bio_map_data *bmd = kmalloc(sizeof(*bmd), gfp_mask);
+ struct bio_copy_info *bci = kmalloc(sizeof(*bci), gfp);
- if (!bmd)
+ if (!bci)
return NULL;
- bmd->iovecs = kmalloc(sizeof(struct bio_vec) * nr_segs, gfp_mask);
- if (!bmd->iovecs) {
- kfree(bmd);
+ bci->iovecs = kmalloc(sizeof(struct bio_vec) * nr_segs, gfp);
+ if (!bci->iovecs) {
+ kfree(bci);
return NULL;
}
- bmd->sgvecs = kmalloc(sizeof(struct iovec) * iov_count, gfp_mask);
- if (bmd->sgvecs)
- return bmd;
+ bci->src_iov = kmalloc(sizeof(struct iovec) * count, gfp);
+ if (bci->src_iov)
+ return bci;
- kfree(bmd->iovecs);
- kfree(bmd);
+ kfree(bci->iovecs);
+ kfree(bci);
return NULL;
}
static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct iovec *iov, int iov_count, int uncopy,
+ struct iovec *iov, int count, int uncopy,
int do_free_page)
{
int ret = 0, i;
@@ -715,7 +713,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
char *bv_addr = page_address(bvec->bv_page);
unsigned int bv_len = iovecs[i].bv_len;
- while (bv_len && iov_idx < iov_count) {
+ while (bv_len && iov_idx < count) {
unsigned int bytes;
char *iov_addr;
@@ -762,13 +760,13 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
*/
int bio_uncopy_user(struct bio *bio)
{
- struct bio_map_data *bmd = bio->bi_private;
+ struct bio_copy_info *bci = bio->bi_private;
int ret = 0;
if (!bio_flagged(bio, BIO_NULL_MAPPED))
- ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
- bmd->nr_sgvecs, 1, bmd->is_our_pages);
- bio_free_map_data(bmd);
+ ret = __bio_copy_iov(bio, bci->iovecs, bci->src_iov,
+ bci->src_count, 1, bci->is_our_pages);
+ bci_free(bci);
bio_put(bio);
return ret;
}
@@ -776,31 +774,29 @@ int bio_uncopy_user(struct bio *bio)
/**
* bio_copy_user_iov - copy user data to bio
* @q: destination block queue
- * @map_data: pointer to the rq_map_data holding pages (if necessary)
- * @iov: the iovec.
- * @iov_count: number of elements in the iovec
+ * @md: pointer to the rq_map_data holding pages (if necessary)
+ * @iov: the iovec.
+ * @count: number of elements in the iovec
* @rw: READ or WRITE
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
-struct bio *bio_copy_user_iov(struct request_queue *q,
- struct rq_map_data *map_data,
- struct iovec *iov, int iov_count, int rw,
- gfp_t gfp_mask)
+struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
+ struct iovec *iov, int count, int rw, gfp_t gfp)
{
- struct bio_map_data *bmd;
+ struct bio_copy_info *bci;
struct bio_vec *bvec;
struct page *page;
struct bio *bio;
int i, ret;
int nr_pages = 0;
unsigned int len = 0;
- unsigned int offset = map_data ? map_data->offset & ~PAGE_MASK : 0;
+ unsigned int offset = md ? md->offset & ~PAGE_MASK : 0;
- for (i = 0; i < iov_count; i++) {
+ for (i = 0; i < count; i++) {
unsigned long uaddr;
unsigned long end;
unsigned long start;
@@ -813,23 +809,23 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
len += iov[i].iov_len;
}
- bmd = bio_alloc_map_data(nr_pages, iov_count, gfp_mask);
- if (!bmd)
+ bci = bci_alloc(nr_pages, count, gfp);
+ if (!bci)
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
- bio = bio_kmalloc(gfp_mask, nr_pages);
+ bio = bio_kmalloc(gfp, nr_pages);
if (!bio)
- goto out_bmd;
+ goto out_bci;
if (rw == WRITE)
bio->bi_rw |= 1 << BIO_RW;
ret = 0;
- if (map_data) {
- nr_pages = 1 << map_data->page_order;
- i = map_data->offset / PAGE_SIZE;
+ if (md) {
+ nr_pages = 1 << md->page_order;
+ i = md->offset / PAGE_SIZE;
}
while (len) {
unsigned int bytes = PAGE_SIZE;
@@ -839,18 +835,18 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
if (bytes > len)
bytes = len;
- if (map_data) {
- if (i == map_data->nr_entries * nr_pages) {
+ if (md) {
+ if (i == md->nr_entries * nr_pages) {
ret = -ENOMEM;
break;
}
- page = map_data->pages[i / nr_pages];
+ page = md->pages[i / nr_pages];
page += (i % nr_pages);
i++;
} else {
- page = alloc_page(q->bounce_gfp | gfp_mask);
+ page = alloc_page(q->bounce_gfp | gfp);
if (!page) {
ret = -ENOMEM;
break;
@@ -870,56 +866,56 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
/*
* success
*/
- if (unlikely(map_data && map_data->null_mapped))
+ if (unlikely(md && md->null_mapped))
bio->bi_flags |= (1 << BIO_NULL_MAPPED);
else if (rw == WRITE) {
- ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 0);
+ ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, count, 0, 0);
if (ret)
goto cleanup;
}
- bio_set_map_data(bmd, bio, iov, iov_count, map_data ? 0 : 1);
+ bci_set(bci, bio, iov, count, md ? 0 : 1);
return bio;
cleanup:
- if (!map_data)
+ if (!md)
bio_for_each_segment(bvec, bio, i)
__free_page(bvec->bv_page);
bio_put(bio);
-out_bmd:
- bio_free_map_data(bmd);
+out_bci:
+ bci_free(bci);
return ERR_PTR(ret);
}
/**
* bio_copy_user - copy user data to bio
* @q: destination block queue
- * @map_data: pointer to the rq_map_data holding pages (if necessary)
+ * @md: pointer to the rq_map_data holding pages (if necessary)
* @uaddr: start of user address
* @len: length in bytes
* @rw: READ or WRITE
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
-struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
+struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp_mask)
+ gfp_t gfp)
{
struct iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_copy_user_iov(q, map_data, &iov, 1, rw, gfp_mask);
+ return bio_copy_user_iov(q, md, &iov, 1, rw, gfp);
}
static struct bio *__bio_map_user_iov(struct request_queue *q,
struct block_device *bdev,
- struct iovec *iov, int iov_count,
- int rw, gfp_t gfp_mask)
+ struct iovec *iov, int count, int rw,
+ gfp_t gfp)
{
int i, j;
size_t tot_len = 0;
@@ -929,7 +925,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
int cur_page = 0;
int ret, offset;
- for (i = 0; i < iov_count; i++) {
+ for (i = 0; i < count; i++) {
unsigned long uaddr = (unsigned long)iov[i].iov_base;
unsigned long len = iov[i].iov_len;
@@ -950,16 +946,16 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
if (!nr_pages || tot_len & q->dma_pad_mask)
return ERR_PTR(-EINVAL);
- bio = bio_kmalloc(gfp_mask, nr_pages);
+ bio = bio_kmalloc(gfp, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
- pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
+ pages = kcalloc(nr_pages, sizeof(struct page *), gfp);
if (!pages)
goto out;
- for (i = 0; i < iov_count; i++) {
+ for (i = 0; i < count; i++) {
unsigned long uaddr = (unsigned long)iov[i].iov_base;
unsigned long len = iov[i].iov_len;
unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1034,21 +1030,21 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
* @uaddr: start of user address
* @len: length in bytes
* @rw: READ or WRITE
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp_mask)
+ gfp_t gfp)
{
struct iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_map_user_iov(q, bdev, &iov, 1, rw, gfp_mask);
+ return bio_map_user_iov(q, bdev, &iov, 1, rw, gfp);
}
/**
@@ -1056,20 +1052,19 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
* @q: the struct request_queue for the bio
* @bdev: destination block device
* @iov: the iovec.
- * @iov_count: number of elements in the iovec
+ * @count: number of elements in the iovec
* @rw: READ or WRITE
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct iovec *iov, int iov_count, int rw,
- gfp_t gfp_mask)
+ struct iovec *iov, int count, int rw, gfp_t gfp)
{
struct bio *bio;
- bio = __bio_map_user_iov(q, bdev, iov, iov_count, rw, gfp_mask);
+ bio = __bio_map_user_iov(q, bdev, iov, count, rw, gfp);
if (IS_ERR(bio))
return bio;
@@ -1124,7 +1119,7 @@ static void bio_map_kern_endio(struct bio *bio, int err)
static struct bio *__bio_map_kern(struct request_queue *q, void *data,
- unsigned int len, gfp_t gfp_mask)
+ unsigned int len, gfp_t gfp)
{
unsigned long kaddr = (unsigned long)data;
unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1133,7 +1128,7 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
int offset, i;
struct bio *bio;
- bio = bio_kmalloc(gfp_mask, nr_pages);
+ bio = bio_kmalloc(gfp, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
@@ -1165,17 +1160,17 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
* @q: the struct request_queue for the bio
* @data: pointer to buffer to map
* @len: length in bytes
- * @gfp_mask: allocation flags for bio allocation
+ * @gfp: allocation flags for bio allocation
*
* Map the kernel address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp_mask)
+ gfp_t gfp)
{
struct bio *bio;
- bio = __bio_map_kern(q, data, len, gfp_mask);
+ bio = __bio_map_kern(q, data, len, gfp);
if (IS_ERR(bio))
return bio;
@@ -1193,13 +1188,13 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
{
struct bio_vec *bvec;
const int read = bio_data_dir(bio) == READ;
- struct bio_map_data *bmd = bio->bi_private;
+ struct bio_copy_info *bci = bio->bi_private;
int i;
- char *p = bmd->sgvecs[0].iov_base;
+ char *p = bci->src_iov[0].iov_base;
__bio_for_each_segment(bvec, bio, i, 0) {
char *addr = page_address(bvec->bv_page);
- int len = bmd->iovecs[i].bv_len;
+ int len = bci->iovecs[i].bv_len;
if (read && !err)
memcpy(p, addr, len);
@@ -1208,7 +1203,7 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
p += len;
}
- bio_free_map_data(bmd);
+ bci_free(bci);
bio_put(bio);
}
@@ -1217,20 +1212,20 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
* @q: the struct request_queue for the bio
* @data: pointer to buffer to copy
* @len: length in bytes
- * @gfp_mask: allocation flags for bio and page allocation
+ * @gfp: allocation flags for bio and page allocation
* @rw: READ or WRITE
*
* copy the kernel address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp_mask, int rw)
+ gfp_t gfp, int rw)
{
struct bio *bio;
struct bio_vec *bvec;
int i;
- bio = bio_copy_user(q, NULL, (unsigned long)data, len, READ, gfp_mask);
+ bio = bio_copy_user(q, NULL, (unsigned long)data, len, READ, gfp);
if (IS_ERR(bio))
return bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8215ded..1c21e59 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -365,9 +365,9 @@ void bio_pair_release(struct bio_pair *dbio);
struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad);
void bioset_free(struct bio_set *bs);
-struct bio *bio_alloc(gfp_t gfp_mask, int nr_iovecs);
-struct bio *bio_kmalloc(gfp_t gfp_mask, int nr_iovecs);
-struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs);
+struct bio *bio_alloc(gfp_t gfp, int nr_iovecs);
+struct bio *bio_kmalloc(gfp_t gfp, int nr_iovecs);
+struct bio *bio_alloc_bioset(gfp_t gfp, int nr_iovecs, struct bio_set *bs);
void bio_put(struct bio *bio);
void bio_free(struct bio *bio, struct bio_set *bs);
@@ -375,7 +375,7 @@ void bio_endio(struct bio *bio, int error);
int bio_phys_segments(struct request_queue *q, struct bio *bio);
void __bio_clone(struct bio *bio, struct bio *bio_src);
-struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask);
+struct bio *bio_clone(struct bio *bio, gfp_t gfp);
void bio_init(struct bio *bio);
@@ -388,27 +388,24 @@ sector_t bio_sector_offset(struct bio *bio, unsigned short index,
unsigned int offset);
struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp_mask);
+ gfp_t gfp);
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
- struct iovec *iov, int iov_count, int rw,
- gfp_t gfp_mask);
+ struct iovec *iov, int count, int rw, gfp_t gfp);
void bio_unmap_user(struct bio *bio);
-struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
+struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp_mask);
-struct bio *bio_copy_user_iov(struct request_queue *q,
- struct rq_map_data *map_data,
- struct iovec *iov, int iov_count, int rw,
- gfp_t gfp_mask);
+ gfp_t gfp);
+struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
+ struct iovec *iov, int count, int rw, gfp_t gfp);
int bio_uncopy_user(struct bio *bio);
struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp_mask);
+ gfp_t gfp);
struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp_mask, int rw);
+ gfp_t gfp, int rw);
void bio_set_pages_dirty(struct bio *bio);
void bio_check_pages_dirty(struct bio *bio);
void zero_fill_bio(struct bio *bio);
-struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
+struct bio_vec *bvec_alloc_bs(gfp_t gfp, int nr, unsigned long *idx,
struct bio_set *bs);
void bvec_free_bs(struct bio_set *bs, struct bio_vec *bv, unsigned int idx);
unsigned int bvec_nr_vecs(unsigned short idx);
@@ -530,7 +527,7 @@ static inline bool bio_integrity(struct bio *bio)
}
struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
- gfp_t gfp_mask, unsigned int nr_vecs);
+ gfp_t gfp, unsigned int nr_vecs);
void bio_integrity_free(struct bio *bio);
int bio_integrity_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset);
@@ -543,14 +540,14 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done);
void bio_integrity_trim(struct bio *bio, unsigned int offset,
unsigned int sectors);
void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors);
-int bio_integrity_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp_mask);
+int bio_integrity_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp);
#else /* CONFIG_BLK_DEV_INTEGRITY */
static inline bool bio_integrity(struct bio *bio)
{ return false; }
static inline struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
- gfp_t gfp_mask, unsigned int nr_vecs)
+ gfp_t gfp, unsigned int nr_vecs)
{ return NULL; }
static inline void bio_integrity_free(struct bio *bio)
{ }
@@ -580,7 +577,7 @@ static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
{ }
static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
- gfp_t gfp_mask)
+ gfp_t gfp)
{ return -EIO; }
#endif /* CONFIG_BLK_DEV_INTEGRITY */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d7bb20c..d04e118 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -782,8 +782,8 @@ extern int blk_rq_map_user(struct request_queue *, struct request *,
extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct rq_map_data *map_data, struct iovec *iov,
- int iov_count, unsigned int len, gfp_t gfp_mask);
+ struct rq_map_data *md, struct iovec *iov,
+ int count, unsigned int len, gfp_t gfp);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.6.0.2
Impact: API cleanup
blk_rq_map_user_iov() took a @len parameter which carries duplicate
information, as the total length is simply the sum of all iov
segments. It doesn't save anything either, as the mapping function
has to walk the whole iov on entry to check alignment anyway. Remove
the superfluous parameter.
Removing the parameter also removes the pathological corner case where
the caller passes in a @len shorter than @iov, the @iov mapping ends
up capped due to queue limits, and bio->bi_size happens to match @len,
resulting in a successful map. With the superfluous parameter gone,
blk-map/bio can now simply fail partial mappings.
Move partial mapping detection into bio_create_from_sgl(), which is
shared by all map/copy paths, and remove partial mapping handling from
all other places.
This change removes one of the two users of __blk_rq_unmap_user(),
which gets collapsed into blk_rq_unmap_user().
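The call-site change is mechanical; e.g. in sg_io() (taken from the
block/scsi_ioctl.c hunk below) the redundant byte count simply
disappears and the iovec alone determines the length:
	/* before */
	ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
				  iov_data_len, GFP_KERNEL);
	/* after */
	ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
				  GFP_KERNEL);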
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 47 +++++++++++++++--------------------------------
block/scsi_ioctl.c | 11 +++--------
drivers/scsi/sg.c | 2 +-
fs/bio.c | 43 +++++++++++++++++--------------------------
include/linux/blkdev.h | 2 +-
5 files changed, 37 insertions(+), 68 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index dc4097c..f60f439 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -8,20 +8,6 @@
#include "blk.h"
-static int __blk_rq_unmap_user(struct bio *bio)
-{
- int ret = 0;
-
- if (bio) {
- if (bio_flagged(bio, BIO_USER_MAPPED))
- bio_unmap_user(bio);
- else
- ret = bio_uncopy_user(bio);
- }
-
- return ret;
-}
-
/**
* blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* @q: request queue where request should be inserted
@@ -29,7 +15,6 @@ static int __blk_rq_unmap_user(struct bio *bio)
* @md: pointer to the rq_map_data holding pages (if necessary)
* @iov: pointer to the iovec
* @count: number of elements in the iovec
- * @len: I/O byte count
* @gfp: memory allocation flags
*
* Description:
@@ -47,7 +32,7 @@ static int __blk_rq_unmap_user(struct bio *bio)
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
struct rq_map_data *md, struct iovec *iov, int count,
- unsigned int len, gfp_t gfp)
+ gfp_t gfp)
{
struct bio *bio = ERR_PTR(-EINVAL);
int rw = rq_data_dir(rq);
@@ -62,23 +47,17 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
if (IS_ERR(bio))
return PTR_ERR(bio);
- if (bio->bi_size != len) {
- /*
- * Grab an extra reference to this bio, as bio_unmap_user()
- * expects to be able to drop it twice as it happens on the
- * normal IO completion path
- */
- bio_get(bio);
- bio_endio(bio, 0);
- __blk_rq_unmap_user(bio);
- return -EINVAL;
- }
-
if (!bio_flagged(bio, BIO_USER_MAPPED))
rq->cmd_flags |= REQ_COPY_USER;
- blk_queue_bounce(q, &bio);
+ /*
+ * Grab an extra reference to this bio, as bio_unmap_user()
+ * expects to be able to drop it twice as it happens on the
+ * normal IO completion path.
+ */
bio_get(bio);
+
+ blk_queue_bounce(q, &bio);
blk_rq_bio_prep(q, rq, bio);
rq->buffer = rq->data = NULL;
return 0;
@@ -116,7 +95,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
iov.iov_base = ubuf;
iov.iov_len = len;
- return blk_rq_map_user_iov(q, rq, md, &iov, 1, len, gfp);
+ return blk_rq_map_user_iov(q, rq, md, &iov, 1, gfp);
}
EXPORT_SYMBOL(blk_rq_map_user);
@@ -132,12 +111,16 @@ EXPORT_SYMBOL(blk_rq_map_user);
int blk_rq_unmap_user(struct bio *bio)
{
struct bio *mapped_bio = bio;
- int ret;
+ int ret = 0;
if (unlikely(bio_flagged(bio, BIO_BOUNCED)))
mapped_bio = bio->bi_private;
- ret = __blk_rq_unmap_user(mapped_bio);
+ if (bio_flagged(bio, BIO_USER_MAPPED))
+ bio_unmap_user(mapped_bio);
+ else
+ ret = bio_uncopy_user(mapped_bio);
+
bio_put(bio);
return ret;
}
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 73cfd91..fd538f8 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -288,7 +288,6 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
if (hdr->iovec_count) {
const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
- size_t iov_data_len;
struct iovec *iov;
iov = kmalloc(size, GFP_KERNEL);
@@ -304,15 +303,11 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
}
/* SG_IO howto says that the shorter of the two wins */
- iov_data_len = iov_length(iov, hdr->iovec_count);
- if (hdr->dxfer_len < iov_data_len) {
- hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
- hdr->dxfer_len);
- iov_data_len = hdr->dxfer_len;
- }
+ hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
+ hdr->dxfer_len);
ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
- iov_data_len, GFP_KERNEL);
+ GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b4ef2f8..5fcf436 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1673,7 +1673,7 @@ static int sg_start_req(Sg_request *srp, unsigned char *cmd)
if (iov_count)
res = blk_rq_map_user_iov(q, rq, md, hp->dxferp, iov_count,
- hp->dxfer_len, GFP_ATOMIC);
+ GFP_ATOMIC);
else
res = blk_rq_map_user(q, rq, md, hp->dxferp,
hp->dxfer_len, GFP_ATOMIC);
diff --git a/fs/bio.c b/fs/bio.c
index fe796dc..9466b05 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -956,14 +956,14 @@ static void bio_memcpy_sgl_sgl(struct scatterlist *dsgl, int dnents,
* @nr_pages: number of pages in @sgl
* @rw: the data direction of new bio
*
- * Populate @bio with the data area described by @sgl. Note that
- * the resulting bio might not contain the whole @sgl area. This
- * can be checked by testing bio->bi_size against total area
- * length.
+ * Populate @bio with the data area described by @sgl.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
*/
-static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
- struct scatterlist *sgl, int nents,
- int nr_pages, int rw)
+static int bio_init_from_sgl(struct bio *bio, struct request_queue *q,
+ struct scatterlist *sgl, int nents,
+ int nr_pages, int rw)
{
struct scatterlist *sg;
int i;
@@ -979,15 +979,18 @@ static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
while (len) {
size_t bytes = min_t(size_t, len, PAGE_SIZE - offset);
+ /* doesn't support partial mappings */
if (unlikely(bio_add_pc_page(q, bio, page,
bytes, offset) < bytes))
- break;
+ return -EINVAL;
offset = 0;
len -= bytes;
page = nth_page(page, 1);
}
}
+
+ return 0;
}
/**
@@ -1009,12 +1012,17 @@ static struct bio *bio_create_from_sgl(struct request_queue *q,
int nr_pages, int rw, int gfp)
{
struct bio *bio;
+ int ret;
bio = bio_kmalloc(gfp, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
- bio_init_from_sgl(bio, q, sgl, nents, nr_pages, rw);
+ ret = bio_init_from_sgl(bio, q, sgl, nents, nr_pages, rw);
+ if (ret) {
+ bio_put(bio);
+ return ERR_PTR(ret);
+ }
return bio;
}
@@ -1170,10 +1178,6 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
goto err_pages;
}
- /* release the pages we didn't map into the bio, if any */
- for (i = bio->bi_vcnt; i < nr_pages; i++)
- page_cache_release(pages[i]);
-
bio->bi_bdev = bdev;
bio->bi_flags |= (1 << BIO_USER_MAPPED);
@@ -1283,12 +1287,6 @@ struct bio *bio_map_kern_sg(struct request_queue *q, struct scatterlist *sgl,
if (IS_ERR(bio))
return bio;
- /* doesn't support partial mappings */
- if (bio->bi_size != tot_len) {
- bio_put(bio);
- return ERR_PTR(-EINVAL);
- }
-
bio->bi_end_io = bio_map_kern_endio;
return bio;
}
@@ -1343,17 +1341,10 @@ struct bio *bio_copy_kern_sg(struct request_queue *q, struct scatterlist *sgl,
goto err_bci;
}
- /* doesn't support partial mappings */
- ret= -EINVAL;
- if (bio->bi_size != bci->len)
- goto err_bio;
-
bio->bi_end_io = bio_copy_kern_endio;
bio->bi_private = bci;
return bio;
-err_bio:
- bio_put(bio);
err_bci:
bci_destroy(bci);
return ERR_PTR(ret);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 40bec76..6876466 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -777,7 +777,7 @@ extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
struct rq_map_data *md, struct iovec *iov,
- int count, unsigned int len, gfp_t gfp);
+ int count, gfp_t gfp);
extern int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
struct scatterlist *sgl, int nents, gfp_t gfp);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
--
1.6.0.2
Impact: cleanup, removal of unused / undesirable API
With recent changes, the following functions aren't used anymore.
* bio_{map|copy}_{user|kern}()
* blk_rq_append_bio()
The following functions aren't used outside of block layer.
* bio_add_pc_page()
* bio_un{map|copy}_user()
Kill the first group and unexport the second. As bio_add_pc_page() is
used only inside fs/bio.c, it's made static.
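For reference, a hedged sketch of how a hypothetical user of the removed
single-buffer helpers would switch to the sgl variants (nothing in the
tree needs this today, which is why the helpers can go):

	struct scatterlist sg;
	struct bio *bio;

	sg_init_one(&sg, data, len);
	bio = bio_map_kern_sg(q, &sg, 1, rw, gfp);	/* was bio_map_kern() */
	if (IS_ERR(bio))
		bio = bio_copy_kern_sg(q, &sg, 1, rw, gfp); /* was bio_copy_kern() */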
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 17 --------
fs/bio.c | 102 +----------------------------------------------
include/linux/bio.h | 12 ------
include/linux/blkdev.h | 6 ---
4 files changed, 3 insertions(+), 134 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index 0474c09..dc4097c 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -8,23 +8,6 @@
#include "blk.h"
-int blk_rq_append_bio(struct request_queue *q, struct request *rq,
- struct bio *bio)
-{
- if (!rq->bio)
- blk_rq_bio_prep(q, rq, bio);
- else if (!ll_back_merge_fn(q, rq, bio))
- return -EINVAL;
- else {
- rq->biotail->bi_next = bio;
- rq->biotail = bio;
-
- rq->data_len += bio->bi_size;
- }
- return 0;
-}
-EXPORT_SYMBOL(blk_rq_append_bio);
-
static int __blk_rq_unmap_user(struct bio *bio)
{
int ret = 0;
diff --git a/fs/bio.c b/fs/bio.c
index 9c921f9..fe796dc 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -627,8 +627,9 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
* smaller than PAGE_SIZE, so it is always possible to add a single
* page to an empty bio. This should only be used by REQ_PC bios.
*/
-int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page,
- unsigned int len, unsigned int offset)
+static int bio_add_pc_page(struct request_queue *q, struct bio *bio,
+ struct page *page, unsigned int len,
+ unsigned int offset)
{
return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
}
@@ -1085,31 +1086,6 @@ err_bci:
}
/**
- * bio_copy_user - copy user data to bio
- * @q: destination block queue
- * @md: pointer to the rq_map_data holding pages (if necessary)
- * @uaddr: start of user address
- * @len: length in bytes
- * @rw: READ or WRITE
- * @gfp: memory allocation flags
- *
- * Prepares and returns a bio for indirect user io, bouncing data
- * to/from kernel pages as necessary. Must be paired with
- * call bio_uncopy_user() on io completion.
- */
-struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
- unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp)
-{
- struct iovec iov;
-
- iov.iov_base = (void __user *)uaddr;
- iov.iov_len = len;
-
- return bio_copy_user_iov(q, md, &iov, 1, rw, gfp);
-}
-
-/**
* bio_map_user_iov - map user iovec table into bio
* @q: the struct request_queue for the bio
* @bdev: destination block device
@@ -1227,30 +1203,6 @@ err_sgl:
}
/**
- * bio_map_user - map user address into bio
- * @q: the struct request_queue for the bio
- * @bdev: destination block device
- * @uaddr: start of user address
- * @len: length in bytes
- * @rw: READ or WRITE
- * @gfp: memory allocation flags
- *
- * Map the user space address into a bio suitable for io to a block
- * device. Returns an error pointer in case of error.
- */
-struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
- unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp)
-{
- struct iovec iov;
-
- iov.iov_base = (void __user *)uaddr;
- iov.iov_len = len;
-
- return bio_map_user_iov(q, bdev, &iov, 1, rw, gfp);
-}
-
-/**
* bio_unmap_user - unmap a bio
* @bio: the bio being unmapped
*
@@ -1341,26 +1293,6 @@ struct bio *bio_map_kern_sg(struct request_queue *q, struct scatterlist *sgl,
return bio;
}
-/**
- * bio_map_kern - map kernel address into bio
- * @q: the struct request_queue for the bio
- * @data: pointer to buffer to map
- * @len: length in bytes
- * @gfp: allocation flags for bio allocation
- *
- * Map the kernel address into a bio suitable for io to a block
- * device. Returns an error pointer in case of error.
- */
-struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp)
-{
- struct scatterlist sg;
-
- sg_init_one(&sg, data, len);
-
- return bio_map_kern_sg(q, &sg, 1, READ, gfp);
-}
-
static void bio_copy_kern_endio(struct bio *bio, int err)
{
struct bio_copy_info *bci = bio->bi_private;
@@ -1427,27 +1359,6 @@ err_bci:
return ERR_PTR(ret);
}
-/**
- * bio_copy_kern - copy kernel address into bio
- * @q: the struct request_queue for the bio
- * @data: pointer to buffer to copy
- * @len: length in bytes
- * @gfp: allocation flags for bio and page allocation
- * @rw: READ or WRITE
- *
- * copy the kernel address into a bio suitable for io to a block
- * device. Returns an error pointer in case of error.
- */
-struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp, int rw)
-{
- struct scatterlist sg;
-
- sg_init_one(&sg, data, len);
-
- return bio_copy_kern_sg(q, &sg, 1, rw, gfp);
-}
-
/*
* bio_set_pages_dirty() and bio_check_pages_dirty() are support functions
* for performing direct-IO in BIOs.
@@ -1840,16 +1751,9 @@ EXPORT_SYMBOL(__bio_clone);
EXPORT_SYMBOL(bio_clone);
EXPORT_SYMBOL(bio_phys_segments);
EXPORT_SYMBOL(bio_add_page);
-EXPORT_SYMBOL(bio_add_pc_page);
EXPORT_SYMBOL(bio_get_nr_vecs);
-EXPORT_SYMBOL(bio_map_user);
-EXPORT_SYMBOL(bio_unmap_user);
-EXPORT_SYMBOL(bio_map_kern);
-EXPORT_SYMBOL(bio_copy_kern);
EXPORT_SYMBOL(bio_pair_release);
EXPORT_SYMBOL(bio_split);
-EXPORT_SYMBOL(bio_copy_user);
-EXPORT_SYMBOL(bio_uncopy_user);
EXPORT_SYMBOL(bioset_create);
EXPORT_SYMBOL(bioset_free);
EXPORT_SYMBOL(bio_alloc_bioset);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1c28c5c..e662da9 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -382,29 +382,17 @@ void bio_init(struct bio *bio);
int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
unsigned int offset);
-int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page,
- unsigned int len, unsigned int offset);
int bio_get_nr_vecs(struct block_device *bdev);
sector_t bio_sector_offset(struct bio *bio, unsigned short index,
unsigned int offset);
-struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
- unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp);
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
struct iovec *iov, int count, int rw, gfp_t gfp);
void bio_unmap_user(struct bio *bio);
-struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
- unsigned long uaddr, unsigned int len, int rw,
- gfp_t gfp);
struct bio *bio_copy_user_iov(struct request_queue *q, struct rq_map_data *md,
struct iovec *iov, int count, int rw, gfp_t gfp);
int bio_uncopy_user(struct bio *bio);
-struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp);
struct bio *bio_map_kern_sg(struct request_queue *q, struct scatterlist *sgl,
int nents, int rw, gfp_t gfp);
-struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
- gfp_t gfp, int rw);
struct bio *bio_copy_kern_sg(struct request_queue *q, struct scatterlist *sgl,
int nents, int rw, gfp_t gfp_mask);
void bio_set_pages_dirty(struct bio *bio);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 58b41da..40bec76 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -745,12 +745,6 @@ extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
struct scsi_ioctl_command __user *);
/*
- * Temporary export, until SCSI gets fixed up.
- */
-extern int blk_rq_append_bio(struct request_queue *q, struct request *rq,
- struct bio *bio);
-
-/*
* A queue has just exitted congestion. Note this in the global counter of
* congested queues, and wake up anyone who was waiting for requests to be
* put back.
--
1.6.0.2
Impact: new API
Implement blk_rq_map_kern_sg() using bio_{map|copy}_kern_sg() and
reimplement blk_rq_map_kern() in terms of it. As the bio helpers
already have all the necessary checks, all blk_rq_map_kern_sg() has
to do is wrap them and initialize the rq accordingly. The
implementation closely resembles blk_rq_map_user_iov().
This is an exported API and will be used to replace the hack in the
scsi ioctl implementation.
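As a hedged usage sketch (sdev, bufA/bufB and the lengths are assumptions
for illustration, not part of the patch):

	struct scatterlist sgl[2];
	struct request *rq;
	int err;

	sg_init_table(sgl, 2);
	sg_set_buf(&sgl[0], bufA, lenA);
	sg_set_buf(&sgl[1], bufB, lenB);

	rq = blk_get_request(sdev->request_queue, READ, GFP_KERNEL);
	rq->cmd_type = REQ_TYPE_BLOCK_PC;

	/* multi-segment kernel buffer mapped in one call */
	err = blk_rq_map_kern_sg(sdev->request_queue, rq, sgl, 2, GFP_KERNEL);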
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 54 ++++++++++++++++++++++++++++++-----------------
include/linux/blkdev.h | 2 +
2 files changed, 36 insertions(+), 20 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index eb206df..0474c09 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -161,47 +161,61 @@ int blk_rq_unmap_user(struct bio *bio)
EXPORT_SYMBOL(blk_rq_unmap_user);
/**
- * blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
+ * blk_rq_map_kern_sg - map kernel data to a request, for REQ_TYPE_BLOCK_PC
* @q: request queue where request should be inserted
* @rq: request to fill
- * @kbuf: the kernel buffer
- * @len: length of user data
+ * @sgl: area to map
+ * @nents: number of elements in @sgl
* @gfp: memory allocation flags
*
* Description:
* Data will be mapped directly if possible. Otherwise a bounce
* buffer is used.
*/
-int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
- unsigned int len, gfp_t gfp)
+int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
+ struct scatterlist *sgl, int nents, gfp_t gfp)
{
int rw = rq_data_dir(rq);
- int do_copy = 0;
struct bio *bio;
- if (len > (q->max_hw_sectors << 9))
- return -EINVAL;
- if (!len || !kbuf)
+ if (!sgl || nents <= 0)
return -EINVAL;
- do_copy = !blk_rq_aligned(q, kbuf, len) || object_is_on_stack(kbuf);
- if (do_copy)
- bio = bio_copy_kern(q, kbuf, len, gfp, rw);
- else
- bio = bio_map_kern(q, kbuf, len, gfp);
-
+ bio = bio_map_kern_sg(q, sgl, nents, rw, gfp);
+ if (IS_ERR(bio))
+ bio = bio_copy_kern_sg(q, sgl, nents, rw, gfp);
if (IS_ERR(bio))
return PTR_ERR(bio);
- if (rq_data_dir(rq) == WRITE)
- bio->bi_rw |= (1 << BIO_RW);
-
- if (do_copy)
+ if (!bio_flagged(bio, BIO_USER_MAPPED))
rq->cmd_flags |= REQ_COPY_USER;
+ blk_queue_bounce(q, &bio);
blk_rq_bio_prep(q, rq, bio);
- blk_queue_bounce(q, &rq->bio);
rq->buffer = rq->data = NULL;
return 0;
}
+EXPORT_SYMBOL(blk_rq_map_kern_sg);
+
+/**
+ * blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
+ * @q: request queue where request should be inserted
+ * @rq: request to fill
+ * @kbuf: the kernel buffer
+ * @len: length of user data
+ * @gfp: memory allocation flags
+ *
+ * Description:
+ * Data will be mapped directly if possible. Otherwise a bounce
+ * buffer is used.
+ */
+int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
+ unsigned int len, gfp_t gfp)
+{
+ struct scatterlist sg;
+
+ sg_init_one(&sg, kbuf, len);
+
+ return blk_rq_map_kern_sg(q, rq, &sg, 1, gfp);
+}
EXPORT_SYMBOL(blk_rq_map_kern);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d04e118..58b41da 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -784,6 +784,8 @@ extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, uns
extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
struct rq_map_data *md, struct iovec *iov,
int count, unsigned int len, gfp_t gfp);
+extern int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
+ struct scatterlist *sgl, int nents, gfp_t gfp);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.6.0.2
Impact: cleanup
bio_map_user_iov() is unique among the four operations (user-map,
user-copy, kern-map, kern-copy) in how it acquires the pages to map, so
the function still needs to get the pages itself, but the bio creation
can use bio_create_from_sgl(). Build an sgl of the mapped pages and use
it to create the bio. The code will be further simplified by future
changes to bio_create_from_sgl().
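A minimal sketch of the new shape, for a single iovec segment (error
unwinding and the multi-segment walk are omitted; the diff below is
authoritative and this helper name is made up):

	static struct bio *map_one_seg(struct request_queue *q, unsigned long uaddr,
				       unsigned long len, int rw, gfp_t gfp)
	{
		int nr_pages = PFN_UP(uaddr + len) - PFN_DOWN(uaddr);
		unsigned long offset = offset_in_page(uaddr);
		struct scatterlist *sgl;
		struct page **pages;
		struct bio *bio;
		int i, ret;

		sgl = kmalloc(nr_pages * sizeof(sgl[0]), gfp);
		pages = kcalloc(nr_pages, sizeof(pages[0]), gfp);
		if (!sgl || !pages)
			return ERR_PTR(-ENOMEM);	/* sketch: leaks on this path */
		sg_init_table(sgl, nr_pages);

		/* pin the user pages ... */
		ret = get_user_pages_fast(uaddr, nr_pages, rw == READ, pages);
		if (ret < nr_pages)
			return ERR_PTR(-EFAULT);	/* sketch: leaks on this path */

		/* ... describe them with an sgl ... */
		for (i = 0; i < nr_pages; i++) {
			unsigned int bytes = min_t(unsigned long,
						   PAGE_SIZE - offset, len);

			sg_set_page(&sgl[i], pages[i], bytes, offset);
			len -= bytes;
			offset = 0;
		}

		/* ... and let the shared helper build the bio */
		bio = bio_create_from_sgl(q, sgl, nr_pages, nr_pages, rw, gfp);
		kfree(sgl);
		kfree(pages);
		return bio;
	}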
Signed-off-by: Tejun Heo <[email protected]>
---
fs/bio.c | 79 ++++++++++++++++++++++++++------------------------------------
1 files changed, 33 insertions(+), 46 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index 4540afc..1ca8b16 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1059,13 +1059,13 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
struct iovec *iov, int count, int rw, gfp_t gfp)
{
- int i, j;
size_t tot_len = 0;
int nr_pages = 0;
+ int nents = 0;
+ struct scatterlist *sgl;
struct page **pages;
struct bio *bio;
- int cur_page = 0;
- int ret, offset;
+ int i, ret;
for (i = 0; i < count; i++) {
unsigned long uaddr = (unsigned long)iov[i].iov_base;
@@ -1088,70 +1088,57 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
if (!nr_pages || tot_len & q->dma_pad_mask)
return ERR_PTR(-EINVAL);
- bio = bio_kmalloc(gfp, nr_pages);
- if (!bio)
+ sgl = kmalloc(nr_pages * sizeof(sgl[0]), gfp);
+ if (!sgl)
return ERR_PTR(-ENOMEM);
+ sg_init_table(sgl, nr_pages);
ret = -ENOMEM;
- pages = kcalloc(nr_pages, sizeof(struct page *), gfp);
+ pages = kcalloc(nr_pages, sizeof(pages[0]), gfp);
if (!pages)
- goto out;
+ goto err_sgl;
for (i = 0; i < count; i++) {
unsigned long uaddr = (unsigned long)iov[i].iov_base;
unsigned long len = iov[i].iov_len;
- unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = uaddr >> PAGE_SHIFT;
- const int local_nr_pages = end - start;
- const int page_limit = cur_page + local_nr_pages;
-
+ unsigned long offset = offset_in_page(uaddr);
+ int local_nr_pages = PFN_UP(uaddr + len) - PFN_DOWN(uaddr);
+
ret = get_user_pages_fast(uaddr, local_nr_pages, rw == READ,
- &pages[cur_page]);
+ &pages[nents]);
if (ret < local_nr_pages) {
ret = -EFAULT;
- goto out_unmap;
+ goto err_pages;
}
- offset = uaddr & ~PAGE_MASK;
- for (j = cur_page; j < page_limit; j++) {
- unsigned int bytes = PAGE_SIZE - offset;
+ while (len) {
+ unsigned int bytes = min(PAGE_SIZE - offset, len);
- if (len <= 0)
- break;
-
- if (bytes > len)
- bytes = len;
-
- /*
- * sorry...
- */
- if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
- bytes)
- break;
+ sg_set_page(&sgl[nents], pages[nents], bytes, offset);
+ nents++;
len -= bytes;
offset = 0;
}
-
- cur_page = j;
- /*
- * release the pages we didn't map into the bio, if any
- */
- while (j < page_limit)
- page_cache_release(pages[j++]);
}
+ BUG_ON(nents != nr_pages);
- kfree(pages);
+ bio = bio_create_from_sgl(q, sgl, nents, nr_pages, rw, gfp);
+ if (IS_ERR(bio)) {
+ ret = PTR_ERR(bio);
+ goto err_pages;
+ }
- /*
- * set data direction, and check if mapped pages need bouncing
- */
- if (rw == WRITE)
- bio->bi_rw |= (1 << BIO_RW);
+ /* release the pages we didn't map into the bio, if any */
+ for (i = bio->bi_vcnt; i < nr_pages; i++)
+ page_cache_release(pages[i]);
bio->bi_bdev = bdev;
bio->bi_flags |= (1 << BIO_USER_MAPPED);
+ kfree(sgl);
+ kfree(pages);
+
/*
* subtle -- if __bio_map_user() ended up bouncing a bio,
* it would normally disappear when its bi_end_io is run.
@@ -1162,15 +1149,15 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
return bio;
- out_unmap:
+err_pages:
for (i = 0; i < nr_pages; i++) {
- if(!pages[i])
+ if (!pages[i])
break;
page_cache_release(pages[i]);
}
- out:
kfree(pages);
- bio_put(bio);
+err_sgl:
+ kfree(sgl);
return ERR_PTR(ret);
}
--
1.6.0.2
Impact: API cleanup
All initialized requests are guaranteed to have rq->q set. Taking
both @q and @rq parameters doesn't buy anything; it only adds the
pathological corner case where @q != rq->q. Kill the superfluous @q
parameter from blk_rq_map_{user|kern}*().
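For illustration, a hedged sketch of a caller after this change (sdev,
buffer and bufflen are assumptions; the point is only that the queue
argument is gone and rq->q is used internally):

	static int send_buf(struct scsi_device *sdev, void *buffer,
			    unsigned int bufflen)
	{
		struct request *req;
		int err;

		req = blk_get_request(sdev->request_queue, WRITE, __GFP_WAIT);
		if (!req)
			return -ENOMEM;

		err = blk_rq_map_kern(req, buffer, bufflen, __GFP_WAIT); /* no @q */
		if (err)
			blk_put_request(req);
		return err;
	}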
Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-map.c | 27 +++++++++++----------------
block/bsg.c | 5 ++---
block/scsi_ioctl.c | 6 +++---
drivers/block/pktcdvd.c | 2 +-
drivers/cdrom/cdrom.c | 2 +-
drivers/scsi/device_handler/scsi_dh_alua.c | 2 +-
drivers/scsi/device_handler/scsi_dh_emc.c | 2 +-
drivers/scsi/device_handler/scsi_dh_rdac.c | 2 +-
drivers/scsi/scsi_lib.c | 7 +++----
drivers/scsi/scsi_tgt_lib.c | 3 +--
drivers/scsi/sg.c | 6 +++---
drivers/scsi/st.c | 3 +--
include/linux/blkdev.h | 15 +++++++--------
13 files changed, 36 insertions(+), 46 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index f60f439..885d359 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -10,7 +10,6 @@
/**
* blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
- * @q: request queue where request should be inserted
* @rq: request to map data to
* @md: pointer to the rq_map_data holding pages (if necessary)
* @iov: pointer to the iovec
@@ -30,10 +29,10 @@
* original bio must be passed back in to blk_rq_unmap_user() for proper
* unmapping.
*/
-int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct rq_map_data *md, struct iovec *iov, int count,
- gfp_t gfp)
+int blk_rq_map_user_iov(struct request *rq, struct rq_map_data *md,
+ struct iovec *iov, int count, gfp_t gfp)
{
+ struct request_queue *q = rq->q;
struct bio *bio = ERR_PTR(-EINVAL);
int rw = rq_data_dir(rq);
@@ -66,7 +65,6 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);
/**
* blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage
- * @q: request queue where request should be inserted
* @rq: request structure to fill
* @md: pointer to the rq_map_data holding pages (if necessary)
* @ubuf: the user buffer
@@ -86,16 +84,15 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);
* original bio must be passed back in to blk_rq_unmap_user() for proper
* unmapping.
*/
-int blk_rq_map_user(struct request_queue *q, struct request *rq,
- struct rq_map_data *md, void __user *ubuf,
- unsigned long len, gfp_t gfp)
+int blk_rq_map_user(struct request *rq, struct rq_map_data *md,
+ void __user *ubuf, unsigned long len, gfp_t gfp)
{
struct iovec iov;
iov.iov_base = ubuf;
iov.iov_len = len;
- return blk_rq_map_user_iov(q, rq, md, &iov, 1, gfp);
+ return blk_rq_map_user_iov(rq, md, &iov, 1, gfp);
}
EXPORT_SYMBOL(blk_rq_map_user);
@@ -128,7 +125,6 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
/**
* blk_rq_map_kern_sg - map kernel data to a request, for REQ_TYPE_BLOCK_PC
- * @q: request queue where request should be inserted
* @rq: request to fill
* @sgl: area to map
* @nents: number of elements in @sgl
@@ -138,9 +134,10 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
* Data will be mapped directly if possible. Otherwise a bounce
* buffer is used.
*/
-int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
- struct scatterlist *sgl, int nents, gfp_t gfp)
+int blk_rq_map_kern_sg(struct request *rq, struct scatterlist *sgl, int nents,
+ gfp_t gfp)
{
+ struct request_queue *q = rq->q;
int rw = rq_data_dir(rq);
struct bio *bio;
@@ -165,7 +162,6 @@ EXPORT_SYMBOL(blk_rq_map_kern_sg);
/**
* blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
- * @q: request queue where request should be inserted
* @rq: request to fill
* @kbuf: the kernel buffer
* @len: length of user data
@@ -175,13 +171,12 @@ EXPORT_SYMBOL(blk_rq_map_kern_sg);
* Data will be mapped directly if possible. Otherwise a bounce
* buffer is used.
*/
-int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
- unsigned int len, gfp_t gfp)
+int blk_rq_map_kern(struct request *rq, void *kbuf, unsigned int len, gfp_t gfp)
{
struct scatterlist sg;
sg_init_one(&sg, kbuf, len);
- return blk_rq_map_kern_sg(q, rq, &sg, 1, gfp);
+ return blk_rq_map_kern_sg(rq, &sg, 1, gfp);
}
EXPORT_SYMBOL(blk_rq_map_kern);
diff --git a/block/bsg.c b/block/bsg.c
index 206060e..69c222a 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -283,7 +283,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm,
next_rq->cmd_type = rq->cmd_type;
dxferp = (void*)(unsigned long)hdr->din_xferp;
- ret = blk_rq_map_user(q, next_rq, NULL, dxferp,
+ ret = blk_rq_map_user(next_rq, NULL, dxferp,
hdr->din_xfer_len, GFP_KERNEL);
if (ret)
goto out;
@@ -299,8 +299,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm,
dxfer_len = 0;
if (dxfer_len) {
- ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
- GFP_KERNEL);
+ ret = blk_rq_map_user(rq, NULL, dxferp, dxfer_len, GFP_KERNEL);
if (ret)
goto out;
}
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index fd538f8..a98e4ec 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -306,11 +306,11 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
hdr->dxfer_len);
- ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
+ ret = blk_rq_map_user_iov(rq, NULL, iov, hdr->iovec_count,
GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
- ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
+ ret = blk_rq_map_user(rq, NULL, hdr->dxferp, hdr->dxfer_len,
GFP_KERNEL);
if (ret)
@@ -449,7 +449,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
break;
}
- if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) {
+ if (bytes && blk_rq_map_kern(rq, buffer, bytes, __GFP_WAIT)) {
err = DRIVER_ERROR << 24;
goto out;
}
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index dc7a8c3..a4e5e9b 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -771,7 +771,7 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
WRITE : READ, __GFP_WAIT);
if (cgc->buflen) {
- if (blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, __GFP_WAIT))
+ if (blk_rq_map_kern(rq, cgc->buffer, cgc->buflen, __GFP_WAIT))
goto out;
}
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index cceace6..ef67aec 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2112,7 +2112,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
break;
}
- ret = blk_rq_map_user(q, rq, NULL, ubuf, len, GFP_KERNEL);
+ ret = blk_rq_map_user(rq, NULL, ubuf, len, GFP_KERNEL);
if (ret) {
blk_put_request(rq);
break;
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index e356b43..f3cb900 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -101,7 +101,7 @@ static struct request *get_alua_req(struct scsi_device *sdev,
return NULL;
}
- if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
+ if (buflen && blk_rq_map_kern(rq, buffer, buflen, GFP_NOIO)) {
blk_put_request(rq);
sdev_printk(KERN_INFO, sdev,
"%s: blk_rq_map_kern failed\n", __func__);
diff --git a/drivers/scsi/device_handler/scsi_dh_emc.c b/drivers/scsi/device_handler/scsi_dh_emc.c
index 0e572d2..dbbd56d 100644
--- a/drivers/scsi/device_handler/scsi_dh_emc.c
+++ b/drivers/scsi/device_handler/scsi_dh_emc.c
@@ -308,7 +308,7 @@ static struct request *get_req(struct scsi_device *sdev, int cmd,
rq->timeout = CLARIION_TIMEOUT;
rq->retries = CLARIION_RETRIES;
- if (blk_rq_map_kern(rq->q, rq, buffer, len, GFP_NOIO)) {
+ if (blk_rq_map_kern(rq, buffer, len, GFP_NOIO)) {
blk_put_request(rq);
return NULL;
}
diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 5366476..f50b33a 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -219,7 +219,7 @@ static struct request *get_rdac_req(struct scsi_device *sdev,
return NULL;
}
- if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
+ if (buflen && blk_rq_map_kern(rq, buffer, buflen, GFP_NOIO)) {
blk_put_request(rq);
sdev_printk(KERN_INFO, sdev,
"get_rdac_req: blk_rq_map_kern failed.\n");
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3fa5589..66c3d0b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -216,8 +216,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
- if (bufflen && blk_rq_map_kern(sdev->request_queue, req,
- buffer, bufflen, __GFP_WAIT))
+ if (bufflen && blk_rq_map_kern(req, buffer, bufflen, __GFP_WAIT))
goto out;
req->cmd_len = COMMAND_SIZE(cmd[0]);
@@ -332,9 +331,9 @@ int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd,
req->cmd_flags |= REQ_QUIET;
if (use_sg)
- err = blk_rq_map_kern_sg(req->q, req, buffer, use_sg, gfp);
+ err = blk_rq_map_kern_sg(req, buffer, use_sg, gfp);
else if (bufflen)
- err = blk_rq_map_kern(req->q, req, buffer, bufflen, gfp);
+ err = blk_rq_map_kern(req, buffer, bufflen, gfp);
if (err)
goto free_req;
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 48ba413..55beba4 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -357,12 +357,11 @@ static int scsi_tgt_transfer_response(struct scsi_cmnd *cmd)
static int scsi_map_user_pages(struct scsi_tgt_cmd *tcmd, struct scsi_cmnd *cmd,
unsigned long uaddr, unsigned int len, int rw)
{
- struct request_queue *q = cmd->request->q;
struct request *rq = cmd->request;
int err;
dprintk("%lx %u\n", uaddr, len);
- err = blk_rq_map_user(q, rq, NULL, (void *)uaddr, len, GFP_KERNEL);
+ err = blk_rq_map_user(rq, NULL, (void *)uaddr, len, GFP_KERNEL);
if (err) {
/*
* TODO: need to fixup sg_tablesize, max_segment_size,
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5fcf436..a769041 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1672,11 +1672,11 @@ static int sg_start_req(Sg_request *srp, unsigned char *cmd)
}
if (iov_count)
- res = blk_rq_map_user_iov(q, rq, md, hp->dxferp, iov_count,
+ res = blk_rq_map_user_iov(rq, md, hp->dxferp, iov_count,
GFP_ATOMIC);
else
- res = blk_rq_map_user(q, rq, md, hp->dxferp,
- hp->dxfer_len, GFP_ATOMIC);
+ res = blk_rq_map_user(rq, md, hp->dxferp, hp->dxfer_len,
+ GFP_ATOMIC);
if (!res) {
srp->bio = rq->bio;
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index c6f19ee..c4615bb 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -492,8 +492,7 @@ static int st_scsi_execute(struct st_request *SRpnt, const unsigned char *cmd,
mdata->null_mapped = 1;
if (bufflen) {
- err = blk_rq_map_user(req->q, req, mdata, NULL, bufflen,
- GFP_KERNEL);
+ err = blk_rq_map_user(req, mdata, NULL, bufflen, GFP_KERNEL);
if (err) {
blk_put_request(req);
return DRIVER_ERROR << 24;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6876466..ff0ad9e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -770,15 +770,14 @@ extern void __blk_stop_queue(struct request_queue *q);
extern void __blk_run_queue(struct request_queue *);
extern void blk_run_queue(struct request_queue *);
extern void blk_start_queueing(struct request_queue *);
-extern int blk_rq_map_user(struct request_queue *, struct request *,
- struct rq_map_data *, void __user *, unsigned long,
- gfp_t);
+extern int blk_rq_map_user(struct request *rq, struct rq_map_data *md,
+ void __user *ubuf, unsigned long len, gfp_t gfp);
extern int blk_rq_unmap_user(struct bio *);
-extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
-extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct rq_map_data *md, struct iovec *iov,
- int count, gfp_t gfp);
-extern int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
+extern int blk_rq_map_kern(struct request *rq, void *kbuf, unsigned int len,
+ gfp_t gfp);
+extern int blk_rq_map_user_iov(struct request *rq, struct rq_map_data *md,
+ struct iovec *iov, int count, gfp_t gfp);
+extern int blk_rq_map_kern_sg(struct request *rq,
struct scatterlist *sgl, int nents, gfp_t gfp);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
--
1.6.0.2
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: cleanup, removal of unused / undesirable API
>
> With recent changes, the following functions aren't used anymore.
>
> * bio_{map|copy}_{user|kern}()
> * blk_rq_append_bio()
>
This function is used by drivers/scsi/osd/osd_initiator.c, currently
in mainline. Please use allmodconfig to compile everything.
> The following functions aren't used outside of block layer.
>
> * bio_add_pc_page()
And also this
> * bio_un{map|copy}_user()
>
> Kill the first group and unexport the second. As bio_add_pc_page() is
> used only inside fs/bio.c, it's made static.
>
> Signed-off-by: Tejun Heo <[email protected]>
I've posted proposals on how to fix osd_initiator.c, which involve
making blk_map_kern() append to the request, plus a new
blk_make_request(bio).
Any suggestions are welcome.
Thanks
Boaz
Hello,
Boaz Harrosh wrote:
> On 04/01/2009 04:44 PM, Tejun Heo wrote:
>> Impact: cleanup, removal of unused / undesirable API
>>
>> With recent changes, the following functions aren't used anymore.
>>
>> * bio_{map|copy}_{user|kern}()
>> * blk_rq_append_bio()
>>
>
> This function is used by drivers/scsi/osd/osd_initiator.c
> Currently in mainline. Please use allmodconfig to compile
> everything.
The patchset is against Jens's tree, which hasn't received those
changes yet, and I did use allmodconfig.
>> The following functions aren't used outside of block layer.
>>
>> * bio_add_pc_page()
>
> And also this
>
>> * bio_un{map|copy}_user()
>>
>> Kill the first group and unexport the second. As bio_add_pc_page() is
>> used only inside fs/bio.c, it's made static.
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>
> I've posted propositions on how to fix osd_initiator.c which
> involves making blk_map_kern() append to the request, and a new
> blk_make_request(bio).
Yeap, I'll take a look at that.
Thanks.
--
tejun
Tejun Heo wrote:
> Hello, all.
>
> These patches are available in the following git tree but it's on top
> of the previous blk-map-related-fixes patchset which needs some
> updating, so this posting is just for review and comments. This
> patchset needs to spend quite some time in RCs even if it gets acked
> eventually, so definitely no .30 material. Please also note that I
> haven't updated the comment about bio chaining violating queue limit,
> so please ignore that part.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-map
> http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=blk-map
Oops, forgot to cc linux-scsi. Cc'ing linux-scsi and James. The
thread is here (gmane hasn't picked it up yet):
http://lkml.org/lkml/2009/4/1/228
Thanks.
--
tejun
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: cleanup
>
> blk-map and bio use sg_iovec for addr-len segments although there
> isn't anything sg-specific about the API. This is mostly due to
> historical reasons. sg_iovec is by definition identical to iovec.
> Use iovec instead. This removes bogus dependency on scsi sg and will
> allow use of iovec helpers.
>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> block/blk-map.c | 5 ++---
> block/scsi_ioctl.c | 8 +++-----
> fs/bio.c | 23 +++++++++++------------
> include/linux/bio.h | 6 +++---
> include/linux/blkdev.h | 8 ++++----
> 5 files changed, 23 insertions(+), 27 deletions(-)
>
OK. The one actual user in sg.c passes a void *, so no casts are
needed. (I couldn't find where the type-casts of the old users were.)
Should we make this part of a bigger cleanup that removes sg_iovec
from the kernel altogether and only keeps a #define for user-mode?
BTW:
the user-mode scsi/sg.h does not come from the kernel's exported
headers; it comes with the gcc distribution.
If we remove it altogether it will not affect anybody.
If you want, I can help with this little chore.
Boaz
> diff --git a/block/blk-map.c b/block/blk-map.c
> index 29aa60d..4f0221a 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -5,7 +5,6 @@
> #include <linux/module.h>
> #include <linux/bio.h>
> #include <linux/blkdev.h>
> -#include <scsi/sg.h> /* for struct sg_iovec */
>
> #include "blk.h"
>
> @@ -64,7 +63,7 @@ static int __blk_rq_unmap_user(struct bio *bio)
> * unmapping.
> */
> int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> - struct rq_map_data *map_data, struct sg_iovec *iov,
> + struct rq_map_data *map_data, struct iovec *iov,
> int iov_count, unsigned int len, gfp_t gfp_mask)
> {
> struct bio *bio = ERR_PTR(-EINVAL);
> @@ -130,7 +129,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
> struct rq_map_data *map_data, void __user *ubuf,
> unsigned long len, gfp_t gfp_mask)
> {
> - struct sg_iovec iov;
> + struct iovec iov;
>
> iov.iov_base = ubuf;
> iov.iov_len = len;
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index c8e8868..73cfd91 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -289,7 +289,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> if (hdr->iovec_count) {
> const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
> size_t iov_data_len;
> - struct sg_iovec *iov;
> + struct iovec *iov;
>
> iov = kmalloc(size, GFP_KERNEL);
> if (!iov) {
> @@ -304,11 +304,9 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> }
>
> /* SG_IO howto says that the shorter of the two wins */
> - iov_data_len = iov_length((struct iovec *)iov,
> - hdr->iovec_count);
> + iov_data_len = iov_length(iov, hdr->iovec_count);
> if (hdr->dxfer_len < iov_data_len) {
> - hdr->iovec_count = iov_shorten((struct iovec *)iov,
> - hdr->iovec_count,
> + hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
> hdr->dxfer_len);
> iov_data_len = hdr->dxfer_len;
> }
> diff --git a/fs/bio.c b/fs/bio.c
> index 70e5153..9d13f21 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -28,7 +28,6 @@
> #include <linux/blktrace_api.h>
> #include <linux/pfn.h>
> #include <trace/block.h>
> -#include <scsi/sg.h> /* for struct sg_iovec */
>
> DEFINE_TRACE(block_split);
>
> @@ -656,17 +655,17 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
>
> struct bio_map_data {
> struct bio_vec *iovecs;
> - struct sg_iovec *sgvecs;
> + struct iovec *sgvecs;
> int nr_sgvecs;
> int is_our_pages;
> };
>
> static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
> - struct sg_iovec *iov, int iov_count,
> + struct iovec *iov, int iov_count,
> int is_our_pages)
> {
> memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
> - memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
> + memcpy(bmd->sgvecs, iov, sizeof(struct iovec) * iov_count);
> bmd->nr_sgvecs = iov_count;
> bmd->is_our_pages = is_our_pages;
> bio->bi_private = bmd;
> @@ -693,7 +692,7 @@ static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
> return NULL;
> }
>
> - bmd->sgvecs = kmalloc(sizeof(struct sg_iovec) * iov_count, gfp_mask);
> + bmd->sgvecs = kmalloc(sizeof(struct iovec) * iov_count, gfp_mask);
> if (bmd->sgvecs)
> return bmd;
>
> @@ -703,7 +702,7 @@ static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
> }
>
> static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
> - struct sg_iovec *iov, int iov_count, int uncopy,
> + struct iovec *iov, int iov_count, int uncopy,
> int do_free_page)
> {
> int ret = 0, i;
> @@ -789,7 +788,7 @@ int bio_uncopy_user(struct bio *bio)
> */
> struct bio *bio_copy_user_iov(struct request_queue *q,
> struct rq_map_data *map_data,
> - struct sg_iovec *iov, int iov_count, int rw,
> + struct iovec *iov, int iov_count, int rw,
> gfp_t gfp_mask)
> {
> struct bio_map_data *bmd;
> @@ -909,7 +908,7 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
> unsigned long uaddr, unsigned int len, int rw,
> gfp_t gfp_mask)
> {
> - struct sg_iovec iov;
> + struct iovec iov;
>
> iov.iov_base = (void __user *)uaddr;
> iov.iov_len = len;
> @@ -919,7 +918,7 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
>
> static struct bio *__bio_map_user_iov(struct request_queue *q,
> struct block_device *bdev,
> - struct sg_iovec *iov, int iov_count,
> + struct iovec *iov, int iov_count,
> int rw, gfp_t gfp_mask)
> {
> int i, j;
> @@ -1044,7 +1043,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> unsigned long uaddr, unsigned int len, int rw,
> gfp_t gfp_mask)
> {
> - struct sg_iovec iov;
> + struct iovec iov;
>
> iov.iov_base = (void __user *)uaddr;
> iov.iov_len = len;
> @@ -1053,7 +1052,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> }
>
> /**
> - * bio_map_user_iov - map user sg_iovec table into bio
> + * bio_map_user_iov - map user iovec table into bio
> * @q: the struct request_queue for the bio
> * @bdev: destination block device
> * @iov: the iovec.
> @@ -1065,7 +1064,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> * device. Returns an error pointer in case of error.
> */
> struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> - struct sg_iovec *iov, int iov_count, int rw,
> + struct iovec *iov, int iov_count, int rw,
> gfp_t gfp_mask)
> {
> struct bio *bio;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 45f56d2..8215ded 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -23,6 +23,7 @@
> #include <linux/highmem.h>
> #include <linux/mempool.h>
> #include <linux/ioprio.h>
> +#include <linux/uio.h>
>
> #ifdef CONFIG_BLOCK
>
> @@ -356,7 +357,6 @@ struct bio_pair {
> };
>
> struct request_queue;
> -struct sg_iovec;
> struct rq_map_data;
>
> struct bio_pair *bio_split(struct bio *bi, int first_sectors);
> @@ -390,7 +390,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> unsigned long uaddr, unsigned int len, int rw,
> gfp_t gfp_mask);
> struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> - struct sg_iovec *iov, int iov_count, int rw,
> + struct iovec *iov, int iov_count, int rw,
> gfp_t gfp_mask);
> void bio_unmap_user(struct bio *bio);
> struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
> @@ -398,7 +398,7 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
> gfp_t gfp_mask);
> struct bio *bio_copy_user_iov(struct request_queue *q,
> struct rq_map_data *map_data,
> - struct sg_iovec *iov, int iov_count, int rw,
> + struct iovec *iov, int iov_count, int rw,
> gfp_t gfp_mask);
> int bio_uncopy_user(struct bio *bio);
> struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 465d6ba..d7bb20c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -19,6 +19,7 @@
> #include <linux/gfp.h>
> #include <linux/bsg.h>
> #include <linux/smp.h>
> +#include <linux/uio.h>
>
> #include <asm/scatterlist.h>
>
> @@ -29,7 +30,6 @@ struct elevator_queue;
> struct request_pm_state;
> struct blk_trace;
> struct request;
> -struct sg_io_hdr;
>
> #define BLKDEV_MIN_RQ 4
> #define BLKDEV_MAX_RQ 128 /* Default maximum */
> @@ -781,9 +781,9 @@ extern int blk_rq_map_user(struct request_queue *, struct request *,
> gfp_t);
> extern int blk_rq_unmap_user(struct bio *);
> extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
> -extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
> - struct rq_map_data *, struct sg_iovec *, int,
> - unsigned int, gfp_t);
> +extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> + struct rq_map_data *map_data, struct iovec *iov,
> + int iov_count, unsigned int len, gfp_t gfp_mask);
> extern int blk_execute_rq(struct request_queue *, struct gendisk *,
> struct request *, int);
> extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
Hello,
Boaz Harrosh wrote:
> On 04/01/2009 04:44 PM, Tejun Heo wrote:
>> Impact: cleanup
>>
>> blk-map and bio use sg_iovec for addr-len segments although there
>> isn't anything sg-specific about the API. This is mostly due to
>> historical reasons. sg_iovec is by definition identical to iovec.
>> Use iovec instead. This removes bogus dependency on scsi sg and will
>> allow use of iovec helpers.
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>> ---
>> block/blk-map.c | 5 ++---
>> block/scsi_ioctl.c | 8 +++-----
>> fs/bio.c | 23 +++++++++++------------
>> include/linux/bio.h | 6 +++---
>> include/linux/blkdev.h | 8 ++++----
>> 5 files changed, 23 insertions(+), 27 deletions(-)
>>
>
> OK, The actual one user in sg.c passes a void*, so no casts are
> needed. (I couldn't find where are the type-casts of old users)
>
> Should we make this a part of a bigger cleanup that removes
> sg_iovec, from Kernel altogether and only makes a #define for
> user-mode?
>
> BTW:
> user-mode scsi/sg.h does not come from the Kernels exported
> headers. It comes with the gcc distribution.
> If we remove it alltogether it will not affect anybody.
>
> If you want I can help with this little chore?
Sure, that would be a nice cleanup. If dropping sg_iovec doesn't
affect userland, I think it would be better to just drop it.
Thanks.
--
tejun
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: more modular implementation
>
> Break down bio_copy_user_iov() into the following steps.
>
> 1. bci and page allocation
> 2. copying data if WRITE
> 3. create bio accordingly
>
> bci is now responsible for managing any copy related resources. Given
> source iov, bci_create() allocates bci and fills it with enough pages
> to cover the source iov. The allocated pages are described with a
> sgl.
>
> Note that new allocator always rounds up rq_map_data->offset to page
> boundary to simplify implementation and guarantee enough DMA padding
> area at the end. As the only user, scsi sg, always passes in zero
> offset, this doesn't cause any actual behavior difference. Also,
> nth_page() is used to walk to the next page rather than directly
> adding to struct page *.
>
> Copying back and forth is done using bio_memcpy_sgl_uiov() which is
> implemented using sg mapping iterator and iov iterator.
>
> The last step is done using bio_create_from_sgl().
>
> This patch by itself adds one more level of indirection via sgl and
> more code but components factored out here will be used for future
> code refactoring.
>
> Signed-off-by: Tejun Heo <[email protected]>
Hi dear Tejun
I've looked hard and deep into your patchset, and I would like to
suggest an improvement.
[Option 1]
What your code actually uses from the sgl code base is:
for_each_sg
sg_mapping_iter and its
sg_miter_start, sg_miter_next
... (what else)
I would like it if you could define the above for bvec(s) just the way
you like them. Then the code works directly on the destination bvec
inside the final bio. One less copy, no intermediate allocation, and no
kmalloc of bigger-than-page buffers.
These are all small inlines; duplicating them will not affect kernel
size at all. You are not using the chaining ability of sgl(s), so it
can be simplified. You will see that not having the intermediate copy
simplifies the code even more.
Since no outside user currently needs sgl(s), no functionality is lost.
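A rough sketch of the kind of bvec iterator meant here (the names and
layout are made up for illustration; no such helpers exist in the tree):

	struct bvec_miter {
		struct bio_vec	*bvec;		/* next bvec to visit */
		unsigned int	nr;		/* remaining bvecs */
		void		*addr;		/* kmapped address of current page */
		size_t		length;		/* bytes valid at @addr */
	};

	static void bvec_miter_start(struct bvec_miter *m, struct bio_vec *bvec,
				     unsigned int nr)
	{
		m->bvec = bvec;
		m->nr = nr;
		m->addr = NULL;
		m->length = 0;
	}

	static bool bvec_miter_next(struct bvec_miter *m)
	{
		if (m->addr) {
			kunmap_atomic(m->addr, KM_USER0);
			m->addr = NULL;
		}
		if (!m->nr)
			return false;
		/* map the next page and expose only the valid byte range */
		m->addr = kmap_atomic(m->bvec->bv_page, KM_USER0) +
			  m->bvec->bv_offset;
		m->length = m->bvec->bv_len;
		m->bvec++;
		m->nr--;
		return true;
	}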
[Option 2]
Keep a pointer to an sgl rather than a bvec in the bio; again the code
works on the final destination. Later users of the block layer that call
blk_rq_fill_sgl (blk_rq_map_sg) will then just get a copy of the pointer,
so another allocation and copy is saved.
This option spills outside the scope of the current patches, into bvec
hacking code.
I do like your long-term vision of separating the DMA part from the
virtual part of scatterlists. Note how they are actually two disjoint
lists altogether. After dma_map does its thing, the DMA physical list
might be shorter than the virtual one and the sizes might not correspond
at all. The DMA mapping code regards the DMA part as an empty list that
gets appended to while processing; any match between segments is
accidental. (That is: inside the scatterlist the virtual address most
probably does not match the DMA address.)
So [option 1] matches that vision more closely.
Historically the code was doing
Many-sources => scatterlist => biovec => scatterlist => dma-scatterlist
Only as of 2.6.30 can we say that we dropped one step:
Many-sources => biovec => scatterlist => dma-scatterlist
Now you want to bring the extra step back; I hate it.
[Option 2] could make the chain even shorter:
Many-sources => scatterlist => dma-scatterlist
Please consider [option 1]: it will only add some source code, but it
will not increase code size (it may even decrease it), and it will be
fast.
Please consider that this code path is used by me, in exofs and
pNFS-objects, in a very, very hot path where memory pressure is a
common scenario.
And I have one more question.
Are you sure kmallocs of bigger-than-page buffers are safe? As I
understand it, that tries to allocate physically contiguous pages, which
degrades as time passes, and the last time I tried this with a
kmem_cache (due to a bug) it crashed the kernel randomly after 2 minutes
of use.
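For reference, the allocation in question from bio_map_user_iov() in this
series is the one below; assuming a 4K PAGE_SIZE, 64-bit and
!CONFIG_DEBUG_SG (so sizeof(struct scatterlist) == 32, an assumption of
this note), nr_pages above 128 already makes it a multi-page kmalloc:

	sgl = kmalloc(nr_pages * sizeof(sgl[0]), gfp);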
Thanks
Boaz
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: cleanup
>
> As bio_map_user_iov() is unique in the way it acquires pages to map
> from all other three operations (user-copy, kern-map, kern-copy), the
> function still needs to get the pages itself but the bio creation can
> use bio_create_from_sgl(). Create sgl of mapped pages and use it to
> create bio from it. The code will be further simplified with future
> changes to bio_create_from_sgl().
>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> fs/bio.c | 79 ++++++++++++++++++++++++++------------------------------------
> 1 files changed, 33 insertions(+), 46 deletions(-)
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 4540afc..1ca8b16 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1059,13 +1059,13 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *md,
> struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> struct iovec *iov, int count, int rw, gfp_t gfp)
> {
> - int i, j;
> size_t tot_len = 0;
> int nr_pages = 0;
> + int nents = 0;
> + struct scatterlist *sgl;
> struct page **pages;
> struct bio *bio;
> - int cur_page = 0;
> - int ret, offset;
> + int i, ret;
>
> for (i = 0; i < count; i++) {
> unsigned long uaddr = (unsigned long)iov[i].iov_base;
> @@ -1088,70 +1088,57 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> if (!nr_pages || tot_len & q->dma_pad_mask)
> return ERR_PTR(-EINVAL);
>
> - bio = bio_kmalloc(gfp, nr_pages);
> - if (!bio)
> + sgl = kmalloc(nr_pages * sizeof(sgl[0]), gfp);
> + if (!sgl)
> return ERR_PTR(-ENOMEM);
> + sg_init_table(sgl, nr_pages);
>
> ret = -ENOMEM;
> - pages = kcalloc(nr_pages, sizeof(struct page *), gfp);
> + pages = kcalloc(nr_pages, sizeof(pages[0]), gfp);
> if (!pages)
> - goto out;
> + goto err_sgl;
>
> for (i = 0; i < count; i++) {
> unsigned long uaddr = (unsigned long)iov[i].iov_base;
> unsigned long len = iov[i].iov_len;
> - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> - unsigned long start = uaddr >> PAGE_SHIFT;
> - const int local_nr_pages = end - start;
> - const int page_limit = cur_page + local_nr_pages;
> -
> + unsigned long offset = offset_in_page(uaddr);
> + int local_nr_pages = PFN_UP(uaddr + len) - PFN_DOWN(uaddr);
> +
> ret = get_user_pages_fast(uaddr, local_nr_pages, rw == READ,
> - &pages[cur_page]);
> + &pages[nents]);
A quick look at all users of get_user_pages_fast(): this could be
converted to take an sglist instead of a pages* array. Outside of this
code, all users either have a fixed-size allocated array or a
hard-coded single page. This here is the main user (out of 5).
Note to self:
> if (ret < local_nr_pages) {
> ret = -EFAULT;
> - goto out_unmap;
> + goto err_pages;
> }
>
> - offset = uaddr & ~PAGE_MASK;
> - for (j = cur_page; j < page_limit; j++) {
> - unsigned int bytes = PAGE_SIZE - offset;
> + while (len) {
> + unsigned int bytes = min(PAGE_SIZE - offset, len);
>
> - if (len <= 0)
> - break;
> -
> - if (bytes > len)
> - bytes = len;
> -
> - /*
> - * sorry...
> - */
> - if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
> - bytes)
> - break;
> + sg_set_page(&sgl[nents], pages[nents], bytes, offset);
>
> + nents++;
> len -= bytes;
> offset = 0;
> }
> -
> - cur_page = j;
> - /*
> - * release the pages we didn't map into the bio, if any
> - */
> - while (j < page_limit)
> - page_cache_release(pages[j++]);
> }
> + BUG_ON(nents != nr_pages);
>
> - kfree(pages);
> + bio = bio_create_from_sgl(q, sgl, nents, nr_pages, rw, gfp);
> + if (IS_ERR(bio)) {
> + ret = PTR_ERR(bio);
> + goto err_pages;
> + }
>
> - /*
> - * set data direction, and check if mapped pages need bouncing
> - */
> - if (rw == WRITE)
> - bio->bi_rw |= (1 << BIO_RW);
> + /* release the pages we didn't map into the bio, if any */
> + for (i = bio->bi_vcnt; i < nr_pages; i++)
> + page_cache_release(pages[i]);
>
> bio->bi_bdev = bdev;
> bio->bi_flags |= (1 << BIO_USER_MAPPED);
>
> + kfree(sgl);
> + kfree(pages);
> +
> /*
> * subtle -- if __bio_map_user() ended up bouncing a bio,
> * it would normally disappear when its bi_end_io is run.
> @@ -1162,15 +1149,15 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
>
> return bio;
>
> - out_unmap:
> +err_pages:
> for (i = 0; i < nr_pages; i++) {
> - if(!pages[i])
> + if (!pages[i])
> break;
> page_cache_release(pages[i]);
> }
> - out:
> kfree(pages);
> - bio_put(bio);
> +err_sgl:
> + kfree(sgl);
> return ERR_PTR(ret);
> }
>
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: new API
>
> Implement blk_rq_map_kern_sgl() using bio_copy_{map|kern}_sgl() and
> reimplement blk_rq_map_kern() in terms of it. As the bio helpers
> already have all the necessary checks, all blk_rq_map_kern_sgl() has
> to do is wrap them and initialize rq accordingly. The implementation
> closely resembles blk_rq_msp_user_iov().
>
> This is an exported API and will be used to replace hack in scsi ioctl
> implementation.
>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> block/blk-map.c | 54 ++++++++++++++++++++++++++++++-----------------
> include/linux/blkdev.h | 2 +
> 2 files changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/block/blk-map.c b/block/blk-map.c
> index eb206df..0474c09 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -161,47 +161,61 @@ int blk_rq_unmap_user(struct bio *bio)
> EXPORT_SYMBOL(blk_rq_unmap_user);
>
> /**
> - * blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
> + * blk_rq_map_kern_sg - map kernel data to a request, for REQ_TYPE_BLOCK_PC
> * @q: request queue where request should be inserted
> * @rq: request to fill
> - * @kbuf: the kernel buffer
> - * @len: length of user data
> + * @sgl: area to map
> + * @nents: number of elements in @sgl
> * @gfp: memory allocation flags
> *
> * Description:
> * Data will be mapped directly if possible. Otherwise a bounce
> * buffer is used.
> */
> -int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
> - unsigned int len, gfp_t gfp)
> +int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
> + struct scatterlist *sgl, int nents, gfp_t gfp)
> {
> int rw = rq_data_dir(rq);
> - int do_copy = 0;
> struct bio *bio;
>
> - if (len > (q->max_hw_sectors << 9))
> - return -EINVAL;
> - if (!len || !kbuf)
> + if (!sgl || nents <= 0)
> return -EINVAL;
>
> - do_copy = !blk_rq_aligned(q, kbuf, len) || object_is_on_stack(kbuf);
> - if (do_copy)
> - bio = bio_copy_kern(q, kbuf, len, gfp, rw);
> - else
> - bio = bio_map_kern(q, kbuf, len, gfp);
> -
> + bio = bio_map_kern_sg(q, sgl, nents, rw, gfp);
> + if (IS_ERR(bio))
> + bio = bio_copy_kern_sg(q, sgl, nents, rw, gfp);
> if (IS_ERR(bio))
> return PTR_ERR(bio);
You might want to call bio_copy_kern_sg() from within bio_map_kern_sg()
and remove yet another export from the bio layer.
Same thing with bio_map_user_iov()/bio_copy_user_iov().
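A hedged sketch of that suggestion (__bio_map_kern_sg is a made-up name
for the current mapping body; this is not code from the series):

	struct bio *bio_map_kern_sg(struct request_queue *q,
				    struct scatterlist *sgl, int nents,
				    int rw, gfp_t gfp)
	{
		/* try the direct mapping first, fall back to copying */
		struct bio *bio = __bio_map_kern_sg(q, sgl, nents, rw, gfp);

		if (IS_ERR(bio))
			bio = bio_copy_kern_sg(q, sgl, nents, rw, gfp);
		return bio;
	}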
>
> - if (rq_data_dir(rq) == WRITE)
> - bio->bi_rw |= (1 << BIO_RW);
> -
> - if (do_copy)
> + if (!bio_flagged(bio, BIO_USER_MAPPED))
> rq->cmd_flags |= REQ_COPY_USER;
>
> + blk_queue_bounce(q, &bio);
> blk_rq_bio_prep(q, rq, bio);
It would be nice to call blk_rq_append_bio() here
and support multiple calls to this function.
That would solve half of my problem with osd_initiator.
Hmm .. but you wanted to drop multiple bio chaining;
perhaps you would reconsider.
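A rough sketch of what I mean, reusing blk_rq_append_bio() (the helper
name below is made up; not tested):

        static int blk_rq_append_mapped_bio(struct request_queue *q,
                                            struct request *rq,
                                            struct bio *bio)
        {
                int ret;

                blk_queue_bounce(q, &bio);
                ret = blk_rq_append_bio(q, rq, bio); /* chains onto any earlier bio */
                if (ret)
                        bio_endio(bio, 0);           /* frees a possibly bounced bio */
                return ret;
        }

Calling blk_rq_map_kern_sg() twice on the same rq would then just grow the
request instead of overwriting rq->bio.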
> - blk_queue_bounce(q, &rq->bio);
> rq->buffer = rq->data = NULL;
> return 0;
> }
> +EXPORT_SYMBOL(blk_rq_map_kern_sg);
> +
> +/**
> + * blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
> + * @q: request queue where request should be inserted
> + * @rq: request to fill
> + * @kbuf: the kernel buffer
> + * @len: length of user data
> + * @gfp: memory allocation flags
> + *
> + * Description:
> + * Data will be mapped directly if possible. Otherwise a bounce
> + * buffer is used.
> + */
> +int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
> + unsigned int len, gfp_t gfp)
> +{
> + struct scatterlist sg;
> +
> + sg_init_one(&sg, kbuf, len);
> +
> + return blk_rq_map_kern_sg(q, rq, &sg, 1, gfp);
> +}
> EXPORT_SYMBOL(blk_rq_map_kern);
This could be an inline, like the end_request functions.
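I.e. something like this in blkdev.h (just the quoted wrapper moved into
the header; it would need linux/scatterlist.h there):

        static inline int blk_rq_map_kern(struct request_queue *q,
                                          struct request *rq, void *kbuf,
                                          unsigned int len, gfp_t gfp)
        {
                struct scatterlist sg;

                sg_init_one(&sg, kbuf, len);
                return blk_rq_map_kern_sg(q, rq, &sg, 1, gfp);
        }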
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index d04e118..58b41da 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -784,6 +784,8 @@ extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, uns
> extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> struct rq_map_data *md, struct iovec *iov,
> int count, unsigned int len, gfp_t gfp);
> +extern int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
> + struct scatterlist *sgl, int nents, gfp_t gfp);
> extern int blk_execute_rq(struct request_queue *, struct gendisk *,
> struct request *, int);
> extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: hack removal
>
> SCSI needs to map an sgl into an rq for kernel PC requests; however, the
> block API didn't have such a feature, so it used its own rq mapping
> function which hooked into block/bio internals and is generally
> considered an ugly hack. The private function may also produce requests
> which are bigger than the queue's per-rq limits.
>
> The block layer now provides blk_rq_map_kern_sgl(). Kill the private
> implementation and use it.
>
> Signed-off-by: Tejun Heo <[email protected]>
James, Tomo,
what happened to Tomo's patches that remove all this after fixing up
all users (sg.c)?
I thought that was agreed and done? What is left for that to go in?
Thanks, Boaz
> ---
> drivers/scsi/scsi_lib.c | 108 +----------------------------------------------
> 1 files changed, 1 insertions(+), 107 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 3196c83..3fa5589 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -296,112 +296,6 @@ static void scsi_end_async(struct request *req, int uptodate)
> __blk_put_request(req->q, req);
> }
>
> -static int scsi_merge_bio(struct request *rq, struct bio *bio)
> -{
> - struct request_queue *q = rq->q;
> -
> - bio->bi_flags &= ~(1 << BIO_SEG_VALID);
> - if (rq_data_dir(rq) == WRITE)
> - bio->bi_rw |= (1 << BIO_RW);
> - blk_queue_bounce(q, &bio);
> -
> - return blk_rq_append_bio(q, rq, bio);
> -}
> -
> -static void scsi_bi_endio(struct bio *bio, int error)
> -{
> - bio_put(bio);
> -}
> -
> -/**
> - * scsi_req_map_sg - map a scatterlist into a request
> - * @rq: request to fill
> - * @sgl: scatterlist
> - * @nsegs: number of elements
> - * @bufflen: len of buffer
> - * @gfp: memory allocation flags
> - *
> - * scsi_req_map_sg maps a scatterlist into a request so that the
> - * request can be sent to the block layer. We do not trust the scatterlist
> - * sent to use, as some ULDs use that struct to only organize the pages.
> - */
> -static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
> - int nsegs, unsigned bufflen, gfp_t gfp)
> -{
> - struct request_queue *q = rq->q;
> - int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> - unsigned int data_len = bufflen, len, bytes, off;
> - struct scatterlist *sg;
> - struct page *page;
> - struct bio *bio = NULL;
> - int i, err, nr_vecs = 0;
> -
> - for_each_sg(sgl, sg, nsegs, i) {
> - page = sg_page(sg);
> - off = sg->offset;
> - len = sg->length;
> -
> - while (len > 0 && data_len > 0) {
> - /*
> - * sg sends a scatterlist that is larger than
> - * the data_len it wants transferred for certain
> - * IO sizes
> - */
> - bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> - bytes = min(bytes, data_len);
> -
> - if (!bio) {
> - nr_vecs = min_t(int, BIO_GUARANTEED_PAGES,
> - nr_pages);
> - nr_pages -= nr_vecs;
> -
> - bio = bio_alloc(gfp, nr_vecs);
> - if (!bio) {
> - err = -ENOMEM;
> - goto free_bios;
> - }
> - bio->bi_end_io = scsi_bi_endio;
> - }
> -
> - if (bio_add_pc_page(q, bio, page, bytes, off) !=
> - bytes) {
> - bio_put(bio);
> - err = -EINVAL;
> - goto free_bios;
> - }
> -
> - if (bio->bi_vcnt >= nr_vecs) {
> - err = scsi_merge_bio(rq, bio);
> - if (err) {
> - bio_endio(bio, 0);
> - goto free_bios;
> - }
> - bio = NULL;
> - }
> -
> - page++;
> - len -= bytes;
> - data_len -=bytes;
> - off = 0;
> - }
> - }
> -
> - rq->buffer = rq->data = NULL;
> - rq->data_len = bufflen;
> - return 0;
> -
> -free_bios:
> - while ((bio = rq->bio) != NULL) {
> - rq->bio = bio->bi_next;
> - /*
> - * call endio instead of bio_put incase it was bounced
> - */
> - bio_endio(bio, 0);
> - }
> -
> - return err;
> -}
> -
> /**
> * scsi_execute_async - insert request
> * @sdev: scsi device
> @@ -438,7 +332,7 @@ int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd,
> req->cmd_flags |= REQ_QUIET;
>
> if (use_sg)
> - err = scsi_req_map_sg(req, buffer, use_sg, bufflen, gfp);
> + err = blk_rq_map_kern_sg(req->q, req, buffer, use_sg, gfp);
> else if (bufflen)
> err = blk_rq_map_kern(req->q, req, buffer, bufflen, gfp);
>
On Wed, 2009-04-01 at 20:00 +0300, Boaz Harrosh wrote:
> On 04/01/2009 04:44 PM, Tejun Heo wrote:
> > Impact: hack removal
> >
> > SCSI needs to map sgl into rq for kernel PC requests; however, block
> > API didn't have such feature so it used its own rq mapping function
> > which hooked into block/bio internals and is generally considered an
> > ugly hack. The private function may also produce requests which are
> > bigger than queue per-rq limits.
> >
> > Block blk_rq_map_kern_sgl(). Kill the private implementation and use
> > it.
> >
> > Signed-off-by: Tejun Heo <[email protected]>
>
> James, TOMO
>
> what happened to Tomo's patches that removes all this after fixing up
> all users (sg.c)?
>
> I thought that was agreed and done? What is left to do for that to go
> in.
They couldn't go in because they would break libosd. You were going to
send patches to fix libosd so it no longer relied on the exported
function ... did that happen and I missed it?
James
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: API cleanup
>
> All initialized requests must and are guaranteed to have rq->q set.
> Taking both @q and @rq parameters doesn't improve anything. It only
> adds a pathological corner case where @q != @rq->q. Kill the
> superfluous @q parameter from blk_rq_map_{user|kern}*().
>
> Signed-off-by: Tejun Heo <[email protected]>
Thank god, great stuff
> ---
> block/blk-map.c | 27 +++++++++++----------------
> block/bsg.c | 5 ++---
> block/scsi_ioctl.c | 6 +++---
> drivers/block/pktcdvd.c | 2 +-
> drivers/cdrom/cdrom.c | 2 +-
> drivers/scsi/device_handler/scsi_dh_alua.c | 2 +-
> drivers/scsi/device_handler/scsi_dh_emc.c | 2 +-
> drivers/scsi/device_handler/scsi_dh_rdac.c | 2 +-
> drivers/scsi/scsi_lib.c | 7 +++----
> drivers/scsi/scsi_tgt_lib.c | 3 +--
> drivers/scsi/sg.c | 6 +++---
> drivers/scsi/st.c | 3 +--
> include/linux/blkdev.h | 15 +++++++--------
> 13 files changed, 36 insertions(+), 46 deletions(-)
>
> diff --git a/block/blk-map.c b/block/blk-map.c
> index f60f439..885d359 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -10,7 +10,6 @@
>
> /**
> * blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
> - * @q: request queue where request should be inserted
> * @rq: request to map data to
> * @md: pointer to the rq_map_data holding pages (if necessary)
> * @iov: pointer to the iovec
> @@ -30,10 +29,10 @@
> * original bio must be passed back in to blk_rq_unmap_user() for proper
> * unmapping.
> */
> -int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> - struct rq_map_data *md, struct iovec *iov, int count,
> - gfp_t gfp)
> +int blk_rq_map_user_iov(struct request *rq, struct rq_map_data *md,
> + struct iovec *iov, int count, gfp_t gfp)
> {
> + struct request_queue *q = rq->q;
> struct bio *bio = ERR_PTR(-EINVAL);
> int rw = rq_data_dir(rq);
>
> @@ -66,7 +65,6 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);
>
> /**
> * blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage
> - * @q: request queue where request should be inserted
> * @rq: request structure to fill
> * @md: pointer to the rq_map_data holding pages (if necessary)
> * @ubuf: the user buffer
> @@ -86,16 +84,15 @@ EXPORT_SYMBOL(blk_rq_map_user_iov);
> * original bio must be passed back in to blk_rq_unmap_user() for proper
> * unmapping.
> */
> -int blk_rq_map_user(struct request_queue *q, struct request *rq,
> - struct rq_map_data *md, void __user *ubuf,
> - unsigned long len, gfp_t gfp)
> +int blk_rq_map_user(struct request *rq, struct rq_map_data *md,
> + void __user *ubuf, unsigned long len, gfp_t gfp)
> {
> struct iovec iov;
>
> iov.iov_base = ubuf;
> iov.iov_len = len;
>
> - return blk_rq_map_user_iov(q, rq, md, &iov, 1, gfp);
> + return blk_rq_map_user_iov(rq, md, &iov, 1, gfp);
> }
> EXPORT_SYMBOL(blk_rq_map_user);
>
> @@ -128,7 +125,6 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
>
> /**
> * blk_rq_map_kern_sg - map kernel data to a request, for REQ_TYPE_BLOCK_PC
> - * @q: request queue where request should be inserted
> * @rq: request to fill
> * @sgl: area to map
> * @nents: number of elements in @sgl
> @@ -138,9 +134,10 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
> * Data will be mapped directly if possible. Otherwise a bounce
> * buffer is used.
> */
> -int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
> - struct scatterlist *sgl, int nents, gfp_t gfp)
> +int blk_rq_map_kern_sg(struct request *rq, struct scatterlist *sgl, int nents,
> + gfp_t gfp)
> {
> + struct request_queue *q = rq->q;
> int rw = rq_data_dir(rq);
> struct bio *bio;
>
> @@ -165,7 +162,6 @@ EXPORT_SYMBOL(blk_rq_map_kern_sg);
>
> /**
> * blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
> - * @q: request queue where request should be inserted
> * @rq: request to fill
> * @kbuf: the kernel buffer
> * @len: length of user data
> @@ -175,13 +171,12 @@ EXPORT_SYMBOL(blk_rq_map_kern_sg);
> * Data will be mapped directly if possible. Otherwise a bounce
> * buffer is used.
> */
> -int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
> - unsigned int len, gfp_t gfp)
> +int blk_rq_map_kern(struct request *rq, void *kbuf, unsigned int len, gfp_t gfp)
> {
> struct scatterlist sg;
>
> sg_init_one(&sg, kbuf, len);
>
> - return blk_rq_map_kern_sg(q, rq, &sg, 1, gfp);
> + return blk_rq_map_kern_sg(rq, &sg, 1, gfp);
> }
> EXPORT_SYMBOL(blk_rq_map_kern);
> diff --git a/block/bsg.c b/block/bsg.c
> index 206060e..69c222a 100644
> --- a/block/bsg.c
> +++ b/block/bsg.c
> @@ -283,7 +283,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm,
> next_rq->cmd_type = rq->cmd_type;
>
> dxferp = (void*)(unsigned long)hdr->din_xferp;
> - ret = blk_rq_map_user(q, next_rq, NULL, dxferp,
> + ret = blk_rq_map_user(next_rq, NULL, dxferp,
> hdr->din_xfer_len, GFP_KERNEL);
> if (ret)
> goto out;
> @@ -299,8 +299,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm,
> dxfer_len = 0;
>
> if (dxfer_len) {
> - ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
> - GFP_KERNEL);
> + ret = blk_rq_map_user(rq, NULL, dxferp, dxfer_len, GFP_KERNEL);
> if (ret)
> goto out;
> }
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index fd538f8..a98e4ec 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -306,11 +306,11 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
> hdr->dxfer_len);
>
> - ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
> + ret = blk_rq_map_user_iov(rq, NULL, iov, hdr->iovec_count,
> GFP_KERNEL);
> kfree(iov);
> } else if (hdr->dxfer_len)
> - ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
> + ret = blk_rq_map_user(rq, NULL, hdr->dxferp, hdr->dxfer_len,
> GFP_KERNEL);
>
> if (ret)
> @@ -449,7 +449,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
> break;
> }
>
> - if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) {
> + if (bytes && blk_rq_map_kern(rq, buffer, bytes, __GFP_WAIT)) {
> err = DRIVER_ERROR << 24;
> goto out;
> }
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index dc7a8c3..a4e5e9b 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -771,7 +771,7 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
> WRITE : READ, __GFP_WAIT);
>
> if (cgc->buflen) {
> - if (blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, __GFP_WAIT))
> + if (blk_rq_map_kern(rq, cgc->buffer, cgc->buflen, __GFP_WAIT))
> goto out;
> }
>
> diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
> index cceace6..ef67aec 100644
> --- a/drivers/cdrom/cdrom.c
> +++ b/drivers/cdrom/cdrom.c
> @@ -2112,7 +2112,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
> break;
> }
>
> - ret = blk_rq_map_user(q, rq, NULL, ubuf, len, GFP_KERNEL);
> + ret = blk_rq_map_user(rq, NULL, ubuf, len, GFP_KERNEL);
> if (ret) {
> blk_put_request(rq);
> break;
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index e356b43..f3cb900 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -101,7 +101,7 @@ static struct request *get_alua_req(struct scsi_device *sdev,
> return NULL;
> }
>
> - if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
> + if (buflen && blk_rq_map_kern(rq, buffer, buflen, GFP_NOIO)) {
> blk_put_request(rq);
> sdev_printk(KERN_INFO, sdev,
> "%s: blk_rq_map_kern failed\n", __func__);
> diff --git a/drivers/scsi/device_handler/scsi_dh_emc.c b/drivers/scsi/device_handler/scsi_dh_emc.c
> index 0e572d2..dbbd56d 100644
> --- a/drivers/scsi/device_handler/scsi_dh_emc.c
> +++ b/drivers/scsi/device_handler/scsi_dh_emc.c
> @@ -308,7 +308,7 @@ static struct request *get_req(struct scsi_device *sdev, int cmd,
> rq->timeout = CLARIION_TIMEOUT;
> rq->retries = CLARIION_RETRIES;
>
> - if (blk_rq_map_kern(rq->q, rq, buffer, len, GFP_NOIO)) {
> + if (blk_rq_map_kern(rq, buffer, len, GFP_NOIO)) {
> blk_put_request(rq);
> return NULL;
> }
> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
> index 5366476..f50b33a 100644
> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
> @@ -219,7 +219,7 @@ static struct request *get_rdac_req(struct scsi_device *sdev,
> return NULL;
> }
>
> - if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
> + if (buflen && blk_rq_map_kern(rq, buffer, buflen, GFP_NOIO)) {
> blk_put_request(rq);
> sdev_printk(KERN_INFO, sdev,
> "get_rdac_req: blk_rq_map_kern failed.\n");
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 3fa5589..66c3d0b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -216,8 +216,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
>
> req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
>
> - if (bufflen && blk_rq_map_kern(sdev->request_queue, req,
> - buffer, bufflen, __GFP_WAIT))
> + if (bufflen && blk_rq_map_kern(req, buffer, bufflen, __GFP_WAIT))
> goto out;
>
> req->cmd_len = COMMAND_SIZE(cmd[0]);
> @@ -332,9 +331,9 @@ int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd,
> req->cmd_flags |= REQ_QUIET;
>
> if (use_sg)
> - err = blk_rq_map_kern_sg(req->q, req, buffer, use_sg, gfp);
> + err = blk_rq_map_kern_sg(req, buffer, use_sg, gfp);
> else if (bufflen)
> - err = blk_rq_map_kern(req->q, req, buffer, bufflen, gfp);
> + err = blk_rq_map_kern(req, buffer, bufflen, gfp);
>
> if (err)
> goto free_req;
> diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
> index 48ba413..55beba4 100644
> --- a/drivers/scsi/scsi_tgt_lib.c
> +++ b/drivers/scsi/scsi_tgt_lib.c
> @@ -357,12 +357,11 @@ static int scsi_tgt_transfer_response(struct scsi_cmnd *cmd)
> static int scsi_map_user_pages(struct scsi_tgt_cmd *tcmd, struct scsi_cmnd *cmd,
> unsigned long uaddr, unsigned int len, int rw)
> {
> - struct request_queue *q = cmd->request->q;
> struct request *rq = cmd->request;
> int err;
>
> dprintk("%lx %u\n", uaddr, len);
> - err = blk_rq_map_user(q, rq, NULL, (void *)uaddr, len, GFP_KERNEL);
> + err = blk_rq_map_user(rq, NULL, (void *)uaddr, len, GFP_KERNEL);
> if (err) {
> /*
> * TODO: need to fixup sg_tablesize, max_segment_size,
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 5fcf436..a769041 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -1672,11 +1672,11 @@ static int sg_start_req(Sg_request *srp, unsigned char *cmd)
> }
>
> if (iov_count)
> - res = blk_rq_map_user_iov(q, rq, md, hp->dxferp, iov_count,
> + res = blk_rq_map_user_iov(rq, md, hp->dxferp, iov_count,
> GFP_ATOMIC);
> else
> - res = blk_rq_map_user(q, rq, md, hp->dxferp,
> - hp->dxfer_len, GFP_ATOMIC);
> + res = blk_rq_map_user(rq, md, hp->dxferp, hp->dxfer_len,
> + GFP_ATOMIC);
>
> if (!res) {
> srp->bio = rq->bio;
> diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> index c6f19ee..c4615bb 100644
> --- a/drivers/scsi/st.c
> +++ b/drivers/scsi/st.c
> @@ -492,8 +492,7 @@ static int st_scsi_execute(struct st_request *SRpnt, const unsigned char *cmd,
> mdata->null_mapped = 1;
>
> if (bufflen) {
> - err = blk_rq_map_user(req->q, req, mdata, NULL, bufflen,
> - GFP_KERNEL);
> + err = blk_rq_map_user(req, mdata, NULL, bufflen, GFP_KERNEL);
> if (err) {
> blk_put_request(req);
> return DRIVER_ERROR << 24;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 6876466..ff0ad9e 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -770,15 +770,14 @@ extern void __blk_stop_queue(struct request_queue *q);
> extern void __blk_run_queue(struct request_queue *);
> extern void blk_run_queue(struct request_queue *);
> extern void blk_start_queueing(struct request_queue *);
> -extern int blk_rq_map_user(struct request_queue *, struct request *,
> - struct rq_map_data *, void __user *, unsigned long,
> - gfp_t);
> +extern int blk_rq_map_user(struct request *rq, struct rq_map_data *md,
> + void __user *ubuf, unsigned long len, gfp_t gfp);
> extern int blk_rq_unmap_user(struct bio *);
> -extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
> -extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> - struct rq_map_data *md, struct iovec *iov,
> - int count, gfp_t gfp);
> -extern int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
> +extern int blk_rq_map_kern(struct request *rq, void *kbuf, unsigned int len,
> + gfp_t gfp);
> +extern int blk_rq_map_user_iov(struct request *rq, struct rq_map_data *md,
> + struct iovec *iov, int count, gfp_t gfp);
> +extern int blk_rq_map_kern_sg(struct request *rq,
> struct scatterlist *sgl, int nents, gfp_t gfp);
> extern int blk_execute_rq(struct request_queue *, struct gendisk *,
> struct request *, int);
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: API cleanup
>
> blk_rq_map_user_iov() took a @len parameter which carries duplicate
> information, as the total length is available as the sum of all iov
> segments. This doesn't save anything either, as the mapping function
> should walk the whole iov on entry to check for alignment anyway.
> Remove the superfluous parameter.
>
> Removing the superfluous parameter removes the pathological corner case
> where the caller passes in a shorter @len than @iov but the @iov mapping
> ends up capped due to queue limits and bio->bi_size ends up matching
> @len, thus resulting in a successful map.
I did not go to the bottom of this, but how does this not change the
user-space API? It would now fail in code that used to work.
What if I want to try variable-size commands, where I try the shorter one
first, then a longer one (or vice versa)? I would set up the memory
mapping for the biggest command but issue a length that corresponds to
the command encoded inside the CDB.
bio->bi_size is not only the mapping size; it is also the command size I
actually want to read/write.
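To make the scenario concrete, roughly this kind of user-space usage
(buffer names and lengths are made up; cmdp/cmd_len setup is omitted and
sg_fd is an open /dev/sg* file descriptor):

        struct sg_iovec iov = {
                .iov_base = reply_buf,          /* sized for the largest variant */
                .iov_len  = MAX_REPLY_LEN,
        };
        struct sg_io_hdr hdr = {
                .interface_id    = 'S',
                .dxfer_direction = SG_DXFER_FROM_DEV,
                .iovec_count     = 1,
                .dxferp          = &iov,
                .dxfer_len       = short_reply_len, /* what this CDB encodes */
        };
        /* per the SG_IO howto, the shorter of the two should win */
        ioctl(sg_fd, SG_IO, &hdr);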
But I might be reading this wrong, though.
> With the superflous parameter
> gone, blk-map/bio can now simply fail partial mappings.
>
> Move partial mapping detection to bio_create_from_sgl() which is
> shared by all map/copy paths and remove partial mapping handling from
> all other places.
>
> This change removes one of the two users of __blk_rq_unmap_user() and
> it gets collapsed into blk_rq_unmap_user().
>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> block/blk-map.c | 47 +++++++++++++++--------------------------------
> block/scsi_ioctl.c | 11 +++--------
> drivers/scsi/sg.c | 2 +-
> fs/bio.c | 43 +++++++++++++++++--------------------------
> include/linux/blkdev.h | 2 +-
> 5 files changed, 37 insertions(+), 68 deletions(-)
>
> diff --git a/block/blk-map.c b/block/blk-map.c
> index dc4097c..f60f439 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -8,20 +8,6 @@
>
> #include "blk.h"
>
> -static int __blk_rq_unmap_user(struct bio *bio)
> -{
> - int ret = 0;
> -
> - if (bio) {
> - if (bio_flagged(bio, BIO_USER_MAPPED))
> - bio_unmap_user(bio);
> - else
> - ret = bio_uncopy_user(bio);
> - }
> -
> - return ret;
> -}
> -
> /**
> * blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
> * @q: request queue where request should be inserted
> @@ -29,7 +15,6 @@ static int __blk_rq_unmap_user(struct bio *bio)
> * @md: pointer to the rq_map_data holding pages (if necessary)
> * @iov: pointer to the iovec
> * @count: number of elements in the iovec
> - * @len: I/O byte count
> * @gfp: memory allocation flags
> *
> * Description:
> @@ -47,7 +32,7 @@ static int __blk_rq_unmap_user(struct bio *bio)
> */
> int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> struct rq_map_data *md, struct iovec *iov, int count,
> - unsigned int len, gfp_t gfp)
> + gfp_t gfp)
> {
> struct bio *bio = ERR_PTR(-EINVAL);
> int rw = rq_data_dir(rq);
> @@ -62,23 +47,17 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> if (IS_ERR(bio))
> return PTR_ERR(bio);
>
> - if (bio->bi_size != len) {
> - /*
> - * Grab an extra reference to this bio, as bio_unmap_user()
> - * expects to be able to drop it twice as it happens on the
> - * normal IO completion path
> - */
> - bio_get(bio);
> - bio_endio(bio, 0);
> - __blk_rq_unmap_user(bio);
> - return -EINVAL;
> - }
> -
> if (!bio_flagged(bio, BIO_USER_MAPPED))
> rq->cmd_flags |= REQ_COPY_USER;
>
> - blk_queue_bounce(q, &bio);
> + /*
> + * Grab an extra reference to this bio, as bio_unmap_user()
> + * expects to be able to drop it twice as it happens on the
> + * normal IO completion path.
> + */
> bio_get(bio);
> +
> + blk_queue_bounce(q, &bio);
> blk_rq_bio_prep(q, rq, bio);
> rq->buffer = rq->data = NULL;
> return 0;
> @@ -116,7 +95,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
> iov.iov_base = ubuf;
> iov.iov_len = len;
>
> - return blk_rq_map_user_iov(q, rq, md, &iov, 1, len, gfp);
> + return blk_rq_map_user_iov(q, rq, md, &iov, 1, gfp);
> }
> EXPORT_SYMBOL(blk_rq_map_user);
>
> @@ -132,12 +111,16 @@ EXPORT_SYMBOL(blk_rq_map_user);
> int blk_rq_unmap_user(struct bio *bio)
> {
> struct bio *mapped_bio = bio;
> - int ret;
> + int ret = 0;
>
> if (unlikely(bio_flagged(bio, BIO_BOUNCED)))
> mapped_bio = bio->bi_private;
>
> - ret = __blk_rq_unmap_user(mapped_bio);
> + if (bio_flagged(bio, BIO_USER_MAPPED))
> + bio_unmap_user(mapped_bio);
> + else
> + ret = bio_uncopy_user(mapped_bio);
> +
> bio_put(bio);
> return ret;
> }
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 73cfd91..fd538f8 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -288,7 +288,6 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>
> if (hdr->iovec_count) {
> const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
> - size_t iov_data_len;
> struct iovec *iov;
>
> iov = kmalloc(size, GFP_KERNEL);
> @@ -304,15 +303,11 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> }
>
> /* SG_IO howto says that the shorter of the two wins */
> - iov_data_len = iov_length(iov, hdr->iovec_count);
> - if (hdr->dxfer_len < iov_data_len) {
> - hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
> - hdr->dxfer_len);
> - iov_data_len = hdr->dxfer_len;
> - }
> + hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
> + hdr->dxfer_len);
>
> ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
> - iov_data_len, GFP_KERNEL);
> + GFP_KERNEL);
> kfree(iov);
> } else if (hdr->dxfer_len)
> ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index b4ef2f8..5fcf436 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -1673,7 +1673,7 @@ static int sg_start_req(Sg_request *srp, unsigned char *cmd)
>
> if (iov_count)
> res = blk_rq_map_user_iov(q, rq, md, hp->dxferp, iov_count,
> - hp->dxfer_len, GFP_ATOMIC);
> + GFP_ATOMIC);
> else
> res = blk_rq_map_user(q, rq, md, hp->dxferp,
> hp->dxfer_len, GFP_ATOMIC);
> diff --git a/fs/bio.c b/fs/bio.c
> index fe796dc..9466b05 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -956,14 +956,14 @@ static void bio_memcpy_sgl_sgl(struct scatterlist *dsgl, int dnents,
> * @nr_pages: number of pages in @sgl
> * @rw: the data direction of new bio
> *
> - * Populate @bio with the data area described by @sgl. Note that
> - * the resulting bio might not contain the whole @sgl area. This
> - * can be checked by testing bio->bi_size against total area
> - * length.
> + * Populate @bio with the data area described by @sgl.
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> */
> -static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
> - struct scatterlist *sgl, int nents,
> - int nr_pages, int rw)
> +static int bio_init_from_sgl(struct bio *bio, struct request_queue *q,
> + struct scatterlist *sgl, int nents,
> + int nr_pages, int rw)
> {
> struct scatterlist *sg;
> int i;
> @@ -979,15 +979,18 @@ static void bio_init_from_sgl(struct bio *bio, struct request_queue *q,
> while (len) {
> size_t bytes = min_t(size_t, len, PAGE_SIZE - offset);
>
> + /* doesn't support partial mappings */
> if (unlikely(bio_add_pc_page(q, bio, page,
> bytes, offset) < bytes))
> - break;
> + return -EINVAL;
>
> offset = 0;
> len -= bytes;
> page = nth_page(page, 1);
> }
> }
> +
> + return 0;
> }
>
> /**
> @@ -1009,12 +1012,17 @@ static struct bio *bio_create_from_sgl(struct request_queue *q,
> int nr_pages, int rw, int gfp)
> {
> struct bio *bio;
> + int ret;
>
> bio = bio_kmalloc(gfp, nr_pages);
> if (!bio)
> return ERR_PTR(-ENOMEM);
>
> - bio_init_from_sgl(bio, q, sgl, nents, nr_pages, rw);
> + ret = bio_init_from_sgl(bio, q, sgl, nents, nr_pages, rw);
> + if (ret) {
> + bio_put(bio);
> + return ERR_PTR(ret);
> + }
>
> return bio;
> }
> @@ -1170,10 +1178,6 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> goto err_pages;
> }
>
> - /* release the pages we didn't map into the bio, if any */
> - for (i = bio->bi_vcnt; i < nr_pages; i++)
> - page_cache_release(pages[i]);
> -
> bio->bi_bdev = bdev;
> bio->bi_flags |= (1 << BIO_USER_MAPPED);
>
> @@ -1283,12 +1287,6 @@ struct bio *bio_map_kern_sg(struct request_queue *q, struct scatterlist *sgl,
> if (IS_ERR(bio))
> return bio;
>
> - /* doesn't support partial mappings */
> - if (bio->bi_size != tot_len) {
> - bio_put(bio);
> - return ERR_PTR(-EINVAL);
> - }
> -
> bio->bi_end_io = bio_map_kern_endio;
> return bio;
> }
> @@ -1343,17 +1341,10 @@ struct bio *bio_copy_kern_sg(struct request_queue *q, struct scatterlist *sgl,
> goto err_bci;
> }
>
> - /* doesn't support partial mappings */
> - ret= -EINVAL;
> - if (bio->bi_size != bci->len)
> - goto err_bio;
> -
> bio->bi_end_io = bio_copy_kern_endio;
> bio->bi_private = bci;
> return bio;
>
> -err_bio:
> - bio_put(bio);
> err_bci:
> bci_destroy(bci);
> return ERR_PTR(ret);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 40bec76..6876466 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -777,7 +777,7 @@ extern int blk_rq_unmap_user(struct bio *);
> extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
> extern int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> struct rq_map_data *md, struct iovec *iov,
> - int count, unsigned int len, gfp_t gfp);
> + int count, gfp_t gfp);
> extern int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
> struct scatterlist *sgl, int nents, gfp_t gfp);
> extern int blk_execute_rq(struct request_queue *, struct gendisk *,
On 04/01/2009 08:05 PM, James Bottomley wrote:
> On Wed, 2009-04-01 at 20:00 +0300, Boaz Harrosh wrote:
>> On 04/01/2009 04:44 PM, Tejun Heo wrote:
>>> Impact: hack removal
>>>
>>> SCSI needs to map sgl into rq for kernel PC requests; however, block
>>> API didn't have such feature so it used its own rq mapping function
>>> which hooked into block/bio internals and is generally considered an
>>> ugly hack. The private function may also produce requests which are
>>> bigger than queue per-rq limits.
>>>
>>> Block blk_rq_map_kern_sgl(). Kill the private implementation and use
>>> it.
>>>
>>> Signed-off-by: Tejun Heo <[email protected]>
>> James, TOMO
>>
>> what happened to Tomo's patches that removes all this after fixing up
>> all users (sg.c)?
>>
>> I thought that was agreed and done? What is left to do for that to go
>> in.
>
> They couldn't go in because they would break libosd. You were going to
> send patches to fix libosd so it no longer relied on the exported
> function ... did that happen and I missed it?
>
That's not related. I'm asking about the scsi ULD patches and finally the
patch to scsi_lib.c. libosd only conflicts with the very last patch to the
block layer. I don't see how that prevents the cleanups to scsi.
And BTW, I did send RFC patches that remove the usage of blk_rq_append_bio()
and did not receive any comments.
> James
>
>
Boaz
Hello,
Boaz Harrosh wrote:
> I did not go to the bottom of this but how does that do not change
> user-space API. It would now fail in code that would work before.
>
> What if I want to try variable size commands, where I try the
> shorter first, then a longer one (or vis versa). I would setup
> memory mapping to the biggest command but issue a length that
> correspond to the encoded command inside the CDB.
>
> The bi->bi_size is not only mapping size it is also the command size
> I want to actually read/write.
>
> But I might read this wrong though
Please see below.
>> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
>> index 73cfd91..fd538f8 100644
>> --- a/block/scsi_ioctl.c
>> +++ b/block/scsi_ioctl.c
>> @@ -288,7 +288,6 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>>
>> if (hdr->iovec_count) {
>> const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
>> - size_t iov_data_len;
>> struct iovec *iov;
>>
>> iov = kmalloc(size, GFP_KERNEL);
>> @@ -304,15 +303,11 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>> }
>>
>> /* SG_IO howto says that the shorter of the two wins */
>> - iov_data_len = iov_length(iov, hdr->iovec_count);
>> - if (hdr->dxfer_len < iov_data_len) {
>> - hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
>> - hdr->dxfer_len);
>> - iov_data_len = hdr->dxfer_len;
>> - }
>> + hdr->iovec_count = iov_shorten(iov, hdr->iovec_count,
>> + hdr->dxfer_len);
>>
>> ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
>> - iov_data_len, GFP_KERNEL);
>> + GFP_KERNEL);
blk_rq_map_user_iov() never had the shorter data len functionality.
It fails requests with -EINVAL if iov_length(iov) != dxfer_len, so a
patch in the previous patchset adds the above iov_shorten() call to
trim iov if dxfer_len is shorter. In this patch, it gets a bit
simpler as the length parameter isn't necessary.
Thanks.
--
tejun
Boaz Harrosh wrote:
>> ret = get_user_pages_fast(uaddr, local_nr_pages, rw == READ,
>> - &pages[cur_page]);
>> + &pages[nents]);
>
> A quick look at all users of get_user_pages_fast():
>
> This one can be converted to take an sglist instead of a page* array.
> Outside of this code, all users either have a fixed-size allocated array
> or hard-code a single page. This here is the main user (out of 5).
Yeah, that would be nice, although it's understandable that gup takes a
page array as its parameter.
Thanks.
--
tejun
Hello,
Boaz Harrosh wrote:
>> - do_copy = !blk_rq_aligned(q, kbuf, len) || object_is_on_stack(kbuf);
>> - if (do_copy)
>> - bio = bio_copy_kern(q, kbuf, len, gfp, rw);
>> - else
>> - bio = bio_map_kern(q, kbuf, len, gfp);
>> -
>> + bio = bio_map_kern_sg(q, sgl, nents, rw, gfp);
>> + if (IS_ERR(bio))
>> + bio = bio_copy_kern_sg(q, sgl, nents, rw, gfp);
>> if (IS_ERR(bio))
>> return PTR_ERR(bio);
>
> You might want to call bio_copy_kern_sg from within bio_map_kern_sg
> and remove yet another export from bio layer
>
> Same thing with bio_map_user_iov/bio_copy_user_iov
Right, that makes sense. Will incorporate it.
>>
>> - if (rq_data_dir(rq) == WRITE)
>> - bio->bi_rw |= (1 << BIO_RW);
>> -
>> - if (do_copy)
>> + if (!bio_flagged(bio, BIO_USER_MAPPED))
>> rq->cmd_flags |= REQ_COPY_USER;
>>
>> + blk_queue_bounce(q, &bio);
>> blk_rq_bio_prep(q, rq, bio);
>
> It could be nice to call blk_rq_append_bio() here
> and support multiple calls to this member.
> This will solve half of my problem with osd_initiator
>
> Hmm .. but you wanted to drop multiple bio chaining
> perhaps you would reconsider.
I don't want to drop multiple bio chaining at all in itself. I just
didn't see the current uses as, well, useful. If building an sgl for a
whole request at once is difficult for your purpose, making the
blk_rq_map_*() functions accumulate bios sounds like a good idea. The
primary goal was to remove direct bio visibility/manipulation from the
low level driver's POV.
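From the caller's point of view the accumulating behaviour could look
roughly like this (illustrative only; the sgl names are made up and
nothing like this is implemented yet):

        /* e.g. an osd_initiator-style caller building one PC request
         * from two separately prepared areas */
        err = blk_rq_map_kern_sg(q, rq, meta_sgl, meta_nents, GFP_KERNEL);
        if (!err)
                err = blk_rq_map_kern_sg(q, rq, data_sgl, data_nents,
                                         GFP_KERNEL);
        /* the second call would append its bio instead of replacing rq->bio */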
>> +int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
>> + unsigned int len, gfp_t gfp)
>> +{
>> + struct scatterlist sg;
>> +
>> + sg_init_one(&sg, kbuf, len);
>> +
>> + return blk_rq_map_kern_sg(q, rq, &sg, 1, gfp);
>> +}
>> EXPORT_SYMBOL(blk_rq_map_kern);
>
> could be inline like with the end_request functions
Yeap, will make it inline.
Thanks.
--
tejun
Hello, Boaz.
Boaz Harrosh wrote:
> I've looked hard and deep into your patchset, and I would like to
> suggest an improvement.
>
> [Option 1]
> What your code is actually using from sgl-code base is:
> for_each_sg
> sg_mapping_iter and it's
> sg_miter_start, sg_miter_next
> ... (what else)
>
> I would like it if you could define the above for bvec(s), just the way
> you like them. Then the code works directly on the destination bvec
> inside the final bio. One less copy, no intermediate allocation, and no
> kmalloc of bigger-than-page buffers.
>
> These are all small inlines, duplicating those will not affect
> Kernel size at all. You are not using the chaining ability of sgl(s)
> so it can be simplified. You will see that not having the intermediate
> copy simplifies the code even more.
>
> Since no outside user currently needs sgl(s), no functionality is lost.
Yeah, we can switch to dealing with bvecs instead of sgls. For kernel
mappings, it wouldn't make any difference. For user single mappings,
it'll remove an intermediate step. For user sg mappings, if we make
gup use sgl, it won't make much difference.
Frankly, for copy paths, I don't think all this matters at all. If
each request is small, other overheads will dominate. For large
requests, copying to and from sgl isn't something worth worrying
about.
For mapping paths, if we change gup to use sgl (we really can't make
it swallow bvec), I don't think it will make a whole lot of difference
one way or the other. It might end up using slightly more cachelines,
but that will be about it.
I do agree using bvec as internal page carrier would be nicer (sans
having to implement stuff manually) but I'm not convinced whether
it'll be worth the added code.
> [Option 2]
> Keep a pointer to an sgl and not a bvec in the bio; again, code works on
> the final destination. Later users of the block layer that call
> blk_rq_fill_sgl (blk_rq_map_sg) will just get a copy of the pointer, and
> another allocation and copy is saved. This option will spill outside of
> the current patches' scope, into bvec hacking code.
>
> I do like your long term vision of separating the DMA part from the
> virtual part of scatterlists. Note how they are actually two
> disjoint lists altogether. After dma_map does its thing, the dma
> physical list might be shorter than the virtual one and the sizes might
> not correspond at all. The dma mapping code regards the dma part as an
> empty list that gets appended to while processing; any match between
> segments is accidental. (That is: inside the scatterlist the virtual
> address most probably does not match the dma address.)
>
> So [option 1] matches more closely to that vision.
>
> Historically the code was doing:
> Many-sources => scatterlist => biovec => scatterlist => dma-scatterlist
>
> Only as of 2.6.30 can we say that we shortened it by a step:
> Many-sources => biovec => scatterlist => dma-scatterlist
>
> Now you want to bring the extra step back; I hate it.
It's not my favorite part either but I'm just not convinced it matters
all that much.
> [Option 2] can make that even shorter.
> Many-sources => scatterlist => dma-scatterlist
Yeah, this is the root cause. We don't have a common API to carry
segments, so we end up doing impedance matching when crossing
boundaries. Drivers expect an sgl at the interface. Filesystems
obviously expect a bio with bvecs. gup expects an array of pages (this
one isn't too bad tho).
> Please consider [option 1] it will only add some source code
> but it will not increase code size, maybe it will decrease,
> and it will be fast.
>
> Please consider that this code-path is used by me, in exofs and
> pNFS-objects in a very very hot path, where memory pressure is a
> common scenario.
I'm not hugely against using bvec inside. I just didn't see much
difference and went for something easier, so yeah, it definitely is an
option.
Jens, Tomo, what do you guys think?
> And I have one more question. Are you sure kmalloc of
> bigger-than-page buffers is safe?
It has a higher chance of failure but is definitely safe.
> As I understood it, that tries to allocate physically contiguous
> pages, which degrades as time passes, and last time I tried this with
> a kmem_cache (due to a bug) it crashed the kernel randomly after 2
> minutes of use.
Then, that's a bug. Yeah, high order allocations fail more easily as
time passes. But low order (1 maybe 2) allocations aren't too bad.
If it matters we can make bio_kmalloc() use vmalloc() for bvecs if it
goes over PAGE_SIZE, but again, given that nobody reported the spurious
GFP_DMA in bounce_gfp which triggers a frenzied OOM killing spree for
large SG_IOs, I don't think this matters all that much.
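For the bvec allocation itself, the idea is roughly (a hand-waving sketch
only, ignoring how bio_kmalloc() currently lays things out and the
matching free path):

        size_t vecs_size = nr_iovecs * sizeof(struct bio_vec);
        struct bio_vec *bvl;

        if (vecs_size > PAGE_SIZE)
                bvl = vmalloc(vecs_size);       /* avoid the high order kmalloc */
        else
                bvl = kmalloc(vecs_size, gfp_mask);
        if (!bvl)
                return NULL;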
Thanks.
--
tejun
Hello, Boaz.
Boaz Harrosh wrote:
> Please consider that this code-path is used by me, in exofs and
> pNFS-objcets in a very very hot path, where memory pressure is a
> common scenario.
One quick question. Don't the above paths map kernel pages to a
request?
Thanks.
--
tejun
On 04/02/2009 04:38 AM, Tejun Heo wrote:
> Hello, Boaz.
>
> Boaz Harrosh wrote:
>> Please consider that this code-path is used by me, in exofs and
>> pNFS-objcets in a very very hot path, where memory pressure is a
>> common scenario.
>
> One quick question. Don't the above paths map kernel pages to a
> request?
>
Well, VFS page-cache pages. Any ideas?
> Thanks.
>
Boaz Harrosh wrote:
> On 04/02/2009 04:38 AM, Tejun Heo wrote:
>> Hello, Boaz.
>>
>> Boaz Harrosh wrote:
>>> Please consider that this code-path is used by me, in exofs and
>>> pNFS-objcets in a very very hot path, where memory pressure is a
>>> common scenario.
>> One quick question. Don't the above paths map kernel pages to a
>> request?
>>
>
> Well vfs-page-cache pages. Any ideas?
In that case it would go through blk_rq_map_kern_sg(), so unless you
wanna change the blk_rq_map_kern_sg() interface such that it uses
bvec, the sgl overhead doesn't matter anyway. Of course, if it has
been building bios directly, that is a different story.
Thanks.
--
tejun
On 04/02/2009 02:57 AM, Tejun Heo wrote:
> Hello, Boaz.
>
> Boaz Harrosh wrote:
>> I've looked hard and deep into your patchset, and I would like to
>> suggest an improvement.
>>
>> [Option 1]
>> What your code is actually using from sgl-code base is:
>> for_each_sg
>> sg_mapping_iter and it's
>> sg_miter_start, sg_miter_next
>> ... (what else)
>>
>> I would like if you can define above for bvec(s) just the way you like
>> them. Then code works directly on the destination bvect inside the final
>> bio. One less copy no intermediate allocation, and no kmalloc of
>> bigger-then-page buffers.
>>
>> These are all small inlines, duplicating those will not affect
>> Kernel size at all. You are not using the chaining ability of sgl(s)
>> so it can be simplified. You will see that not having the intermediate
>> copy simplifies the code even more.
>>
>> Since no out-side user currently needs sgl(s) no functionality is lost.
>
> Yeah, we can switch to dealing with bvecs instead of sgls. For kernel
> mappings, it wouldn't make any difference. For user single mappings,
> it'll remove an intermediate step. For user sg mappings, if we make
> gup use sgl, it won't make much difference.
"gup use sgl" was just a stupid suggestion to try eliminate one more
allocation. Realy Realy stupid.
for gup there is a much better, safer, way. Just allocate on the stack
a constant size say 64 pointers and add an inner-loop on the 64 pages.
The extra code loop is much cheaper then an allocation and is one less
point of failure.
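In code the batching idea is roughly this (sketch only; the function name
and GUP_BATCH are made up, uaddr is assumed page aligned, and offsets
within the first/last page plus the error-path page releases are ignored):

        #define GUP_BATCH       64

        static int map_user_pages_batched(struct request_queue *q,
                                          struct bio *bio,
                                          unsigned long uaddr,
                                          int nr_pages, int rw)
        {
                struct page *pages[GUP_BATCH];  /* fixed size, on the stack */
                int done = 0;

                while (done < nr_pages) {
                        int chunk = min_t(int, nr_pages - done, GUP_BATCH);
                        int i, got;

                        got = get_user_pages_fast(uaddr + done * PAGE_SIZE,
                                                  chunk, rw == READ, pages);
                        if (got <= 0)
                                return -EFAULT;
                        for (i = 0; i < got; i++)
                                if (bio_add_pc_page(q, bio, pages[i],
                                                    PAGE_SIZE, 0) < PAGE_SIZE)
                                        return -EINVAL;
                        done += got;
                }
                return 0;
        }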
So now that we got gup out of the way, will you go directly to bvecs?
The goal is to have NO allocations at all.
>
> Frankly, for copy paths, I don't think all this matters at all. If
> each request is small, other overheads will dominate. For large
> requests, copying to and from sgl isn't something worth worrying
> about.
It is. Again, I tested that: each new allocation on the order of
the number of pages in a request adds 3-5% on RAM I/O.
And it adds a point of failure; the fewer, the better.
> For mapping paths, if we change gup to use sgl (we really can't make
> it swallow bvec), I don't think it will make whole lot of difference
> one way or the other. It might end up using slightly more cachelines,
> but that will be about it.
>
gup using sgl is out of the way; that was a stupid idea to try to
eliminate one more dynamic allocation.
> I do agree using bvec as internal page carrier would be nicer (sans
> having to implement stuff manually) but I'm not convinced whether
> it'll be worth the added code.
>
We can reach a point where there are no intermediate dynamic allocations
and no points of failure, except for the allocation of the final bio+biovec.
And that "implement stuff manually" will also enable internal bio
code cleanups, since the biovec then becomes that much smarter.
<snip>
>> And I have one more question. Are you sure kmalloc of
>> bigger-then-page buffers are safe?
>
> It has higher chance of failure but is definitely safe.
>
Much, much higher, not just 100% higher. And it will kill the
use of the block layer on nommu systems.
>> As I understood it, that tries to allocate physically contiguous
>> pages which degrades as time passes, and last time I tried this with
>> a kmem_cache (do to a bug) it crashed the kernel randomly after 2
>> minutes of use.
>
> Then, that's a bug.
Fixing the bug will not help; the allocation will fail and the IO will
not get through.
> Yeah, high order allocations fail more easily as
> time passes. But low order (1 maybe 2) allocations aren't too bad.
No, we are not talking about 1 or 2. Ever since Jens added scatterlist
chaining we can have lots and lots of them chained together. At the time
Jens said that bio has had the chaining option for a long time; only
scatterlists limit the size of the request.
> If it matters we can make bio_kmalloc() use vmalloc() for bvecs if it
> goes over PAGE_SIZE but again given that nobody reported the
No, that is a regression; vmalloc is an order of magnitude slower than
just plain BIO chaining.
> spurious
> GFP_DMA in bounce_gfp which triggers frenzy OOM killing spree for
> large SG_IOs, I don't think this matters all that much.
>
This BUG was only for HW, i.e. ata. Stuff like iscsi, FB, and sas did
fine.
The fact of the matter is that people, me included, have been running
systems with very large requests for a while now, large like 4096 pages,
which is 16 chained bios. Do you want to allocate that with vmalloc?
> Thanks.
>
I love your cleanups, and your courage, which I don't have.
I want to help in any way I can. If you need anything, just tell me what,
and I'll be glad to try or test it. This is fun; I like to work on and
think about these things.
Thanks
Boaz
On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: cleanup
>
> bio confusingly uses @write_to_vm and @reading for data directions,
> both of which mean the opposite of the usual block/bio convention of
> using READ and WRITE w.r.t. IO devices. The only place where the
> inversion is necessary is when caling get_user_pages_fast() in
> bio_copy_user_iov() as the gup uses the VM convention of read/write
> w.r.t. VM.
>
> This patch converts all bio functions to use READ/WRITE rw parameter
> and let the one place where inversion is necessary to rw == READ.
>
Hi, one more nitpick, just if you are at it. If you want,
I can do the work and send it to you so you can squash it into this patch.
See below.
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> block/blk-map.c | 10 +++++-----
> fs/bio.c | 50 +++++++++++++++++++++++++-------------------------
> include/linux/bio.h | 18 +++++++++---------
> 3 files changed, 39 insertions(+), 39 deletions(-)
>
> diff --git a/block/blk-map.c b/block/blk-map.c
> index b0b65ef..29aa60d 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -68,15 +68,15 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
> int iov_count, unsigned int len, gfp_t gfp_mask)
> {
> struct bio *bio = ERR_PTR(-EINVAL);
> - int read = rq_data_dir(rq) == READ;
> + int rw = rq_data_dir(rq);
>
> if (!iov || iov_count <= 0)
> return -EINVAL;
>
> if (!map_data)
> - bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
> + bio = bio_map_user_iov(q, NULL, iov, iov_count, rw, gfp_mask);
> if (bio == ERR_PTR(-EINVAL))
> - bio = bio_copy_user_iov(q, map_data, iov, iov_count, read,
> + bio = bio_copy_user_iov(q, map_data, iov, iov_count, rw,
> gfp_mask);
> if (IS_ERR(bio))
> return PTR_ERR(bio);
> @@ -177,7 +177,7 @@ EXPORT_SYMBOL(blk_rq_unmap_user);
> int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
> unsigned int len, gfp_t gfp_mask)
> {
> - int reading = rq_data_dir(rq) == READ;
> + int rw = rq_data_dir(rq);
> int do_copy = 0;
> struct bio *bio;
>
> @@ -188,7 +188,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
>
> do_copy = !blk_rq_aligned(q, kbuf, len) || object_is_on_stack(kbuf);
> if (do_copy)
> - bio = bio_copy_kern(q, kbuf, len, gfp_mask, reading);
> + bio = bio_copy_kern(q, kbuf, len, gfp_mask, rw);
> else
> bio = bio_map_kern(q, kbuf, len, gfp_mask);
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 80f61ed..70e5153 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -780,7 +780,7 @@ int bio_uncopy_user(struct bio *bio)
> * @map_data: pointer to the rq_map_data holding pages (if necessary)
> * @iov: the iovec.
> * @iov_count: number of elements in the iovec
> - * @write_to_vm: bool indicating writing to pages or not
> + * @rw: READ or WRITE
> * @gfp_mask: memory allocation flags
> *
> * Prepares and returns a bio for indirect user io, bouncing data
> @@ -789,8 +789,8 @@ int bio_uncopy_user(struct bio *bio)
> */
> struct bio *bio_copy_user_iov(struct request_queue *q,
> struct rq_map_data *map_data,
> - struct sg_iovec *iov, int iov_count,
> - int write_to_vm, gfp_t gfp_mask)
> + struct sg_iovec *iov, int iov_count, int rw,
> + gfp_t gfp_mask)
> {
> struct bio_map_data *bmd;
> struct bio_vec *bvec;
> @@ -823,7 +823,8 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
> if (!bio)
> goto out_bmd;
>
> - bio->bi_rw |= (!write_to_vm << BIO_RW);
> + if (rw == WRITE)
> + bio->bi_rw |= 1 << BIO_RW;
Can we please have an inline that does that, like bio_set_dir(), and
change all users? You will be surprised how many there are.
It gives me a heart attack every time I have to write yet another
one.
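E.g. (the helper does not exist yet; the name is just my suggestion):

        static inline void bio_set_dir(struct bio *bio, int rw)
        {
                if (rw == WRITE)
                        bio->bi_rw |= (1 << BIO_RW);
        }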
>
> ret = 0;
>
> @@ -872,7 +873,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
> */
> if (unlikely(map_data && map_data->null_mapped))
> bio->bi_flags |= (1 << BIO_NULL_MAPPED);
> - else if (!write_to_vm) {
> + else if (rw == WRITE) {
> ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 0);
> if (ret)
> goto cleanup;
> @@ -897,7 +898,7 @@ out_bmd:
> * @map_data: pointer to the rq_map_data holding pages (if necessary)
> * @uaddr: start of user address
> * @len: length in bytes
> - * @write_to_vm: bool indicating writing to pages or not
> + * @rw: READ or WRITE
> * @gfp_mask: memory allocation flags
> *
> * Prepares and returns a bio for indirect user io, bouncing data
> @@ -905,21 +906,21 @@ out_bmd:
> * call bio_uncopy_user() on io completion.
> */
> struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
> - unsigned long uaddr, unsigned int len,
> - int write_to_vm, gfp_t gfp_mask)
> + unsigned long uaddr, unsigned int len, int rw,
> + gfp_t gfp_mask)
> {
> struct sg_iovec iov;
>
> iov.iov_base = (void __user *)uaddr;
> iov.iov_len = len;
>
> - return bio_copy_user_iov(q, map_data, &iov, 1, write_to_vm, gfp_mask);
> + return bio_copy_user_iov(q, map_data, &iov, 1, rw, gfp_mask);
> }
>
> static struct bio *__bio_map_user_iov(struct request_queue *q,
> struct block_device *bdev,
> struct sg_iovec *iov, int iov_count,
> - int write_to_vm, gfp_t gfp_mask)
> + int rw, gfp_t gfp_mask)
> {
> int i, j;
> size_t tot_len = 0;
> @@ -967,8 +968,8 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
> const int local_nr_pages = end - start;
> const int page_limit = cur_page + local_nr_pages;
>
> - ret = get_user_pages_fast(uaddr, local_nr_pages,
> - write_to_vm, &pages[cur_page]);
> + ret = get_user_pages_fast(uaddr, local_nr_pages, rw == READ,
> + &pages[cur_page]);
> if (ret < local_nr_pages) {
> ret = -EFAULT;
> goto out_unmap;
> @@ -1008,7 +1009,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
> /*
> * set data direction, and check if mapped pages need bouncing
> */
> - if (!write_to_vm)
> + if (rw == WRITE)
> bio->bi_rw |= (1 << BIO_RW);
Same here.
>
> bio->bi_bdev = bdev;
> @@ -1033,14 +1034,14 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
> * @bdev: destination block device
> * @uaddr: start of user address
> * @len: length in bytes
> - * @write_to_vm: bool indicating writing to pages or not
> + * @rw: READ or WRITE
> * @gfp_mask: memory allocation flags
> *
> * Map the user space address into a bio suitable for io to a block
> * device. Returns an error pointer in case of error.
> */
> struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> - unsigned long uaddr, unsigned int len, int write_to_vm,
> + unsigned long uaddr, unsigned int len, int rw,
> gfp_t gfp_mask)
> {
> struct sg_iovec iov;
> @@ -1048,7 +1049,7 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> iov.iov_base = (void __user *)uaddr;
> iov.iov_len = len;
>
> - return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm, gfp_mask);
> + return bio_map_user_iov(q, bdev, &iov, 1, rw, gfp_mask);
> }
>
> /**
> @@ -1057,20 +1058,19 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> * @bdev: destination block device
> * @iov: the iovec.
> * @iov_count: number of elements in the iovec
> - * @write_to_vm: bool indicating writing to pages or not
> + * @rw: READ or WRITE
> * @gfp_mask: memory allocation flags
> *
> * Map the user space address into a bio suitable for io to a block
> * device. Returns an error pointer in case of error.
> */
> struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> - struct sg_iovec *iov, int iov_count,
> - int write_to_vm, gfp_t gfp_mask)
> + struct sg_iovec *iov, int iov_count, int rw,
> + gfp_t gfp_mask)
> {
> struct bio *bio;
>
> - bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm,
> - gfp_mask);
> + bio = __bio_map_user_iov(q, bdev, iov, iov_count, rw, gfp_mask);
> if (IS_ERR(bio))
> return bio;
>
> @@ -1219,23 +1219,23 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
> * @data: pointer to buffer to copy
> * @len: length in bytes
> * @gfp_mask: allocation flags for bio and page allocation
> - * @reading: data direction is READ
> + * @rw: READ or WRITE
> *
> * copy the kernel address into a bio suitable for io to a block
> * device. Returns an error pointer in case of error.
> */
> struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
> - gfp_t gfp_mask, int reading)
> + gfp_t gfp_mask, int rw)
> {
> struct bio *bio;
> struct bio_vec *bvec;
> int i;
>
> - bio = bio_copy_user(q, NULL, (unsigned long)data, len, 1, gfp_mask);
> + bio = bio_copy_user(q, NULL, (unsigned long)data, len, READ, gfp_mask);
> if (IS_ERR(bio))
> return bio;
>
> - if (!reading) {
> + if (rw == WRITE) {
> void *p = data;
>
> bio_for_each_segment(bvec, bio, i) {
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 4bf7442..45f56d2 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -387,24 +387,24 @@ int bio_get_nr_vecs(struct block_device *bdev);
> sector_t bio_sector_offset(struct bio *bio, unsigned short index,
> unsigned int offset);
> struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
> - unsigned long uaddr, unsigned int len,
> - int write_to_vm, gfp_t gfp_mask);
> + unsigned long uaddr, unsigned int len, int rw,
> + gfp_t gfp_mask);
> struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
> - struct sg_iovec *iov, int iov_count,
> - int write_to_vm, gfp_t gfp_mask);
> + struct sg_iovec *iov, int iov_count, int rw,
> + gfp_t gfp_mask);
> void bio_unmap_user(struct bio *bio);
> struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
> - unsigned long uaddr, unsigned int len,
> - int write_to_vm, gfp_t gfp_mask);
> + unsigned long uaddr, unsigned int len, int rw,
> + gfp_t gfp_mask);
> struct bio *bio_copy_user_iov(struct request_queue *q,
> struct rq_map_data *map_data,
> - struct sg_iovec *iov, int iov_count,
> - int write_to_vm, gfp_t gfp_mask);
> + struct sg_iovec *iov, int iov_count, int rw,
> + gfp_t gfp_mask);
> int bio_uncopy_user(struct bio *bio);
> struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
> gfp_t gfp_mask);
> struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
> - gfp_t gfp_mask, int reading);
> + gfp_t gfp_mask, int rw);
> void bio_set_pages_dirty(struct bio *bio);
> void bio_check_pages_dirty(struct bio *bio);
> void zero_fill_bio(struct bio *bio);
Boaz
Hello, Boaz.
Boaz Harrosh wrote:
>> Yeah, we can switch to dealing with bvecs instead of sgls. For kernel
>> mappings, it wouldn't make any difference. For user single mappings,
>> it'll remove an intermediate step. For user sg mappings, if we make
>> gup use sgl, it won't make much difference.
>
> "gup use sgl" was just a stupid suggestion to try eliminate one more
> allocation. Realy Realy stupid.
>
> for gup there is a much better, safer, way. Just allocate on the stack
> a constant size say 64 pointers and add an inner-loop on the 64 pages.
> The extra code loop is much cheaper then an allocation and is one less
> point of failure.
Great.
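Something like the following is what I read that as; just an
illustrative sketch, where pin_user_range(), add_page() and the chunk
size are made-up names and only get_user_pages_fast() is a real API:

#include <linux/kernel.h>
#include <linux/mm.h>

#define GUP_CHUNK	64	/* constant-size, on-stack page array */

/* purely illustrative; names and callback are hypothetical */
static int pin_user_range(unsigned long uaddr, int nr_pages, int write,
			  void (*add_page)(struct page *page, void *priv),
			  void *priv)
{
	struct page *pages[GUP_CHUNK];
	int done = 0;

	while (done < nr_pages) {
		int chunk = min(nr_pages - done, GUP_CHUNK);
		int ret, i;

		ret = get_user_pages_fast(uaddr + done * PAGE_SIZE,
					  chunk, write, pages);
		if (ret <= 0)
			return done ? done : ret;

		/* inner loop over the pages pinned in this chunk */
		for (i = 0; i < ret; i++)
			add_page(pages[i], priv);

		done += ret;
		if (ret < chunk)	/* partial pin, stop here */
			break;
	}
	return done;
}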
> So now that we got gup out of the way, will you go directly to bvecs?
As said before, I don't think exposing bio at the driver interface is
a good idea, copy or not. For the internal implementation, if there's
enough benefit, yeah.
>> Frankly, for copy paths, I don't think all this matters at all. If
>> each request is small, other overheads will dominate. For large
>> requests, copying to and from sgl isn't something worth worrying
>> about.
>
> It is. Again, I tested that. Each new allocation in the order of
> the number of pages in a request adds 3-5% on a RAM I/O
Sure, if you do things against RAM, anything will show up. Do you
have something more realistic?
> And it adds a point of failure. The less the better.
That is not the only measure. Things are always traded off using
multiple evaluation metrics. The same goes with the _NO_ allocation
thing.
>> For mapping paths, if we change gup to use sgl (we really can't make
>> it swallow bvec), I don't think it will make whole lot of difference
>> one way or the other. It might end up using slightly more cachelines,
>> but that will be about it.
>
> gup to use sgl, is out of the way, that was stupid idea to try eliminate
> one more dynamic allocation.
Yeah, I do agree gup is much better off with a pages array.
>> I do agree using bvec as internal page carrier would be nicer (sans
>> having to implement stuff manually) but I'm not convinced whether
>> it'll be worth the added code.
>
> We can reach a point where there is no intermediate dynamic alloctions
> and no point of failures, but for the allocation of the final bio+biovec.
There is no need to obsess about no point of failure. Obsessing about
a single evaluation metric is a nice and fast way toward a mess.
> And that "implement stuff manually" will also enable internal bio
> code cleanups since now biovec is that much smarter.
>
> <snip>
>>> And I have one more question. Are you sure kmalloc of
>>> bigger-then-page buffers are safe?
>> It has higher chance of failure but is definitely safe.
>
> Much much higher, not just 100% higher. And it will kill the
> use of block-layer in nommu systems.
I didn't know nommu would kill high-order allocations, or were you
talking about vmalloc?
>>> As I understood it, that tries to allocate physically contiguous
>>> pages which degrades as time passes, and last time I tried this with
>>> a kmem_cache (do to a bug) it crashed the kernel randomly after 2
>>> minutes of use.
>> Then, that's a bug.
>
> Fixing the bug will not help, the allocation will fail and the IO will
> not get through.
Under enough memory pressure, non-fs IOs are gonna fail somewhere
along the line. If you don't like that, use mempool-backed allocations
or preallocated data structures. Also, I believe it is generally
considered better to fail IO than to oops. :-P
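To illustrate what "mempool-backed allocation" means here, a minimal
sketch (the pool, object size and helper names are made up; only
mempool_create_kmalloc_pool(), mempool_alloc() and mempool_free() are
real APIs):

#include <linux/init.h>
#include <linux/mempool.h>

static mempool_t *sense_pool;	/* hypothetical pool of small IO buffers */

static int __init sense_pool_init(void)
{
	/* keep at least 4 objects of 96 bytes in reserve */
	sense_pool = mempool_create_kmalloc_pool(4, 96);
	return sense_pool ? 0 : -ENOMEM;
}

static void *sense_alloc(void)
{
	/*
	 * With a waitable mask like GFP_NOIO this waits for an element
	 * to come back to the pool instead of failing under pressure.
	 */
	return mempool_alloc(sense_pool, GFP_NOIO);
}

static void sense_free(void *p)
{
	mempool_free(p, sense_pool);
}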
>> Yeah, high order allocations fail more easily as time passes. But
>> low order (1 maybe 2) allocations aren't too bad.
>
> No, we are not talking about 1 or 2. Today since Jens put the scatterlist
> chaining we can have lots and lots of them chained together. At the time
> Jens said that bio had the chaining option for a long time, only scatterlists
> limit the size of the request.
Alright, let's then chain bios, but let's please do it inside bio
proper.
>> If it matters we can make bio_kmalloc() use vmalloc() for bvecs if it
>> goes over PAGE_SIZE but again given that nobody reported the
>
> No that is a regression, vmaloc is an order of a magnitude slower then
> just plain BIO chaining
It all depends on how frequent those multiple allocations will be.
> spurious
>> GFP_DMA in bounce_gfp which triggers frenzy OOM killing spree for
>> large SG_IOs, I don't think this matters all that much.
>>
>
> This BUG was only for HW, ie ata. But for stuff like iscsi FB sas
> they did fine.
The bug was for all controllers which couldn't do 64bit DMA on 64bit
machines.
> The fact of the matter is that people, me included, run in systems
> with very large request for a while now, large like 4096 pages which
> is 16 chained bios. Do you want to allocate that with vmalloc?
The patchset was written the way it's written because I couldn't see
any pressing performance requirement for the current existing users.
Optimizing for SG_IO against memory is simply not a good idea.
If you have in-kernel users which often require large transfers, sure,
chaining bios would be better. Still, let's do it _inside_ bio. Would
your use case be happy with something like blk_rq_map_kern_bvec(rq,
bvec) which can append to rq? If we convert the internal
implementation over to bvec, blk_rq_map_kern_sg() can be an sg->bvec
converting wrapper around blk_rq_map_kern_bvec().
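To make the idea concrete, a rough sketch of such a wrapper could look
like the following; blk_rq_map_kern_bvec() is only the hypothetical
interface floated above, not an existing API, and the details are
guesses:

#include <linux/blkdev.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* hypothetical, per the suggestion above */
int blk_rq_map_kern_bvec(struct request_queue *q, struct request *rq,
			 struct bio_vec *bvec, int nr_vecs, gfp_t gfp_mask);

/* sg->bvec converting wrapper, sketch only */
int blk_rq_map_kern_sg(struct request_queue *q, struct request *rq,
		       struct scatterlist *sgl, int nents, gfp_t gfp_mask)
{
	struct bio_vec *bvec;
	struct scatterlist *sg;
	int i, ret;

	bvec = kmalloc(nents * sizeof(*bvec), gfp_mask);
	if (!bvec)
		return -ENOMEM;

	/* copy page/len/offset of each sg entry into a bvec entry */
	for_each_sg(sgl, sg, nents, i) {
		bvec[i].bv_page   = sg_page(sg);
		bvec[i].bv_len    = sg->length;
		bvec[i].bv_offset = sg->offset;
	}

	ret = blk_rq_map_kern_bvec(q, rq, bvec, nents, gfp_mask);
	kfree(bvec);
	return ret;
}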
> I love your cleanups, and your courage, which I don't have.
Thanks a lot but it's not like the code and Jens are scary monsters I
need to slay, so I don't think I'm being particularly courageous. :-)
> I want to help in any way I can. If you need just tell me what, and
> I'll be glad to try or test anything. This is fun I like to work and
> think on these things
Great, your reviews and comments are very helpful, so please keep them
coming. :-)
Jens, Tomo, what do you guys think?
Thanks.
--
tejun
Boaz Harrosh wrote:
> On 04/01/2009 04:44 PM, Tejun Heo wrote:
>> Impact: cleanup
>>
>> bio confusingly uses @write_to_vm and @reading for data directions,
>> both of which mean the opposite of the usual block/bio convention of
>> using READ and WRITE w.r.t. IO devices. The only place where the
>> inversion is necessary is when caling get_user_pages_fast() in
>> bio_copy_user_iov() as the gup uses the VM convention of read/write
>> w.r.t. VM.
>>
>> This patch converts all bio functions to use READ/WRITE rw parameter
>> and let the one place where inversion is necessary to rw == READ.
>>
>
> Hi one more nit picking just if you are at it. If you want
> I can do the work and send it to you so you can squash it into this patch.
> See bellow
>
>> + if (rw == WRITE)
>> + bio->bi_rw |= 1 << BIO_RW;
>
> can we pleas have an inline that does that? Like bio_set_dir()?
> and change all users. You will be surprised how many there are.
>
> It gives me an hart attack every time I have to write yet another
> one.
Things like this are actually what I'm trying to clean up. If you
look at the whole series, at the end there remains only one place
which sets the flag in the blk/bio map paths, and all users of the
blk_rq_map_*() interface couldn't care less about the flag because
they provide only the necessary information through a strictly defined
API.
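(For reference, the helper you're asking for could be as small as the
sketch below; bio_set_dir() is not an existing function, it just wraps
the open-coded snippet quoted above.)

/* hypothetical helper, mirroring the quoted open-coded snippet */
static inline void bio_set_dir(struct bio *bio, int rw)
{
	if (rw == WRITE)
		bio->bi_rw |= 1 << BIO_RW;	/* mark the bio as a write */
}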
Thanks.
--
tejun
On 04/02/2009 12:02 PM, Tejun Heo wrote:
> Boaz Harrosh wrote:
>> On 04/01/2009 04:44 PM, Tejun Heo wrote:
>>> Impact: cleanup
>>>
>>> bio confusingly uses @write_to_vm and @reading for data directions,
>>> both of which mean the opposite of the usual block/bio convention of
>>> using READ and WRITE w.r.t. IO devices. The only place where the
>>> inversion is necessary is when caling get_user_pages_fast() in
>>> bio_copy_user_iov() as the gup uses the VM convention of read/write
>>> w.r.t. VM.
>>>
>>> This patch converts all bio functions to use READ/WRITE rw parameter
>>> and let the one place where inversion is necessary to rw == READ.
>>>
>> Hi one more nit picking just if you are at it. If you want
>> I can do the work and send it to you so you can squash it into this patch.
>> See bellow
>>
>>> + if (rw == WRITE)
>>> + bio->bi_rw |= 1 << BIO_RW;
>> can we pleas have an inline that does that? Like bio_set_dir()?
>> and change all users. You will be surprised how many there are.
>>
>> It gives me an hart attack every time I have to write yet another
>> one.
>
> Things like this are actually what I'm trying to clean up. If you
> look at the whole series, at the end, there remains only one place
> which sets the flag in the blk/bio map paths and all users of
> blk_rq_map_*() interface couldn't care less about the flag because
> they provide only the necessary information through strictly defined
> API.
>
Right, I know, and it's great.
I'm saying that all these other block-layer places, and all the places
above block like stacking drivers and filesystems that use
generic_make_request() and friends, could be cleaned up as well while
at it, with regard to this small thingy.
> Thanks.
>
I'll do it, no problems. Would you push it with your patchset?
Thanks
Boaz
Boaz Harrosh wrote:
> I'll do it, no problems. Would you push it with your patchset?
Sure thing.
--
tejun
On 04/02/2009 11:59 AM, Tejun Heo wrote:
> Hello, Boaz.
>
Hi Tejun
> Boaz Harrosh wrote:
>> So now that we got gup out of the way, will you go directly to bvecs?
>
> As said before, I don't think exposing bio at the driver interface is
> a good idea copy or not. For internal implementation, if there's
> enough benefit, yeah.
>
I want to draw your attention to something you might be forgetting;
sorry for not making it clearer before.
There are no users of blk_rq_map_kern_sgl(), and no need for any such
driver interface.
This is because there is a pending patchset done by TOMO, which is
in its 4th incarnation by now, that removes the very last users
of scsi_execute_async() and its use of sgl, together with all that
mess in scsi_lib.c. Removing scsi_execute_async() will remove the only
potential user of blk_rq_map_kern_sgl(), so it can be killed at birth.
The patchset was tested and debugged by all scsi ULD maintainers
and is in a very stable state. I hope it can make it into 2.6.30,
but it will certainly make it into 2.6.31, which is the time frame
we are talking about.
James, any chance this patchset can make 2.6.30?
Please note that the important historical users of scsi_execute_async(),
sg.c and sr.c, already don't use it as of 2.6.28-29, and the last bugs
of that change were ironed out.
Are there currently any other users of blk_rq_map_kern_sgl() in your patchset?
>>> Frankly, for copy paths, I don't think all this matters at all. If
>>> each request is small, other overheads will dominate. For large
>>> requests, copying to and from sgl isn't something worth worrying
>>> about.
>> It is. Again, I tested that. Each new allocation in the order of
>> the number of pages in a request adds 3-5% on a RAM I/O
>
> Sure, if you do things against ram, anything will show up. Do you
> have something more realistic?
>
Reads from some SSDs/flash are just as fast as RAM. Battery-backed RAM
is very common. Pete Wyckoff reached speeds over RDMA which come close
to RAM.
>> And it adds a point of failure. The less the better.
>
> That is not the only measure. Things are always traded off using
> multiple evaluation metrics. The same goes with the _NO_ allocation
> thing.
>
>>> I do agree using bvec as internal page carrier would be nicer (sans
>>> having to implement stuff manually) but I'm not convinced whether
>>> it'll be worth the added code.
>> We can reach a point where there is no intermediate dynamic alloctions
>> and no point of failures, but for the allocation of the final bio+biovec.
>
> There is no need to obsess about no point of failure. Obsessing about
> single evaluation metric is a nice and fast way toward a mess.
>
I agree completely. "No mess" - first priority; "no point of failure"
- a distant second; "lots of work" - not a consideration.
Is there no way we can reach both?
>> And that "implement stuff manually" will also enable internal bio
>> code cleanups since now biovec is that much smarter.
>>
>> <snip>
>
> Under enough memory pressure, Non-fs IOs are gonna fail somewhere
> along the line. If you don't like that, use mempool backed allocation
> or preallocated data structures.
bios and bvecs do that, no?
> Also, I believe it is generally
> considered better to fail IO than oopsing. :-P
>
I agree, exactly my point; eliminating any extra allocation is the
best way to do that.
>>> Yeah, high order allocations fail more easily as time passes. But
>>> low order (1 maybe 2) allocations aren't too bad.
>> No, we are not talking about 1 or 2. Today since Jens put the scatterlist
>> chaining we can have lots and lots of them chained together. At the time
>> Jens said that bio had the chaining option for a long time, only scatterlists
>> limit the size of the request.
>
> Alright, let's than chain bios but let's please do it inside bio
> proper.
>
Yes, I completely agree. This can/should be internal bio business.
Doing it this way will also add robustness for other users of bio
like filesystems, raid engines and stacking drivers.
Thanks, that would be ideal.
>> If it matters we can make bio_kmalloc() use vmalloc() for bvecs if it
>>> goes over PAGE_SIZE but again given that nobody reported the
>> No that is a regression, vmaloc is an order of a magnitude slower then
>> just plain BIO chaining
>
> It all depends on how frequent those multiple allocations will be.
>
When they come, they come in groups. I'm talking about a steady stream
of these that goes on for minutes. Think of any streaming application
like backup, a plain cp of very large files, video preprocessing, ...
I know, at Panasas this is the only kind of workload we have.
>> spurious
>>> GFP_DMA in bounce_gfp which triggers frenzy OOM killing spree for
>>> large SG_IOs, I don't think this matters all that much.
>>>
>> This BUG was only for HW, ie ata. But for stuff like iscsi FB sas
>> they did fine.
>
> The bug was for all controllers which couldn't do 64bit DMA on 64bit
> machines.
>
>> The fact of the matter is that people, me included, run in systems
>> with very large request for a while now, large like 4096 pages which
>> is 16 chained bios. Do you want to allocate that with vmalloc?
>
> The patchset was written the way it's written because I couldn't see
> any pressing performance requirement for the current existing users.
> Optimizing for SG_IO against memory is simply not a good idea.
>
> If you have in-kernel users which often require large transfers, sure,
> chaining bio would be better. Still, let's do it _inside_ bio. Would
> your use case be happy with something like blk_rq_map_kern_bvec(rq,
> bvec) which can append to rq? If we convert internal implementation
> over to bvec, blk_rq_map_kern_sg() can be sg->bvec converting wrapper
> around blk_rq_map_kern_bvec().
>
See above, I answered that. There is currently no need for
blk_rq_map_kern_bvec() or blk_rq_map_kern_sg(). OSD absolutely needs
the direct use of bios via the fs route, and the other users are going
away fast.
And yes bio chaining at the bio level is a dream come true
for me, and it drops 3 items off my todo list.
Thanks
>> I love your cleanups, and your courage, which I don't have.
>
> Thanks a lot but it's not like the code and Jens are scary monsters I
> need to slay, so I don't think I'm being particularly courageous. :-)
>
>> I want to help in any way I can. If you need just tell me what, and
>> I'll be glad to try or test anything. This is fun I like to work and
>> think on these things
>
> Great, your reviews are comments are very helpful, so please keep them
> coming. :-)
>
> Jens, Tomo, what do you guys think?
>
> Thanks.
>
Thank you
Boaz
On Wed, 01 Apr 2009 20:00:44 +0300
Boaz Harrosh <[email protected]> wrote:
> On 04/01/2009 04:44 PM, Tejun Heo wrote:
> > Impact: hack removal
> >
> > SCSI needs to map sgl into rq for kernel PC requests; however, block
> > API didn't have such feature so it used its own rq mapping function
> > which hooked into block/bio internals and is generally considered an
> > ugly hack. The private function may also produce requests which are
> > bigger than queue per-rq limits.
> >
> > Block blk_rq_map_kern_sgl(). Kill the private implementation and use
> > it.
> >
> > Signed-off-by: Tejun Heo <[email protected]>
>
> James, TOMO
>
> what happened to Tomo's patches that removes all this after fixing up
> all users (sg.c)?
>
> I thought that was agreed and done? What is left to do for that to go
> in.
I've converted all the users (sg, st, osst). Nothing is left. So we
don't need this.
FUJITA Tomonori wrote:
>> I thought that was agreed and done? What is left to do for that to go
>> in.
>
> I've converted all the users (sg, st, osst). Nothing is left. So we
> don't need this.
Yeah, pulled it. Okay, so we can postpone diddling with request
mapping for now. I'll re-post fixes only from this and the previous
patchset and proceed with other patchsets.
Thanks.
--
tejun
On Mon, 13 Apr 2009 18:38:23 +0900
Tejun Heo <[email protected]> wrote:
> FUJITA Tomonori wrote:
> >> I thought that was agreed and done? What is left to do for that to go
> >> in.
> >
> > I've converted all the users (sg, st, osst). Nothing is left. So we
> > don't need this.
>
> Yeah, pulled it. Okay, so we can postpone diddling with request
> mapping for now. I'll re-post fixes only from this and the previous
> patchset and proceed with other patchsets.
To be honest, I don't think that we can clean up the block
mapping. For example, blk_rq_map_kern_prealloc() in your patchset
doesn't look like a cleanup to me. It's just moving a hack from ide to
the block layer (well, I have to admit that I did the same thing when
I converted sg/st/osst...).
We can't have a good API with insane users. I don't want to ask this
publicly, but don't we have any chance to merge the old ide code and
libata?
Hi,
On Mon, Apr 13, 2009 at 07:07:58PM +0900, FUJITA Tomonori wrote:
> On Mon, 13 Apr 2009 18:38:23 +0900
> Tejun Heo <[email protected]> wrote:
>
> > FUJITA Tomonori wrote:
> > >> I thought that was agreed and done? What is left to do for that to go
> > >> in.
> > >
> > > I've converted all the users (sg, st, osst). Nothing is left. So we
> > > don't need this.
> >
> > Yeah, pulled it. Okay, so we can postpone diddling with request
> > mapping for now. I'll re-post fixes only from this and the previous
> > patchset and proceed with other patchsets.
>
> To be honest, I don't think that we can clean up the block
> mapping. For example, blk_rq_map_kern_prealloc() in your patchset
> doesn't look cleanup to me. It's just moving a hack from ide to the
> block (well, I have to admit that I did the same thing when I
> converted sg/st/osst...).
Well, since blk_rq_map_kern_prealloc() is going to be used only in
ide-cd (the driver needs it to queue a sense request from within the
irq handler) and since it is considered a hack, I could try to move it
out of the irq handler and make do with only blk_rq_map_kern(), if
that is a more agreeable solution?
--
Regards/Gruss,
Boris.
On Mon, 13 Apr 2009 14:59:12 +0200
Borislav Petkov <[email protected]> wrote:
> Hi,
>
> On Mon, Apr 13, 2009 at 07:07:58PM +0900, FUJITA Tomonori wrote:
> > On Mon, 13 Apr 2009 18:38:23 +0900
> > Tejun Heo <[email protected]> wrote:
> >
> > > FUJITA Tomonori wrote:
> > > >> I thought that was agreed and done? What is left to do for that to go
> > > >> in.
> > > >
> > > > I've converted all the users (sg, st, osst). Nothing is left. So we
> > > > don't need this.
> > >
> > > Yeah, pulled it. Okay, so we can postpone diddling with request
> > > mapping for now. I'll re-post fixes only from this and the previous
> > > patchset and proceed with other patchsets.
> >
> > To be honest, I don't think that we can clean up the block
> > mapping. For example, blk_rq_map_kern_prealloc() in your patchset
> > doesn't look cleanup to me. It's just moving a hack from ide to the
> > block (well, I have to admit that I did the same thing when I
> > converted sg/st/osst...).
>
> Well, since blk_rq_map_kern_prealloc() is going to be used only in
> ide-cd (driver needs it to queue a sense request from within the irq
> handler) and since it is considered a hack I could try to move it out of
> the irq handler
It would be quite nice if you could do that.
But I think that ide-atapi also needs blk_rq_map_kern_prealloc().
> and do away only with blk_rq_map_kern() if that is more
> of an agreeable solution?
Yeah, blk_rq_map_kern() is much more proper.
(adding Bart to CC)
On Tue, Apr 14, 2009 at 09:44:31AM +0900, FUJITA Tomonori wrote:
> On Mon, 13 Apr 2009 14:59:12 +0200
> Borislav Petkov <[email protected]> wrote:
>
> > Hi,
> >
> > On Mon, Apr 13, 2009 at 07:07:58PM +0900, FUJITA Tomonori wrote:
> > > On Mon, 13 Apr 2009 18:38:23 +0900
> > > Tejun Heo <[email protected]> wrote:
> > >
> > > > FUJITA Tomonori wrote:
> > > > >> I thought that was agreed and done? What is left to do for that to go
> > > > >> in.
> > > > >
> > > > > I've converted all the users (sg, st, osst). Nothing is left. So we
> > > > > don't need this.
> > > >
> > > > Yeah, pulled it. Okay, so we can postpone diddling with request
> > > > mapping for now. I'll re-post fixes only from this and the previous
> > > > patchset and proceed with other patchsets.
> > >
> > > To be honest, I don't think that we can clean up the block
> > > mapping. For example, blk_rq_map_kern_prealloc() in your patchset
> > > doesn't look cleanup to me. It's just moving a hack from ide to the
> > > block (well, I have to admit that I did the same thing when I
> > > converted sg/st/osst...).
> >
> > Well, since blk_rq_map_kern_prealloc() is going to be used only in
> > ide-cd (driver needs it to queue a sense request from within the irq
> > handler) and since it is considered a hack I could try to move it out of
> > the irq handler
>
> It's quite nice if you could do that.
>
> But I think that ide-atapi also needs blk_rq_map_kern_prealloc().
I'll look into that too.
> > and do away only with blk_rq_map_kern() if that is more
> > of an agreeable solution?
>
> Yeah, blk_rq_map_kern() is much proper.
How's that for starters? I know it is rough, but it seems to work
according to my initial testing.
Basically, I opted for preallocating a sense request in the ->do_request
routine and doing that only on demand, i.e. I reinitialize it only if it
got used in the irq handler. So in case you want to shove a sense rq in
front of the queue, you simply use the already prepared one. Then, in the
irq handler, it is finished the usual way (blk_end_request). Next time
around in ->do_request, you reallocate it since it got consumed in the
last round.
The good thing is that now I don't need all those static block layer
structs in the driver (bio, bio_vec, etc.) and can do the preferred
dynamic allocation instead.
The patch is on top of Tejun's series at
http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=ide-phase1
with some small modifications in commit 15783b1443f810ae72cb5ccb3a3a3ccc3aeb8729
wrt proper sense buffer length.
I'm sure there's enough room for improvement, so please let me know if
you have any objections/comments.
---
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 35d0973..e689494 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -206,32 +206,21 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
ide_cd_log_error(drive->name, failed_command, sense);
}
-static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
- struct request *failed_command)
+static struct request *ide_cd_prep_sense(ide_drive_t *drive)
{
struct cdrom_info *info = drive->driver_data;
+ void *sense = &info->sense_data;
struct request *rq = &drive->request_sense_rq;
- struct bio *bio = &drive->request_sense_bio;
- struct bio_vec *bvec = drive->request_sense_bvec;
- unsigned int bvec_len = ARRAY_SIZE(drive->request_sense_bvec);
- unsigned sense_len = 18;
- int error;
+ unsigned sense_len = sizeof(struct request_sense);
ide_debug_log(IDE_DBG_SENSE, "enter");
- if (sense == NULL) {
- sense = &info->sense_data;
- sense_len = sizeof(struct request_sense);
- }
-
memset(sense, 0, sense_len);
- /* stuff the sense request in front of our current request */
blk_rq_init(NULL, rq);
- error = blk_rq_map_kern_prealloc(drive->queue, rq, bio, bvec, bvec_len,
- sense, sense_len, true);
- BUG_ON(error);
+ if (blk_rq_map_kern(drive->queue, rq, sense, sense_len, __GFP_WAIT))
+ return NULL;
rq->rq_disk = info->disk;
@@ -241,18 +230,17 @@ static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
rq->cmd_type = REQ_TYPE_SENSE;
rq->cmd_flags |= REQ_PREEMPT;
- /* NOTE! Save the failed command in "rq->special" */
- rq->special = (void *)failed_command;
-
- if (failed_command)
- ide_debug_log(IDE_DBG_SENSE, "failed_cmd: 0x%x",
- failed_command->cmd[0]);
+ return rq;
+}
- drive->hwif->rq = NULL;
+static void ide_cd_queue_rq_sense(ide_drive_t *drive)
+{
+ BUG_ON(!drive->rq_sense);
- elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
+ elv_add_request(drive->queue, drive->rq_sense, ELEVATOR_INSERT_FRONT, 0);
}
+
static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
{
/*
@@ -440,7 +428,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
/* if we got a CHECK_CONDITION status, queue a request sense command */
if (stat & ATA_ERR)
- cdrom_queue_request_sense(drive, NULL, NULL);
+ ide_cd_queue_rq_sense(drive);
return 1;
end_request:
@@ -454,7 +442,7 @@ end_request:
hwif->rq = NULL;
- cdrom_queue_request_sense(drive, rq->sense, rq);
+ ide_cd_queue_rq_sense(drive);
return 1;
} else
return 2;
@@ -788,6 +776,10 @@ out_end:
ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
+ /* our sense buffer got used, reset it the next time around. */
+ if (sense)
+ drive->rq_sense = NULL;
+
if (sense && rc == 2)
ide_error(drive, "request sense failure", stat);
}
@@ -901,6 +893,25 @@ static ide_startstop_t ide_cd_do_request(ide_drive_t *drive, struct request *rq,
goto out_end;
}
+ /*
+ * prepare request sense if it got used with the last rq
+ */
+ if (!drive->rq_sense) {
+ drive->rq_sense = ide_cd_prep_sense(drive);
+ if (!drive->rq_sense) {
+ printk(KERN_ERR "%s: error prepping sense request!\n",
+ drive->name);
+ return ide_stopped;
+ }
+ }
+
+ /*
+ * save the current request in case we'll be queueing a sense rq
+ * afterwards due to its potential failure.
+ */
+ if (!blk_sense_request(rq))
+ drive->rq_sense->special = (void *)rq;
+
memset(&cmd, 0, sizeof(cmd));
if (rq_data_dir(rq))
diff --git a/include/linux/ide.h b/include/linux/ide.h
index c942533..4c2d310 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -605,6 +605,9 @@ struct ide_drive_s {
struct request request_sense_rq;
struct bio request_sense_bio;
struct bio_vec request_sense_bvec[2];
+
+ /* current sense rq */
+ struct request *rq_sense;
};
typedef struct ide_drive_s ide_drive_t;
--
Regards/Gruss,
Boris.
On Tue, 14 Apr 2009 12:01:31 +0200
Borislav Petkov <[email protected]> wrote:
> (adding Bart to CC)
>
> On Tue, Apr 14, 2009 at 09:44:31AM +0900, FUJITA Tomonori wrote:
> > On Mon, 13 Apr 2009 14:59:12 +0200
> > Borislav Petkov <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > On Mon, Apr 13, 2009 at 07:07:58PM +0900, FUJITA Tomonori wrote:
> > > > On Mon, 13 Apr 2009 18:38:23 +0900
> > > > Tejun Heo <[email protected]> wrote:
> > > >
> > > > > FUJITA Tomonori wrote:
> > > > > >> I thought that was agreed and done? What is left to do for that to go
> > > > > >> in.
> > > > > >
> > > > > > I've converted all the users (sg, st, osst). Nothing is left. So we
> > > > > > don't need this.
> > > > >
> > > > > Yeah, pulled it. Okay, so we can postpone diddling with request
> > > > > mapping for now. I'll re-post fixes only from this and the previous
> > > > > patchset and proceed with other patchsets.
> > > >
> > > > To be honest, I don't think that we can clean up the block
> > > > mapping. For example, blk_rq_map_kern_prealloc() in your patchset
> > > > doesn't look cleanup to me. It's just moving a hack from ide to the
> > > > block (well, I have to admit that I did the same thing when I
> > > > converted sg/st/osst...).
> > >
> > > Well, since blk_rq_map_kern_prealloc() is going to be used only in
> > > ide-cd (driver needs it to queue a sense request from within the irq
> > > handler) and since it is considered a hack I could try to move it out of
> > > the irq handler
> >
> > It's quite nice if you could do that.
> >
> > But I think that ide-atapi also needs blk_rq_map_kern_prealloc().
>
> I'll look into that too.
Great, thanks!
> > > and do away only with blk_rq_map_kern() if that is more
> > > of an agreeable solution?
> >
> > Yeah, blk_rq_map_kern() is much proper.
>
> How's that for starters? I know, it is rough but it seems to work
> according to my initial testing.
>
> Basically, I opted for preallocating a sense request in the ->do_request
> routine and do that only on demand, i.e. I reinitialize it only if it
> got used in the irq handler. So in case you want to shove a rq sense in
> front of the queue, you simply use the already prepared one. Then in the
> irq handler it is being finished the usual ways (blk_end_request). Next
> time around you ->do_request, you reallocate it again since it got eaten
> in the last round.
Sounds like a workable solution.
> The good thing is that now I don't need all those static block layer
> structs in the driver (bio, bio_vec, etc) and do the preferred dynamic
> allocation instead.
That's surely good.
Well, if you could remove the usage of request structures that don't
come from blk_get_request(), it would be super. But it's a different
topic and Tejun can go forward without such a change.
> The patch is ontop of Tejun's series at
> http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=ide-phase1
> with some small modifications in commit 15783b1443f810ae72cb5ccb3a3a3ccc3aeb8729
> wrt proper sense buffer length.
I think that Tejun will drop some of the patchset. At least, we don't
need the blk_rq_map_kern_prealloc stuff. I think that Tejun doesn't
need to play with the mapping API. Well, we need to play with the
mapping API for OSD, but it's not directly related to the block layer
cleanups necessary for the libata SCSI separation.
Tejun?
Hello,
Sorry about the delay.
FUJITA Tomonori wrote:
>> Basically, I opted for preallocating a sense request in the ->do_request
>> routine and do that only on demand, i.e. I reinitialize it only if it
>> got used in the irq handler. So in case you want to shove a rq sense in
>> front of the queue, you simply use the already prepared one. Then in the
>> irq handler it is being finished the usual ways (blk_end_request). Next
>> time around you ->do_request, you reallocate it again since it got eaten
>> in the last round.
>
> Sounds a workable solution.
Haven't actually looked at the code but sweeeeeet.
>> The good thing is that now I don't need all those static block layer
>> structs in the driver (bio, bio_vec, etc) and do the preferred dynamic
>> allocation instead.
>
> That's surely good.
>
> Well, if you could remove the usage of request structure that are not
> came from blk_get_request, it will be super. But it's a different
> topic and Tejun can go forward without such change.
>
>> The patch is ontop of Tejun's series at
>> http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=ide-phase1
>> with some small modifications in commit 15783b1443f810ae72cb5ccb3a3a3ccc3aeb8729
>> wrt proper sense buffer length.
>
> I think that Tejun will drop some of the patchset. At least, we don't
> need blk_rq_map_kern_prealloc stuff. I think that Tejun doesn't need
> to play with the mapping API. Well, we need to play with the mapping
> API for OSD but it's not directly related with the block layer
> cleanups necessary for the libata SCSI separation.
Yeah, blk_rq_map_kern_prealloc() was basically shifting the rq mapping
from ide to blk/bio so that at least the code is all in one place. If
it's not necessary, super. :-)
I'll drop stuff from this and the other patchset and repost them with
Borislav's patch in a few hours. Thanks guys.
--
tejun
Hi guys,
On Wed, Apr 15, 2009 at 01:25:04PM +0900, Tejun Heo wrote:
> >> Basically, I opted for preallocating a sense request in the ->do_request
> >> routine and do that only on demand, i.e. I reinitialize it only if it
> >> got used in the irq handler. So in case you want to shove a rq sense in
> >> front of the queue, you simply use the already prepared one. Then in the
> >> irq handler it is being finished the usual ways (blk_end_request). Next
> >> time around you ->do_request, you reallocate it again since it got eaten
> >> in the last round.
> >
> > Sounds a workable solution.
>
> Haven't actually looked at the code but sweeeeeet.
>
> >> The good thing is that now I don't need all those static block layer
> >> structs in the driver (bio, bio_vec, etc) and do the preferred dynamic
> >> allocation instead.
> >
> > That's surely good.
> >
> > Well, if you could remove the usage of request structure that are not
> > came from blk_get_request, it will be super. But it's a different
> > topic and Tejun can go forward without such change.
> >
> >> The patch is ontop of Tejun's series at
> >> http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=ide-phase1
> >> with some small modifications in commit 15783b1443f810ae72cb5ccb3a3a3ccc3aeb8729
> >> wrt proper sense buffer length.
> >
> > I think that Tejun will drop some of the patchset. At least, we don't
> > need blk_rq_map_kern_prealloc stuff. I think that Tejun doesn't need
> > to play with the mapping API. Well, we need to play with the mapping
> > API for OSD but it's not directly related with the block layer
> > cleanups necessary for the libata SCSI separation.
>
> Yeah, the blk_rq_map_kern_prealloc() was basically shifting rq map
> from ide to blk/bio so that at least codes are all in one place. If
> it's not necessary, super. :-)
>
> I'll drop stuff from this and the other patchset and repost them with
> Borislav's patch in a few hours. Thanks guys.
here's a version which gets rid of the static drive->request_sense_rq
structure and does the usual blk_get_request(), as Fujita suggested.
@Tejun: we're gonna need the same thing for ide-atapi before you'll be
able to get rid of the _prealloc() hack. I'll try to cook something up
by tomorrow at the latest.
---
From: Borislav Petkov <[email protected]>
Date: Tue, 14 Apr 2009 13:24:43 +0200
Subject: [PATCH] ide-cd: preallocate rq sense out of the irq path
Preallocate a sense request in the ->do_request method and reinitialize
it only on demand, in case it's been consumed in the IRQ handler path.
The reason for this is that we don't want to be mapping rq to bio in
the IRQ path and introduce all kinds of unnecessary hacks to the block
layer.
CC: Bartlomiej Zolnierkiewicz <[email protected]>
CC: FUJITA Tomonori <[email protected]>
CC: Tejun Heo <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/ide/ide-cd.c | 67 +++++++++++++++++++++++++++++---------------------
include/linux/ide.h | 3 ++
2 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 35d0973..82c9339 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -206,32 +206,21 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
ide_cd_log_error(drive->name, failed_command, sense);
}
-static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
- struct request *failed_command)
+static struct request *ide_cd_prep_sense(ide_drive_t *drive)
{
struct cdrom_info *info = drive->driver_data;
- struct request *rq = &drive->request_sense_rq;
- struct bio *bio = &drive->request_sense_bio;
- struct bio_vec *bvec = drive->request_sense_bvec;
- unsigned int bvec_len = ARRAY_SIZE(drive->request_sense_bvec);
- unsigned sense_len = 18;
- int error;
+ void *sense = &info->sense_data;
+ unsigned sense_len = sizeof(struct request_sense);
+ struct request *rq;
ide_debug_log(IDE_DBG_SENSE, "enter");
- if (sense == NULL) {
- sense = &info->sense_data;
- sense_len = sizeof(struct request_sense);
- }
-
memset(sense, 0, sense_len);
- /* stuff the sense request in front of our current request */
- blk_rq_init(NULL, rq);
+ rq = blk_get_request(drive->queue, 0, __GFP_WAIT);
- error = blk_rq_map_kern_prealloc(drive->queue, rq, bio, bvec, bvec_len,
- sense, sense_len, true);
- BUG_ON(error);
+ if (blk_rq_map_kern(drive->queue, rq, sense, sense_len, __GFP_WAIT))
+ return NULL;
rq->rq_disk = info->disk;
@@ -241,18 +230,17 @@ static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
rq->cmd_type = REQ_TYPE_SENSE;
rq->cmd_flags |= REQ_PREEMPT;
- /* NOTE! Save the failed command in "rq->special" */
- rq->special = (void *)failed_command;
-
- if (failed_command)
- ide_debug_log(IDE_DBG_SENSE, "failed_cmd: 0x%x",
- failed_command->cmd[0]);
+ return rq;
+}
- drive->hwif->rq = NULL;
+static void ide_cd_queue_rq_sense(ide_drive_t *drive)
+{
+ BUG_ON(!drive->rq_sense);
- elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
+ elv_add_request(drive->queue, drive->rq_sense, ELEVATOR_INSERT_FRONT, 0);
}
+
static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
{
/*
@@ -440,7 +428,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
/* if we got a CHECK_CONDITION status, queue a request sense command */
if (stat & ATA_ERR)
- cdrom_queue_request_sense(drive, NULL, NULL);
+ ide_cd_queue_rq_sense(drive);
return 1;
end_request:
@@ -454,7 +442,7 @@ end_request:
hwif->rq = NULL;
- cdrom_queue_request_sense(drive, rq->sense, rq);
+ ide_cd_queue_rq_sense(drive);
return 1;
} else
return 2;
@@ -788,6 +776,10 @@ out_end:
ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
+ /* our sense buffer got used, reset it the next time around. */
+ if (sense)
+ drive->rq_sense = NULL;
+
if (sense && rc == 2)
ide_error(drive, "request sense failure", stat);
}
@@ -901,6 +893,25 @@ static ide_startstop_t ide_cd_do_request(ide_drive_t *drive, struct request *rq,
goto out_end;
}
+ /*
+ * prepare request sense if it got used with the last rq
+ */
+ if (!drive->rq_sense) {
+ drive->rq_sense = ide_cd_prep_sense(drive);
+ if (!drive->rq_sense) {
+ printk(KERN_ERR "%s: error prepping sense request!\n",
+ drive->name);
+ return ide_stopped;
+ }
+ }
+
+ /*
+ * save the current request in case we'll be queueing a sense rq
+ * afterwards due to its potential failure.
+ */
+ if (!blk_sense_request(rq))
+ drive->rq_sense->special = (void *)rq;
+
memset(&cmd, 0, sizeof(cmd));
if (rq_data_dir(rq))
diff --git a/include/linux/ide.h b/include/linux/ide.h
index c942533..4c2d310 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -605,6 +605,9 @@ struct ide_drive_s {
struct request request_sense_rq;
struct bio request_sense_bio;
struct bio_vec request_sense_bvec[2];
+
+ /* current sense rq */
+ struct request *rq_sense;
};
typedef struct ide_drive_s ide_drive_t;
--
1.6.2.2
--
Regards/Gruss,
Boris.
On Wed, 15 Apr 2009 09:26:55 +0200
Borislav Petkov <[email protected]> wrote:
> Hi guys,
>
> On Wed, Apr 15, 2009 at 01:25:04PM +0900, Tejun Heo wrote:
> > >> Basically, I opted for preallocating a sense request in the ->do_request
> > >> routine and do that only on demand, i.e. I reinitialize it only if it
> > >> got used in the irq handler. So in case you want to shove a rq sense in
> > >> front of the queue, you simply use the already prepared one. Then in the
> > >> irq handler it is being finished the usual ways (blk_end_request). Next
> > >> time around you ->do_request, you reallocate it again since it got eaten
> > >> in the last round.
> > >
> > > Sounds a workable solution.
> >
> > Haven't actually looked at the code but sweeeeeet.
> >
> > >> The good thing is that now I don't need all those static block layer
> > >> structs in the driver (bio, bio_vec, etc) and do the preferred dynamic
> > >> allocation instead.
> > >
> > > That's surely good.
> > >
> > > Well, if you could remove the usage of request structure that are not
> > > came from blk_get_request, it will be super. But it's a different
> > > topic and Tejun can go forward without such change.
> > >
> > >> The patch is ontop of Tejun's series at
> > >> http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=ide-phase1
> > >> with some small modifications in commit 15783b1443f810ae72cb5ccb3a3a3ccc3aeb8729
> > >> wrt proper sense buffer length.
> > >
> > > I think that Tejun will drop some of the patchset. At least, we don't
> > > need blk_rq_map_kern_prealloc stuff. I think that Tejun doesn't need
> > > to play with the mapping API. Well, we need to play with the mapping
> > > API for OSD but it's not directly related with the block layer
> > > cleanups necessary for the libata SCSI separation.
> >
> > Yeah, the blk_rq_map_kern_prealloc() was basically shifting rq map
> > from ide to blk/bio so that at least codes are all in one place. If
> > it's not necessary, super. :-)
> >
> > I'll drop stuff from this and the other patchset and repost them with
> > Borislav's patch in a few hours. Thanks guys.
>
> here's a version which gets rid of the static drive->request_sense_rq
> structure and does the usual blk_get_request(), as Fujita suggested.
>
> @Tejun: we're gonna need the same thing for ide-atapi before you'll be
> able to get rid of the _prealloc() hack. I'll try to cook something up by
> tomorrow the latest.
>
> ---
> From: Borislav Petkov <[email protected]>
> Date: Tue, 14 Apr 2009 13:24:43 +0200
> Subject: [PATCH] ide-cd: preallocate rq sense out of the irq path
>
> Preallocate a sense request in the ->do_request method and reinitialize
> it only on demand, in case it's been consumed in the IRQ handler path.
> The reason for this is that we don't want to be mapping rq to bio in
> the IRQ path and introduce all kinds of unnecessary hacks to the block
> layer.
>
> CC: Bartlomiej Zolnierkiewicz <[email protected]>
> CC: FUJITA Tomonori <[email protected]>
> CC: Tejun Heo <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> drivers/ide/ide-cd.c | 67 +++++++++++++++++++++++++++++---------------------
> include/linux/ide.h | 3 ++
> 2 files changed, 42 insertions(+), 28 deletions(-)
Great, thanks!
I have some comments.
> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> index 35d0973..82c9339 100644
> --- a/drivers/ide/ide-cd.c
> +++ b/drivers/ide/ide-cd.c
> @@ -206,32 +206,21 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
> ide_cd_log_error(drive->name, failed_command, sense);
> }
>
> -static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
> - struct request *failed_command)
> +static struct request *ide_cd_prep_sense(ide_drive_t *drive)
> {
> struct cdrom_info *info = drive->driver_data;
> - struct request *rq = &drive->request_sense_rq;
> - struct bio *bio = &drive->request_sense_bio;
> - struct bio_vec *bvec = drive->request_sense_bvec;
> - unsigned int bvec_len = ARRAY_SIZE(drive->request_sense_bvec);
> - unsigned sense_len = 18;
> - int error;
> + void *sense = &info->sense_data;
> + unsigned sense_len = sizeof(struct request_sense);
> + struct request *rq;
>
> ide_debug_log(IDE_DBG_SENSE, "enter");
>
> - if (sense == NULL) {
> - sense = &info->sense_data;
> - sense_len = sizeof(struct request_sense);
> - }
> -
> memset(sense, 0, sense_len);
>
> - /* stuff the sense request in front of our current request */
> - blk_rq_init(NULL, rq);
> + rq = blk_get_request(drive->queue, 0, __GFP_WAIT);
>
> - error = blk_rq_map_kern_prealloc(drive->queue, rq, bio, bvec, bvec_len,
> - sense, sense_len, true);
> - BUG_ON(error);
> + if (blk_rq_map_kern(drive->queue, rq, sense, sense_len, __GFP_WAIT))
> + return NULL;
>
> rq->rq_disk = info->disk;
>
> @@ -241,18 +230,17 @@ static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
> rq->cmd_type = REQ_TYPE_SENSE;
> rq->cmd_flags |= REQ_PREEMPT;
>
> - /* NOTE! Save the failed command in "rq->special" */
> - rq->special = (void *)failed_command;
> -
> - if (failed_command)
> - ide_debug_log(IDE_DBG_SENSE, "failed_cmd: 0x%x",
> - failed_command->cmd[0]);
> + return rq;
> +}
>
> - drive->hwif->rq = NULL;
> +static void ide_cd_queue_rq_sense(ide_drive_t *drive)
> +{
> + BUG_ON(!drive->rq_sense);
>
> - elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
> + elv_add_request(drive->queue, drive->rq_sense, ELEVATOR_INSERT_FRONT, 0);
> }
>
> +
> static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
> {
> /*
> @@ -440,7 +428,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
>
> /* if we got a CHECK_CONDITION status, queue a request sense command */
> if (stat & ATA_ERR)
> - cdrom_queue_request_sense(drive, NULL, NULL);
> + ide_cd_queue_rq_sense(drive);
> return 1;
>
> end_request:
> @@ -454,7 +442,7 @@ end_request:
>
> hwif->rq = NULL;
>
> - cdrom_queue_request_sense(drive, rq->sense, rq);
> + ide_cd_queue_rq_sense(drive);
> return 1;
> } else
> return 2;
> @@ -788,6 +776,10 @@ out_end:
>
> ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
>
> + /* our sense buffer got used, reset it the next time around. */
> + if (sense)
> + drive->rq_sense = NULL;
Needs to call blk_put_request() here?
I guess that we also need to call blk_put_request() when ide_drive_s
is freed.
> +
> if (sense && rc == 2)
> ide_error(drive, "request sense failure", stat);
> }
> @@ -901,6 +893,25 @@ static ide_startstop_t ide_cd_do_request(ide_drive_t *drive, struct request *rq,
> goto out_end;
> }
>
> + /*
> + * prepare request sense if it got used with the last rq
> + */
> + if (!drive->rq_sense) {
> + drive->rq_sense = ide_cd_prep_sense(drive);
> + if (!drive->rq_sense) {
> + printk(KERN_ERR "%s: error prepping sense request!\n",
> + drive->name);
> + return ide_stopped;
> + }
> + }
> +
> + /*
> + * save the current request in case we'll be queueing a sense rq
> + * afterwards due to its potential failure.
> + */
> + if (!blk_sense_request(rq))
> + drive->rq_sense->special = (void *)rq;
> +
> memset(&cmd, 0, sizeof(cmd));
>
> if (rq_data_dir(rq))
> diff --git a/include/linux/ide.h b/include/linux/ide.h
> index c942533..4c2d310 100644
> --- a/include/linux/ide.h
> +++ b/include/linux/ide.h
> @@ -605,6 +605,9 @@ struct ide_drive_s {
> struct request request_sense_rq;
> struct bio request_sense_bio;
> struct bio_vec request_sense_bvec[2];
We can remove the above, right?
> +
> + /* current sense rq */
> + struct request *rq_sense;
> };
>
> typedef struct ide_drive_s ide_drive_t;
> --
> 1.6.2.2
>
>
> --
> Regards/Gruss,
> Boris.
Hi,
On Wed, Apr 15, 2009 at 04:48:35PM +0900, FUJITA Tomonori wrote:
[..]
> I have some comments.
>
>
> > diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> > index 35d0973..82c9339 100644
> > --- a/drivers/ide/ide-cd.c
> > +++ b/drivers/ide/ide-cd.c
> > @@ -206,32 +206,21 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
> > ide_cd_log_error(drive->name, failed_command, sense);
> > }
> >
> > -static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
> > - struct request *failed_command)
> > +static struct request *ide_cd_prep_sense(ide_drive_t *drive)
> > {
> > struct cdrom_info *info = drive->driver_data;
> > - struct request *rq = &drive->request_sense_rq;
> > - struct bio *bio = &drive->request_sense_bio;
> > - struct bio_vec *bvec = drive->request_sense_bvec;
> > - unsigned int bvec_len = ARRAY_SIZE(drive->request_sense_bvec);
> > - unsigned sense_len = 18;
> > - int error;
> > + void *sense = &info->sense_data;
> > + unsigned sense_len = sizeof(struct request_sense);
> > + struct request *rq;
> >
> > ide_debug_log(IDE_DBG_SENSE, "enter");
> >
> > - if (sense == NULL) {
> > - sense = &info->sense_data;
> > - sense_len = sizeof(struct request_sense);
> > - }
> > -
> > memset(sense, 0, sense_len);
> >
> > - /* stuff the sense request in front of our current request */
> > - blk_rq_init(NULL, rq);
> > + rq = blk_get_request(drive->queue, 0, __GFP_WAIT);
> >
> > - error = blk_rq_map_kern_prealloc(drive->queue, rq, bio, bvec, bvec_len,
> > - sense, sense_len, true);
> > - BUG_ON(error);
> > + if (blk_rq_map_kern(drive->queue, rq, sense, sense_len, __GFP_WAIT))
> > + return NULL;
> >
> > rq->rq_disk = info->disk;
> >
> > @@ -241,18 +230,17 @@ static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
> > rq->cmd_type = REQ_TYPE_SENSE;
> > rq->cmd_flags |= REQ_PREEMPT;
> >
> > - /* NOTE! Save the failed command in "rq->special" */
> > - rq->special = (void *)failed_command;
> > -
> > - if (failed_command)
> > - ide_debug_log(IDE_DBG_SENSE, "failed_cmd: 0x%x",
> > - failed_command->cmd[0]);
> > + return rq;
> > +}
> >
> > - drive->hwif->rq = NULL;
> > +static void ide_cd_queue_rq_sense(ide_drive_t *drive)
> > +{
> > + BUG_ON(!drive->rq_sense);
> >
> > - elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
> > + elv_add_request(drive->queue, drive->rq_sense, ELEVATOR_INSERT_FRONT, 0);
> > }
> >
> > +
> > static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
> > {
> > /*
> > @@ -440,7 +428,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
> >
> > /* if we got a CHECK_CONDITION status, queue a request sense command */
> > if (stat & ATA_ERR)
> > - cdrom_queue_request_sense(drive, NULL, NULL);
> > + ide_cd_queue_rq_sense(drive);
> > return 1;
> >
> > end_request:
> > @@ -454,7 +442,7 @@ end_request:
> >
> > hwif->rq = NULL;
> >
> > - cdrom_queue_request_sense(drive, rq->sense, rq);
> > + ide_cd_queue_rq_sense(drive);
> > return 1;
> > } else
> > return 2;
> > @@ -788,6 +776,10 @@ out_end:
> >
> > ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
> >
> > + /* our sense buffer got used, reset it the next time around. */
> > + if (sense)
> > + drive->rq_sense = NULL;
>
> Needs to call blk_put_request() here?
No, because this is done by ide_complete_rq() above. Otherwise I'd be
ending the sense request even if it didn't get used. This way, I only
use it if I call ide_cd_queue_rq_sense(), which happens from
cdrom_decode_status().
> I guess that we also need to call blk_put_request() when ide_drive_s
> is freed.
That we need to do, thanks. Will update accordingly.
> > +
> > if (sense && rc == 2)
> > ide_error(drive, "request sense failure", stat);
> > }
> > @@ -901,6 +893,25 @@ static ide_startstop_t ide_cd_do_request(ide_drive_t *drive, struct request *rq,
> > goto out_end;
> > }
> >
> > + /*
> > + * prepare request sense if it got used with the last rq
> > + */
> > + if (!drive->rq_sense) {
> > + drive->rq_sense = ide_cd_prep_sense(drive);
> > + if (!drive->rq_sense) {
> > + printk(KERN_ERR "%s: error prepping sense request!\n",
> > + drive->name);
> > + return ide_stopped;
> > + }
> > + }
> > +
> > + /*
> > + * save the current request in case we'll be queueing a sense rq
> > + * afterwards due to its potential failure.
> > + */
> > + if (!blk_sense_request(rq))
> > + drive->rq_sense->special = (void *)rq;
> > +
> > memset(&cmd, 0, sizeof(cmd));
> >
> > if (rq_data_dir(rq))
> > diff --git a/include/linux/ide.h b/include/linux/ide.h
> > index c942533..4c2d310 100644
> > --- a/include/linux/ide.h
> > +++ b/include/linux/ide.h
> > @@ -605,6 +605,9 @@ struct ide_drive_s {
> > struct request request_sense_rq;
> > struct bio request_sense_bio;
> > struct bio_vec request_sense_bvec[2];
>
> We can remove the above, right?
ide-atapi uses them too, see
http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=commitdiff;h=9ac15840a6e5bf1fa6dce293484cb7aba4d078bb
They can go after I've converted ide-floppy and ide-tape to the same
sense handling as ide-cd. I'm on it...
--
Regards/Gruss,
Boris.
I updated the patch a bit and folded it into the series after patch
10. I hope I didn't butcher the patch too badly. Can I add your
Signed-off-by: to the modified patch? Also, I just started working on
the atapi version. Do you already have something?
Thanks.
From: Borislav Petkov <[email protected]>
Date: Tue, 14 Apr 2009 13:24:43 +0200
Subject: [PATCH] ide-cd: preallocate rq sense out of the irq path
Preallocate a sense request in the ->do_request method and reinitialize
it only on demand, in case it's been consumed in the IRQ handler path.
The reason for this is that we don't want to be mapping rq to bio in
the IRQ path and introduce all kinds of unnecessary hacks to the block
layer.
tj: * After this patch, ide_cd_do_request() might sleep. This should
be okay as ide request_fn - do_ide_request() - is invoked only
from make_request and plug work. Make sure this is the case by
adding might_sleep() to do_ide_request().
* Adapted to apply to the tree without blk_rq_map_prealloc()
changes.
* Use of blk_rq_map() and possible failure handling from it are
moved to later separate patch. ide_cd_prep_sense() now handles
everything regarding sense rq preparation.
* Move ide_drive->rq_sense to cdrom_info->sense_rq and put the
request when releasing cdrom_info.
* Both user and kernel PC requests expect sense data to be stored
in separate storage other than info->sense_data. Copy sense
data to rq->sense on completion if rq->sense is not NULL. This
fixes bogus sense data on PC requests.
CC: Bartlomiej Zolnierkiewicz <[email protected]>
CC: FUJITA Tomonori <[email protected]>
CC: Tejun Heo <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/ide/ide-cd.c | 69 +++++++++++++++++++++++++++++++--------------------
drivers/ide/ide-cd.h | 3 ++
drivers/ide/ide-io.c | 3 ++
3 files changed, 49 insertions(+), 26 deletions(-)
Index: block/drivers/ide/ide-cd.c
===================================================================
--- block.orig/drivers/ide/ide-cd.c
+++ block/drivers/ide/ide-cd.c
@@ -206,42 +206,44 @@ static void cdrom_analyze_sense_data(ide
ide_cd_log_error(drive->name, failed_command, sense);
}
-static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
- struct request *failed_command)
+static void ide_cd_prep_sense(ide_drive_t *drive, struct request *rq)
{
struct cdrom_info *info = drive->driver_data;
- struct request *rq = &drive->request_sense_rq;
+ void *sense = &info->sense_data;
ide_debug_log(IDE_DBG_SENSE, "enter");
- if (sense == NULL)
- sense = &info->sense_data;
+ if (blk_sense_request(rq))
+ return;
- memset(sense, 0, 18);
+ if (!info->sense_rq) {
+ struct request *sense_rq;
- /* stuff the sense request in front of our current request */
- blk_rq_init(NULL, rq);
- rq->cmd_type = REQ_TYPE_ATA_PC;
- rq->rq_disk = info->disk;
+ memset(sense, 0, 18);
- rq->data = sense;
- rq->cmd[0] = GPCMD_REQUEST_SENSE;
- rq->cmd[4] = 18;
- rq->data_len = 18;
+ sense_rq = blk_get_request(drive->queue, 0, __GFP_WAIT);
+ sense_rq->rq_disk = info->disk;
- rq->cmd_type = REQ_TYPE_SENSE;
- rq->cmd_flags |= REQ_PREEMPT;
+ sense_rq->data = sense;
+ sense_rq->cmd[0] = GPCMD_REQUEST_SENSE;
+ sense_rq->cmd[4] = 18;
+ sense_rq->data_len = 18;
- /* NOTE! Save the failed command in "rq->special" */
- rq->special = (void *)failed_command;
+ sense_rq->cmd_type = REQ_TYPE_SENSE;
+ sense_rq->cmd_flags |= REQ_PREEMPT;
- if (failed_command)
- ide_debug_log(IDE_DBG_SENSE, "failed_cmd: 0x%x",
- failed_command->cmd[0]);
+ info->sense_rq = sense_rq;
+ }
+
+ info->sense_rq->special = rq;
+}
- drive->hwif->rq = NULL;
+static void ide_cd_queue_sense_rq(ide_drive_t *drive)
+{
+ struct cdrom_info *info = drive->driver_data;
- elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
+ BUG_ON(!info->sense_rq);
+ elv_add_request(drive->queue, info->sense_rq, ELEVATOR_INSERT_FRONT, 0);
}
static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
@@ -252,10 +254,16 @@ static void ide_cd_complete_failed_rq(id
*/
struct request *failed = (struct request *)rq->special;
struct cdrom_info *info = drive->driver_data;
- void *sense = &info->sense_data;
+ struct request_sense *sense = &info->sense_data;
if (failed) {
if (failed->sense) {
+ /*
+ * Sense is always read into info->sense_data.
+ * Copy back if the failed request has its
+ * sense pointer set.
+ */
+ memcpy(failed->sense, sense, sizeof(*sense));
sense = failed->sense;
failed->sense_len = rq->sense_len;
}
@@ -431,7 +439,7 @@ static int cdrom_decode_status(ide_drive
/* if we got a CHECK_CONDITION status, queue a request sense command */
if (stat & ATA_ERR)
- cdrom_queue_request_sense(drive, NULL, NULL);
+ ide_cd_queue_sense_rq(drive);
return 1;
end_request:
@@ -445,7 +453,7 @@ end_request:
hwif->rq = NULL;
- cdrom_queue_request_sense(drive, rq->sense, rq);
+ ide_cd_queue_sense_rq(drive);
return 1;
} else
return 2;
@@ -600,6 +608,7 @@ static void ide_cd_error_cmd(ide_drive_t
static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
{
+ struct cdrom_info *info = drive->driver_data;
ide_hwif_t *hwif = drive->hwif;
struct ide_cmd *cmd = &hwif->cmd;
struct request *rq = hwif->rq;
@@ -775,6 +784,10 @@ out_end:
ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
+ /* our sense buffer got used, reset it for the next round */
+ if (sense)
+ info->sense_rq = NULL;
+
if (sense && rc == 2)
ide_error(drive, "request sense failure", stat);
}
@@ -893,6 +906,9 @@ static ide_startstop_t ide_cd_do_request
goto out_end;
}
+ /* prepare sense request */
+ ide_cd_prep_sense(drive, rq);
+
memset(&cmd, 0, sizeof(cmd));
if (rq_data_dir(rq))
@@ -1634,6 +1650,7 @@ static void ide_cd_release(struct device
ide_debug_log(IDE_DBG_FUNC, "enter");
kfree(info->toc);
+ blk_put_request(info->sense_rq);
if (devinfo->handle == drive)
unregister_cdrom(devinfo);
drive->driver_data = NULL;
Index: block/drivers/ide/ide-io.c
===================================================================
--- block.orig/drivers/ide/ide-io.c
+++ block/drivers/ide/ide-io.c
@@ -478,6 +478,9 @@ void do_ide_request(struct request_queue
spin_unlock_irq(q->queue_lock);
+ /* HLD do_request() callback might sleep, make sure it's okay */
+ might_sleep();
+
if (ide_lock_host(host, hwif))
goto plug_device_2;
Index: block/drivers/ide/ide-cd.h
===================================================================
--- block.orig/drivers/ide/ide-cd.h
+++ block/drivers/ide/ide-cd.h
@@ -98,6 +98,9 @@ struct cdrom_info {
struct cdrom_device_info devinfo;
unsigned long write_timeout;
+
+ /* current sense rq */
+ struct request *sense_rq;
};
/* ide-cd_verbose.c */
Hi,
On Thu, Apr 16, 2009 at 12:06:29PM +0900, Tejun Heo wrote:
> I updated the patch a bit and folded it into the series after patch
> 10. I hope I didn't butcher the patch too bad. Can I add your
> Signed-off-by: on the modified patch? Also, I just started working on
> the atapi version. Do you already have something?
Ok, here's what I got: I went and converted ide-cd and ide-atapi to use
a common routine, ide_prep_sense; see the following patches for that.
Preliminary testing is promising but you never know :).
[..]
> Preallocate a sense request in the ->do_request method and reinitialize
> it only on demand, in case it's been consumed in the IRQ handler path.
> The reason for this is that we don't want to be mapping rq to bio in
> the IRQ path and introduce all kinds of unnecessary hacks to the block
> layer.
>
> tj: * After this patch, ide_cd_do_request() might sleep. This should
> be okay as ide request_fn - do_ide_request() - is invoked only
> from make_request and plug work. Make sure this is the case by
> adding might_sleep() to do_ide_request().
>
> * Adapted to apply to the tree without blk_rq_map_prealloc()
> changes.
>
> * Use of blk_rq_map() and possible failure handling from it are
> moved to later separate patch. ide_cd_prep_sense() now handles
> everything regarding sense rq preparation.
>
> * Move ide_drive->rq_sense to cdrom_info->sense_rq and put the
> request when releasing cdrom_info.
I put the request_sense buffer as defined in <linux/cdrom.h> into the
drive struct so that each device can have its own buffer. IMHO, request
sense standard data should be pretty identical across most ATAPI devices
(yeah, I'm sure there are exceptions :)). We might move the struct
request_sense to a more generic location instead of <linux/cdrom.h> if
we decide to go with that.
> * Both user and kernel PC requests expect sense data to be stored
> in separate storage other than info->sense_data. Copy sense
> data to rq->sense on completion if rq->sense is not NULL. This
> fixes bogus sense data on PC requests.
I took that into my version of the patch.
Anyway, please do have a closer look in case I've missed something. Of
course you can split them the way you see fit; there's more to be done
there anyway, but they should suffice for your block layer stuff, I hope.
Thanks.
--
Regards/Gruss,
Boris.
This is in preparation for moving the queueing of a sense request out
of the IRQ handler path.
Use struct request_sense as a general sense buffer for all ATAPI devices
ide-{floppy,tape,cd}.
CC: Bartlomiej Zolnierkiewicz <[email protected]>
CC: FUJITA Tomonori <[email protected]>
CC: Tejun Heo <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/ide/ide-atapi.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/ide.h | 10 ++++++++++
2 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index 8054974..1f4d20f 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -200,6 +200,40 @@ void ide_create_request_sense_cmd(ide_drive_t *drive, struct ide_atapi_pc *pc)
}
EXPORT_SYMBOL_GPL(ide_create_request_sense_cmd);
+struct request *ide_prep_sense(ide_drive_t *drive, struct gendisk *disk)
+{
+ void *sense = &drive->sense_data;
+ unsigned sense_len = sizeof(struct request_sense);
+ struct request *rq;
+
+ debug_log("%s: enter\n", __func__);
+
+ memset(sense, 0, sense_len);
+
+ rq = blk_get_request(drive->queue, 0, __GFP_WAIT);
+
+ if (blk_rq_map_kern(drive->queue, rq, sense, sense_len, __GFP_WAIT))
+ return NULL;
+
+ rq->rq_disk = disk;
+
+ rq->cmd[0] = GPCMD_REQUEST_SENSE;
+ rq->cmd[4] = sense_len;
+
+ rq->cmd_type = REQ_TYPE_SENSE;
+ rq->cmd_flags |= REQ_PREEMPT;
+
+ return rq;
+}
+EXPORT_SYMBOL_GPL(ide_prep_sense);
+
+void ide_queue_sense_rq(ide_drive_t *drive)
+{
+ BUG_ON(!drive->sense_rq);
+ elv_add_request(drive->queue, drive->sense_rq, ELEVATOR_INSERT_FRONT, 0);
+}
+EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
+
/*
* Called when an error was detected during the last packet command.
* We queue a request sense packet command in the head of the request list.
diff --git a/include/linux/ide.h b/include/linux/ide.h
index c942533..095cda2 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -26,6 +26,9 @@
#include <asm/io.h>
#include <asm/mutex.h>
+/* for request_sense */
+#include <linux/cdrom.h>
+
#if defined(CONFIG_CRIS) || defined(CONFIG_FRV) || defined(CONFIG_MN10300)
# define SUPPORT_VLB_SYNC 0
#else
@@ -605,6 +608,10 @@ struct ide_drive_s {
struct request request_sense_rq;
struct bio request_sense_bio;
struct bio_vec request_sense_bvec[2];
+
+ /* current sense rq and buffer */
+ struct request *sense_rq;
+ struct request_sense sense_data;
};
typedef struct ide_drive_s ide_drive_t;
@@ -1178,6 +1185,9 @@ int ide_set_media_lock(ide_drive_t *, struct gendisk *, int);
void ide_create_request_sense_cmd(ide_drive_t *, struct ide_atapi_pc *);
void ide_retry_pc(ide_drive_t *, struct gendisk *);
+struct request *ide_prep_sense(ide_drive_t *, struct gendisk *);
+void ide_queue_sense_rq(ide_drive_t *);
+
int ide_cd_expiry(ide_drive_t *);
int ide_cd_get_xferlen(struct request *);
--
1.6.2.2
--
Regards/Gruss,
Boris.
Preallocate a sense request in the ->do_request method and reinitialize
it only on demand, in case it's been consumed in the IRQ handler path.
The reason for this is that we don't want to be mapping rq to bio in
the IRQ path and introduce all kinds of unnecessary hacks to the block
layer.
tj: * After this patch, ide_cd_do_request() might sleep. This should
be okay as ide request_fn - do_ide_request() - is invoked only
from make_request and plug work. Make sure this is the case by
adding might_sleep() to do_ide_request().
* Both user and kernel PC requests expect sense data to be stored
in separate storage other than info->sense_data. Copy sense
data to rq->sense on completion if rq->sense is not NULL. This
fixes bogus sense data on PC requests.
As a result, remove cdrom_queue_request_sense.
CC: Bartlomiej Zolnierkiewicz <[email protected]>
CC: FUJITA Tomonori <[email protected]>
CC: Tejun Heo <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/ide/ide-atapi.c | 2 +
drivers/ide/ide-cd.c | 86 +++++++++++++++++++----------------------------
drivers/ide/ide-io.c | 3 ++
3 files changed, 40 insertions(+), 51 deletions(-)
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index 1f4d20f..fa09789 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -229,6 +229,8 @@ EXPORT_SYMBOL_GPL(ide_prep_sense);
void ide_queue_sense_rq(ide_drive_t *drive)
{
+ drive->hwif->rq = NULL;
+
BUG_ON(!drive->sense_rq);
elv_add_request(drive->queue, drive->sense_rq, ELEVATOR_INSERT_FRONT, 0);
}
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 35d0973..0a50c77 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -206,53 +206,6 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
ide_cd_log_error(drive->name, failed_command, sense);
}
-static void cdrom_queue_request_sense(ide_drive_t *drive, void *sense,
- struct request *failed_command)
-{
- struct cdrom_info *info = drive->driver_data;
- struct request *rq = &drive->request_sense_rq;
- struct bio *bio = &drive->request_sense_bio;
- struct bio_vec *bvec = drive->request_sense_bvec;
- unsigned int bvec_len = ARRAY_SIZE(drive->request_sense_bvec);
- unsigned sense_len = 18;
- int error;
-
- ide_debug_log(IDE_DBG_SENSE, "enter");
-
- if (sense == NULL) {
- sense = &info->sense_data;
- sense_len = sizeof(struct request_sense);
- }
-
- memset(sense, 0, sense_len);
-
- /* stuff the sense request in front of our current request */
- blk_rq_init(NULL, rq);
-
- error = blk_rq_map_kern_prealloc(drive->queue, rq, bio, bvec, bvec_len,
- sense, sense_len, true);
- BUG_ON(error);
-
- rq->rq_disk = info->disk;
-
- rq->cmd[0] = GPCMD_REQUEST_SENSE;
- rq->cmd[4] = sense_len;
-
- rq->cmd_type = REQ_TYPE_SENSE;
- rq->cmd_flags |= REQ_PREEMPT;
-
- /* NOTE! Save the failed command in "rq->special" */
- rq->special = (void *)failed_command;
-
- if (failed_command)
- ide_debug_log(IDE_DBG_SENSE, "failed_cmd: 0x%x",
- failed_command->cmd[0]);
-
- drive->hwif->rq = NULL;
-
- elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
-}
-
static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
{
/*
@@ -260,11 +213,16 @@ static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
* failed request
*/
struct request *failed = (struct request *)rq->special;
- struct cdrom_info *info = drive->driver_data;
- void *sense = &info->sense_data;
+ struct request_sense *sense = &drive->sense_data;
if (failed) {
if (failed->sense) {
+ /*
+ * Sense is always read into drive->sense_data.
+ * Copy back if the failed request has its
+ * sense pointer set.
+ */
+ memcpy(failed->sense, sense, sizeof(*sense));
sense = failed->sense;
failed->sense_len = rq->sense_len;
}
@@ -440,7 +398,7 @@ static int cdrom_decode_status(ide_drive_t *drive, u8 stat)
/* if we got a CHECK_CONDITION status, queue a request sense command */
if (stat & ATA_ERR)
- cdrom_queue_request_sense(drive, NULL, NULL);
+ ide_queue_sense_rq(drive);
return 1;
end_request:
@@ -454,7 +412,7 @@ end_request:
hwif->rq = NULL;
- cdrom_queue_request_sense(drive, rq->sense, rq);
+ ide_queue_sense_rq(drive);
return 1;
} else
return 2;
@@ -788,6 +746,10 @@ out_end:
ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
+ /* our sense buffer got used, reset it for the next time around */
+ if (sense)
+ drive->sense_rq = NULL;
+
if (sense && rc == 2)
ide_error(drive, "request sense failure", stat);
}
@@ -872,6 +834,7 @@ static void cdrom_do_block_pc(ide_drive_t *drive, struct request *rq)
static ide_startstop_t ide_cd_do_request(ide_drive_t *drive, struct request *rq,
sector_t block)
{
+ struct cdrom_info *info = drive->driver_data;
struct ide_cmd cmd;
int uptodate = 0, nsectors;
@@ -901,6 +864,23 @@ static ide_startstop_t ide_cd_do_request(ide_drive_t *drive, struct request *rq,
goto out_end;
}
+ /* prepare request sense if it got used with the last rq */
+ if (!drive->sense_rq) {
+ drive->sense_rq = ide_prep_sense(drive, info->disk);
+ if (!drive->sense_rq) {
+ printk(KERN_ERR "%s: error prepping sense request!\n",
+ drive->name);
+ return ide_stopped;
+ }
+ }
+
+ /*
+ * save the current request in case we'll be queueing a sense rq
+ * afterwards due to its potential failure.
+ */
+ if (!blk_sense_request(rq))
+ drive->sense_rq->special = rq;
+
memset(&cmd, 0, sizeof(cmd));
if (rq_data_dir(rq))
@@ -1644,6 +1624,10 @@ static void ide_cd_release(struct device *dev)
kfree(info->toc);
if (devinfo->handle == drive)
unregister_cdrom(devinfo);
+
+ if (drive->sense_rq)
+ blk_put_request(drive->sense_rq);
+
drive->driver_data = NULL;
blk_queue_prep_rq(drive->queue, NULL);
g->private_data = NULL;
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 492699d..a4625e1 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -478,6 +478,9 @@ void do_ide_request(struct request_queue *q)
spin_unlock_irq(q->queue_lock);
+ /* HLD do_request() callback might sleep, make sure it's okay */
+ might_sleep();
+
if (ide_lock_host(host, hwif))
goto plug_device_2;
--
1.6.2.2
--
Regards/Gruss,
Boris.
Since we're issuing REQ_TYPE_SENSE requests now, we need to allow those
types of rqs in the ->do_request callbacks. As a future improvement, the
sense_len assignment might be unified across all ATAPI devices; Borislav
to check with the specs and test.
As a result, get rid of ide_queue_pc_head() and the block layer structs.
CC: Bartlomiej Zolnierkiewicz <[email protected]>
CC: FUJITA Tomonori <[email protected]>
CC: Tejun Heo <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/ide/ide-atapi.c | 59 ++++++++++++++--------------------------------
drivers/ide/ide-floppy.c | 19 ++++++++++++++-
drivers/ide/ide-gd.c | 4 +++
drivers/ide/ide-tape.c | 23 +++++++++++++++++-
include/linux/ide.h | 3 --
5 files changed, 62 insertions(+), 46 deletions(-)
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index fa09789..43a2337 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -80,40 +80,6 @@ void ide_init_pc(struct ide_atapi_pc *pc)
EXPORT_SYMBOL_GPL(ide_init_pc);
/*
- * Generate a new packet command request in front of the request queue, before
- * the current request, so that it will be processed immediately, on the next
- * pass through the driver.
- */
-static void ide_queue_pc_head(ide_drive_t *drive, struct gendisk *disk,
- struct ide_atapi_pc *pc, struct request *rq,
- struct bio *bio, struct bio_vec *bvec,
- unsigned short bvec_len)
-{
- blk_rq_init(NULL, rq);
- rq->cmd_type = REQ_TYPE_SPECIAL;
- rq->cmd_flags |= REQ_PREEMPT;
- rq->special = (char *)pc;
- rq->rq_disk = disk;
-
- if (pc->req_xfer) {
- int error;
-
- error = blk_rq_map_kern_prealloc(drive->queue, rq, bio,
- bvec, bvec_len,
- pc->buf, pc->req_xfer, true);
- BUG_ON(error);
- }
-
- memcpy(rq->cmd, pc->c, 12);
- if (drive->media == ide_tape)
- rq->cmd[13] = REQ_IDETAPE_PC1;
-
- drive->hwif->rq = NULL;
-
- elv_add_request(drive->queue, rq, ELEVATOR_INSERT_FRONT, 0);
-}
-
-/*
* Add a special packet command request to the tail of the request queue,
* and wait for it to be serviced.
*/
@@ -237,22 +203,29 @@ void ide_queue_sense_rq(ide_drive_t *drive)
EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
/*
- * Called when an error was detected during the last packet command.
- * We queue a request sense packet command in the head of the request list.
+ * Called when an error was detected during the last packet command. We queue a
+ * request sense packet command at the head of the request queue.
*/
void ide_retry_pc(ide_drive_t *drive, struct gendisk *disk)
{
- struct request *rq = &drive->request_sense_rq;
- struct bio *bio = &drive->request_sense_bio;
- struct bio_vec *bvec = drive->request_sense_bvec;
struct ide_atapi_pc *pc = &drive->request_sense_pc;
- unsigned short bvec_len = ARRAY_SIZE(drive->request_sense_bvec);
(void)ide_read_error(drive);
ide_create_request_sense_cmd(drive, pc);
if (drive->media == ide_tape)
set_bit(IDE_AFLAG_IGNORE_DSC, &drive->atapi_flags);
- ide_queue_pc_head(drive, disk, pc, rq, bio, bvec, bvec_len);
+
+ drive->sense_rq->special = (char *)pc;
+ drive->sense_rq->rq_disk = disk;
+
+ /* FIXME: sense_len == sizeof(struct request_sense) */
+ memcpy(drive->sense_rq->cmd, pc->c, 12);
+ drive->sense_rq->sense_len = pc->req_xfer;
+
+ if (drive->media == ide_tape)
+ drive->sense_rq->cmd[13] = REQ_IDETAPE_PC1;
+
+ ide_queue_sense_rq(drive);
}
EXPORT_SYMBOL_GPL(ide_retry_pc);
@@ -425,6 +398,10 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive)
error = uptodate ? 0 : -EIO;
}
+ /* our sense rq got used, reset it so it gets reprepped for the next rq */
+ if (blk_sense_request(rq))
+ drive->sense_rq = NULL;
+
ide_complete_rq(drive, error, done);
return ide_stopped;
}
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index 9460033..7d2b219 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -263,7 +263,7 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
}
pc = &floppy->queued_pc;
idefloppy_create_rw_cmd(drive, pc, rq, (unsigned long)block);
- } else if (blk_special_request(rq)) {
+ } else if (blk_special_request(rq) || blk_sense_request(rq)) {
pc = (struct ide_atapi_pc *)rq->special;
} else if (blk_pc_request(rq)) {
pc = &floppy->queued_pc;
@@ -273,6 +273,23 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
goto out_end;
}
+ /* prepare request sense if it got used with the last rq */
+ if (!drive->sense_rq) {
+ drive->sense_rq = ide_prep_sense(drive, floppy->disk);
+ if (!drive->sense_rq) {
+ printk(KERN_ERR "%s: error prepping sense request!\n",
+ drive->name);
+ return ide_stopped;
+ }
+ }
+
+ /*
+ * save the current request in case we'll be queueing a sense rq
+ * afterwards due to its potential failure.
+ */
+ if (!blk_sense_request(rq))
+ drive->sense_rq->special = rq;
+
memset(&cmd, 0, sizeof(cmd));
if (rq_data_dir(rq))
diff --git a/drivers/ide/ide-gd.c b/drivers/ide/ide-gd.c
index 1aebdf1..811c002 100644
--- a/drivers/ide/ide-gd.c
+++ b/drivers/ide/ide-gd.c
@@ -82,6 +82,10 @@ static void ide_disk_release(struct device *dev)
struct gendisk *g = idkp->disk;
drive->disk_ops = NULL;
+
+ if (drive->sense_rq)
+ blk_put_request(drive->sense_rq);
+
drive->driver_data = NULL;
g->private_data = NULL;
put_disk(g);
diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
index d52f8b3..e26c9c7 100644
--- a/drivers/ide/ide-tape.c
+++ b/drivers/ide/ide-tape.c
@@ -758,7 +758,7 @@ static ide_startstop_t idetape_do_request(ide_drive_t *drive,
(unsigned long long)rq->sector, rq->nr_sectors,
rq->current_nr_sectors);
- if (!blk_special_request(rq)) {
+ if (!(blk_special_request(rq) || blk_sense_request(rq))) {
/* We do not support buffer cache originated requests. */
printk(KERN_NOTICE "ide-tape: %s: Unsupported request in "
"request queue (%d)\n", drive->name, rq->cmd_type);
@@ -850,6 +850,23 @@ static ide_startstop_t idetape_do_request(ide_drive_t *drive,
BUG();
out:
+ /* prepare request sense if it got used with the last rq */
+ if (!drive->sense_rq) {
+ drive->sense_rq = ide_prep_sense(drive, tape->disk);
+ if (!drive->sense_rq) {
+ printk(KERN_ERR "%s: error prepping sense request!\n",
+ drive->name);
+ return ide_stopped;
+ }
+ }
+
+ /*
+ * save the current request in case we'll be queueing a sense rq
+ * afterwards due to its potential failure.
+ */
+ if (!blk_sense_request(rq))
+ drive->sense_rq->special = rq;
+
if (rq_data_dir(rq))
cmd.tf_flags |= IDE_TFLAG_WRITE;
@@ -2249,6 +2266,10 @@ static void ide_tape_release(struct device *dev)
BUG_ON(tape->merge_bh_size);
drive->dev_flags &= ~IDE_DFLAG_DSC_OVERLAP;
+
+ if (drive->sense_rq)
+ blk_put_request(drive->sense_rq);
+
drive->driver_data = NULL;
device_destroy(idetape_sysfs_class, MKDEV(IDETAPE_MAJOR, tape->minor));
device_destroy(idetape_sysfs_class,
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 095cda2..2238e61 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -605,9 +605,6 @@ struct ide_drive_s {
/* for request sense */
struct ide_atapi_pc request_sense_pc;
- struct request request_sense_rq;
- struct bio request_sense_bio;
- struct bio_vec request_sense_bvec[2];
/* current sense rq and buffer */
struct request *sense_rq;
--
1.6.2.2
--
Regards/Gruss,
Boris.
Hello, Borislav.
Borislav Petkov wrote:
> Hi,
>
> On Thu, Apr 16, 2009 at 12:06:29PM +0900, Tejun Heo wrote:
>> I updated the patch a bit and folded it into the series after patch
>> 10. I hope I didn't butcher the patch too bad. Can I add your
>> Signed-off-by: on the modified patch? Also, I just started working on
>> the atapi version. Do you already have something?
>
> Ok, here's what I got: I went and converted ide-cd and ide-atapi to use
> a common routine, ide_prep_sense; see the following patches for that.
> Preliminary testing is promising but you never know :).
Cool. I'll take a look.
> I put the request_sense buffer as defined in <linux/cdrom.h> into the
> drive struct so that each device can have its own buffer. IMHO, request
> sense standard data should be pretty identical across most ATAPI devices
> (yeah, I'm sure there are exceptions :)). We might move the struct
> request_sense to a more generic location instead of <linux/cdrom.h> if
> we decide to go with that.
Okay.
>> * Both user and kernel PC requests expect sense data to be stored
>> in separate storage other than info->sense_data. Copy sense
>> data to rq->sense on completion if rq->sense is not NULL. This
>> fixes bogus sense data on PC requests.
>
> I took that into my version of the patch.
>
> Anyway, please do have a closer look in case I've missed something. Of
> course you can split them the way you see fit; there's more to be done
> there anyway, but they should suffice for your block layer stuff, I hope.
There was another problem. If we use blk_rq_map_kern() the failed rq
must be finished after the sense rq is finished because that's when
the bio is copied back if it was copied. Before sense_rq completion,
the sense buffer doesn't contain any valid data.
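(For illustration only, a rough hypothetical sketch of the ordering,
not actual code from the series:

	/* in ide_prep_sense(): the sense buffer may get copy-mapped
	 * rather than mapped directly ... */
	blk_rq_map_kern(drive->queue, sense_rq, sense, 18, __GFP_WAIT);

	/* ... so the copy-back into 'sense' only happens once sense_rq
	 * itself completes; until then 'sense' holds no valid data and
	 * the failed rq must not be finished off yet */
)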
Anyways, I'll review your patchset and integrate it with mine. Please
standby a bit.
Thanks.
--
tejun
Hi Tejun,
On Thu, Apr 16, 2009 at 03:07:09PM +0900, Tejun Heo wrote:
[..]
> There was another problem. If we use blk_rq_map_kern() the failed rq
> must be finished after the sense rq is finished because that's when
> the bio is copied back if it was copied. Before sense_rq completion,
> the sense buffer doesn't contain any valid data.
Well, as an idea, we could just postpone the completion of the failed rq
in:
if (sense && uptodate)
ide_cd_complete_failed_rq(drive, rq);
and put that just after
ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
line in cdrom_newpc_intr(). This way, we can copy back the sense data
safely and then kill the rq.
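IOW, something like this (untested sketch of the reordered out_end path
in cdrom_newpc_intr(), just to show the idea):

	ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);

	/* the sense data is valid only after the sense rq has completed,
	 * so copy it back into the failed rq only now */
	if (sense && uptodate)
		ide_cd_complete_failed_rq(drive, rq);

	if (sense && rc == 2)
		ide_error(drive, "request sense failure", stat);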
The only problem I fear with changes like that is that they later cause
subtle interactions with some device, interactions which shouldn't
normally happen. This is old code, you know, which still needs lots of
scrubbing. It's like walking on a minefield :).
Thanks.
--
Regards/Gruss,
Boris.
Borislav Petkov wrote:
> Hi Tejun,
>
> On Thu, Apr 16, 2009 at 03:07:09PM +0900, Tejun Heo wrote:
>
> [..]
>
>> There was another problem. If we use blk_rq_map_kern() the failed rq
>> must be finished after the sense rq is finished because that's when
>> the bio is copied back if it was copied. Before sense_rq completion,
>> the sense buffer doesn't contain any valid data.
>
> Well, as an idea, we could just postpone the completion of the failed rq
> in:
>
> if (sense && uptodate)
> ide_cd_complete_failed_rq(drive, rq);
>
> and put that just after
>
> ide_complete_rq(drive, uptodate ? 0 : -EIO, nsectors << 9);
>
> line in cdrom_newpc_intr(). This way, we can copy back the sense data
> safely and then kill the rq.
Yeah, already have that in my patch.
> The only problem I fear with changes like that is that they later cause
> subtle interactions with some device, interactions which shouldn't
> normally happen. This is old code, you know, which still needs lots of
> scrubbing. It's like walking on a minefield :).
:-)
--
tejun