Scatter-gather lists (sgl_s) are frequently used as data carriers in
the block layer. For example, the SCSI and NVMe subsystems interchange
data with the block layer using sgl_s. The sgl API is declared in
<linux/scatterlist.h>.
The author has extended these transient sgl use cases to a store (i.e.
a ramdisk) in the scsi_debug driver. Other potential new uses of sgl_s
include caches. When this extra step is taken, the need to copy
between sgl_s becomes apparent. This patchset adds sgl_copy_sgl() and
two other sgl operations.
The existing sgl_alloc_order() function can be seen as a replacement
for vmalloc() for large, long-term allocations. For what seems like
no good reason, sgl_alloc_order() currently restricts its total
allocation to less than or equal to 4 GiB. vmalloc() has no such
restriction.
Changes since v2 [posted 20201018]:
- remove unneeded lines from sgl_memset() definition.
- change sg_zero_buffer() to call sgl_memset() as the former
is a subset.
Changes since v1 [posted 20201016]:
- Bodo Stroesser pointed out a problem with the nesting of
kmap_atomic() [called via sg_miter_next()] and kunmap_atomic()
calls [called via sg_miter_stop()] and proposed a solution that
simplifies the previous code.
- the new implementation of the three functions has shorter periods
when pre-emption is disabled (but has more of them). This should
make operations on large sgl_s more pre-emption "friendly" with
a relatively small performance hit.
- sgl_memset()'s return type changed from void to size_t: the
number of bytes actually (over)written. That number is needed
internally anyway, so it may as well be returned as it may be
useful to the caller.
This patchset is against lk 5.9.0
Douglas Gilbert (4):
sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
scatterlist: add sgl_copy_sgl() function
scatterlist: add sgl_compare_sgl() function
scatterlist: add sgl_memset()
include/linux/scatterlist.h | 12 +++
lib/scatterlist.c | 186 +++++++++++++++++++++++++++++++++---
2 files changed, 184 insertions(+), 14 deletions(-)
--
2.25.1
After enabling copies between scatter gather lists (sgl_s),
another storage related operation is to compare two sgl_s.
This new function is modelled on NVMe's Compare command and
the SCSI VERIFY(BYTCHK=1) command. Like memcmp(), this function
stops comparing at the first miscompare and returns false.
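The early-stop walk described above can be sketched in plain userspace C. This is an illustrative model only: the kernel function uses sg_miter iterators and kmap_atomic(), while the struct and names below are invented for the sketch.

```c
/* Userspace sketch of the sgl_compare_sgl() walk: two lists of
 * variable-length segments are compared in min()-sized chunks and
 * the walk stops at the first miscompare. Types and names here are
 * illustrative, not the kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct seg { const unsigned char *buf; size_t len; };

static bool cmp_segs(const struct seg *x, unsigned xn,
		     const struct seg *y, unsigned yn, size_t n_bytes)
{
	size_t xi = 0, yi = 0, xoff = 0, yoff = 0, done = 0;

	while (done < n_bytes && xi < xn && yi < yn) {
		size_t len = x[xi].len - xoff;	/* min3()-style clamp */

		if (y[yi].len - yoff < len)
			len = y[yi].len - yoff;
		if (n_bytes - done < len)
			len = n_bytes - done;
		if (memcmp(x[xi].buf + xoff, y[yi].buf + yoff, len))
			return false;		/* first miscompare: stop */
		done += len;
		xoff += len;
		yoff += len;
		if (xoff == x[xi].len) { ++xi; xoff = 0; }
		if (yoff == y[yi].len) { ++yi; yoff = 0; }
	}
	return true;	/* equal until x, y or n_bytes was exhausted */
}
```

Note that, as in the real function, x and y are symmetrical and the segment boundaries of the two lists need not line up.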
Signed-off-by: Douglas Gilbert <[email protected]>
---
include/linux/scatterlist.h | 4 +++
lib/scatterlist.c | 61 +++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6649414c0749..ae260dc5fedb 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -325,6 +325,10 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
size_t n_bytes);
+bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+ struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+ size_t n_bytes);
+
/*
* Maximum number of entries that will be allocated in one piece, if
* a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 1f9e093ad7da..49185536acba 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1049,3 +1049,64 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
}
EXPORT_SYMBOL(sgl_copy_sgl);
+/**
+ * sgl_compare_sgl - Compare x and y (both sgl_s)
+ * @x_sgl: x (left) sgl
+ * @x_nents: Number of SG entries in x (left) sgl
+ * @x_skip: Number of bytes to skip in x (left) before starting
+ * @y_sgl: y (right) sgl
+ * @y_nents: Number of SG entries in y (right) sgl
+ * @y_skip: Number of bytes to skip in y (right) before starting
+ * @n_bytes: The (maximum) number of bytes to compare
+ *
+ * Returns:
+ * true if x and y compare equal before x, y or n_bytes is exhausted.
+ * Otherwise on a miscompare, returns false (and stops comparing).
+ *
+ * Notes:
+ * x and y are symmetrical: they can be swapped and the result is the same.
+ *
+ * Implementation is based on memcmp(). x and y segments may overlap.
+ *
+ * The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+ struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+ size_t n_bytes)
+{
+ bool equ = true;
+ size_t len;
+ size_t offset = 0;
+ struct sg_mapping_iter x_iter, y_iter;
+
+ if (n_bytes == 0)
+ return true;
+ sg_miter_start(&x_iter, x_sgl, x_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+ sg_miter_start(&y_iter, y_sgl, y_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+ if (!sg_miter_skip(&x_iter, x_skip))
+ goto fini;
+ if (!sg_miter_skip(&y_iter, y_skip))
+ goto fini;
+
+ while (equ && offset < n_bytes) {
+ if (!sg_miter_next(&x_iter))
+ break;
+ if (!sg_miter_next(&y_iter))
+ break;
+ len = min3(x_iter.length, y_iter.length, n_bytes - offset);
+
+ equ = !memcmp(x_iter.addr, y_iter.addr, len);
+ offset += len;
+ /* LIFO order is important when SG_MITER_ATOMIC is used */
+ y_iter.consumed = len;
+ sg_miter_stop(&y_iter);
+ x_iter.consumed = len;
+ sg_miter_stop(&x_iter);
+ }
+fini:
+ sg_miter_stop(&y_iter);
+ sg_miter_stop(&x_iter);
+ return equ;
+}
+EXPORT_SYMBOL(sgl_compare_sgl);
--
2.25.1
The existing sg_zero_buffer() function is a bit restrictive.
For example, protection information (PI) blocks are usually
initialized to 0xff bytes. As its name suggests, sgl_memset()
is modelled on memset(). One difference is the type of the
val argument, which is u8 rather than int. It also returns
the number of bytes (over)written.
Change the implementation of sg_zero_buffer() to call this new
function.
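The described behaviour (skip handling, the u8 val argument and the returned byte count) can be modelled in userspace C. The struct and names are invented for the sketch; the kernel version walks the sgl with sg_miter iterators instead of a flat array.

```c
/* Userspace model of the sgl_memset() walk over a segment list;
 * returns the number of bytes actually written, and val=0 reduces it
 * to the sg_zero_buffer() behaviour. Names are illustrative only. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct seg { unsigned char *buf; size_t len; };

static size_t segs_memset(struct seg *sgl, unsigned nents, size_t skip,
			  unsigned char val, size_t n_bytes)
{
	size_t written = 0;
	unsigned i;

	for (i = 0; i < nents && n_bytes > 0; ++i) {
		size_t len;

		if (skip >= sgl[i].len) {	/* whole segment skipped */
			skip -= sgl[i].len;
			continue;
		}
		len = sgl[i].len - skip;
		if (len > n_bytes)
			len = n_bytes;
		memset(sgl[i].buf + skip, val, len);
		skip = 0;
		written += len;
		n_bytes -= len;
	}
	return written;
}
```

Passing SIZE_MAX as n_bytes fills every byte from skip to the end of the list, as the kernel-doc for sgl_memset() notes.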
Signed-off-by: Douglas Gilbert <[email protected]>
---
include/linux/scatterlist.h | 3 ++
lib/scatterlist.c | 65 +++++++++++++++++++++++++------------
2 files changed, 48 insertions(+), 20 deletions(-)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index ae260dc5fedb..a40012c8a4e6 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -329,6 +329,9 @@ bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_sk
struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
size_t n_bytes);
+size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
+ u8 val, size_t n_bytes);
+
/*
* Maximum number of entries that will be allocated in one piece, if
* a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 49185536acba..6b430f7293e0 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -952,26 +952,7 @@ EXPORT_SYMBOL(sg_pcopy_to_buffer);
size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
size_t buflen, off_t skip)
{
- unsigned int offset = 0;
- struct sg_mapping_iter miter;
- unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_TO_SG;
-
- sg_miter_start(&miter, sgl, nents, sg_flags);
-
- if (!sg_miter_skip(&miter, skip))
- return false;
-
- while (offset < buflen && sg_miter_next(&miter)) {
- unsigned int len;
-
- len = min(miter.length, buflen - offset);
- memset(miter.addr, 0, len);
-
- offset += len;
- }
-
- sg_miter_stop(&miter);
- return offset;
+ return sgl_memset(sgl, nents, skip, 0, buflen);
}
EXPORT_SYMBOL(sg_zero_buffer);
@@ -1110,3 +1091,47 @@ bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_sk
return equ;
}
EXPORT_SYMBOL(sgl_compare_sgl);
+
+/**
+ * sgl_memset - set byte 'val' up to n_bytes times on SG list
+ * @sgl: The SG list
+ * @nents: Number of SG entries in sgl
+ * @skip: Number of bytes to skip before starting
+ * @val: byte value to write to sgl
+ * @n_bytes: The (maximum) number of bytes to modify
+ *
+ * Returns:
+ * The number of bytes written.
+ *
+ * Notes:
+ * Stops writing if either sgl or n_bytes is exhausted. If n_bytes is
+ * set to SIZE_MAX then val will be written to each byte until the end
+ * of sgl.
+ *
+ * The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
+ u8 val, size_t n_bytes)
+{
+ size_t offset = 0;
+ size_t len;
+ struct sg_mapping_iter miter;
+
+ if (n_bytes == 0)
+ return 0;
+ sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
+ if (!sg_miter_skip(&miter, skip))
+ goto fini;
+
+ while ((offset < n_bytes) && sg_miter_next(&miter)) {
+ len = min(miter.length, n_bytes - offset);
+ memset(miter.addr, val, len);
+ offset += len;
+ }
+fini:
+ sg_miter_stop(&miter);
+ return offset;
+}
+EXPORT_SYMBOL(sgl_memset);
+
--
2.25.1
Both the SCSI and NVMe subsystems receive user data from the block
layer in scatterlist_s (aka scatter gather lists, or sgl_s, which are
often arrays). If drivers in those subsystems represent storage
(e.g. a ramdisk) or cache "hot" user data then they may also
choose to use scatterlist_s. Currently there are no sgl to sgl
operations in the kernel. Start with an sgl to sgl copy. Copying
stops when the first of the requested number of bytes, the
source sgl, or the destination sgl is exhausted. So the
destination sgl will _not_ grow.
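The stopping rule above (whichever of n_bytes, source or destination runs out first wins, so the destination never grows) can be modelled in userspace C. Illustrative types and names only; the skip offsets and sg_miter machinery of the real function are omitted here.

```c
/* Userspace sketch of the sgl_copy_sgl() loop: each step copies a
 * chunk sized by a min3()-style clamp over the space left in the
 * current destination segment, the current source segment and the
 * requested byte count. Names are illustrative, not the kernel API. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct seg { unsigned char *buf; size_t len; };

static size_t segs_copy(struct seg *d, unsigned dn,
			const struct seg *s, unsigned sn, size_t n_bytes)
{
	size_t di = 0, si = 0, doff = 0, soff = 0, copied = 0;

	while (copied < n_bytes && di < dn && si < sn) {
		size_t len = d[di].len - doff;	/* min3() clamp */

		if (s[si].len - soff < len)
			len = s[si].len - soff;
		if (n_bytes - copied < len)
			len = n_bytes - copied;
		memcpy(d[di].buf + doff, s[si].buf + soff, len);
		copied += len;
		doff += len;
		soff += len;
		if (doff == d[di].len) { ++di; doff = 0; }
		if (soff == s[si].len) { ++si; soff = 0; }
	}
	return copied;		/* the destination never grows */
}
```

A destination smaller than the requested byte count simply yields a short return value, mirroring the "destination sgl will _not_ grow" rule.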
Signed-off-by: Douglas Gilbert <[email protected]>
---
include/linux/scatterlist.h | 4 ++
lib/scatterlist.c | 75 +++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 80178afc2a4a..6649414c0749 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -321,6 +321,10 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
size_t buflen, off_t skip);
+size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
+ struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
+ size_t n_bytes);
+
/*
* Maximum number of entries that will be allocated in one piece, if
* a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index d5770e7f1030..1f9e093ad7da 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -974,3 +974,78 @@ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
return offset;
}
EXPORT_SYMBOL(sg_zero_buffer);
+
+/**
+ * sgl_copy_sgl - Copy over a destination sgl from a source sgl
+ * @d_sgl: Destination sgl
+ * @d_nents: Number of SG entries in destination sgl
+ * @d_skip: Number of bytes to skip in destination before starting
+ * @s_sgl: Source sgl
+ * @s_nents: Number of SG entries in source sgl
+ * @s_skip: Number of bytes to skip in source before starting
+ * @n_bytes: The (maximum) number of bytes to copy
+ *
+ * Returns:
+ * The number of copied bytes.
+ *
+ * Notes:
+ * Destination arguments appear before the source arguments, as with memcpy().
+ *
+ * Stops copying if any of d_sgl, s_sgl or n_bytes is exhausted.
+ *
+ * Since memcpy() is used, overlapping copies (where d_sgl and s_sgl belong
+ * to the same sgl and the copy regions overlap) are not supported.
+ *
+ * Large copies are broken into copy segments whose sizes may vary. Those
+ * copy segment sizes are chosen by the min3() statement in the code below.
+ * Since SG_MITER_ATOMIC is used for both sides, each copy segment is started
+ * with kmap_atomic() [in sg_miter_next()] and completed with kunmap_atomic()
+ * [in sg_miter_stop()]. This means pre-emption is inhibited for relatively
+ * short periods even in very large copies.
+ *
+ * If d_skip is large, potentially spanning multiple d_nents, then some
+ * integer arithmetic to adjust d_sgl may improve performance. For example
+ * if d_sgl is built using sgl_alloc_order(chainable=false) then the sgl
+ * will be an array with equally sized segments facilitating that
+ * arithmetic. The suggestion applies to s_skip, s_sgl and s_nents as well.
+ *
+ **/
+size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
+ struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
+ size_t n_bytes)
+{
+ size_t len;
+ size_t offset = 0;
+ struct sg_mapping_iter d_iter, s_iter;
+
+ if (n_bytes == 0)
+ return 0;
+ sg_miter_start(&s_iter, s_sgl, s_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+ sg_miter_start(&d_iter, d_sgl, d_nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
+ if (!sg_miter_skip(&s_iter, s_skip))
+ goto fini;
+ if (!sg_miter_skip(&d_iter, d_skip))
+ goto fini;
+
+ while (offset < n_bytes) {
+ if (!sg_miter_next(&s_iter))
+ break;
+ if (!sg_miter_next(&d_iter))
+ break;
+ len = min3(d_iter.length, s_iter.length, n_bytes - offset);
+
+ memcpy(d_iter.addr, s_iter.addr, len);
+ offset += len;
+ /* LIFO order (stop d_iter before s_iter) needed with SG_MITER_ATOMIC */
+ d_iter.consumed = len;
+ sg_miter_stop(&d_iter);
+ s_iter.consumed = len;
+ sg_miter_stop(&s_iter);
+ }
+fini:
+ sg_miter_stop(&d_iter);
+ sg_miter_stop(&s_iter);
+ return offset;
+}
+EXPORT_SYMBOL(sgl_copy_sgl);
+
--
2.25.1
This patch removes a check done by sgl_alloc_order() before it starts
any allocations. The comment above the removed code, "Check for
integer overflow", arguably gives a false sense of security. The right
hand side of the condition's expression is evaluated as u32 so
cannot exceed UINT32_MAX (4 GiB), which means 'length' cannot exceed
that amount. If that was the intention then the comment could be
dropped and the condition rewritten more clearly as:
if (length > UINT32_MAX) <<failure path>>;
The author's intention is to use sgl_alloc_order() to replace
vmalloc(unsigned long) for a large allocation (a debug ramdisk).
vmalloc() has no limit at 4 GiB so it seems unreasonable that:
sgl_alloc_order(unsigned long long length, ....)
does. sgl_s made with sgl_alloc_order(chainable=false) have equally
sized segments placed in a scatter gather array. That allows O(1)
navigation around a big sgl using some simple integer maths.
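That O(1) navigation reduces to shift/mask arithmetic once every segment has length PAGE_SIZE << order. A minimal sketch, assuming 4 KiB pages (PAGE_SHIFT = 12) and with invented names:

```c
/* Maps a byte offset into an sgl built by sgl_alloc_order(chainable=false)
 * to (array index, offset within that segment) in O(1). The PAGE_SHIFT
 * value is an assumption for this userspace sketch. */
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed for illustration */

static void sgl_locate(size_t byte_off, unsigned order,
		       size_t *seg_idx, size_t *seg_off)
{
	unsigned seg_shift = PAGE_SHIFT + order;

	*seg_idx = byte_off >> seg_shift;		   /* which element */
	*seg_off = byte_off & (((size_t)1 << seg_shift) - 1); /* within it */
}
```

With order=2 each segment is 16 KiB, so byte offset 40000 lands in segment 2 at offset 7232; no per-segment walk is needed.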
Having previously sent a patch to fix a memory leak in
sgl_alloc_order(), take the opportunity to put a one line comment above
sgl_free()'s declaration noting that it is not suitable when order > 0.
The misuse of sgl_free() when order > 0 was the reason for the memory
leak. The other users of sgl_alloc_order() in the kernel were
checked and found to handle freeing properly.
Signed-off-by: Douglas Gilbert <[email protected]>
---
include/linux/scatterlist.h | 1 +
lib/scatterlist.c | 3 ---
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 45cf7b69d852..80178afc2a4a 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -302,6 +302,7 @@ struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
unsigned int *nent_p);
void sgl_free_n_order(struct scatterlist *sgl, int nents, int order);
void sgl_free_order(struct scatterlist *sgl, int order);
+/* Only use sgl_free() when order is 0 */
void sgl_free(struct scatterlist *sgl);
#endif /* CONFIG_SGL_ALLOC */
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c448642e0f78..d5770e7f1030 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -493,9 +493,6 @@ struct scatterlist *sgl_alloc_order(unsigned long long length,
u32 elem_len;
nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
- /* Check for integer overflow */
- if (length > (nent << (PAGE_SHIFT + order)))
- return NULL;
nalloc = nent;
if (chainable) {
/* Check for integer overflow */
--
2.25.1
Am 19.10.20 um 21:19 schrieb Douglas Gilbert:
> This patch removes a check done by sgl_alloc_order() before it starts
> any allocations. The comment before the removed code says: "Check for
> integer overflow" arguably gives a false sense of security. The right
> hand side of the expression in the condition is resolved as u32 so
> cannot exceed UINT32_MAX (4 GiB) which means 'length' cannot exceed
> that amount. If that was the intention then the comment above it
> could be dropped and the condition rewritten more clearly as:
> if (length > UINT32_MAX) <<failure path >>;
I think the intention of the check is to reject calls where length is so high that the calculation of nent overflows the unsigned int nent/nalloc.
Consistently, a similar check is done a few lines later before incrementing nalloc due to chainable = true.
So I think the code tries to allow length values up to 4G << (PAGE_SHIFT + order).
That said, I think instead of removing the check it should rather be fixed, e.g. by adding an unsigned long long cast before nent.
BTW: I don't know why there are two checks. I think one check after conditionally incrementing nalloc would be enough.
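The point can be demonstrated in isolation: with a u32 nent the shifted value wraps at 4 GiB, while widening nent before the shift keeps the overflow check meaningful for larger lengths. A userspace sketch with an assumed PAGE_SHIFT of 12 and round_up() omitted (the lengths below are segment aligned):

```c
/* Contrasts the original u32 form of the sgl_alloc_order() overflow
 * check with the suggested widened form. PAGE_SHIFT is an assumption
 * for this sketch; the kernel derives it from the page size. */
#include <assert.h>

#define PAGE_SHIFT 12	/* assumed for the sketch */

/* 1 if length is rejected by the original check: the shift is done in
 * 32 bits and wraps, so any length of 4 GiB or more is rejected. */
static int buggy_rejects(unsigned long long length, unsigned order)
{
	unsigned int nent = (unsigned int)(length >> (PAGE_SHIFT + order));

	return length > (nent << (PAGE_SHIFT + order));
}

/* 1 if length is rejected once nent is widened first, as suggested:
 * only lengths that truly overflow the u32 nent are rejected. */
static int fixed_rejects(unsigned long long length, unsigned order)
{
	unsigned int nent = (unsigned int)(length >> (PAGE_SHIFT + order));

	return length > ((unsigned long long)nent << (PAGE_SHIFT + order));
}
```

An 8 GiB request passes the widened check (nent = 2^21 fits comfortably in u32) but is rejected by the original form, while a length large enough to wrap nent itself is still caught.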
>
> The author's intention is to use sgl_alloc_order() to replace
> vmalloc(unsigned long) for a large allocation (debug ramdisk).
> vmalloc has no limit at 4 GiB so its seems unreasonable that:
> sgl_alloc_order(unsigned long long length, ....)
> does. sgl_s made with sgl_alloc_order(chainable=false) have equally
> sized segments placed in a scatter gather array. That allows O(1)
> navigation around a big sgl using some simple integer maths.
>
> Having previously sent a patch to fix a memory leak in
> sg_alloc_order() take the opportunity to put a one line comment above
> sgl_free()'s declaration that it is not suitable when order > 0 . The
> mis-use of sgl_free() when order > 0 was the reason for the memory
> leak. The other users of sgl_alloc_order() in the kernel where
> checked and found to handle free-ing properly.
>
> Signed-off-by: Douglas Gilbert <[email protected]>
> ---
> include/linux/scatterlist.h | 1 +
> lib/scatterlist.c | 3 ---
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 45cf7b69d852..80178afc2a4a 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -302,6 +302,7 @@ struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
> unsigned int *nent_p);
> void sgl_free_n_order(struct scatterlist *sgl, int nents, int order);
> void sgl_free_order(struct scatterlist *sgl, int order);
> +/* Only use sgl_free() when order is 0 */
> void sgl_free(struct scatterlist *sgl);
> #endif /* CONFIG_SGL_ALLOC */
>
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index c448642e0f78..d5770e7f1030 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -493,9 +493,6 @@ struct scatterlist *sgl_alloc_order(unsigned long long length,
> u32 elem_len;
>
> nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
> - /* Check for integer overflow */
> - if (length > (nent << (PAGE_SHIFT + order)))
> - return NULL;
> nalloc = nent;
> if (chainable) {
> /* Check for integer overflow */
>
Am 19.10.20 um 21:19 schrieb Douglas Gilbert:
> Both the SCSI and NVMe subsystems receive user data from the block
> layer in scatterlist_s (aka scatter gather lists (sgl) which are
> often arrays). If drivers in those subsystems represent storage
> (e.g. a ramdisk) or cache "hot" user data then they may also
> choose to use scatterlist_s. Currently there are no sgl to sgl
> operations in the kernel. Start with a sgl to sgl copy. Stops
> when the first of the number of requested bytes to copy, or the
> source sgl, or the destination sgl is exhausted. So the
> destination sgl will _not_ grow.
>
> Signed-off-by: Douglas Gilbert <[email protected]>
> ---
> include/linux/scatterlist.h | 4 ++
> lib/scatterlist.c | 75 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 79 insertions(+)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 80178afc2a4a..6649414c0749 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -321,6 +321,10 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
> size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
> size_t buflen, off_t skip);
>
> +size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
> + struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
> + size_t n_bytes);
> +
> /*
> * Maximum number of entries that will be allocated in one piece, if
> * a list larger than this is required then chaining will be utilized.
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index d5770e7f1030..1f9e093ad7da 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -974,3 +974,78 @@ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
> return offset;
> }
> EXPORT_SYMBOL(sg_zero_buffer);
> +
> +/**
> + * sgl_copy_sgl - Copy over a destination sgl from a source sgl
> + * @d_sgl: Destination sgl
> + * @d_nents: Number of SG entries in destination sgl
> + * @d_skip: Number of bytes to skip in destination before starting
> + * @s_sgl: Source sgl
> + * @s_nents: Number of SG entries in source sgl
> + * @s_skip: Number of bytes to skip in source before starting
> + * @n_bytes: The (maximum) number of bytes to copy
> + *
> + * Returns:
> + * The number of copied bytes.
> + *
> + * Notes:
> + * Destination arguments appear before the source arguments, as with memcpy().
> + *
> + * Stops copying if either d_sgl, s_sgl or n_bytes is exhausted.
> + *
> + * Since memcpy() is used, overlapping copies (where d_sgl and s_sgl belong
> + * to the same sgl and the copy regions overlap) are not supported.
> + *
> + * Large copies are broken into copy segments whose sizes may vary. Those
> + * copy segment sizes are chosen by the min3() statement in the code below.
> + * Since SG_MITER_ATOMIC is used for both sides, each copy segment is started
> + * with kmap_atomic() [in sg_miter_next()] and completed with kunmap_atomic()
> + * [in sg_miter_stop()]. This means pre-emption is inhibited for relatively
> + * short periods even in very large copies.
> + *
> + * If d_skip is large, potentially spanning multiple d_nents then some
> + * integer arithmetic to adjust d_sgl may improve performance. For example
> + * if d_sgl is built using sgl_alloc_order(chainable=false) then the sgl
> + * will be an array with equally sized segments facilitating that
> + * arithmetic. The suggestion applies to s_skip, s_sgl and s_nents as well.
> + *
> + **/
> +size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
> + struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
> + size_t n_bytes)
> +{
> + size_t len;
> + size_t offset = 0;
> + struct sg_mapping_iter d_iter, s_iter;
> +
> + if (n_bytes == 0)
> + return 0;
> + sg_miter_start(&s_iter, s_sgl, s_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
> + sg_miter_start(&d_iter, d_sgl, d_nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
> + if (!sg_miter_skip(&s_iter, s_skip))
> + goto fini;
> + if (!sg_miter_skip(&d_iter, d_skip))
> + goto fini;
> +
> + while (offset < n_bytes) {
> + if (!sg_miter_next(&s_iter))
> + break;
> + if (!sg_miter_next(&d_iter))
> + break;
> + len = min3(d_iter.length, s_iter.length, n_bytes - offset);
> +
> + memcpy(d_iter.addr, s_iter.addr, len);
> + offset += len;
> + /* LIFO order (stop d_iter before s_iter) needed with SG_MITER_ATOMIC */
> + d_iter.consumed = len;
> + sg_miter_stop(&d_iter);
> + s_iter.consumed = len;
> + sg_miter_stop(&s_iter);
> + }
> +fini:
> + sg_miter_stop(&d_iter);
> + sg_miter_stop(&s_iter);
> + return offset;
> +}
> +EXPORT_SYMBOL(sgl_copy_sgl);
> +
>
Reviewed-by: Bodo Stroesser <[email protected]>
Am 19.10.20 um 21:19 schrieb Douglas Gilbert:
> After enabling copies between scatter gather lists (sgl_s),
> another storage related operation is to compare two sgl_s.
> This new function is modelled on NVMe's Compare command and
> the SCSI VERIFY(BYTCHK=1) command. Like memcmp() this function
> returns false on the first miscompare and stops comparing.
>
> Signed-off-by: Douglas Gilbert <[email protected]>
> ---
> include/linux/scatterlist.h | 4 +++
> lib/scatterlist.c | 61 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 6649414c0749..ae260dc5fedb 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -325,6 +325,10 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
> struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
> size_t n_bytes);
>
> +bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
> + struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> + size_t n_bytes);
> +
> /*
> * Maximum number of entries that will be allocated in one piece, if
> * a list larger than this is required then chaining will be utilized.
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index 1f9e093ad7da..49185536acba 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -1049,3 +1049,64 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
> }
> EXPORT_SYMBOL(sgl_copy_sgl);
>
> +/**
> + * sgl_compare_sgl - Compare x and y (both sgl_s)
> + * @x_sgl: x (left) sgl
> + * @x_nents: Number of SG entries in x (left) sgl
> + * @x_skip: Number of bytes to skip in x (left) before starting
> + * @y_sgl: y (right) sgl
> + * @y_nents: Number of SG entries in y (right) sgl
> + * @y_skip: Number of bytes to skip in y (right) before starting
> + * @n_bytes: The (maximum) number of bytes to compare
> + *
> + * Returns:
> + * true if x and y compare equal before x, y or n_bytes is exhausted.
> + * Otherwise on a miscompare, returns false (and stops comparing).
> + *
> + * Notes:
> + * x and y are symmetrical: they can be swapped and the result is the same.
> + *
> + * Implementation is based on memcmp(). x and y segments may overlap.
> + *
> + * The notes in sgl_copy_sgl() about large sgl_s _applies here as well.
> + *
> + **/
> +bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
> + struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> + size_t n_bytes)
> +{
> + bool equ = true;
> + size_t len;
> + size_t offset = 0;
> + struct sg_mapping_iter x_iter, y_iter;
> +
> + if (n_bytes == 0)
> + return true;
> + sg_miter_start(&x_iter, x_sgl, x_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
> + sg_miter_start(&y_iter, y_sgl, y_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
> + if (!sg_miter_skip(&x_iter, x_skip))
> + goto fini;
> + if (!sg_miter_skip(&y_iter, y_skip))
> + goto fini;
> +
> + while (equ && offset < n_bytes) {
> + if (!sg_miter_next(&x_iter))
> + break;
> + if (!sg_miter_next(&y_iter))
> + break;
> + len = min3(x_iter.length, y_iter.length, n_bytes - offset);
> +
> + equ = !memcmp(x_iter.addr, y_iter.addr, len);
> + offset += len;
> + /* LIFO order is important when SG_MITER_ATOMIC is used */
> + y_iter.consumed = len;
> + sg_miter_stop(&y_iter);
> + x_iter.consumed = len;
> + sg_miter_stop(&x_iter);
> + }
> +fini:
> + sg_miter_stop(&y_iter);
> + sg_miter_stop(&x_iter);
> + return equ;
> +}
> +EXPORT_SYMBOL(sgl_compare_sgl);
>
Reviewed-by: Bodo Stroesser <[email protected]>
Am 19.10.20 um 21:19 schrieb Douglas Gilbert:
> Both the SCSI and NVMe subsystems receive user data from the block
> layer in scatterlist_s (aka scatter gather lists (sgl) which are
> often arrays). If drivers in those subsystems represent storage
> (e.g. a ramdisk) or cache "hot" user data then they may also
> choose to use scatterlist_s. Currently there are no sgl to sgl
> operations in the kernel. Start with a sgl to sgl copy. Stops
> when the first of the number of requested bytes to copy, or the
> source sgl, or the destination sgl is exhausted. So the
> destination sgl will _not_ grow.
>
> Signed-off-by: Douglas Gilbert <[email protected]>
> ---
> include/linux/scatterlist.h | 4 ++
> lib/scatterlist.c | 75 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 79 insertions(+)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 80178afc2a4a..6649414c0749 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -321,6 +321,10 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
> size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
> size_t buflen, off_t skip);
>
> +size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
> + struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
> + size_t n_bytes);
> +
> /*
> * Maximum number of entries that will be allocated in one piece, if
> * a list larger than this is required then chaining will be utilized.
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index d5770e7f1030..1f9e093ad7da 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -974,3 +974,78 @@ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
> return offset;
> }
> EXPORT_SYMBOL(sg_zero_buffer);
> +
> +/**
> + * sgl_copy_sgl - Copy over a destination sgl from a source sgl
> + * @d_sgl: Destination sgl
> + * @d_nents: Number of SG entries in destination sgl
> + * @d_skip: Number of bytes to skip in destination before starting
> + * @s_sgl: Source sgl
> + * @s_nents: Number of SG entries in source sgl
> + * @s_skip: Number of bytes to skip in source before starting
> + * @n_bytes: The (maximum) number of bytes to copy
> + *
> + * Returns:
> + * The number of copied bytes.
> + *
> + * Notes:
> + * Destination arguments appear before the source arguments, as with memcpy().
> + *
> + * Stops copying if either d_sgl, s_sgl or n_bytes is exhausted.
> + *
> + * Since memcpy() is used, overlapping copies (where d_sgl and s_sgl belong
> + * to the same sgl and the copy regions overlap) are not supported.
> + *
> + * Large copies are broken into copy segments whose sizes may vary. Those
> + * copy segment sizes are chosen by the min3() statement in the code below.
> + * Since SG_MITER_ATOMIC is used for both sides, each copy segment is started
> + * with kmap_atomic() [in sg_miter_next()] and completed with kunmap_atomic()
> + * [in sg_miter_stop()]. This means pre-emption is inhibited for relatively
> + * short periods even in very large copies.
> + *
> + * If d_skip is large, potentially spanning multiple d_nents then some
> + * integer arithmetic to adjust d_sgl may improve performance. For example
> + * if d_sgl is built using sgl_alloc_order(chainable=false) then the sgl
> + * will be an array with equally sized segments facilitating that
> + * arithmetic. The suggestion applies to s_skip, s_sgl and s_nents as well.
> + *
> + **/
> +size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
> + struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
> + size_t n_bytes)
> +{
> + size_t len;
> + size_t offset = 0;
> + struct sg_mapping_iter d_iter, s_iter;
> +
> + if (n_bytes == 0)
> + return 0;
> + sg_miter_start(&s_iter, s_sgl, s_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
> + sg_miter_start(&d_iter, d_sgl, d_nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
> + if (!sg_miter_skip(&s_iter, s_skip))
> + goto fini;
> + if (!sg_miter_skip(&d_iter, d_skip))
> + goto fini;
> +
> + while (offset < n_bytes) {
> + if (!sg_miter_next(&s_iter))
> + break;
> + if (!sg_miter_next(&d_iter))
> + break;
> + len = min3(d_iter.length, s_iter.length, n_bytes - offset);
> +
> + memcpy(d_iter.addr, s_iter.addr, len);
> + offset += len;
> + /* LIFO order (stop d_iter before s_iter) needed with SG_MITER_ATOMIC */
> + d_iter.consumed = len;
> + sg_miter_stop(&d_iter);
> + s_iter.consumed = len;
> + sg_miter_stop(&s_iter);
> + }
> +fini:
> + sg_miter_stop(&d_iter);
> + sg_miter_stop(&s_iter);
> + return offset;
> +}
> +EXPORT_SYMBOL(sgl_copy_sgl);
> +
>
Second try, this time with correct tag.
Reviewed-by: Bodo Stroesser <[email protected]>
Am 19.10.20 um 21:19 schrieb Douglas Gilbert:
> The existing sg_zero_buffer() function is a bit restrictive.
> For example protection information (PI) blocks are usually
> initialized to 0xff bytes. As its name suggests, sgl_memset()
> is modelled on memset(). One difference is the type of the
> val argument, which is u8 rather than int. It also returns
> the number of bytes (over)written.
>
> Change implementation of sg_zero_buffer() to call this new
> function.
>
> Signed-off-by: Douglas Gilbert <[email protected]>
> ---
> include/linux/scatterlist.h | 3 ++
> lib/scatterlist.c | 65 +++++++++++++++++++++++++------------
> 2 files changed, 48 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index ae260dc5fedb..a40012c8a4e6 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -329,6 +329,9 @@ bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_sk
> struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> size_t n_bytes);
>
> +size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
> + u8 val, size_t n_bytes);
> +
> /*
> * Maximum number of entries that will be allocated in one piece, if
> * a list larger than this is required then chaining will be utilized.
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index 49185536acba..6b430f7293e0 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -952,26 +952,7 @@ EXPORT_SYMBOL(sg_pcopy_to_buffer);
> size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
> size_t buflen, off_t skip)
> {
> - unsigned int offset = 0;
> - struct sg_mapping_iter miter;
> - unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_TO_SG;
> -
> - sg_miter_start(&miter, sgl, nents, sg_flags);
> -
> - if (!sg_miter_skip(&miter, skip))
> - return false;
> -
> - while (offset < buflen && sg_miter_next(&miter)) {
> - unsigned int len;
> -
> - len = min(miter.length, buflen - offset);
> - memset(miter.addr, 0, len);
> -
> - offset += len;
> - }
> -
> - sg_miter_stop(&miter);
> - return offset;
> + return sgl_memset(sgl, nents, skip, 0, buflen);
> }
> EXPORT_SYMBOL(sg_zero_buffer);
>
> @@ -1110,3 +1091,47 @@ bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_sk
> return equ;
> }
> EXPORT_SYMBOL(sgl_compare_sgl);
> +
> +/**
> + * sgl_memset - set byte 'val' up to n_bytes times on SG list
> + * @sgl: The SG list
> + * @nents: Number of SG entries in sgl
> + * @skip: Number of bytes to skip before starting
> + * @val: byte value to write to sgl
> + * @n_bytes: The (maximum) number of bytes to modify
> + *
> + * Returns:
> + * The number of bytes written.
> + *
> + * Notes:
> + * Stops writing when either sgl or n_bytes is exhausted. If n_bytes is
> + * set to SIZE_MAX then val will be written to each byte until the end
> + * of sgl.
> + *
> + * The notes in sgl_copy_sgl() about large sgl_s apply here as well.
> + *
> + **/
> +size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
> + u8 val, size_t n_bytes)
> +{
> + size_t offset = 0;
> + size_t len;
> + struct sg_mapping_iter miter;
> +
> + if (n_bytes == 0)
> + return 0;
> + sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
> + if (!sg_miter_skip(&miter, skip))
> + goto fini;
> +
> + while ((offset < n_bytes) && sg_miter_next(&miter)) {
> + len = min(miter.length, n_bytes - offset);
> + memset(miter.addr, val, len);
> + offset += len;
> + }
> +fini:
> + sg_miter_stop(&miter);
> + return offset;
> +}
> +EXPORT_SYMBOL(sgl_memset);
> +
>
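A userspace sketch of sgl_memset()'s skip/clamp loop may make the return-value semantics concrete: writing stops when either the list or n_bytes runs out, and the number of bytes actually written comes back to the caller. The struct seg array below stands in for the SG mapping iterator; none of these names are kernel API:

```c
#include <stddef.h>
#include <string.h>

/* A flat stand-in for one mapped sgl segment. */
struct seg { unsigned char *buf; size_t len; };

/*
 * Write 'val' across a list of segments, starting 'skip' bytes in, for
 * at most n_bytes bytes.  Mirrors sgl_memset(): returns the number of
 * bytes actually (over)written, which may be less than n_bytes if the
 * list is exhausted first.
 */
static size_t seg_memset(struct seg *segs, unsigned int nsegs, size_t skip,
			 unsigned char val, size_t n_bytes)
{
	size_t offset = 0;
	unsigned int i;

	for (i = 0; i < nsegs && offset < n_bytes; i++) {
		size_t len = segs[i].len;

		if (skip >= len) {	/* this segment is fully skipped */
			skip -= len;
			continue;
		}
		len -= skip;		/* bytes available after the skip */
		if (len > n_bytes - offset)
			len = n_bytes - offset;
		memset(segs[i].buf + skip, val, len);
		skip = 0;
		offset += len;
	}
	return offset;
}
```

Passing SIZE_MAX as n_bytes simply fills every remaining byte of the list, matching the kernel-doc note above.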
Reviewed-by: Bodo Stroesser <[email protected]>
On 2020-11-03 7:54 a.m., Bodo Stroesser wrote:
> Am 19.10.20 um 21:19 schrieb Douglas Gilbert:
>> This patch removes a check done by sgl_alloc_order() before it starts
>> any allocations. The comment before the removed code says: "Check for
>> integer overflow" arguably gives a false sense of security. The right
>> hand side of the expression in the condition is resolved as u32 so
>> cannot exceed UINT32_MAX (4 GiB) which means 'length' cannot exceed
>> that amount. If that was the intention then the comment above it
>> could be dropped and the condition rewritten more clearly as:
>> if (length > UINT32_MAX) <<failure path >>;
>
> I think the intention of the check is to reject calls where length is so high that the calculation of nent overflows the unsigned int nent/nalloc.
> Consistently, a similar check is done a few lines later, before incrementing nalloc when chainable is true.
> So I think the code tries to allow length values up to 4G << (PAGE_SHIFT + order).
>
> That said, I think that instead of removing the check it would be better to fix it, e.g. by adding an unsigned long long cast before nent
>
> BTW: I don't know why there are two checks. I think one check after conditionally incrementing nalloc would be enough.
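Bodo's suggested fix can be illustrated with a small userspace model. With a 32-bit nent, the shift in the original check happens in 32 bits and can silently wrap, which is exactly why a 4 GiB request gets rejected; widening nent first keeps the comparison honest. The function names and PAGE_SHIFT below are illustrative, not the kernel source:

```c
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assume 4 KiB pages for this sketch */

/* Original form: the shift happens in 32 bits and can wrap to 0. */
static bool length_fits_buggy(unsigned long long length, unsigned int nent,
			      unsigned int order)
{
	return length <= (nent << (PAGE_SHIFT + order));
}

/* Bodo's suggestion: cast nent to unsigned long long before shifting. */
static bool length_fits(unsigned long long length, unsigned int nent,
			unsigned int order)
{
	return length <= ((unsigned long long)nent << (PAGE_SHIFT + order));
}
```

For a 4 GiB length at order 0, nent is 2^20 and the 32-bit shift wraps to 0, so the unfixed check falsely rejects a request the allocator could satisfy; the widened version accepts it.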
Okay, I'm working on a "v4" patchset. Apart from the above, my plan is
to extend sgl_compare_sgl() with a helper that additionally yields
the byte index of the first miscompare.
Doug Gilbert
>> The author's intention is to use sgl_alloc_order() to replace
>> vmalloc(unsigned long) for a large allocation (debug ramdisk).
>> vmalloc has no limit at 4 GiB so it seems unreasonable that:
>> sgl_alloc_order(unsigned long long length, ....)
>> does. sgl_s made with sgl_alloc_order(chainable=false) have equally
>> sized segments placed in a scatter-gather array. That allows O(1)
>> navigation around a big sgl using some simple integer maths.
>>
>> Having previously sent a patch to fix a memory leak in
>> sgl_alloc_order(), take the opportunity to put a one-line comment
>> above sgl_free()'s declaration that it is not suitable when order > 0.
>> The misuse of sgl_free() when order > 0 was the cause of that memory
>> leak. The other users of sgl_alloc_order() in the kernel were checked
>> and found to handle freeing properly.
>>
>> Signed-off-by: Douglas Gilbert <[email protected]>
>> ---
>> include/linux/scatterlist.h | 1 +
>> lib/scatterlist.c | 3 ---
>> 2 files changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
>> index 45cf7b69d852..80178afc2a4a 100644
>> --- a/include/linux/scatterlist.h
>> +++ b/include/linux/scatterlist.h
>> @@ -302,6 +302,7 @@ struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
>> unsigned int *nent_p);
>> void sgl_free_n_order(struct scatterlist *sgl, int nents, int order);
>> void sgl_free_order(struct scatterlist *sgl, int order);
>> +/* Only use sgl_free() when order is 0 */
>> void sgl_free(struct scatterlist *sgl);
>> #endif /* CONFIG_SGL_ALLOC */
>>
>> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
>> index c448642e0f78..d5770e7f1030 100644
>> --- a/lib/scatterlist.c
>> +++ b/lib/scatterlist.c
>> @@ -493,9 +493,6 @@ struct scatterlist *sgl_alloc_order(unsigned long long length,
>> u32 elem_len;
>>
>> nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
>> - /* Check for integer overflow */
>> - if (length > (nent << (PAGE_SHIFT + order)))
>> - return NULL;
>> nalloc = nent;
>> if (chainable) {
>> /* Check for integer overflow */
>>