Hello Linus,
please pull s390 fixes for 6.6-rc7.
Thank you,
Vasily
The following changes since commit 5c95bf274665cc9f5126e4a48a9da51114f7afd2:
s390/cert_store: fix string length handling (2023-09-19 13:25:44 +0200)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git tags/s390-6.6-4
for you to fetch changes up to c1ae1c59c8c6e0b66a718308c623e0cb394dab6b:
s390/pci: fix iommu bitmap allocation (2023-10-19 16:35:41 +0200)
----------------------------------------------------------------
s390 updates for 6.6-rc7
- Fix IOMMU bitmap allocation in s390 PCI to avoid out of bounds access
when IOMMU pages aren't a multiple of 64.
- Fix kasan crashes when accessing DCSS mapping in memory holes by adding
corresponding kasan zero shadow mappings.
- Fix a memory leak in css_alloc_subchannel in case dma_set_coherent_mask
fails.
----------------------------------------------------------------
Dinghao Liu (1):
s390/cio: fix a memleak in css_alloc_subchannel
Niklas Schnelle (1):
s390/pci: fix iommu bitmap allocation
Vasily Gorbik (1):
s390/kasan: handle DCSS mapping in memory holes
arch/s390/boot/vmem.c | 7 ++++++-
arch/s390/pci/pci_dma.c | 15 +++++++++++++--
drivers/s390/cio/css.c | 6 ++++--
3 files changed, 23 insertions(+), 5 deletions(-)
On Sat, 21 Oct 2023 at 02:44, Vasily Gorbik <[email protected]> wrote:
>
> please pull s390 fixes for 6.6-rc7.
Pulled. HOWEVER.
> - Fix IOMMU bitmap allocation in s390 PCI to avoid out of bounds access
> when IOMMU pages aren't a multiple of 64.
Please don't do this kind of thing.
And I quote:
static unsigned long *bitmap_vzalloc(size_t bits, gfp_t flags)
{
size_t n = BITS_TO_LONGS(bits);
size_t bytes;
if (unlikely(check_mul_overflow(n, sizeof(unsigned long), &bytes)))
return NULL;
return vzalloc(bytes);
}
the above overflow handling is *not* "defensive and good programming".
The above is just "unreadable mindless boiler plate".
Seriously, you're taking a 'size_t' of number of bits, turning it into
number of longs, and you're then turning *that* into number of bytes,
AND YOU ADD OVERFLOW CHECKING?!??!!!
Now, to make matters worse, the above calculation can actually
overflow in theory - but not in the place where you added the
protection!
Because the "longs to bytes" sure as hell can't overflow. We know
that, because the number of longs is guaranteed to have a much smaller
range, since it came from a calculation of bits.
But what can actually overflow? BITS_TO_LONGS(bits) will overflow, and
turn ~0ul to 0, because it does the __KERNEL_DIV_ROUND_UP thing, which
is the simplistic
#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
so that code added overflow protection that doesn't make sense, in all
the wrong places.
You need to verify the sanity of the number of bits first anyway.
Of course, in your use-case, the number of bits is also not unlimited,
because the source is
zdev->iommu_pages = zdev->iommu_size >> PAGE_SHIFT;
so it turns out that no, the BITS_TO_LONGS() won't overflow either,
but at least in some other situations - and only looking at that
bitmap_vzalloc() in a vacuum - it *could* have.
Now, I will argue that you always need range checking on the number of
bits *anyway* for other reasons - trying to just blindly allocate some
random amount of memory isn't acceptable, so there should to be some
range checking before anyway.
But that code is wrong, because the overflow is simply not an issue.
Adding overflow handling code is literally only actively misleading,
making the code harder to read, for no reason, and making people
*think* it's being careful when it is anything *but* careful.
I suspect that the compiler actually sees "that is stupid" and turns
the overflow into just a single left-shift again because it has seen
the (bigger) right-shift and knows it cannot overflow, but the problem
I'm ranting against is mindlessly adding boiler plate code that makes
the code harder to read for *humans*.
If you *do* want to add proper overflow handling, you'd need to either
fix BITS_TO_LONGS() some way (which is actually non-trivial since it
needs to be able to stay a constant and only use the argument once),
or you do something like
if (!bits)
return ZERO_SIZE_PTR;
longs = BITS_TO_LONG(bits);
if (!longs)
return NULL;
return vzalloc(longs * sizeof(long));
and I'd suggest maybe we should
(a) do the above checking in our bitmap_alloc() routines
(b) also change our bitmap_alloc() routines to take 'size_t' instead
of 'unsigned int' bit counts
(c) and finally, add that vzalloc() case, but simply using
kvmalloc_array(n, size, flags | __GFP_ZERO);
instead.
(And yes, kvmalloc_array() will actually then also do that
check_mul_overflow() thing, but now it's not pointless boiler plate
any more, now it's actually meaningful for *other* cases than the
bitmap allocation one that cannot overflow).
Hmm?
Linus
The pull request you sent on Sat, 21 Oct 2023 11:44:00 +0200:
> git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git tags/s390-6.6-4
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4d7b04c0cda365f190c4a8f7fddc535b93aae9f9
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Just re-quoting my suggestion here and adding Andy and Dmitry, who did
the original bitmap_alloc() helper interfaces a few years ago.
Also adding Kees in case he has any hardening suggestions, since this
is about (incorrect) overflow handling.
Kees: see my rant about mindlessly doing overflow handling in the wrong place in
https://lore.kernel.org/all/CAHk-=wgTUz1bdY6zvsN4ED0arCLE8Sb==1GH8d0sjm5bu7zesQ@mail.gmail.com/
in case you or somebody has a better idea for BITS_TO_LONG handling
than just "you need to check for zero before and after".
Linus
On Sat, 21 Oct 2023 at 10:56, Linus Torvalds
<[email protected]> wrote:
>
> If you *do* want to add proper overflow handling, you'd need to either
> fix BITS_TO_LONGS() some way (which is actually non-trivial since it
> needs to be able to stay a constant and only use the argument once),
> or you do something like
>
> if (!bits)
> return ZERO_SIZE_PTR;
> longs = BITS_TO_LONG(bits);
> if (!longs)
> return NULL;
> return vzalloc(longs * sizeof(long));
>
> and I'd suggest maybe we should
>
> (a) do the above checking in our bitmap_alloc() routines
>
> (b) also change our bitmap_alloc() routines to take 'size_t' instead
> of 'unsigned int' bit counts
>
> (c) and finally, add that vzalloc() case, but simply using
>
> kvmalloc_array(n, size, flags | __GFP_ZERO);
>
> instead.
On Sat, Oct 21, 2023 at 11:08:31AM -0700, Linus Torvalds wrote:
> in case you or somebody has a better idea for BITS_TO_LONG handling
> than just "you need to check for zero before and after".
>
> On Sat, 21 Oct 2023 at 10:56, Linus Torvalds
> <[email protected]> wrote:
> >
> > If you *do* want to add proper overflow handling, you'd need to either
> > fix BITS_TO_LONGS() some way (which is actually non-trivial since it
> > needs to be able to stay a constant and only use the argument once),
> > or you do something like
> >
> > if (!bits)
> > return ZERO_SIZE_PTR;
> > longs = BITS_TO_LONG(bits);
> > if (!longs)
> > return NULL;
> > return vzalloc(longs * sizeof(long));
This might work.
BITS_TO_<TYPE>(bits) utilizes __KERNEL_DIV_ROUND_UP, which may potentially
result in an overflow condition when
bits > ULONG_MAX - sizeof(<TYPE>) * 8 + 1.
To resolve this issue, avoid using the overflow-prone
__KERNEL_DIV_ROUND_UP. To meet the requirements of BITS_TO<TYPE>(bits)
for remaining constant and preventing side effects from multiple
argument uses, employ __is_constexpr to differentiate between constant
and non-constant cases, employing a helper function in the latter.
In the constant case, this ensures compatibility with constructs like
DECLARE_BITMAP. While in the non-constant case, the __bits_to_elem_count
function could be optimized for potentially improved code generation
by compilers, though this might come at the expense of readability and
visual consistency between the constant and non-constant cases.
I could further investigate if this approach, in general, appears acceptable.
---
include/linux/bitops.h | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..72be25d4b95d 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -15,11 +15,21 @@
# define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
#endif
+static inline unsigned long __bits_to_elem_count(size_t nr, size_t sz)
+{
+ return nr / sz + (nr % sz ? 1 : 0);
+}
+
+#define BITS_TO_ELEM_COUNT(nr, sz) \
+ __builtin_choose_expr(__is_constexpr(nr), \
+ (nr) / sz + ((nr) % sz ? 1 : 0), \
+ __bits_to_elem_count((nr), sz))
+
#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
-#define BITS_TO_LONGS(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
-#define BITS_TO_U64(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u64))
-#define BITS_TO_U32(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
-#define BITS_TO_BYTES(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(char))
+#define BITS_TO_LONGS(nr) BITS_TO_ELEM_COUNT(nr, BITS_PER_TYPE(long))
+#define BITS_TO_U64(nr) BITS_TO_ELEM_COUNT(nr, BITS_PER_TYPE(u64))
+#define BITS_TO_U32(nr) BITS_TO_ELEM_COUNT(nr, BITS_PER_TYPE(u32))
+#define BITS_TO_BYTES(nr) BITS_TO_ELEM_COUNT(nr, BITS_PER_TYPE(char))
extern unsigned int __sw_hweight8(unsigned int w);
extern unsigned int __sw_hweight16(unsigned int w);
--
2.39.2
On Sun, 22 Oct 2023 at 06:18, Vasily Gorbik <[email protected]> wrote:
>
> This might work.
Hmm. Yes.
But let's fix __KERNEL_DIV_ROUND_UP itself while at it.
(And perhaps move it out of the odd location it is in now - its in
<uapi/linux/const.h> for some unfathomable reason)
And maybe we could do a helper like
#define __if_constexpr(x, a, b) \
__builtin_choose_expr(__is_constexpr(x), a, b)
since that is one of the main reasons for that __is_constexpr macro
(and _that_ makes sense in the const.h header file).
Linus
On Sat, Oct 21, 2023 at 10:56:29AM -0700, Linus Torvalds wrote:
> On Sat, 21 Oct 2023 at 02:44, Vasily Gorbik <[email protected]> wrote:
> > - Fix IOMMU bitmap allocation in s390 PCI to avoid out of bounds access
> > when IOMMU pages aren't a multiple of 64.
> But that code is wrong, because the overflow is simply not an issue.
> Adding overflow handling code is literally only actively misleading,
> making the code harder to read, for no reason, and making people
> *think* it's being careful when it is anything *but* careful.
Right, I should have done a better job reviewing this patch when picking
it up.
Please consider a follow-up patch (in reply) that cleans up unnecessary
and misleading overflow handling. There is no real benefit in getting
it into linux-next because the upcoming conversion to use the common
code DMA API on s390
Link: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231020&id=c76c067e488c
eliminates arch/s390/pci/pci_dma.c entirely and, therefore, addresses the
original problem in another way. That's why this fix is only relevant for
the current v6.6 and stable backports and is kept as simple as possible.
Let me know if you prefer the regular way, and I should include this
follow-up patch in my pull request later this week.
> If you *do* want to add proper overflow handling, you'd need to either
> fix BITS_TO_LONGS() some way (which is actually non-trivial since it
> needs to be able to stay a constant and only use the argument once),
Looking into that. Let's see if handling overflow in __KERNEL_DIV_ROUND_UP
turns out to be doable.
This commit effectively reverts commit c1ae1c59c8c6 ("s390/pci: fix
iommu bitmap allocation") and applies a simpler fix instead. Commit
c1ae1c59c8c6 introduced a custom bitmap_vzalloc() function that included
unnecessary and misleading overflow handling.
This fix is only relevant for the current v6.6 and stable backports. It
will be superseded by the upcoming conversion to use the common
code DMA API on s390 (pending in linux-next [2]), which eliminates
arch/s390/pci/pci_dma.c entirely and, therefore, addresses the original
problem in another way.
Instead of relying on a custom bitmap_vzalloc() function, this change goes
back to straightforward allocation using vzalloc() with the appropriate
size calculated using the BITS_TO_LONGS() macro.
Link: https://lore.kernel.org/all/CAHk-=wgTUz1bdY6zvsN4ED0arCLE8Sb==1GH8d0sjm5bu7zesQ@mail.gmail.com/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231020&id=c76c067e488c
Cc: [email protected]
Fixes: c1ae1c59c8c6 ("s390/pci: fix iommu bitmap allocation")
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
---
arch/s390/pci/pci_dma.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 99209085c75b..1b4b123d79aa 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -565,17 +565,6 @@ static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
}
}
-static unsigned long *bitmap_vzalloc(size_t bits, gfp_t flags)
-{
- size_t n = BITS_TO_LONGS(bits);
- size_t bytes;
-
- if (unlikely(check_mul_overflow(n, sizeof(unsigned long), &bytes)))
- return NULL;
-
- return vzalloc(bytes);
-}
-
int zpci_dma_init_device(struct zpci_dev *zdev)
{
u8 status;
@@ -615,13 +604,15 @@ int zpci_dma_init_device(struct zpci_dev *zdev)
zdev->end_dma - zdev->start_dma + 1);
zdev->end_dma = zdev->start_dma + zdev->iommu_size - 1;
zdev->iommu_pages = zdev->iommu_size >> PAGE_SHIFT;
- zdev->iommu_bitmap = bitmap_vzalloc(zdev->iommu_pages, GFP_KERNEL);
+ zdev->iommu_bitmap = vzalloc(BITS_TO_LONGS(zdev->iommu_pages) *
+ sizeof(unsigned long));
if (!zdev->iommu_bitmap) {
rc = -ENOMEM;
goto free_dma_table;
}
if (!s390_iommu_strict) {
- zdev->lazy_bitmap = bitmap_vzalloc(zdev->iommu_pages, GFP_KERNEL);
+ zdev->lazy_bitmap = vzalloc(BITS_TO_LONGS(zdev->iommu_pages) *
+ sizeof(unsigned long));
if (!zdev->lazy_bitmap) {
rc = -ENOMEM;
goto free_bitmap;
--
2.39.2
On Mon, 2023-10-23 at 18:11 +0200, Vasily Gorbik wrote:
> This commit effectively reverts commit c1ae1c59c8c6 ("s390/pci: fix
> iommu bitmap allocation") and applies a simpler fix instead. Commit
> c1ae1c59c8c6 introduced a custom bitmap_vzalloc() function that included
> unnecessary and misleading overflow handling.
>
> This fix is only relevant for the current v6.6 and stable backports. It
> will be superseded by the upcoming conversion to use the common
> code DMA API on s390 (pending in linux-next [2]), which eliminates
> arch/s390/pci/pci_dma.c entirely and, therefore, addresses the original
> problem in another way.
>
> Instead of relying on a custom bitmap_vzalloc() function, this change goes
> back to straightforward allocation using vzalloc() with the appropriate
> size calculated using the BITS_TO_LONGS() macro.
>
> Link: https://lore.kernel.org/all/CAHk-=wgTUz1bdY6zvsN4ED0arCLE8Sb==1GH8d0sjm5bu7zesQ@mail.gmail.com/
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231020&id=c76c067e488c
> Cc: [email protected]
> Fixes: c1ae1c59c8c6 ("s390/pci: fix iommu bitmap allocation")
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Vasily Gorbik <[email protected]>
> ---
> arch/s390/pci/pci_dma.c | 17 ++++-------------
> 1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index 99209085c75b..1b4b123d79aa 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -565,17 +565,6 @@ static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
> }
> }
>
> -static unsigned long *bitmap_vzalloc(size_t bits, gfp_t flags)
> -{
> - size_t n = BITS_TO_LONGS(bits);
> - size_t bytes;
> -
> - if (unlikely(check_mul_overflow(n, sizeof(unsigned long), &bytes)))
> - return NULL;
> -
> - return vzalloc(bytes);
> -}
> -
> int zpci_dma_init_device(struct zpci_dev *zdev)
> {
> u8 status;
> @@ -615,13 +604,15 @@ int zpci_dma_init_device(struct zpci_dev *zdev)
> zdev->end_dma - zdev->start_dma + 1);
> zdev->end_dma = zdev->start_dma + zdev->iommu_size - 1;
> zdev->iommu_pages = zdev->iommu_size >> PAGE_SHIFT;
> - zdev->iommu_bitmap = bitmap_vzalloc(zdev->iommu_pages, GFP_KERNEL);
> + zdev->iommu_bitmap = vzalloc(BITS_TO_LONGS(zdev->iommu_pages) *
> + sizeof(unsigned long));
> if (!zdev->iommu_bitmap) {
> rc = -ENOMEM;
> goto free_dma_table;
> }
> if (!s390_iommu_strict) {
> - zdev->lazy_bitmap = bitmap_vzalloc(zdev->iommu_pages, GFP_KERNEL);
> + zdev->lazy_bitmap = vzalloc(BITS_TO_LONGS(zdev->iommu_pages) *
> + sizeof(unsigned long));
> if (!zdev->lazy_bitmap) {
> rc = -ENOMEM;
> goto free_bitmap;
Mea culpa for the useless and misleading overflow check. I'm sorry, I
should not have copied this over from kvmalloc_array() without actually
thinking through whether it makes sense in the new place and you're
right Linus it doesn't.
Thank you Vasily for cleaning this up! Also as an additional point,
note that the size of the bitmap is limited by the above min3() which
in the largest possible case ensures a maximum of 128 MiB bitmaps which
only happens for very large memory systems or if a user sets an
unreasonably large s390_iommu_aperture kernel parameter.
Also:
Reviewed-by: Niklas Schnelle <[email protected]>
Best regards,
Niklas