Delete the WARN_ON() and return NULL directly for oversized parameter
in kvmalloc() calls.
Also add unlikely().
Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
Signed-off-by: Bixuan Cui <[email protected]>
---
There are a lot of oversize warnings and patches about kvmalloc() calls
recently. Maybe these warnings are not very necessary.
https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal
The example of size check in __do_kmalloc_node():
__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
struct kmem_cache *cachep;
void *ret;
if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
return NULL;
cachep = kmalloc_slab(size, flags);
mm/util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index 7e433690..d26f19c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
return ret;
/* Don't even allow crazy sizes */
- if (WARN_ON_ONCE(size > INT_MAX))
+ if (unlikely(size > INT_MAX))
return NULL;
return __vmalloc_node(size, 1, flags, node,
--
1.8.3.1
On 2021/12/2 10:06, Bixuan Cui wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
>
The commit message should explain why we need to do this. Thanks.
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <[email protected]>
> ---
> There are a lot of oversize warnings and patches about kvmalloc() calls
> recently. Maybe these warnings are not very necessary.
>
> https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal
>
> The example of size check in __do_kmalloc_node():
> __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> {
> struct kmem_cache *cachep;
> void *ret;
>
> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> return NULL;
> cachep = kmalloc_slab(size, flags);
>
> mm/util.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 7e433690..d26f19c 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> return ret;
>
> /* Don't even allow crazy sizes */
> - if (WARN_ON_ONCE(size > INT_MAX))
> + if (unlikely(size > INT_MAX))
> return NULL;
>
> return __vmalloc_node(size, 1, flags, node,
>
Tang
On Thu, 2 Dec 2021 10:06:24 +0800 Bixuan Cui <[email protected]> wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
>
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <[email protected]>
> ---
> There are a lot of oversize warnings and patches about kvmalloc() calls
> recently. Maybe these warnings are not very necessary.
Or maybe they are. Please let's take a look at these warnings, one at
a time. If a large number of them are bogus then sure, let's disable
the runtime test. But perhaps it's the case that calling code has
genuine issues and should be repaired.
On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
>
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <[email protected]>
> ---
> There are a lot of oversize warnings and patches about kvmalloc() calls
> recently. Maybe these warnings are not very necessary.
It seems these warnings are working, yes? i.e. we're finding the places
where giant values are coming in?
>
> https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal
>
> The example of size check in __do_kmalloc_node():
> __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> {
> struct kmem_cache *cachep;
> void *ret;
>
> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> return NULL;
> cachep = kmalloc_slab(size, flags);
>
> mm/util.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 7e433690..d26f19c 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> return ret;
>
> /* Don't even allow crazy sizes */
> - if (WARN_ON_ONCE(size > INT_MAX))
> + if (unlikely(size > INT_MAX))
> return NULL;
If we're rejecting the value, then it's still a pathological size, so
shouldn't the check be happening in the caller? I think the WARN is
doing exactly what it was supposed to do: find the places where bad
sizes can reach vmalloc.
-Kees
>
> return __vmalloc_node(size, 1, flags, node,
> --
> 1.8.3.1
>
--
Kees Cook
On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui <[email protected]> wrote:
>
> 在 2021/12/2 上午11:26, Andrew Morton 写道:
> >> Delete the WARN_ON() and return NULL directly for oversized parameter
> >> in kvmalloc() calls.
> >> Also add unlikely().
> >>
> >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> >> Signed-off-by: Bixuan Cui<[email protected]>
> >> ---
> >> There are a lot of oversize warnings and patches about kvmalloc() calls
> >> recently. Maybe these warnings are not very necessary.
> > Or maybe they are. Please let's take a look at these warnings, one at
> > a time. If a large number of them are bogus then sure, let's disable
> > the runtime test. But perhaps it's the case that calling code has
> > genuine issues and should be repaired.
> Such as:
Thanks, that's helpful.
Let's bring all these to the attention of the relevant developers.
If the consensus is "the code's fine, the warning is bogus" then let's
consider retiring the warning.
If the consensus is otherwise then hopefully they will fix their stuff!
> https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
(cc [email protected])
> https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
(cc [email protected], maintainers)
> https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
(cc [email protected])
> https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
(cc [email protected])
> https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
(bpf again).
On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote:
> On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote:
> > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > >> Delete the WARN_ON() and return NULL directly for oversized
> > >> parameter in kvmalloc() calls.
> > >> Also add unlikely().
> > >>
> > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > >> Signed-off-by: Bixuan Cui<[email protected]>
> > >> ---
> > >> There are a lot of oversize warnings and patches about kvmalloc()
> > >> calls recently. Maybe these warnings are not very necessary.
> > >
> > > Or maybe they are. Please let's take a look at these warnings,
> > > one at a time. If a large number of them are bogus then sure,
> > > let's disable the runtime test. But perhaps it's the case that
> > > calling code has genuine issues and should be repaired.
> >
> > Such as:
>
> Thanks, that's helpful.
>
> Let's bring all these to the attention of the relevant developers.
>
> If the consensus is "the code's fine, the warning is bogus" then let's
> consider retiring the warning.
>
> If the consensus is otherwise then hopefully they will fix their stuff!
>
> > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
>
> (cc [email protected])
>
> > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
>
> (cc [email protected], maintainers)
>
> > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
>
> (cc [email protected])
>
> > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
>
> (cc [email protected])
The netfilter bug has since been fixed:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382
> > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
>
> (bpf again).
J.
在 2021/12/2 下午12:29, Andrew Morton 写道:
> Thanks, that's helpful.
>
> Let's bring all these to the attention of the relevant developers.
>
> If the consensus is "the code's fine, the warning is bogus" then let's
> consider retiring the warning.
>
> If the consensus is otherwise then hopefully they will fix their stuff!
Ok,thanks for your advice :-)
Thanks,
Bixuan Cui
On Wed, Dec 01, 2021 at 07:26:43PM -0800, Andrew Morton wrote:
> On Thu, 2 Dec 2021 10:06:24 +0800 Bixuan Cui <[email protected]> wrote:
>
> > Delete the WARN_ON() and return NULL directly for oversized parameter
> > in kvmalloc() calls.
> > Also add unlikely().
> >
> > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > Signed-off-by: Bixuan Cui <[email protected]>
> > ---
> > There are a lot of oversize warnings and patches about kvmalloc() calls
> > recently. Maybe these warnings are not very necessary.
>
> Or maybe they are. Please let's take a look at these warnings, one at
> a time. If a large number of them are bogus then sure, let's disable
> the runtime test. But perhaps it's the case that calling code has
> genuine issues and should be repaired.
Andrew,
The problem is that this WARN_ON() is triggered by the users.
At least in the RDMA world, users can provide huge sizes and they expect
to get plain -ENOMEM and not dump stack, because it happens indirectly
to them.
In our case, these two kvcalloc() generates WARN_ON().
umem_odp->pfn_list = kvcalloc(
npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
if (!umem_odp->pfn_list)
return -ENOMEM;
umem_odp->dma_list = kvcalloc(
ndmas, sizeof(*umem_odp->dma_list), GFP_KERNEL);
https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L80
It is not a kernel programmer error to allow "oversized kvmalloc call" .
Thanks
On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> The problem is that this WARN_ON() is triggered by the users.
... or the problem is that you don't do a sanity check between the user
and the MM system. I mean, that's what this conversation is about --
is it a bug to be asking for this much memory in the first place?
> At least in the RDMA world, users can provide huge sizes and they expect
> to get plain -ENOMEM and not dump stack, because it happens indirectly
> to them.
>
> In our case, these two kvcalloc() generates WARN_ON().
>
> umem_odp->pfn_list = kvcalloc(
> npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
Does it really make sense for the user to specify 2^31 PFNs in a single
call? I mean, that's 8TB of memory. Should RDMA put its own limit
in here, or should it rely on kvmalloc returning -ENOMEM?
On Thu, Dec 2, 2021 at 2:38 AM Jeremy Sowden <[email protected]> wrote:
>
> On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote:
> > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote:
> > > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > > >> Delete the WARN_ON() and return NULL directly for oversized
> > > >> parameter in kvmalloc() calls.
> > > >> Also add unlikely().
> > > >>
> > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > > >> Signed-off-by: Bixuan Cui<[email protected]>
> > > >> ---
> > > >> There are a lot of oversize warnings and patches about kvmalloc()
> > > >> calls recently. Maybe these warnings are not very necessary.
> > > >
> > > > Or maybe they are. Please let's take a look at these warnings,
> > > > one at a time. If a large number of them are bogus then sure,
> > > > let's disable the runtime test. But perhaps it's the case that
> > > > calling code has genuine issues and should be repaired.
> > >
> > > Such as:
> >
> > Thanks, that's helpful.
> >
> > Let's bring all these to the attention of the relevant developers.
> >
> > If the consensus is "the code's fine, the warning is bogus" then let's
> > consider retiring the warning.
> >
> > If the consensus is otherwise then hopefully they will fix their stuff!
> >
> > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
> >
> > (cc [email protected])
> >
> > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
> >
> > (cc [email protected], maintainers)
> >
> > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
> >
> > (cc [email protected])
> >
> > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
> >
> > (cc [email protected])
>
> The netfilter bug has since been fixed:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382
How is this a "fix" ?
u32 was the limit and because of the new warn the limit
got reduced to s32.
Every subsystem is supposed to do this "fix" now?
> > > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
> >
> > (bpf again).
>
> J.
On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > The problem is that this WARN_ON() is triggered by the users.
>
> ... or the problem is that you don't do a sanity check between the user
> and the MM system. I mean, that's what this conversation is about --
> is it a bug to be asking for this much memory in the first place?
We do a lot of checks, and in this case, user provided valid input.
He asked size that doesn't cross his address space.
https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
start = ALIGN_DOWN(umem_odp->umem.address, page_size);
if (check_add_overflow(umem_odp->umem.address,
(unsigned long)umem_odp->umem.length,
&end))
return -EOVERFLOW;
There is a feature called ODP (on-demand-paging) which is supported
in some RDMA NICs. It allows to the user "export" their whole address
space to the other RDMA node without pinning the pages. And once the
other node sends data to not-pinned page, the RDMA NIC will prefetch
it.
>
> > At least in the RDMA world, users can provide huge sizes and they expect
> > to get plain -ENOMEM and not dump stack, because it happens indirectly
> > to them.
> >
> > In our case, these two kvcalloc() generates WARN_ON().
> >
> > umem_odp->pfn_list = kvcalloc(
> > npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
>
> Does it really make sense for the user to specify 2^31 PFNs in a single
> call? I mean, that's 8TB of memory. Should RDMA put its own limit
> in here, or should it rely on kvmalloc returning -ENOMEM?
I heard about such systems with so many available RAM. I don't know
about their usage pattern, most likely they will use hugepages, but
it is my guess.
The thing is that it is not RDMA-specific. You are asking to place same
if (size > KVMALLOC...) check in all subsystems.
Thanks
On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > The problem is that this WARN_ON() is triggered by the users.
>
> ... or the problem is that you don't do a sanity check between the user
> and the MM system. I mean, that's what this conversation is about --
> is it a bug to be asking for this much memory in the first place?
>
> > At least in the RDMA world, users can provide huge sizes and they expect
> > to get plain -ENOMEM and not dump stack, because it happens indirectly
> > to them.
> >
> > In our case, these two kvcalloc() generates WARN_ON().
> >
> > umem_odp->pfn_list = kvcalloc(
> > npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
>
> Does it really make sense for the user to specify 2^31 PFNs in a single
> call? I mean, that's 8TB of memory. Should RDMA put its own limit
> in here, or should it rely on kvmalloc returning -ENOMEM?
I wrote this - I don't think RDMA should put a limit here. What
limit would it use anyhow?
I'm pretty sure database people are already using low TB's here. It is
not absurd when you have DAX and the biggest user of ODP is with DAX.
If anything we might get to a point in a few years where the 2^31 is
too small and this has to be a better datastructure :\
Maybe an xarray and I should figure out how to use the multi-order
stuff to optimize huge pages?
I'd actually really like to get rid of it, just haven't figured out
how. The only purpose is to call set_page_dirty() and in many cases
the pfn should still be in the mm's page table. We also store another
copy of the PFN in the NIC's mapping. Surely one of these two could do
instead?
Jason
On Wed, Dec 01, 2021 at 07:46:01PM -0800, Kees Cook wrote:
> If we're rejecting the value, then it's still a pathological size, so
> shouldn't the check be happening in the caller? I think the WARN is
> doing exactly what it was supposed to do: find the places where bad
> sizes can reach vmalloc.
I think it meshes very poorly with the overflow work:
p = kzalloc(struct_size(p, regions, num_regions), GFP_KERNEL);
If num_regions is user controlled data why should the calling driver
hvae to somehow sanitize num_regions (without bugs!) instead of
relying on struct_size() and kzalloc() to contain all the sanitation?
What you are suggesting just pushes security sensitive coding into
drivers, which I think is the opposite of what we all want?
Jason
On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote:
> On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > > The problem is that this WARN_ON() is triggered by the users.
> >
> > ... or the problem is that you don't do a sanity check between the user
> > and the MM system. I mean, that's what this conversation is about --
> > is it a bug to be asking for this much memory in the first place?
>
> We do a lot of checks, and in this case, user provided valid input.
> He asked size that doesn't cross his address space.
> https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
>
> start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> if (check_add_overflow(umem_odp->umem.address,
> (unsigned long)umem_odp->umem.length,
> &end))
> return -EOVERFLOW;
>
> There is a feature called ODP (on-demand-paging) which is supported
> in some RDMA NICs. It allows to the user "export" their whole address
> space to the other RDMA node without pinning the pages. And once the
> other node sends data to not-pinned page, the RDMA NIC will prefetch
> it.
I think we have two cases:
- limiting kvmalloc allocations to INT_MAX
- issuing a WARN when that limit is exceeded
The argument for the having the WARN is "that amount should never be
allocated so we want to find the pathological callers".
But if the actual issue is that >INT_MAX is _acceptable_, then we have
to do away with the entire check, not just the WARN.
--
Kees Cook
On Thu, Dec 02, 2021 at 11:08:34AM -0800, Kees Cook wrote:
> On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote:
> > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > > > The problem is that this WARN_ON() is triggered by the users.
> > >
> > > ... or the problem is that you don't do a sanity check between the user
> > > and the MM system. I mean, that's what this conversation is about --
> > > is it a bug to be asking for this much memory in the first place?
> >
> > We do a lot of checks, and in this case, user provided valid input.
> > He asked size that doesn't cross his address space.
> > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
> >
> > start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> > if (check_add_overflow(umem_odp->umem.address,
> > (unsigned long)umem_odp->umem.length,
> > &end))
> > return -EOVERFLOW;
> >
> > There is a feature called ODP (on-demand-paging) which is supported
> > in some RDMA NICs. It allows to the user "export" their whole address
> > space to the other RDMA node without pinning the pages. And once the
> > other node sends data to not-pinned page, the RDMA NIC will prefetch
> > it.
>
> I think we have two cases:
>
> - limiting kvmalloc allocations to INT_MAX
> - issuing a WARN when that limit is exceeded
>
> The argument for the having the WARN is "that amount should never be
> allocated so we want to find the pathological callers".
>
> But if the actual issue is that >INT_MAX is _acceptable_, then we have
> to do away with the entire check, not just the WARN.
First we need to get rid from WARN_ON(), which is completely safe thing to do.
Removal of the check can be done in second step as it will require audit
of whole kvmalloc* path.
Thanks
>
> --
> Kees Cook
On 2021-12-02, at 07:34:36 -0800, Alexei Starovoitov wrote:
> On Thu, Dec 2, 2021 at 2:38 AM Jeremy Sowden wrote:
> > On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote:
> > > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote:
> > > > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > > > >> Delete the WARN_ON() and return NULL directly for oversized
> > > > >> parameter in kvmalloc() calls.
> > > > >> Also add unlikely().
> > > > >>
> > > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > > > >> Signed-off-by: Bixuan Cui<[email protected]>
> > > > >> ---
> > > > >> There are a lot of oversize warnings and patches about kvmalloc()
> > > > >> calls recently. Maybe these warnings are not very necessary.
> > > > >
> > > > > Or maybe they are. Please let's take a look at these warnings,
> > > > > one at a time. If a large number of them are bogus then sure,
> > > > > let's disable the runtime test. But perhaps it's the case that
> > > > > calling code has genuine issues and should be repaired.
> > > >
> > > > Such as:
> > >
> > > Thanks, that's helpful.
> > >
> > > Let's bring all these to the attention of the relevant developers.
> > >
> > > If the consensus is "the code's fine, the warning is bogus" then let's
> > > consider retiring the warning.
> > >
> > > If the consensus is otherwise then hopefully they will fix their stuff!
> > >
> > > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
> > >
> > > (cc [email protected])
> > >
> > > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
> > >
> > > (cc [email protected], maintainers)
> > >
> > > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
> > >
> > > (cc [email protected])
> > >
> > > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
> > >
> > > (cc [email protected])
> >
> > The netfilter bug has since been fixed:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382
>
> How is this a "fix" ?
> u32 was the limit and because of the new warn the limit
> got reduced to s32.
> Every subsystem is supposed to do this "fix" now?
My intention was only to provide information about what had been done in
the ipset case. In that case, there was already a check in place to
ensure that the requested hash-table size would not result in integer
overflow, and it was adjusted to reflect the limit imposed by the new
warning (one imagines that there is not much demand for hash-tables that
big).
I'm not familiar with the other cases, and so I would not presume to
make suggestions about whether those warnings were useful.
J.
On Thu, Dec 02, 2021 at 09:24:08PM +0200, Leon Romanovsky wrote:
> On Thu, Dec 02, 2021 at 11:08:34AM -0800, Kees Cook wrote:
> > On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote:
> > > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > > > > The problem is that this WARN_ON() is triggered by the users.
> > > >
> > > > ... or the problem is that you don't do a sanity check between the user
> > > > and the MM system. I mean, that's what this conversation is about --
> > > > is it a bug to be asking for this much memory in the first place?
> > >
> > > We do a lot of checks, and in this case, user provided valid input.
> > > He asked size that doesn't cross his address space.
> > > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
> > >
> > > start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> > > if (check_add_overflow(umem_odp->umem.address,
> > > (unsigned long)umem_odp->umem.length,
> > > &end))
> > > return -EOVERFLOW;
> > >
> > > There is a feature called ODP (on-demand-paging) which is supported
> > > in some RDMA NICs. It allows to the user "export" their whole address
> > > space to the other RDMA node without pinning the pages. And once the
> > > other node sends data to not-pinned page, the RDMA NIC will prefetch
> > > it.
> >
> > I think we have two cases:
> >
> > - limiting kvmalloc allocations to INT_MAX
> > - issuing a WARN when that limit is exceeded
> >
> > The argument for the having the WARN is "that amount should never be
> > allocated so we want to find the pathological callers".
> >
> > But if the actual issue is that >INT_MAX is _acceptable_, then we have
> > to do away with the entire check, not just the WARN.
>
> First we need to get rid from WARN_ON(), which is completely safe thing to do.
>
> Removal of the check can be done in second step as it will require audit
> of whole kvmalloc* path.
If those are legit sizes, I'm fine with dropping the WARN. (But I still
think if they're legit sizes, we must also drop the INT_MAX limit.)
--
Kees Cook
On Thu, 2 Dec 2021 13:23:13 -0800 Kees Cook <[email protected]> wrote:
> > > I think we have two cases:
> > >
> > > - limiting kvmalloc allocations to INT_MAX
> > > - issuing a WARN when that limit is exceeded
> > >
> > > The argument for the having the WARN is "that amount should never be
> > > allocated so we want to find the pathological callers".
> > >
> > > But if the actual issue is that >INT_MAX is _acceptable_, then we have
> > > to do away with the entire check, not just the WARN.
> >
> > First we need to get rid from WARN_ON(), which is completely safe thing to do.
> >
> > Removal of the check can be done in second step as it will require audit
> > of whole kvmalloc* path.
>
> If those are legit sizes, I'm fine with dropping the WARN. (But I still
> think if they're legit sizes, we must also drop the INT_MAX limit.)
Can we suppress the WARN if the caller passed __GFP_NOWARN?
On Thu, Dec 02, 2021 at 02:03:43PM -0800, Andrew Morton wrote:
> On Thu, 2 Dec 2021 13:23:13 -0800 Kees Cook <[email protected]> wrote:
>
> > > > I think we have two cases:
> > > >
> > > > - limiting kvmalloc allocations to INT_MAX
> > > > - issuing a WARN when that limit is exceeded
> > > >
> > > > The argument for the having the WARN is "that amount should never be
> > > > allocated so we want to find the pathological callers".
> > > >
> > > > But if the actual issue is that >INT_MAX is _acceptable_, then we have
> > > > to do away with the entire check, not just the WARN.
> > >
> > > First we need to get rid from WARN_ON(), which is completely safe thing to do.
> > >
> > > Removal of the check can be done in second step as it will require audit
> > > of whole kvmalloc* path.
> >
> > If those are legit sizes, I'm fine with dropping the WARN. (But I still
> > think if they're legit sizes, we must also drop the INT_MAX limit.)
>
> Can we suppress the WARN if the caller passed __GFP_NOWARN?
I don't think that's a good idea. NOWARN is for allocation failure
messages whereas this warning is more of a "You're doing something
wrong" -- ENOMEM vs EINVAL.
I'm still agnostic on whether this should be a check at all, or whether
we should let people kvmalloc(20GB). But I don't like conditioning the
warning on GFP_NOWARN.
+Paolo, I'm pretty sure he's still not subscribed to the KVM mailing list :-)
On Wed, Dec 01, 2021, Andrew Morton wrote:
> On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui <[email protected]> wrote:
>
> >
> > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > >> Delete the WARN_ON() and return NULL directly for oversized parameter
> > >> in kvmalloc() calls.
> > >> Also add unlikely().
> > >>
> > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > >> Signed-off-by: Bixuan Cui<[email protected]>
> > >> ---
> > >> There are a lot of oversize warnings and patches about kvmalloc() calls
> > >> recently. Maybe these warnings are not very necessary.
> > > Or maybe they are. Please let's take a look at these warnings, one at
> > > a time. If a large number of them are bogus then sure, let's disable
> > > the runtime test. But perhaps it's the case that calling code has
> > > genuine issues and should be repaired.
> > Such as:
>
> Thanks, that's helpful.
>
> Let's bring all these to the attention of the relevant developers.
>
> If the consensus is "the code's fine, the warning is bogus" then let's
> consider retiring the warning.
>
> If the consensus is otherwise then hopefully they will fix their stuff!
>
>
>
> > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
>
> (cc [email protected])
>
> > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
>
> (cc [email protected], maintainers)
>
> > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
>
> (cc [email protected])
Paolo posted patches to resolve the KVM issues, but I don't think they ever got
applied.
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
>
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <[email protected]>
How can we progress with this patch?
From this discussion, at least for RDMA, KVM and BPF, this huge size is legit
and WARN_ON() can be seen as false alarm.
Thanks