2020-09-29 14:25:46

by Mark Rutland

Subject: Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> From: Alexander Potapenko <[email protected]>
>
> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors.
>
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design, is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.
>
> KFENCE objects each reside on a dedicated page, at either the left or
> right page boundaries. The pages to the left and right of the object
> page are "guard pages", whose attributes are changed to a protected
> state, and cause page faults on any attempted access to them. Such page
> faults are then intercepted by KFENCE, which handles the fault
> gracefully by reporting a memory access error. To detect out-of-bounds
> writes to memory within the object's page itself, KFENCE also uses
> pattern-based redzones. The following figure illustrates the page
> layout:
>
> ---+-----------+-----------+-----------+-----------+-----------+---
>    | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
>    | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
>    | x GUARD x | J : RED-  | x GUARD x |  RED- : J | x GUARD x |
>    | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
>    | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
>    | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> ---+-----------+-----------+-----------+-----------+-----------+---
>
> Guarded allocations are set up based on a sample interval (can be set
> via kfence.sample_interval). After expiration of the sample interval, a
> guarded allocation from the KFENCE object pool is returned to the main
> allocator (SLAB or SLUB). At this point, the timer is reset, and the
> next allocation is set up after the expiration of the interval.

From other sub-threads it sounds like these addresses are not part of
the linear/direct map. Having kmalloc return addresses outside of the
linear map is going to break anything that relies on virt<->phys
conversions, and is liable to make DMA corrupt memory. There were
problems of that sort with VMAP_STACK, and this is why kvmalloc() is
separate from kmalloc().

Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.

I strongly suspect this isn't going to be safe unless you always use an
in-place carveout from the linear map (which could be the linear alias
of a static carveout).
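
To make the virt<->phys concern concrete, here is a minimal userspace sketch, not kernel code. It assumes the conversion is plain offset arithmetic against a linear map; the layout constants and the function names (`is_linear_map_addr`, `virt_to_phys_unchecked`, `virt_to_phys_checked`) are hypothetical and chosen only for illustration. The unchecked variant returns a plausible-looking physical address for any input, which is how a pointer from outside the linear map could end up steering DMA at unrelated memory; the checked variant rejects such a pointer, which is roughly the kind of complaint one would expect from a debug check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only; real values are per-architecture. */
#define LINEAR_MAP_START 0xffff800000000000ULL	/* assumed base of the linear map */
#define LINEAR_MAP_SIZE  0x0000000100000000ULL	/* assumed 4 GiB of mapped RAM */
#define PHYS_RAM_BASE    0x0000000080000000ULL	/* assumed start of RAM */

/* True only for addresses that fall inside the (assumed) linear map. */
static bool is_linear_map_addr(uint64_t va)
{
	return va >= LINEAR_MAP_START && va - LINEAR_MAP_START < LINEAR_MAP_SIZE;
}

/* Unchecked conversion: pure arithmetic, so any input yields *some* address. */
static uint64_t virt_to_phys_unchecked(uint64_t va)
{
	return va - LINEAR_MAP_START + PHYS_RAM_BASE;
}

/* Checked conversion: refuse addresses that are not in the linear map. */
static bool virt_to_phys_checked(uint64_t va, uint64_t *pa)
{
	assert(pa != NULL);
	if (!is_linear_map_addr(va))
		return false;
	*pa = virt_to_phys_unchecked(va);
	return true;
}
```

An object placed in an in-place carveout of the linear map passes the check; an object in a separately mapped pool does not, even though the unchecked arithmetic would still hand back a number.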

[...]

> +static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> +{
> +	return static_branch_unlikely(&kfence_allocation_key) ? __kfence_alloc(s, size, flags) :
> +								      NULL;
> +}

Minor (unrelated) nit, but this would be easier to read as:

static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
	if (static_branch_unlikely(&kfence_allocation_key))
		return __kfence_alloc(s, size, flags);
	return NULL;
}
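
For readers following along, the gating described in the patch text (a guarded allocation is handed out once per sample interval, then the gate closes until the next expiry) can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: a plain `bool` stands in for the static key, a single static buffer stands in for the KFENCE object pool, and all names (`allocation_gate_open`, `pool_slot`, `sample_interval_expired`, `guarded_alloc_slow`, `guarded_alloc`) are hypothetical rather than taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the static key; in the kernel this is patched code, not a variable. */
static bool allocation_gate_open;

/* Hypothetical single-slot pool standing in for the KFENCE object pool. */
static char pool_slot[64];

/* Called when the sample interval expires (timer-driven in the real thing). */
static void sample_interval_expired(void)
{
	allocation_gate_open = true;
}

/* Slow path: hand out the guarded object and close the gate again. */
static void *guarded_alloc_slow(size_t size)
{
	if (size > sizeof(pool_slot))
		return NULL;	/* too large for the pool; gate stays open */
	allocation_gate_open = false;
	return pool_slot;
}

/* Fast path, shaped like the suggested rewrite: early return, then NULL. */
static inline void *guarded_alloc(size_t size)
{
	if (allocation_gate_open)
		return guarded_alloc_slow(size);
	return NULL;	/* caller falls back to the regular allocator */
}
```

A NULL return here just means "not sampled this time", so the caller would continue into the normal SLAB/SLUB path.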

Thanks,
Mark.


2020-09-29 14:55:06

by Marco Elver

Subject: Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

On Tue, 29 Sep 2020 at 16:24, Mark Rutland <[email protected]> wrote:
[...]
>
> From other sub-threads it sounds like these addresses are not part of
> the linear/direct map. Having kmalloc return addresses outside of the
> linear map is going to break anything that relies on virt<->phys
> conversions, and is liable to make DMA corrupt memory. There were
> problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> separate from kmalloc().
>
> Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
>
> I strongly suspect this isn't going to be safe unless you always use an
> in-place carveout from the linear map (which could be the linear alias
> of a static carveout).

That's an excellent point, thank you! Indeed, on arm64, a version with
naive static-pool screams with CONFIG_DEBUG_VIRTUAL.

We'll try to put together an arm64 version using a carveout as you suggest.

> [...]
>
> > +static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> > +{
> > +	return static_branch_unlikely(&kfence_allocation_key) ? __kfence_alloc(s, size, flags) :
> > +								      NULL;
> > +}
>
> Minor (unrelated) nit, but this would be easier to read as:
>
> static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> {
> 	if (static_branch_unlikely(&kfence_allocation_key))
> 		return __kfence_alloc(s, size, flags);
> 	return NULL;
> }

Will fix for v5.

Thanks,
-- Marco

2020-09-29 15:07:57

by Mark Rutland

Subject: Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

On Tue, Sep 29, 2020 at 04:51:29PM +0200, Marco Elver wrote:
> On Tue, 29 Sep 2020 at 16:24, Mark Rutland <[email protected]> wrote:
> [...]
> >
> > From other sub-threads it sounds like these addresses are not part of
> > the linear/direct map. Having kmalloc return addresses outside of the
> > linear map is going to break anything that relies on virt<->phys
> > conversions, and is liable to make DMA corrupt memory. There were
> > problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> > separate from kmalloc().
> >
> > Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
> >
> > I strongly suspect this isn't going to be safe unless you always use an
> > in-place carveout from the linear map (which could be the linear alias
> > of a static carveout).
>
> That's an excellent point, thank you! Indeed, on arm64, a version with
> naive static-pool screams with CONFIG_DEBUG_VIRTUAL.
>
> We'll try to put together an arm64 version using a carveout as you suggest.

Great, thanks!

Just to be clear, the concerns for DMA and virt<->phys conversions also
apply to x86 (the x86 virt<->phys conversion behaviour is more forgiving
in the common case, but still has cases that can go wrong).

Other than the code to initialize the page tables for the carveout, I
think the carveout code can be generic.

Thanks,
Mark.

2020-10-05 16:04:45

by Alexander Potapenko

Subject: Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

On Tue, Sep 29, 2020 at 5:06 PM Mark Rutland <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 04:51:29PM +0200, Marco Elver wrote:
> > On Tue, 29 Sep 2020 at 16:24, Mark Rutland <[email protected]> wrote:
> > [...]
> > >
> > > From other sub-threads it sounds like these addresses are not part of
> > > the linear/direct map. Having kmalloc return addresses outside of the
> > > linear map is going to break anything that relies on virt<->phys
> > > conversions, and is liable to make DMA corrupt memory. There were
> > > problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> > > separate from kmalloc().
> > >
> > > Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
> > >
> > > > I strongly suspect this isn't going to be safe unless you always use an
> > > > in-place carveout from the linear map (which could be the linear alias
> > > > of a static carveout).
> >
> > That's an excellent point, thank you! Indeed, on arm64, a version with
> > naive static-pool screams with CONFIG_DEBUG_VIRTUAL.
> >
> > We'll try to put together an arm64 version using a carveout as you suggest.
>
> Great, thanks!
>
> Just to be clear, the concerns for DMA and virt<->phys conversions also
> apply to x86 (the x86 virt<->phys conversion behaviour is more forgiving
> in the common case, but still has cases that can go wrong).

To clarify, shouldn't kmalloc/kmem_cache allocations used with DMA be
allocated with explicit GFP_DMA?
If so, how practical would it be to just skip such allocations in
KFENCE allocator?

2020-10-05 17:11:54

by Jann Horn

Subject: Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

On Mon, Oct 5, 2020 at 6:01 PM Alexander Potapenko <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 5:06 PM Mark Rutland <[email protected]> wrote:
> >
> > On Tue, Sep 29, 2020 at 04:51:29PM +0200, Marco Elver wrote:
> > > On Tue, 29 Sep 2020 at 16:24, Mark Rutland <[email protected]> wrote:
> > > [...]
> > > >
> > > > From other sub-threads it sounds like these addresses are not part of
> > > > the linear/direct map. Having kmalloc return addresses outside of the
> > > > linear map is going to break anything that relies on virt<->phys
> > > > conversions, and is liable to make DMA corrupt memory. There were
> > > > problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> > > > separate from kmalloc().
> > > >
> > > > Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
> > > >
> > > > > I strongly suspect this isn't going to be safe unless you always use an
> > > > > in-place carveout from the linear map (which could be the linear alias
> > > > > of a static carveout).
> > >
> > > That's an excellent point, thank you! Indeed, on arm64, a version with
> > > naive static-pool screams with CONFIG_DEBUG_VIRTUAL.
> > >
> > > We'll try to put together an arm64 version using a carveout as you suggest.
> >
> > Great, thanks!
> >
> > Just to be clear, the concerns for DMA and virt<->phys conversions also
> > apply to x86 (the x86 virt<->phys conversion behaviour is more forgiving
> > in the common case, but still has cases that can go wrong).
>
> To clarify, shouldn't kmalloc/kmem_cache allocations used with DMA be
> allocated with explicit GFP_DMA?
> If so, how practical would it be to just skip such allocations in
> KFENCE allocator?

AFAIK GFP_DMA doesn't really mean "I will use this allocation for
DMA"; it means "I will use this allocation for DMA using some ancient
hardware (e.g. stuff on the ISA bus?) that only supports 24-bit
physical addresses, i.e. the low 16 MiB on x86 (or maybe different
limits on other architectures)".
There's also GFP_DMA32, which means the same thing, except with 32-bit
physical addresses.

You can see in e.g. __dma_direct_alloc_pages() that the GFP_DMA32 and
GFP_DMA flags are only used if the hardware can't address the full
physical address space supported by the CPU.
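
As a rough illustration of that last point, the sketch below picks a zone restriction only when the device's DMA mask is narrower than the highest physical address the machine has. It is a simplified userspace model and not the logic of `__dma_direct_alloc_pages()`: the enum, the limits, and the function name `zone_hint_for_device` are hypothetical, and the 16 MiB limit for the legacy zone is an assumption that matches x86 but differs elsewhere. The sketch also assumes the device mask covers at least the legacy DMA zone.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "no flag", GFP_DMA32 and GFP_DMA. */
enum zone_hint { ZONE_HINT_NONE, ZONE_HINT_DMA32, ZONE_HINT_DMA };

/* Assumed zone limit for this sketch: the low 4 GiB. */
#define DMA32_ZONE_LIMIT ((1ULL << 32) - 1)

/*
 * dev_dma_mask:  highest physical address the device can reach.
 * max_phys_addr: highest physical address populated on this machine.
 */
static enum zone_hint zone_hint_for_device(uint64_t dev_dma_mask,
					   uint64_t max_phys_addr)
{
	/* Device reaches all of memory: no restriction needed. */
	if (dev_dma_mask >= max_phys_addr)
		return ZONE_HINT_NONE;
	/* Device reaches at least the low 4 GiB: that zone is safe. */
	if (dev_dma_mask >= DMA32_ZONE_LIMIT)
		return ZONE_HINT_DMA32;
	/* Narrower than 32 bits: fall back to the legacy zone (assumed 16 MiB). */
	return ZONE_HINT_DMA;
}
```

Under this model an ordinary 64-bit-capable device never gets a zone flag, so filtering on GFP_DMA would not identify the allocations that end up being used for DMA.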