After switching the page size from 64KB to 4KB on several arm64 servers
here, kmemleak starts to run out of the early memory pool due to the huge
number of early_pgtable_alloc() calls:
  kmemleak_alloc_phys()
  memblock_alloc_range_nid()
  memblock_phys_alloc_range()
  early_pgtable_alloc()
  init_pmd()
  alloc_init_pud()
  __create_pgd_mapping()
  __map_memblock()
  paging_init()
  setup_arch()
  start_kernel()
Increasing the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
won't be enough for a server with 200GB+ memory. There isn't much interest
in checking memory leaks for those early page tables, and those early
memory mappings should not reference other memory. Hence, there are no
kmemleak false positives, and we can safely skip tracking those early
allocations in kmemleak, as we did in commit fed84c785270 ("mm/memblock.c:
skip kmemleak for kasan_init()"), without needing to introduce
complications such as automatically scaling the value depending on the
runtime memory size. After this patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
Signed-off-by: Qian Cai <[email protected]>
---
arch/arm64/mm/mmu.c | 3 ++-
include/linux/memblock.h | 1 +
mm/memblock.c | 10 +++++++---
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d77bf06d6a6d..4d3cfbaa92a7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -96,7 +96,8 @@ static phys_addr_t __init early_pgtable_alloc(int shift)
 	phys_addr_t phys;
 	void *ptr;
 
-	phys = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+	phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
+					 MEMBLOCK_ALLOC_PGTABLE);
 	if (!phys)
 		panic("Failed to allocate page table page\n");
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 7df557b16c1e..de903055b01c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -390,6 +390,7 @@ static inline int memblock_get_region_node(const struct memblock_region *r)
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
 #define MEMBLOCK_ALLOC_KASAN		1
+#define MEMBLOCK_ALLOC_PGTABLE		2
 
 /* We are using top down, so it is safe to use 0 here */
 #define MEMBLOCK_LOW_LIMIT 0
diff --git a/mm/memblock.c b/mm/memblock.c
index 659bf0ffb086..13bc56a641c0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -287,7 +287,8 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 {
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
-	    end == MEMBLOCK_ALLOC_KASAN)
+	    end == MEMBLOCK_ALLOC_KASAN ||
+	    end == MEMBLOCK_ALLOC_PGTABLE)
 		end = memblock.current_limit;
 
 	/* avoid allocating the first page */
@@ -1387,8 +1388,11 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 		return 0;
 
 done:
-	/* Skip kmemleak for kasan_init() due to high volume. */
-	if (end != MEMBLOCK_ALLOC_KASAN)
+	/*
+	 * Skip kmemleak for kasan_init() and early_pgtable_alloc() due to high
+	 * volume.
+	 */
+	if (end != MEMBLOCK_ALLOC_KASAN && end != MEMBLOCK_ALLOC_PGTABLE)
 		/*
 		 * The min_count is set to 0 so that memblock allocated
 		 * blocks are never reported as leaks. This is because many
--
2.30.2
On Thu, Nov 04, 2021 at 11:56:23AM -0400, Qian Cai wrote:
> After switching the page size from 64KB to 4KB on several arm64 servers
> here, kmemleak starts to run out of the early memory pool due to the huge
> number of early_pgtable_alloc() calls:
>
>   kmemleak_alloc_phys()
>   memblock_alloc_range_nid()
>   memblock_phys_alloc_range()
>   early_pgtable_alloc()
>   init_pmd()
>   alloc_init_pud()
>   __create_pgd_mapping()
>   __map_memblock()
>   paging_init()
>   setup_arch()
>   start_kernel()
>
> Increasing the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
> won't be enough for a server with 200GB+ memory. There isn't much interest
> in checking memory leaks for those early page tables, and those early
> memory mappings should not reference other memory. Hence, there are no
> kmemleak false positives, and we can safely skip tracking those early
> allocations in kmemleak, as we did in commit fed84c785270 ("mm/memblock.c:
> skip kmemleak for kasan_init()"), without needing to introduce
> complications such as automatically scaling the value depending on the
> runtime memory size. After this patch, the default value of
> DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
>
> Signed-off-by: Qian Cai <[email protected]>
> ---
> arch/arm64/mm/mmu.c | 3 ++-
> include/linux/memblock.h | 1 +
> mm/memblock.c | 10 +++++++---
> 3 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index d77bf06d6a6d..4d3cfbaa92a7 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -96,7 +96,8 @@ static phys_addr_t __init early_pgtable_alloc(int shift)
>  	phys_addr_t phys;
>  	void *ptr;
> 
> -	phys = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> +	phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
> +					 MEMBLOCK_ALLOC_PGTABLE);
>  	if (!phys)
>  		panic("Failed to allocate page table page\n");
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 7df557b16c1e..de903055b01c 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -390,6 +390,7 @@ static inline int memblock_get_region_node(const struct memblock_region *r)
>  #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
>  #define MEMBLOCK_ALLOC_ACCESSIBLE	0
>  #define MEMBLOCK_ALLOC_KASAN		1
> +#define MEMBLOCK_ALLOC_PGTABLE		2
> 
>  /* We are using top down, so it is safe to use 0 here */
>  #define MEMBLOCK_LOW_LIMIT 0
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 659bf0ffb086..13bc56a641c0 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -287,7 +287,8 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>  {
>  	/* pump up @end */
>  	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
> -	    end == MEMBLOCK_ALLOC_KASAN)
> +	    end == MEMBLOCK_ALLOC_KASAN ||
> +	    end == MEMBLOCK_ALLOC_PGTABLE)
I think it would be better to rename MEMBLOCK_ALLOC_KASAN to, say,
MEMBLOCK_ALLOC_NOKMEMLEAK and use that for both the KASAN and page table
cases.

But more generally, we are going to hit this again and again. Couldn't we
add a memblock allocation as a means to get more memory into
kmemleak::mem_pool_alloc()?
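Something along these lines, perhaps (a rough, untested sketch: the shape
of mem_pool_alloc() is simplified from mm/kmemleak.c, and
memblock_alloc_untracked() is a made-up helper name for an allocation that
kmemleak would not track):

static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
{
	struct kmemleak_object *object;

	/* once the slab allocator is up, use it (as today) */
	if (object_cache)
		return kmem_cache_alloc(object_cache,
					gfp_kmemleak_mask(gfp));

	/* otherwise take from the static early pool (as today) */
	if (mem_pool_free_count)
		return &mem_pool[--mem_pool_free_count];

	/*
	 * New part: refill from memblock instead of giving up. The
	 * refill itself must not be traced, or we would recurse:
	 * memblock -> kmemleak_alloc_phys() -> mem_pool_alloc() -> ...
	 */
	object = memblock_alloc_untracked(sizeof(*object),
					  sizeof(void *));
	if (!object)
		pr_warn_once("kmemleak: memory pool empty\n");
	return object;
}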
>  		end = memblock.current_limit;
> 
>  	/* avoid allocating the first page */
> @@ -1387,8 +1388,11 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>  		return 0;
> 
>  done:
> -	/* Skip kmemleak for kasan_init() due to high volume. */
> -	if (end != MEMBLOCK_ALLOC_KASAN)
> +	/*
> +	 * Skip kmemleak for kasan_init() and early_pgtable_alloc() due to high
> +	 * volume.
> +	 */
> +	if (end != MEMBLOCK_ALLOC_KASAN && end != MEMBLOCK_ALLOC_PGTABLE)
>  		/*
>  		 * The min_count is set to 0 so that memblock allocated
>  		 * blocks are never reported as leaks. This is because many
> --
> 2.30.2
>
--
Sincerely yours,
Mike.
On Thu, Nov 04, 2021 at 11:56:23AM -0400, Qian Cai wrote:
> After switching the page size from 64KB to 4KB on several arm64 servers
> here, kmemleak starts to run out of the early memory pool due to the huge
> number of early_pgtable_alloc() calls:
>
>   kmemleak_alloc_phys()
>   memblock_alloc_range_nid()
>   memblock_phys_alloc_range()
>   early_pgtable_alloc()
>   init_pmd()
>   alloc_init_pud()
>   __create_pgd_mapping()
>   __map_memblock()
>   paging_init()
>   setup_arch()
>   start_kernel()
>
> Increasing the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
> won't be enough for a server with 200GB+ memory. There isn't much interest
> in checking memory leaks for those early page tables, and those early
> memory mappings should not reference other memory. Hence, there are no
> kmemleak false positives, and we can safely skip tracking those early
> allocations in kmemleak, as we did in commit fed84c785270 ("mm/memblock.c:
> skip kmemleak for kasan_init()"), without needing to introduce
> complications such as automatically scaling the value depending on the
> runtime memory size. After this patch, the default value of
> DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.
>
> Signed-off-by: Qian Cai <[email protected]>
Looks fine to me:
Reviewed-by: Catalin Marinas <[email protected]>
On 11/4/21 1:06 PM, Mike Rapoport wrote:
> I think it would be better to rename MEMBLOCK_ALLOC_KASAN to, say,
> MEMBLOCK_ALLOC_NOKMEMLEAK and use that for both the KASAN and page table
> cases.
Okay, that would look a bit nicer.
> But more generally, we are going to hit this again and again. Couldn't we
> add a memblock allocation as a means to get more memory into
> kmemleak::mem_pool_alloc()?
For the last 5 years, this is only the second time I am aware of this kind
of issue, just because of the 64KB->4KB switch on those servers, although I
agree it could happen again in the future due to some new debugging
features etc. I don't feel a strong need to rework it now, though. I am not
sure if Catalin sees things differently. Anyway, Mike, do you agree that we
could do that rework separately in the future?
On Thu, Nov 04, 2021 at 01:57:03PM -0400, Qian Cai wrote:
> On 11/4/21 1:06 PM, Mike Rapoport wrote:
> > I think it would be better to rename MEMBLOCK_ALLOC_KASAN to, say,
> > MEMBLOCK_ALLOC_NOKMEMLEAK and use that for both the KASAN and page table
> > cases.
>
> Okay, that would look a bit nicer.
Or MEMBLOCK_ALLOC_ACCESSIBLE_NOLEAKTRACE to match SLAB_NOLEAKTRACE and
also hint that it's accessible memory.
> > But more generally, we are going to hit this again and again. Couldn't we
> > add a memblock allocation as a means to get more memory into
> > kmemleak::mem_pool_alloc()?
>
> For the last 5 years, this is only the second time I am aware of this kind
> of issue, just because of the 64KB->4KB switch on those servers, although I
> agree it could happen again in the future due to some new debugging
> features etc. I don't feel a strong need to rework it now, though. I am not
> sure if Catalin sees things differently. Anyway, Mike, do you agree that we
> could do that rework separately in the future?
I was talking to Mike on IRC last night and I think you still need a
flag, otherwise you could get a recursive memblock -> kmemleak ->
memblock call (that's why we have SLAB_NOLEAKTRACE). So for the time
being, a new MEMBLOCK_* definition would do.
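To make that concrete, without such a flag a pool refill could loop back
on itself roughly like this (hypothetical call chain, assuming kmemleak
fell back to memblock when its pool ran dry):

  mem_pool_alloc()
    memblock_alloc()                <- pool empty, fall back to memblock
      memblock_alloc_range_nid()
        kmemleak_alloc_phys()       <- the refill itself gets tracked...
          create_object()
            mem_pool_alloc()        <- ...and recurses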
I wonder whether we could actually use the bottom bits in the end/limit
as actual flags so one can do (MEMBLOCK_ALLOC_ACCESSIBLE |
MEMBLOCK_NOLEAKTRACE). But that could be for a separate clean-up.
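Roughly like this (hypothetical sketch, assuming the limit is in practice
always at least 4-byte aligned; MEMBLOCK_ALLOC_ANYWHERE, being ~0, would
need special-casing since its low bits are already set):

#define MEMBLOCK_NOLEAKTRACE	((phys_addr_t)0x1)	/* low-bit modifier flag */
#define MEMBLOCK_END_FLAGS	((phys_addr_t)0x3)	/* bits reserved for flags */

static inline phys_addr_t memblock_end_limit(phys_addr_t end)
{
	return end & ~MEMBLOCK_END_FLAGS;	/* strip flags, keep the limit */
}

A caller would then combine a limit with a modifier, e.g.:

	memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
				  MEMBLOCK_ALLOC_ACCESSIBLE |
				  MEMBLOCK_NOLEAKTRACE);

which works out because MEMBLOCK_ALLOC_ACCESSIBLE is 0, so the stripped
limit still reads as "accessible".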
--
Catalin
On Thu, Nov 04, 2021 at 01:57:03PM -0400, Qian Cai wrote:
>
>
> On 11/4/21 1:06 PM, Mike Rapoport wrote:
> > I think it would be better to rename MEMBLOCK_ALLOC_KASAN to, say,
> > MEMBLOCK_ALLOC_NOKMEMLEAK and use that for both the KASAN and page table
> > cases.
>
> Okay, that would look a bit nicer.
>
> > But more generally, we are going to hit this again and again. Couldn't we
> > add a memblock allocation as a means to get more memory into
> > kmemleak::mem_pool_alloc()?
>
> For the last 5 years, this is only the second time I am aware of this kind
> of issue, just because of the 64KB->4KB switch on those servers, although I
> agree it could happen again in the future due to some new debugging
> features etc. I don't feel a strong need to rework it now, though. I am not
> sure if Catalin sees things differently. Anyway, Mike, do you agree that we
> could do that rework separately in the future?
Yeah, the rework can definitely go on top.
--
Sincerely yours,
Mike.
On Fri, Nov 05, 2021 at 10:08:05AM +0000, Catalin Marinas wrote:
> On Thu, Nov 04, 2021 at 01:57:03PM -0400, Qian Cai wrote:
> > On 11/4/21 1:06 PM, Mike Rapoport wrote:
> > > I think it would be better to rename MEMBLOCK_ALLOC_KASAN to, say,
> > > MEMBLOCK_ALLOC_NOKMEMLEAK and use that for both the KASAN and page table
> > > cases.
> >
> > Okay, that would look a bit nicer.
>
> Or MEMBLOCK_ALLOC_ACCESSIBLE_NOLEAKTRACE to match SLAB_NOLEAKTRACE and
> also hint that it's accessible memory.
Hmm, I think MEMBLOCK_ALLOC_NOLEAKTRACE is enough. Having a constant
instead of an end limit already implies there is no limit, and when we
update the API to use the lower bits or a dedicated 'flags' parameter, we
won't need to change the flag name as well.
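I.e. something like this (sketch only; it just reuses the slot that
MEMBLOCK_ALLOC_KASAN occupies today):

/* include/linux/memblock.h */
#define MEMBLOCK_ALLOC_ANYWHERE		(~(phys_addr_t)0)
#define MEMBLOCK_ALLOC_ACCESSIBLE	0
#define MEMBLOCK_ALLOC_NOLEAKTRACE	1	/* replaces MEMBLOCK_ALLOC_KASAN */

so that kasan_init(), early_pgtable_alloc() and any future caller share a
single check in memblock_alloc_range_nid():

	if (end != MEMBLOCK_ALLOC_NOLEAKTRACE)
		kmemleak_alloc_phys(found, size, 0, 0);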
> > > But more generally, we are going to hit this again and again. Couldn't we
> > > add a memblock allocation as a means to get more memory into
> > > kmemleak::mem_pool_alloc()?
> >
> > For the last 5 years, this is only the second time I am aware of this kind
> > of issue, just because of the 64KB->4KB switch on those servers, although I
> > agree it could happen again in the future due to some new debugging
> > features etc. I don't feel a strong need to rework it now, though. I am not
> > sure if Catalin sees things differently. Anyway, Mike, do you agree that we
> > could do that rework separately in the future?
>
> I was talking to Mike on IRC last night and I think you still need a
> flag, otherwise you could get a recursive memblock -> kmemleak ->
> memblock call (that's why we have SLAB_NOLEAKTRACE). So for the time
> being, a new MEMBLOCK_* definition would do.
>
> I wonder whether we could actually use the bottom bits in the end/limit
> as actual flags so one can do (MEMBLOCK_ALLOC_ACCESSIBLE |
> MEMBLOCK_NOLEAKTRACE). But that could be for a separate clean-up.
We never restricted end/limit to be on a word boundary, but I doubt that
in practice we'd ever have the low bits set.

I'm not entirely happy with using the end limit parameter for this; I'd
like to see how much churn it would be to extend some of the
memblock_*_alloc functions with an explicit flags parameter.
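To gauge that, the direction would be something like the below
(hypothetical signature, not an existing API):

enum memblock_alloc_flags {
	MEMBLOCK_ALLOC_F_NONE		= 0,
	MEMBLOCK_ALLOC_F_NOLEAKTRACE	= (1 << 0),	/* skip kmemleak tracking */
};

phys_addr_t memblock_phys_alloc_flags(phys_addr_t size, phys_addr_t align,
				      phys_addr_t start, phys_addr_t end,
				      enum memblock_alloc_flags flags);

early_pgtable_alloc() would then not need a magic 'end' value at all:

	phys = memblock_phys_alloc_flags(PAGE_SIZE, PAGE_SIZE, 0,
					 MEMBLOCK_ALLOC_ACCESSIBLE,
					 MEMBLOCK_ALLOC_F_NOLEAKTRACE);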
--
Sincerely yours,
Mike.