When trying to boot a device with an ARM64 kernel with the following
config options enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_KMEMLEAK=y
a page-fault is encountered when kmemleak starts to scan the list of gray
or allocated objects that it maintains. Upon closer inspection, it was
observed that these page-faults always occurred when kmemleak attempted
to scan a CMA region.
At the moment, kmemleak is made aware of CMA regions that are specified
through the devicetree to be created at specific memory addresses or
dynamically allocated within a range of addresses. However, if the
CMA region is constrained to a certain range of addresses through the
command line, the region is reserved through the memblock_reserve()
function, but kmemleak_alloc_phys() is not invoked. Furthermore,
kmemleak is never informed about CMA regions being freed to buddy at
boot, which is problematic when CONFIG_DEBUG_PAGEALLOC is enabled, as
all CMA regions are unmapped from the kernel's address space, and
subsequently causes a page-fault when kmemleak attempts to scan any
of them.
This series makes it so that kmemleak is aware of every CMA region before
they are freed to the buddy allocator, so that at that time, kmemleak
can be informed that each region is about to be freed, and thus it
should not attempt to scan those regions.
Isaac J. Manjarres (2):
mm/cma.c: Make kmemleak aware of all CMA regions
mm/cma.c: Delete kmemleak objects when freeing CMA areas to buddy at
boot
mm/cma.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
--
2.39.0.314.g84b9a713c41-goog
Currently, kmemleak tracks CMA regions that are specified through the
devicetree. However, if the global CMA region is specified through
the commandline, kmemleak will be unaware of the CMA region because
kmemleak_alloc_phys() is not invoked after memblock_reserve(). Add
the missing call to kmemleak_alloc_phys() so that all CMA regions are
tracked by kmemleak before they are freed to the page allocator in
cma_activate_area().
Cc: [email protected]
Signed-off-by: Isaac J. Manjarres <[email protected]>
---
mm/cma.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/cma.c b/mm/cma.c
index 4a978e09547a..674b7fdd563e 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -318,6 +318,8 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
ret = -EBUSY;
goto err;
}
+
+ kmemleak_alloc_phys(base, size, 0);
} else {
phys_addr_t addr = 0;
--
2.39.0.314.g84b9a713c41-goog
Hi Isaac,
Please cc me on kmemleak patches. I only noticed when Andrew picket them
up.
On Mon, Jan 09, 2023 at 02:16:21PM -0800, Isaac J. Manjarres wrote:
> When trying to boot a device with an ARM64 kernel with the following
> config options enabled:
>
> CONFIG_DEBUG_PAGEALLOC=y
> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
> CONFIG_DEBUG_KMEMLEAK=y
>
> a page-fault is encountered when kmemleak starts to scan the list of gray
> or allocated objects that it maintains. Upon closer inspection, it was
> observed that these page-faults always occurred when kmemleak attempted
> to scan a CMA region.
What I don't understand is why kmemleak scans such CMA regions. The only
reason for a kmemleak_ignore_phys() call in cma_declare_contiguous_nid()
is because the kmemleak_alloc_phys() hook was called on the
memblock_alloc_range_nid() path, so we don't want this scanned.
Do you have a backtrace?
> At the moment, kmemleak is made aware of CMA regions that are specified
> through the devicetree to be created at specific memory addresses or
> dynamically allocated within a range of addresses. However, if the
> CMA region is constrained to a certain range of addresses through the
> command line, the region is reserved through the memblock_reserve()
> function, but kmemleak_alloc_phys() is not invoked.
The combination of kmemleak_alloc_phys() + kmemleak_free_part_phys() in
your series is equivalent to not adding it at all in the first place.
> Furthermore,
> kmemleak is never informed about CMA regions being freed to buddy at
> boot, which is problematic when CONFIG_DEBUG_PAGEALLOC is enabled, as
> all CMA regions are unmapped from the kernel's address space, and
> subsequently causes a page-fault when kmemleak attempts to scan any
> of them.
kmemleak would only scan such objects if it knows about them. So I think
it's only the case where CMA does a memblock allocation. The
kmemleak_ignore_phys() should tell kmemleak not to touch this region but
it's probably better to just free it altogether (i.e. replace the ignore
with the free kmemleak callback). Would this be sufficient for your
scenario?
> This series makes it so that kmemleak is aware of every CMA region before
> they are freed to the buddy allocator, so that at that time, kmemleak
> can be informed that each region is about to be freed, and thus it
> should not attempt to scan those regions.
I may be missing something but I don't get why kmemleak needs to be
informed only to tell kmemleak shortly after to remove them from its
list of objects.
--
Catalin
On Wed, Jan 18, 2023 at 05:16:46PM +0000, Catalin Marinas wrote:
> Hi Isaac,
>
> Please cc me on kmemleak patches. I only noticed when Andrew picket them
> up.
Will do, sorry about that.
>
> What I don't understand is why kmemleak scans such CMA regions. The only
> reason for a kmemleak_ignore_phys() call in cma_declare_contiguous_nid()
> is because the kmemleak_alloc_phys() hook was called on the
> memblock_alloc_range_nid() path, so we don't want this scanned.
The reason is because kmemleak_ignore_phys() is only called within
cma_declare_contiguous_nid(), which is not called for every CMA region.
For instance, CMA regions which are specified through the devicetree
and not constrained to a fixed address are allocated through
early_init_dt_alloc_reserved_memory_arch(), which eventually calls
kmemleak_alloc_phys() through memblock_phys_alloc_range().
When the CMA region is constrained to a particular address, it is allocated
through early_init_dt_reserve_memory(), which is followed up by a call to
kmemleak_alloc_phys() due to this commit:
https://lore.kernel.org/all/[email protected]/T/#u
I'm not sure if that commit is appropriate, given that reserved regions
that still have their direct mappings intact may be used for DMA, which
isn't appropriate for kmemleak scanning.
In any one of these cases, the kmemleak object is never removed, nor is
kmemleak told to ignore it, so it ends up scanning it later.
> Do you have a backtrace?
Yes:
[ 61.155981][ T97] Unable to handle kernel paging request at virtual address ...
[ 61.156241][ T97] Hardware name: Oriole EVT 1.1 (DT)
[ 61.156243][ T97] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 61.156246][ T97] pc : scan_block+0xbc/0x3c8
[ 61.156253][ T97] lr : scan_block+0xac/0x3c8
[ 61.156291][ T97] Call trace:
[ 61.156293][ T97] scan_block+0xbc/0x3c8
[ 61.156296][ T97] scan_gray_list+0x130/0x23c
[ 61.156299][ T97] kmemleak_scan+0x408/0x71c
[ 61.156302][ T97] kmemleak_scan_thread+0xc0/0xf0
[ 61.156306][ T97] kthread+0x114/0x15c
[ 61.156311][ T97] ret_from_fork+0x10/0x20
when I looked at the PTE from the page table walk, I saw that the virtual
address mapped to the physical address within a CMA region, and that the
page was unmapped (because of CONFIG_DEBUG_PAGEALLOC).
> kmemleak would only scan such objects if it knows about them. So I think
> it's only the case where CMA does a memblock allocation. The
> kmemleak_ignore_phys() should tell kmemleak not to touch this region but
> it's probably better to just free it altogether (i.e. replace the ignore
> with the free kmemleak callback). Would this be sufficient for your
> scenario?
I agree that freeing the kmemleak object is a better strategy. However,
replacing the call to kmemleak_ignore_phys() wouldn't be sufficient,
as there are other scenarios that would still leave behind kmemleak
objects to be scanned. That's why I ended up freeing the kmemleak object
in a path that is common for all CMA areas.
> I may be missing something but I don't get why kmemleak needs to be
> informed only to tell kmemleak shortly after to remove them from its
> list of objects.
That's a good point, and I agree that it's pointless; ideally whatever
way the CMA region is allocated would not inform kmemleak, and we
wouldn't have to deal with kmemleak.
We could use a flag like MEMBLOCK_ALLOC_NOLEAKTRACE to tell
memblock to skip the call to kmemleak_alloc_phys(), but that wouldn't
work as is, since MEMBLOCK_ALLOC_NOLEAKTRACE is meant to be used as the
"end" parameter for memblock_alloc() and friends. For CMA regions "end"
can be used (e.g. the case where the CMA region is supposed to be
within a certain range).
Perhaps we should change memblock to take MEMBLOCK_ALLOC_NOLEAKTRACE as
a flag under a "flags" argument instead?
--Isaac
On Thu, Jan 19, 2023 at 04:20:56PM -0800, Isaac Manjarres wrote:
> On Wed, Jan 18, 2023 at 05:16:46PM +0000, Catalin Marinas wrote:
> > What I don't understand is why kmemleak scans such CMA regions. The only
> > reason for a kmemleak_ignore_phys() call in cma_declare_contiguous_nid()
> > is because the kmemleak_alloc_phys() hook was called on the
> > memblock_alloc_range_nid() path, so we don't want this scanned.
> The reason is because kmemleak_ignore_phys() is only called within
> cma_declare_contiguous_nid(), which is not called for every CMA region.
>
> For instance, CMA regions which are specified through the devicetree
> and not constrained to a fixed address are allocated through
> early_init_dt_alloc_reserved_memory_arch(), which eventually calls
> kmemleak_alloc_phys() through memblock_phys_alloc_range().
>
> When the CMA region is constrained to a particular address, it is allocated
> through early_init_dt_reserve_memory(), which is followed up by a call to
> kmemleak_alloc_phys() due to this commit:
> https://lore.kernel.org/all/[email protected]/T/#u
Thanks for digging this out. This patch shouldn't have ended up upstream
(commit 972fa3a7c17c "mm: kmemleak: alloc gray object for reserved
region with direct map"). I thought both Calvin Zhang and I agreed that
it's not the correct approach (not even sure there was a real problem to
fix).
Do you still get the any faults with the above commit reverted? I'd
prefer this if it works rather than adding unnecessary
kmemleak_alloc/free callbacks that pretty much cancel each-other.
> I'm not sure if that commit is appropriate, given that reserved regions
> that still have their direct mappings intact may be used for DMA, which
> isn't appropriate for kmemleak scanning.
It's not. I think it should be reverted.
> > kmemleak would only scan such objects if it knows about them. So I think
> > it's only the case where CMA does a memblock allocation. The
> > kmemleak_ignore_phys() should tell kmemleak not to touch this region but
> > it's probably better to just free it altogether (i.e. replace the ignore
> > with the free kmemleak callback). Would this be sufficient for your
> > scenario?
>
> I agree that freeing the kmemleak object is a better strategy. However,
> replacing the call to kmemleak_ignore_phys() wouldn't be sufficient,
> as there are other scenarios that would still leave behind kmemleak
> objects to be scanned. That's why I ended up freeing the kmemleak object
> in a path that is common for all CMA areas.
The only reason for kmemleak_ignore_phys() was to counter the actual
kmemleak_alloc() call from the memblock code on the CMA allocation.
--
Catalin
On Tue, 24 Jan 2023 15:48:57 +0000 Catalin Marinas <[email protected]> wrote:
> Thanks for digging this out. This patch shouldn't have ended up upstream
> (commit 972fa3a7c17c "mm: kmemleak: alloc gray object for reserved
> region with direct map"). I thought both Calvin Zhang and I agreed that
> it's not the correct approach (not even sure there was a real problem to
> fix).
>
> Do you still get the any faults with the above commit reverted? I'd
> prefer this if it works rather than adding unnecessary
> kmemleak_alloc/free callbacks that pretty much cancel each-other.
>
> > I'm not sure if that commit is appropriate, given that reserved regions
> > that still have their direct mappings intact may be used for DMA, which
> > isn't appropriate for kmemleak scanning.
>
> It's not. I think it should be reverted.
Could someone please send along a patch to revert this, along
with the explanation for doing so? And please consider a cc:stable.
On Tue, Jan 24, 2023 at 03:48:57PM +0000, Catalin Marinas wrote:
> Thanks for digging this out. This patch shouldn't have ended up upstream
> (commit 972fa3a7c17c "mm: kmemleak: alloc gray object for reserved
> region with direct map"). I thought both Calvin Zhang and I agreed that
> it's not the correct approach (not even sure there was a real problem to
> fix).
>
> Do you still get the any faults with the above commit reverted? I'd
> prefer this if it works rather than adding unnecessary
> kmemleak_alloc/free callbacks that pretty much cancel each-other.
Yes, I still see the same problem after reverting that commit. The problem
still persists because there are CMA areas that are allocated through
memblock_phys_alloc_range(), which invokes kmemleak_alloc_phys(). The
allocation call stack is along the lines of:
kmemleak_alloc_phys()
memblock_alloc_range_nid()
memblock_phys_alloc_range()
__reserved_mem_alloc_size()
fdt_init_reserved_mem()
I also followed up on my suggestion about adding a flags parameter to
the memblock allocation functions to be able to use
MEMBLOCK_ALLOC_NOLEAKTRACE in this particular scenario, but that would
involve changing many call-sites, which doesn't make much sense given
that there are only 4 call-sites that actually use this flag.
Maybe adding a new memblock allocation function that allows this flag to
be passed as just a flag can be used to avoid creating these kmemleak
objects for CMA allocations?
--Isaac
On Tue, Jan 24, 2023 at 12:20:15PM -0800, Andrew Morton wrote:
> On Tue, 24 Jan 2023 15:48:57 +0000 Catalin Marinas <[email protected]> wrote:
>
> > Thanks for digging this out. This patch shouldn't have ended up upstream
> > (commit 972fa3a7c17c "mm: kmemleak: alloc gray object for reserved
> > region with direct map"). I thought both Calvin Zhang and I agreed that
> > it's not the correct approach (not even sure there was a real problem to
> > fix).
> >
> > Do you still get the any faults with the above commit reverted? I'd
> > prefer this if it works rather than adding unnecessary
> > kmemleak_alloc/free callbacks that pretty much cancel each-other.
> >
> > > I'm not sure if that commit is appropriate, given that reserved regions
> > > that still have their direct mappings intact may be used for DMA, which
> > > isn't appropriate for kmemleak scanning.
> >
> > It's not. I think it should be reverted.
>
> Could someone please send along a patch to revert this, along
> with the explanation for doing so? And please consider a cc:stable.
Yes, I can send a revert patch later today. My patches that are
currently in mm-unstable depend on this patch though, so those would
have to be dropped from that branch as well.
--Isaac
On Tue, Jan 24, 2023 at 01:19:57PM -0800, Isaac Manjarres wrote:
> On Tue, Jan 24, 2023 at 03:48:57PM +0000, Catalin Marinas wrote:
> > Thanks for digging this out. This patch shouldn't have ended up upstream
> > (commit 972fa3a7c17c "mm: kmemleak: alloc gray object for reserved
> > region with direct map"). I thought both Calvin Zhang and I agreed that
> > it's not the correct approach (not even sure there was a real problem to
> > fix).
> >
> > Do you still get the any faults with the above commit reverted? I'd
> > prefer this if it works rather than adding unnecessary
> > kmemleak_alloc/free callbacks that pretty much cancel each-other.
>
> Yes, I still see the same problem after reverting that commit. The problem
> still persists because there are CMA areas that are allocated through
> memblock_phys_alloc_range(), which invokes kmemleak_alloc_phys(). The
> allocation call stack is along the lines of:
>
> kmemleak_alloc_phys()
> memblock_alloc_range_nid()
> memblock_phys_alloc_range()
> __reserved_mem_alloc_size()
> fdt_init_reserved_mem()
Ah, that's another place that kmemleak shouldn't care about.
> I also followed up on my suggestion about adding a flags parameter to
> the memblock allocation functions to be able to use
> MEMBLOCK_ALLOC_NOLEAKTRACE in this particular scenario, but that would
> involve changing many call-sites, which doesn't make much sense given
> that there are only 4 call-sites that actually use this flag.
Yeah, the current way of passing MEMBLOCK_ALLOC_NOLEAKTRACE is not
ideal. We did it more like a quick hack. See some past discussion here
(and adding Mike to this thread):
https://lore.kernel.org/all/YYUChdTeXP%[email protected]/
> Maybe adding a new memblock allocation function that allows this flag to
> be passed as just a flag can be used to avoid creating these kmemleak
> objects for CMA allocations?
That's an option. If there's too much churn to add a flag, an
alternative is to use the bottom bit of 'end' to set the noleaktrace
flag.
Yet another idea is to avoid the kmemleak callback on all the 'phys'
memblock allocations. We can add the callback to the higher level
memblock_alloc() which returns a VA but the lower level 'phys' variants
could simply avoid it. However, I think we still need the
MEMBLOCK_ALLOC_NOLEAKTRACE flag for the kasan shadow allocation. Well,
given that this flag is not widely used, we can add explicit
kmemleak_ignore() calls in those four places.
I think the latter, if it works, would be the least intrusive.
--
Catalin
On Wed, Jan 25, 2023 at 12:08:58PM +0000, Catalin Marinas wrote:
> That's an option. If there's too much churn to add a flag, an
> alternative is to use the bottom bit of 'end' to set the noleaktrace
> flag.
Using the least significant bit won't work; there are allocations for
CMA regions that can be specified to occur within the first 4 GB of
memory, and would have an alloc-ranges of [0 0xffff_ffff]. I also don't
think there's anything in the memblock documentation that ensures that
those bits are supposed to be clear all the time.
> Yet another idea is to avoid the kmemleak callback on all the 'phys'
> memblock allocations. We can add the callback to the higher level
> memblock_alloc() which returns a VA but the lower level 'phys' variants
> could simply avoid it. However, I think we still need the
> MEMBLOCK_ALLOC_NOLEAKTRACE flag for the kasan shadow allocation. Well,
> given that this flag is not widely used, we can add explicit
> kmemleak_ignore() calls in those four places.
>
> I think the latter, if it works, would be the least intrusive.
I agree; I think using kmemleak_ignore() would be best. I will
split that into series: 1 series that fixes the kmemleak issue
with CMA regions by reverting Calvin's patch and adding a call
to kmemleak_ignore in the call-stack I referenced earlier, and
then another series that cleans up the usage of the flag.
--Isaac