On Fri, Mar 25, 2022 at 01:58:42PM +0100, Michal Hocko wrote:
> Dang, I have just realized that I have misread the boot log and it has
> turned out that a674e48c5443 is covering my situation because the
> allocation failure message says:
>
> Node 0 DMA free:0kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:636kB managed:0kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
As in, your report is from a kernel that does not have a674e48c5443
yet?
>
> I thought there were only a few pages managed by the DMA zone. This
> is still theoretically possible, so I think __GFP_NOWARN makes sense here,
> but it would require changing the patch description.
>
> Is this really worth it?
In general I think for kernels where we need the pool and can't allocate
it, a warning is very useful. We just shouldn't spew it when there is
no need for the pool to start with.
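For reference, a674e48c5443 simply skips creating the GFP_DMA pool when
the DMA zone has no managed pages at all. From memory (so the details
may be slightly off), dma_atomic_pool_init() now does roughly:

	/* don't bother with a GFP_DMA pool the zone cannot back */
	if (has_managed_dma()) {
		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
						GFP_KERNEL | GFP_DMA);
		if (!atomic_pool_dma)
			ret = -ENOMEM;
	}

which is why the managed:0kB case above should not warn any more.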
On Fri 25-03-22 17:48:56, Christoph Hellwig wrote:
> On Fri, Mar 25, 2022 at 01:58:42PM +0100, Michal Hocko wrote:
> > Dang, I have just realized that I have misread the boot log and it has
> > turned out that a674e48c5443 is covering my situation because the
> > allocation failure message says:
> >
> > Node 0 DMA free:0kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:636kB managed:0kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>
> As in, your report is from a kernel that does not have a674e48c5443
> yet?
Yes. I just mixed up the early boot messages and thought that the DMA zone
ended up with a single page. That message was saying something else,
though.
> > I thought there were only a few pages managed by the DMA zone. This
> > is still theoretically possible, so I think __GFP_NOWARN makes sense here,
> > but it would require changing the patch description.
> >
> > Is this really worth it?
>
> In general I think for kernels where we need the pool and can't allocate
> it, a warning is very useful. We just shouldn't spew it when there is
> no need for the pool to start with.
Well, do we have any way to find that out during early boot?
--
Michal Hocko
SUSE Labs
On Fri 25-03-22 17:54:33, Michal Hocko wrote:
> On Fri 25-03-22 17:48:56, Christoph Hellwig wrote:
> > On Fri, Mar 25, 2022 at 01:58:42PM +0100, Michal Hocko wrote:
> > > Dang, I have just realized that I have misread the boot log and it has
> > > turned out that a674e48c5443 is covering my situation because the
> > > allocation failure message says:
> > >
> > > Node 0 DMA free:0kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:636kB managed:0kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >
> > As in, your report is from a kernel that does not have a674e48c5443
> > yet?
>
> Yes. I just mixed up the early boot messages and thought that the DMA zone
> ended up with a single page. That message was saying something else,
> though.
OK, so I have another machine spewing this warning. Still on an older
kernel, but I do not think the current upstream would be any different in
that regard. This time the DMA zone is populated and consumed from
large part and the pool size request is just too large for it:
[ 14.017417][ T1] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-7
[ 14.017429][ T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.14.21-150400.22-default #1 SLE15-SP4 0b6a6578ade2de5c4a0b916095dff44f76ef1704
[ 14.017434][ T1] Hardware name: XXXX
[ 14.017437][ T1] Call Trace:
[ 14.017444][ T1] <TASK>
[ 14.017449][ T1] dump_stack_lvl+0x45/0x57
[ 14.017469][ T1] warn_alloc+0xfe/0x160
[ 14.017490][ T1] __alloc_pages_slowpath.constprop.112+0xc27/0xc60
[ 14.017497][ T1] ? rdinit_setup+0x2b/0x2b
[ 14.017509][ T1] ? rdinit_setup+0x2b/0x2b
[ 14.017512][ T1] __alloc_pages+0x2d5/0x320
[ 14.017517][ T1] alloc_page_interleave+0xf/0x70
[ 14.017531][ T1] atomic_pool_expand+0x4a/0x200
[ 14.017541][ T1] ? rdinit_setup+0x2b/0x2b
[ 14.017544][ T1] __dma_atomic_pool_init+0x44/0x90
[ 14.017556][ T1] dma_atomic_pool_init+0xad/0x13f
[ 14.017560][ T1] ? __dma_atomic_pool_init+0x90/0x90
[ 14.017562][ T1] do_one_initcall+0x41/0x200
[ 14.017581][ T1] kernel_init_freeable+0x236/0x298
[ 14.017589][ T1] ? rest_init+0xd0/0xd0
[ 14.017596][ T1] kernel_init+0x16/0x120
[ 14.017599][ T1] ret_from_fork+0x22/0x30
[ 14.017604][ T1] </TASK>
[...]
[ 14.018026][ T1] Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 14.018035][ T1] lowmem_reserve[]: 0 0 0 0 0
[ 14.018339][ T1] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 160kB
So the DMA zone has only 160kB free while the pool would like to use 4MB
of it, which obviously fails. I haven't tried to check who is consuming
the DMA zone memory and why, but this shouldn't be all that important
because the pool clearly cannot allocate and there is not much the
user/admin can do about that. Well, the pool could be explicitly
requested to be smaller, but is that really what we expect them to do?
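(For completeness: that would be the coherent_pool= command line
parameter, e.g. booting with coherent_pool=128K, unless I am
misremembering the name.)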
> > > I thought there were only a few pages managed by the DMA zone. This
> > > is still theoretically possible, so I think __GFP_NOWARN makes sense here,
> > > but it would require changing the patch description.
> > >
> > > Is this really worth it?
> >
> > In general I think for kernels where we need the pool and can't allocate
> > it, a warning is very useful. We just shouldn't spew it when there is
> > no need for the pool to start with.
>
> Well, do we have any way to find that out during early boot?
Thinking about it, we should get a warning when the actual allocation
from the pool fails, no? That would be more useful information than the
pre-allocation failure, where it is not really clear whether anybody is
ever going to consume it.
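To illustrate what I mean (completely untested, and I am quoting the
function names from memory), something along these lines in
kernel/dma/pool.c:

	page = __dma_alloc_from_pool(dev, size, pool, cpu_addr, phys_addr_ok);
	if (!page)
		/* somebody actually needs the pool and it cannot deliver */
		pr_warn_ratelimited("DMA: atomic pool depleted, consider increasing coherent_pool=\n");

That would only trigger for real consumers rather than for a
pre-allocation nobody might ever need.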
What do you think? Should I repost my original patch with the updated
changelog?
--
Michal Hocko
SUSE Labs
On 08/03/22 at 11:52am, Michal Hocko wrote:
> On Fri 25-03-22 17:54:33, Michal Hocko wrote:
> > On Fri 25-03-22 17:48:56, Christoph Hellwig wrote:
> > > On Fri, Mar 25, 2022 at 01:58:42PM +0100, Michal Hocko wrote:
> > > > Dang, I have just realized that I have misread the boot log and it has
> > > > turned out that a674e48c5443 is covering my situation because the
> > > > allocation failure message says:
> > > >
> > > > Node 0 DMA free:0kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:636kB managed:0kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > >
> > > As in, your report is from a kernel that does not have a674e48c5443
> > > yet?
> >
> > Yes. I just mixed up the early boot messages and thought that the DMA zone
> > ended up with a single page. That message was saying something else,
> > though.
>
> OK, so I have another machine spewing this warning. Still on an older
> kernel, but I do not think the current upstream would be any different in
> that regard. This time the DMA zone is populated and consumed from
> large part and the pool size request is just too large for it:
>
> [ 14.017417][ T1] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-7
> [ 14.017429][ T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.14.21-150400.22-default #1 SLE15-SP4 0b6a6578ade2de5c4a0b916095dff44f76ef1704
> [ 14.017434][ T1] Hardware name: XXXX
> [ 14.017437][ T1] Call Trace:
> [ 14.017444][ T1] <TASK>
> [ 14.017449][ T1] dump_stack_lvl+0x45/0x57
> [ 14.017469][ T1] warn_alloc+0xfe/0x160
> [ 14.017490][ T1] __alloc_pages_slowpath.constprop.112+0xc27/0xc60
> [ 14.017497][ T1] ? rdinit_setup+0x2b/0x2b
> [ 14.017509][ T1] ? rdinit_setup+0x2b/0x2b
> [ 14.017512][ T1] __alloc_pages+0x2d5/0x320
> [ 14.017517][ T1] alloc_page_interleave+0xf/0x70
> [ 14.017531][ T1] atomic_pool_expand+0x4a/0x200
> [ 14.017541][ T1] ? rdinit_setup+0x2b/0x2b
> [ 14.017544][ T1] __dma_atomic_pool_init+0x44/0x90
> [ 14.017556][ T1] dma_atomic_pool_init+0xad/0x13f
> [ 14.017560][ T1] ? __dma_atomic_pool_init+0x90/0x90
> [ 14.017562][ T1] do_one_initcall+0x41/0x200
> [ 14.017581][ T1] kernel_init_freeable+0x236/0x298
> [ 14.017589][ T1] ? rest_init+0xd0/0xd0
> [ 14.017596][ T1] kernel_init+0x16/0x120
> [ 14.017599][ T1] ret_from_fork+0x22/0x30
> [ 14.017604][ T1] </TASK>
> [...]
> [ 14.018026][ T1] Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 14.018035][ T1] lowmem_reserve[]: 0 0 0 0 0
> [ 14.018339][ T1] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 160kB
>
> So the DMA zone has only 160kB free while the pool would like to use 4MB
> of it, which obviously fails. I haven't tried to check who is consuming
> the DMA zone memory and why, but this shouldn't be all that important
> because the pool clearly cannot allocate and there is not much the
> user/admin can do about that. Well, the pool could be explicitly
> requested to be smaller, but is that really what we expect them to do?
>
> > > > I thought there were only a few pages managed by the DMA zone. This
> > > > is still theoretically possible, so I think __GFP_NOWARN makes sense here,
> > > > but it would require changing the patch description.
> > > >
> > > > Is this really worth it?
> > >
> > > In general I think for kernels where we need the pool and can't allocate
> > > it, a warning is very useful. We just shouldn't spew it when there is
> > > no need for the pool to start with.
> >
> > Well, do we have any way to find that out during early boot?
>
> Thinking about it, we should get a warning when the actual allocation
> from the pool fails, no? That would be more useful information than the
> pre-allocation failure, where it is not really clear whether anybody is
> ever going to consume it.
Hi Michal,
You haven't said which ARCH you hit this issue on; is it x86_64?
If yes, I have a patch queued that fixes it in another way, which I
have had in mind for a while.
Thanks
Baoquan
On Wed, Aug 03, 2022 at 11:52:10AM +0200, Michal Hocko wrote:
> OK, so I have another machine spewing this warning. Still on an older
> kernel, but I do not think the current upstream would be any different in
> that regard. This time the DMA zone is populated and consumed from
> large part and the pool size request is just too large for it:
I can't really parse the last sentence. What does "consumed from large
part" mean here?
On Fri, Mar 25, 2022 at 05:54:32PM +0100, Michal Hocko wrote:
> > > I thought there were only a few pages managed by the DMA zone. This
> > > is still theoretically possible, so I think __GFP_NOWARN makes sense here,
> > > but it would require changing the patch description.
> > >
> > > Is this really worth it?
> >
> > In general I think for kernels where we need the pool and can't allocate
> > it, a warning is very useful. We just shouldn't spew it when there is
> > no need for the pool to start with.
>
> Well, do we have any way to find that out during early boot?
In general an architecture / configuration that selects
CONFIG_ZONE_DMA needs it. We could try to reduce that dependency and/or
make it boot-time configurable, but there are still plenty of devices
with sub-32-bit addressing limits around, so I'm not sure it would help
much.
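(FWIW it is already compile-time configurable on some architectures.
arch/x86/Kconfig has, if I remember correctly, something like:

config ZONE_DMA
	bool "DMA memory allocation support" if EXPERT
	default y

so you can turn it off, but a distribution kernel obviously cannot.)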
On Thu 11-08-22 09:28:17, Christoph Hellwig wrote:
> On Wed, Aug 03, 2022 at 11:52:10AM +0200, Michal Hocko wrote:
> > OK, so I have another machine spewing this warning. Still on an older
> > kernel, but I do not think the current upstream would be any different in
> > that regard. This time the DMA zone is populated and consumed from
> > large part and the pool size request is just too large for it:
>
> I can't really parse the last sentence. What does "consumed from large
> part" mean here?
The meminfo part says:
Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
So the zone has 15MB of memory managed by the page allocator, yet only
160kB of it is free at the time of the allocation in early boot. So it
is mostly consumed by somebody. I haven't really checked by whom.
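(To spell the numbers out: present:15996kB minus managed:15360kB means
636kB of the zone is reserved before the page allocator even sees it,
and all but 160kB of the remaining ~15MB is already allocated by the
time the pool initialization runs.)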
Does that explain the above better?
--
Michal Hocko
SUSE Labs
On Thu, Aug 11, 2022 at 10:20:43AM +0200, Michal Hocko wrote:
> The meminfo part says:
> Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>
> So the zone has 15MB of memory managed by the page allocator, yet only
> 160kB of it is free at the time of the allocation in early boot. So it
> is mostly consumed by somebody. I haven't really checked by whom.
>
> Does that explain the above better?
Yes. I'm really curious who eats up all the GFP_DMA memory early during
boot, though.
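(Booting with page_owner=on and dumping /sys/kernel/debug/page_owner
would probably tell us, assuming CONFIG_PAGE_OWNER is enabled in that
config.)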