2014-11-28 11:29:52

by Joerg Roedel

Subject: [PATCH 0/3] Fix kdump failures with crashkernel=high

Hi,

here is a patch-set to fix failed kdump kernel boots when
the system was booted with crashkernel=X,high. On those
systems the kernel allocates only 72MiB of low-memory for
DMA buffers, which turned out to be too little on some systems.

The problem is that 64MiB of the low-memory is allocated by
swiotlb, leaving 8MB for the page-allocator. But swiotlb
tries to allocate DMA memory from the page-allocator first,
which fails pretty fast in the boot sequence, causing
warnings. This patch-set removes these warnings.

But even the 64MiB for swiotlb are eaten up on some systems,
so the default amount of low-memory allocated for the
crash-kernel is increased from 72MB to 256MB (only changing
the default, which can still be overridden by crashkernel=X,low).

This number comes from experiments on the affected systems,
128MiB low-memory was still not enough there, thus I set the
value to 256MiB to fix the issues.

Any feedback appreciated.

Thanks,

Joerg

Joerg Roedel (3):
swiotlb: Warn on allocation failure in swiotlb_alloc_coherent
x86, swiotlb: Try coherent allocations with __GFP_NOWARN
x86, crash: Allocate enough low-mem when crashkernel=high

arch/x86/kernel/pci-swiotlb.c | 8 ++++++++
arch/x86/kernel/setup.c | 5 ++++-
lib/swiotlb.c | 11 +++++++++--
3 files changed, 21 insertions(+), 3 deletions(-)

--
1.9.1


2014-11-28 11:29:26

by Joerg Roedel

Subject: [PATCH 1/3] swiotlb: Warn on allocation failure in swiotlb_alloc_coherent

From: Joerg Roedel <[email protected]>

Print a warning when all allocation attempts have failed
and the function is about to return NULL. This prepares for
calling the function with __GFP_NOWARN to suppress
allocation failure warnings before all fall-backs have
failed.

Signed-off-by: Joerg Roedel <[email protected]>
---
lib/swiotlb.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 4abda07..e0e9212 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -655,7 +655,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 		 */
 		phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
 		if (paddr == SWIOTLB_MAP_ERROR)
-			return NULL;
+			goto err_warn;
 
 		ret = phys_to_virt(paddr);
 		dev_addr = phys_to_dma(hwdev, paddr);
@@ -669,7 +669,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 			/* DMA_TO_DEVICE to avoid memcpy in unmap_single */
 			swiotlb_tbl_unmap_single(hwdev, paddr,
 						 size, DMA_TO_DEVICE);
-			return NULL;
+			goto err_warn;
 		}
 	}
 
@@ -677,6 +677,13 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 	memset(ret, 0, size);
 
 	return ret;
+
+err_warn:
+	pr_warn("swiotlb: coherent allocation failed for device %s size=%zu\n",
+		dev_name(hwdev), size);
+	dump_stack();
+
+	return NULL;
 }
 EXPORT_SYMBOL(swiotlb_alloc_coherent);

--
1.9.1
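
[Editor's sketch, not part of the thread] The control flow this patch introduces can be illustrated in user space: every failure path funnels into a single label, so a warning is printed only once all fall-backs are exhausted. The names sim_state and sim_alloc_coherent and the two static buffers are hypothetical stand-ins, not kernel APIs; the real function uses the page allocator and the swiotlb pool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two allocation paths; not kernel APIs. */
struct sim_state {
	bool page_alloc_ok;	/* would the page allocator succeed? */
	bool bounce_ok;		/* would the bounce-buffer fall-back succeed? */
	int warnings;		/* number of warnings printed so far */
};

char sim_page_buf[64];
char sim_bounce_buf[64];

/* Returns a buffer or NULL; warns only after every fall-back has failed. */
void *sim_alloc_coherent(struct sim_state *s, size_t size)
{
	void *ret = NULL;

	if (size > sizeof(sim_page_buf))
		goto err_warn;

	/* First attempt: fails silently when it cannot be satisfied. */
	if (s->page_alloc_ok)
		ret = sim_page_buf;

	if (!ret) {
		/* Fall-back: take memory from the bounce-buffer pool. */
		if (!s->bounce_ok)
			goto err_warn;
		ret = sim_bounce_buf;
	}

	return ret;

err_warn:
	s->warnings++;
	fprintf(stderr, "sim: coherent allocation failed, size=%zu\n", size);
	return NULL;
}
```

With page_alloc_ok false and bounce_ok true the call succeeds without any warning; only when both are false is the single warning emitted.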

2014-11-28 11:29:24

by Joerg Roedel

Subject: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

From: Joerg Roedel <[email protected]>

When the crashkernel is loaded above 4GiB in memory the
first kernel only allocates 72MiB of low-memory for the DMA
requirements of the second kernel. On systems with many
devices this is not enough and causes device driver
initialization errors and failed crash dumps. Set this
default value to 256MiB to make sure there is enough memory
available for DMA.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kernel/setup.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ab08aa2..1227e30 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -536,8 +536,11 @@ static void __init reserve_crashkernel_low(void)
 		 *	swiotlb overflow buffer: now is hardcoded to 32k.
 		 *		We round it to 8M for other buffers that
 		 *		may need to stay low too.
+		 *		Also make sure we allocate enough extra
+		 *		low memory so that we don't run out of DMA
+		 *		buffers for 32bit devices.
 		 */
-		low_size = swiotlb_size_or_default() + (8UL<<20);
+		low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
 		auto_set = true;
 	} else {
 		/* passed with crashkernel=0,low ? */
--
1.9.1
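
[Editor's sketch, not part of the thread] The arithmetic behind the changed default can be checked in isolation. The helpers low_size_old and low_size_new are hypothetical names that mirror the two expressions in the hunk above; the swiotlb size is passed in as a parameter instead of being read from swiotlb_size_or_default().

```c
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/* Old default: swiotlb size plus 8 MiB for other low buffers. */
uint64_t low_size_old(uint64_t swiotlb_bytes)
{
	return swiotlb_bytes + MiB(8);
}

/* Patched default: the same sum, but never less than 256 MiB. */
uint64_t low_size_new(uint64_t swiotlb_bytes)
{
	uint64_t base = swiotlb_bytes + MiB(8);
	uint64_t min_low = MiB(256);

	return base > min_low ? base : min_low;
}
```

With the default 64 MiB swiotlb the old expression gives 72 MiB and the new one 256 MiB. A larger swiotlb= setting still wins: 512 MiB of swiotlb yields 520 MiB under both expressions.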

2014-11-28 11:29:51

by Joerg Roedel

Subject: [PATCH 2/3] x86, swiotlb: Try coherent allocations with __GFP_NOWARN

From: Joerg Roedel <[email protected]>

When we boot a kdump kernel in high memory, there is by
default only 72MB of low memory available. The swiotlb code
takes 64MB of it (by default) so that there are only 8MB
left to allocate from. On systems with many devices this
causes page allocator warnings from dma_generic_alloc_coherent():

systemd-udevd: page allocation failure: order:0, mode:0x280d4
CPU: 0 PID: 197 Comm: systemd-udevd Tainted: G W 3.12.28-4-default #1
Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
ffff8800781335e0 ffffffff8150b1db 00000000000280d4 ffffffff8113af90
0000000000000000 0000000000000000 ffff88007efdbb00 0000000100000000
0000000000000000 0000000000000000 0000000000000000 0000000000000001
Call Trace:
[<ffffffff8100467d>] dump_trace+0x7d/0x2d0
[<ffffffff81004964>] show_stack_log_lvl+0x94/0x170
[<ffffffff81005d91>] show_stack+0x21/0x50
[<ffffffff8150b1db>] dump_stack+0x41/0x51
[<ffffffff8113af90>] warn_alloc_failed+0xf0/0x160
[<ffffffff8150763a>] __alloc_pages_slowpath+0x72f/0x796
[<ffffffff8113ee7a>] __alloc_pages_nodemask+0x1ea/0x210
[<ffffffff81008256>] dma_generic_alloc_coherent+0x96/0x140
[<ffffffff8103fccc>] x86_swiotlb_alloc_coherent+0x1c/0x50
[<ffffffffa048ae5b>] ttm_dma_pool_alloc_new_pages+0xab/0x320 [ttm]
[<ffffffffa048bc6e>] ttm_dma_populate+0x3ce/0x640 [ttm]
[<ffffffffa0486>] ttm_tt_bind+0x36/0x60 [ttm]
[<ffffffffa0484faf>] ttm_bo_handle_move_mem+0x55f/0x5c0 [ttm]
[<ffffffffa0485be5>] ttm_bo_move_buffer+0x105/0x130 [ttm]
[<ffffffffa0485cd1>] ttm_bo_validate+0xc1/0x130 [ttm]
[<ffffffffa0485f8b>] ttm_bo_init+0x24b/0x400 [ttm]
[<ffffffffa054f8bc>] radeon_bo_create+0x16c/0x200 [radeon]
[<ffffffffa0563c8e>] radeon_ring_init+0x11e/0x2b0 [radeon]
[<ffffffffa056c143>] r100_cp_init+0x123/0x5b0 [radeon]
[<ffffffffa056e8e4>] r100_startup+0x194/0x230 [radeon]
[<ffffffffa056ece3>] r100_init+0x223/0x410 [radeon]
[<ffffffffa053495f>] radeon_device_init+0x6af/0x830 [radeon]
[<ffffffffa0536979>] radeon_driver_load_kms+0x89/0x180 [radeon]
[<ffffffffa04eeb31>] drm_get_pci_dev+0x121/0x2f0 [drm]
[<ffffffff812d3ec9>] local_pci_probe+0x39/0x60
[<ffffffff812d51e9>] pci_device_probe+0xa9/0x120
[<ffffffff8139871d>] driver_probe_device+0x9d/0x3d0
[<ffffffff81398b1b>] __driver_attach+0x8b/0x90
[<ffffffff8139667b>] bus_for_each_dev+0x5b/0x90
[<ffffffff81397cd8>] bus_add_driver+0x1f8/0x2c0
[<ffffffff8139911b>] driver_register+0x5b/0xe0
[<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0
[<ffffffff810c8837>] load_module+0x1207/0x1c70
[<ffffffff810c93f5>] SYSC_finit_module+0x75/0xa0
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007fac533d2789>] 0x7fac533d2788

After these warnings the code enters a fall-back path and
allocates directly from the swiotlb aperture in the end.
So remove these warnings as this is not a fatal error.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kernel/pci-swiotlb.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 77dd0ad..79b2291 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -20,6 +20,14 @@ void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 {
 	void *vaddr;
 
+	/*
+	 * When booting a kdump kernel in high memory these allocations are very
+	 * likely to fail, as there are by default only 8MB of low memory to
+	 * allocate from. So disable the warnings from the allocator when this
+	 * happens. SWIOTLB also implements fall-backs for failed allocations.
+	 */
+	flags |= __GFP_NOWARN;
+
 	vaddr = dma_generic_alloc_coherent(hwdev, size, dma_handle, flags,
 					   attrs);
 	if (vaddr)
--
1.9.1
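
[Editor's sketch, not part of the thread] The effect of OR-ing a "no warning" bit into the caller's flags can be shown with a toy allocator. SIM_GFP_KERNEL, SIM_GFP_NOWARN, sim_alloc_pages and sim_x86_alloc_coherent are illustrative names and values only; the real gfp_t bits and allocator live in the kernel.

```c
#include <stddef.h>

/* Illustrative flag bits; the real gfp_t values are defined by the kernel. */
#define SIM_GFP_KERNEL 0x01u
#define SIM_GFP_NOWARN 0x02u

/* Number of warnings the simulated page allocator has emitted. */
int sim_alloc_warnings;

/* Simulated page allocator: always fails, warns unless NOWARN is set. */
void *sim_alloc_pages(unsigned int flags)
{
	if (!(flags & SIM_GFP_NOWARN))
		sim_alloc_warnings++;
	return NULL;
}

/* Wrapper in the spirit of the patch: force NOWARN on the first attempt. */
void *sim_x86_alloc_coherent(unsigned int flags)
{
	flags |= SIM_GFP_NOWARN;
	return sim_alloc_pages(flags);
}
```

Calling sim_alloc_pages(SIM_GFP_KERNEL) directly bumps the warning counter, while going through the wrapper leaves it unchanged; the caller is then free to try a fall-back and report a failure itself.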

2014-12-01 22:10:50

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 2/3] x86, swiotlb: Try coherent allocations with __GFP_NOWARN

On Fri, Nov 28, 2014 at 12:29:08PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> When we boot a kdump kernel in high memory, there is by
> default only 72MB of low memory available. The swiotlb code
> takes 64MB of it (by default) so that there are only 8MB
> left to allocate from. On systems with many devices this
> causes page allocator warnings from dma_generic_alloc_coherent():
>
> systemd-udevd: page allocation failure: order:0, mode:0x280d4
> CPU: 0 PID: 197 Comm: systemd-udevd Tainted: G W 3.12.28-4-default #1
> Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
> ffff8800781335e0 ffffffff8150b1db 00000000000280d4 ffffffff8113af90
> 0000000000000000 0000000000000000 ffff88007efdbb00 0000000100000000
> 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> Call Trace:
> [<ffffffff8100467d>] dump_trace+0x7d/0x2d0
> [<ffffffff81004964>] show_stack_log_lvl+0x94/0x170
> [<ffffffff81005d91>] show_stack+0x21/0x50
> [<ffffffff8150b1db>] dump_stack+0x41/0x51
> [<ffffffff8113af90>] warn_alloc_failed+0xf0/0x160
> [<ffffffff8150763a>] __alloc_pages_slowpath+0x72f/0x796
> [<ffffffff8113ee7a>] __alloc_pages_nodemask+0x1ea/0x210
> [<ffffffff81008256>] dma_generic_alloc_coherent+0x96/0x140
> [<ffffffff8103fccc>] x86_swiotlb_alloc_coherent+0x1c/0x50
> [<ffffffffa048ae5b>] ttm_dma_pool_alloc_new_pages+0xab/0x320 [ttm]
> [<ffffffffa048bc6e>] ttm_dma_populate+0x3ce/0x640 [ttm]
> [<ffffffffa0486>] ttm_tt_bind+0x36/0x60 [ttm]
> [<ffffffffa0484faf>] ttm_bo_handle_move_mem+0x55f/0x5c0 [ttm]
> [<ffffffffa0485be5>] ttm_bo_move_buffer+0x105/0x130 [ttm]
> [<ffffffffa0485cd1>] ttm_bo_validate+0xc1/0x130 [ttm]
> [<ffffffffa0485f8b>] ttm_bo_init+0x24b/0x400 [ttm]
> [<ffffffffa054f8bc>] radeon_bo_create+0x16c/0x200 [radeon]
> [<ffffffffa0563c8e>] radeon_ring_init+0x11e/0x2b0 [radeon]
> [<ffffffffa056c143>] r100_cp_init+0x123/0x5b0 [radeon]
> [<ffffffffa056e8e4>] r100_startup+0x194/0x230 [radeon]
> [<ffffffffa056ece3>] r100_init+0x223/0x410 [radeon]
> [<ffffffffa053495f>] radeon_device_init+0x6af/0x830 [radeon]
> [<ffffffffa0536979>] radeon_driver_load_kms+0x89/0x180 [radeon]
> [<ffffffffa04eeb31>] drm_get_pci_dev+0x121/0x2f0 [drm]
> [<ffffffff812d3ec9>] local_pci_probe+0x39/0x60
> [<ffffffff812d51e9>] pci_device_probe+0xa9/0x120
> [<ffffffff8139871d>] driver_probe_device+0x9d/0x3d0
> [<ffffffff81398b1b>] __driver_attach+0x8b/0x90
> [<ffffffff8139667b>] bus_for_each_dev+0x5b/0x90
> [<ffffffff81397cd8>] bus_add_driver+0x1f8/0x2c0
> [<ffffffff8139911b>] driver_register+0x5b/0xe0
> [<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0
> [<ffffffff810c8837>] load_module+0x1207/0x1c70
> [<ffffffff810c93f5>] SYSC_finit_module+0x75/0xa0
> [<ffffffff81519329>] system_call_fastpath+0x16/0x1b
> [<00007fac533d2789>] 0x7fac533d2788
>
> After these warnings the code enters a fall-back path and
> allocates directly from the swiotlb aperture in the end.
> So remove these warnings as this is not a fatal error.
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/kernel/pci-swiotlb.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 77dd0ad..79b2291 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -20,6 +20,14 @@ void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> {
> void *vaddr;
>
> + /*
> + * When booting a kdump kernel in high memory these allocations are very
> + * likely to fail, as there are by default only 8MB of low memory to
> + * allocate from. So disable the warnings from the allocator when this
> + * happens. SWIOTLB also implements fall-backs for failed allocations.
> + */
> + flags |= __GFP_NOWARN;

Should this perhaps then have 'if (kdump_kernel)' around it since
the use-case seems to be kdump related?

> +
> vaddr = dma_generic_alloc_coherent(hwdev, size, dma_handle, flags,
> attrs);
> if (vaddr)
> --
> 1.9.1
>

2014-12-01 22:11:17

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 1/3] swiotlb: Warn on allocation failure in swiotlb_alloc_coherent

On Fri, Nov 28, 2014 at 12:29:07PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> Print a warning when all allocation attempts have failed
> and the function is about to return NULL. This prepares for
> calling the function with __GFP_NOWARN to suppress
> allocation failure warnings before all fall-backs have
> failed.

This can be quite noisy. Especially the dump-stack.

Perhaps have this trigger only if a (new) 'verbose' or 'debug' option
were added to the 'swiotlb' parameter?

>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> lib/swiotlb.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 4abda07..e0e9212 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -655,7 +655,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> */
> phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
> if (paddr == SWIOTLB_MAP_ERROR)
> - return NULL;
> + goto err_warn;
>
> ret = phys_to_virt(paddr);
> dev_addr = phys_to_dma(hwdev, paddr);
> @@ -669,7 +669,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> /* DMA_TO_DEVICE to avoid memcpy in unmap_single */
> swiotlb_tbl_unmap_single(hwdev, paddr,
> size, DMA_TO_DEVICE);
> - return NULL;
> + goto err_warn;
> }
> }
>
> @@ -677,6 +677,13 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> memset(ret, 0, size);
>
> return ret;
> +
> +err_warn:
> + pr_warn("swiotlb: coherent allocation failed for device %s size=%zu\n",
> + dev_name(hwdev), size);
> + dump_stack();
> +
> + return NULL;
> }
> EXPORT_SYMBOL(swiotlb_alloc_coherent);
>
> --
> 1.9.1
>

2014-12-02 11:32:18

by Baoquan He

Subject: Re: [PATCH 0/3] Fix kdump failures with crashkernel=high

On 11/28/2014 07:29 PM, Joerg Roedel wrote:
> Hi,
>
> here is a patch-set to fix failed kdump kernel boots when
> the system was booted with crashkernel=X,high. On those
> systems the kernel allocates only 72MiB of low-memory for
> DMA buffers, which turned out to be too little on some systems.
>
> The problem is that 64MiB of the low-memory is allocated by
> swiotlb, leaving 8MB for the page-allocator. But swiotlb
> tries to allocate DMA memory from the page-allocator first,
> which fails pretty fast in the boot sequence, causing
> warnings. This patch-set removes these warnings.
>
> But even the 64MiB for swiotlb are eaten up on some systems,
> so the default amount of low-memory allocated for the
> crash-kernel is increased from 72MB to 256MB (only changing
> the default, which can still be overridden by crashkernel=X,low).

Hi Joerg,

The default low memory is calculated in swiotlb_size_or_default(), and
this relies on IO_TLB_DEFAULT_SIZE, which is the default value for swiotlb.
If this doesn't work for your case in the kdump kernel, does the default
value IO_TLB_DEFAULT_SIZE work for swiotlb in the 1st kernel? If not, the
user knows it and should know that the kdump kernel will fail with the
default value too; the user can specify the size with crashkernel=x,low,
which is why that option was introduced.

It may not be a good idea to increase the default low value from 72M to
256M. After all, the case you are encountering is a special case, and
special cases need to be treated specially, namely by specifying the size
explicitly with crashkernel=x,low.


Thanks
Baoquan


>
> This number comes from experiments on the affected systems,
> 128MiB low-memory was still not enough there, thus I set the
> value to 256MiB to fix the issues.
>
> Any feedback appreciated.
>
> Thanks,
>
> Joerg
>
> Joerg Roedel (3):
> swiotlb: Warn on allocation failure in swiotlb_alloc_coherent
> x86, swiotlb: Try coherent allocations with __GFP_NOWARN
> x86, crash: Allocate enough low-mem when crashkernel=high
>
> arch/x86/kernel/pci-swiotlb.c | 8 ++++++++
> arch/x86/kernel/setup.c | 5 ++++-
> lib/swiotlb.c | 11 +++++++++--
> 3 files changed, 21 insertions(+), 3 deletions(-)
>

2014-12-02 14:41:35

by Jörg Rödel

Subject: Re: [PATCH 1/3] swiotlb: Warn on allocation failure in swiotlb_alloc_coherent

On Mon, Dec 01, 2014 at 03:28:03PM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 28, 2014 at 12:29:07PM +0100, Joerg Roedel wrote:
> > From: Joerg Roedel <[email protected]>
> >
> > Print a warning when all allocation attempts have failed
> > and the function is about to return NULL. This prepares for
> > calling the function with __GFP_NOWARN to suppress
> > allocation failure warnings before all fall-backs have
> > failed.
>
> This can be quite noisy. Especially the dump-stack.

Well, this is as noisy as the dump_stack()s from the page-allocator when
the first allocation try fails. The goal of the first two patches in
this series is to only print a warning (with stack-trace) when
alloc_coherent failed, and not when only an intermediate step failed
that has a fall-back anyway (and might thus be no real problem).


Joerg

2014-12-02 14:45:56

by Jörg Rödel

Subject: Re: [PATCH 2/3] x86, swiotlb: Try coherent allocations with __GFP_NOWARN

On Mon, Dec 01, 2014 at 03:28:54PM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 28, 2014 at 12:29:08PM +0100, Joerg Roedel wrote:
> > diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> > index 77dd0ad..79b2291 100644
> > --- a/arch/x86/kernel/pci-swiotlb.c
> > +++ b/arch/x86/kernel/pci-swiotlb.c
> > @@ -20,6 +20,14 @@ void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > {
> > void *vaddr;
> >
> > + /*
> > + * When booting a kdump kernel in high memory these allocations are very
> > + * likely to fail, as there are by default only 8MB of low memory to
> > + * allocate from. So disable the warnings from the allocator when this
> > + * happens. SWIOTLB also implements fall-backs for failed allocations.
> > + */
> > + flags |= __GFP_NOWARN;
>
> Should this perhaps then have 'if (kdump_kernel)' around it since
> the use-case seems to be kdump related?

Hmm, I don't think this is entirely kdump specific. It can also be
triggered on a non-kdump kernel, it is just much more unlikely. But
maybe I should change the comment to something like:

/*
* Don't print a warning when the first allocation attempt
* fails. The swiotlb_alloc_coherent() function will print a
* warning when the allocation of DMA memory ultimately failed.
*/

This takes the kdump-specifics out of this change (in the end
kdump-kernel loaded high is just a case where this failure is much more
likely).


Joerg

2014-12-02 14:56:09

by Jörg Rödel

Subject: Re: [PATCH 0/3] Fix kdump failures with crashkernel=high

On Tue, Dec 02, 2014 at 07:30:09PM +0800, Baoquan He wrote:
> The default low memory is calculated in swiotlb_size_or_default(), and
> this relies on IO_TLB_DEFAULT_SIZE which is default value for swiotlb.
> If this doesn't work for your case in kdump kernel, does the default
> value IO_TLB_DEFAULT_SIZE work for swiotlb in 1st kernel? If not, user
> knows it and should know it will fail for kdump kernel either with
> default value, user can specify that by crashkernel=x,low which is why
> it's introduced.

In the first kernel the defaults work fine because it has plenty of
low-memory (below 4GB) to allocate from. But in the kdump kernel there
are only 72MB by default, which turned out not to be enough to reliably
boot with certain hardware in the machine.

> It may not be a good idea to increase the default low value from 72M to
> 256M. After all the case you are encountering is a special case, special
> case need be treated specially, namely specify it in crashkernel=x,low
> clearly.

I think the kernel should set sane defaults if possible. But to not
increase memory usage for kdump too much, how about subtracting the
amount of low-memory allocated for kdump from the high-mem amount? This
would not increase the overall memory usage.


Joerg
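
[Editor's sketch, not part of the thread] The proposal above, taking the low-memory reservation out of the requested high-memory amount so that the total stays the same, amounts to a simple split. The struct crash_split and the function split_crash_reservation are hypothetical names; rejecting a request that is not larger than the low part is an assumption of this sketch, since the thread does not say how that case would be handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of splitting one crashkernel request into high and low parts. */
struct crash_split {
	uint64_t high;	/* bytes reserved above 4G */
	uint64_t low;	/* bytes reserved below 4G */
};

/*
 * Carve `low` bytes out of `requested` so that high + low == requested.
 * Returns false (an assumption of this sketch) when the request is too
 * small to leave any high memory after the low part is taken out.
 */
bool split_crash_reservation(uint64_t requested, uint64_t low,
			     struct crash_split *out)
{
	if (requested <= low)
		return false;

	out->low = low;
	out->high = requested - low;
	return true;
}
```

For a 512 MiB request with a 256 MiB low part this yields 256 MiB high and 256 MiB low, so the overall kdump footprint is unchanged.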

2014-12-02 18:46:36

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 1/3] swiotlb: Warn on allocation failure in swiotlb_alloc_coherent

On Tue, Dec 02, 2014 at 03:41:22PM +0100, Joerg Roedel wrote:
> On Mon, Dec 01, 2014 at 03:28:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Nov 28, 2014 at 12:29:07PM +0100, Joerg Roedel wrote:
> > > From: Joerg Roedel <[email protected]>
> > >
> > > > Print a warning when all allocation attempts have failed
> > > and the function is about to return NULL. This prepares for
> > > calling the function with __GFP_NOWARN to suppress
> > > allocation failure warnings before all fall-backs have
> > > failed.
> >
> > This can be quite noisy. Especially the dump-stack.
>
> Well, this is as noisy as the dump_stack()s from the page-allocator when
> the first allocation try fails. The goal of the first two patches in

Right, on the first allocation. Subsequent allocations won't be so noisy
in the page-allocator (I think?). While this will be noisy on subsequent
ones.

> this series is to only print a warning (with stack-trace) when
> alloc_coherent failed, and not when only an intermediate step failed
> that has a fall-back anyway (and might thus be no real problem).
>
>
> Joerg
>

2014-12-02 18:47:09

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 2/3] x86, swiotlb: Try coherent allocations with __GFP_NOWARN

On Tue, Dec 02, 2014 at 03:45:51PM +0100, Joerg Roedel wrote:
> On Mon, Dec 01, 2014 at 03:28:54PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Nov 28, 2014 at 12:29:08PM +0100, Joerg Roedel wrote:
> > > diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> > > index 77dd0ad..79b2291 100644
> > > --- a/arch/x86/kernel/pci-swiotlb.c
> > > +++ b/arch/x86/kernel/pci-swiotlb.c
> > > @@ -20,6 +20,14 @@ void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > > {
> > > void *vaddr;
> > >
> > > + /*
> > > + * When booting a kdump kernel in high memory these allocations are very
> > > + * likely to fail, as there are by default only 8MB of low memory to
> > > + * allocate from. So disable the warnings from the allocator when this
> > > + * happens. SWIOTLB also implements fall-backs for failed allocations.
> > > + */
> > > + flags |= __GFP_NOWARN;
> >
> > Should this perhaps then have 'if (kdump_kernel)' around it since
> > the use-case seems to be kdump related?
>
> Hmm, I don't think this is entirely kdump specific. It can also be
> triggered on a non-kdump kernel, it is just much more unlikely. But
> maybe I should change the comment to something like:
>
> /*
> * Don't print a warning when the first allocation attempt
> * fails. The swiotlb_alloc_coherent() function will print a
> * warning when the allocation of DMA memory ultimately failed.
> */

Much better. Thank you.
>
> This takes the kdump-specifics out of this change (in the end
> kdump-kernel loaded high is just a case where this failure is much more
> likely).

<nods>
>
>
> Joerg
>

2014-12-03 04:01:48

by WANG Chao

Subject: Re: [PATCH 0/3] Fix kdump failures with crashkernel=high

> Hi,
>
> here is a patch-set to fix failed kdump kernel boots when
> the system was booted with crashkernel=X,high. On those
> systems the kernel allocates only 72MiB of low-memory for
> DMA buffers, which turned out to be too little on some systems.

From your experience it seems like swiotlb isn't working well with
crashkernel=X,high alone. What about using crashkernel=X,low with
crashkernel=X,high? Is there any reason you have to use
crashkernel=X,high alone?

>
> The problem is that 64MiB of the low-memory is allocated by
> swiotlb, leaving 8MB for the page-allocator. But swiotlb
> tries to allocate DMA memory from the page-allocator first,
> which fails pretty fast in the boot sequence, causing
> warnings. This patch-set removes these warnings.
>
> But even the 64MiB for swiotlb are eaten up on some systems,
> so the default amount of low-memory allocated for the
> crash-kernel is increased from 72MB to 256MB (only changing
> the default, which can still be overridden by crashkernel=X,low).

crashkernel=X,high shouldn't automatically reserve 72M low in the first
place. Now it's going insane if you increase it to 256M by default.

Thanks
WANG Chao

> This number comes from experiments on the affected systems,
> 128MiB low-memory was still not enough there, thus I set the
> value to 256MiB to fix the issues.
>
> Any feedback appreciated.
>
> Thanks,
>
> Joerg
>
> Joerg Roedel (3):
> swiotlb: Warn on allocation failure in swiotlb_alloc_coherent
> x86, swiotlb: Try coherent allocations with __GFP_NOWARN
> x86, crash: Allocate enough low-mem when crashkernel=high
>
> arch/x86/kernel/pci-swiotlb.c | 8 ++++++++
> arch/x86/kernel/setup.c | 5 ++++-
> lib/swiotlb.c | 11 +++++++++--
> 3 files changed, 21 insertions(+), 3 deletions(-)
>
> --
> 1.9.1
>

2014-12-03 10:26:58

by Jörg Rödel

Subject: Re: [PATCH 1/3] swiotlb: Warn on allocation failure in swiotlb_alloc_coherent

On Tue, Dec 02, 2014 at 01:46:15PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 02, 2014 at 03:41:22PM +0100, Joerg Roedel wrote:
> > On Mon, Dec 01, 2014 at 03:28:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Nov 28, 2014 at 12:29:07PM +0100, Joerg Roedel wrote:
> > > > From: Joerg Roedel <[email protected]>
> > > >
> > > > > Print a warning when all allocation attempts have failed
> > > > and the function is about to return NULL. This prepares for
> > > > calling the function with __GFP_NOWARN to suppress
> > > > allocation failure warnings before all fall-backs have
> > > > failed.
> > >
> > > This can be quite noisy. Especially the dump-stack.
> >
> > Well, this is as noisy as the dump_stack()s from the page-allocator when
> > the first allocation try fails. The goal of the first two patches in
>
> Right, on the first allocation. Subsequent allocations won't be so noisy
> in the page-allocator (I think?).

From the code in mm/page_alloc.c (function warn_alloc_failed) it doesn't
look like a one-time warning. The dmesg I have seen from a failing
kernel also shows a lot of these messages.

So having the warning at the end of swiotlb_alloc_coherent won't be any
more noisy than the (removed) warnings from the page allocator.


Joerg

2014-12-03 10:27:25

by Jörg Rödel

Subject: Re: [PATCH 2/3] x86, swiotlb: Try coherent allocations with __GFP_NOWARN

On Tue, Dec 02, 2014 at 01:46:48PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 02, 2014 at 03:45:51PM +0100, Joerg Roedel wrote:
> > On Mon, Dec 01, 2014 at 03:28:54PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Nov 28, 2014 at 12:29:08PM +0100, Joerg Roedel wrote:
> > > > diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> > > > index 77dd0ad..79b2291 100644
> > > > --- a/arch/x86/kernel/pci-swiotlb.c
> > > > +++ b/arch/x86/kernel/pci-swiotlb.c
> > > > @@ -20,6 +20,14 @@ void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > > > {
> > > > void *vaddr;
> > > >
> > > > + /*
> > > > + * When booting a kdump kernel in high memory these allocations are very
> > > > + * likely to fail, as there are by default only 8MB of low memory to
> > > > + * allocate from. So disable the warnings from the allocator when this
> > > > + * happens. SWIOTLB also implements fall-backs for failed allocations.
> > > > + */
> > > > + flags |= __GFP_NOWARN;
> > >
> > > Should this perhaps then have 'if (kdump_kernel)' around it since
> > > the use-case seems to be kdump related?
> >
> > Hmm, I don't think this is entirely kdump specific. It can also be
> > triggered on a non-kdump kernel, it is just much more unlikely. But
> > maybe I should change the comment to something like:
> >
> > /*
> > * Don't print a warning when the first allocation attempt
> > * fails. The swiotlb_alloc_coherent() function will print a
> > > * warning when the allocation of DMA memory ultimately failed.
> > */
>
> Much better. Thank you.
> >
> > This takes the kdump-specifics out of this change (in the end
> > kdump-kernel loaded high is just a case where this failure is much more
> > likely).
>
> <nods>

Okay, thanks. I'll update the patch.


Joerg

2014-12-03 10:35:10

by Jörg Rödel

Subject: Re: [PATCH 0/3] Fix kdump failures with crashkernel=high

Hi,

On Wed, Dec 03, 2014 at 12:01:23PM +0800, WANG Chao wrote:
> From your experience it seems like swiotlb isn't working well with
> crashkernel=X,high alone. What about using crashkernel=X,low with
> crashkernel=X,high? Is there any reason you have to use
> crashkernel=X,high alone?

Sure, when I specify an additional crashkernel=X,low then I can get
things to work this way too. But my patch-set is about changing the
default, since the failure was seen on common server systems with the
defaults.

> crashkernel=X,high shouldn't automatically reserve 72M low in the first
> place. Now it's going insane if you increase it to 256M by default.

How should a kernel without some low memory (which has only memory above
4G available) handle any 32bit DMA devices? There would be no way to
allocate DMA-able memory for those devices.

And as I said, if people prefer it I can change the patch-set so that
the amount of low-memory allocated is subtracted from the amount of
high-memory. This way the overall memory usage for kdump would stay the
same while changing the defaults to work on more systems.


Joerg

2014-12-03 15:19:50

by WANG Chao

Subject: Re: [PATCH 0/3] Fix kdump failures with crashkernel=high

Hi,

On 12/03/14 at 11:35am, Joerg Roedel wrote:
> Hi,
>
> On Wed, Dec 03, 2014 at 12:01:23PM +0800, WANG Chao wrote:
> > From your experience it seems like swiotlb isn't working well with
> > crashkernel=X,high alone. What about using crashkernel=X,low with
> > crashkernel=X,high? Is there any reason you have to use
> > crashkernel=X,high alone?
>
> Sure, when I specify an additional crashkernel=X,low then I can get
> things to work this way too. But my patch-set is about changing the
> default, since the failure was seen on common server systems with the
> defaults.

crashkernel=X doesn't work for you?

>
> > crashkernel=X,high shouldn't automatically reserve 72M low in the first
> > place. Now it's going insane if you increase it to 256M by default.
>
> How should a kernel without some low memory (which has only memory above
> 4G available) handle any 32bit DMA devices? There would be no way to
> allocate DMA-able memory for those devices.

I mean crashkernel=X,high shouldn't automatically reserve low.
crashkernel=X,high should always be used together with
crashkernel=X,low.

If one is not satisfied with the combination of the two parameters,
crashkernel=X should be able to look for a suitable area below 4G. But
right now, crashkernel=X only searches below 896M.

I've sent a patch for this in the past:
[PATCH] x86, kdump: crashkernel=X try to reserve below 896M first, then try below 4G, then MAXMEM
- https://lkml.org/lkml/2013/10/14/183

The x86 people didn't like this idea, so I didn't update the patch even
though there are minor nits to clean up.

>
> And as I said, if people prefer it I can change the patch-set so that
> the amount of low-memory allocated is subtracted from the amount of
> high-memory. This way the overall memory usage for kdump would stay the
> same while changing the defaults to work on more systems.

Say you do this: crashkernel=X,high would then reserve (X-n) high and (n)
low? IMHO it's better if crashkernel=X,high means reserve X high
and high only.

Thanks
WANG Chao
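
[Editor's sketch, not part of the thread] The fall-back order named in the title of the referenced patch (below 896M first, then below 4G, then anywhere up to MAXMEM) can be modelled as a tiered search. The struct sim_free, the enum sim_tier and pick_tier are hypothetical; the sketch takes the largest free block under each ceiling as an input instead of querying a real memory map.

```c
#include <stdint.h>

/* Largest free block available under each ceiling (illustrative inputs). */
struct sim_free {
	uint64_t below_896m;
	uint64_t below_4g;
	uint64_t anywhere;
};

enum sim_tier { TIER_NONE, TIER_896M, TIER_4G, TIER_MAXMEM };

/* Pick the lowest tier that can hold `size` bytes, trying them in order. */
enum sim_tier pick_tier(const struct sim_free *f, uint64_t size)
{
	if (f->below_896m >= size)
		return TIER_896M;
	if (f->below_4g >= size)
		return TIER_4G;
	if (f->anywhere >= size)
		return TIER_MAXMEM;
	return TIER_NONE;
}
```

A request that fits below 896M stays there; larger requests move up one tier at a time, and only a request that fits nowhere fails.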

2014-12-11 19:09:01

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 1/3] swiotlb: Warn on allocation failure in swiotlb_alloc_coherent

On Wed, Dec 03, 2014 at 11:26:52AM +0100, Joerg Roedel wrote:
> On Tue, Dec 02, 2014 at 01:46:15PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Tue, Dec 02, 2014 at 03:41:22PM +0100, Joerg Roedel wrote:
> > > On Mon, Dec 01, 2014 at 03:28:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Nov 28, 2014 at 12:29:07PM +0100, Joerg Roedel wrote:
> > > > > From: Joerg Roedel <[email protected]>
> > > > >
> > > > > Print a warning when all allocation tries have been failed
> > > > > and the function is about to return NULL. This prepares for
> > > > > calling the function with __GFP_NOWARN to suppress
> > > > > allocation failure warnings before all fall-backs have
> > > > > failed.
> > > >
> > > > This can be quite noisy. Especially the dump-stack.
> > >
> > > Well, this is as noisy as the dump_stack()s from the page-allocator when
> > > the first allocation try fails. The goal of the first two patches in
> >
> > Right, on the first allocation. Subsequent allocations won't be so noisy
> > in the page-allocator (I think?).
>
> From the code in mm/page_alloc.c (function warn_alloc_failed) it doesn't
> look like a one-time warning. The dmesg I have seen from a failing
> kernel also shows a lot of these messages.
>
> So having the warning at the end of swiotlb_alloc_coherent won't be any
> more noisy than the (removed) warnings from the page allocator.

OK, then Acked-by: Konrad Rzeszutek Wilk <[email protected]>
>
>
> Joerg
>

2015-02-01 08:41:26

by Baoquan He

Subject: Re: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

On 01/26/15 at 01:07pm, Joerg Roedel wrote:
> Hi Baoquan,
>
> thanks for your reply.
>
> On Fri, Jan 23, 2015 at 04:44:53PM +0800, Baoquan He wrote:
> > 2) increase low-mem when crashkernel=high. But we have to be careful to
> > do this. We implement crashkernel=high not only for the unhappiness
> > crashkernel reservation is limited below 4G, but dma/dma32 memory space
> > is precious on some systems. If set crashkernel=high still too much low
> > memory has to be reserved by default, it's important to find the
> > balance. So if we have to increase the default low-mem, how much memory
> > is enough, why 256M? why not 128M/192M/320M/384M? And if 256M works
> > on your system, what if another person says it doesn't work because there
> > are more devices on his system?
> >
> > Anyway, I understand the requirement, but we need find out how much
> > memory can satisfy most of systems.
>
> Yes, I totally agree that it is tough to find a good default here. I
> used 256MB because this is what was required on the system the failed
> kdumps were reported on.
>
> But probably we can agree that 72MB is not enough (given that 64MB are
> taken away by swiotlb already), and increase it to a value we think by
> now is sufficient for most systems.

Yeah, and I got a report from a user about this issue too. It should be
fixed. Like I said, the 1st suggestion mainly falls into the area of
initramfs-making tools, currently probably dracut, which is widely used.
That could require many changes. Hence increasing low mem is a better idea.

Earlier I said 256M may not be a good value because your cover letter
says this number comes from experiments on the affected systems: 128M
was still not enough, so you set it to 256M. That may be a little
rushed. I think the step size for the increase should be 32M. After all,
people previously got by with only 64M and 8M, and enlarging it once by
a single 128M step can't be seen as patient and careful. If it failed at
224M but succeeded at 256M, then 256M may not be enough. With 32M steps
we can make a good evaluation.

I will ask the user who reported this issue to help test and see what
value satisfies their system.

Anyway, I think this patch is helpful and necessary.

>
> Btw, the issue was also reported on machines with only a few devices,
> the reason there is that device drivers allocate more dma memory by
> default on initialization. Maybe we should handle that as a driver
> regression in the future, forcing them to allocate more dma-memory
> on-demand and not on initialization.

Yeah, agreed. In that case it should be handled as a regression.


Thanks
Baoquan

2015-02-04 14:10:25

by Jörg Rödel

Subject: Re: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

Hi Baoquan,

On Sun, Feb 01, 2015 at 04:41:03PM +0800, Baoquan He wrote:
> Earlier I said 256M may not be a good value because your cover letter
> says this number comes from experiments on the affected systems: 128M
> was still not enough, so you set it to 256M. That may be a little
> rushed. I think the step size for the increase should be 32M. After all,
> people previously got by with only 64M and 8M, and enlarging it once by
> a single 128M step can't be seen as patient and careful. If it failed at
> 224M but succeeded at 256M, then 256M may not be enough. With 32M steps
> we can make a good evaluation.

That makes sense. I also asked the customer to test intermediate values,
we already know that it works with 256MB but also that 128MB are not
enough. I will report back when I have the results of the intermediate
values in 32MB steps.

Thanks,

Joerg

2015-02-09 12:20:45

by Joerg Roedel

Subject: Re: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

Hi Baoquan,

On Wed, Feb 04, 2015 at 03:10:20PM +0100, Joerg Roedel wrote:
> That makes sense. I also asked the customer to test intermediate values,
> we already know that it works with 256MB but also that 128MB are not
> enough. I will report back when I have the results of the intermediate
> values in 32MB steps.

I got the results from the customer, and it turns out that a value of
192MB is sufficient to make the kdump succeed. It fails with 128MB and
160MB.

So I think we can settle on 192MB for now. What do you think?

Regards,

Joerg

2015-02-13 15:34:59

by Baoquan He

Subject: Re: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

On 02/09/15 at 01:20pm, Joerg Roedel wrote:
> Hi Baoquan,
>
> On Wed, Feb 04, 2015 at 03:10:20PM +0100, Joerg Roedel wrote:
> > That makes sense. I also asked the customer to test intermediate values,
> > we already know that it works with 256MB but also that 128MB are not
> > enough. I will report back when I have the results of the intermediate
> > values in 32MB steps.
>
> I got the results from the customer, and it turns out that a value of
> 192MB is sufficient to make the kdump succeed. It fails with 128MB and
> 160MB.
>
> So I think we can settle on 192MB for now. What do you think?

Hi Joerg,

Sorry for the late reply.

So that machine needs somewhere between 160M and 192M. Then how about
setting it to 256M? 192M seems very close to the brink, and a larger
value leaves headroom for more DMA devices on larger machines; otherwise
this value might have to be increased again soon.

Wasting some low memory is better than a kdump kernel that can't boot
because there is not enough low memory. If someone begrudges the low
memory, they can specify crashkernel=size,low manually.

In conclusion, I like 256M since the testing data showed it's sufficient
now and it should be safe for a long time.

Thanks
Baoquan

2015-02-13 22:28:52

by Joerg Roedel

Subject: Re: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

Hi Baoquan,

On Fri, Feb 13, 2015 at 11:34:38PM +0800, Baoquan He wrote:
> In conclusion, I like 256M since the testing data showed it's sufficient
> now and it should be safe for a long time.

Thanks, I am fine with 256MB too, so can I have your Acked-by on this
series? I will rebase and resend it then after the merge window in the
hope it gets queued.


Joerg
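
The arithmetic behind the defaults discussed in this thread can be
sketched as follows. This is a user-space illustration, not the patch
itself: the 64MiB swiotlb size and the 8MiB page-allocator remainder
come from the cover letter, the function names are hypothetical, and
treating 256MiB as a floor (rather than a fixed value) is an assumption.

```c
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/* Old default per the cover letter: swiotlb size plus 8MiB for the page allocator. */
uint64_t low_default_old(uint64_t swiotlb_bytes)
{
	return swiotlb_bytes + MiB(8);
}

/* Assumed new default: the old value, but never less than 256MiB. */
uint64_t low_default_new(uint64_t swiotlb_bytes)
{
	uint64_t old = low_default_old(swiotlb_bytes);

	return old > MiB(256) ? old : MiB(256);
}
```

With the default 64MiB swiotlb, low_default_old() gives the 72MiB that
proved too small, and low_default_new() gives 256MiB. An explicit
crashkernel=X,low still overrides either default.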

2015-02-14 11:44:34

by Baoquan He

Subject: Re: [PATCH 3/3] x86, crash: Allocate enough low-mem when crashkernel=high

On 02/13/15 at 11:28pm, Joerg Roedel wrote:
> Hi Baoquan,
>
> On Fri, Feb 13, 2015 at 11:34:38PM +0800, Baoquan He wrote:
> > In conclusion, I like 256M since the testing data showed it's sufficient
> > now and it should be safe for a long time.
>
> Thanks, I am fine with 256MB too, so can I have your Acked-by on this
> series? I will rebase and resend it then after the merge window in the
> hope it gets queued.

Sure. I have acked it in reply to the cover letter; feel free to add my
Acked-by in your resend.

Thanks a lot!

Thanks
Baoquan

>
>
> Joerg
>