We see the kernel panic below in a suspend/resume stress test.
In snd_malloc_sgbuf_pages(), the chunk allocated by
snd_dma_alloc_pages_fallback() may be larger than the remaining pages
due to page alignment, which makes the unsigned pages counter wrap
around in this loop:
	while (pages > 0) {
		...
		pages -= chunk;
	}
This patch changes pages from unsigned int to int to fix the issue.
BUG: unable to handle kernel paging request at ffff88000deb4000
IP: [<ffffffff81404fa9>] memset_erms+0x9/0x10
Call Trace:
[<ffffffff818f222f>] snd_dma_alloc_pages+0xff/0x210
[<ffffffff818f23af>] snd_dma_alloc_pages_fallback+0x6f/0x90
[<ffffffff818f2b85>] snd_malloc_sgbuf_pages+0x145/0x370
[<ffffffff818f229e>] snd_dma_alloc_pages+0x16e/0x210
[<ffffffffc011930d>] hdac_ext_dma_alloc_pages+0x1d/0x40 [snd_hda_ext_core]
[<ffffffffc010729a>] snd_hdac_dsp_prepare+0xca/0x1c0 [snd_hda_core]
[<ffffffffc01880f9>] skl_dsp_prepare+0x99/0xf0 [snd_soc_skl]
[<ffffffffc0162a7e>] bxt_load_base_firmware+0x9e/0x5c0 [snd_soc_skl_ipc]
[<ffffffffc01630ec>] bxt_set_dsp_D0+0x14c/0x300 [snd_soc_skl_ipc]
[<ffffffffc015f9c3>] skl_dsp_get_core+0x43/0xd0 [snd_soc_skl_ipc]
[<ffffffffc015fa60>] skl_dsp_wake+0x10/0x20 [snd_soc_skl_ipc]
[<ffffffffc0188e3e>] skl_resume_dsp+0x7e/0x140 [snd_soc_skl]
[<ffffffffc0183c4a>] skl_resume+0xda/0x170 [snd_soc_skl]
[<ffffffff81452726>] pci_pm_resume+0x76/0xe0
[<ffffffff816616da>] dpm_run_callback+0x5a/0x180
[<ffffffff81661e3c>] device_resume+0xdc/0x2c0
[<ffffffff81663818>] dpm_resume+0x118/0x310
[<ffffffff81663e11>] dpm_resume_end+0x11/0x20
[<ffffffff810f8bcc>] suspend_devices_and_enter+0x11c/0x2b0
[<ffffffff810f90bd>] pm_suspend+0x35d/0x3d0
[<ffffffff810f78a6>] state_store+0x66/0x90
[<ffffffff813f80e2>] kobj_attr_store+0x12/0x20
[<ffffffff812a37bc>] sysfs_kf_write+0x3c/0x50
[<ffffffff812a2cbd>] kernfs_fop_write+0x11d/0x1a0
[<ffffffff8121dfaa>] __vfs_write+0x3a/0x150
[<ffffffff8121f2b1>] vfs_write+0xb1/0x1a0
[<ffffffff81220898>] SyS_write+0x58/0xc0
[<ffffffff81001fca>] do_syscall_64+0x6a/0xe0
[<ffffffff81b06560>] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
Signed-off-by: he, bo <[email protected]>
Signed-off-by: zhang jun <[email protected]>
---
sound/core/sgbuf.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/core/sgbuf.c b/sound/core/sgbuf.c
index 84fffab..33449ee 100644
--- a/sound/core/sgbuf.c
+++ b/sound/core/sgbuf.c
@@ -68,7 +68,8 @@ void *snd_malloc_sgbuf_pages(struct device *device,
 			     size_t *res_size)
 {
 	struct snd_sg_buf *sgbuf;
-	unsigned int i, pages, chunk, maxpages;
+	unsigned int i, chunk, maxpages;
+	int pages;
 	struct snd_dma_buffer tmpb;
 	struct snd_sg_page *table;
 	struct page **pgtable;
--
2.7.4
On Wed, 18 Jul 2018 13:52:45 +0200,
He, Bo wrote:
>
> We see the kernel panic below in a suspend/resume stress test.
> In snd_malloc_sgbuf_pages(), the chunk allocated by
> snd_dma_alloc_pages_fallback() may be larger than the remaining pages
> due to page alignment, which makes the unsigned pages counter wrap
> around in this loop:
>
> 	while (pages > 0) {
> 		...
> 		pages -= chunk;
> 	}
>
> This patch changes pages from unsigned int to int to fix the issue.
Thanks for the patch.
Although the analysis is correct, the fix doesn't look ideal.  It's
also possible that the returned size exceeds sgbuf->tblsize if we are
even more unlucky.
A change like below should work instead. Could you give it a try?
Takashi
-- 8< --
--- a/sound/core/sgbuf.c
+++ b/sound/core/sgbuf.c
@@ -108,7 +108,7 @@ void *snd_malloc_sgbuf_pages(struct device *device,
 			break;
 		}
 		chunk = tmpb.bytes >> PAGE_SHIFT;
-		for (i = 0; i < chunk; i++) {
+		for (i = 0; i < chunk && pages > 0; i++) {
 			table->buf = tmpb.area;
 			table->addr = tmpb.addr;
 			if (!i)
@@ -117,9 +117,9 @@ void *snd_malloc_sgbuf_pages(struct device *device,
 			*pgtable++ = virt_to_page(tmpb.area);
 			tmpb.area += PAGE_SIZE;
 			tmpb.addr += PAGE_SIZE;
+			sgbuf->pages++;
+			pages--;
 		}
-		sgbuf->pages += chunk;
-		pages -= chunk;
 		if (chunk < maxpages)
 			maxpages = chunk;
 	}
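As a rough illustration of the guarded loop in the diff above, here is a small user-space sketch, not kernel code. The function fill_guarded() and its parameters are hypothetical names made up for this example; it only models the two counters (sgbuf->pages going up, pages going down) and ignores the table and page pointers.

```c
#include <stddef.h>

/*
 * Hypothetical user-space model of the table-filling loop with the extra
 * "pages > 0" guard.  Returns the number of entries filled, i.e. what
 * sgbuf->pages would count in the kernel code.
 */
static unsigned int fill_guarded(unsigned int pages,
				 const unsigned int *chunks, size_t nchunks)
{
	unsigned int filled = 0;
	unsigned int i;
	size_t k;

	for (k = 0; k < nchunks && pages > 0; k++) {
		for (i = 0; i < chunks[k] && pages > 0; i++) {
			filled++;	/* models sgbuf->pages++ */
			pages--;	/* stops at zero, never wraps */
		}
	}
	return filled;
}
```

With this guard, an oversized final chunk is only consumed up to the number of pages still requested, so the counter cannot wrap even though it stays unsigned.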
Thanks. We will run the test with your patch and report the results within 24 hours.
Current status:
Without a fix, we can reproduce the issue within 3000 cycles of the suspend/resume stress test. With our patch, we cannot reproduce the kernel panic in 6000 cycles.
-----Original Message-----
From: Takashi Iwai <[email protected]>
Sent: Wednesday, July 18, 2018 8:34 PM
To: He, Bo <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Zhang, Yanmin <[email protected]>
Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
Hello, Takashi
I think that with our patch it is NOT possible for the returned size to exceed sgbuf->tblsize.
In snd_malloc_sgbuf_pages():
pages is aligned to a page,
sgbuf->tblsize is aligned to 32 pages,
chunk is aligned to 2^n pages.
In our panic case, pages = 123 and tblsize = 128:
1st loop: chunk = 32
2nd loop: chunk = 32
3rd loop: chunk = 32
4th loop: chunk = 16
5th loop: chunk = 16
So in the 5th loop, pages - chunk = -5, which makes the loop endless.
With our patch, the while loop breaks in the 5th iteration, so the returned size can NOT exceed sgbuf->tblsize.
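To make the arithmetic above concrete, here is a minimal user-space sketch, not kernel code, that replays the chunk sequence from the panic case. The names chunks[], remaining_unsigned() and remaining_signed() are made up for illustration; only the numbers (123 pages, chunks of 32, 32, 32, 16, 16) come from the report.

```c
#include <stddef.h>

/* Chunk sizes in pages, as listed for the reported panic case. */
static const unsigned int chunks[] = { 32, 32, 32, 16, 16 };
#define NCHUNKS (sizeof(chunks) / sizeof(chunks[0]))

/* Old accounting: "pages" is unsigned, so the last subtraction wraps. */
static unsigned int remaining_unsigned(unsigned int pages)
{
	size_t i;

	for (i = 0; i < NCHUNKS; i++)
		pages -= chunks[i];
	return pages;	/* a huge positive value, "pages > 0" stays true */
}

/* Accounting with the proposed patch: "pages" is a signed int. */
static int remaining_signed(int pages)
{
	size_t i;

	for (i = 0; i < NCHUNKS; i++)
		pages -= (int)chunks[i];
	return pages;	/* negative, so "pages > 0" becomes false */
}
```

Starting from 123 pages, the signed variant ends at -5 and the loop condition fails, while the unsigned variant ends at the same bit pattern interpreted as a very large count, so the while loop keeps going.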
On Thu, 19 Jul 2018 08:08:06 +0200,
Zhang, Jun wrote:
>
> Hello, Takashi
>
> I think that with our patch it is NOT possible for the returned size to exceed sgbuf->tblsize.
>
> In snd_malloc_sgbuf_pages():
>
> pages is aligned to a page,
> sgbuf->tblsize is aligned to 32 pages,
> chunk is aligned to 2^n pages.
>
> In our panic case, pages = 123 and tblsize = 128:
> 1st loop: chunk = 32
> 2nd loop: chunk = 32
> 3rd loop: chunk = 32
> 4th loop: chunk = 16
> 5th loop: chunk = 16
> So in the 5th loop, pages - chunk = -5, which makes the loop endless.
Looking at the code again, yeah, you are right, that won't happen.
And now it becomes clear: the fundamental problem is that
snd_dma_alloc_pages_fallback() returns a larger size than requested.
It would be acceptable if the internal allocator aligned to a larger size,
but that shouldn't appear in the size returned to the caller.  I believe this
was just a misunderstanding of get_order() usage there.
(BTW, it's interesting that the allocation with a larger block worked
while the allocation with a smaller chunk failed; it must be a rare case,
and that's one of the reasons this bug wasn't hit frequently.)
That being said, what we should fix is rather the function
snd_dma_alloc_pages_fallback() to behave as expected, and it'll be
like the patch below.
thanks,
Takashi
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -247,11 +247,10 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size,
 			return err;
 		if (size <= PAGE_SIZE)
 			return -ENOMEM;
+		size >>= 1;
 		aligned_size = PAGE_SIZE << get_order(size);
 		if (size != aligned_size)
 			size = aligned_size;
-		else
-			size >>= 1;
 	}
 	if (! dmab->area)
 		return -ENOMEM;
On Thu, 19 Jul 2018 08:42:14 +0200,
Takashi Iwai wrote:
>
> On Thu, 19 Jul 2018 08:08:06 +0200,
> Zhang, Jun wrote:
> >
> > Hello, Takashi
> >
> > I think that with our patch it is NOT possible for the returned size to exceed sgbuf->tblsize.
> >
> > In snd_malloc_sgbuf_pages():
> >
> > pages is aligned to a page,
> > sgbuf->tblsize is aligned to 32 pages,
> > chunk is aligned to 2^n pages.
> >
> > In our panic case, pages = 123 and tblsize = 128:
> > 1st loop: chunk = 32
> > 2nd loop: chunk = 32
> > 3rd loop: chunk = 32
> > 4th loop: chunk = 16
> > 5th loop: chunk = 16
> > So in the 5th loop, pages - chunk = -5, which makes the loop endless.
>
> Looking at the code again, yeah, you are right, that won't happen.
>
> And now it becomes clear: the fundamental problem is that
> snd_dma_alloc_pages_fallback() returns a larger size than requested.
> It would be acceptable if the internal allocator aligned to a larger size,
> but that shouldn't appear in the size returned to the caller.  I believe this
> was just a misunderstanding of get_order() usage there.
> (BTW, it's interesting that the allocation with a larger block worked
> while the allocation with a smaller chunk failed; it must be a rare case,
> and that's one of the reasons this bug wasn't hit frequently.)
>
> That being said, what we should fix is rather the function
> snd_dma_alloc_pages_fallback() to behave as expected, and it'll be
> like the patch below.
And we can reduce even more lines. A proper patch is below.
thanks,
Takashi
-- 8< --
From: Takashi Iwai <[email protected]>
Subject: [PATCH] ALSA: memalloc: Don't exceed over the requested size
snd_dma_alloc_pages_fallback() retries the allocation with a reduced
size when the allocation fails.  But the first retry actually
*increases* the size to a power of two, which may give back a larger
chunk than the requested size.  This confuses the callers; e.g. sgbuf
assumes that the returned size is equal or smaller, and it may end up
in a bad loop due to the underflow and eventually lead to an Oops.
The code in this function seems to rest on an incorrect assumption
about get_order().  We need to halve the size first, then align it to
a power of two.
Reported-by: he, bo <[email protected]>
Reported-by: zhang jun <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
---
sound/core/memalloc.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
index 7f89d3c79a4b..753d5fc4b284 100644
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -242,16 +242,12 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size,
 	int err;
 
 	while ((err = snd_dma_alloc_pages(type, device, size, dmab)) < 0) {
-		size_t aligned_size;
 		if (err != -ENOMEM)
 			return err;
 		if (size <= PAGE_SIZE)
 			return -ENOMEM;
-		aligned_size = PAGE_SIZE << get_order(size);
-		if (size != aligned_size)
-			size = aligned_size;
-		else
-			size >>= 1;
+		size >>= 1;
+		size = PAGE_SIZE << get_order(size);
 	}
 	if (! dmab->area)
 		return -ENOMEM;
--
2.18.0
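For illustration, the size update before and after this patch can be modelled in user space roughly as below. This is a sketch under assumptions, not the kernel implementation: get_order_model() is a hypothetical stand-in for get_order() (smallest order such that PAGE_SIZE << order covers the size), PAGE_SHIFT is assumed to be 12, and next_size_old() / next_size_new() are names invented here for the two variants of the retry step.

```c
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Stand-in for get_order(): smallest order with (PAGE_SIZE << order) >= size. */
static int get_order_model(size_t size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Retry size before the patch: a non-power-of-two size is rounded UP. */
static size_t next_size_old(size_t size)
{
	size_t aligned_size = PAGE_SIZE << get_order_model(size);

	if (size != aligned_size)
		size = aligned_size;	/* may exceed the requested size */
	else
		size >>= 1;
	return size;
}

/* Retry size after the patch: halve first, then align to a power of two. */
static size_t next_size_new(size_t size)
{
	size >>= 1;
	return PAGE_SIZE << get_order_model(size);
}
```

In the reported case 11 pages remain in the 5th iteration (123 - 32 - 32 - 32 - 16). In this model the old step turns a failed 11-page request into a 16-page retry, which is larger than requested, while the new step retries with 8 pages.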
Hi, Takashi:
We tested for the whole weekend; your patch works and no panic was seen. You can safely merge your patch.
-----Original Message-----
From: Takashi Iwai <[email protected]>
Sent: Thursday, July 19, 2018 5:11 PM
To: Zhang, Jun <[email protected]>
Cc: He, Bo <[email protected]>; [email protected]; [email protected]; [email protected]; Zhang, Yanmin <[email protected]>
Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
On Mon, 23 Jul 2018 02:47:18 +0200,
He, Bo wrote:
>
> Hi, Takashi:
> We tested for the whole weekend; your patch works and no panic was seen. You can safely merge your patch.
OK, thanks for testing! Now it's merged.
Takashi