Hi,
The commit e9da6e9 "ARM: dma-mapping: remove custom consistent dma
region" breaks compatibility with existing drivers, causing the
kernel oops below(*1). The driver in question calls dma_pool_alloc()
to allocate memory from interrupt context, and it hits
BUG_ON(in_interrupt()) in "get_vm_area_caller()". This patch series
fixes the problem by making use of the same pre-allocated atomic
memory pool that the DMA mapping code already uses.
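For illustration, the failing pattern looks roughly like the sketch
below. This is hypothetical driver code (my_udc and td_pool are made-up
names, not the actual tegra_udc source); it only shows how an
atomic-context dma_pool_alloc() ends up in the vmalloc-area code:

#include <linux/dmapool.h>
#include <linux/interrupt.h>

struct my_udc {				/* illustrative, not tegra_udc */
	struct dma_pool *td_pool;
};

static irqreturn_t my_udc_irq(int irq, void *dev_id)
{
	struct my_udc *udc = dev_id;
	dma_addr_t dma;
	void *td;

	/*
	 * Atomic context: dma_pool_alloc() may need a fresh page and
	 * then calls back into the DMA API. After e9da6e9, the IOMMU
	 * path remaps that page via get_vm_area_caller(), which does
	 * BUG_ON(in_interrupt()) - the oops below.
	 */
	td = dma_pool_alloc(udc->td_pool, GFP_ATOMIC, &dma);
	if (!td)
		return IRQ_NONE;

	dma_pool_free(udc->td_pool, td, dma);	/* keep the sketch self-contained */
	return IRQ_HANDLED;
}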
Any comment would be really appreciated.
v2:
Don't modify attrs (DMA_ATTR_NO_KERNEL_MAPPING) for atomic allocations. (Marek)
Skip vzalloc (KyongHo, Minchan)
v1:
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-August/002398.html
*1:
[ 8.321343] ------------[ cut here ]------------
[ 8.325971] kernel BUG at kernel/mm/vmalloc.c:1322!
[ 8.333615] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 8.339436] Modules linked in:
[ 8.342496] CPU: 0 Tainted: G W (3.4.6-00067-g5d485f7 #67)
[ 8.349192] PC is at __get_vm_area_node.isra.29+0x164/0x16c
[ 8.354758] LR is at get_vm_area_caller+0x4c/0x54
[ 8.359454] pc : [<c011297c>] lr : [<c011318c>] psr: 20000193
[ 8.359458] sp : c09edca0 ip : c09ec000 fp : ae278000
[ 8.370922] r10: f0000000 r9 : c011aa54 r8 : c0a26cb8
[ 8.376136] r7 : 00000001 r6 : 000000d0 r5 : 20000008 r4 : c09edca0
[ 8.382651] r3 : 00010000 r2 : 20000008 r1 : 00000001 r0 : 00001000
[ 8.389166] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 8.396549] Control: 10c5387d Table: ad98c04a DAC: 00000015
....
[ 9.169162] dfa0: 412fc099 c09ec000 00000000 c000fdd8 c06df1e4 c0a1b080 00000000 00000000
[ 9.177329] dfc0: c0a235cc 8000406a 00000000 c0986818 ffffffff ffffffff c0986404 00000000
[ 9.185497] dfe0: 00000000 c09bb070 10c5387d c0a19c58 c09bb064 80008044 00000000 00000000
[ 9.193673] [<c011297c>] (__get_vm_area_node.isra.29+0x164/0x16c) from [<c011318c>] (get_vm_area_caller+0x4c/0x54)
[ 9.204022] [<c011318c>] (get_vm_area_caller+0x4c/0x54) from [<c001aed8>] (__iommu_alloc_remap.isra.14+0x2c/0xfc)
[ 9.214276] [<c001aed8>] (__iommu_alloc_remap.isra.14+0x2c/0xfc) from [<c001b06c>] (arm_iommu_alloc_attrs+0xc4/0xf8)
[ 9.224795] [<c001b06c>] (arm_iommu_alloc_attrs+0xc4/0xf8) from [<c011aa54>] (pool_alloc_page.constprop.5+0x6c/0xf8)
[ 9.235309] [<c011aa54>] (pool_alloc_page.constprop.5+0x6c/0xf8) from [<c011ab60>] (dma_pool_alloc+0x80/0x170)
[ 9.245304] [<c011ab60>] (dma_pool_alloc+0x80/0x170) from [<c03cbbcc>] (tegra_build_dtd+0x48/0x14c)
[ 9.254344] [<c03cbbcc>] (tegra_build_dtd+0x48/0x14c) from [<c03cbd4c>] (tegra_req_to_dtd+0x7c/0xa8)
[ 9.263467] [<c03cbd4c>] (tegra_req_to_dtd+0x7c/0xa8) from [<c03cc140>] (tegra_ep_queue+0x154/0x33c)
[ 9.272592] [<c03cc140>] (tegra_ep_queue+0x154/0x33c) from [<c03dd5b4>] (composite_setup+0x364/0x6d4)
[ 9.281804] [<c03dd5b4>] (composite_setup+0x364/0x6d4) from [<c03dd9dc>] (android_setup+0xb8/0x14c)
[ 9.290843] [<c03dd9dc>] (android_setup+0xb8/0x14c) from [<c03cd144>] (setup_received_irq+0xbc/0x270)
[ 9.300053] [<c03cd144>] (setup_received_irq+0xbc/0x270) from [<c03cda64>] (tegra_udc_irq+0x2ac/0x2c4)
[ 9.309353] [<c03cda64>] (tegra_udc_irq+0x2ac/0x2c4) from [<c00b5708>] (handle_irq_event_percpu+0x78/0x2e0)
[ 9.319087] [<c00b5708>] (handle_irq_event_percpu+0x78/0x2e0) from [<c00b59b4>] (handle_irq_event+0x44/0x64)
[ 9.328907] [<c00b59b4>] (handle_irq_event+0x44/0x64) from [<c00b8688>] (handle_fasteoi_irq+0xc4/0x16c)
[ 9.338294] [<c00b8688>] (handle_fasteoi_irq+0xc4/0x16c) from [<c00b4f14>] (generic_handle_irq+0x34/0x48)
[ 9.347858] [<c00b4f14>] (generic_handle_irq+0x34/0x48) from [<c000f6f4>] (handle_IRQ+0x54/0xb4)
[ 9.356637] [<c000f6f4>] (handle_IRQ+0x54/0xb4) from [<c00084b0>] (gic_handle_irq+0x2c/0x60)
[ 9.365068] [<c00084b0>] (gic_handle_irq+0x2c/0x60) from [<c000e900>] (__irq_svc+0x40/0x70)
[ 9.373405] Exception stack(0xc09edf10 to 0xc09edf58)
[ 9.378447] df00: 00000000 000f4240 00000003 00000000
[ 9.386615] df20: 00000000 e55bbc00 ef66f3ca 00000001 00000000 412fc099 c0abb9c8 00000000
[ 9.394781] df40: 3b9ac9ff c09edf58 c027a9bc c0042880 20000113 ffffffff
[ 9.401396] [<c000e900>] (__irq_svc+0x40/0x70) from [<c0042880>] (tegra_idle_enter_lp3+0x68/0x78)
[ 9.410272] [<c0042880>] (tegra_idle_enter_lp3+0x68/0x78) from [<c04701d4>] (cpuidle_idle_call+0xdc/0x3a4)
[ 9.419922] [<c04701d4>] (cpuidle_idle_call+0xdc/0x3a4) from [<c000fdd8>] (cpu_idle+0xd8/0x134)
[ 9.428612] [<c000fdd8>] (cpu_idle+0xd8/0x134) from [<c0986818>] (start_kernel+0x27c/0x2cc)
[ 9.436952] Code: e1a00004 e3a04000 eb002265 eaffffe0 (e7f001f2)
[ 9.443038] ---[ end trace 1b75b31a2719ed24 ]---
[ 9.447645] Kernel panic - not syncing: Fatal exception in interrupt
Hiroshi Doyu (4):
ARM: dma-mapping: Refactor out to introduce __in_atomic_pool
ARM: dma-mapping: Use kzalloc() with GFP_ATOMIC
ARM: dma-mapping: Refactor out to introduce __alloc_fill_pages
ARM: dma-mapping: IOMMU allocates pages from atomic_pool with GFP_ATOMIC
arch/arm/mm/dma-mapping.c | 102 ++++++++++++++++++++++++++++++++++-----------
1 files changed, 77 insertions(+), 25 deletions(-)
--
1.7.5.4
Check whether the given range ("start", "size") is included in "atomic_pool".
Signed-off-by: Hiroshi Doyu <[email protected]>
---
arch/arm/mm/dma-mapping.c | 25 +++++++++++++++++++------
1 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 601da7a..aca2fd0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -492,19 +492,32 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
         return ptr;
 }

+static bool __in_atomic_pool(void *start, size_t size)
+{
+        struct dma_pool *pool = &atomic_pool;
+        void *end = start + size;
+        void *pool_start = pool->vaddr;
+        void *pool_end = pool->vaddr + pool->size;
+
+        if (start < pool_start || start > pool_end)
+                return false;
+
+        if (end > pool_end) {
+                WARN(1, "freeing wrong coherent size from pool\n");
+                return false;
+        }
+
+        return true;
+}
+
 static int __free_from_pool(void *start, size_t size)
 {
         struct dma_pool *pool = &atomic_pool;
         unsigned long pageno, count;
         unsigned long flags;

-        if (start < pool->vaddr || start > pool->vaddr + pool->size)
-                return 0;
-
-        if (start + size > pool->vaddr + pool->size) {
-                WARN(1, "freeing wrong coherent size from pool\n");
+        if (!__in_atomic_pool(start, size))
                 return 0;
-        }

         pageno = (start - pool->vaddr) >> PAGE_SHIFT;
         count = size >> PAGE_SHIFT;
--
1.7.5.4
__alloc_fill_pages() fills the page array by repeatedly allocating the
largest possible power-of-2 chunk of pages at a time.
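As a worked illustration of that policy (a standalone userspace sketch,
not kernel code): for count = 13 the loop takes chunks of 8, 4 and 1
pages.

#include <stdio.h>

/* index of the highest set bit, like the kernel's __fls() */
static int fls_(unsigned int x)
{
	int r = -1;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

int main(void)
{
	unsigned int count = 13;

	while (count) {
		int order = fls_(count);

		/* take the largest power-of-2 chunk that still fits */
		printf("allocate order-%d chunk (%u pages)\n",
		       order, 1u << order);
		count -= 1u << order;
	}
	return 0;
}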
Signed-off-by: Hiroshi Doyu <[email protected]>
---
arch/arm/mm/dma-mapping.c | 50 ++++++++++++++++++++++++++++++--------------
1 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b64475a..7ab016b 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1022,20 +1022,11 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
         spin_unlock_irqrestore(&mapping->lock, flags);
 }

-static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
+static int __alloc_fill_pages(struct page ***ppages, int count, gfp_t gfp)
 {
-        struct page **pages;
-        int count = size >> PAGE_SHIFT;
-        int array_size = count * sizeof(struct page *);
+        struct page **pages = *ppages;
         int i = 0;

-        if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
-                pages = kzalloc(array_size, gfp);
-        else
-                pages = vzalloc(array_size);
-        if (!pages)
-                return NULL;
-
         while (count) {
                 int j, order = __fls(count);

@@ -1045,22 +1036,49 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t
                 if (!pages[i])
                         goto error;

-                if (order)
+                if (order) {
                         split_page(pages[i], order);
-                j = 1 << order;
-                while (--j)
-                        pages[i + j] = pages[i] + j;
+                        j = 1 << order;
+                        while (--j)
+                                pages[i + j] = pages[i] + j;
+                }

                 __dma_clear_buffer(pages[i], PAGE_SIZE << order);
                 i += 1 << order;
                 count -= 1 << order;
         }

-        return pages;
+        return 0;
+
 error:
         while (i--)
                 if (pages[i])
                         __free_pages(pages[i], 0);
+        return -ENOMEM;
+}
+
+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
+                                          gfp_t gfp)
+{
+        struct page **pages;
+        int count = size >> PAGE_SHIFT;
+        int array_size = count * sizeof(struct page *);
+        int err;
+
+        if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
+                pages = kzalloc(array_size, gfp);
+        else
+                pages = vzalloc(array_size);
+        if (!pages)
+                return NULL;
+
+        err = __alloc_fill_pages(&pages, count, gfp);
+        if (err)
+                goto error;
+
+        return pages;
+
+error:
         if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
                 kfree(pages);
         else
--
1.7.5.4
Use kzalloc() with GFP_ATOMIC instead of vzalloc(), since vzalloc() may
sleep. At freeing time, __in_atomic_pool() checks whether the buffer
came from atomic_pool.
Signed-off-by: Hiroshi Doyu <[email protected]>
---
arch/arm/mm/dma-mapping.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index aca2fd0..b64475a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1029,7 +1029,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t
         int array_size = count * sizeof(struct page *);
         int i = 0;

-        if (array_size <= PAGE_SIZE)
+        if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
                 pages = kzalloc(array_size, gfp);
         else
                 pages = vzalloc(array_size);
@@ -1061,7 +1061,7 @@ error:
         while (i--)
                 if (pages[i])
                         __free_pages(pages[i], 0);
-        if (array_size <= PAGE_SIZE)
+        if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
                 kfree(pages);
         else
                 vfree(pages);
@@ -1076,7 +1076,8 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t s
         for (i = 0; i < count; i++)
                 if (pages[i])
                         __free_pages(pages[i], 0);
-        if (array_size <= PAGE_SIZE)
+        if ((array_size <= PAGE_SIZE) ||
+            __in_atomic_pool(page_address(pages[0]), size))
                 kfree(pages);
         else
                 vfree(pages);
--
1.7.5.4
Make use of the same atomic pool as the normal DMA path, and skip the
kernel page mapping, which can involve sleepable operations when
allocating kernel page tables.
Signed-off-by: Hiroshi Doyu <[email protected]>
---
arch/arm/mm/dma-mapping.c | 30 +++++++++++++++++++++++++-----
1 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7ab016b..433312a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1063,7 +1063,6 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
         struct page **pages;
         int count = size >> PAGE_SHIFT;
         int array_size = count * sizeof(struct page *);
-        int err;

         if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
                 pages = kzalloc(array_size, gfp);
@@ -1072,9 +1071,20 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
         if (!pages)
                 return NULL;

-        err = __alloc_fill_pages(&pages, count, gfp);
-        if (err)
-                goto error;
+        if (gfp & GFP_ATOMIC) {
+                struct page *page;
+                int i;
+                void *addr = __alloc_from_pool(size, &page);
+                if (!addr)
+                        goto error;
+
+                for (i = 0; i < count; i++)
+                        pages[i] = page + i;
+        } else {
+                int err = __alloc_fill_pages(&pages, count, gfp);
+                if (err)
+                        goto error;
+        }

         return pages;

@@ -1091,9 +1101,15 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t s
         int count = size >> PAGE_SHIFT;
         int array_size = count * sizeof(struct page *);
         int i;
+
+        if (__free_from_pool(page_address(pages[0]), size))
+                goto out;
+
         for (i = 0; i < count; i++)
                 if (pages[i])
                         __free_pages(pages[i], 0);
+
+out:
         if ((array_size <= PAGE_SIZE) ||
             __in_atomic_pool(page_address(pages[0]), size))
                 kfree(pages);
@@ -1221,6 +1237,9 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
         if (*handle == DMA_ERROR_CODE)
                 goto err_buffer;

+        if (gfp & GFP_ATOMIC)
+                return page_address(pages[0]);
+
         if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
                 return pages;

@@ -1279,7 +1298,8 @@ void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
                 return;
         }

-        if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
+        if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs) ||
+            !__in_atomic_pool(cpu_addr, size)) {
                 unmap_kernel_range((unsigned long)cpu_addr, size);
                 vunmap(cpu_addr);
         }
--
1.7.5.4
On Thu, Aug 23, 2012 at 09:10:25AM +0300, Hiroshi Doyu wrote:
> Hi,
>
> The commit e9da6e9 "ARM: dma-mapping: remove custom consistent dma
> region" breaks compatibility with existing drivers, causing the
> kernel oops below(*1). The driver in question calls dma_pool_alloc()
> to allocate memory from interrupt context, and it hits
> BUG_ON(in_interrupt()) in "get_vm_area_caller()". This patch series
> fixes the problem by making use of the same pre-allocated atomic
> memory pool that the DMA mapping code already uses.
>
> Any comment would be really appreciated.
>
> v2:
> Don't modify attrs (DMA_ATTR_NO_KERNEL_MAPPING) for atomic allocations. (Marek)
> Skip vzalloc (KyongHo, Minchan)
Huh? Let me set the record straight: I didn't say that unifying on kzalloc
is okay. As KyongHo said, there are other use cases for allocating big
buffers, but I can't agree with his opinion that an allocation failure means
the system is short of memory, because there are lots of freeable pages
(cached pages, anon pages with swap, shrinkable slabs and so on). But as I
said earlier, in atomic context the VM can't do anything except rely on
kswapd, even while there are a ton of freeable pages in the system, and we
can't guarantee that kswapd will do something for us before we see the page
allocation failure in process context. UP is especially problematic. So a
failure never indicates that the system needs more memory.
Atomic high-order allocation is very fragile from the VM's point of view,
so the caller should have a fallback mechanism. Otherwise, don't do it.
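A minimal sketch of the kind of fallback Minchan is asking for
(illustrative only; the function name and exact policy are assumptions,
not part of this patch series): try the high-order atomic allocation
first, and degrade to smaller orders instead of treating the failure as
a memory shortage.

static int fill_pages_with_fallback(struct page **pages, int count, gfp_t gfp)
{
	int i = 0;

	while (count) {
		int j, order = __fls(count);
		struct page *page = NULL;

		/* __GFP_NOWARN: a high-order failure here is expected */
		while (order && !(page = alloc_pages(gfp | __GFP_NOWARN, order)))
			order--;		/* fall back to a smaller chunk */
		if (!page)
			page = alloc_pages(gfp, 0);	/* last resort: order 0 */
		if (!page)
			goto undo;

		if (order)
			split_page(page, order);	/* make pages order-0 */
		for (j = 0; j < (1 << order); j++)
			pages[i + j] = page + j;

		i += 1 << order;
		count -= 1 << order;
	}
	return 0;

undo:
	while (i--)
		__free_pages(pages[i], 0);
	return -ENOMEM;
}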
--
Kind regards,
Minchan Kim
Hi Hiroshi,
On Thursday, August 23, 2012 8:10 AM Hiroshi Doyu wrote:
> Make use of the same atomic pool as the normal DMA path, and skip the
> kernel page mapping, which can involve sleepable operations when
> allocating kernel page tables.
>
> Signed-off-by: Hiroshi Doyu <[email protected]>
> ---
> arch/arm/mm/dma-mapping.c | 30 +++++++++++++++++++++++++-----
> 1 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7ab016b..433312a 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1063,7 +1063,6 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>          struct page **pages;
>          int count = size >> PAGE_SHIFT;
>          int array_size = count * sizeof(struct page *);
> -        int err;
>
>          if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
>                  pages = kzalloc(array_size, gfp);
> @@ -1072,9 +1071,20 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>          if (!pages)
>                  return NULL;
>
> -        err = __alloc_fill_pages(&pages, count, gfp);
> -        if (err)
> -                goto error;
> +        if (gfp & GFP_ATOMIC) {
> +                struct page *page;
> +                int i;
> +                void *addr = __alloc_from_pool(size, &page);
> +                if (!addr)
> +                        goto error;
> +
> +                for (i = 0; i < count; i++)
> +                        pages[i] = page + i;
> +        } else {
> +                int err = __alloc_fill_pages(&pages, count, gfp);
> +                if (err)
> +                        goto error;
> +        }
>
>          return pages;
>
> @@ -1091,9 +1101,15 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t s
>          int count = size >> PAGE_SHIFT;
>          int array_size = count * sizeof(struct page *);
>          int i;
> +
> +        if (__free_from_pool(page_address(pages[0]), size))
> +                goto out;
> +
>          for (i = 0; i < count; i++)
>                  if (pages[i])
>                          __free_pages(pages[i], 0);
> +
> +out:
>          if ((array_size <= PAGE_SIZE) ||
>              __in_atomic_pool(page_address(pages[0]), size))
>                  kfree(pages);
> @@ -1221,6 +1237,9 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
>          if (*handle == DMA_ERROR_CODE)
>                  goto err_buffer;
>
> +        if (gfp & GFP_ATOMIC)
> +                return page_address(pages[0]);
> +
>          if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
>                  return pages;
>
I've read the whole code and it looks like it is starting to become a little
spaghetti - there are too many places altered by these atomic changes, and
it is hard to follow the logic at first sight. IMHO it would be better to
add the following function:
static void *__iommu_alloc_atomic(struct device *dev, size_t size,
				  dma_addr_t *handle)
{
	struct page *page, **pages;
	int count = size >> PAGE_SHIFT;
	int array_size = count * sizeof(struct page *);
	void *addr;
	int i;

	pages = kzalloc(array_size, GFP_ATOMIC);
	if (!pages)
		return NULL;

	addr = __alloc_from_pool(size, &page);
	if (!addr)
		goto err_pool;

	for (i = 0; i < count; i++)
		pages[i] = page + i;

	*handle = __iommu_create_mapping(dev, pages, size);
	if (*handle == DMA_ERROR_CODE)
		goto err_mapping;

	return addr;

err_mapping:
	__free_from_pool(addr, size);
err_pool:
	kfree(pages);
	return NULL;
}
and then call it at the beginning of arm_iommu_alloc_attrs():

	if (gfp & GFP_ATOMIC)
		return __iommu_alloc_atomic(dev, size, handle);
You should also add support for allocations from atomic_pool to the
__iommu_get_pages() function, so that dma_mmap() and dma_get_sgtable()
work correctly.
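One possible shape for that lookup (a sketch, not code posted in this
thread; __atomic_get_pages() is an assumed helper, and the vm_struct
flag check mirrors the post-e9da6e9 code):

static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
{
	struct vm_struct *area;

	/* atomic-pool buffers have no vm_struct; special-case them first */
	if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
		return __atomic_get_pages(cpu_addr);	/* assumed helper */

	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
		return cpu_addr;

	area = find_vm_area(cpu_addr);
	if (area && (area->flags & VM_ARM_DMA_CONSISTENT))
		return area->pages;

	return NULL;
}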
> @@ -1279,7 +1298,8 @@ void arm_iommu_free_attrs(struct device *dev, size_t size, void
> *cpu_addr,
> return;
> }
>
> - if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
> + if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs) ||
> + !__in_atomic_pool(cpu_addr, size)) {
> unmap_kernel_range((unsigned long)cpu_addr, size);
> vunmap(cpu_addr);
> }
Are you sure this one works? __iommu_get_pages() won't find the pages array,
because atomic allocations don't get their own separate vm_struct area, so
this check will never be reached. Maybe a __iommu_free_atomic() function
would also make it safer and easier to understand.
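Such a counterpart might look like the sketch below (again an
assumption, not code posted in this thread): tear down the IOMMU
mapping and return the buffer to the atomic pool, never touching the
vmalloc area.

static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
				dma_addr_t handle, size_t size)
{
	/* no unmap_kernel_range()/vunmap(): atomic buffers were never
	 * remapped into the vmalloc area */
	__iommu_remove_mapping(dev, handle, size);
	__free_from_pool(cpu_addr, size);
}

arm_iommu_free_attrs() could then call it early whenever
__in_atomic_pool(cpu_addr, size) is true, mirroring the shortcut on the
allocation side.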
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center