2007-06-21 09:32:16

by Nicolas Ferre

[permalink] [raw]
Subject: Oops in a driver while using SLUB as a SLAB allocator

Hello,

While debugging a Linux driver on my ARM platform
(AT91), I switched on the 2.6.22-rc5 kernel. While
reconfiguring it I selected CONFIG_SLUB as a SLAB allocator.

The sd/mmc driver I tried to run is vanilla driver
which never, until now, produces Oops (at91_mci.c).

The oops seems to occur after a page unmapping using
dma_unmap_page() followed by a flush_dcache_page() (in
at91mci_post_dma_read()).

Here is the oops (driver verbose output included to show
pages used for dma sg list) :

Starting kernel ...

Linux version 2.6.22-rc5 (user@location) (gcc version 3.4.2) #8 Thu Jun 21 11:05:18 CEST 2007
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
Machine: Atmel AT91SAM9263-EK
[..]
Memory: 64MB = 64MB total
Memory: 62012KB available (2492K code, 246K data, 116K init)
SLUB: Genslabs=23, HWalign=32, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
[..]
Probe MCI devices
pre dma read
Using transfer index 0
sg = c3d0be68
dma address = 23C54968, length = 8
Nothing left to transfer (index = 1)
pre dma read done
setting ier to 00000040
MCI irq: status = 0000C0ED, C07F0040, 00000040
Receive has ended
post dma read
finishing index 0
Unmapping page 23C54968
buffer = c3c54968, length = 8
post dma read done
[^ this transfert is ok]
[..]
pre dma read
Using transfer index 0
sg = c3d0be68
dma address = 203AFBC0, length = 64
Nothing left to transfer (index = 1)
pre dma read done
setting ier to 00000040
MCI irq: status = 0000C0ED, C07F0040, 00000040
Receive has ended
post dma read
finishing index 0
Unmapping page 203AFBC0
buffer = c03afbc0, length = 64
Unable to handle kernel NULL pointer dereference at virtual address 0000000d
pgd = c0004000
[0000000d] *pgd=00000000
Internal error: Oops: 1 [#1]
Modules linked in:
CPU: 0
PC is at get_index+0x1c/0x50
LR is at prio_tree_next+0x60/0x194
pc : [<c01098ac>] lr : [<c0109fc4>] Not tainted
sp : c3d0bca8 ip : ffffffdd fp : c3d0bcb4
r10: 00000040 r9 : c3d0be78 r8 : c3d0bcc4
r7 : c3d0bcc0 r6 : 00000000 r5 : c029635c r4 : c3d0bcfc
r3 : c3d0bcc0 r2 : c3d0bcc4 r1 : 00000001 r0 : c3d0bcc0
Flags: nZCv IRQs off FIQs on Mode SVC_32 Segment kernel
Control: 5317F
Table: 20004000 DAC: 00000017
Process kmmcd (pid: 761, stack limit = 0xc3d0a258)
Stack: (0xc3d0bca8 to 0xc3d0c000)
bca0: c3d0bce8 c3d0bcb8 c0109fc4 c01098a0 c003efc0 c003ed50
bcc0: c3d0bcec c3d0bcd0 00000000 00000000 c0298594 c3d0be68 c3d649e0 c3d0bcf8
bce0: c3d0bcec c0067734 c0109f74 c3d0bd30 c3d0bcfc c002bd0c c0067728 00000000
bd00: 00000000 00000000 00000000 c029635c 00000000 00000000 c3d0a000 c3d0be68
bd20: 00000000 c3d0bd64 c3d0bd34 c017df00 c002bc14 ffffffff c005068c c3d6ce60
bd40: 00000000 00000000 0000000b 0000002c c3d0bea4 c3d0be78 c3d0bd84 c3d0bd68
bd60: c005a99c c017dd00 c029c4bc 0000000b c02c5f38 00000000 c3d0bd9c c3d0bd88
bd80: c005bd04 c005a968 0000000b c029c4bc c3d0bdbc c3d0bda0 c0025048 c005bc68
bda0: ffffffff fefff000 0000000b 00000001 c3d0be14 c3d0bdc0 c0025a84 c0025010
bdc0: 0000001b 00000001 c4880000 c07f0040 c3d0bed0 c3d64800 00000000 00000001
bde0: 0000002c c3d0bea4 c3d0be78 c3d0be14 c029ae50 c3d0be08 c029ae50 c017db6c
be00: 60000013 ffffffff c3d0be24 c3d0be18 c017dba0 c017db28 c3d0be40 c3d0be28
be20: c0179890 c017db90 00000000 c3d0be44 00000000 c3d0be64 c3d0be44 c01798f0
be40: c01797a8 00000000 c3d0be48 c3d0be48 00000000 c3c54800 c3d0bf0c c3d0be68
be60: c017c744 c01798c8 c02da5e0 00000bc0 203afbc0 00000040 05f5e100 00000000
be80: 00000040 00000001 00000000 00000200 00000040 00000000 c3d0bed0 00000001
bea0: c3d0be68 00000006 00fffff1 00000000 00000000 00000000 00000000 00000035
bec0: 00000000 00000000 c3d0be78 c3d0bed0 c3d0bea4 c3d0be78 00000000 c3d0be44
bee0: c01798a0 c03afbc0 c3c54800 c3c54958 c3d64800 c3c54948 00000000 00000000
bf00: c3d0bf58 c3d0bf10 c017be4c c017c61c c03afbc0 00000000 00000000 02250000
bf20: 00000000 03534453 44303147 8030b855 ff006ad3 c3d0bf5c c3d6496c c3d64800
bf40: 00000000 00000000 00000000 c3d0bf7c c3d0bf5c c017a080 c017b908 00ff8000
bf60: c3d6cde0 c0179f34 c3c4f280 c3d6cde0 c3d0bf94 c3d0bf80 c00499c0 c0179f44
bf80: c3d6cde8 c3d0bfac c3d0bfdc c3d0bf98 c0049b24 c0049918 00000000 c3c4f280
bfa0: c004d938 c3d0bfb8 c3d0bfb8 00000000 c3c4f280 c004d938 c3d0bfb8 c3d0bfb8
bfc0: fffffffc c0049a54 00000000 00000000 c3d0bff4 c3d0bfe0 c004d444 c0049a64
bfe0: 00000000 00000000 00000000 c3d0bff8 c003c3f4 c004d400 00000000 00000000
Backtrace:
[<c0109890>] (get_index+0x0/0x50) from [<c0109fc4>] (prio_tree_next+0x60/0x194)
[<c0109f64>] (prio_tree_next+0x0/0x194) from [<c0067734>] (vma_prio_tree_next+0x1c/0x88)
r8:c3d649e0 r7:c3d0be68 r6:c0298594 r5:00000000 r4:00000000
[<c0067718>] (vma_prio_tree_next+0x0/0x88) from [<c002bd0c>] (flush_dcache_page+0x108/0x124)
[<c002bc04>] (flush_dcache_page+0x0/0x124) from [<c017df00>] (at91_mci_irq+0x210/0x45c)
r6:00000000 r5:c3d0be68 r4:c3d0a000
[<c017dcf0>] (at91_mci_irq+0x0/0x45c) from [<c005a99c>] (handle_IRQ_event+0x44/0x80)
[<c005a958>] (handle_IRQ_event+0x0/0x80) from [<c005bd04>] (handle_level_irq+0xac/0x104)
r7:00000000 r6:c02c5f38 r5:0000000b r4:c029c4bc
[<c005bc58>] (handle_level_irq+0x0/0x104) from [<c0025048>] (asm_do_IRQ+0x48/0x78)
r5:c029c4bc r4:0000000b
[<c0025000>] (asm_do_IRQ+0x0/0x78) from [<c0025a84>] (__irq_svc+0x24/0x60)
Exception stack(0xc3d0bdc0 to 0xc3d0be08)
bdc0: 0000001b 00000001 c4880000 c07f0040 c3d0bed0 c3d64800 00000000 00000001
bde0: 0000002c c3d0bea4 c3d0be78 c3d0be14 c029ae50 c3d0be08 c029ae50 c017db6c
be00: 60000013 ffffffff
r7:00000001 r6:0000000b r5:fefff000 r4:ffffffff
[<c017db18>] (at91mci_process_next+0x0/0x68) from [<c017dba0>] (at91_mci_request+0x20/0x24)
[<c017db80>] (at91_mci_request+0x0/0x24) from [<c0179890>] (mmc_start_request+0xf8/0x108)
[<c0179798>] (mmc_start_request+0x0/0x108) from [<c01798f0>] (mmc_wait_for_req+0x38/0x50)
r6:00000000 r5:c3d0be44 r4:00000000
[<c01798b8>] (mmc_wait_for_req+0x0/0x50) from [<c017c744>] (mmc_sd_switch+0x138/0x160)
r5:c3c54800 r4:00000000
[<c017c60c>] (mmc_sd_switch+0x0/0x160) from [<c017be4c>] (mmc_attach_sd+0x554/0x7e4)
[<c017b8f8>] (mmc_attach_sd+0x0/0x7e4) from [<c017a080>] (mmc_rescan+0x14c/0x20c)
[<c0179f34>] (mmc_rescan+0x0/0x20c) from [<c00499c0>] (run_workqueue+0xb8/0x14c)
r7:c3d6cde0 r6:c3c4f280 r5:c0179f34 r4:c3d6cde0
[<c0049908>] (run_workqueue+0x0/0x14c) from [<c0049b24>] (worker_thread+0xd0/0xe4)
r5:c3d0bfac r4:c3d6cde8
[<c0049a54>] (worker_thread+0x0/0xe4) from [<c004d444>] (kthread+0x54/0x7c)
r7:00000000 r6:00000000 r5:c0049a54 r4:fffffffc
[<c004d3f0>] (kthread+0x0/0x7c) from [<c003c3f4>] (do_exit+0x0/0x75c)
r5:00000000 r4:00000000
Code: e1d000b6 e241c024 e3500000 e1a00003 (0591300c)
Kernel panic - not syncing: Aiee, killing interrupt handler!


Note that when using CONFIG_SLAB the dma mapping adresses are :

dma address = 0x23D8AB68, length = 8
dma address = 0x23D453A0, length = 64
dma address = 0x23D30000, length = 4096
for instance.

and the driver behaves OK (like it behaves until now).

Do you find a reason why I cannot use SLUB ? Did I missed
something or do I use a bad dma_xx or cache flush call in
the driver ?

Thanks, regards,
--
Nicolas Ferre



2007-06-21 15:14:08

by Marc Pignat

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

please use this patch, sorry for the later

Regards

Marc

--- drivers/mmc/host/at91_mci.c-2.6.22-rc5.orig 2007-06-21 16:27:31.000000000 +0200
+++ drivers/mmc/host/at91_mci.c-2.6.22-rc5 2007-06-21 16:42:48.000000000 +0200
@@ -164,7 +164,7 @@ static inline void at91mci_sg_to_dma(str
else
memcpy(dmabuf, sgbuffer, amount);

- kunmap_atomic(sgbuffer, KM_BIO_SRC_IRQ);
+ kunmap_atomic(sgbuffer - sg->offset, KM_BIO_SRC_IRQ);

if (size == 0)
break;
@@ -293,7 +293,7 @@ static void at91mci_post_dma_read(struct
buffer[index] = swab32(buffer[index]);
}

- kunmap_atomic(buffer, KM_BIO_SRC_IRQ);
+ kunmap_atomic(buffer - sg->offset, KM_BIO_SRC_IRQ);
flush_dcache_page(sg->page);
}

2007-06-21 15:14:24

by Marc Pignat

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

Hello Nicolas!

Good news!

I think I've found the problem, this seems to work (tested with SLUB and SLAB).

If you're in the cleanup stage, I think the whole kmap and kunmap can be in the 'if (cpu_is_at91rm9200())',
we have no reason to kmap data we don't touch :-D

Regards

Marc

--- drivers/mmc/host/at91_mci.c-2.6.22-rc5.orig 2007-06-21 16:27:31.000000000 +0200
+++ drivers/mmc/host/at91_mci.c-2.6.22-rc5 2007-06-21 16:42:48.000000000 +0200
@@ -164,7 +164,7 @@ static inline void at91mci_sg_to_dma(str
else
memcpy(dmabuf, sgbuffer, amount);

- kunmap_atomic(sgbuffer, KM_BIO_SRC_IRQ);
+ kunmap_atomic(sgbuffer - sg->offset, KM_BIO_SRC_IRQ);

if (size == 0)
break;
@@ -293,7 +293,7 @@ static void at91mci_post_dma_read(struct
buffer[index] = swab32(buffer[index]);
}

- kunmap_atomic(buffer, KM_BIO_SRC_IRQ);
+ kunmap_atomic(buffer - sg->buffer, KM_BIO_SRC_IRQ);
flush_dcache_page(sg->page);
}

2007-06-21 15:55:44

by Nicolas Ferre

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

Marc Pignat :
> please use this patch, sorry for the later

My eyes are too tired or this patch is the same as the previous one :-\

Cheers,
--
Nicolas Ferre


2007-06-21 22:28:35

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Nicolas Ferre wrote:
>
> While debugging a Linux driver on my ARM platform (AT91), I switched on the
> 2.6.22-rc5 kernel. While reconfiguring it I selected CONFIG_SLUB as a SLAB
> allocator.
>
> The sd/mmc driver I tried to run is vanilla driver which never, until now,
> produces Oops (at91_mci.c).
>
> The oops seems to occur after a page unmapping using dma_unmap_page() followed
> by a flush_dcache_page() (in at91mci_post_dma_read()).
...
> Unable to handle kernel NULL pointer dereference at virtual address 0000000d
> pgd = c0004000
> [0000000d] *pgd=00000000
> Internal error: Oops: 1 [#1]
> Modules linked in:
> CPU: 0
> PC is at get_index+0x1c/0x50
> LR is at prio_tree_next+0x60/0x194
> pc : [<c01098ac>] lr : [<c0109fc4>] Not tainted
> sp : c3d0bca8 ip : ffffffdd fp : c3d0bcb4
> r10: 00000040 r9 : c3d0be78 r8 : c3d0bcc4
> r7 : c3d0bcc0 r6 : 00000000 r5 : c029635c r4 : c3d0bcfc
> r3 : c3d0bcc0 r2 : c3d0bcc4 r1 : 00000001 r0 : c3d0bcc0
> Flags: nZCv IRQs off FIQs on Mode SVC_32 Segment kernel
> Control: 5317F
> Table: 20004000 DAC: 00000017
> Process kmmcd (pid: 761, stack limit = 0xc3d0a258)
...
> Backtrace:
> [<c0109890>] (get_index+0x0/0x50) from [<c0109fc4>]
> (prio_tree_next+0x60/0x194)
> [<c0109f64>] (prio_tree_next+0x0/0x194) from [<c0067734>]
> (vma_prio_tree_next+0x1c/0x88)
> r8:c3d649e0 r7:c3d0be68 r6:c0298594 r5:00000000 r4:00000000
> [<c0067718>] (vma_prio_tree_next+0x0/0x88) from [<c002bd0c>]
> (flush_dcache_page+0x108/0x124)
> [<c002bc04>] (flush_dcache_page+0x0/0x124) from [<c017df00>]
> (at91_mci_irq+0x210/0x45c)
> r6:00000000 r5:c3d0be68 r4:c3d0a000
> [<c017dcf0>] (at91_mci_irq+0x0/0x45c) from [<c005a99c>]
> (handle_IRQ_event+0x44/0x80)
> [<c005a958>] (handle_IRQ_event+0x0/0x80) from [<c005bd04>]
...
>
> Do you find a reason why I cannot use SLUB ? Did I missed something or do I
> use a bad dma_xx or cache flush call in the driver ?

That looks serious: thanks a lot for reporting it.

(I see Marc has sent you a patch or two for the driver end:
I didn't see their relevance, maybe they skirt around the
problem somehow; but unless I'm mistaken, what you've found
goes beyond that particular driver.)

Seems a little odd that it's gone throughout 2.6.22-rc unnoticed
until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps.
As I understand it, you're not doing anything wrong (disclaimer:
I'm no expert on dma_mapping things), but SLUB's reuse of struct
page fields has collided with what flush_dcache_page is expecting.

Here's a patch: I'm not convinced it's necessarily the best one
(most uses of page_mapping will never see a slab page, it's a pity
to be cluttering up that inline even further); but in case nobody
else can provide a better...


[PATCH] page_mapping must avoid slub pages

Nicolas Ferre reports oops from flush_dcache_page() on ARM when using
SLUB: which reuses page->mapping as page->slab. The page_mapping()
function, used by ARM and PA-RISC flush_dcache_page() implementations,
must not confuse SLUB pages with those which have page->mapping set.

Signed-off-by: Hugh Dickins <[email protected]>
---
That #ifdef I've put in there is not essential:
it's perhaps more of a comment or an accusation ;)

include/linux/mm.h | 4 ++++
1 file changed, 4 insertions(+)

--- 2.6.22-rc5/include/linux/mm.h 2007-05-26 07:16:48.000000000 +0100
+++ linux/include/linux/mm.h 2007-06-21 15:52:47.000000000 +0100
@@ -603,6 +603,10 @@ static inline struct address_space *page

if (unlikely(PageSwapCache(page)))
mapping = &swapper_space;
+#ifdef CONFIG_SLUB
+ else if (unlikely(PageSlab(page)))
+ mapping = NULL;
+#endif
else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
mapping = NULL;
return mapping;

2007-06-22 01:01:44

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Hugh Dickins wrote:

> > The oops seems to occur after a page unmapping using dma_unmap_page() followed
> > by a flush_dcache_page() (in at91mci_post_dma_read()).

Was the page allocated using slab calls?

> Seems a little odd that it's gone throughout 2.6.22-rc unnoticed
> until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps.
> As I understand it, you're not doing anything wrong (disclaimer:
> I'm no expert on dma_mapping things), but SLUB's reuse of struct
> page fields has collided with what flush_dcache_page is expecting.
>
> Here's a patch: I'm not convinced it's necessarily the best one
> (most uses of page_mapping will never see a slab page, it's a pity
> to be cluttering up that inline even further); but in case nobody
> else can provide a better...

Well one may be better off allocating pages using the page allocator
instead of the slab allocator. I removed these things from i386 but I did
not check ARM.

2007-06-22 01:36:31

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

Maybe this will address the issue on ARM?


ARM: Allocate dma pages via the page allocator and not via the slab allocator

Slab allocations are not guaranteed to be page aligned and slab allocators
may use the page structs for their own purposes. Using the page allocator
yields a properly aligned page and also makes the page flushing logic work right.

Passing a kmalloced "page" to a flushing function will not work reliably.

This will hopefully address the issue with SLUB on ARM. SLUB uses the
page->mapping field which is also checked by the flushing logic. The
flushing logic expects a normal page and not a slab page.

Signed-off-by: Christoph Lameter <[email protected]>

---
arch/arm/mm/consistent.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/arm/mm/consistent.c
===================================================================
--- linux-2.6.orig/arch/arm/mm/consistent.c 2007-06-21 18:18:15.000000000 -0700
+++ linux-2.6/arch/arm/mm/consistent.c 2007-06-21 18:29:16.000000000 -0700
@@ -277,7 +277,7 @@ dma_alloc_coherent(struct device *dev, s
if (arch_is_coherent()) {
void *virt;

- virt = kmalloc(size, gfp);
+ virt = get_free_pages(gfp, get_order(size));
if (!virt)
return NULL;
*handle = virt_to_dma(dev, virt);
@@ -364,7 +364,7 @@ void dma_free_coherent(struct device *de
WARN_ON(irqs_disabled());

if (arch_is_coherent()) {
- kfree(cpu_addr);
+ free_pages((unsigned long)cpu_addr, get_order(size));
return;
}

2007-06-22 01:42:05

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Hugh Dickins wrote:

> Seems a little odd that it's gone throughout 2.6.22-rc unnoticed
> until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps.

The impact is only on a subset of ARM machines.

PA_RISC? It looks like they run their own flushing function for byte
ranges called flush_kernel_dache_range. That does not use the page struct.


2007-06-22 04:27:23

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Christoph Lameter wrote:
> On Thu, 21 Jun 2007, Hugh Dickins wrote:
>
> > > The oops seems to occur after a page unmapping using dma_unmap_page() followed
> > > by a flush_dcache_page() (in at91mci_post_dma_read()).
>
> Was the page allocated using slab calls?

You've found yes (in the ARM case).

> Well one may be better off allocating pages using the page allocator
> instead of the slab allocator. I removed these things from i386 but I did
> not check ARM.

They may or may not be: I think that's a matter to discuss with rmk.

You keep on forcing the outside world to revolve around your needs
within slub.c: that is a good way to keep slub lean, and may be
justified; but it's at least questionable to be enforcing such
restrictions years after people have grown accustomed to more
freedom from their slabs.

Hugh

2007-06-22 04:58:18

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Christoph Lameter wrote:

> Maybe this will address the issue on ARM?

Looks like it would indeed address the immediate issue on ARM -
IF they've no particular reason to be using kmalloc there.

However... what gives you confidence that flush_dcache_page is
never applied to other slab pages?

This looks to me like a clean way forward to try in -mm; but that
2.6.22 should go with the safety PageSlab test in page_mapping.

Hugh

>
>
> ARM: Allocate dma pages via the page allocator and not via the slab allocator
>
> Slab allocations are not guaranteed to be page aligned and slab allocators
> may use the page structs for their own purposes. Using the page allocator
> yields a properly aligned page and also makes the page flushing logic work right.
>
> Passing a kmalloced "page" to a flushing function will not work reliably.
>
> This will hopefully address the issue with SLUB on ARM. SLUB uses the
> page->mapping field which is also checked by the flushing logic. The
> flushing logic expects a normal page and not a slab page.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> ---
> arch/arm/mm/consistent.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/arm/mm/consistent.c
> ===================================================================
> --- linux-2.6.orig/arch/arm/mm/consistent.c 2007-06-21 18:18:15.000000000 -0700
> +++ linux-2.6/arch/arm/mm/consistent.c 2007-06-21 18:29:16.000000000 -0700
> @@ -277,7 +277,7 @@ dma_alloc_coherent(struct device *dev, s
> if (arch_is_coherent()) {
> void *virt;
>
> - virt = kmalloc(size, gfp);
> + virt = get_free_pages(gfp, get_order(size));
> if (!virt)
> return NULL;
> *handle = virt_to_dma(dev, virt);
> @@ -364,7 +364,7 @@ void dma_free_coherent(struct device *de
> WARN_ON(irqs_disabled());
>
> if (arch_is_coherent()) {
> - kfree(cpu_addr);
> + free_pages((unsigned long)cpu_addr, get_order(size));
> return;
> }
>
>

2007-06-22 05:11:15

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Hugh Dickins wrote:

> However... what gives you confidence that flush_dcache_page is
> never applied to other slab pages?

Flush dcache page is supposed to run on pages not objects of varying
length. It is suprising that this has not lead to earlier problems.
Objects allocated this way may straddle a page boundary under some
conditions and in that case virt_to_page may not lead to a page that
covers the complete object that is supposed to be flushed. Hopefully the
"size" of the allocated object were whole pages.

> This looks to me like a clean way forward to try in -mm; but that
> 2.6.22 should go with the safety PageSlab test in page_mapping.

2.6.22 cleans up these issues and this one follows in those footsteps.

The reason for the introduction of the quicklist f.e. was to have a clear
separation between page sized allocation and the variable allocations
through slab allocators.

2007-06-22 05:13:16

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Hugh Dickins wrote:

> You keep on forcing the outside world to revolve around your needs
> within slub.c: that is a good way to keep slub lean, and may be
> justified; but it's at least questionable to be enforcing such
> restrictions years after people have grown accustomed to more
> freedom from their slabs.

This work is a cleanup of the VM code and part of the slab cleanup that
was done. Having slab objects on the LRU and other components of the VM
that are supposed to work on page sized objects is plainly wrong and can
lead to surprising results.


2007-06-22 05:13:42

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Christoph Lameter wrote:
> On Thu, 21 Jun 2007, Hugh Dickins wrote:
>
> > Seems a little odd that it's gone throughout 2.6.22-rc unnoticed
> > until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps.
>
> The impact is only on a subset of ARM machines.
>
> PA_RISC? It looks like they run their own flushing function for byte
> ranges called flush_kernel_dache_range. That does not use the page struct.

PA-RISC does have a function of that name, and I'm guessing that
you came across it in looking at the PA-RISC dma_map_single.

But PA-RISC also has a function called flush_dcache_page, which uses
page_mapping and expects a struct address_space * from it. If that
can ever be get applied to a SLOB page (which is not so clear as in
the ARM case, but cannot easily be ruled out completely), we're
in trouble without a PageSlab test within page_mapping.

Hugh

2007-06-22 05:31:57

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Hugh Dickins wrote:

> But PA-RISC also has a function called flush_dcache_page, which uses
> page_mapping and expects a struct address_space * from it. If that
> can ever be get applied to a SLOB page (which is not so clear as in
> the ARM case, but cannot easily be ruled out completely), we're
> in trouble without a PageSlab test within page_mapping.

A page comes from the page allocator. Access to a slab allocators "page"
is an access to subsystem internals. Those internals vary
(including the representation of objects in the "page") and depend on the
slab allocator selected, the debug options in effect and parameters
with which the slab was created etc etc.

If flush_dcache_page is applied to a slab object then we need to do a
similar change to PA-RISC as for ARM.

2007-06-22 05:37:53

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Thu, 21 Jun 2007, Christoph Lameter wrote:
> On Fri, 22 Jun 2007, Hugh Dickins wrote:
>
> > However... what gives you confidence that flush_dcache_page is
> > never applied to other slab pages?
>
> Flush dcache page is supposed to run on pages not objects of varying
> length. It is suprising that this has not lead to earlier problems.
> Objects allocated this way may straddle a page boundary under some
> conditions and in that case virt_to_page may not lead to a page that
> covers the complete object that is supposed to be flushed. Hopefully the
> "size" of the allocated object were whole pages.

No, that's the wrong way round. Neither ARM nor PA-RISC expects
flush_dcache_page to flush any dcache when given a slab allocation:
they just expect it to pass through, not to oops.

Hugh

2007-06-22 06:29:49

by Marc Pignat

[permalink] [raw]
Subject: [PATCH] mmc-atmel : fix kunmap wrong usage

from: Marc Pignat <[email protected]>

kunmap must be called on the pointer returned by kmap.

Signed-off-by: Marc Pignat <[email protected]>

---

N.B: This is the same patch as yesterday, with proper Signed-off-by and more
comments.

The buffer variable is used this way:
buffer = kmap_atomic(sg->page, KM_BIO_SRC_IRQ) + sg->offset;
So we can't do
kunmap_atomic(buffer, KM_BIO_SRC_IRQ);
we must do
kunmap_atomic(buffer - sg->offset, KM_BIO_SRC_IRQ);

Strangely this misuse showed no problem using CONFIG_SLAB, but oops using
CONFIG_SLUB, (lkml ''Oops in a driver while using SLUB as a SLAB allocator").

Regards

Marc

--- drivers/mmc/host/at91_mci.c-2.6.22-rc5.orig 2007-06-21 16:27:31.000000000 +0200
+++ drivers/mmc/host/at91_mci.c-2.6.22-rc5 2007-06-21 16:42:48.000000000 +0200
@@ -164,7 +164,7 @@ static inline void at91mci_sg_to_dma(str
else
memcpy(dmabuf, sgbuffer, amount);

- kunmap_atomic(sgbuffer, KM_BIO_SRC_IRQ);
+ kunmap_atomic(sgbuffer - sg->offset, KM_BIO_SRC_IRQ);

if (size == 0)
break;
@@ -293,7 +293,7 @@ static void at91mci_post_dma_read(struct
buffer[index] = swab32(buffer[index]);
}

- kunmap_atomic(buffer, KM_BIO_SRC_IRQ);
+ kunmap_atomic(buffer - sg->offset, KM_BIO_SRC_IRQ);
flush_dcache_page(sg->page);
}

2007-06-22 07:01:35

by Russell King

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, Jun 22, 2007 at 05:26:33AM +0100, Hugh Dickins wrote:
> On Thu, 21 Jun 2007, Christoph Lameter wrote:
> > On Thu, 21 Jun 2007, Hugh Dickins wrote:
> >
> > > > The oops seems to occur after a page unmapping using dma_unmap_page() followed
> > > > by a flush_dcache_page() (in at91mci_post_dma_read()).
> >
> > Was the page allocated using slab calls?
>
> You've found yes (in the ARM case).
>
> > Well one may be better off allocating pages using the page allocator
> > instead of the slab allocator. I removed these things from i386 but I did
> > not check ARM.
>
> They may or may not be: I think that's a matter to discuss with rmk.

The coherent case on ARM is broken in more ways, not only because
it uses kmalloc, but it also takes no notice of the DMA mask.

However, AT91 isn't a coherent ARM architecture, so arch_is_coherent()
should be false. Therefore, we should never be allocating pages for
DMA from SLAB/SLUB for AT91 platforms.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2007-06-22 09:10:15

by Nicolas Ferre

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

Nicolas Ferre :
> Marc Pignat :
>> please use this patch, sorry for the later
>
> My eyes are too tired or this patch is the same as the previous one :-\

Indeed, my eyes where too tired ;-) Sorry for the trouble.

--
Nicolas Ferre


2007-06-22 12:28:55

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mmc-atmel : fix kunmap wrong usage

On Fri, 22 Jun 2007, Marc Pignat wrote:
> from: Marc Pignat <[email protected]>
>
> kunmap must be called on the pointer returned by kmap.

Not necessarily: an offset within the same page is acceptable.

>
> Signed-off-by: Marc Pignat <[email protected]>
>
> ---
>
> N.B: This is the same patch as yesterday, with proper Signed-off-by and more
> comments.
>
> The buffer variable is used this way:
> buffer = kmap_atomic(sg->page, KM_BIO_SRC_IRQ) + sg->offset;
> So we can't do
> kunmap_atomic(buffer, KM_BIO_SRC_IRQ);
> we must do
> kunmap_atomic(buffer - sg->offset, KM_BIO_SRC_IRQ);
>
> Strangely this misuse showed no problem using CONFIG_SLAB, but oops using
> CONFIG_SLUB, (lkml ''Oops in a driver while using SLUB as a SLAB allocator").

Aren't you just guessing there? Those kunmap_atomics in at91_mci.c
may look wrong to you, but they're not incorrect (so long as sg->offset
falls within the page, as it must do here to make sense of the page).
Especially not on ARM, where kunmap_atomic actually has no interest
in the argument passed. And the oops was in the flush_dcache_page.

If you actually reproduced Nicolas' problem on ARM, and verified
that your patch then fixes it, please let us know: that will be
remarkably interesting.

I believe I posted the correct fix last night (or at least a safe fix
for now: Christoph Lameter may prefer to undo it and change ARM's
dma_mapping at leisure later), checking PageSlab in page_mapping;
and Linus has already put that one into his git tree.

But it would be good to hear from Nicolas whether that indeed fixes
his oops: I couldn't actually try my patch on ARM either.

Hugh

>
> Regards
>
> Marc
>
> --- drivers/mmc/host/at91_mci.c-2.6.22-rc5.orig 2007-06-21 16:27:31.000000000 +0200
> +++ drivers/mmc/host/at91_mci.c-2.6.22-rc5 2007-06-21 16:42:48.000000000 +0200
> @@ -164,7 +164,7 @@ static inline void at91mci_sg_to_dma(str
> else
> memcpy(dmabuf, sgbuffer, amount);
>
> - kunmap_atomic(sgbuffer, KM_BIO_SRC_IRQ);
> + kunmap_atomic(sgbuffer - sg->offset, KM_BIO_SRC_IRQ);
>
> if (size == 0)
> break;
> @@ -293,7 +293,7 @@ static void at91mci_post_dma_read(struct
> buffer[index] = swab32(buffer[index]);
> }
>
> - kunmap_atomic(buffer, KM_BIO_SRC_IRQ);
> + kunmap_atomic(buffer - sg->offset, KM_BIO_SRC_IRQ);
> flush_dcache_page(sg->page);

2007-06-22 13:34:57

by Nicolas Ferre

[permalink] [raw]
Subject: Re: [PATCH] mmc-atmel : fix kunmap wrong usage

Hugh Dickins :
> Aren't you just guessing there? Those kunmap_atomics in at91_mci.c
> may look wrong to you, but they're not incorrect (so long as sg->offset
> falls within the page, as it must do here to make sense of the page).
> Especially not on ARM, where kunmap_atomic actually has no interest
> in the argument passed. And the oops was in the flush_dcache_page.
>
> If you actually reproduced Nicolas' problem on ARM, and verified
> that your patch then fixes it, please let us know: that will be
> remarkably interesting.

Patch tested without success. Indeed, I always see the Oops with
Marc's patch.

> I believe I posted the correct fix last night (or at least a safe fix
> for now: Christoph Lameter may prefer to undo it and change ARM's
> dma_mapping at leisure later), checking PageSlab in page_mapping;
> and Linus has already put that one into his git tree.

Here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b9bae3402572dc50a1e084c5b1ae5117918ef0f0

> But it would be good to hear from Nicolas whether that indeed fixes
> his oops: I couldn't actually try my patch on ARM either.

Ok tested on Atmel ARM AT91 (at91sam9263), this fixes this oops.

Thanks a lot to all of you for your precise help.

Regards,
--
Nicolas Ferre


2007-06-22 14:16:08

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mmc-atmel : fix kunmap wrong usage

On Fri, 22 Jun 2007, Nicolas Ferre wrote:
>
> Here:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b9bae3402572dc50a1e084c5b1ae5117918ef0f0

Yup.

> > But it would be good to hear from Nicolas whether that indeed fixes
> > his oops: I couldn't actually try my patch on ARM either.
>
> Ok tested on Atmel ARM AT91 (at91sam9263), this fixes this oops.

Great.

>
> Thanks a lot to all of you for your precise help.

Not at all, thank you for reporting and testing and reporting back,
in time for 2.6.22: that's all more valuable than the easy fix.

Hugh

2007-06-22 14:23:36

by Marc Pignat

[permalink] [raw]
Subject: Re: [PATCH] mmc-atmel : fix kunmap wrong usage

On Friday 22 June 2007 15:34, Nicolas Ferre wrote:
> Hugh Dickins :
> > Aren't you just guessing there? Those kunmap_atomics in at91_mci.c
> > may look wrong to you, but they're not incorrect (so long as sg->offset
> > falls within the page, as it must do here to make sense of the page).
> > Especially not on ARM, where kunmap_atomic actually has no interest
> > in the argument passed. And the oops was in the flush_dcache_page.
> >
> > If you actually reproduced Nicolas' problem on ARM, and verified
> > that your patch then fixes it, please let us know: that will be
> > remarkably interesting.
>
> Patch tested without success. Indeed, I always see the Oops with
> Marc's patch.
So, it's really interesting, it worked really for me (oops without patch) and
no oops with it. My hardware is a custom board with an at91rm9200.

I had a look the the kunmap_atomic function, and I _really_ don't understand
how this patch can do something :-/

And last but not least, my patch is completly wrong...
void *kmap_atomic(...);
unsigned int *buffer = kmap_atomic(...) + sg->offset; // addition in u8*
kunmap_atomic(buffer - sg->offset); // <- sub in u32*

-> please forget my patch

Regards

Marc


2007-06-22 14:59:17

by Marc Pignat

[permalink] [raw]
Subject: Re: [PATCH] mmc-atmel : fix kunmap wrong usage

On Friday 22 June 2007 15:34, Nicolas Ferre wrote:
> Hugh Dickins :
> Here:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b9bae3402572dc50a1e084c5b1ae5117918ef0f0
>
> > But it would be good to hear from Nicolas whether that indeed fixes
> > his oops: I couldn't actually try my patch on ARM either.
>
> Ok tested on Atmel ARM AT91 (at91sam9263), this fixes this oops.
>
Just tested and working on my atmel at91rm9200.
(at91sam9263 core is ARM926EJ-S and at91rm9200 core is ARM920T).

Thanks!

Regards

Marc

2007-06-22 16:43:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator



On Fri, 22 Jun 2007, Hugh Dickins wrote:
>
> On Thu, 21 Jun 2007, Christoph Lameter wrote:
>
> > Maybe this will address the issue on ARM?
>
> Looks like it would indeed address the immediate issue on ARM -
> IF they've no particular reason to be using kmalloc there.

I think the right thing to do is do both of these things. I already
applied Hugh's patch - it seemed like a total nobrainer to do at this
stage in the 2.6.22 -rc series. But that doesn't mean that we should not
_also_ look at "flush_dcache_page()" users.

I do think that even just the name (the ".._page()" part) makes it obvious
that it was designed for page-level allocations, not kmalloc(). So I think
Christoph's patch makes sense in that context.

At the same time, I do think that the whole notion of flushing the D$ is
certainly something that makes sense for kmalloc() allocations also, and
maybe people do actually do small DMA allocations and Christophs patch
would break that.

End result: for 2.6.22, I think the patch from Hugh that I already applied
is the right thing.

But as to 2.6.23 and onward.. I dunno.

Linus

2007-06-22 17:27:03

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Linus Torvalds wrote:

> > Looks like it would indeed address the immediate issue on ARM -
> > IF they've no particular reason to be using kmalloc there.
>
> I think the right thing to do is do both of these things. I already
> applied Hugh's patch - it seemed like a total nobrainer to do at this
> stage in the 2.6.22 -rc series. But that doesn't mean that we should not
> _also_ look at "flush_dcache_page()" users.

Hugh's patch not address the complete issue. It only works right now
because the size of the allocation is page size and fits right into a slab
page. If debugging is enabled then the slab size will increase and the
"pages" will be misaligned which will lead to other sorts of funky
behavior. kmalloc allocations are only guaranteed to be aligned to
ARCH_KMALLOC_MINALIGN which is 4 to 8 bytes. If one must have a
page aligned entity out of a slab allocator then a custom slab needs to be
created.

2007-06-22 17:41:19

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator


Add VM_BUG_ON in case someone uses page_mapping on a slab page

Detect slab objects being passed to the page oriented functions of the VM.

It is not sufficient to simply return NULL because the functions calling
page_mapping may depend on other items of the page_struct also to be setup
properly. Moreover the slab object may not be properly aligned. The page
orientedfunctions of the VM expect to operate on page aligned, page sized
objects. operations on objects straddling page boundaries may only affect
the objects partially which may lead to surprising results.

It is better to detect eventually remaining uses and eliminate them.

Signed-off-by: Christoph Lameter <[email protected]>

---
include/linux/mm.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.000000000 -0700
+++ linux-2.6/include/linux/mm.h 2007-06-22 10:34:09.000000000 -0700
@@ -603,10 +603,7 @@ static inline struct address_space *page

if (unlikely(PageSwapCache(page)))
mapping = &swapper_space;
-#ifdef CONFIG_SLUB
- else if (unlikely(PageSlab(page)))
- mapping = NULL;
-#endif
+ VM_BUG_ON(PageSlab(page));
else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
mapping = NULL;
return mapping;

2007-06-22 18:40:36

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Christoph Lameter wrote:
>
> Add VM_BUG_ON in case someone uses page_mapping on a slab page
>
> Detect slab objects being passed to the page oriented functions of the VM.
>
> It is not sufficient to simply return NULL because the functions calling
> page_mapping may depend on other items of the page_struct also to be setup
> properly. Moreover the slab object may not be properly aligned. The page
> orientedfunctions of the VM expect to operate on page aligned, page sized
> objects. operations on objects straddling page boundaries may only affect
> the objects partially which may lead to surprising results.
>
> It is better to detect eventually remaining uses and eliminate them.
>
> Signed-off-by: Christoph Lameter <[email protected]>

I'm quite happy with this approach for 2.6.23-rc, along with your ARM
dma_map patch which (if I understood aright) rmk approved. Except,
haven't you misplaced that VM_BUG_ON between an if and its else?
And I'd much prefer you to make it an outright BUG_ON, because
many testers have those VM_BUG_ONs configured out.

Hugh

>
> ---
> include/linux/mm.h | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.000000000 -0700
> +++ linux-2.6/include/linux/mm.h 2007-06-22 10:34:09.000000000 -0700
> @@ -603,10 +603,7 @@ static inline struct address_space *page
>
> if (unlikely(PageSwapCache(page)))
> mapping = &swapper_space;
> -#ifdef CONFIG_SLUB
> - else if (unlikely(PageSlab(page)))
> - mapping = NULL;
> -#endif
> + VM_BUG_ON(PageSlab(page));
> else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
> mapping = NULL;
> return mapping;
>

2007-06-22 18:51:16

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Hugh Dickins wrote:

> I'm quite happy with this approach for 2.6.23-rc, along with your ARM
> dma_map patch which (if I understood aright) rmk approved. Except,
> haven't you misplaced that VM_BUG_ON between an if and its else?

Right.

> And I'd much prefer you to make it an outright BUG_ON, because
> many testers have those VM_BUG_ONs configured out.

Thought about it but doing so would create quite a load of BUG_ONs in the
VM given the frequent use of that particular inline function. And AFAIK
we are only aware of one other potential call site that could cause
trouble. Many arches have run SLUB now for awhile and would certainly have
shown issues if they would do strange things with slab allocs. Even with
SLAB they would have to be very careful in order to make this work. So I
think its rather unlikely that this is going to be triggered. Its
primarily useful for debugging if strange things start to happen.
The VM_BUG_ON could stay there for good to make sure development does not
result in similar issues in the future.

Fixed up patch:




Add VM_BUG_ON in case someone uses page_mapping on a slab page

Detect slab objects being passed to the page oriented functions of the VM.

It is not sufficient to simply return NULL because the functions calling
page_mapping may depend on other items of the page_struct also to be setup
properly. Moreover slab object may not be properly aligned. The page oriented
functions of the VM expect to operate on page aligned, page sized objects.
Operations on object straddling page boundaries may only affect the objects
partially which may lead to surprising results.

It is better to detect eventually remaining uses and eliminate them.

Signed-off-by: Christoph Lameter <[email protected]>

---
include/linux/mm.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.000000000 -0700
+++ linux-2.6/include/linux/mm.h 2007-06-22 11:44:10.000000000 -0700
@@ -601,12 +601,9 @@ static inline struct address_space *page
{
struct address_space *mapping = page->mapping;

+ VM_BUG_ON(PageSlab(page));
if (unlikely(PageSwapCache(page)))
mapping = &swapper_space;
-#ifdef CONFIG_SLUB
- else if (unlikely(PageSlab(page)))
- mapping = NULL;
-#endif
else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
mapping = NULL;
return mapping;

2007-06-22 19:01:16

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] mmc-atmel : fix kunmap wrong usage

On Fri, Jun 22 2007, Marc Pignat wrote:
> from: Marc Pignat <[email protected]>
>
> kunmap must be called on the pointer returned by kmap.

kunmap_atomic() should align the given buffer to the start of the page
anyway, so your patch wont change behaviour.

--
Jens Axboe

2007-06-22 19:02:21

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Christoph Lameter wrote:
> On Fri, 22 Jun 2007, Hugh Dickins wrote:
>
> > I'm quite happy with this approach for 2.6.23-rc, along with your ARM
> > dma_map patch which (if I understood aright) rmk approved. Except,
> > haven't you misplaced that VM_BUG_ON between an if and its else?
>
> Right.
>
> > And I'd much prefer you to make it an outright BUG_ON, because
> > many testers have those VM_BUG_ONs configured out.
>
> Thought about it but doing so would create quite a load of BUG_ONs in the
> VM given the frequent use of that particular inline function. And AFAIK
> we are only aware of one other potential call site that could cause
> trouble. Many arches have run SLUB now for awhile and would certainly have
> shown issues if they would do strange things with slab allocs. Even with
> SLAB they would have to be very careful in order to make this work. So I
> think its rather unlikely that this is going to be triggered. Its
> primarily useful for debugging if strange things start to happen.
> The VM_BUG_ON could stay there for good to make sure development does not
> result in similar issues in the future.

Okay; and I was overlooking that (as in this case) we'd probably get
an easily debuggable oops instead of the explicit BUG when it is
configured out.

>
> Fixed up patch:
>
>
>
>
> Add VM_BUG_ON in case someone uses page_mapping on a slab page
>
> Detect slab objects being passed to the page oriented functions of the VM.
>
> It is not sufficient to simply return NULL because the functions calling
> page_mapping may depend on other items of the page_struct also to be setup
> properly. Moreover slab object may not be properly aligned. The page oriented
> functions of the VM expect to operate on page aligned, page sized objects.
> Operations on object straddling page boundaries may only affect the objects
> partially which may lead to surprising results.
>
> It is better to detect eventually remaining uses and eliminate them.
>
> Signed-off-by: Christoph Lameter <[email protected]>

Acked-by: Hugh Dickins <[email protected]>
(immediately after 2.6.22, accompanied by your ARM patch)

>
> ---
> include/linux/mm.h | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.000000000 -0700
> +++ linux-2.6/include/linux/mm.h 2007-06-22 11:44:10.000000000 -0700
> @@ -601,12 +601,9 @@ static inline struct address_space *page
> {
> struct address_space *mapping = page->mapping;
>
> + VM_BUG_ON(PageSlab(page));
> if (unlikely(PageSwapCache(page)))
> mapping = &swapper_space;
> -#ifdef CONFIG_SLUB
> - else if (unlikely(PageSlab(page)))
> - mapping = NULL;
> -#endif
> else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
> mapping = NULL;
> return mapping;
>

2007-06-22 19:11:44

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Hugh Dickins wrote:

> Acked-by: Hugh Dickins <[email protected]>
> (immediately after 2.6.22, accompanied by your ARM patch)

We need to fix any remaining weird slab object uses right now. Your check
leaves a lot of holes open. 2.6.22 removes all other such strange slab
uses in other arches. It would be inconsistent if we left these things in
ARM (and maybe PA-RISC).





2007-06-22 20:15:33

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

Here is the corresponding PA-RISC piece. Its as straightforward as
the other one since the only way to make this work properly in the past
was if the sizes passed to the dma alloc functions are page size based.



PA-RISC: Use page allocator instead of slab allocator.

Slab pages obtained via kmalloc are not cachline aligned. Nor is it advisable
to perform VM operations designed for page allocator pages on
memory obtained via kmalloc.

So replace the page sized allocations in kernel/pci-dma.c with page allocator
pages.

Signed-off-by: Christoph Lameter <[email protected]>

---
arch/parisc/kernel/pci-dma.c | 4 ++--
include/linux/mm.h | 5 +----
2 files changed, 3 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/parisc/kernel/pci-dma.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/pci-dma.c 2007-06-22 13:02:32.000000000 -0700
+++ linux-2.6/arch/parisc/kernel/pci-dma.c 2007-06-22 13:06:39.000000000 -0700
@@ -572,7 +572,7 @@ static void *pa11_dma_alloc_noncoherent(
void *addr = NULL;

/* rely on kmalloc to be cacheline aligned */
- addr = kmalloc(size, flag);
+ addr = (void *)__get_free_pages(flag, get_order(size));
if(addr)
*dma_handle = (dma_addr_t)virt_to_phys(addr);

@@ -582,7 +582,7 @@ static void *pa11_dma_alloc_noncoherent(
static void pa11_dma_free_noncoherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t iova)
{
- kfree(vaddr);
+ free_pages((unsigned long)vaddr, get_order(size));
return;
}


2007-06-22 20:19:17

by Russell King

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, Jun 22, 2007 at 09:40:45AM -0700, Linus Torvalds wrote:
>
>
> On Fri, 22 Jun 2007, Hugh Dickins wrote:
> >
> > On Thu, 21 Jun 2007, Christoph Lameter wrote:
> >
> > > Maybe this will address the issue on ARM?
> >
> > Looks like it would indeed address the immediate issue on ARM -
> > IF they've no particular reason to be using kmalloc there.
>
> I think the right thing to do is do both of these things. I already
> applied Hugh's patch - it seemed like a total nobrainer to do at this
> stage in the 2.6.22 -rc series. But that doesn't mean that we should not
> _also_ look at "flush_dcache_page()" users.

Note, however, that the use of flush_dcache_page() on allocations
derived from dma_alloc_coherent() is undefined and unpredictable.

dma_alloc_coherent() may return a remapped virtual address which
would be invalid for things like virt_to_page() to operate on.

So the fact that dma_alloc_coherent() returning something that was
kmalloc'd and then causing flush_dcache_page() to oops is actually
a sign that there's far deeper problems.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2007-06-22 20:22:21

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Christoph Lameter wrote:
>
> We need to fix any remaining weird slab object uses right now. Your check
> leaves a lot of holes open. 2.6.22 removes all other such strange slab
> uses in other arches. It would be inconsistent if we left these things in
> ARM (and maybe PA-RISC).

As I understand it, that driver used to work right with SLAB, then
oopsed with SLUB, and now works okay again with the page_mapping patch?

I'm unclear how it comes about that you removed "all other such strange
slab uses in other arches", yet missed this? That suggests there may
be further unexpected uses.

It worries me that any use which catches you by surprise has to be
fixed up in the caller, rather than in slub itself: slab/slub is a
service, not a master. But I'm rather repeating myself.

Hugh

2007-06-22 22:55:11

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, 22 Jun 2007, Hugh Dickins wrote:

> On Fri, 22 Jun 2007, Christoph Lameter wrote:
> >
> > We need to fix any remaining weird slab object uses right now. Your check
> > leaves a lot of holes open. 2.6.22 removes all other such strange slab
> > uses in other arches. It would be inconsistent if we left these things in
> > ARM (and maybe PA-RISC).
>
> As I understand it, that driver used to work right with SLAB, then
> oopsed with SLUB, and now works okay again with the page_mapping patch?

Try to enable debugging then it may fail again despite your patch. You
just scratched the surface with this and are enabling a dangerous usage
mode with SLUB that we explicitly did not want to support anymore.

> I'm unclear how it comes about that you removed "all other such strange
> slab uses in other arches", yet missed this? That suggests there may
> be further unexpected uses.

There could be other uses that were missed. I looked for slabs created by
kmem_cache_create. The trouble is that any kmalloc could also be used for
engineer these weird things (as seen here) and there are gazillions of
kmallocs. That is why a VM_BUG_ON is useful. However, it requires some
effort even with SLAB to create these things and--given that we have
tested extensively on lots of arches--I am hopeful that
we have caught everything.

> It worries me that any use which catches you by surprise has to be
> fixed up in the caller, rather than in slub itself: slab/slub is a
> service, not a master. But I'm rather repeating myself.

SLUB used to implement the same special casing as SLAB did (which results
in the fragile scenarios in which the above is possible). But we made the
decision to clean up the slab interface and we dropped the emulation of
the SLAB frills from SLUB.

Messy and problematic code like this should be removed. That improves the
quality of the kernel. The removal is a straightfoward process. And the
cases that we are discussing here are in remote corners of the kernel.

2007-06-23 10:27:47

by Oleg Verych

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

* From: Christoph Lameter
* Newsgroups: linux.kernel,linux.ports.arm.kernel
* Date: Fri, 22 Jun 2007 13:15:19 -0700 (PDT)
>
> Here is the corresponding PA-RISC piece. Its as straightforward as
> the other one since the only way to make this work properly in the past
> was if the sizes passed to the dma alloc functions are page size based.
[]
> --- linux-2.6.orig/arch/parisc/kernel/pci-dma.c 2007-06-22 13:02:32.000000000 -0700
> +++ linux-2.6/arch/parisc/kernel/pci-dma.c 2007-06-22 13:06:39.000000000 -0700
> @@ -572,7 +572,7 @@ static void *pa11_dma_alloc_noncoherent(
> void *addr = NULL;
>
> /* rely on kmalloc to be cacheline aligned */
> - addr = kmalloc(size, flag);
> + addr = (void *)__get_free_pages(flag, get_order(size));

Seems like comment must be removed.
____

2007-06-24 08:39:45

by Russell King

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote:
> On Fri, 22 Jun 2007, Christoph Lameter wrote:
> >
> > Add VM_BUG_ON in case someone uses page_mapping on a slab page
> >
> > Detect slab objects being passed to the page oriented functions of the VM.
> >
> > It is not sufficient to simply return NULL because the functions calling
> > page_mapping may depend on other items of the page_struct also to be setup
> > properly. Moreover the slab object may not be properly aligned. The page
> > orientedfunctions of the VM expect to operate on page aligned, page sized
> > objects. operations on objects straddling page boundaries may only affect
> > the objects partially which may lead to surprising results.
> >
> > It is better to detect eventually remaining uses and eliminate them.
> >
> > Signed-off-by: Christoph Lameter <[email protected]>
>
> I'm quite happy with this approach for 2.6.23-rc, along with your ARM
> dma_map patch which (if I understood aright) rmk approved.

I didn't approve it. Please re-read my reply - there are still some
unanswered questions in it which _really_ need answering.

The report talks about the AT91 machines. These machines do not have
cache coherent DMA. Therefore, the code being patched should be
optimised away by the compiler. *Or* we have even bigger problems.

Please forward the original problem report.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2007-06-24 10:25:13

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Sun, 24 Jun 2007, Russell King wrote:
> On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote:
> > I'm quite happy with this approach for 2.6.23-rc, along with your ARM
> > dma_map patch which (if I understood aright) rmk approved.
>
> I didn't approve it. Please re-read my reply - there are still some
> unanswered questions in it which _really_ need answering.

Sorry for misrepresenting you.

> The report talks about the AT91 machines. These machines do not have
> cache coherent DMA. Therefore, the code being patched should be
> optimised away by the compiler. *Or* we have even bigger problems.
>
> Please forward the original problem report.

Done.
Hugh

2007-06-24 10:52:41

by Russell King

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Sun, Jun 24, 2007 at 11:24:16AM +0100, Hugh Dickins wrote:
> On Sun, 24 Jun 2007, Russell King wrote:
> > On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote:
> > > I'm quite happy with this approach for 2.6.23-rc, along with your ARM
> > > dma_map patch which (if I understood aright) rmk approved.
> >
> > I didn't approve it. Please re-read my reply - there are still some
> > unanswered questions in it which _really_ need answering.
>
> Sorry for misrepresenting you.
>
> > The report talks about the AT91 machines. These machines do not have
> > cache coherent DMA. Therefore, the code being patched should be
> > optimised away by the compiler. *Or* we have even bigger problems.
> >
> > Please forward the original problem report.
>
> Done.

Okay, that seems to back up my suspicions - it's definitely AT91-based.
Since AT91-based machines do not have a DMA coherent cache,
arch_is_coherent() must be defined to '0'. The only way that kmalloc
could be reached is if that were defined to something other than '0',
and if that's done on a machine with DMA incoherent caches, that will
lead to data corruption.

I think we need to wait for Nicolas to respond on this issue before
running headlong into applying a sticky plaster for something which is
actually a deeper issue.

However, the arch_is_coherent() path _is_ buggy as it stands, but in
more than the way identified thus far. Eg, it doesn't set __GFP_DMA
appropriately for various DMA masks, so it might return DMA inaccessible
memory.

If we're after a simple fix for 2.6.22, the _easiest_ solution would be
to delete the entire arch_is_coherent() branches in arch/arm/mm/consistent.c;
that will result in a working solution for everyone, albiet at a slightly
lower performance for the DMA-coherent CPUs.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2007-06-25 00:26:19

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Sun, 24 Jun 2007, Russell King wrote:
> On Sun, Jun 24, 2007 at 11:24:16AM +0100, Hugh Dickins wrote:
> > On Sun, 24 Jun 2007, Russell King wrote:
> > > On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote:
> > >
> > > Please forward the original problem report.
> >
> > Done.
>
> Okay, that seems to back up my suspicions - it's definitely AT91-based.
> Since AT91-based machines do not have a DMA coherent cache,
> arch_is_coherent() must be defined to '0'. The only way that kmalloc
> could be reached is if that were defined to something other than '0',
> and if that's done on a machine with DMA incoherent caches, that will
> lead to data corruption.

Yes, having looked through that now, I agree with you 100%.

>
> I think we need to wait for Nicolas to respond on this issue before
> running headlong into applying a sticky plaster for something which is
> actually a deeper issue.

No need for Nicolas to respond, I think I've found what's "wrong":
see below.

>
> However, the arch_is_coherent() path _is_ buggy as it stands, but in
> more than the way identified thus far. Eg, it doesn't set __GFP_DMA
> appropriately for various DMA masks, so it might return DMA inaccessible
> memory.

I expect you're right, but that's a separate issue. I had thought
you were approving Christoph's ARM patch because both you and he seemed
to agree that kmalloc was inappropriate for use in dma_alloc_coherent,
whatever additional issues you saw with it.

I still don't see why kmalloc is wrong there myself: for a while
I bought Christoph's alignment argument, but now I don't see why
(more than long) alignment is important to it. But I'm easily
wrong when it comes to DMA mapping issues.

>
> If we're after a simple fix for 2.6.22, the _easiest_ solution would be
> to delete the entire arch_is_coherent() branches in arch/arm/mm/consistent.c;
> that will result in a working solution for everyone, albiet at a slightly
> lower performance for the DMA-coherent CPUs.

The fix for 2.6.22 is my PageSlab test in page_mapping which Linus
already put into -git.

And I now rather think that needs to stay, not be replaced by the
VM_BUG_ON Christoph was proposing for 2.6.23 (which earlier I acked).

Christoph responded to my page_mapping patch by looking at arch/arm,
and there finding a kmalloc in dma_alloc_coherent which he didn't
like; but you're right, it's entirely irrelevant to Nicolas' oops.

The slub allocation which gives rise to Nicolas' oops in not in
ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those
status = kmalloc(64, GFP_KERNEL);
where status is passed down for the response from mmc_sd_switch.

And what is wrong with using kmalloc there?
Why should that be changed to allocate a whole page?
How many other such cases might there be?

And the flush_dcache_page in at91mci_post_dma_read looks correct
to me too: it has just filled and perhaps also swabbed a buffer,
that buffer might in some cases be mapped into userspace, so it
calls flush_dcache_page.

In the kmalloc case it's not mapped into userspace: flush_dcache_page
should detect that and do nothing, as it does with slab; but slub was
reusing page->mapping for something else, so we oopsed.

Hugh

2007-06-25 13:56:50

by Nicolas Ferre

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

Hugh Dickins :
> On Sun, 24 Jun 2007, Russell King wrote:
>> On Sun, Jun 24, 2007 at 11:24:16AM +0100, Hugh Dickins wrote:
>>> On Sun, 24 Jun 2007, Russell King wrote:
>>>> On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote:
>>>>
>>>> Please forward the original problem report.
>>> Done.
>> Okay, that seems to back up my suspicions - it's definitely AT91-based.
>> Since AT91-based machines do not have a DMA coherent cache,
>> arch_is_coherent() must be defined to '0'. The only way that kmalloc
>> could be reached is if that were defined to something other than '0',
>> and if that's done on a machine with DMA incoherent caches, that will
>> lead to data corruption.
>
> Yes, having looked through that now, I agree with you 100%.

AT91 _is_ defined with the arch_is_coherent() switch set to 0 (in
include/asm-arm/memory.h and not overloaded).

>> I think we need to wait for Nicolas to respond on this issue before
>> running headlong into applying a sticky plaster for something which is
>> actually a deeper issue.
>
> No need for Nicolas to respond, I think I've found what's "wrong":
> see below.
>
>> However, the arch_is_coherent() path _is_ buggy as it stands, but in
>> more than the way identified thus far. Eg, it doesn't set __GFP_DMA
>> appropriately for various DMA masks, so it might return DMA inaccessible
>> memory.

Ok it is bad but, in AT91, all memory is accessible with DMA.

[..]

> The slub allocation which gives rise to Nicolas' oops in not in
> ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those
> status = kmalloc(64, GFP_KERNEL);
> where status is passed down for the response from mmc_sd_switch.

True, the oops appears after a mmc switch command (#6 command).

[..]

Not sure I can add much to what Hugh has said. If you need some more
precision anyway, I will try to answer.

Regards,
--
Nicolas Ferre


2007-06-25 14:08:05

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Hugh Dickins wrote:

> And I now rather think that needs to stay, not be replaced by the
> VM_BUG_ON Christoph was proposing for 2.6.23 (which earlier I acked).
>
> Christoph responded to my page_mapping patch by looking at arch/arm,
> and there finding a kmalloc in dma_alloc_coherent which he didn't
> like; but you're right, it's entirely irrelevant to Nicolas' oops.
>
> The slub allocation which gives rise to Nicolas' oops in not in
> ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those
> status = kmalloc(64, GFP_KERNEL);
> where status is passed down for the response from mmc_sd_switch.
>
> And what is wrong with using kmalloc there?
> Why should that be changed to allocate a whole page?
> How many other such cases might there be?

So someone effectively does a flush_dcache_page(virt_to_page(status))?

> In the kmalloc case it's not mapped into userspace: flush_dcache_page
> should detect that and do nothing, as it does with slab; but slub was
> reusing page->mapping for something else, so we oopsed.

If that is the case then what we really want is a flush_dcache_range
not the above. flush_dcache_range does not take a page struct as an
argument and it will work on memory that has no struct page backing it.

Is flush_dcache_range available in all platforms? I see some drivers
using it:

drivers/net/fec.c
drivers/serial/mpsc.c
drivers/char/agp/uninorth-agp.c


flush_dcache_page is implemented by

sparc64 Uses mapping
sh Ok. Only uses PG_mapped
arm Uses mapping in the mmu case
frv Does a kmap_atomic ?? Otherwise looks ok.
ppc Clears PG_arch_1
mips Uses mapping
sh64 No page struct use
parisc Uses mapping
xtensa Uses mapping
powerpc Handles page flags PG_arch_1
ia64 Clears PG_arch_1
sparc Calculates address based on page struct addr.
blackfin Does an immediate page_address(page)
m68k Does an immediate page_address(page)

In many situations the page struct passed to flush_dcache_page is
simply used to calculate the virtual address. So its mostly harmless.
Trouble starts when page attributes like the mapping is used.

So the problem platforms are

sparc64 arm mips parisc xtensa

If we indeed do these weird things then I think the general fix should
be to use flush_dcache_range() but that is too late for 2.6.22. The
VM_BUG_ON will be useful to detect these scenarios. Maybe we need
to replace that with a WARN_ON or something if the usage is frequent?
There are a large number of platforms on which flush_dcache_range has
no effect or an effect that is negligible.

A kmalloc slab object (even 64 byte) may be crossing a page boundary
with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that
flush_dcache_range *must* be used rather than flush_dcache_page.
flush_dcache_page(virt_to_page(object)) takes the starting address of
the object and flushes the page in which the object started. It may
not be the complete object. This usually works fine with 64 byte objects
because they neatly fit into a slab page. Again if CONFIG_SLAB_DEBUG
f.e. is enabled then the alignment will no longer be to a 64 byte bound
but only to the alignment guaranteed by ARCH_KMALLOC_MINALIGN. If this
trick is used on a non kmalloc cache with a non power of size then we
may have a larger chance of trouble occurring.

For 2.6.22 the easiest solution may be to check for PageSlab in the
flush_dcache_pages of the affected platforms and then count on
the users not enabling any slab debugging. Its then simply the same state
as before.

2007-06-25 16:43:06

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Christoph Lameter wrote:
> On Mon, 25 Jun 2007, Hugh Dickins wrote:
>
> > And I now rather think that needs to stay, not be replaced by the
> > VM_BUG_ON Christoph was proposing for 2.6.23 (which earlier I acked).
> >
> > Christoph responded to my page_mapping patch by looking at arch/arm,
> > and there finding a kmalloc in dma_alloc_coherent which he didn't
> > like; but you're right, it's entirely irrelevant to Nicolas' oops.
> >
> > The slub allocation which gives rise to Nicolas' oops in not in
> > ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those
> > status = kmalloc(64, GFP_KERNEL);
> > where status is passed down for the response from mmc_sd_switch.
> >
> > And what is wrong with using kmalloc there?
> > Why should that be changed to allocate a whole page?
> > How many other such cases might there be?
>
> So someone effectively does a flush_dcache_page(virt_to_page(status))?

Yes.

>
> > In the kmalloc case it's not mapped into userspace: flush_dcache_page
> > should detect that and do nothing, as it does with slab; but slub was
> > reusing page->mapping for something else, so we oopsed.
>
> If that is the case then what we really want is a flush_dcache_range
> not the above. flush_dcache_range does not take a page struct as an
> argument and it will work on memory that has no struct page backing it.
>
> Is flush_dcache_range available in all platforms? I see some drivers
> using it:
>
> drivers/net/fec.c
> drivers/serial/mpsc.c
> drivers/char/agp/uninorth-agp.c
>
>
> flush_dcache_page is implemented by
>
> sparc64 Uses mapping
> sh Ok. Only uses PG_mapped
> arm Uses mapping in the mmu case
> frv Does a kmap_atomic ?? Otherwise looks ok.
> ppc Clears PG_arch_1
> mips Uses mapping
> sh64 No page struct use
> parisc Uses mapping
> xtensa Uses mapping
> powerpc Handles page flags PG_arch_1
> ia64 Clears PG_arch_1
> sparc Calculates address based on page struct addr.
> blackfin Does an immediate page_address(page)
> m68k Does an immediate page_address(page)
>
> In many situations the page struct passed to flush_dcache_page is
> simply used to calculate the virtual address. So its mostly harmless.
> Trouble starts when page attributes like the mapping is used.

Mostly harmless indeed. I don't understand why you insist on trying
to complicate the situation. flush_dcache_page is only expected to
do something on pages mapped into userspace (correct me if I'm wrong
there), it's expected to do nothing on kmalloc'ed pages. It's
been working that way for years, and will continue to work that way
with slub, providing either page_mapping or flush_dcache_page checks
PageSlab to avoid oopsing on page->mapping.

2.6.22-rc6 has page_mapping making that check: we could argue about
which is the better site for it, there are good arguments both ways
(page_mapping is the correct place, flush_dcache_page is the more
efficient place), I suggest we leave it as is.

>
> So the problem platforms are
>
> sparc64 arm mips parisc xtensa
>
> If we indeed do these weird things then I think the general fix should
> be to use flush_dcache_range() but that is too late for 2.6.22. The
> VM_BUG_ON will be useful to detect these scenarios. Maybe we need
> to replace that with a WARN_ON or something if the usage is frequent?
> There are a large number of platforms on which flush_dcache_range has
> no effect or an effect that is negligible.
>
> A kmalloc slab object (even 64 byte) may be crossing a page boundary
> with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that
> flush_dcache_range *must* be used rather than flush_dcache_page.

Why???? All we require of flush_dcache_page is that it not oops on
the first page in the range: we don't need to change over to
flush_dcache_range for that.

> flush_dcache_page(virt_to_page(object)) takes the starting address of
> the object and flushes the page in which the object started. It may
> not be the complete object. This usually works fine with 64 byte objects
> because they neatly fit into a slab page. Again if CONFIG_SLAB_DEBUG
> f.e. is enabled then the alignment will no longer be to a 64 byte bound
> but only to the alignment guaranteed by ARCH_KMALLOC_MINALIGN. If this
> trick is used on a non kmalloc cache with a non power of size then we
> may have a larger chance of trouble occurring.
>
> For 2.6.22 the easiest solution may be to check for PageSlab in the
> flush_dcache_pages of the affected platforms and then count on
> the users not enabling any slab debugging. Its then simply the same state
> as before.

We have the PageSlab check in page_mapping now,
I don't see what further change is needed.

Hugh

2007-06-25 17:00:31

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Hugh Dickins wrote:

> > In many situations the page struct passed to flush_dcache_page is
> > simply used to calculate the virtual address. So its mostly harmless.
> > Trouble starts when page attributes like the mapping is used.
>
> Mostly harmless indeed. I don't understand why you insist on trying
> to complicate the situation. flush_dcache_page is only expected to
> do something on pages mapped into userspace (correct me if I'm wrong
> there), it's expected to do nothing on kmalloc'ed pages. It's
> been working that way for years, and will continue to work that way
> with slub, providing either page_mapping or flush_dcache_page checks
> PageSlab to avoid oopsing on page->mapping.

It is definitely intended to work. Otherwise we would not have code
like this:

christoph@fly:~/linux-2.6$ find . -name "*.c" | xargs grep "flush_dcache_page"|grep virt
./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev));
./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev));


> 2.6.22-rc6 has page_mapping making that check: we could argue about
> which is the better site for it, there are good arguments both ways
> (page_mapping is the correct place, flush_dcache_page is the more
> efficient place), I suggest we leave it as is.

Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit
uneasy with that given that its in such a broadly used function while its
only use is to enable flush_dcache_page to work. But we need the general
issue taken care of after 2.6.22.

> > A kmalloc slab object (even 64 byte) may be crossing a page boundary
> > with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that
> > flush_dcache_range *must* be used rather than flush_dcache_page.
>
> Why???? All we require of flush_dcache_page is that it not oops on
> the first page in the range: we don't need to change over to
> flush_dcache_range for that.

As explained about: There are corner cases in which it does not work. You
seem to assume that flush_dcache_page can become a no op. That may not be
true on platforms that need explicit cache flushing for a DMA engine to
access a data structure. The above listed use suggests that the caller
expects flushing to occur correctly.



2007-06-25 17:24:24

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Christoph Lameter wrote:
> On Mon, 25 Jun 2007, Hugh Dickins wrote:
>
> > > In many situations the page struct passed to flush_dcache_page is
> > > simply used to calculate the virtual address. So its mostly harmless.
> > > Trouble starts when page attributes like the mapping is used.
> >
> > Mostly harmless indeed. I don't understand why you insist on trying
> > to complicate the situation. flush_dcache_page is only expected to
> > do something on pages mapped into userspace (correct me if I'm wrong
> > there), it's expected to do nothing on kmalloc'ed pages. It's
> > been working that way for years, and will continue to work that way
> > with slub, providing either page_mapping or flush_dcache_page checks
> > PageSlab to avoid oopsing on page->mapping.
>
> It is definitely intended to work. Otherwise we would not have code
> like this:
>
> christoph@fly:~/linux-2.6$ find . -name "*.c" | xargs grep "flush_dcache_page"|grep virt
> ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev));
> ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev));

I didn't claim that flush_dcache_page(virt_to_page(virt)) is not expected
to work. I claim that flush_dcache_page is expected to be a noop rather
than an oops on a kmalloced page.

> > 2.6.22-rc6 has page_mapping making that check: we could argue about
> > which is the better site for it, there are good arguments both ways
> > (page_mapping is the correct place, flush_dcache_page is the more
> > efficient place), I suggest we leave it as is.
>
> Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit
> uneasy with that given that its in such a broadly used function while its
> only use is to enable flush_dcache_page to work. But we need the general
> issue taken care of after 2.6.22.

What general issue?

>
> > > A kmalloc slab object (even 64 byte) may be crossing a page boundary
> > > with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that
> > > flush_dcache_range *must* be used rather than flush_dcache_page.
> >
> > Why???? All we require of flush_dcache_page is that it not oops on
> > the first page in the range: we don't need to change over to
> > flush_dcache_range for that.
>
> As explained about: There are corner cases in which it does not work. You
> seem to assume that flush_dcache_page can become a no op. That may not be
> true on platforms that need explicit cache flushing for a DMA engine to
> access a data structure. The above listed use suggests that the caller
> expects flushing to occur correctly.

The scsi_tgt_if.c use you show above? That's not dealing with
kmalloced pages, is it? True, the pages it is dealing with don't
have page->mapping set, so those architectures which use page->mapping
to find what to do in their flush_dcache_page, won't do anything there
in their flush_dcache_page. Whether that's a bug or not, I wouldn't
pretend to know; but it's nothing to do with the present discussion.

Please see Documentation/cachetlb.txt: flush_dcache_page is about
pagecache pages mapped into userspace. We don't use kmalloc for those,
but we do sometimes need to flush_dcache_page in places which commonly
deal with pagecache pages, but sometimes handle kmalloc'ed buffers too.
Luckily we don't have to deal with buffers in which the first page is
kmalloced and the next comes from pagecache.

Hugh

2007-06-25 18:23:47

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Hugh Dickins wrote:

> > > PageSlab to avoid oopsing on page->mapping.
> >
> > It is definitely intended to work. Otherwise we would not have code
> > like this:
> >
> > christoph@fly:~/linux-2.6$ find . -name "*.c" | xargs grep "flush_dcache_page"|grep virt
> > ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev));
> > ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev));
>
> I didn't claim that flush_dcache_page(virt_to_page(virt)) is not expected
> to work. I claim that flush_dcache_page is expected to be a noop rather
> than an oops on a kmalloced page.

There are no kmalloced pages. There is only kmalloced memory. You allocate
pages from the page allocator. Its a layering violation to expect a page
struct operation on a slab object to work.

It is not okay to use a page cache function on a slab object. The slab
object does not fulfill the alignment requirements of a page cache page
nor does it have a compatible page struct content as a page cache page.
This can only cause trouble.

The problem is that the code is allocating some slab memory and then
determines the page struct pointer and then hands it back to the DMA
layer?

> > Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit
> > uneasy with that given that its in such a broadly used function while its
> > only use is to enable flush_dcache_page to work. But we need the general
> > issue taken care of after 2.6.22.
>
> What general issue?

How to flush slab objects in a reliable way.

> Please see Documentation/cachetlb.txt: flush_dcache_page is about
> pagecache pages mapped into userspace. We don't use kmalloc for those,
> but we do sometimes need to flush_dcache_page in places which commonly
> deal with pagecache pages, but sometimes handle kmalloc'ed buffers too.
> Luckily we don't have to deal with buffers in which the first page is
> kmalloced and the next comes from pagecache.

This gets crazier and crazier. flush_dcache_page is for pages not for
allocated buffers via kmalloc. So this has nothing to do with any need
to flush slab objects?

Sometimes we do this and then we do that? So someone played loose
ball with the slab, was successful and that makes it right now?

We need the VM_BUG_ON in include/linux/mm.h to stop this nonsense.

The "sometimes we have kmalloced buffers" locations need to be fixed.

2007-06-25 18:44:39

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Christoph Lameter wrote:
> On Mon, 25 Jun 2007, Hugh Dickins wrote:
> >
> > I didn't claim that flush_dcache_page(virt_to_page(virt)) is not expected
> > to work. I claim that flush_dcache_page is expected to be a noop rather
> > than an oops on a kmalloced page.
>
> There are no kmalloced pages. There is only kmalloced memory. You allocate
> pages from the page allocator. Its a layering violation to expect a page
> struct operation on a slab object to work.

Oh, thanks so much for resolving my confusion on that.

>
> It is not okay to use a page cache function on a slab object. The slab
> object does not fulfill the alignment requirements of a page cache page
> nor does it have a compatible page struct content as a page cache page.
> This can only cause trouble.
>
> The problem is that the code is allocating some slab memory and then
> determines the page struct pointer and then hands it back to the DMA
> layer?
>
> > > Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit
> > > uneasy with that given that its in such a broadly used function while its
> > > only use is to enable flush_dcache_page to work. But we need the general
> > > issue taken care of after 2.6.22.
> >
> > What general issue?
>
> How to flush slab objects in a reliable way.
>
> > Please see Documentation/cachetlb.txt: flush_dcache_page is about
> > pagecache pages mapped into userspace. We don't use kmalloc for those,
> > but we do sometimes need to flush_dcache_page in places which commonly
> > deal with pagecache pages, but sometimes handle kmalloc'ed buffers too.
> > Luckily we don't have to deal with buffers in which the first page is
> > kmalloced and the next comes from pagecache.
>
> This gets crazier and crazier. flush_dcache_page is for pages not for
> allocated buffers via kmalloc. So this has nothing to do with any need
> to flush slab objects?
>
> Sometimes we do this and then we do that? So someone played loose
> ball with the slab, was successful and that makes it right now?
>
> We need the VM_BUG_ON in include/linux/mm.h to stop this nonsense.
>
> The "sometimes we have kmalloced buffers" locations need to be fixed.

I've said enough, I'd better leave it to others to deter you or not
from fiddling around pointlessly here.

But Andrew, please cancel my Acked-by to mm's
add-vm_bug_on-in-case-someone-uses-page_mapping-on-a-slab-page.patch

Thanks,
Hugh

2007-06-25 18:50:59

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Hugh Dickins wrote:

> > The "sometimes we have kmalloced buffers" locations need to be fixed.
>
> I've said enough, I'd better leave it to others to deter you or not
> from fiddling around pointlessly here.

Are there any locations left after the two fixes to pa-risc and arm?

If the ARM fix solves the issue then we may not need that special casing
anymore. Maybe the sometimes has become never?



2007-06-25 19:05:07

by Hugh Dickins

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Christoph Lameter wrote:
> On Mon, 25 Jun 2007, Hugh Dickins wrote:
>
> > > The "sometimes we have kmalloced buffers" locations need to be fixed.
> >
> > I've said enough, I'd better leave it to others to deter you or not
> > from fiddling around pointlessly here.
>
> Are there any locations left after the two fixes to pa-risc and arm?
>
> If the ARM fix solves the issue then we may not need that special casing
> anymore. Maybe the sometimes has become never?

Please reread mails of the last 20 hours. Your "ARM fix" may or may not
be a good thing, I don't know. But it had nothing to do with this oops,
which (we believe) came from a kmalloc in drivers/mmc/core/sd.c. There
are likely to be many other drivers which pass down kmalloc'ed buffers
to routines which handle both kmalloc'ed buffers and page buffers.
Though very few of those needing to flush_dcache_page.

Hugh

2007-06-26 18:09:59

by Christoph Lameter

[permalink] [raw]
Subject: Re: Oops in a driver while using SLUB as a SLAB allocator

On Mon, 25 Jun 2007, Hugh Dickins wrote:

> Please reread mails of the last 20 hours. Your "ARM fix" may or may not
> be a good thing, I don't know. But it had nothing to do with this oops,
> which (we believe) came from a kmalloc in drivers/mmc/core/sd.c. There
> are likely to be many other drivers which pass down kmalloc'ed buffers
> to routines which handle both kmalloc'ed buffers and page buffers.
> Though very few of those needing to flush_dcache_page.

If that is so then those functions may want to check PageSlab before
calling flush_dcache_page.