2014-02-25 19:32:57

by Sasha Levin

Subject: mm: NULL ptr deref in balance_dirty_pages_ratelimited

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next kernel, I've stumbled on the
following spew:

[ 232.869443] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 232.870230] IP: [<mm/page-writeback.c:1612>] balance_dirty_pages_ratelimited+0x1e/0x150
[ 232.870230] PGD 586e1d067 PUD 586e1e067 PMD 0
[ 232.870230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 232.870230] Dumping ftrace buffer:
[ 232.870230] (ftrace buffer empty)
[ 232.870230] Modules linked in:
[ 232.870230] CPU: 36 PID: 9707 Comm: trinity-c36 Tainted: G W 3.14.0-rc4-next-20140225-sasha-00010-ga117461 #42
[ 232.870230] task: ffff880586dfb000 ti: ffff880586e34000 task.ti: ffff880586e34000
[ 232.870230] RIP: 0010:[<mm/page-writeback.c:1612>] [<mm/page-writeback.c:1612>] balance_dirty_pages_ratelimited+0x1e/0x150
[ 232.870230] RSP: 0000:ffff880586e35c58 EFLAGS: 00010282
[ 232.870230] RAX: 0000000000000000 RBX: ffff880582831361 RCX: 0000000000000007
[ 232.870230] RDX: 0000000000000007 RSI: ffff880586dfbcc0 RDI: ffff880582831361
[ 232.870230] RBP: ffff880586e35c78 R08: 0000000000000000 R09: 0000000000000000
[ 232.870230] R10: 0000000000000001 R11: 0000000000000001 R12: 00007f58007ee000
[ 232.870230] R13: ffff880c8d6d4f70 R14: 0000000000000200 R15: ffff880c8dcce710
[ 232.870230] FS: 00007f58018bb700(0000) GS:ffff880c8e800000(0000) knlGS:0000000000000000
[ 232.870230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 232.870230] CR2: 0000000000000020 CR3: 0000000586e1c000 CR4: 00000000000006e0
[ 232.870230] Stack:
[ 232.870230] ffff880586e35c78 ffff880586e33400 00007f58007ee000 ffff880c8d6d4f70
[ 232.870230] ffff880586e35cd8 ffffffff8127d241 0000000000000001 0000000000000001
[ 232.870230] 0000000000000000 ffffea0032337080 0000000080000000 ffff880586e33400
[ 232.870230] Call Trace:
[ 232.870230] [<mm/memory.c:3467>] do_shared_fault+0x1a1/0x1f0
[ 232.870230] [<mm/memory.c:3487>] handle_pte_fault+0xc8/0x230
[ 232.870230] [<arch/x86/include/asm/preempt.h:98>] ? delay_tsc+0xea/0x110
[ 232.870230] [<mm/memory.c:3770>] __handle_mm_fault+0x36e/0x3a0
[ 232.870230] [<include/linux/rcupdate.h:829>] ? rcu_read_unlock+0x5d/0x60
[ 232.870230] [<include/linux/memcontrol.h:148>] handle_mm_fault+0x10b/0x1b0
[ 232.870230] [<arch/x86/mm/fault.c:1147>] ? __do_page_fault+0x2e2/0x590
[ 232.870230] [<arch/x86/mm/fault.c:1214>] __do_page_fault+0x551/0x590
[ 232.870230] [<kernel/sched/cputime.c:681>] ? vtime_account_user+0x91/0xa0
[ 232.870230] [<arch/x86/include/asm/atomic.h:26>] ? context_tracking_user_exit+0xa8/0x1c0
[ 232.870230] [<arch/x86/include/asm/preempt.h:98>] ? _raw_spin_unlock+0x30/0x50
[ 232.870230] [<kernel/sched/cputime.c:681>] ? vtime_account_user+0x91/0xa0
[ 232.870230] [<arch/x86/include/asm/atomic.h:26>] ? context_tracking_user_exit+0xa8/0x1c0
[ 232.870230] [<arch/x86/include/asm/atomic.h:26>] do_page_fault+0x3d/0x70
[ 232.870230] [<arch/x86/kernel/kvm.c:263>] do_async_page_fault+0x35/0x100
[ 232.870230] [<arch/x86/kernel/entry_64.S:1496>] async_page_fault+0x28/0x30
[ 232.870230] Code: 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 48 89 fb 48 8b 87 50 01 00 00 <f6> 40 20 01 0f 85 18 01 00 00 65 48 8b 14 25 40 da 00 00 44 8b
[ 232.870230] RIP [<mm/page-writeback.c:1612>] balance_dirty_pages_ratelimited+0x1e/0x150
[ 232.870230] RSP <ffff880586e35c58>
[ 232.870230] CR2: 0000000000000020


Thanks,
Sasha


2014-02-26 07:15:10

by Bob Liu

Subject: Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

On Wed, Feb 26, 2014 at 3:32 AM, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel, I've stumbled on the following spew:
>
> [ 232.869443] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000020
> [ 232.870230] IP: [<mm/page-writeback.c:1612>]
> balance_dirty_pages_ratelimited+0x1e/0x150
> [...]

Could you please test the patch below? I think it may fix this issue.

diff --git a/mm/memory.c b/mm/memory.c
index 548d97e..90cea22 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3419,6 +3419,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
 {
 	struct page *fault_page;
+	struct address_space *mapping;
 	spinlock_t *ptl;
 	pte_t *pte;
 	int dirtied = 0;
@@ -3454,13 +3455,14 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (set_page_dirty(fault_page))
 		dirtied = 1;
+	mapping = fault_page->mapping;
 	unlock_page(fault_page);
-	if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {
+	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
 		/*
 		 * Some device drivers do not set page.mapping but still
 		 * dirty their pages
 		 */
-		balance_dirty_pages_ratelimited(fault_page->mapping);
+		balance_dirty_pages_ratelimited(mapping);
 	}
 
 	/* file_update_time outside page_lock */

2014-02-26 14:09:59

by Kirill A. Shutemov

Subject: Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

On Wed, Feb 26, 2014 at 03:15:07PM +0800, Bob Liu wrote:
> Could you please test the patch below? I think it may fix this issue.

What stops the compiler from transforming this back into the unpatched code?
Do you rely on unlock_page() to have a compiler barrier?

>
> diff --git a/mm/memory.c b/mm/memory.c
> index 548d97e..90cea22 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3419,6 +3419,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
>  {
>  	struct page *fault_page;
> +	struct address_space *mapping;
>  	spinlock_t *ptl;
>  	pte_t *pte;
>  	int dirtied = 0;
> @@ -3454,13 +3455,14 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  
>  	if (set_page_dirty(fault_page))
>  		dirtied = 1;
> +	mapping = fault_page->mapping;
>  	unlock_page(fault_page);
> -	if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {
> +	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
>  		/*
>  		 * Some device drivers do not set page.mapping but still
>  		 * dirty their pages
>  		 */
> -		balance_dirty_pages_ratelimited(fault_page->mapping);
> +		balance_dirty_pages_ratelimited(mapping);
>  	}
>  
>  	/* file_update_time outside page_lock */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Kirill A. Shutemov
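Kirill's question turns on whether something between the load of fault_page->mapping and its use acts as a compiler barrier. A minimal userspace sketch of such a barrier, assuming GCC or Clang inline-asm syntax; `compiler_barrier()`, `struct fake_page` and `snapshot_then_unlock()` are hypothetical stand-ins, not kernel code. Because the "memory" clobber tells the compiler that memory may have changed, re-loading page->mapping after the barrier could yield a different value, so the compiler must keep using the copy it loaded before. This constrains the compiler only, not CPU reordering.

```c
#include <assert.h>

/*
 * Compiler-only barrier in the style of the kernel's barrier() for GCC:
 * it emits no instruction, but the "memory" clobber makes the compiler
 * assume any memory may have changed, so it cannot reuse or re-issue
 * loads across it.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for struct page: a lock flag and a mapping. */
struct fake_page {
	int locked;
	void *mapping;
};

/*
 * Hypothetical helper: snapshot page->mapping, then "unlock". A fresh
 * load of page->mapping after the barrier is not equivalent to the
 * snapshot, so the compiler has to return the value loaded before it.
 */
static void *snapshot_then_unlock(struct fake_page *page)
{
	void *mapping = page->mapping;

	compiler_barrier();
	page->locked = 0;	/* stand-in for unlock_page() */
	return mapping;
}
```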

2014-02-26 14:48:32

by Bob Liu

Subject: Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

On Wed, Feb 26, 2014 at 10:09 PM, Kirill A. Shutemov wrote:
> On Wed, Feb 26, 2014 at 03:15:07PM +0800, Bob Liu wrote:
>> Could you please test the patch below? I think it may fix this issue.
>
> What stops the compiler from transforming this back into the unpatched code?

Sorry, my fault. I'll prepare a proper patch later.

> Do you rely on unlock_page() to have a compiler barrier?
>

Before your commit, mapping was a local variable that was assigned before
unlock_page():
struct address_space *mapping = page->mapping;
unlock_page(dirty_page);
put_page(dirty_page);
if ((dirtied || page_mkwrite) && mapping) {


I'm afraid that "fault_page->mapping" can now be changed to NULL after the check
"if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {"
has passed, resulting in a call to balance_dirty_pages_ratelimited(NULL).

--
Regards,
--Bob
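The window Bob describes can be replayed deterministically in a single thread. In this sketch, the hypothetical `truncate_hook()` callback plays the role of a truncation that clears page->mapping on another CPU once the page lock has been dropped; `struct fake_page`, `struct call_record`, `double_read()` and `snapshot_read()` are illustrative, not kernel types or functions. The unpatched shape (check fault_page->mapping, then read it again) ends up "calling" with NULL, while the snapshot taken under the lock does not. It does not model what the compiler may do with the local variable, which is the subject of Kirill's question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	void *mapping;
};

/* Records what would have been handed to balance_dirty_pages_ratelimited(). */
struct call_record {
	int called;
	void *arg;
};

/* Hook run at the racy point; here it models truncation on another CPU. */
typedef void (*race_hook)(struct fake_page *);

static void truncate_hook(struct fake_page *page)
{
	page->mapping = NULL;	/* truncation clears page->mapping */
}

/* Unpatched shape: test fault_page->mapping after unlock, then re-read it. */
static void double_read(struct fake_page *page, race_hook hook,
			struct call_record *rec)
{
	/* unlock_page() has already run, so truncation may proceed */
	if (page->mapping) {
		hook(page);			/* lands between the two reads */
		rec->called = 1;
		rec->arg = page->mapping;	/* second read sees the hook's store */
	}
}

/* Bob's shape: a single read while the page is still locked. */
static void snapshot_read(struct fake_page *page, race_hook hook,
			  struct call_record *rec)
{
	void *mapping = page->mapping;	/* read before unlock_page() */

	/* unlock_page() would run here */
	if (mapping) {
		hook(page);
		rec->called = 1;
		rec->arg = mapping;		/* local copy is unaffected */
	}
}
```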

2014-02-26 15:26:51

by Kirill A. Shutemov

Subject: Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

On Wed, Feb 26, 2014 at 10:48:30PM +0800, Bob Liu wrote:
> > Do you rely on unlock_page() to have a compiler barrier?
> >
>
> Before your commit, mapping was a local variable that was assigned before
> unlock_page():
> struct address_space *mapping = page->mapping;
> unlock_page(dirty_page);
> put_page(dirty_page);
> if ((dirtied || page_mkwrite) && mapping) {
>
>
> I'm afraid that "fault_page->mapping" can now be changed to NULL after the check
> "if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {"
> has passed, resulting in a call to balance_dirty_pages_ratelimited(NULL).

I see what you're trying to fix. I wonder if we need to do

mapping = ACCESS_ONCE(fault_page->mapping);

instead.

The question is whether the compiler can, on its own, eliminate the intermediate
variable and dereference fault_page->mapping twice, as the code with my patch does.
I ask because smp_mb__after_clear_bit() in unlock_page() does nothing on
some architectures.

--
Kirill A. Shutemov
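Kirill's alternative forces a single load. A minimal sketch, assuming GCC or Clang: the macro below has the same shape as the kernel's ACCESS_ONCE() of this period, spelled with `__typeof__` so it also builds under -std=c99/c11; `struct fake_page` and `read_mapping_once()` are hypothetical. A volatile access must be performed exactly as written, so the local copy cannot be replaced by a second read of page->mapping.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Same shape as the kernel's ACCESS_ONCE(): the access goes through a
 * volatile lvalue, so the compiler must perform it exactly once and may
 * not duplicate, elide or re-issue it later.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct page. */
struct fake_page {
	void *mapping;
};

/* Hypothetical helper: the single load proposed for do_shared_fault(). */
static void *read_mapping_once(struct fake_page *page)
{
	return ACCESS_ONCE(page->mapping);
}
```

Later kernels replaced ACCESS_ONCE() with READ_ONCE() and WRITE_ONCE().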

2014-02-26 15:45:42

by Paul E. McKenney

Subject: Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

On Wed, Feb 26, 2014 at 05:20:51PM +0200, Kirill A. Shutemov wrote:
> On Wed, Feb 26, 2014 at 10:48:30PM +0800, Bob Liu wrote:
> > > Do you rely on unlock_page() to have a compiler barrier?
> > >
> >
> > Before your commit, mapping was a local variable that was assigned before
> > unlock_page():
> > struct address_space *mapping = page->mapping;
> > unlock_page(dirty_page);
> > put_page(dirty_page);
> > if ((dirtied || page_mkwrite) && mapping) {
> >
> >
> > I'm afraid that "fault_page->mapping" can now be changed to NULL after the check
> > "if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {"
> > has passed, resulting in a call to balance_dirty_pages_ratelimited(NULL).
>
> I see what you're trying to fix. I wonder if we need to do
>
> mapping = ACCESS_ONCE(fault_page->mapping);
>
> instead.
>
> The question is whether the compiler can, on its own, eliminate the intermediate
> variable and dereference fault_page->mapping twice, as the code with my patch does.
> I ask because smp_mb__after_clear_bit() in unlock_page() does nothing on
> some architectures.

The compiler is most definitely within its rights to eliminate intermediate
variables if you don't use something like ACCESS_ONCE(). For more info,
see the LWN writeup: http://lwn.net/Articles/508991/

Thanx, Paul

2014-02-26 15:47:53

by Peter Zijlstra

Subject: Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

On Wed, Feb 26, 2014 at 05:20:51PM +0200, Kirill A. Shutemov wrote:
> On Wed, Feb 26, 2014 at 10:48:30PM +0800, Bob Liu wrote:
> > > Do you rely on unlock_page() to have a compiler barrier?
> > >
> >
> > Before your commit, mapping was a local variable that was assigned before
> > unlock_page():
> > struct address_space *mapping = page->mapping;
> > unlock_page(dirty_page);
> > put_page(dirty_page);
> > if ((dirtied || page_mkwrite) && mapping) {
> >
> >
> > I'm afraid that "fault_page->mapping" can now be changed to NULL after the check
> > "if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {"
> > has passed, resulting in a call to balance_dirty_pages_ratelimited(NULL).
>
> I see what you're trying to fix. I wonder if we need to do
>
> mapping = ACCESS_ONCE(fault_page->mapping);
>
> instead.
>
> The question is whether the compiler can, on its own, eliminate the intermediate
> variable and dereference fault_page->mapping twice, as the code with my patch does.
> I ask because smp_mb__after_clear_bit() in unlock_page() does nothing on
> some architectures.

That's a bug, and I have patches for that. That said, this only affects ia64
and sparc32. ia64 has an actual full memory barrier in there, very much
including a compiler fence, and sparc32 atomics have one too.

In general, any atomic RMW op also implies a compiler fence. This
includes clear_bit().

That said, unlock_page() should have RELEASE semantics, which also
enforces that the read of page->mapping stays before the unlock_page().
The second use of mapping may leak into the locked region, but it may
not be re-read after the unlock.
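Peter's last point can be illustrated with C11 atomics as a stand-in for the kernel's primitives. In this sketch, assuming a hypothetical `struct fake_page` whose bit 0 of `flags` plays the role of PG_locked, `fake_unlock_page()` is an atomic read-modify-write with release ordering, loosely modeled on the kernel's clear_bit_unlock(). The load of page->mapping cannot sink below the release, and later code uses only the local copy; accesses after the unlock may still move up into the locked region, which is harmless here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: bit 0 of flags plays the role of PG_locked. */
#define FAKE_PG_LOCKED 1UL

struct fake_page {
	atomic_ulong flags;
	void *mapping;
};

/*
 * Unlock as an atomic RMW with release ordering: no load or store that
 * precedes it in program order may be moved after it, by either the
 * compiler or the CPU.
 */
static void fake_unlock_page(struct fake_page *page)
{
	atomic_fetch_and_explicit(&page->flags, ~FAKE_PG_LOCKED,
				  memory_order_release);
}

/* Tail of the fault path: read mapping while locked, use it afterwards. */
static void *fault_tail(struct fake_page *page)
{
	void *mapping = page->mapping;	/* cannot sink below the release */

	fake_unlock_page(page);
	return mapping;			/* only the local copy is used here */
}
```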