Date: Tue, 29 Jul 2008 04:26:24 +0400
From: Alexey Dobriyan
To: Johannes Weiner
Cc: akpm@osdl.org, torvalds@osdl.org, npiggin@suse.de, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86: do not overrun page table ranges in gup
Message-ID: <20080729002624.GA5193@martell.zuzino.mipt.ru>
References: <20080728184947.GA5041@martell.zuzino.mipt.ru> <20080728185316.GA19479@martell.zuzino.mipt.ru> <87y73l5zlj.fsf_-_@saeurebad.de>
In-Reply-To: <87y73l5zlj.fsf_-_@saeurebad.de>

On Tue, Jul 29, 2008 at 02:00:08AM +0200, Johannes Weiner wrote:
> Alexey Dobriyan writes:
>
> > On Mon, Jul 28, 2008 at 10:49:47PM +0400, Alexey Dobriyan wrote:
> >> Version: 2.6.26-837b41b5de356aa67abb2cadb5eef3efc7776f91
> >> Core2 Duo, x86_64, 4 GB of RAM.
> >>
> >> Kernel is "tainted" with ZFS driver, but it can do so little, and
> >> probability of screwup is very little too. :-)
> >>
> >>
> >> Long LTP session finally ended with
> >>
> >> BUG: unable to handle kernel paging request at ffff88012b60c000
> >> IP: [] gup_pte_range+0x54/0x120
> >> PGD 202063 PUD a067 PMD 17cedc163 PTE 800000012b60c160
> >> Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> >> CPU 0
> >> Modules linked in: zfs iptable_raw xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 ip_tables x_tables nf_conntrack_irc nf_conntrack fuse usblp uhci_hcd ehci_hcd usbcore sr_mod cdrom [last unloaded: zfs]
> >> Pid: 16863, comm: vmsplice01 Tainted: G W 2.6.26-zfs #2
> >> RIP: 0010:[] [] gup_pte_range+0x54/0x120
> >> RSP: 0018:ffff88012ff57c68 EFLAGS: 00010096
> >> RAX: 0000000000000008 RBX: 00007fff4a800000 RCX: 0000000000000001
> >> RDX: ffffe200040b5f00 RSI: 00007fff4a800310 RDI: ffff88012b60c000
> >> RBP: ffff88012ff57c78 R08: 0000000000000005 R09: ffff88012ff57cec
> >> R10: 0000000000000024 R11: 0000000000000205 R12: ffff88012ff57e58
> >> R13: 00007fff4a807310 R14: 00007fff4a80730f R15: ffff88012ff57e58
> >> FS: 00007fbb4280b6f0(0000) GS:ffffffff805dec40(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: ffff88012b60c000 CR3: 000000017e294000 CR4: 00000000000006e0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> Process vmsplice01 (pid: 16863, threadinfo ffff88012ff56000, task ffff88015f9db360)
> >> Stack: 00007fff4a800000 ffff88010e6cf298 ffff88012ff57d18 ffffffff802243cb
> >>  0000000000000002 ffff88015f9db360 0000000004f23a08 00007fff4a7f7310
> >>  ffff88017d582880 00007fff4a807310 00007fff4a807310 ffff88017e2947f8
> >> Call Trace:
> >>  [] get_user_pages_fast+0x1db/0x300
> >>  [] sys_vmsplice+0x32d/0x420
> >>  [] ? unlock_page+0x2d/0x40
> >>  [] ? __do_fault+0x1c8/0x450
> >>  [] ? __up_read+0x4c/0xb0
> >>  [] ? up_read+0x26/0x30
> >>  [] ? spd_release_page+0x0/0x20
> >>  [] ? lockdep_sys_exit_thunk+0x35/0x67
> >>  [] system_call_fastpath+0x16/0x1b
> >
> > Very reproducible and ZFS driver doesn't matter:
> >
> > 	# vmsplice01 from LTP 20080630
> > 	# while true; do ./vmsplice01; done
>
> I think this is the right fix, but my thinking might be buggy, please
> verify.
>
> ---
> From: Johannes Weiner
> Subject: x86: do not overrun page table ranges in gup
>
> Walking the addresses in PAGE_SIZE steps and checking for addr != end
> assumes that the distance between addr and end is a multiple of
> PAGE_SIZE.
>
> This is not guaranteed, though, if the end of this level walk is not the
> total end (for which we know that the distance to the starting address
> is a multiple of PAGE_SIZE) but that of the range the upper level entry
> maps.

This immediately triggers kernel BUG at arch/x86/mm/gup.c:267!

> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -92,7 +92,7 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
>  		pages[*nr] = page;
>  		(*nr)++;
>
> -	} while (ptep++, addr += PAGE_SIZE, addr != end);
> +	} while (ptep++, addr += PAGE_SIZE, addr <= end);
>  	pte_unmap(ptep - 1);
>
>  	return 1;
> @@ -131,7 +131,7 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
>  		(*nr)++;
>  		page++;
>  		refs++;
> -	} while (addr += PAGE_SIZE, addr != end);
> +	} while (addr += PAGE_SIZE, addr <= end);
>  	get_head_page_multiple(head, refs);
>
>  	return 1;
> @@ -188,7 +188,7 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
>  		(*nr)++;
>  		page++;
>  		refs++;
> -	} while (addr += PAGE_SIZE, addr != end);
> +	} while (addr += PAGE_SIZE, addr <= end);
>  	get_head_page_multiple(head, refs);
>
>  	return 1;