Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752521Ab3GADSw (ORCPT ); Sun, 30 Jun 2013 23:18:52 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:36376 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751890Ab3GADSv (ORCPT );
	Sun, 30 Jun 2013 23:18:51 -0400
Message-ID: <51D0F514.3070309@oracle.com>
Date: Mon, 01 Jul 2013 11:18:44 +0800
From: Joe Jin
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6
MIME-Version: 1.0
To: Alex Bligh
CC: Eric Dumazet, Frank Blaschka, "David S. Miller",
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	zheng.x.li@oracle.com, Xen Devel, Ian Campbell, Jan Beulich,
	Stefano Stabellini
Subject: Re: kernel panic in skb_copy_bits
References: <51CBAA48.3080802@oracle.com> <1372311118.3301.214.camel@edumazet-glaptop>
	<51CD0E67.4000008@oracle.com> <6BFD5AF235F72F13CE646A0D@nimrod.local>
In-Reply-To: <6BFD5AF235F72F13CE646A0D@nimrod.local>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Source-IP: acsinet21.oracle.com [141.146.126.237]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3315
Lines: 87

On 06/30/13 17:13, Alex Bligh wrote:
> 
> 
> --On 28 June 2013 12:17:43 +0800 Joe Jin wrote:
> 
>> Found a similar issue:
>> http://www.gossamer-threads.com/lists/xen/devel/265611
>> So copied to the Xen developers as well.
> 
> I thought this sounded familiar. I haven't got the start of this
> thread, but what version of Xen are you running and what device
> model? If before 4.3, there is a page lifetime bug in the kernel
> (not the xen code) which can affect anything where the guest accesses
> the host's block stack and that in turn accesses the networking
> stack (it may in fact be wider than that). So, e.g. domU on
> iSCSI will do it. It tends to get triggered by a TCP retransmit
> or (on NFS) the RPC equivalent.
> Essentially block operation
> is considered complete, returning through xen and freeing the
> grant table entry, and yet something in the kernel (e.g. tcp
> retransmit) can still access the data. The nature of the bug
> is extensively discussed in that thread - you'll also find
> a reference to a thread on linux-nfs which concludes it
> isn't an nfs problem, and even some patches to fix it in the
> kernel adding reference counting.

Do you know if there is a fix for the above? So far we also suspect that
the grant page is being unmapped too early. We are using 4.1 stable in
our tests.

> 
> A workaround is to turn off O_DIRECT use by Xen as that ensures
> the pages are copied. Xen 4.3 does this by default.
> 
> I believe fixes for this are in 4.3 and 4.2.2 if using the
> qemu upstream DM. Note these aren't real fixes, just a workaround
> of a kernel bug.

The guest is PVM and the disk model is xvbd. The guest config file is
below:

vif = ['mac=00:21:f6:00:00:01,bridge=c0a80b00']
OVM_simple_name = 'Guest#1'
disk = ['file:/OVS/Repositories/0004fb000003000091e9eae94d1e907c/VirtualDisks/0004fb0000120000f78799dad800ef47.img,xvda,w',
        'phy:/dev/mapper/360060e8010141870058b415700000002,xvdb,w',
        'phy:/dev/mapper/360060e8010141870058b415700000003,xvdc,w']
bootargs = ''
uuid = '0004fb00-0006-0000-2b00-77a4766001ed'
on_reboot = 'restart'
cpu_weight = 27500
OVM_os_type = 'Oracle Linux 5'
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
memory = 4096
OVM_description = ''
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
guest_os_type = 'linux'
name = '0004fb00000600002b0077a4766001ed'
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'

> 
> To fix on a local build of xen you will need something like this:
> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> and something like this (NB: obviously insert your own git
> repo and commit numbers)
> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
> 

I think this is only for pvhvm/hvm?

Thanks,
Joe

> Also note those fixes are (technically) unsafe for live migration
> unless there is an ordering change made in qemu's block open
> call.
> 
> Of course this might be something completely different.
> 
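
The page lifetime problem described in the quoted text can be modelled roughly as below. This is only a toy sketch in Python, not kernel or Xen code: `GrantPage`, `simulate` and the "ok"/"fault" results are hypothetical names made up for illustration. It assumes the simplest reading of the thread, namely that block completion releases the granted page while a later TCP retransmit still reads it, and that the proposed kernel patches avoid this by reference counting.

```python
class GrantPage:
    """Toy stand-in for a guest page granted to the backend domain."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.mapped = True
        self.refs = 1  # reference owned by the in-flight block request

    def get(self) -> None:
        """Take an extra reference (e.g. for a queued network buffer)."""
        self.refs += 1

    def put(self) -> None:
        """Drop a reference; unmap only when the last user is gone."""
        self.refs -= 1
        if self.refs == 0:
            self.mapped = False

    def read(self) -> bytes:
        if not self.mapped:
            raise RuntimeError("page accessed after grant was unmapped")
        return self.data


def simulate(refcounted: bool) -> str:
    """Complete a block request, then let a late retransmit read the page."""
    page = GrantPage(b"block payload")

    if refcounted:
        page.get()  # network stack pins the page for a possible retransmit

    # Block layer reports completion.
    if refcounted:
        page.put()           # drops only the block request's reference
    else:
        page.mapped = False  # buggy model: unmap regardless of other users

    try:
        page.read()  # late TCP retransmit touches the same page
        return "ok"
    except RuntimeError:
        return "fault"


if __name__ == "__main__":
    print("without refcount:", simulate(refcounted=False))
    print("with refcount:   ", simulate(refcounted=True))
```

In this model the O_DIRECT workaround would correspond to the network path reading a private copy of `data` instead of the granted page, so the unmap at completion no longer matters.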