Subject: Re: BUG due to "xen-netback: protect resource cleaning on XenBus disconnect"
From: Boris Ostrovsky
To: Juergen Gross, igor.druzhinin@citrix.com, xen-devel, Linux Kernel Mailing List, netdev@vger.kernel.org
Cc: David Miller, Wei Liu, Paul Durrant
Date: Thu, 2 Mar 2017 09:25:35 -0500
Message-ID: <969001fa-d76f-801a-28c9-9b65d06cb351@oracle.com>
In-Reply-To: <75c81731-e4a7-bde1-c4fd-a52e97b820a0@suse.com>

On 03/02/2017 06:56 AM, Juergen Gross wrote:
> With commits f16f1df65 and 9a6cdf52b we get in our Xen testing:
>
> [ 174.512861] switch: port 2(vif3.0) entered disabled state
> [ 174.522735] BUG: sleeping function called from invalid context at
> /home/build/linux-linus/mm/vmalloc.c:1441
> [ 174.523451] in_atomic(): 1, irqs_disabled(): 0, pid: 28, name: xenwatch
> [ 174.524131] CPU: 1 PID: 28 Comm: xenwatch Tainted: G W
> 4.10.0upstream-11073-g4977ab6-dirty #1
> [ 174.524819] Hardware name: MSI MS-7680/H61M-P23 (MS-7680), BIOS V17.0
> 03/14/2011
> [ 174.525517] Call Trace:
> [ 174.526217] show_stack+0x23/0x60
> [ 174.526899] dump_stack+0x5b/0x88
> [ 174.527562] ___might_sleep+0xde/0x130
> [ 174.528208] __might_sleep+0x35/0xa0
> [ 174.528840] ? _raw_spin_unlock_irqrestore+0x13/0x20
> [ 174.529463] ? __wake_up+0x40/0x50
> [ 174.530089] remove_vm_area+0x20/0x90
> [ 174.530724] __vunmap+0x1d/0xc0
> [ 174.531346] ? delete_object_full+0x13/0x20
> [ 174.531973] vfree+0x40/0x80
> [ 174.532594] set_backend_state+0x18a/0xa90
> [ 174.533221] ? dwc_scan_descriptors+0x24d/0x430
> [ 174.533850] ? kfree+0x5b/0xc0
> [ 174.534476] ? xenbus_read+0x3d/0x50
> [ 174.535101] ? xenbus_read+0x3d/0x50
> [ 174.535718] ? xenbus_gather+0x31/0x90
> [ 174.536332] ? ___might_sleep+0xf6/0x130
> [ 174.536945] frontend_changed+0x6b/0xd0
> [ 174.537565] xenbus_otherend_changed+0x7d/0x80
> [ 174.538185] frontend_changed+0x12/0x20
> [ 174.538803] xenwatch_thread+0x74/0x110
> [ 174.539417] ? woken_wake_function+0x20/0x20
> [ 174.540049] kthread+0xe5/0x120
> [ 174.540663] ? xenbus_printf+0x50/0x50
> [ 174.541278] ? __kthread_init_worker+0x40/0x40
> [ 174.541898] ret_from_fork+0x21/0x2c
> [ 174.548635] switch: port 2(vif3.0) entered disabled state
>
> I believe calling vfree() when holding a spin_lock isn't a good idea.
>
> Boris, this is the dumpdata failure:
> FAILURE 4.10.0upstream-11073-g4977ab6-dirty(x86_64)
> 4.10.0upstream-11073-g4977ab6-dirty(i386)\: 2017-03-02 (tst007)

That's not the cause of the test failure though --- it's "just" a warning.

The problem here was that 64- and 32-bit build trees got out of sync
(which is my fault, I switched the former to staging but forgot to do
the same for the latter). We have in the log:

libxl: error: libxl_create.c:564:libxl__domain_make: domain creation fail: Operation not supported
libxl: error: libxl_create.c:931:initiate_domain_create: cannot make domain: -3

I now have both trees use staging.

-boris