From: Vitaly Kuznetsov
To: KY Srinivasan
Cc: Stephen Hemminger, leann.ogasawara@canonical.com, apw@canonical.com, olaf@aepfle.de, marcelo.cerri@canonical.com, gregkh@linuxfoundation.org, Haiyang Zhang, linux-kernel@vger.kernel.org, jasowang@redhat.com, devel@linuxdriverproject.org
Subject: Re: [PATCH 1/1] Drivers: hv: vmbus: Fix rescind handling issues
Date: Mon, 18 Sep 2017 14:55:10 +0200
Message-ID: <87wp4wrwsx.fsf@vitty.brq.redhat.com>
In-Reply-To: <87vakgtnl2.fsf@vitty.brq.redhat.com>

Vitaly Kuznetsov writes:

> > Reverting 6f3d791f300618caf82a2be0c27456edd76d5164 still helps.
In addition to the above, I got the following crash while playing with 4.14-rc1 (unmodified):

[ 55.810080] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 55.814293] BUG: unable to handle kernel paging request at ffff8800059985f0
[ 55.818065] IP: 0xffff8800059985f0
[ 55.819925] PGD 22eb067 P4D 22eb067 PUD 22ec067 PMD 5f37063 PTE 8000000005998163
[ 55.820018] Oops: 0011 [#1] SMP
[ 55.820018] Modules linked in: vfat fat bnx2x mdio efi_pstore hv_utils efivars pci_hyperv ptp pps_core pcspkr hv_balloon xfs libcrc32c hv_storvsc hyperv_fb hv_netvsc scsi_transport_fc hid_hyperv hyperv_keyboard hv_vmbus
[ 55.834837] CPU: 0 PID: 498 Comm: kworker/0:2 Not tainted 4.14.0-rc1 #63
[ 55.834837] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[ 55.834837] Workqueue: events vmbus_onmessage_work [hv_vmbus]
[ 55.834837] task: ffff88003f448000 task.stack: ffffc90005398000
[ 55.834837] RIP: 0010:0xffff8800059985f0
[ 55.834837] RSP: 0018:ffffc9000539be00 EFLAGS: 00010286
[ 55.834837] RAX: ffff880005998010 RBX: ffff880005998000 RCX: 0000000000000000
[ 55.834837] RDX: ffff8800059985f0 RSI: 0000000000000246 RDI: ffff880005998000
[ 55.860040] RBP: ffffc9000539be18 R08: 00000000000002e6 R09: 0000000000000000
[ 55.865057] R10: ffffc9000539bdf0 R11: 000000000000a000 R12: 0000000000000286
[ 55.865057] R13: ffff88007ae1ed00 R14: 0000000000000000 R15: ffff8800065c3200
[ 55.865057] FS: 0000000000000000(0000) GS:ffff88007ae00000(0000) knlGS:0000000000000000
[ 55.865057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.865057] CR2: ffff8800059985f0 CR3: 00000000075a5000 CR4: 00000000001406f0
[ 55.886745] Call Trace:
[ 55.886745] ? vmbus_onoffer_rescind+0xfa/0x160 [hv_vmbus]
[ 55.890968] vmbus_onmessage+0x2a/0x90 [hv_vmbus]
[ 55.891934] vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
[ 55.891934] process_one_work+0x193/0x390
[ 55.891934] worker_thread+0x48/0x3c0
[ 55.891934] kthread+0x120/0x140
[ 55.891934] ? process_one_work+0x390/0x390
[ 55.891934] ? kthread_create_on_node+0x60/0x60
[ 55.891934] ret_from_fork+0x25/0x30
[ 55.891934] Code: 88 ff ff c0 85 99 05 00 88 ff ff d0 85 99 05 00 88 ff ff d0 85 99 05 00 88 ff ff e0 85 99 05 00 88 ff ff e0 85 99 05 00 88 ff ff <f0> 85 99 05 00 88 ff ff f0 85 99 05 00 88 ff ff 00 86 99 05 00
[ 55.922505] RIP: 0xffff8800059985f0 RSP: ffffc9000539be00
[ 55.922505] CR2: ffff8800059985f0
[ 55.922505] ---[ end trace 25226e00af3f94fb ]---
[ 55.933590] Kernel panic - not syncing: Fatal exception
[ 55.933590] Kernel Offset: disabled
[ 55.933590] ---[ end Kernel panic - not syncing: Fatal exception

So it seems that the channel disappeared while we were in the

	while (READ_ONCE(channel->probe_done) == false) {
		/*
		 * We wait here until any channel offer is currently
		 * being processed.
		 */
		msleep(1);
	}

loop.

This issue may be unrelated to the netvsc hang I mentioned before. It may make sense to refcount channels/subchannels (or to use RCU) so that a channel cannot be freed while a rescind is still being processed.

-- 
Vitaly
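A minimal userspace sketch of the refcounting idea, in C with C11 atomics. All names here (struct chan, chan_alloc(), chan_get(), chan_put(), rescind_wait_and_put()) are hypothetical illustrations, not the hv_vmbus API; in the kernel itself, kref or refcount_t would be the natural building block. The assumption is that whoever looks the channel up takes a reference under the lock protecting the channel list (omitted here), so the probe_done wait loop keeps the object alive even if another path drops its own reference concurrently.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct vmbus_channel: only the fields needed
 * to show reference counting. Not the real hv_vmbus layout. */
struct chan {
	atomic_int refcount;    /* live references; object freed at 0 */
	atomic_bool probe_done; /* set once offer processing has finished */
};

static int chan_free_count; /* counts frees so the behaviour is observable */

/* Allocate a channel holding one reference, owned by the creator. */
static struct chan *chan_alloc(void)
{
	struct chan *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->refcount, 1);
	atomic_init(&c->probe_done, false);
	return c;
}

/* Take an extra reference; the caller must already hold a valid pointer
 * (e.g. obtained under the channel-list lock, omitted in this sketch). */
static struct chan *chan_get(struct chan *c)
{
	atomic_fetch_add_explicit(&c->refcount, 1, memory_order_relaxed);
	return c;
}

/* Drop a reference; frees the channel and returns true on the last put. */
static bool chan_put(struct chan *c)
{
	/* fetch_sub returns the previous value: 1 means we were the last. */
	if (atomic_fetch_sub_explicit(&c->refcount, 1,
				      memory_order_acq_rel) == 1) {
		free(c);
		chan_free_count++;
		return true;
	}
	return false;
}

/* Rescind-side wait: 'c' must come with a reference the caller took at
 * lookup time, so the channel stays valid for the whole loop. That
 * reference is dropped once probing is done. */
static void rescind_wait_and_put(struct chan *c)
{
	const struct timespec one_ms = { 0, 1000000L }; /* roughly msleep(1) */

	while (!atomic_load(&c->probe_done))
		nanosleep(&one_ms, NULL);
	chan_put(c);
}
```

For example, if the offer/teardown path drops its reference while the rescind handler is still waiting, the object survives until rescind_wait_and_put() drops the last one, instead of the loop reading freed memory.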