Message-ID: <1538044281.19334.4.camel@redhat.com>
Subject: Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
From: Mohammed Gamal
Reply-To: mgamal@redhat.com
To: Stephen Hemminger
Cc: Haiyang Zhang, Stephen Hemminger, "netdev@vger.kernel.org",
 "otubo@redhat.com", "linux-kernel@vger.kernel.org",
 "devel@linuxdriverproject.org", vkuznets
Date: Thu, 27 Sep 2018 12:31:21 +0200
In-Reply-To: <20180927122355.470df119@shemminger-XPS-13-9360>
References: <1537979659-26979-1-git-send-email-mgamal@redhat.com>
 <1538038625.19334.2.camel@redhat.com>
 <20180927122355.470df119@shemminger-XPS-13-9360>
Organization: Red Hat
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Thu, 27 Sep 2018 10:31:44 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2018-09-27 at 12:23 +0200, Stephen Hemminger wrote:
> On Thu, 27 Sep 2018 10:57:05 +0200
> Mohammed Gamal wrote:
>
> > On Wed, 2018-09-26 at 17:13 +0000, Haiyang Zhang wrote:
> > > > -----Original Message-----
> > > > From: Mohammed Gamal
> > > > Sent: Wednesday, September 26, 2018 12:34 PM
> > > > To: Stephen Hemminger; netdev@vger.kernel.org
> > > > Cc: KY Srinivasan; Haiyang Zhang; vkuznets; otubo@redhat.com;
> > > > cavery; linux-kernel@vger.kernel.org;
> > > > devel@linuxdriverproject.org; Mohammed Gamal
> > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened
> > > > on send
> > > >
> > > > During high network traffic, changes to network interface
> > > > parameters such as number of channels or MTU can cause a kernel
> > > > panic with a NULL pointer dereference. This is due to
> > > > netvsc_device_remove() being called and deallocating the channel
> > > > ring buffers, which can then be accessed by netvsc_send_pkt()
> > > > before they're allocated on calling netvsc_device_add().
> > > >
> > > > The patch fixes this problem by checking the channel state and
> > > > returning ENODEV if not yet opened. We also move the call to
> > > > hv_ringbuf_avail_percent(), which may access the uninitialized
> > > > ring buffer.
> > > >
> > > > Signed-off-by: Mohammed Gamal
> > > > ---
> > > >  drivers/net/hyperv/netvsc.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> > > > index fe01e14..75f1b31 100644
> > > > --- a/drivers/net/hyperv/netvsc.c
> > > > +++ b/drivers/net/hyperv/netvsc.c
> > > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> > > >   struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
> > > >   u64 req_id;
> > > >   int ret;
> > > > - u32 ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);
> > > > + u32 ring_avail;
> > > > +
> > > > + if (out_channel->state != CHANNEL_OPENED_STATE)
> > > > +         return -ENODEV;
> > > > +
> > > > + ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);
> > >
> > > When reproducing the NULL ptr panic, does your kernel include the
> > > following patch?
> > > hv_netvsc: common detach logic
> > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> > >
> >
> > Yes, it is included. The commit did reduce the occurrence of this
> > race condition, but it nevertheless still occurs, albeit rarely.
> >
> > > We call netif_tx_disable(ndev) and netif_device_detach(ndev) before
> > > doing the changes on MTU or #channels. So there should be no call to
> > > start_xmit() when the channel is not ready.
> > >
> > > If you see that the check for CHANNEL_OPENED_STATE is still
> > > necessary on the upstream kernel (including the patch "common detach
> > > logic"), we should debug the code further and find the root cause.
> > >
> > > Thanks,
> > > - Haiyang
> > >
> >
> > _______________________________________________
> > devel mailing list
> > devel@linuxdriverproject.org
> > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
>
> Is there some workload that can be used to reproduce this?
> The stress test from Vitaly with changing parameters while running
> network traffic passes now.
>
> Can you reproduce this with the upstream current kernel?
>
> Adding the check in start xmit is still racy, and won't cure the
> problem.
>
> Another solution would be to add a grace period in the netvsc detach
> logic.
>

Steps to reproduce are listed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1632653

We've also managed to reproduce the same issue upstream. It's more
likely to be reproduced on Windows 2012R2 than 2016.

Regards,
Mohammed
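
For readers following the thread, the guard added by the patch can be
modelled in user space. This is only a sketch under stated assumptions,
not kernel code: every model_* name below is hypothetical and stands in
for the vmbus channel, its outbound ring buffer, and netvsc_send_pkt().
It shows the ordering the patch introduces (check the state first, touch
the ring buffer only afterwards), and the comment marks the window that
makes the check alone racy, as noted in the quoted reply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical two-state model of the channel state checked by the patch. */
enum model_state { MODEL_CHANNEL_CLOSED, MODEL_CHANNEL_OPENED };

/* Hypothetical stand-in for the outbound ring buffer. */
struct model_ring {
	unsigned int size;	/* total bytes in the ring */
	unsigned int used;	/* bytes currently occupied */
};

/* Hypothetical stand-in for the channel; outbound is NULL while torn down. */
struct model_channel {
	enum model_state state;
	struct model_ring *outbound;
};

/* Percentage of the ring that is free for writing. */
unsigned int model_avail_percent(const struct model_ring *rb)
{
	return (rb->size - rb->used) * 100 / rb->size;
}

/*
 * Model of the patched send path: refuse to touch the ring unless the
 * channel is fully opened. Returns 0 and stores the free percentage on
 * success, or -ENODEV if the channel is not opened.
 */
int model_send_pkt(const struct model_channel *ch, unsigned int *avail_out)
{
	assert(ch != NULL && avail_out != NULL);

	if (ch->state != MODEL_CHANNEL_OPENED)
		return -ENODEV;

	/*
	 * Race window: with no lock or grace period, a concurrent teardown
	 * could free the ring between the check above and the access below.
	 */
	*avail_out = model_avail_percent(ch->outbound);
	return 0;
}
```

In this model a closed channel with a NULL ring returns -ENODEV without
dereferencing the pointer, which is the behaviour the patch aims for in
the single-threaded case. It does not model the concurrent teardown.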