Date: Fri, 13 Mar 2020 12:28:24 +0000
From: Josh Triplett
To: "Jubran, Samih"
Cc: "Machulsky, Zorik", "Belgazal, Netanel", "Kiyanovski, Arthur", "Tzalik, Guy", "Bshara, Saeed", "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays
Message-ID: <20200313122824.GA1389@localhost>

On Wed, Mar 11, 2020 at 01:24:17PM +0000, Jubran, Samih wrote:
> Hi Josh,
>
> Thanks for taking the time to write this patch.
> I ran into a bug while testing it. I haven't pinpointed the root cause yet, but it looks to me like a race in the netlink infrastructure.
>
> Here is the bug scenario:
> 1. create a c5.24xlarge instance in AWS in v_virginia region using the default Amazon Linux 2 AMI
> 2. apply your patch on top of net-next v5.2 and install the kernel (currently I'm able to boot net-next v5.2 only; higher versions of net-next suffer from errors during boot time)
> 3. run "rmmod ena && insmod ena.ko" twice
>
> Result:
> The interface is not in up state
>
> Expected result:
> The interface should be in up state
>
> What I know so far:
> * ena_probe() seems to finish with no errors whatsoever
> * adding prints / delays to ena_probe() causes the bug to vanish or become less likely to occur, depending on the amount of delay I add
> * ena_up() is not called at all when the bug occurs, so it's something to do with netlink not invoking dev_open()
>
> Did you face such issues? Do you have any idea what might be causing this?

I haven't observed anything like this. I didn't test with Amazon Linux 2, though.

To rule out some possibilities, could you try disabling *all* userspace networking bits, so that userspace does nothing with a newly discovered interface, and then testing again? (The interface wouldn't be "up" in that case, but it should still have a link detected.)

If that works, then I wonder if the userspace used in Amazon Linux 2 might have some kind of race where it's still using the previous incarnation of the device when you rmmod and insmod? Perhaps the previous delays made it difficult or impossible to trigger that race?

- Josh Triplett
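
A minimal sketch of how the interface state could be inspected after each "rmmod ena && insmod ena.ko" cycle, assuming the standard sysfs attributes "operstate" and "carrier" under /sys/class/net/<ifname>/. The function name read_link_state, the sysfs_root parameter, and the interface name "eth0" are hypothetical, chosen only for illustration. Reading "carrier" can fail while an interface is administratively down, so the sketch treats an unreadable attribute as unknown rather than as "no link".

```python
from pathlib import Path
from typing import Optional, Tuple


def read_link_state(
    ifname: str, sysfs_root: str = "/sys/class/net"
) -> Tuple[Optional[str], Optional[bool]]:
    """Return (operstate, carrier) for ifname.

    operstate is the raw string from sysfs (e.g. "up", "down", "unknown").
    carrier is True/False, or None when the attribute cannot be read.
    """
    base = Path(sysfs_root) / ifname

    def read(attr: str) -> Optional[str]:
        # A missing interface or an unreadable attribute both yield None.
        try:
            return (base / attr).read_text().strip()
        except OSError:
            return None

    operstate = read("operstate")
    carrier_raw = read("carrier")
    carrier = None if carrier_raw is None else carrier_raw == "1"
    return operstate, carrier


if __name__ == "__main__":
    # "eth0" is an assumed interface name; substitute the ena interface.
    print(read_link_state("eth0"))
```

Running this once after each module reload, with and without the userspace networking services enabled, would show whether the interface is missing, present but down, or up.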