Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp743552pxb; Wed, 27 Oct 2021 11:31:00 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwd+/SFAHRJrg1kVaFb7/4krjCxkMIwY6VAwT9woFR9ZW/zsdB+nnNthvkgvpNiVCvBW3DY X-Received: by 2002:a05:6402:34d1:: with SMTP id w17mr45896219edc.383.1635359457876; Wed, 27 Oct 2021 11:30:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635359457; cv=none; d=google.com; s=arc-20160816; b=VkJBpNfcS+uR74j5ptkhVyIKkEO3bRgzn50TeSgSffY2llRYjvEb4Ch4oNq2ig+k5P rkILYi+EnOtMH338BZHE2tpr4GLwLcUNdGaPgv3sO7KZZ1B6fLel2GUkgRX9452p0W05 vUW+AiF3mFmwqtDCS01TqKPec6xHbplcLMX6F9JhwHvb5AVXpE1CW1VnmYGfanoHBG+O V1kY/in00rzAArWMNL2PFho2N3tus3vHxBfpgKWVeeTsl6gSTs5giwTKFEPjg5f2UH3/ 4GGIW0WdgbBkdv8XECXXtUg9KWUjX8F6C6mWSAww3CGJLLbqSsDA3ZP1zEs2CAB8moaN DioA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=7CBJw7zPUiW999CTiRF3SOVkwDtx4aH+xqmVHJXGp5Y=; b=sbC+31oW69POYpqtBUktCFqRPGNJ67gBtTP8fUJUL1kFtPLQGxVsCLc2eaNymyuWx3 GsgTGSrPVoGWWvKtsEkNpMv5GGpwgH+Zx6CfFQViSA7ut4PPWsU2pwEoQGRxq04l1SnU Dj+T1FAnz17hgxZib+oi/V4lpq6lXNzkGTaE7kqAXOrItj4Zc/FYY2nc2LWSZBjrRJEY OuopPhpkAYPMWyuulWdDvn5QmETfooJ+e6eJjfLpzUg1d6iPChCMJpsyabN/q2erQ52P Nh57e+x8GpwuJya3D7U1t341Zf/0eBsByQBnWif/yFCQKLnjaYVyfN4G0f7eQJqQ99qs XOZw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HbfzhxOv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o11si756760eja.423.2021.10.27.11.30.31; Wed, 27 Oct 2021 11:30:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HbfzhxOv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239974AbhJ0GqX (ORCPT + 99 others); Wed, 27 Oct 2021 02:46:23 -0400 Received: from mail.kernel.org ([198.145.29.99]:52088 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231530AbhJ0GqV (ORCPT ); Wed, 27 Oct 2021 02:46:21 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 9757061073; Wed, 27 Oct 2021 06:43:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1635317035; bh=16Olf4WVQYSRwe1OkFa1dBzILsGrYofaKzrqASex/Zg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=HbfzhxOvHNtFDWxRIqBqW/oFXtkwTVszVX6KZTI7cIIZNEmyS5BwtpFZ5FB9Na+eG Dim3h+I2yLnBxfUCr3bRMI5N4gkyiPhJkQXAioqR36saTmfu8wMucioBwNmqqJMQ3G bWBYXNM5R9Q9U9hq41H5LgBqOkDtiJN8Y0JfoTtB6oHBEtj2Cg5Dw4TaaPJHdQVU34 bgjlBQnLV/Sa8xU1BAozwvzhnTpx/qAKyGQX9G7MEqSlp9Fh42BseYhStiEv0PHJ0J M1tlbYxC57jouCTf8WEBijR5WK0Gg9osWZHdVcrl9RM94vCUhysaU1vjzlgQ6MfbfJ g1pOCsvt5qoDQ== Date: Wed, 27 Oct 2021 09:43:51 +0300 From: Leon Romanovsky To: Edwin Peer Cc: Ido Schimmel , "David S . Miller" , Jakub Kicinski , Ido Schimmel , Jiri Pirko , Linux Kernel Mailing List , netdev , syzbot+93d5accfaefceedf43c1@syzkaller.appspotmail.com, Michael Chan Subject: Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device Message-ID: References: <725e121f05362da4328dda08d5814211a0725dac.1635064599.git.leonro@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 26, 2021 at 01:03:41PM -0700, Edwin Peer wrote: > On Tue, Oct 26, 2021 at 12:22 PM Leon Romanovsky wrote: > > > At least in mlx5 case, reload_enable() was before register_netdev(). > > It stayed like this after swapping it with devlink_register(). > > What am I missing here? > > err = mlx5_init_one(dev); > if (err) { > mlx5_core_err(dev, "mlx5_init_one failed with error code %d\n", err); > goto err_init_one; > } > > err = mlx5_crdump_enable(dev); > if (err) > dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code > %d\n", err); > > pci_save_state(pdev); > devlink_register(devlink); > > Doesn't mlx5_init_one() ultimately result in the netdev being > presented to user space, even if it is via aux bus? The mlx5_init_one() aux devices, and driver is not always loaded directly in the Linux kernel. The device creation triggers udev event, which is handled by udev systemd. The systemd reads various modules.* files that kernel provides and this is how it knows which driver to load. In our case, the eth driver is part of mlx5_core module, so at the device creation phase that module is already loaded and driver/core will try to autoprobe it. However, the last step is not always performed and controlled by the userspace. Users can disable driver autoprobe and bind manually. This is pretty standard practice in the SR-IOV or VFIO modes. > > > No, it is not requirement, but my suggestion. You need to be aware that > > after call to devlink_register(), the device will be fully open for devlink > > netlink access. So it is strongly advised to put devlink_register to be the > > last command in PCI initialization sequence. > > Right, that's the problem. Once we register the netdev, we're in a > race with user space, which may expect to be able to call devlink > before we get to devlink_register(). This is why devlink has monitor mode where you can see devlink device addition and removal. It is user space job to check that device is ready. > > > You obviously need to fix your code. Upstream version of bnxt driver > > doesn't have reload_* support, so all this regression blaming it not > > relevant here. > > Right, our timing is unfortunate and that's on us. It's still not > clear to me how to actually fix the devlink reload code without the > benefit of something similar to the reload enable API. > > > In upstream code, devlink_register() doesn't accept ops like it was > > before and position of that call does only one thing - opens devlink > > netlink access. All kernel devlink APIs continue to be accessible even > > before devlink_register. > > This isn't about kernel API. This is precisely about existing user > space that expects devlink to work immediately after the netdev > appears. Can you please share open source project that has such assumption? > > > It looks like your failure is in backport code. > > Our out-of-tree driver isn't the issue here. I'm talking about the > proposed upstream code. The issue is what to do in order to get > something workable upstream for devlink reload. We can't move > devlink_register() later, that will cause a regression. What do you > suggest instead? Fix your test respect devlink notifications and don't ignore them. Thanks > > Regards, > Edwin Peer