2023-01-14 06:40:36

by Jakub Kicinski

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Thu, 12 Jan 2023 12:29:32 -0800 Ajit Khaparde wrote:
> Add auxiliary driver support.
> An auxiliary device will be created if the hardware indicates
> support for RDMA.
> The bnxt_ulp_probe() function has been removed and a new
> bnxt_rdma_aux_device_add() function has been added.
> The bnxt_free_msix_vecs() and bnxt_req_msix_vecs() will now hold
> the RTNL lock when they call bnxt_close_nic() and bnxt_open_nic()
> since the device close and open need to be protected under RTNL lock.
> The operations between the bnxt_en and bnxt_re will be protected
> using the en_ops_lock.
> This will be used by the bnxt_re driver in a follow-on patch
> to create ROCE interfaces.

> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -13178,6 +13178,9 @@ static void bnxt_remove_one(struct pci_dev *pdev)
> struct net_device *dev = pci_get_drvdata(pdev);
> struct bnxt *bp = netdev_priv(dev);
>
> + bnxt_rdma_aux_device_uninit(bp);
> + bnxt_aux_dev_free(bp);

You still free bp->aux_dev synchronously..

> +void bnxt_aux_dev_free(struct bnxt *bp)
> +{
> + kfree(bp->aux_dev);

.. here. Which is called on .remove of the PCI device.

> + bp->aux_dev = NULL;
> +}
> +
> +static struct bnxt_aux_dev *bnxt_aux_dev_alloc(struct bnxt *bp)
> +{
> + return kzalloc(sizeof(struct bnxt_aux_dev), GFP_KERNEL);
> +}
> +
> +void bnxt_rdma_aux_device_uninit(struct bnxt *bp)
> +{
> + struct bnxt_aux_dev *bnxt_adev;
> + struct auxiliary_device *adev;
> +
> + /* Skip if no auxiliary device init was done. */
> + if (!(bp->flags & BNXT_FLAG_ROCE_CAP))
> + return;
> +
> + bnxt_adev = bp->aux_dev;
> + adev = &bnxt_adev->aux_dev;
> + auxiliary_device_delete(adev);
> + auxiliary_device_uninit(adev);
> + if (bnxt_adev->id >= 0)
> + ida_free(&bnxt_aux_dev_ids, bnxt_adev->id);
> +}
> +
> +static void bnxt_aux_dev_release(struct device *dev)
> +{
> + struct bnxt_aux_dev *bnxt_adev =
> + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> +
> + bnxt_adev->edev->en_ops = NULL;
> + kfree(bnxt_adev->edev);

And yet the reference counted "release" function accesses the bp->adev
like it must exist.

This seems odd to me - why do we need refcounting on devices at all
if we can free them synchronously? To be clear - I'm not sure this is
wrong, just seems odd.

> + bnxt_adev->edev = NULL;
> + bp->edev = NULL;
> +}

2023-01-14 20:56:16

by Ajit Khaparde

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Fri, Jan 13, 2023 at 10:10 PM Jakub Kicinski wrote:
>
> On Thu, 12 Jan 2023 12:29:32 -0800 Ajit Khaparde wrote:
> > Add auxiliary driver support.
> > An auxiliary device will be created if the hardware indicates
> > support for RDMA.
> > The bnxt_ulp_probe() function has been removed and a new
> > bnxt_rdma_aux_device_add() function has been added.
> > The bnxt_free_msix_vecs() and bnxt_req_msix_vecs() will now hold
> > the RTNL lock when they call bnxt_close_nic() and bnxt_open_nic()
> > since the device close and open need to be protected under RTNL lock.
> > The operations between the bnxt_en and bnxt_re will be protected
> > using the en_ops_lock.
> > This will be used by the bnxt_re driver in a follow-on patch
> > to create ROCE interfaces.
>
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -13178,6 +13178,9 @@ static void bnxt_remove_one(struct pci_dev *pdev)
> > struct net_device *dev = pci_get_drvdata(pdev);
> > struct bnxt *bp = netdev_priv(dev);
> >
> > + bnxt_rdma_aux_device_uninit(bp);
> > + bnxt_aux_dev_free(bp);
>
> You still free bp->aux_dev synchronously..
>
> > +void bnxt_aux_dev_free(struct bnxt *bp)
> > +{
> > + kfree(bp->aux_dev);
>
> .. here. Which is called on .remove of the PCI device.
>
> > + bp->aux_dev = NULL;
> > +}
> > +
> > +static struct bnxt_aux_dev *bnxt_aux_dev_alloc(struct bnxt *bp)
> > +{
> > + return kzalloc(sizeof(struct bnxt_aux_dev), GFP_KERNEL);
> > +}
> > +
> > +void bnxt_rdma_aux_device_uninit(struct bnxt *bp)
> > +{
> > + struct bnxt_aux_dev *bnxt_adev;
> > + struct auxiliary_device *adev;
> > +
> > + /* Skip if no auxiliary device init was done. */
> > + if (!(bp->flags & BNXT_FLAG_ROCE_CAP))
> > + return;
> > +
> > + bnxt_adev = bp->aux_dev;
> > + adev = &bnxt_adev->aux_dev;
> > + auxiliary_device_delete(adev);
> > + auxiliary_device_uninit(adev);
> > + if (bnxt_adev->id >= 0)
> > + ida_free(&bnxt_aux_dev_ids, bnxt_adev->id);
> > +}
> > +
> > +static void bnxt_aux_dev_release(struct device *dev)
> > +{
> > + struct bnxt_aux_dev *bnxt_adev =
> > + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> > + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> > +
> > + bnxt_adev->edev->en_ops = NULL;
> > + kfree(bnxt_adev->edev);
>
> And yet the reference counted "release" function accesses the bp->adev
> like it must exist.
>
> This seems odd to me - why do we need refcounting on devices at all
> if we can free them synchronously? To be clear - I'm not sure this is
> wrong, just seems odd.
I followed the existing implementations in that regard. Thanks

>
> > + bnxt_adev->edev = NULL;
> > + bp->edev = NULL;
> > +}



2023-01-17 05:07:44

by Jakub Kicinski

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Sat, 14 Jan 2023 12:39:09 -0800 Ajit Khaparde wrote:
> > > +static void bnxt_aux_dev_release(struct device *dev)
> > > +{
> > > + struct bnxt_aux_dev *bnxt_adev =
> > > + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> > > + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> > > +
> > > + bnxt_adev->edev->en_ops = NULL;
> > > + kfree(bnxt_adev->edev);
> >
> > And yet the reference counted "release" function accesses the bp->adev
> > like it must exist.
> >
> > This seems odd to me - why do we need refcounting on devices at all
> > if we can free them synchronously? To be clear - I'm not sure this is
> > wrong, just seems odd.
> I followed the existing implementations in that regard. Thanks

Leon, could you take a look? Is there no problem in assuming bnxt_adev
is still around in the release function?

2023-01-17 12:38:37

by Leon Romanovsky

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Mon, Jan 16, 2023 at 08:56:25PM -0800, Jakub Kicinski wrote:
> On Sat, 14 Jan 2023 12:39:09 -0800 Ajit Khaparde wrote:
> > > > +static void bnxt_aux_dev_release(struct device *dev)
> > > > +{
> > > > + struct bnxt_aux_dev *bnxt_adev =
> > > > + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> > > > + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> > > > +
> > > > + bnxt_adev->edev->en_ops = NULL;
> > > > + kfree(bnxt_adev->edev);
> > >
> > > And yet the reference counted "release" function accesses the bp->adev
> > > like it must exist.
> > >
> > > This seems odd to me - why do we need refcounting on devices at all
> > > if we can free them synchronously? To be clear - I'm not sure this is
> > > wrong, just seems odd.
> > I followed the existing implementations in that regard. Thanks
>
> Leon, could you take a look? Is there no problem in assuming bnxt_adev
> is still around in the release function?

You caught a real bug. The auxdev idea is very simple - it needs to
behave like driver core, but in the driver itself.

As such, bnxt_aux_dev_free() shouldn't be called after bnxt_rdma_aux_device_uninit().
The device will be released through auxiliary_device_uninit().

BTW, line 325 below shouldn't exist either.

312 void bnxt_rdma_aux_device_uninit(struct bnxt *bp)
313 {
...
325 if (bnxt_adev->id >= 0)
326 ida_free(&bnxt_aux_dev_ids, bnxt_adev->id);

And the one-line bnxt_aux_dev_alloc() needs to be deleted too.

Thanks
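
The lifetime rule described above can be illustrated with a minimal userspace sketch. It is not kernel code: mock_device, mock_aux_dev, mock_get(), mock_put() and the other names are hypothetical stand-ins for the driver-core refcounting, chosen only to show that the release callback should be the single place where the containing structure is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: a refcount plus a release hook. */
struct mock_device {
	int refcount;
	void (*release)(struct mock_device *dev);
};

/* Hypothetical wrapper, similar in shape to struct bnxt_aux_dev. */
struct mock_aux_dev {
	struct mock_device dev;	/* embedded, shares the wrapper's allocation */
	int id;
};

int release_calls;	/* counts release invocations for the demo */

void mock_get(struct mock_device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the last put runs release(), which owns the free. */
void mock_put(struct mock_device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

void mock_aux_release(struct mock_device *dev)
{
	/* Recover the wrapper from the embedded member (container_of by hand). */
	struct mock_aux_dev *adev =
		(struct mock_aux_dev *)((char *)dev - offsetof(struct mock_aux_dev, dev));

	release_calls++;
	free(adev);	/* the only place the wrapper is freed */
}

struct mock_aux_dev *mock_aux_create(int id)
{
	struct mock_aux_dev *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return NULL;
	adev->id = id;
	adev->dev.refcount = 1;	/* the creator's reference */
	adev->dev.release = mock_aux_release;
	return adev;
}

/* Teardown only drops the creator's reference; it never frees directly. */
void mock_aux_teardown(struct mock_aux_dev *adev)
{
	mock_put(&adev->dev);
}
```

If another holder took a reference with mock_get() before mock_aux_teardown() runs, the wrapper stays valid until that holder calls mock_put(). A synchronous free() in the teardown path would break exactly that guarantee.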

2023-01-17 17:47:18

by Leon Romanovsky

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Tue, Jan 17, 2023 at 02:31:01PM +0200, Leon Romanovsky wrote:
> On Mon, Jan 16, 2023 at 08:56:25PM -0800, Jakub Kicinski wrote:
> > On Sat, 14 Jan 2023 12:39:09 -0800 Ajit Khaparde wrote:
> > > > > +static void bnxt_aux_dev_release(struct device *dev)
> > > > > +{
> > > > > + struct bnxt_aux_dev *bnxt_adev =
> > > > > + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> > > > > + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> > > > > +
> > > > > + bnxt_adev->edev->en_ops = NULL;
> > > > > + kfree(bnxt_adev->edev);
> > > >
> > > > And yet the reference counted "release" function accesses the bp->adev
> > > > like it must exist.
> > > >
> > > > This seems odd to me - why do we need refcounting on devices at all
> > > > if we can free them synchronously? To be clear - I'm not sure this is
> > > > wrong, just seems odd.
> > > I followed the existing implementations in that regard. Thanks
> >
> > Leon, could you take a look? Is there no problem in assuming bnxt_adev
> > is still around in the release function?
>
> You caught a real bug. The auxdev idea is very simple - it needs to
> behave like driver core, but in the driver itself.

BTW, this is a classic example of why assigning NULL to pointers after
release is bad practice: it hides this class of errors.

+void bnxt_aux_dev_free(struct bnxt *bp)
+{
+ kfree(bp->aux_dev);
+ bp->aux_dev = NULL;
+}

Thanks
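
A small userspace sketch of the masking effect, under stated assumptions: tracked_free(), struct owner and owner_free_and_clear() are hypothetical names, and free(NULL) stands in for kfree(NULL), which is likewise a no-op. Because the pointer is cleared, a second, erroneous free goes through silently instead of being a double free that a checker such as KASAN could report.

```c
#include <assert.h>
#include <stdlib.h>

int free_calls;		/* frees that were handed a real pointer */
int noop_frees;		/* frees that were silently handed NULL */

/* Hypothetical wrapper around free() that records what it received. */
void tracked_free(void *p)
{
	if (!p) {
		noop_frees++;	/* legal and does nothing, like kfree(NULL) */
		return;
	}
	free_calls++;
	free(p);
}

struct owner {
	int *res;
};

/* Same shape as the quoted bnxt_aux_dev_free(): free, then clear. */
void owner_free_and_clear(struct owner *o)
{
	tracked_free(o->res);
	o->res = NULL;
}
```

Calling owner_free_and_clear() twice on the same owner produces one real free and one silent no-op, so the lifetime mistake in the caller never surfaces.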

2023-01-17 21:58:07

by Ajit Khaparde

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Tue, Jan 17, 2023 at 4:31 AM Leon Romanovsky wrote:
>
> On Mon, Jan 16, 2023 at 08:56:25PM -0800, Jakub Kicinski wrote:
> > On Sat, 14 Jan 2023 12:39:09 -0800 Ajit Khaparde wrote:
> > > > > +static void bnxt_aux_dev_release(struct device *dev)
> > > > > +{
> > > > > + struct bnxt_aux_dev *bnxt_adev =
> > > > > + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> > > > > + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> > > > > +
> > > > > + bnxt_adev->edev->en_ops = NULL;
> > > > > + kfree(bnxt_adev->edev);
> > > >
> > > > And yet the reference counted "release" function accesses the bp->adev
> > > > like it must exist.
> > > >
> > > > This seems odd to me - why do we need refcounting on devices at all
> > > > if we can free them synchronously? To be clear - I'm not sure this is
> > > > wrong, just seems odd.
> > > I followed the existing implementations in that regard. Thanks
> >
> > Leon, could you take a look? Is there no problem in assuming bnxt_adev
> > is still around in the release function?
>
> You caught a real bug. The auxdev idea is very simple - it needs to
> behave like driver core, but in the driver itself.
>
> As such, bnxt_aux_dev_free() shouldn't be called after bnxt_rdma_aux_device_uninit().
> The device will be released through auxiliary_device_uninit().
>
> BTW, line 325 below shouldn't exist either.
>
> 312 void bnxt_rdma_aux_device_uninit(struct bnxt *bp)
> 313 {
> ...
> 325 if (bnxt_adev->id >= 0)
> 326 ida_free(&bnxt_aux_dev_ids, bnxt_adev->id);
>
> And the one-line bnxt_aux_dev_alloc() needs to be deleted too.
>
> Thanks
Thanks.
We are reviewing the comments and will have an update soon.



2023-01-18 07:32:29

by Ajit Khaparde

Subject: Re: [PATCH net-next v7 1/8] bnxt_en: Add auxiliary driver support

On Tue, Jan 17, 2023 at 4:31 AM Leon Romanovsky wrote:
>
> On Mon, Jan 16, 2023 at 08:56:25PM -0800, Jakub Kicinski wrote:
> > On Sat, 14 Jan 2023 12:39:09 -0800 Ajit Khaparde wrote:
> > > > > +static void bnxt_aux_dev_release(struct device *dev)
> > > > > +{
> > > > > + struct bnxt_aux_dev *bnxt_adev =
> > > > > + container_of(dev, struct bnxt_aux_dev, aux_dev.dev);
> > > > > + struct bnxt *bp = netdev_priv(bnxt_adev->edev->net);
> > > > > +
> > > > > + bnxt_adev->edev->en_ops = NULL;
> > > > > + kfree(bnxt_adev->edev);
> > > >
> > > > And yet the reference counted "release" function accesses the bp->adev
> > > > like it must exist.
> > > >
> > > > This seems odd to me - why do we need refcounting on devices at all
> > > > if we can free them synchronously? To be clear - I'm not sure this is
> > > > wrong, just seems odd.
> > > I followed the existing implementations in that regard. Thanks
> >
> > Leon, could you take a look? Is there no problem in assuming bnxt_adev
> > is still around in the release function?
>
> You caught a real bug. The auxdev idea is very simple - it needs to
> behave like driver core, but in the driver itself.
>
> As such, bnxt_aux_dev_free() shouldn't be called after bnxt_rdma_aux_device_uninit().
> The device will be released through auxiliary_device_uninit().
OK, but bnxt_aux_dev_free() actually frees the private memory allocated
to hold the pointer returned by my_aux_dev_alloc(xxx).
The aux device itself is freed only via auxiliary_device_uninit().

>
> BTW, line 325 below shouldn't exist either.
ACK

>
> 312 void bnxt_rdma_aux_device_uninit(struct bnxt *bp)
> 313 {
> ...
> 325 if (bnxt_adev->id >= 0)
> 326 ida_free(&bnxt_aux_dev_ids, bnxt_adev->id);
>
> And the one-line bnxt_aux_dev_alloc() needs to be deleted too.
To avoid confusion, I will refactor and rename the code that handles
allocation and cleanup of the auxiliary_device, and the code that handles
allocation and cleanup of the private pointers used for bookkeeping.

I hope the new patchset will address the concerns raised.

>
> Thanks
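
One detail from the quoted patch is worth keeping in mind here: bnxt_aux_dev_release() recovers bnxt_adev with container_of(dev, struct bnxt_aux_dev, aux_dev.dev), which suggests the auxiliary device is embedded in the wrapper rather than referenced through a pointer. The userspace sketch below uses hypothetical mock_* types with a similar layout (the real structures may differ) to show that, under that assumption, the device and the wrapper occupy the same allocation, so freeing the wrapper also frees the device's storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of(), minus type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the field names follow the quoted patch. */
struct mock_device { int refcount; };
struct mock_auxiliary_device { struct mock_device dev; };

struct mock_bnxt_aux_dev {
	struct mock_auxiliary_device aux_dev;	/* embedded, not a pointer */
	void *edev;
	int id;
};

/* Returns nonzero if dev lies entirely inside the wrapper's allocation. */
int device_is_inside_wrapper(struct mock_bnxt_aux_dev *w,
			     struct mock_device *dev)
{
	char *start = (char *)w;
	char *p = (char *)dev;

	return p >= start && p + sizeof(*dev) <= start + sizeof(*w);
}
```

With this layout there is no separate "bookkeeping" allocation to free on its own: a pointer obtained through container_of() is the wrapper itself.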

