There is a bug in ptp_clock_unregister() where ptp_clock_release() can
free up resources needed by posix_clock_unregister() to properly destroy
a related sysfs device.
Fix this by calling posix_clock_unregister() in ptp_clock_release().
See also:
commit 75718584cb3c ("ptp: free ptp device pin descriptors properly").
BugLink: https://bugs.launchpad.net/bugs/1864754
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Andrea Righi <[email protected]>
---
drivers/ptp/ptp_clock.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index ac1f2bf9e888..12951023d0c6 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -171,6 +171,7 @@ static void ptp_clock_release(struct device *dev)
struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
ptp_cleanup_pin_groups(ptp);
+ posix_clock_unregister(&ptp->clock);
mutex_destroy(&ptp->tsevq_mux);
mutex_destroy(&ptp->pincfg_mux);
ida_simple_remove(&ptp_clocks_map, ptp->index);
@@ -303,8 +304,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
if (ptp->pps_source)
pps_unregister_source(ptp->pps_source);
- posix_clock_unregister(&ptp->clock);
-
return 0;
}
EXPORT_SYMBOL(ptp_clock_unregister);
--
2.25.0
Hello, Andrea, all,
----- Original Message -----
> From: "Andrea Righi" <[email protected]>
> Subject: [PATCH] ptp: free ptp clock properly
>
> There is a bug in ptp_clock_unregister() where ptp_clock_release() can
> free up resources needed by posix_clock_unregister() to properly destroy
> a related sysfs device.
>
> Fix this by calling posix_clock_unregister() in ptp_clock_release().
Honestly, this does not seem right. The calls at PTP clock release are:
ptp_clock_unregister() -> posix_clock_unregister() -> cdev_device_del() ->
-> ... bla ... -> ptp_clock_release()
So, it looks like with this patch both posix_clock_unregister() and
ptp_clock_release() are not called at all. And it looks like the "fix" is
not removing PTP clock's cdev, i.e. leaking it and related sysfs resources.
I would guess that a kernel in question (5.3.0-40-generic) has the commit
a33121e5487b but does not have the commit 75718584cb3c, which should be
exactly fixing a docking station disconnect crash. Could you please,
check this?
Why? We have 2 crash call traces. 1) the launchpad bug 2) the email which
led to the commit 75718584cb3c creation (see Link:).
Aaaaand they are identical starting from device_release_driver_internal()
and almost to the top.
> See also:
> commit 75718584cb3c ("ptp: free ptp device pin descriptors properly").
>
> BugLink: https://bugs.launchpad.net/bugs/1864754
> Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and
> cdev")
> Signed-off-by: Andrea Righi <[email protected]>
> ---
> drivers/ptp/ptp_clock.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
> index ac1f2bf9e888..12951023d0c6 100644
> --- a/drivers/ptp/ptp_clock.c
> +++ b/drivers/ptp/ptp_clock.c
> @@ -171,6 +171,7 @@ static void ptp_clock_release(struct device *dev)
> struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
>
> ptp_cleanup_pin_groups(ptp);
> + posix_clock_unregister(&ptp->clock);
> mutex_destroy(&ptp->tsevq_mux);
> mutex_destroy(&ptp->pincfg_mux);
> ida_simple_remove(&ptp_clocks_map, ptp->index);
> @@ -303,8 +304,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
> if (ptp->pps_source)
> pps_unregister_source(ptp->pps_source);
>
> - posix_clock_unregister(&ptp->clock);
> -
> return 0;
> }
> EXPORT_SYMBOL(ptp_clock_unregister);
> --
> 2.25.0
Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer
On Wed, Mar 04, 2020 at 03:11:44PM -0500, Vladis Dronov wrote:
> Hello, Andrea, all,
>
> ----- Original Message -----
> > From: "Andrea Righi" <[email protected]>
> > Subject: [PATCH] ptp: free ptp clock properly
> >
> > There is a bug in ptp_clock_unregister() where ptp_clock_release() can
> > free up resources needed by posix_clock_unregister() to properly destroy
> > a related sysfs device.
> >
> > Fix this by calling posix_clock_unregister() in ptp_clock_release().
>
> Honestly, this does not seem right. The calls at PTP clock release are:
>
> ptp_clock_unregister() -> posix_clock_unregister() -> cdev_device_del() ->
> -> ... bla ... -> ptp_clock_release()
>
> So, it looks like with this patch both posix_clock_unregister() and
> ptp_clock_release() are not called at all. And it looks like the "fix" is
> not removing PTP clock's cdev, i.e. leaking it and related sysfs resources.
That's absolutely right, thanks for the clarification!
With my "fix" we don't see the the kernel oops anymore, but we're
leaking resources, so definitely not a valid fix.
>
> I would guess that a kernel in question (5.3.0-40-generic) has the commit
> a33121e5487b but does not have the commit 75718584cb3c, which should be
> exactly fixing a docking station disconnect crash. Could you please,
> check this?
Unfortunately the kernel in question already has 75718584cb3c:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/commit/?h=hwe&id=c71b774732f997ef38ed7bd62e73891a01f2bbfe
It looks like there's something else that can free up too early the
resources required by posix_clock_unregister() to destroy the related
sysfs files.
Maybe what we really need to call from ptp_clock_release() is
pps_unregister_source()? Something like this:
From: Andrea Righi <[email protected]>
Subject: [PATCH] ptp: free ptp clock properly
There is a bug in ptp_clock_unregister() where ptp_clock_release() can
free up resources needed by posix_clock_unregister() to properly destroy
a related sysfs device.
Fix this by calling pps_unregister_source() in ptp_clock_release().
See also:
commit 75718584cb3c ("ptp: free ptp device pin descriptors properly").
BugLink: https://bugs.launchpad.net/bugs/1864754
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Andrea Righi <[email protected]>
---
drivers/ptp/ptp_clock.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index ac1f2bf9e888..468286ef61ad 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -170,6 +170,9 @@ static void ptp_clock_release(struct device *dev)
{
struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
+ /* Release the clock's resources. */
+ if (ptp->pps_source)
+ pps_unregister_source(ptp->pps_source);
ptp_cleanup_pin_groups(ptp);
mutex_destroy(&ptp->tsevq_mux);
mutex_destroy(&ptp->pincfg_mux);
@@ -298,11 +301,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
kthread_cancel_delayed_work_sync(&ptp->aux_work);
kthread_destroy_worker(ptp->kworker);
}
-
- /* Release the clock's resources. */
- if (ptp->pps_source)
- pps_unregister_source(ptp->pps_source);
-
posix_clock_unregister(&ptp->clock);
return 0;
--
2.25.0
Hello, Andrea, all,
> > I would guess that a kernel in question (5.3.0-40-generic) has the commit
> > a33121e5487b but does not have the commit 75718584cb3c, which should be
> > exactly fixing a docking station disconnect crash. Could you please,
> > check this?
>
> Unfortunately the kernel in question already has 75718584cb3c:
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/commit/?h=hwe&id=c71b774732f997ef38ed7bd62e73891a01f2bbfe
>
> It looks like there's something else that can free up too early the
> resources required by posix_clock_unregister() to destroy the related
> sysfs files.
>
> Maybe what we really need to call from ptp_clock_release() is
> pps_unregister_source()? Something like this:
Err... I believe, "Maybe" is not a good enough reason to accept a kernel patch.
Probably, there should be something supporting this statement.
Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer
On Thu, Mar 05, 2020 at 05:47:34AM -0500, Vladis Dronov wrote:
> Hello, Andrea, all,
>
> > > I would guess that a kernel in question (5.3.0-40-generic) has the commit
> > > a33121e5487b but does not have the commit 75718584cb3c, which should be
> > > exactly fixing a docking station disconnect crash. Could you please,
> > > check this?
> >
> > Unfortunately the kernel in question already has 75718584cb3c:
> > https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/commit/?h=hwe&id=c71b774732f997ef38ed7bd62e73891a01f2bbfe
> >
> > It looks like there's something else that can free up too early the
> > resources required by posix_clock_unregister() to destroy the related
> > sysfs files.
> >
> > Maybe what we really need to call from ptp_clock_release() is
> > pps_unregister_source()? Something like this:
>
> Err... I believe, "Maybe" is not a good enough reason to accept a kernel patch.
> Probably, there should be something supporting this statement.
Indeed. :) I've asked the original bug reporter to repeat the test with
this patch. Let's see if we can still reproduce the failure...
Thanks,
-Andrea