LinuxLists.cc - [PATCH 1/2] thunderbolt: Do not handle ICM events after domain is stopped

2018-09-03 13:22:18

Subject: [PATCH 1/2] thunderbolt: Do not handle ICM events after domain is stopped

If there is a long chain of devices connected when the driver is loaded
ICM sends device connected event for each and those are put to tb->wq
for later processing. Now if the driver gets unloaded in the middle, so
that the work queue is not yet empty it gets flushed by tb_domain_stop().
However, by that time the root switch is already removed so the driver
crashes when it tries to dereference it in ICM event handling callbacks.

Fix this by checking whether the root switch is already removed. If it
is we know that the domain is stopped and we should merely skip handling
the event.

Signed-off-by: Mika Westerberg <[email protected]>
---
drivers/thunderbolt/icm.c | 49 ++++++++++++++++-----------------------
1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/drivers/thunderbolt/icm.c b/drivers/thunderbolt/icm.c
index e1e264a9a4c7..28fc4ce75edb 100644
--- a/drivers/thunderbolt/icm.c
+++ b/drivers/thunderbolt/icm.c
@@ -738,14 +738,6 @@ icm_fr_xdomain_connected(struct tb *tb, const struct icm_pkg_header *hdr)
u8 link, depth;
u64 route;

- /*
- * After NVM upgrade adding root switch device fails because we
- * initiated reset. During that time ICM might still send
- * XDomain connected message which we ignore here.
- */
- if (!tb->root_switch)
- return;
-
link = pkg->link_info & ICM_LINK_INFO_LINK_MASK;
depth = (pkg->link_info & ICM_LINK_INFO_DEPTH_MASK) >>
ICM_LINK_INFO_DEPTH_SHIFT;
@@ -1037,14 +1029,6 @@ icm_tr_device_connected(struct tb *tb, const struct icm_pkg_header *hdr)
if (pkg->hdr.packet_id)
return;

- /*
- * After NVM upgrade adding root switch device fails because we
- * initiated reset. During that time ICM might still send device
- * connected message which we ignore here.
- */
- if (!tb->root_switch)
- return;
-
route = get_route(pkg->route_hi, pkg->route_lo);
authorized = pkg->link_info & ICM_LINK_INFO_APPROVED;
security_level = (pkg->hdr.flags & ICM_FLAGS_SLEVEL_MASK) >>
@@ -1408,19 +1392,26 @@ static void icm_handle_notification(struct work_struct *work)

mutex_lock(&tb->lock);

- switch (n->pkg->code) {
- case ICM_EVENT_DEVICE_CONNECTED:
- icm->device_connected(tb, n->pkg);
- break;
- case ICM_EVENT_DEVICE_DISCONNECTED:
- icm->device_disconnected(tb, n->pkg);
- break;
- case ICM_EVENT_XDOMAIN_CONNECTED:
- icm->xdomain_connected(tb, n->pkg);
- break;
- case ICM_EVENT_XDOMAIN_DISCONNECTED:
- icm->xdomain_disconnected(tb, n->pkg);
- break;
+ /*
+ * When the domain is stopped we flush its workqueue but before
+ * that the root switch is removed. In that case we should treat
+ * the queued events as being canceled.
+ */
+ if (tb->root_switch) {
+ switch (n->pkg->code) {
+ case ICM_EVENT_DEVICE_CONNECTED:
+ icm->device_connected(tb, n->pkg);
+ break;
+ case ICM_EVENT_DEVICE_DISCONNECTED:
+ icm->device_disconnected(tb, n->pkg);
+ break;
+ case ICM_EVENT_XDOMAIN_CONNECTED:
+ icm->xdomain_connected(tb, n->pkg);
+ break;
+ case ICM_EVENT_XDOMAIN_DISCONNECTED:
+ icm->xdomain_disconnected(tb, n->pkg);
+ break;
+ }
}

mutex_unlock(&tb->lock);
--
2.18.0

2018-09-03 13:21:42

by Mika Westerberg

[permalink] [raw]

Subject: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

If IOMMU is enabled and Thunderbolt driver is built into the kernel
image, it will be probed before IOMMUs are attached to the PCI bus.
Because of this DMA mappings the driver does will not go through IOMMU
and start failing right after IOMMUs are enabled.

For this reason move the Thunderbolt driver initialization happen at
rootfs level.

Signed-off-by: Mika Westerberg <[email protected]>
---
drivers/thunderbolt/nhi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 88cff05a1808..5cd6bdfa068f 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -1191,5 +1191,5 @@ static void __exit nhi_unload(void)
tb_domain_exit();
}

-fs_initcall(nhi_init);
+rootfs_initcall(nhi_init);
module_exit(nhi_unload);
--
2.18.0

2018-09-05 08:49:10

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Mon, Sep 03, 2018 at 04:20:12PM +0300, Mika Westerberg wrote:
> If IOMMU is enabled and Thunderbolt driver is built into the kernel
> image, it will be probed before IOMMUs are attached to the PCI bus.
> Because of this DMA mappings the driver does will not go through IOMMU
> and start failing right after IOMMUs are enabled.
>
> For this reason move the Thunderbolt driver initialization happen at
> rootfs level.
>
> Signed-off-by: Mika Westerberg <[email protected]>
> ---
> drivers/thunderbolt/nhi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
> index 88cff05a1808..5cd6bdfa068f 100644
> --- a/drivers/thunderbolt/nhi.c
> +++ b/drivers/thunderbolt/nhi.c
> @@ -1191,5 +1191,5 @@ static void __exit nhi_unload(void)
> tb_domain_exit();
> }
>
> -fs_initcall(nhi_init);
> +rootfs_initcall(nhi_init);
> module_exit(nhi_unload);

What if the rootfs is located on a Thunderbolt-attached drive and
the thunderbolt driver needs to establish tunnels to that drive
before rootfs can be accessed? Doesn't the above break such a setup?

I think the dependency on the IOMMU should be open coded by returning
-EPROBE_DEFER from the ->probe hook if it's not yet attached.
Shuffling around initcall order is just applying duct tape.

Commit acb40d841257 already changed module_init() to fs_initcall()
and now it has to be changed again. Shows how fragile this is.

Thanks,

Lukas

2018-09-05 09:47:30

by Mika Westerberg

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Wed, Sep 05, 2018 at 10:47:46AM +0200, Lukas Wunner wrote:
> On Mon, Sep 03, 2018 at 04:20:12PM +0300, Mika Westerberg wrote:
> > If IOMMU is enabled and Thunderbolt driver is built into the kernel
> > image, it will be probed before IOMMUs are attached to the PCI bus.
> > Because of this DMA mappings the driver does will not go through IOMMU
> > and start failing right after IOMMUs are enabled.
> >
> > For this reason move the Thunderbolt driver initialization happen at
> > rootfs level.
> >
> > Signed-off-by: Mika Westerberg <[email protected]>
> > ---
> > drivers/thunderbolt/nhi.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
> > index 88cff05a1808..5cd6bdfa068f 100644
> > --- a/drivers/thunderbolt/nhi.c
> > +++ b/drivers/thunderbolt/nhi.c
> > @@ -1191,5 +1191,5 @@ static void __exit nhi_unload(void)
> > tb_domain_exit();
> > }
> >
> > -fs_initcall(nhi_init);
> > +rootfs_initcall(nhi_init);
> > module_exit(nhi_unload);
>
> What if the rootfs is located on a Thunderbolt-attached drive and
> the thunderbolt driver needs to establish tunnels to that drive
> before rootfs can be accessed? Doesn't the above break such a setup?

No, then you put the driver as part of your initrd.

> I think the dependency on the IOMMU should be open coded by returning
> -EPROBE_DEFER from the ->probe hook if it's not yet attached.
> Shuffling around initcall order is just applying duct tape.

It is not a dependency. The same thing can happen with any other driver
if they happen to initialize any DMA with the device before IOMMUs are
initialized.

> Commit acb40d841257 already changed module_init() to fs_initcall()
> and now it has to be changed again. Shows how fragile this is.

It is a bit fragile but I don't see any other way to handle this than
trusting on the link ordering. Both -EPROBE_DEFER and device_links are
out of the question AFAICT.

2018-09-06 08:15:47

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Wed, Sep 05, 2018 at 12:46:02PM +0300, Mika Westerberg wrote:
> On Wed, Sep 05, 2018 at 10:47:46AM +0200, Lukas Wunner wrote:
> > On Mon, Sep 03, 2018 at 04:20:12PM +0300, Mika Westerberg wrote:
> > > If IOMMU is enabled and Thunderbolt driver is built into the kernel
> > > image, it will be probed before IOMMUs are attached to the PCI bus.
> > > Because of this DMA mappings the driver does will not go through IOMMU
> > > and start failing right after IOMMUs are enabled.
> > >
> > > For this reason move the Thunderbolt driver initialization happen at
> > > rootfs level.
> > >
> > > Signed-off-by: Mika Westerberg <[email protected]>
> > > ---
> > > drivers/thunderbolt/nhi.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
> > > index 88cff05a1808..5cd6bdfa068f 100644
> > > --- a/drivers/thunderbolt/nhi.c
> > > +++ b/drivers/thunderbolt/nhi.c
> > > @@ -1191,5 +1191,5 @@ static void __exit nhi_unload(void)
> > > tb_domain_exit();
> > > }
> > >
> > > -fs_initcall(nhi_init);
> > > +rootfs_initcall(nhi_init);
> > > module_exit(nhi_unload);
> >
> > I think the dependency on the IOMMU should be open coded by returning
> > -EPROBE_DEFER from the ->probe hook if it's not yet attached.
> > Shuffling around initcall order is just applying duct tape.
>
> It is not a dependency. The same thing can happen with any other driver
> if they happen to initialize any DMA with the device before IOMMUs are
> initialized.
>
> > Commit acb40d841257 already changed module_init() to fs_initcall()
> > and now it has to be changed again. Shows how fragile this is.
>
> It is a bit fragile but I don't see any other way to handle this than
> trusting on the link ordering. Both -EPROBE_DEFER and device_links are
> out of the question AFAICT.

So with this patch, you rely on the linker ordering nhi_init() after
ir_dev_scope_init(), however to the best of my knowledge the link
order is not guaranteed.

In that sense, commit acb40d841257 was already flawed because it
executed nhi_init() at "fs" initcall level, the *same* level used by
map_properties() in drivers/firmware/efi/apple-properties.c, which
retrieves the DROM device property needed by tb_drom_copy_efi().

That was arguably a regression which the above patch cures because
"rootfs" is guaranteed to run after "fs". Still, the fragility
remains that ir_dev_scope_init() isn't guaranteed to run before
nhi_init().

Looking at commit acb40d841257, which started this, I'm wondering
why you did not simply export tbnet_init() and call it from the
thunderbolt driver after the property stuff has been fully set up?
After all, thunderbolt-net is useless without thunderbolt or am I
missing something? Then you could revert back to module_init().

Thanks,

Lukas

2018-09-06 11:12:10

by Mika Westerberg

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Thu, Sep 06, 2018 at 10:13:37AM +0200, Lukas Wunner wrote:
> On Wed, Sep 05, 2018 at 12:46:02PM +0300, Mika Westerberg wrote:
> > On Wed, Sep 05, 2018 at 10:47:46AM +0200, Lukas Wunner wrote:
> > > On Mon, Sep 03, 2018 at 04:20:12PM +0300, Mika Westerberg wrote:
> > > > If IOMMU is enabled and Thunderbolt driver is built into the kernel
> > > > image, it will be probed before IOMMUs are attached to the PCI bus.
> > > > Because of this DMA mappings the driver does will not go through IOMMU
> > > > and start failing right after IOMMUs are enabled.
> > > >
> > > > For this reason move the Thunderbolt driver initialization happen at
> > > > rootfs level.
> > > >
> > > > Signed-off-by: Mika Westerberg <[email protected]>
> > > > ---
> > > > drivers/thunderbolt/nhi.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
> > > > index 88cff05a1808..5cd6bdfa068f 100644
> > > > --- a/drivers/thunderbolt/nhi.c
> > > > +++ b/drivers/thunderbolt/nhi.c
> > > > @@ -1191,5 +1191,5 @@ static void __exit nhi_unload(void)
> > > > tb_domain_exit();
> > > > }
> > > >
> > > > -fs_initcall(nhi_init);
> > > > +rootfs_initcall(nhi_init);
> > > > module_exit(nhi_unload);
> > >
> > > I think the dependency on the IOMMU should be open coded by returning
> > > -EPROBE_DEFER from the ->probe hook if it's not yet attached.
> > > Shuffling around initcall order is just applying duct tape.
> >
> > It is not a dependency. The same thing can happen with any other driver
> > if they happen to initialize any DMA with the device before IOMMUs are
> > initialized.
> >
> > > Commit acb40d841257 already changed module_init() to fs_initcall()
> > > and now it has to be changed again. Shows how fragile this is.
> >
> > It is a bit fragile but I don't see any other way to handle this than
> > trusting on the link ordering. Both -EPROBE_DEFER and device_links are
> > out of the question AFAICT.
>
> So with this patch, you rely on the linker ordering nhi_init() after
> ir_dev_scope_init(), however to the best of my knowledge the link
> order is not guaranteed.

What says that?

As far as I can tell it has been used with initcalls forever to make
sure certain block of code gets executed at certain time.

> In that sense, commit acb40d841257 was already flawed because it
> executed nhi_init() at "fs" initcall level, the *same* level used by
> map_properties() in drivers/firmware/efi/apple-properties.c, which
> retrieves the DROM device property needed by tb_drom_copy_efi().
>
> That was arguably a regression which the above patch cures because
> "rootfs" is guaranteed to run after "fs". Still, the fragility
> remains that ir_dev_scope_init() isn't guaranteed to run before
> nhi_init().
>
> Looking at commit acb40d841257, which started this, I'm wondering
> why you did not simply export tbnet_init() and call it from the
> thunderbolt driver after the property stuff has been fully set up?
> After all, thunderbolt-net is useless without thunderbolt or am I
> missing something? Then you could revert back to module_init().

The same reason you don't call PCI driver functions from PCI core. It
makes absolutely zero sense.

Thunderbolt is bus and provides driver API to drivers. We hopefully are
getting other service drivers (say SCSI over TBT) that are going to be
use the same interfaces.

2018-09-06 11:22:57

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Thu, Sep 06, 2018 at 01:36:02PM +0300, Mika Westerberg wrote:
> On Thu, Sep 06, 2018 at 10:13:37AM +0200, Lukas Wunner wrote:
> > So with this patch, you rely on the linker ordering nhi_init() after
> > ir_dev_scope_init(), however to the best of my knowledge the link
> > order is not guaranteed.
>
> What says that?

Within the same initcall level, the ordering is determined by the Makefile
AFAIK. Someone changes the Makefile, your dependency scheme falls apart.

> > Looking at commit acb40d841257, which started this, I'm wondering
> > why you did not simply export tbnet_init() and call it from the
> > thunderbolt driver after the property stuff has been fully set up?
> > After all, thunderbolt-net is useless without thunderbolt or am I
> > missing something? Then you could revert back to module_init().
>
> The same reason you don't call PCI driver functions from PCI core. It
> makes absolutely zero sense.
>
> Thunderbolt is bus and provides driver API to drivers. We hopefully are
> getting other service drivers (say SCSI over TBT) that are going to be
> use the same interfaces.

Then add a blocking notifier chain into which these service drivers can
hook. Other buses have that as well.

Thanks,

Lukas

2018-09-06 11:26:03

by Mika Westerberg

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Thu, Sep 06, 2018 at 01:00:49PM +0200, Lukas Wunner wrote:
> On Thu, Sep 06, 2018 at 01:36:02PM +0300, Mika Westerberg wrote:
> > On Thu, Sep 06, 2018 at 10:13:37AM +0200, Lukas Wunner wrote:
> > > So with this patch, you rely on the linker ordering nhi_init() after
> > > ir_dev_scope_init(), however to the best of my knowledge the link
> > > order is not guaranteed.
> >
> > What says that?
>
> Within the same initcall level, the ordering is determined by the Makefile
> AFAIK. Someone changes the Makefile, your dependency scheme falls apart.

There are other drivers doing the same so they would fail as well. It is
common practice AFAIK.

> > > Looking at commit acb40d841257, which started this, I'm wondering
> > > why you did not simply export tbnet_init() and call it from the
> > > thunderbolt driver after the property stuff has been fully set up?
> > > After all, thunderbolt-net is useless without thunderbolt or am I
> > > missing something? Then you could revert back to module_init().
> >
> > The same reason you don't call PCI driver functions from PCI core. It
> > makes absolutely zero sense.
> >
> > Thunderbolt is bus and provides driver API to drivers. We hopefully are
> > getting other service drivers (say SCSI over TBT) that are going to be
> > use the same interfaces.
>
> Then add a blocking notifier chain into which these service drivers can
> hook. Other buses have that as well.

It is really too complex to add notifier just for that. This works fine
and is not against any kernel principles I am aware of.

2018-09-06 11:31:10

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Thu, Sep 06, 2018 at 02:07:56PM +0300, Mika Westerberg wrote:
> On Thu, Sep 06, 2018 at 01:00:49PM +0200, Lukas Wunner wrote:
> > On Thu, Sep 06, 2018 at 01:36:02PM +0300, Mika Westerberg wrote:
> > > On Thu, Sep 06, 2018 at 10:13:37AM +0200, Lukas Wunner wrote:
> > > > So with this patch, you rely on the linker ordering nhi_init() after
> > > > ir_dev_scope_init(), however to the best of my knowledge the link
> > > > order is not guaranteed.
> > >
> > > What says that?
> >
> > Within the same initcall level, the ordering is determined by the Makefile
> > AFAIK. Someone changes the Makefile, your dependency scheme falls apart.
>
> There are other drivers doing the same so they would fail as well. It is
> common practice AFAIK.

That doesn't make it a *good* practice.

> > > > Looking at commit acb40d841257, which started this, I'm wondering
> > > > why you did not simply export tbnet_init() and call it from the
> > > > thunderbolt driver after the property stuff has been fully set up?
> > > > After all, thunderbolt-net is useless without thunderbolt or am I
> > > > missing something? Then you could revert back to module_init().
> > >
> > > The same reason you don't call PCI driver functions from PCI core. It
> > > makes absolutely zero sense.
> > >
> > > Thunderbolt is bus and provides driver API to drivers. We hopefully are
> > > getting other service drivers (say SCSI over TBT) that are going to be
> > > use the same interfaces.
> >
> > Then add a blocking notifier chain into which these service drivers can
> > hook. Other buses have that as well.
>
> It is really too complex to add notifier just for that. This works fine
> and is not against any kernel principles I am aware of.

Well, there's a difference between "it works and gets the job done,
let's move on" and "let's try to find a solution that fixes not just
this use case but potentially benefits others as well".

FWIW, what I had in mind is a blocking notifier chain that gets called
when a bus registers or unregisters. TB service drivers would then check
if it's tb_bus_type and start initialization.

Thanks,

Lukas

2018-09-06 11:51:51

by Mika Westerberg

[permalink] [raw]

Subject: Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

On Thu, Sep 06, 2018 at 01:21:01PM +0200, Lukas Wunner wrote:
> On Thu, Sep 06, 2018 at 02:07:56PM +0300, Mika Westerberg wrote:
> > On Thu, Sep 06, 2018 at 01:00:49PM +0200, Lukas Wunner wrote:
> > > On Thu, Sep 06, 2018 at 01:36:02PM +0300, Mika Westerberg wrote:
> > > > On Thu, Sep 06, 2018 at 10:13:37AM +0200, Lukas Wunner wrote:
> > > > > So with this patch, you rely on the linker ordering nhi_init() after
> > > > > ir_dev_scope_init(), however to the best of my knowledge the link
> > > > > order is not guaranteed.
> > > >
> > > > What says that?
> > >
> > > Within the same initcall level, the ordering is determined by the Makefile
> > > AFAIK. Someone changes the Makefile, your dependency scheme falls apart.
> >
> > There are other drivers doing the same so they would fail as well. It is
> > common practice AFAIK.
>
> That doesn't make it a *good* practice.

It is good enough for our case.

> > > > > Looking at commit acb40d841257, which started this, I'm wondering
> > > > > why you did not simply export tbnet_init() and call it from the
> > > > > thunderbolt driver after the property stuff has been fully set up?
> > > > > After all, thunderbolt-net is useless without thunderbolt or am I
> > > > > missing something? Then you could revert back to module_init().
> > > >
> > > > The same reason you don't call PCI driver functions from PCI core. It
> > > > makes absolutely zero sense.
> > > >
> > > > Thunderbolt is bus and provides driver API to drivers. We hopefully are
> > > > getting other service drivers (say SCSI over TBT) that are going to be
> > > > use the same interfaces.
> > >
> > > Then add a blocking notifier chain into which these service drivers can
> > > hook. Other buses have that as well.
> >
> > It is really too complex to add notifier just for that. This works fine
> > and is not against any kernel principles I am aware of.
>
> Well, there's a difference between "it works and gets the job done,
> let's move on" and "let's try to find a solution that fixes not just
> this use case but potentially benefits others as well".
>
> FWIW, what I had in mind is a blocking notifier chain that gets called
> when a bus registers or unregisters. TB service drivers would then check
> if it's tb_bus_type and start initialization.

Like I said, I think it is too complex.

If we ever need to change the initcall level third time (which I doubt)
we can start thinking about more complex solutions.

2018-09-14 07:53:22

by Mika Westerberg

[permalink] [raw]

Subject: Re: [PATCH 1/2] thunderbolt: Do not handle ICM events after domain is stopped

On Mon, Sep 03, 2018 at 04:20:11PM +0300, Mika Westerberg wrote:
> If there is a long chain of devices connected when the driver is loaded
> ICM sends device connected event for each and those are put to tb->wq
> for later processing. Now if the driver gets unloaded in the middle, so
> that the work queue is not yet empty it gets flushed by tb_domain_stop().
> However, by that time the root switch is already removed so the driver
> crashes when it tries to dereference it in ICM event handling callbacks.
>
> Fix this by checking whether the root switch is already removed. If it
> is we know that the domain is stopped and we should merely skip handling
> the event.
>
> Signed-off-by: Mika Westerberg <[email protected]>

Applied to thunderbolt.git/fixes.

2018-09-14 07:58:21

by Mika Westerberg

[permalink] [raw]