by Nicolin Chen

Subject: Re: [PATCH v6 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

On Thu, May 02, 2024 at 09:41:03AM -0300, Jason Gunthorpe wrote:
> On Wed, May 01, 2024 at 10:43:39AM -0700, Nicolin Chen wrote:
> > > It doesn't fix any real race, I'm not sure what this is supposed to be
> > > doing. The cmdq becomes broken and you get an ISR, so before the ISR
> > > it will still post but get stuck, during the ISR it will avoid
> > > posting, and after it will go back to posting?
> > >
> > > Why? Just always post to the Q and let the ISR fix it?
> >
> > Yes, we could do so. I was thinking of the worst case by giving
> > the guest OS a chance to continue (though in a slower mode), if
> > something unrecoverable happens to the VINTF/VCMDQ part.
>
> Does that happen? The stuck vcmdq will have stuck entries on it no
> matter what, can we actually fully recover from that? I.e. re-issue the
> commands on another queue?

Well, handle_vintf0_error() should fix that and recover. And on
reflection, if this happens it is likely a SW bug that we
shouldn't ignore.

That being said, the viommu infrastructure still needs IRQ
forwarding, which is currently missing. I'd need to draft something,
likely on top of Baolu's work.

> > > So just don't use it. There is no value if the places where it should
> > > work automatically are not functioning.
> >
> > I thought devm would also work on rmmod, not only when the probe
> > fails..
>
> It is limited to cases when the probing driver of the passed struct
> device unbinds, including probe failure.

OK. I'll drop all devm_ and add tegra241_cmdqv_device_remove()
instead.

Thanks
Nicolin

2024-05-06 03:52:52

by Nicolin Chen

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Tue, Apr 30, 2024 at 09:17:58PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:
> > Otherwise, there has to be a get_suported_cmdq callback so batch
> > or its callers can avoid adding unsupported commands in the first
> > place.
>
> If you really feel strongly the invalidation could be split into
> S1/S2/S1_VM groupings that align with the feature bits and that could
> be passed down from one step above. But I don't think the complexity
> is really needed. It is better to deal with it through the feature
> mechanism.

Hmm, I tried following your design by passing a CMD_TYPE_xxx
to tegra241_cmdqv_get_cmdq(), but I found it a little painful
to accommodate these two cases:
1. TLBI_NH_ASID is issued via arm_smmu_cmdq_issue_cmdlist(), so
   we should not mark it as CMD_TYPE_ALL. Yet, this function is
   used by other commands too. So, either we pass in a type from
   higher callers, or simply check the opcode in that function.
2. It is a bit tricky to define, from the SMMU's point of view, a
   good TYPE subset for VCMDQ, since guest-owned VCMDQ does not
   support TLBI_NSNH_ALL.

So, it feels to me that checking against the opcode is still a
straightforward solution. And what I ended up with is somewhat
similar to this v6, yet this time it only checks at batch init
call as your design does.
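
To illustrate the opcode check described above, here is a minimal standalone sketch, not the attached patch. All sketch_* names are hypothetical and the opcode values are placeholders for illustration. The only fact taken from this thread is that a guest-owned VCMDQ does not support TLBI_NSNH_ALL; treating TLBI_NH_VA and ATC_INV as supported is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative opcode constants; the values are placeholders for this sketch. */
enum sketch_cmdq_op {
	SKETCH_OP_TLBI_NH_ASID  = 0x11,
	SKETCH_OP_TLBI_NH_VA    = 0x12,
	SKETCH_OP_TLBI_NSNH_ALL = 0x30,
	SKETCH_OP_ATC_INV       = 0x40,
};

/* Hypothetical queue choices: the regular SMMU CMDQ or a VCMDQ. */
enum sketch_queue {
	SKETCH_QUEUE_SMMU_CMDQ,
	SKETCH_QUEUE_VCMDQ,
};

/*
 * Assumed allow-list for a guest-owned VCMDQ. Only the exclusion of
 * TLBI_NSNH_ALL comes from the discussion; the rest is a guess.
 */
static bool sketch_guest_vcmdq_supports(uint8_t opcode)
{
	switch (opcode) {
	case SKETCH_OP_TLBI_NH_ASID:
	case SKETCH_OP_TLBI_NH_VA:
	case SKETCH_OP_ATC_INV:
		return true;
	default:
		return false;
	}
}

/*
 * Pick a queue once, from the opcode known at batch init. Unsupported
 * commands on a guest-owned VINTF fall back to the SMMU CMDQ.
 */
static enum sketch_queue sketch_get_cmdq(bool guest_owned, uint8_t opcode)
{
	if (guest_owned && !sketch_guest_vcmdq_supports(opcode))
		return SKETCH_QUEUE_SMMU_CMDQ;
	return SKETCH_QUEUE_VCMDQ;
}
```

The point of the design is that the caller needs no CMD_TYPE_xxx argument; the opcode it already has is enough to choose the queue.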

What do you think of this?

Thanks
Nicolin

Attachments:

(No filename) (1.39 kB)
cmdq_limit_mine.patch (6.84 kB)

2024-05-06 13:00:20

by Jason Gunthorpe

Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF

On Sun, May 05, 2024 at 08:52:32PM -0700, Nicolin Chen wrote:
> On Tue, Apr 30, 2024 at 09:17:58PM -0300, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:
> > > Otherwise, there has to be a get_suported_cmdq callback so batch
> > > or its callers can avoid adding unsupported commands in the first
> > > place.
> >
> > If you really feel strongly the invalidation could be split into
> > S1/S2/S1_VM groupings that align with the feature bits and that could
> > be passed down from one step above. But I don't think the complexity
> > is really needed. It is better to deal with it through the feature
> > mechanism.
>
> Hmm, I tried following your design by passing a CMD_TYPE_xxx
> to tegra241_cmdqv_get_cmdq(), but I found it a little painful
> to accommodate these two cases:
> 1. TLBI_NH_ASID is issued via arm_smmu_cmdq_issue_cmdlist(), so
>    we should not mark it as CMD_TYPE_ALL. Yet, this function is
>    used by other commands too. So, either we pass in a type from
>    higher callers, or simply check the opcode in that function.

Yes, you'd have to pass in the type there too, which makes it more
ugly.

> So, it feels to me that checking against the opcode is still a
> straightforward solution. And what I ended up with is somewhat
> similar to this v6, yet this time it only checks at batch init
> call as your design does.

Well, the only downside is that the commands have to be the same in a
batch, but maybe that is OK anyhow.
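
A rough standalone sketch of that constraint, assuming the queue is chosen from the opcode given at batch init. The sketch_* names and the fixed batch size are hypothetical, and a real batch would flush when full rather than reject the command.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BATCH_MAX 8	/* arbitrary size for this sketch */

struct sketch_batch {
	uint8_t opcode;		/* opcode the queue was chosen for at init */
	size_t num;		/* number of commands collected so far */
	uint64_t cmds[SKETCH_BATCH_MAX];
};

/* Record the opcode once; the queue selection would happen here too. */
static void sketch_batch_init(struct sketch_batch *b, uint8_t opcode)
{
	b->opcode = opcode;
	b->num = 0;
}

/*
 * Accept a command only if it matches the opcode given at init, since
 * the queue picked for that opcode may not support anything else.
 * Returns false on an opcode mismatch or when the batch is full.
 */
static bool sketch_batch_add(struct sketch_batch *b, uint8_t opcode,
			     uint64_t payload)
{
	if (opcode != b->opcode || b->num == SKETCH_BATCH_MAX)
		return false;
	b->cmds[b->num++] = payload;
	return true;
}
```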

Don't forget to take the hunks that fix the queue as well.

Jason