With commit a8d4a37d1bb9 ("iommu/amd: Restore GA log/tail pointer on
host resume"), there now is a WARN_ON in iommu_ga_log_enable that
triggers if the GALOG_RUN bit is set in the MMIO status register, and
this warning has been reported when a kdump kernel is booting.
We probably don't really care about guest vapic logging during kdump,
but the proper thing seems to be disabling the GALOG_EN bit in the
MMIO control register in the kdump code path of
early_enable_iommus(). The iommu pci init code is going to set the ga
log address pointers to new values in the amd_iommu struct, and these
aren't pushed to the iommu until after the warning and return in
iommu_ga_log_enable(). So worst case, the GALOG bits are set and the
iommu has stale pointers to the ga log addresses. So disable GALOG_EN
in early_enable_iommus() and allow iommu_ga_log_enable() to set it up
properly and enable again.
Without the fix you will see the following warning when the kdump
kernel boots:
[ 7.731955] ------------[ cut here ]------------
[ 7.736575] WARNING: CPU: 0 PID: 1 at drivers/iommu/amd/init.c:829 iommu_ga_log_enable.isra.0+0x16f/0x190
[ 7.746135] Modules linked in:
[ 7.749193] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W -------- --- 5.19.0-0.rc7.53.eln120.x86_64 #1
[ 7.759706] Hardware name: Dell Inc. PowerEdge R7525/04D5GJ, BIOS 2.1.6 03/09/2021
[ 7.767274] RIP: 0010:iommu_ga_log_enable.isra.0+0x16f/0x190
[ 7.772931] Code: 20 20 00 00 8b 00 f6 c4 01 74 da 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75 13 48 83 c4 10 5b 5d e9 f5 00 72 00 0f 0b eb e1 <0f> 0b eb dd e8 f8 66 42 00 48 8b 15 f1 85 53 01 e9 29 ff ff ff 48
[ 7.791679] RSP: 0018:ffffc90000107d20 EFLAGS: 00010206
[ 7.796905] RAX: ffffc90000780000 RBX: 0000000000000100 RCX: ffffc90000780000
[ 7.804038] RDX: 0000000000000001 RSI: ffffc90000780000 RDI: ffff8880451f9800
[ 7.811170] RBP: ffff8880451f9800 R08: ffffffffffffffff R09: 0000000000000000
[ 7.818303] R10: 0000000000000000 R11: 0000000000000000 R12: 0008000000000000
[ 7.825435] R13: ffff8880462ea900 R14: 0000000000000021 R15: 0000000000000000
[ 7.832572] FS: 0000000000000000(0000) GS:ffff888054a00000(0000) knlGS:0000000000000000
[ 7.840657] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.846400] CR2: ffff888054dff000 CR3: 0000000053210000 CR4: 0000000000350eb0
[ 7.853533] Call Trace:
[ 7.855979] <TASK>
[ 7.858085] amd_iommu_enable_interrupts+0x180/0x270
[ 7.863051] ? iommu_setup+0x271/0x271
[ 7.866803] state_next+0x197/0x2c0
[ 7.870295] ? iommu_setup+0x271/0x271
[ 7.874049] iommu_go_to_state+0x24/0x2c
[ 7.877976] amd_iommu_init+0xf/0x29
[ 7.881554] pci_iommu_init+0xe/0x36
[ 7.885133] do_one_initcall+0x44/0x200
[ 7.888975] do_initcalls+0xc8/0xe1
[ 7.892466] kernel_init_freeable+0x14c/0x199
[ 7.896826] ? rest_init+0xd0/0xd0
[ 7.900231] kernel_init+0x16/0x130
[ 7.903723] ret_from_fork+0x22/0x30
[ 7.907306] </TASK>
[ 7.909497] ---[ end trace 0000000000000000 ]---
Fixes: 3ac3e5ee5ed5 ("iommu/amd: Copy old trans table from old kernel")
Cc: Joerg Roedel <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Will Deacon <[email protected]> (maintainer:IOMMU DRIVERS)
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Jerry Snitselaar <[email protected]>
---
drivers/iommu/amd/init.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1d08f87e734b..2b00d7f28df7 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -815,6 +815,11 @@ static void free_ga_log(struct amd_iommu *iommu)
#endif
}
+static void iommu_ga_log_disable(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_GALOG_EN);
+}
+
static int iommu_ga_log_enable(struct amd_iommu *iommu)
{
#ifdef CONFIG_IRQ_REMAP
@@ -2504,6 +2509,7 @@ static void early_enable_iommus(void)
for_each_iommu(iommu) {
iommu_disable_command_buffer(iommu);
iommu_disable_event_buffer(iommu);
+ iommu_ga_log_disable(iommu);
iommu_enable_command_buffer(iommu);
iommu_enable_event_buffer(iommu);
iommu_enable_ga(iommu);
base-commit: ca85855bdcae8f84f1512e88b4c75009ea17ea2f
--
2.36.1
On Wed, Jul 20, 2022 at 05:34:39PM -0700, Jerry Snitselaar wrote:
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 1d08f87e734b..2b00d7f28df7 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -815,6 +815,11 @@ static void free_ga_log(struct amd_iommu *iommu)
> #endif
> }
>
> +static void iommu_ga_log_disable(struct amd_iommu *iommu)
> +{
> + iommu_feature_disable(iommu, CONTROL_GALOG_EN);
> +}
> +
> static int iommu_ga_log_enable(struct amd_iommu *iommu)
> {
> #ifdef CONFIG_IRQ_REMAP
> @@ -2504,6 +2509,7 @@ static void early_enable_iommus(void)
> for_each_iommu(iommu) {
> iommu_disable_command_buffer(iommu);
> iommu_disable_event_buffer(iommu);
> + iommu_ga_log_disable(iommu);
> iommu_enable_command_buffer(iommu);
> iommu_enable_event_buffer(iommu);
> iommu_enable_ga(iommu);
Looks about right, but I also let Suravee comment on this.
Disabling the GA-Log under a device still using it should hopefully not
put it into some undefined state.
On Fri, Jul 22, 2022 at 04:37:13PM +0200, Joerg Roedel wrote:
> On Wed, Jul 20, 2022 at 05:34:39PM -0700, Jerry Snitselaar wrote:
> > diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> > index 1d08f87e734b..2b00d7f28df7 100644
> > --- a/drivers/iommu/amd/init.c
> > +++ b/drivers/iommu/amd/init.c
> > @@ -815,6 +815,11 @@ static void free_ga_log(struct amd_iommu *iommu)
> > #endif
> > }
> >
> > +static void iommu_ga_log_disable(struct amd_iommu *iommu)
> > +{
> > + iommu_feature_disable(iommu, CONTROL_GALOG_EN);
> > +}
> > +
> > static int iommu_ga_log_enable(struct amd_iommu *iommu)
> > {
> > #ifdef CONFIG_IRQ_REMAP
> > @@ -2504,6 +2509,7 @@ static void early_enable_iommus(void)
> > for_each_iommu(iommu) {
> > iommu_disable_command_buffer(iommu);
> > iommu_disable_event_buffer(iommu);
> > + iommu_ga_log_disable(iommu);
> > iommu_enable_command_buffer(iommu);
> > iommu_enable_event_buffer(iommu);
> > iommu_enable_ga(iommu);
>
> Looks about right, but I also let Suravee comment on this.
>
> Disabling the GA-Log under a device still using it should hopefully not
> put it into some undefined state.
Hi Joerg,
Yes, I sent email to Suravee a few days ago as well to see what he thought,
but haven't heard back yet. It might be a couple days as I think he is
overseas at the moment.
Minor, stupid thing, but I wasn't sure whether it would be better to
name it iommu_disable_ga_log() like everything else called here, or
name it in the format of the other functions relating to logs. I opted
for the latter.
Did you see my other email about kdump kernels on vt-d with scalable
mode and passthrough enabled?
https://lore.kernel.org/linux-iommu/20220702154956.x2dxqalzjuzjraxk@cantor/
I've been looking at it, but haven't figured out what to do about tracking
the copying if can't borrow a bit in one of the pasid fields now. Any thoughts
on that?
Regards,
Jerry
Joeg / Jerry,
On 7/22/22 9:37 PM, Joerg Roedel wrote:
> On Wed, Jul 20, 2022 at 05:34:39PM -0700, Jerry Snitselaar wrote:
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index 1d08f87e734b..2b00d7f28df7 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -815,6 +815,11 @@ static void free_ga_log(struct amd_iommu *iommu)
>> #endif
>> }
>>
>> +static void iommu_ga_log_disable(struct amd_iommu *iommu)
>> +{
>> + iommu_feature_disable(iommu, CONTROL_GALOG_EN);
>> +}
>> +
>> static int iommu_ga_log_enable(struct amd_iommu *iommu)
>> {
>> #ifdef CONFIG_IRQ_REMAP
>> @@ -2504,6 +2509,7 @@ static void early_enable_iommus(void)
>> for_each_iommu(iommu) {
>> iommu_disable_command_buffer(iommu);
>> iommu_disable_event_buffer(iommu);
>> + iommu_ga_log_disable(iommu);
>> iommu_enable_command_buffer(iommu);
>> iommu_enable_event_buffer(iommu);
>> iommu_enable_ga(iommu);
>
> Looks about right, but I also let Suravee comment on this.
>
> Disabling the GA-Log under a device still using it should hopefully not
> put it into some undefined state.
Sorry for late reply. I have been actually working on the new GAM and GALOG enabling code,
which should also address this issue as well. I'll send out the patch soon.
Regards,
Suravee
Hi Suravee,
On Mon, Jul 25, 2022 at 02:05:59PM +0700, Suravee Suthikulpanit wrote:
> Sorry for late reply. I have been actually working on the new GAM and GALOG enabling code,
> which should also address this issue as well. I'll send out the patch soon.
Okay, how about basing your changes on Jerry's fix? A backportable fix
for a real issue is always better than a bigger rework (which can still
happen on-top).
Regards,
Joerg
On 7/26/22 2:12 PM, Joerg Roedel wrote:
> Hi Suravee,
>
> On Mon, Jul 25, 2022 at 02:05:59PM +0700, Suravee Suthikulpanit wrote:
>> Sorry for late reply. I have been actually working on the new GAM and GALOG enabling code,
>> which should also address this issue as well. I'll send out the patch soon.
>
> Okay, how about basing your changes on Jerry's fix? A backportable fix
> for a real issue is always better than a bigger rework (which can still
> happen on-top).
>
> Regards,
>
> Joerg
Ahh... I have just sent out the new patch
https://lore.kernel.org/lkml/[email protected]/
Please lemme know in case you want me to rebase this on top of Jerry's.
Suravee