2021-11-23 16:10:51

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 0/5] iommu/amd: fixes for suspend/resume

As I sadly found out, a s3 cycle makes the AMD's iommu stop sending interrupts
until the system is rebooted.

I only noticed it now because otherwise the IOMMU works, and these interrupts
are only used for errors and for GA log which I tend not to use by
making my VMs do mwait/pause/etc in guest (cpu-pm=on).

There are two issues here that prevent interrupts from being generated after
s3 cycle:

1. GA log base address was not restored after resume, and was all zeroed
after resume (by BIOS or such).

In theory if BIOS writes some junk to it, that can even cause a memory corruption.
Patch 2 fixes that.

2. INTX (aka x2apic mode) settings were not restored after resume.
That mode is used regardless if the host uses/supports x2apic, but rather when
the IOMMU supports it, and mine does.
Patches 3-4 fix that.

Note that there is still one slight (userspace) bug remaining:
During suspend all but the boot CPU are offlined and then after resume
are onlined again.

The offlining moves all non-affinity managed interrupts to CPU0, and
later when all other CPUs are onlined, there is nothing in the kernel
to spread back the interrupts over the cores.

The userspace 'irqbalance' daemon does fix this but it seems to ignore
the IOMMU interrupts in INTX mode since they are not attached to any
PCI device, and thus they remain on CPU0 after a s3 cycle,
which is suboptimal when the system has multiple IOMMUs
(mine has 4 of them).

Setting the IRQ affinity manually via /proc/irq/ does work.

This was tested on my 3970X with both INTX and regular MSI mode (later was enabled
by patching out INTX detection), by running a guest with AVIC enabled and with
a PCI assigned device (network card), and observing interrupts from
IOMMU while guest is mostly idle.

This was also tested on my AMD laptop with 4650U (which has the same issue)
(I tested only INTX mode)

Patch 1 is a small refactoring to remove an unused struct field.

Best regards,
Maxim Levitsky

Maxim Levitsky (5):
iommu/amd: restore GA log/tail pointer on host resume
iommu/amd: x2apic mode: re-enable after resume
iommu/amd: x2apic mode: setup the INTX registers on mask/unmask
iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume
iommu/amd: remove useless irq affinity notifier

drivers/iommu/amd/amd_iommu_types.h | 2 -
drivers/iommu/amd/init.c | 107 +++++++++++++++-------------
2 files changed, 58 insertions(+), 51 deletions(-)

--
2.26.3




2021-11-23 16:10:54

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 1/5] iommu/amd: restore GA log/tail pointer on host resume

This will give IOMMU GA log a chance to work after resume
from s3/s4.

Fixes: 8bda0cfbdc1a6 ("iommu/amd: Detect and initialize guest vAPIC log")

Signed-off-by: Maxim Levitsky <[email protected]>
---
drivers/iommu/amd/init.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1eacd43cb4368..8dae85fcfc2eb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -806,16 +806,27 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu)
{
#ifdef CONFIG_IRQ_REMAP
u32 status, i;
+ u64 entry;

if (!iommu->ga_log)
return -EINVAL;

- status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
-
/* Check if already running */
- if (status & (MMIO_STATUS_GALOG_RUN_MASK))
+ status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+ if (WARN_ON(status & (MMIO_STATUS_GALOG_RUN_MASK)))
return 0;

+ entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
+ memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
+ &entry, sizeof(entry));
+ entry = (iommu_virt_to_phys(iommu->ga_log_tail) &
+ (BIT_ULL(52)-1)) & ~7ULL;
+ memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
+ &entry, sizeof(entry));
+ writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
+ writel(0x00, iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
+
+
iommu_feature_enable(iommu, CONTROL_GAINT_EN);
iommu_feature_enable(iommu, CONTROL_GALOG_EN);

@@ -825,7 +836,7 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu)
break;
}

- if (i >= LOOP_TIMEOUT)
+ if (WARN_ON(i >= LOOP_TIMEOUT))
return -EINVAL;
#endif /* CONFIG_IRQ_REMAP */
return 0;
@@ -834,8 +845,6 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu)
static int iommu_init_ga_log(struct amd_iommu *iommu)
{
#ifdef CONFIG_IRQ_REMAP
- u64 entry;
-
if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
return 0;

@@ -849,16 +858,6 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
if (!iommu->ga_log_tail)
goto err_out;

- entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
- memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
- &entry, sizeof(entry));
- entry = (iommu_virt_to_phys(iommu->ga_log_tail) &
- (BIT_ULL(52)-1)) & ~7ULL;
- memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
- &entry, sizeof(entry));
- writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
- writel(0x00, iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
-
return 0;
err_out:
free_ga_log(iommu);
--
2.26.3


2021-11-23 16:11:00

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 2/5] iommu/amd: x2apic mode: re-enable after resume

Otherwise it is guaranteed to not work after the resume...

Fixes: 66929812955bb ("iommu/amd: Add support for X2APIC IOMMU interrupts")

Signed-off-by: Maxim Levitsky <[email protected]>
---
drivers/iommu/amd/init.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 8dae85fcfc2eb..b905604f434e1 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2172,7 +2172,6 @@ static int iommu_setup_intcapxt(struct amd_iommu *iommu)
return ret;
}

- iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
return 0;
}

@@ -2195,6 +2194,10 @@ static int iommu_init_irq(struct amd_iommu *iommu)

iommu->int_enabled = true;
enable_faults:
+
+ if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
+ iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
+
iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);

if (iommu->ppr_log != NULL)
--
2.26.3


2021-11-23 16:11:02

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 3/5] iommu/amd: x2apic mode: setup the INTX registers on mask/unmask

This is more logically correct and will also allow us to
to use mask/unmask logic to restore INTX setttings after
the resume from s3/s4.

Fixes: 66929812955bb ("iommu/amd: Add support for X2APIC IOMMU interrupts")

Signed-off-by: Maxim Levitsky <[email protected]>
---
drivers/iommu/amd/init.c | 65 ++++++++++++++++++++--------------------
1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b905604f434e1..9e895bb8086a6 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2015,48 +2015,18 @@ union intcapxt {
};
} __attribute__ ((packed));

-/*
- * There isn't really any need to mask/unmask at the irqchip level because
- * the 64-bit INTCAPXT registers can be updated atomically without tearing
- * when the affinity is being updated.
- */
-static void intcapxt_unmask_irq(struct irq_data *data)
-{
-}
-
-static void intcapxt_mask_irq(struct irq_data *data)
-{
-}

static struct irq_chip intcapxt_controller;

static int intcapxt_irqdomain_activate(struct irq_domain *domain,
struct irq_data *irqd, bool reserve)
{
- struct amd_iommu *iommu = irqd->chip_data;
- struct irq_cfg *cfg = irqd_cfg(irqd);
- union intcapxt xt;
-
- xt.capxt = 0ULL;
- xt.dest_mode_logical = apic->dest_mode_logical;
- xt.vector = cfg->vector;
- xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
- xt.destid_24_31 = cfg->dest_apicid >> 24;
-
- /**
- * Current IOMMU implemtation uses the same IRQ for all
- * 3 IOMMU interrupts.
- */
- writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
- writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
- writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
return 0;
}

static void intcapxt_irqdomain_deactivate(struct irq_domain *domain,
struct irq_data *irqd)
{
- intcapxt_mask_irq(irqd);
}


@@ -2090,6 +2060,38 @@ static void intcapxt_irqdomain_free(struct irq_domain *domain, unsigned int virq
irq_domain_free_irqs_top(domain, virq, nr_irqs);
}

+
+static void intcapxt_unmask_irq(struct irq_data *irqd)
+{
+ struct amd_iommu *iommu = irqd->chip_data;
+ struct irq_cfg *cfg = irqd_cfg(irqd);
+ union intcapxt xt;
+
+ xt.capxt = 0ULL;
+ xt.dest_mode_logical = apic->dest_mode_logical;
+ xt.vector = cfg->vector;
+ xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
+ xt.destid_24_31 = cfg->dest_apicid >> 24;
+
+ /**
+ * Current IOMMU implementation uses the same IRQ for all
+ * 3 IOMMU interrupts.
+ */
+ writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+ writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+ writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+}
+
+static void intcapxt_mask_irq(struct irq_data *irqd)
+{
+ struct amd_iommu *iommu = irqd->chip_data;
+
+ writeq(0, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+ writeq(0, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+ writeq(0, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+}
+
+
static int intcapxt_set_affinity(struct irq_data *irqd,
const struct cpumask *mask, bool force)
{
@@ -2099,8 +2101,7 @@ static int intcapxt_set_affinity(struct irq_data *irqd,
ret = parent->chip->irq_set_affinity(parent, mask, force);
if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
return ret;
-
- return intcapxt_irqdomain_activate(irqd->domain, irqd, false);
+ return 0;
}

static struct irq_chip intcapxt_controller = {
--
2.26.3


2021-11-23 16:11:05

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 4/5] iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume

Use IRQCHIP_MASK_ON_SUSPEND to make the core irq code to
mask the iommu interrupt on suspend and unmask it on the resume.

Since now the unmask function updates the INTX settings,
that will restore them on resume from s3/s4.

Since IRQCHIP_MASK_ON_SUSPEND is only effective for interrupts
which are not wakeup sources, remove IRQCHIP_SKIP_SET_WAKE flag
and instead implement a dummy .irq_set_wake which doesn't allow
the interrupt to become a wakeup source.

Fixes: 66929812955bb ("iommu/amd: Add support for X2APIC IOMMU interrupts")

Signed-off-by: Maxim Levitsky <[email protected]>
---
drivers/iommu/amd/init.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9e895bb8086a6..b94822fc2c9f7 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2104,6 +2104,11 @@ static int intcapxt_set_affinity(struct irq_data *irqd,
return 0;
}

+static int intcapxt_set_wake(struct irq_data *irqd, unsigned int on)
+{
+ return on ? -EOPNOTSUPP : 0;
+}
+
static struct irq_chip intcapxt_controller = {
.name = "IOMMU-MSI",
.irq_unmask = intcapxt_unmask_irq,
@@ -2111,7 +2116,8 @@ static struct irq_chip intcapxt_controller = {
.irq_ack = irq_chip_ack_parent,
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_set_affinity = intcapxt_set_affinity,
- .flags = IRQCHIP_SKIP_SET_WAKE,
+ .irq_set_wake = intcapxt_set_wake,
+ .flags = IRQCHIP_MASK_ON_SUSPEND,
};

static const struct irq_domain_ops intcapxt_domain_ops = {
--
2.26.3


2021-11-23 16:11:07

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 5/5] iommu/amd: remove useless irq affinity notifier

iommu->intcapxt_notify field is no longer used
after a switch to a separate domain was done

Fixes: d1adcfbb520c ("iommu/amd: Fix IOMMU interrupt generation in X2APIC mode")
Signed-off-by: Maxim Levitsky <[email protected]>
---
drivers/iommu/amd/amd_iommu_types.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 867535eb0ce97..ffc89c4fb1205 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -645,8 +645,6 @@ struct amd_iommu {
/* DebugFS Info */
struct dentry *debugfs;
#endif
- /* IRQ notifier for IntCapXT interrupt */
- struct irq_affinity_notify intcapxt_notify;
};

static inline struct amd_iommu *dev_to_amd_iommu(struct device *dev)
--
2.26.3


2021-12-01 23:08:17

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Tue, 2021-11-23 at 18:10 +0200, Maxim Levitsky wrote:
> As I sadly found out, a s3 cycle makes the AMD's iommu stop sending interrupts
> until the system is rebooted.
>
> I only noticed it now because otherwise the IOMMU works, and these interrupts
> are only used for errors and for GA log which I tend not to use by
> making my VMs do mwait/pause/etc in guest (cpu-pm=on).
>
> There are two issues here that prevent interrupts from being generated after
> s3 cycle:
>
> 1. GA log base address was not restored after resume, and was all zeroed
> after resume (by BIOS or such).
>
> In theory if BIOS writes some junk to it, that can even cause a memory corruption.
> Patch 2 fixes that.
>
> 2. INTX (aka x2apic mode) settings were not restored after resume.
> That mode is used regardless if the host uses/supports x2apic, but rather when
> the IOMMU supports it, and mine does.
> Patches 3-4 fix that.
>
> Note that there is still one slight (userspace) bug remaining:
> During suspend all but the boot CPU are offlined and then after resume
> are onlined again.
>
> The offlining moves all non-affinity managed interrupts to CPU0, and
> later when all other CPUs are onlined, there is nothing in the kernel
> to spread back the interrupts over the cores.
>
> The userspace 'irqbalance' daemon does fix this but it seems to ignore
> the IOMMU interrupts in INTX mode since they are not attached to any
> PCI device, and thus they remain on CPU0 after a s3 cycle,
> which is suboptimal when the system has multiple IOMMUs
> (mine has 4 of them).
>
> Setting the IRQ affinity manually via /proc/irq/ does work.
>
> This was tested on my 3970X with both INTX and regular MSI mode (later was enabled
> by patching out INTX detection), by running a guest with AVIC enabled and with
> a PCI assigned device (network card), and observing interrupts from
> IOMMU while guest is mostly idle.
>
> This was also tested on my AMD laptop with 4650U (which has the same issue)
> (I tested only INTX mode)
>
> Patch 1 is a small refactoring to remove an unused struct field.
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (5):
> iommu/amd: restore GA log/tail pointer on host resume
> iommu/amd: x2apic mode: re-enable after resume
> iommu/amd: x2apic mode: setup the INTX registers on mask/unmask
> iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume
> iommu/amd: remove useless irq affinity notifier
>
> drivers/iommu/amd/amd_iommu_types.h | 2 -
> drivers/iommu/amd/init.c | 107 +++++++++++++++-------------
> 2 files changed, 58 insertions(+), 51 deletions(-)
>
> --
> 2.26.3
>
>

Polite ping on these patches.
Best regards,
Maxim Levitsky


2021-12-06 14:01:25

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Tue, Nov 23, 2021 at 06:10:33PM +0200, Maxim Levitsky wrote:
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (5):
> iommu/amd: restore GA log/tail pointer on host resume
> iommu/amd: x2apic mode: re-enable after resume
> iommu/amd: x2apic mode: setup the INTX registers on mask/unmask
> iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume
> iommu/amd: remove useless irq affinity notifier
>
> drivers/iommu/amd/amd_iommu_types.h | 2 -
> drivers/iommu/amd/init.c | 107 +++++++++++++++-------------
> 2 files changed, 58 insertions(+), 51 deletions(-)

Suravee, can you please have a look? These look like v5.16 material.

Thanks,

Joerg

2021-12-10 08:00:40

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Thu, 2021-12-02 at 01:08 +0200, Maxim Levitsky wrote:
> On Tue, 2021-11-23 at 18:10 +0200, Maxim Levitsky wrote:
> > As I sadly found out, a s3 cycle makes the AMD's iommu stop sending interrupts
> > until the system is rebooted.
> >
> > I only noticed it now because otherwise the IOMMU works, and these interrupts
> > are only used for errors and for GA log which I tend not to use by
> > making my VMs do mwait/pause/etc in guest (cpu-pm=on).
> >
> > There are two issues here that prevent interrupts from being generated after
> > s3 cycle:
> >
> > 1. GA log base address was not restored after resume, and was all zeroed
> > after resume (by BIOS or such).
> >
> > In theory if BIOS writes some junk to it, that can even cause a memory corruption.
> > Patch 2 fixes that.
> >
> > 2. INTX (aka x2apic mode) settings were not restored after resume.
> > That mode is used regardless if the host uses/supports x2apic, but rather when
> > the IOMMU supports it, and mine does.
> > Patches 3-4 fix that.
> >
> > Note that there is still one slight (userspace) bug remaining:
> > During suspend all but the boot CPU are offlined and then after resume
> > are onlined again.
> >
> > The offlining moves all non-affinity managed interrupts to CPU0, and
> > later when all other CPUs are onlined, there is nothing in the kernel
> > to spread back the interrupts over the cores.
> >
> > The userspace 'irqbalance' daemon does fix this but it seems to ignore
> > the IOMMU interrupts in INTX mode since they are not attached to any
> > PCI device, and thus they remain on CPU0 after a s3 cycle,
> > which is suboptimal when the system has multiple IOMMUs
> > (mine has 4 of them).
> >
> > Setting the IRQ affinity manually via /proc/irq/ does work.
> >
> > This was tested on my 3970X with both INTX and regular MSI mode (later was enabled
> > by patching out INTX detection), by running a guest with AVIC enabled and with
> > a PCI assigned device (network card), and observing interrupts from
> > IOMMU while guest is mostly idle.
> >
> > This was also tested on my AMD laptop with 4650U (which has the same issue)
> > (I tested only INTX mode)
> >
> > Patch 1 is a small refactoring to remove an unused struct field.
> >
> > Best regards,
> > Maxim Levitsky
> >
> > Maxim Levitsky (5):
> > iommu/amd: restore GA log/tail pointer on host resume
> > iommu/amd: x2apic mode: re-enable after resume
> > iommu/amd: x2apic mode: setup the INTX registers on mask/unmask
> > iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume
> > iommu/amd: remove useless irq affinity notifier
> >
> > drivers/iommu/amd/amd_iommu_types.h | 2 -
> > drivers/iommu/amd/init.c | 107 +++++++++++++++-------------
> > 2 files changed, 58 insertions(+), 51 deletions(-)
> >
> > --
> > 2.26.3
> >
> >
>
> Polite ping on these patches.

Another very polite ping on these patches :)

Best regards,
Maxim Levitsky

> Best regards,
> Maxim Levitsky



2021-12-17 08:31:20

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Tue, Nov 23, 2021 at 06:10:33PM +0200, Maxim Levitsky wrote:
> Maxim Levitsky (5):
> iommu/amd: restore GA log/tail pointer on host resume
> iommu/amd: x2apic mode: re-enable after resume
> iommu/amd: x2apic mode: setup the INTX registers on mask/unmask
> iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume
> iommu/amd: remove useless irq affinity notifier
>
> drivers/iommu/amd/amd_iommu_types.h | 2 -
> drivers/iommu/amd/init.c | 107 +++++++++++++++-------------
> 2 files changed, 58 insertions(+), 51 deletions(-)

Applied for v5.17, thanks.

2022-01-25 22:48:05

by Mike Lothian

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

Hi

I'm seeing a WARNING that I think might be related to these patches, unfortunately another issue is making bisecting difficult

[ 0.359362] AMD-Vi: X2APIC enabled
[ 0.395140] ------------[ cut here ]------------
[ 0.395142] WARNING: CPU: 0 PID: 1 at amd_iommu_enable_interrupts+0x1da/0x440
[ 0.395146] Modules linked in:
[ 0.395148] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-tip+ #2995
[ 0.395150] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.316 11/29/2021
[ 0.395152] RIP: 0010:amd_iommu_enable_interrupts+0x1da/0x440
[ 0.395154] Code: 4b 38 48 89 41 18 b8 a0 86 01 00 0f 1f 44 00 00 48 8b 4b 38 8b 89 20 20 00 00 f7 c1 00 01 00 00 0f 85 7a fe ff ff ff c8 75 e6 <0f> 0b e9 6f fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 0.395157] RSP: 0018:ffff88810022fc68 EFLAGS: 00010246
[ 0.395158] RAX: 0000000000000000 RBX: ffff88810004b000 RCX: 0000000000000018
[ 0.395160] RDX: 0000000000000008 RSI: ffff88810022fc70 RDI: ffffc900000800f0
[ 0.395161] RBP: ffff88810022fc68 R08: ffff888100fce088 R09: 0000000000000000
[ 0.395162] R10: 0000000000000000 R11: ffffffffffffffff R12: ffffffff7fffffff
[ 0.395163] R13: 0000777f80000000 R14: 0000000000000000 R15: ffffffff8357c9e8
[ 0.395165] FS: 0000000000000000(0000) GS:ffff888fde400000(0000) knlGS:0000000000000000
[ 0.395166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.395167] CR2: ffff88901e1ff000 CR3: 00000000b440c000 CR4: 0000000000150ef0
[ 0.395169] Call Trace:
[ 0.395170] <TASK>
[ 0.395171] ? iommu_setup+0x29a/0x29a
[ 0.395174] ? state_next+0x6e/0x1c9
[ 0.395177] ? iommu_setup+0x29a/0x29a
[ 0.395178] ? iommu_go_to_state+0x1f/0x33
[ 0.395180] ? amd_iommu_init+0xa/0x23
[ 0.395182] ? pci_iommu_init+0xf/0x45
[ 0.395183] ? iommu_setup+0x29a/0x29a
[ 0.395184] ? __initstub__kmod_pci_dma__250_136_pci_iommu_initrootfs+0x5/0x8
[ 0.395186] ? do_one_initcall+0x100/0x290
[ 0.395190] ? do_initcall_level+0x8b/0xe5
[ 0.395192] ? do_initcalls+0x44/0x6d
[ 0.395194] ? kernel_init_freeable+0xc7/0x10d
[ 0.395196] ? rest_init+0xc0/0xc0
[ 0.395198] ? kernel_init+0x11/0x150
[ 0.395200] ? ret_from_fork+0x22/0x30
[ 0.395201] </TASK>
[ 0.395202] ---[ end trace 0000000000000000 ]---
[ 0.395204] PCI-DMA: Using software bounce buffer

Let me know if you need any more info

Cheers

Mike

2022-01-26 07:39:05

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Tue, 2022-01-25 at 15:08 +0000, Mike Lothian wrote:
> Hi
>
> I'm seeing a WARNING that I think might be related to these patches, unfortunately another issue is making bisecting difficult
>
> [ 0.359362] AMD-Vi: X2APIC enabled
> [ 0.395140] ------------[ cut here ]------------
> [ 0.395142] WARNING: CPU: 0 PID: 1 at amd_iommu_enable_interrupts+0x1da/0x440
> [ 0.395146] Modules linked in:
> [ 0.395148] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-tip+ #2995
> [ 0.395150] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.316 11/29/2021
> [ 0.395152] RIP: 0010:amd_iommu_enable_interrupts+0x1da/0x440
> [ 0.395154] Code: 4b 38 48 89 41 18 b8 a0 86 01 00 0f 1f 44 00 00 48 8b 4b 38 8b 89 20 20 00 00 f7 c1 00 01 00 00 0f 85 7a fe ff ff ff c8 75 e6 <0f> 0b e9 6f fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> [ 0.395157] RSP: 0018:ffff88810022fc68 EFLAGS: 00010246
> [ 0.395158] RAX: 0000000000000000 RBX: ffff88810004b000 RCX: 0000000000000018
> [ 0.395160] RDX: 0000000000000008 RSI: ffff88810022fc70 RDI: ffffc900000800f0
> [ 0.395161] RBP: ffff88810022fc68 R08: ffff888100fce088 R09: 0000000000000000
> [ 0.395162] R10: 0000000000000000 R11: ffffffffffffffff R12: ffffffff7fffffff
> [ 0.395163] R13: 0000777f80000000 R14: 0000000000000000 R15: ffffffff8357c9e8
> [ 0.395165] FS: 0000000000000000(0000) GS:ffff888fde400000(0000) knlGS:0000000000000000
> [ 0.395166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.395167] CR2: ffff88901e1ff000 CR3: 00000000b440c000 CR4: 0000000000150ef0
> [ 0.395169] Call Trace:
> [ 0.395170] <TASK>
> [ 0.395171] ? iommu_setup+0x29a/0x29a
> [ 0.395174] ? state_next+0x6e/0x1c9
> [ 0.395177] ? iommu_setup+0x29a/0x29a
> [ 0.395178] ? iommu_go_to_state+0x1f/0x33
> [ 0.395180] ? amd_iommu_init+0xa/0x23
> [ 0.395182] ? pci_iommu_init+0xf/0x45
> [ 0.395183] ? iommu_setup+0x29a/0x29a
> [ 0.395184] ? __initstub__kmod_pci_dma__250_136_pci_iommu_initrootfs+0x5/0x8
> [ 0.395186] ? do_one_initcall+0x100/0x290
> [ 0.395190] ? do_initcall_level+0x8b/0xe5
> [ 0.395192] ? do_initcalls+0x44/0x6d
> [ 0.395194] ? kernel_init_freeable+0xc7/0x10d
> [ 0.395196] ? rest_init+0xc0/0xc0
> [ 0.395198] ? kernel_init+0x11/0x150
> [ 0.395200] ? ret_from_fork+0x22/0x30
> [ 0.395201] </TASK>
> [ 0.395202] ---[ end trace 0000000000000000 ]---
> [ 0.395204] PCI-DMA: Using software bounce buffer
>
> Let me know if you need any more info
>
> Cheers
>
> Mike


Could you just apply these patches on top of 5.15 kernel and see if you get the warning?

If something could case it is I think patch 1, it does move the GA log enabled
to be a bit later.
I also added few warnings there. I wonder why your dmesg quote doesn't contain the C line
where the warning happens.

In partucular I added:

if (WARN_ON(status & (MMIO_STATUS_GALOG_RUN_MASK)))

That will fire if GA log is already running (maybe BIOS enabled it? - it really shouldn't do that)


And that:

if (WARN_ON(i >= LOOP_TIMEOUT))

also should not happen and worth to be logged IMHO.

Best regards,
Maxim Levitsky



2022-01-26 13:04:04

by Mike Lothian

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Tue, 25 Jan 2022 at 19:26, Maxim Levitsky <[email protected]> wrote:
>
> Could you just apply these patches on top of 5.15 kernel and see if you get the warning?
>
> If something could case it is I think patch 1, it does move the GA log enabled
> to be a bit later.
> I also added few warnings there. I wonder why your dmesg quote doesn't contain the C line
> where the warning happens.
>
> In partucular I added:
>
> if (WARN_ON(status & (MMIO_STATUS_GALOG_RUN_MASK)))
>
> That will fire if GA log is already running (maybe BIOS enabled it? - it really shouldn't do that)
>
>
> And that:
>
> if (WARN_ON(i >= LOOP_TIMEOUT))
>
> also should not happen and worth to be logged IMHO.
>
> Best regards,
> Maxim Levitsky
>

Hi

I applied on top of another kernel as you asked, I also enabled some debugging

[ 0.398833] ------------[ cut here ]------------
[ 0.398835] WARNING: CPU: 0 PID: 1 at drivers/iommu/amd/init.c:839
amd_iommu_enable_interrupts+0x1da/0x440
[ 0.398840] Modules linked in:
[ 0.398841] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc5-agd5f+ #1388
[ 0.398843] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.316 11/29/2021
[ 0.398845] RIP: 0010:amd_iommu_enable_interrupts+0x1da/0x440
[ 0.398847] Code: 4b 38 48 89 41 18 b8 a0 86 01 00 0f 1f 44 00 00
48 8b 4b 38 8b 89 20 20 00 00 f7 c1 00 01 00 00 0f 85 7a fe ff ff ff
c8 75 e6 <0f> 0b e9 6f fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
0
0 00
[ 0.398850] RSP: 0018:ffff888100927c68 EFLAGS: 00010246
[ 0.398851] RAX: 0000000000000000 RBX: ffff88810004b000 RCX: 0000000000000018
[ 0.398853] RDX: 0000000000000008 RSI: ffff888100927c70 RDI: ffffc900000800f0
[ 0.398854] RBP: ffff888100927c68 R08: ffff8881015b8f88 R09: 0000000000000000
[ 0.398855] R10: 0000000000000000 R11: ffffffffffffffff R12: ffffffff7fffffff
[ 0.398856] R13: 0000777f80000000 R14: 0000000000000000 R15: ffffffff8357a758
[ 0.398858] FS: 0000000000000000(0000) GS:ffff888fde400000(0000)
knlGS:0000000000000000
[ 0.398859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.398860] CR2: 0000000000000000 CR3: 00000000ac40c000 CR4: 0000000000150ef0
[ 0.398862] Call Trace:
[ 0.398864] <TASK>
[ 0.398864] ? iommu_setup+0x29a/0x29a
[ 0.398867] ? state_next+0x6e/0x1c9
[ 0.398870] ? iommu_setup+0x29a/0x29a
[ 0.398872] ? iommu_go_to_state+0x1f/0x33
[ 0.398873] ? amd_iommu_init+0xa/0x23
[ 0.398875] ? pci_iommu_init+0xf/0x45
[ 0.398876] ? iommu_setup+0x29a/0x29a
[ 0.398878] ? __initstub__kmod_pci_dma__244_136_pci_iommu_initrootfs+0x5/0x8
[ 0.398880] ? do_one_initcall+0x100/0x290
[ 0.398882] ? do_initcall_level+0x8b/0xe5
[ 0.398884] ? do_initcalls+0x44/0x6d
[ 0.398885] ? kernel_init_freeable+0xc7/0x10d
[ 0.398886] ? rest_init+0xc0/0xc0
[ 0.398888] ? kernel_init+0x11/0x150
[ 0.398889] ? ret_from_fork+0x22/0x30
[ 0.398891] </TASK>
[ 0.398892] ---[ end trace f048a4ec907dc976 ]---

Which points to patch one and "if (WARN_ON(i >= LOOP_TIMEOUT))"

Hope that helps

Mike

2022-01-26 20:22:20

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Tue, 2022-01-25 at 23:25 +0000, Mike Lothian wrote:
> On Tue, 25 Jan 2022 at 19:26, Maxim Levitsky <[email protected]> wrote:
> > Could you just apply these patches on top of 5.15 kernel and see if you get the warning?
> >
> > If something could case it is I think patch 1, it does move the GA log enabled
> > to be a bit later.
> > I also added few warnings there. I wonder why your dmesg quote doesn't contain the C line
> > where the warning happens.
> >
> > In partucular I added:
> >
> > if (WARN_ON(status & (MMIO_STATUS_GALOG_RUN_MASK)))
> >
> > That will fire if GA log is already running (maybe BIOS enabled it? - it really shouldn't do that)
> >
> >
> > And that:
> >
> > if (WARN_ON(i >= LOOP_TIMEOUT))
> >
> > also should not happen and worth to be logged IMHO.
> >
> > Best regards,
> > Maxim Levitsky
> >
>
> Hi
>
> I applied on top of another kernel as you asked, I also enabled some debugging
>
> [ 0.398833] ------------[ cut here ]------------
> [ 0.398835] WARNING: CPU: 0 PID: 1 at drivers/iommu/amd/init.c:839
> amd_iommu_enable_interrupts+0x1da/0x440
> [ 0.398840] Modules linked in:
> [ 0.398841] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc5-agd5f+ #1388
> [ 0.398843] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.316 11/29/2021
> [ 0.398845] RIP: 0010:amd_iommu_enable_interrupts+0x1da/0x440
> [ 0.398847] Code: 4b 38 48 89 41 18 b8 a0 86 01 00 0f 1f 44 00 00
> 48 8b 4b 38 8b 89 20 20 00 00 f7 c1 00 01 00 00 0f 85 7a fe ff ff ff
> c8 75 e6 <0f> 0b e9 6f fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
> 0
> 0 00
> [ 0.398850] RSP: 0018:ffff888100927c68 EFLAGS: 00010246
> [ 0.398851] RAX: 0000000000000000 RBX: ffff88810004b000 RCX: 0000000000000018
> [ 0.398853] RDX: 0000000000000008 RSI: ffff888100927c70 RDI: ffffc900000800f0
> [ 0.398854] RBP: ffff888100927c68 R08: ffff8881015b8f88 R09: 0000000000000000
> [ 0.398855] R10: 0000000000000000 R11: ffffffffffffffff R12: ffffffff7fffffff
> [ 0.398856] R13: 0000777f80000000 R14: 0000000000000000 R15: ffffffff8357a758
> [ 0.398858] FS: 0000000000000000(0000) GS:ffff888fde400000(0000)
> knlGS:0000000000000000
> [ 0.398859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.398860] CR2: 0000000000000000 CR3: 00000000ac40c000 CR4: 0000000000150ef0
> [ 0.398862] Call Trace:
> [ 0.398864] <TASK>
> [ 0.398864] ? iommu_setup+0x29a/0x29a
> [ 0.398867] ? state_next+0x6e/0x1c9
> [ 0.398870] ? iommu_setup+0x29a/0x29a
> [ 0.398872] ? iommu_go_to_state+0x1f/0x33
> [ 0.398873] ? amd_iommu_init+0xa/0x23
> [ 0.398875] ? pci_iommu_init+0xf/0x45
> [ 0.398876] ? iommu_setup+0x29a/0x29a
> [ 0.398878] ? __initstub__kmod_pci_dma__244_136_pci_iommu_initrootfs+0x5/0x8
> [ 0.398880] ? do_one_initcall+0x100/0x290
> [ 0.398882] ? do_initcall_level+0x8b/0xe5
> [ 0.398884] ? do_initcalls+0x44/0x6d
> [ 0.398885] ? kernel_init_freeable+0xc7/0x10d
> [ 0.398886] ? rest_init+0xc0/0xc0
> [ 0.398888] ? kernel_init+0x11/0x150
> [ 0.398889] ? ret_from_fork+0x22/0x30
> [ 0.398891] </TASK>
> [ 0.398892] ---[ end trace f048a4ec907dc976 ]---
>
> Which points to patch one and "if (WARN_ON(i >= LOOP_TIMEOUT))"


Could you post the whole dmesg, or at least:

dmesg | grep AMD-Vi


What CPU does your system have?

I suspect that your system doesn't GA log feature enabled in the IOMMU, and the code never checks
for that, and here it fails enabling it, which before my patches was just
ignoring it silently.


Best regards,
Maxim Levitsky
>
> Hope that helps
>
> Mike
>


2022-01-26 20:53:32

by Mike Lothian

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Wed, 26 Jan 2022 at 07:34, Maxim Levitsky <[email protected]> wrote:
>
> Could you post the whole dmesg, or at least:
>
> dmesg | grep AMD-Vi
>
>
> What CPU does your system have?
>
> I suspect that your system doesn't GA log feature enabled in the IOMMU, and the code never checks
> for that, and here it fails enabling it, which before my patches was just
> ignoring it silently.
>
>
> Best regards,
> Maxim Levitsky
> >
> > Hope that helps
> >
> > Mike
> >

Hi

It's an AMD Ryzen 9 5900HX

[ 0.186350] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[ 0.186353] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[ 0.186354] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[ 0.186355] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[ 0.355628] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.356134] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.356136] AMD-Vi: Extended features (0x206d73ef22254ade): PPR
X2APIC NX GT IA GA PC GA_vAPIC
[ 0.356140] AMD-Vi: Interrupt remapping enabled
[ 0.356141] AMD-Vi: Virtual APIC enabled
[ 0.356142] AMD-Vi: X2APIC enabled
[ 0.431377] AMD-Vi: AMD IOMMUv2 loaded and initialized

I've attached the dmesg, I notice that some boots it doesn't happen

Cheers

Mike


Attachments:
x2apic.dmesg (83.28 kB)

2022-01-26 20:58:28

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Wed, 2022-01-26 at 09:54 +0000, Mike Lothian wrote:
> On Wed, 26 Jan 2022 at 07:34, Maxim Levitsky <[email protected]> wrote:
> > Could you post the whole dmesg, or at least:
> >
> > dmesg | grep AMD-Vi
> >
> >
> > What CPU does your system have?
> >
> > I suspect that your system doesn't GA log feature enabled in the IOMMU, and the code never checks
> > for that, and here it fails enabling it, which before my patches was just
> > ignoring it silently.
> >
> >
> > Best regards,
> > Maxim Levitsky
> > > Hope that helps
> > >
> > > Mike
> > >
>
> Hi
>
> It's an AMD Ryzen 9 5900HX
>
> [ 0.186350] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
> [ 0.186353] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
> [ 0.186354] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
> [ 0.186355] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
> [ 0.355628] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
> [ 0.356134] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
> [ 0.356136] AMD-Vi: Extended features (0x206d73ef22254ade): PPR
> X2APIC NX GT IA GA PC GA_vAPIC
> [ 0.356140] AMD-Vi: Interrupt remapping enabled
> [ 0.356141] AMD-Vi: Virtual APIC enabled
> [ 0.356142] AMD-Vi: X2APIC enabled
> [ 0.431377] AMD-Vi: AMD IOMMUv2 loaded and initialized
>
> I've attached the dmesg, I notice that some boots it doesn't happen
>
> Cheers
>
> Mike

Great, your system does seem to support GA log
(but a patch to check if, other that assume blindly that it is supported is
something that should be done).

So could you bump the LOOP_TIMEOUT like by 10x or so and see if the problem goes away?

(that code should be rewritten to time based wait and not just blindly loop like that,
I also can prepare a patch for that as well).

Best regards,
Maxim Levitsky

2022-01-27 08:06:27

by Mike Lothian

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Wed, 26 Jan 2022 at 10:12, Maxim Levitsky <[email protected]> wrote:
>
> Great, your system does seem to support GA log
> (but a patch to check if, other that assume blindly that it is supported is
> something that should be done).
>
> So could you bump the LOOP_TIMEOUT like by 10x or so and see if the problem goes away?
>
> (that code should be rewritten to time based wait and not just blindly loop like that,
> I also can prepare a patch for that as well).
>
> Best regards,
> Maxim Levitsky
>

Hi

I've done quite a few restarts with the LOOP_TIMEOUT increased and
I've not seen the issue since

Cheers

Mike

2022-01-27 17:57:31

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Thu, 2022-01-27 at 00:39 +0000, Mike Lothian wrote:
> On Wed, 26 Jan 2022 at 10:12, Maxim Levitsky <[email protected]> wrote:
> > Great, your system does seem to support GA log
> > (but a patch to check if, other that assume blindly that it is supported is
> > something that should be done).
> >
> > So could you bump the LOOP_TIMEOUT like by 10x or so and see if the problem goes away?
> >
> > (that code should be rewritten to time based wait and not just blindly loop like that,
> > I also can prepare a patch for that as well).
> >
> > Best regards,
> > Maxim Levitsky
> >
>
> Hi
>
> I've done quite a few restarts with the LOOP_TIMEOUT increased and
> I've not seen the issue since

Great, so the problem is solved I guess.
Thanks for the help!


I'll send a patch for this in few days to replace this and other similiar timeouts
with a proper udelay() wait.

Best regards,
Maxim Levitsky

>
> Cheers
>
> Mike
>


2022-01-27 18:12:03

by Mike Lothian

[permalink] [raw]
Subject: Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume

On Thu, 27 Jan 2022 at 10:22, Maxim Levitsky <[email protected]> wrote:
>
> On Thu, 2022-01-27 at 00:39 +0000, Mike Lothian wrote:
> > On Wed, 26 Jan 2022 at 10:12, Maxim Levitsky <[email protected]> wrote:
> > > Great, your system does seem to support GA log
> > > (but a patch to check if, other that assume blindly that it is supported is
> > > something that should be done).
> > >
> > > So could you bump the LOOP_TIMEOUT like by 10x or so and see if the problem goes away?
> > >
> > > (that code should be rewritten to time based wait and not just blindly loop like that,
> > > I also can prepare a patch for that as well).
> > >
> > > Best regards,
> > > Maxim Levitsky
> > >
> >
> > Hi
> >
> > I've done quite a few restarts with the LOOP_TIMEOUT increased and
> > I've not seen the issue since
>
> Great, so the problem is solved I guess.
> Thanks for the help!
>
>
> I'll send a patch for this in few days to replace this and other similiar timeouts
> with a proper udelay() wait.
>

Thanks for your help