Hello,
I respined the v2 version to address an issue previously found here:
https://lore.kernel.org/all/[email protected]/
This is a small update of the previously introduced vcpu stall detector
which adds an interrupt to the virtual device to notify the guest VM in
case it stalls. This lets the guest VM to handle the reboot and to
panic in case it expires.
Changelog from v1:
* 1/2 : collected the Ack from Conor Dooley, thank you Conor !
* 2/2 : applied the feedback received from Conor and used
platform_get_irq_optional. Removed the error messages during
probe
Thanks,
Sebastian Ene (2):
dt-bindings: vcpu_stall_detector: Add a PPI interrupt to the virtual
device
misc: Register a PPI for the vcpu stall detection virtual device
.../misc/qemu,vcpu-stall-detector.yaml | 6 ++++
drivers/misc/vcpu_stall_detector.c | 31 +++++++++++++++++--
2 files changed, 35 insertions(+), 2 deletions(-)
--
2.45.2.505.gda0bf45e8d-goog
The vcpu stall detector allows the host to monitor the availability of a
guest VM. Introduce a PPI interrupt which can be injected from the host
into the virtual gic to let the guest reboot itself.
Signed-off-by: Sebastian Ene <[email protected]>
Acked-by: Conor Dooley <[email protected]>
---
.../devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml b/Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml
index 1aebeb696ee0..e12d80be00cd 100644
--- a/Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml
+++ b/Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml
@@ -29,6 +29,9 @@ properties:
Defaults to 10 if unset.
default: 10
+ interrupts:
+ maxItems: 1
+
timeout-sec:
description: |
The stall detector expiration timeout measured in seconds.
@@ -43,9 +46,12 @@ additionalProperties: false
examples:
- |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
vmwdt@9030000 {
compatible = "qemu,vcpu-stall-detector";
reg = <0x9030000 0x10000>;
clock-frequency = <10>;
timeout-sec = <8>;
+ interrupts = <GIC_PPI 15 IRQ_TYPE_EDGE_RISING>;
};
--
2.45.2.505.gda0bf45e8d-goog
Request a PPI for each vCPU during probe which will be used by the host
to communicate a stall detected event on the vCPU. When the host raises
this interrupt from the virtual machine monitor, the guest is expected to
handle the interrupt and panic.
Signed-off-by: Sebastian Ene <[email protected]>
---
drivers/misc/vcpu_stall_detector.c | 31 ++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/vcpu_stall_detector.c b/drivers/misc/vcpu_stall_detector.c
index e2015c87f03f..41b8c2119e20 100644
--- a/drivers/misc/vcpu_stall_detector.c
+++ b/drivers/misc/vcpu_stall_detector.c
@@ -32,6 +32,7 @@
struct vcpu_stall_detect_config {
u32 clock_freq_hz;
u32 stall_timeout_sec;
+ int ppi_irq;
void __iomem *membase;
struct platform_device *dev;
@@ -77,6 +78,12 @@ vcpu_stall_detect_timer_fn(struct hrtimer *hrtimer)
return HRTIMER_RESTART;
}
+static irqreturn_t vcpu_stall_detector_irq(int irq, void *dev)
+{
+ panic("vCPU stall detector");
+ return IRQ_HANDLED;
+}
+
static int start_stall_detector_cpu(unsigned int cpu)
{
u32 ticks, ping_timeout_ms;
@@ -132,7 +139,7 @@ static int stop_stall_detector_cpu(unsigned int cpu)
static int vcpu_stall_detect_probe(struct platform_device *pdev)
{
- int ret;
+ int ret, irq;
struct resource *r;
void __iomem *membase;
u32 clock_freq_hz = VCPU_STALL_DEFAULT_CLOCK_HZ;
@@ -169,9 +176,22 @@ static int vcpu_stall_detect_probe(struct platform_device *pdev)
vcpu_stall_config = (struct vcpu_stall_detect_config) {
.membase = membase,
.clock_freq_hz = clock_freq_hz,
- .stall_timeout_sec = stall_timeout_sec
+ .stall_timeout_sec = stall_timeout_sec,
+ .ppi_irq = -1,
};
+ irq = platform_get_irq_optional(pdev, 0);
+ if (irq > 0) {
+ ret = request_percpu_irq(irq,
+ vcpu_stall_detector_irq,
+ "vcpu_stall_detector",
+ vcpu_stall_detectors);
+ if (ret)
+ goto err;
+
+ vcpu_stall_config.ppi_irq = irq;
+ }
+
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"virt/vcpu_stall_detector:online",
start_stall_detector_cpu,
@@ -184,6 +204,9 @@ static int vcpu_stall_detect_probe(struct platform_device *pdev)
vcpu_stall_config.hp_online = ret;
return 0;
err:
+ if (vcpu_stall_config.ppi_irq > 0)
+ free_percpu_irq(vcpu_stall_config.ppi_irq,
+ vcpu_stall_detectors);
return ret;
}
@@ -193,6 +216,10 @@ static void vcpu_stall_detect_remove(struct platform_device *pdev)
cpuhp_remove_state(vcpu_stall_config.hp_online);
+ if (vcpu_stall_config.ppi_irq > 0)
+ free_percpu_irq(vcpu_stall_config.ppi_irq,
+ vcpu_stall_detectors);
+
for_each_possible_cpu(cpu)
stop_stall_detector_cpu(cpu);
}
--
2.45.2.505.gda0bf45e8d-goog
On Thu, 13 Jun 2024 15:13:33 +0100,
Sebastian Ene <[email protected]> wrote:
>
> Hello,
>
> I respined the v2 version to address an issue previously found here:
> https://lore.kernel.org/all/[email protected]/
So is this v2 or v3? Having two v2s on the list is... confusing.
M.
--
Without deviation from the norm, progress is not possible.
On Thu, Jun 13, 2024 at 03:52:08PM +0100, Marc Zyngier wrote:
> On Thu, 13 Jun 2024 15:13:33 +0100,
> Sebastian Ene <[email protected]> wrote:
> >
> > Hello,
> >
> > I respined the v2 version to address an issue previously found here:
> > https://lore.kernel.org/all/[email protected]/
>
> So is this v2 or v3? Having two v2s on the list is... confusing.
>
> M.
>
There is a small change in the patch 2/2 so you are right it should be v3,
sorry for the confusion.
> --
> Without deviation from the norm, progress is not possible.
Thanks,
Seb