2022-04-30 01:11:41

by Guenter Roeck

Subject: Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

On 4/29/22 01:48, Greg Kroah-Hartman wrote:
> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
>> This driver creates per-cpu hrtimers which are required to do the
>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>> userspace is responsible for delivering the 'pet' events by writing to
>> the particular /dev/watchdogN node. In this case we require a strong
>> thread affinity to be able to account for lost time on a per-vCPU basis.
>>
>> This part of the driver is the 'frontend' which is responsible for
>> delivering the periodic 'pet' events, configuring the virtual peripheral
>> and listening for CPU hotplug events. The other part of the driver
>> handles the peripheral emulation and this part accounts for lost time by
>> looking at the /proc/{}/task/{}/stat entries and is located here:
>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>
>> Signed-off-by: Sebastian Ene <[email protected]>
>> ---
>> drivers/misc/Kconfig | 12 +++
>> drivers/misc/Makefile | 1 +
>> drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>> 3 files changed, 219 insertions(+)
>> create mode 100644 drivers/misc/vm-watchdog.c
>>
>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>> index 2b9572a6d114..26c3a99e269c 100644
>> --- a/drivers/misc/Kconfig
>> +++ b/drivers/misc/Kconfig
>> @@ -493,6 +493,18 @@ config OPEN_DICE
>>
>> If unsure, say N.
>>
>> +config VM_WATCHDOG
>> + tristate "Virtual Machine Watchdog"
>> + select LOCKUP_DETECTOR
>> + help
>> + Detect CPU locks on the virtual machine. This driver relies on the
>> + hrtimers which are CPU-bound to do the 'pet' operation. When a vCPU
>> + has to do a 'pet', it exits the guest through MMIO write and the
>> + backend driver takes into account the lost ticks for this particular
>> + CPU.
>> + To compile this driver as a module, choose M here: the
>> + module will be called vm-wdt.
>
> You forgot to name the module properly here based on the Makefile change
> you made.
>
> And again, as this is called a "watchdog", it seems crazy that it is not
> in drivers/watchdog/
>

I disagree. It is not a watchdog driver in the traditional sense (it does
not use, want to use, or need to use the watchdog driver API or ABI).
Its functionality is similar to the functionality of kernel/watchdog.c,
which doesn't belong in drivers/watchdog either.

Guenter
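The commit message says the crosvm backend accounts for lost time by reading the /proc/{}/task/{}/stat entries of the vCPU threads. The following is a minimal userspace sketch of that idea, under assumptions that are not taken from the patch or from crosvm: the backend sums utime and stime (fields 14 and 15 as documented in proc(5)) for each vCPU thread, records that sum whenever the guest pets that vCPU, and reports a stall only when the thread has consumed more than a timeout's worth of CPU time without a pet. The names vcpu_stall_state, parse_task_ticks, vcpu_pet and vcpu_stalled are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-vCPU bookkeeping in the backend (not from the patch). */
struct vcpu_stall_state {
	unsigned long long last_ticks;    /* utime + stime at the last pet */
	unsigned long long timeout_ticks; /* CPU time allowed between pets */
};

/*
 * Sum utime (field 14) and stime (field 15) from one line of
 * /proc/<pid>/task/<tid>/stat, using the field layout from proc(5).
 * The comm field may itself contain spaces and ')', so parsing starts
 * after the last ')'. Returns 0 on success, -1 on a malformed line.
 */
static int parse_task_ticks(const char *stat_line, unsigned long long *ticks)
{
	const char *p = strrchr(stat_line, ')');
	unsigned long long utime, stime;

	if (!p)
		return -1;
	/* Skip state, ppid, pgrp, session, tty_nr, tpgid, flags and the
	 * four page-fault counters, then read utime and stime. */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %llu %llu",
		   &utime, &stime) != 2)
		return -1;
	*ticks = utime + stime;
	return 0;
}

/* Guest petted this vCPU: remember how much CPU time its thread has used. */
static void vcpu_pet(struct vcpu_stall_state *s, unsigned long long now_ticks)
{
	s->last_ticks = now_ticks;
}

/*
 * Returns 1 if the vCPU thread has run for more than timeout_ticks of
 * CPU time since its last pet. Assumes now_ticks never goes backwards.
 */
static int vcpu_stalled(const struct vcpu_stall_state *s,
			unsigned long long now_ticks)
{
	return now_ticks - s->last_ticks > s->timeout_ticks;
}
```

Because a thread that the host keeps descheduled does not accumulate utime or stime, host preemption alone cannot push such a counter past its timeout; only a vCPU that keeps running without petting can. That is one plausible reading of why the commit message asks for strong per-vCPU thread affinity.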


2022-05-03 00:07:49

by Greg Kroah-Hartman

Subject: Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote:
> On 4/29/22 01:48, Greg Kroah-Hartman wrote:
> > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > > This driver creates per-cpu hrtimers which are required to do the
> > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > userspace is responsible for delivering the 'pet' events by writing to
> > > the particular /dev/watchdogN node. In this case we require a strong
> > > thread affinity to be able to account for lost time on a per-vCPU basis.
> > >
> > > This part of the driver is the 'frontend' which is responsible for
> > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > and listening for CPU hotplug events. The other part of the driver
> > > handles the peripheral emulation and this part accounts for lost time by
> > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > >
> > > Signed-off-by: Sebastian Ene <[email protected]>
> > > ---
> > > drivers/misc/Kconfig | 12 +++
> > > drivers/misc/Makefile | 1 +
> > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> > > 3 files changed, 219 insertions(+)
> > > create mode 100644 drivers/misc/vm-watchdog.c
> > >
> > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > index 2b9572a6d114..26c3a99e269c 100644
> > > --- a/drivers/misc/Kconfig
> > > +++ b/drivers/misc/Kconfig
> > > @@ -493,6 +493,18 @@ config OPEN_DICE
> > > If unsure, say N.
> > > +config VM_WATCHDOG
> > > + tristate "Virtual Machine Watchdog"
> > > + select LOCKUP_DETECTOR
> > > + help
> > > + Detect CPU locks on the virtual machine. This driver relies on the
> > > + hrtimers which are CPU-bound to do the 'pet' operation. When a vCPU
> > > + has to do a 'pet', it exits the guest through MMIO write and the
> > > + backend driver takes into account the lost ticks for this particular
> > > + CPU.
> > > + To compile this driver as a module, choose M here: the
> > > + module will be called vm-wdt.
> >
> > You forgot to name the module properly here based on the Makefile change
> > you made.
> >
> > And again, as this is called a "watchdog", it seems crazy that it is not
> > in drivers/watchdog/
> >
>
> I disagree. It is not a watchdog driver in the traditional sense (it does
> not use, want to use, or need to use the watchdog driver API or ABI).
> Its functionality is similar to the functionality of kernel/watchdog.c,
> which doesn't belong in drivers/watchdog either.

Ah, ok, that makes more sense, the user/kernel api is not the same.
Someone should put that in the changelog next time :)

thanks,

greg k-h

2022-05-03 00:14:39

by Guenter Roeck

Subject: Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

On 4/29/22 23:18, Greg Kroah-Hartman wrote:
> On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote:
>> On 4/29/22 01:48, Greg Kroah-Hartman wrote:
>>> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
>>>> This driver creates per-cpu hrtimers which are required to do the
>>>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>>>> userspace is responsible for delivering the 'pet' events by writing to
>>>> the particular /dev/watchdogN node. In this case we require a strong
>>>> thread affinity to be able to account for lost time on a per-vCPU basis.
>>>>
>>>> This part of the driver is the 'frontend' which is responsible for
>>>> delivering the periodic 'pet' events, configuring the virtual peripheral
>>>> and listening for CPU hotplug events. The other part of the driver
>>>> handles the peripheral emulation and this part accounts for lost time by
>>>> looking at the /proc/{}/task/{}/stat entries and is located here:
>>>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>>>
>>>> Signed-off-by: Sebastian Ene <[email protected]>
>>>> ---
>>>> drivers/misc/Kconfig | 12 +++
>>>> drivers/misc/Makefile | 1 +
>>>> drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>>>> 3 files changed, 219 insertions(+)
>>>> create mode 100644 drivers/misc/vm-watchdog.c
>>>>
>>>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>>>> index 2b9572a6d114..26c3a99e269c 100644
>>>> --- a/drivers/misc/Kconfig
>>>> +++ b/drivers/misc/Kconfig
>>>> @@ -493,6 +493,18 @@ config OPEN_DICE
>>>> If unsure, say N.
>>>> +config VM_WATCHDOG
>>>> + tristate "Virtual Machine Watchdog"
>>>> + select LOCKUP_DETECTOR
>>>> + help
>>>> + Detect CPU locks on the virtual machine. This driver relies on the
>>>> + hrtimers which are CPU-bound to do the 'pet' operation. When a vCPU
>>>> + has to do a 'pet', it exits the guest through MMIO write and the
>>>> + backend driver takes into account the lost ticks for this particular
>>>> + CPU.
>>>> + To compile this driver as a module, choose M here: the
>>>> + module will be called vm-wdt.
>>>
>>> You forgot to name the module properly here based on the Makefile change
>>> you made.
>>>
>>> And again, as this is called a "watchdog", it seems crazy that it is not
>>> in drivers/watchdog/
>>>
>>
>> I disagree. It is not a watchdog driver in the traditional sense (it does
>> not use, want to use, or need to use the watchdog driver API or ABI).
>> Its functionality is similar to the functionality of kernel/watchdog.c,
>> which doesn't belong in drivers/watchdog either.
>
> Ah, ok, that makes more sense, the user/kernel api is not the same.
> Someone should put that in the changelog next time :)
>

Renaming it to "VCPU stall detector" or similar should fix the confusion.

Guenter