Date: Mon, 11 Jul 2022 08:17:18 +0000
Message-Id: <20220711081720.2870509-1-sebastianene@google.com>
Subject: [PATCH v12 0/2] Detect stalls on guest vCPUS
From: Sebastian Ene
To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
Cc: linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
    maz@kernel.org, will@kernel.org, vdonnefort@google.com,
    Guenter Roeck, Sebastian Ene
X-Mailer: git-send-email 2.37.0.rc0.161.g10f37bed90-goog
X-Mailing-List: linux-kernel@vger.kernel.org

Minor change from v11 which cleans up the Kconfig option selection.

This adds a mechanism to detect stalls on the guest vCPUs by creating a
per-CPU hrtimer which periodically 'pets' the host backend driver.
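To make the accounting concrete, here is a minimal userspace model of how
a host backend might track pets for one vCPU. It is only a sketch, not the
crosvm implementation: the struct, the function names and the millisecond
values are all hypothetical, and the only behaviour taken from this letter
is that time during which the guest is not running must not count towards
a stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side state for one vCPU; not the crosvm layout. */
struct vcpu_watch {
	uint64_t timeout_ms;    /* max guest run time allowed between pets */
	uint64_t ran_since_pet; /* guest run time accumulated since last pet */
};

/* The guest's per-CPU timer fired and 'petted' us: restart the count. */
static void vcpu_pet(struct vcpu_watch *w)
{
	assert(w != NULL);
	w->ran_since_pet = 0;
}

/*
 * Account one time slice on the host. Slices in which the guest was not
 * running are ignored, so a descheduled guest is not reported as stalled.
 */
static void host_account(struct vcpu_watch *w, uint64_t slice_ms,
			 bool guest_ran)
{
	assert(w != NULL);
	if (guest_ran)
		w->ran_since_pet += slice_ms;
}

/* A vCPU is considered stalled once it ran too long without petting. */
static bool vcpu_stalled(const struct vcpu_watch *w)
{
	assert(w != NULL);
	return w->ran_since_pet > w->timeout_ms;
}
```

With timeout_ms set to 8000 in this model, accounting 20000 ms while the
guest was descheduled leaves vcpu_stalled() false, whereas 9000 ms of guest
run time without a pet makes it return true.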
On a conventional watchdog-core driver, userspace is responsible for
delivering the 'pet' events by writing to the particular /dev/watchdogN
node. In this case we require a strong thread affinity to be able to
account for lost time on a per-vCPU basis.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapsed time between subsequent 'pet'
events. If the elapsed time doesn't match an expected value, the backend
driver decides that the guest vCPU is locked and resets the guest. The
host backend driver takes into account the time that the guest is not
running. The communication with the backend driver is done through MMIO
and the register layout of the virtual watchdog is described as part of
the backend driver changes.

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Changelog v12:
 - don't select LOCKUP_DETECTOR from Kconfig when VCPU_STALL_DETECTOR is
   compiled in, as suggested by Greg
 - add the Reviewed-by tag received from Guenter

Changelog v11:
 - verify that the values from DT are within the expected range and fall
   back to default values in case they are not
 - added Will's Reviewed-by tag

Changelog v10:
 - keep only the hrtimer and a flag in the per_cpu structure and move
   the other fields into a separate config structure
 - fix a potential race condition as pointed out by Will: the driver
   remove(..) can race with the hotplug cpu notifiers
 - replace alloc_percpu with devm_alloc_percpu and remove the
   free_percpu
 - unregister the hotplug notifiers
 - improve the Kconfig description and fix the license in the header
   file
 - add the Reviewed-by tag from Rob as the DT has not changed since v9

Changelog v9:
 - make the driver depend on CONFIG_OF
 - remove the platform_(set|get)_drvdata calls and keep a per-cpu static
   variable `vm_stall_detect` as suggested by Guenter on the (v8) series
 - improve commit description and fix styling

Sebastian Ene (2):
  dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector
    compatible
  misc: Add a mechanism to detect stalls on guest vCPUs

 .../misc/qemu,vcpu-stall-detector.yaml        |  51 ++++
 drivers/misc/Kconfig                          |  13 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/vcpu_stall_detector.c            | 223 ++++++++++++++++++
 4 files changed, 288 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml
 create mode 100644 drivers/misc/vcpu_stall_detector.c

-- 
2.37.0.rc0.161.g10f37bed90-goog