From: Marco Elver
Date: Tue, 23 Feb 2021 23:26:29 +0100
Subject: Re: [PATCH RFC 0/4] Add support for synchronous signals on perf events
To: Andy Lutomirski
Cc: Peter Zijlstra, Alexander Shishkin, Arnaldo Carvalho de Melo,
    Ingo Molnar, Jiri Olsa, Mark Rutland, Namhyung Kim, Thomas Gleixner,
    Alexander Potapenko, Al Viro, Arnd Bergmann, Christian Brauner,
    Dmitry Vyukov, Jann Horn, Jens Axboe, Matt Morehouse,
    Peter Collingbourne, Ian Rogers, kasan-dev, linux-arch,
    linux-fsdevel, LKML, linux-m68k@lists.linux-m68k.org,
    "the arch/x86 maintainers"
References: <20210223143426.2412737-1-elver@google.com>
    <3D507285-835F-4C83-8343-2888835971B4@amacapital.net>
In-Reply-To: <3D507285-835F-4C83-8343-2888835971B4@amacapital.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 23 Feb 2021 at 21:27, Andy Lutomirski wrote:
>
> On Feb 23, 2021, at 6:34 AM, Marco Elver wrote:
> >
> > The perf subsystem today unifies various tracing and monitoring
> > features, from both software and hardware. One benefit of the perf
> > subsystem is automatically inheriting events to child tasks, which
> > enables process-wide events monitoring with low overheads. By default
> > perf events are non-intrusive, not affecting behaviour of the tasks
> > being monitored.
> >
> > For certain use-cases, however, it makes sense to leverage the
> > generality of the perf events subsystem and optionally allow the tasks
> > being monitored to receive signals on events they are interested in.
> > This patch series adds the option to synchronously signal user space on
> > events.
>
> Unless I missed some machinations, which is entirely possible, you
> can’t call force_sig_info() from NMI context. Not only am I not
> convinced that the core signal code is NMI safe, but at least x86
> can’t correctly deliver signals on NMI return. You probably need an
> IPI-to-self.

force_sig_info() is called from an irq_work only: perf_pending_event
-> perf_pending_event_disable -> perf_sigtrap -> force_sig_info. What
did I miss?

> > The discussion at [1] led to the changes proposed in this series. The
> > approach taken in patch 3/4 to use 'event_limit' to trigger the signal
> > was kindly suggested by Peter Zijlstra in [2].
> >
> > [1] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZD=-XQ@mail.gmail.com/
> > [2] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.net/
> >
> > Motivation and example uses:
> >
> > 1. Our immediate motivation is low-overhead sampling-based race
> >    detection for user-space [3]. By using perf_event_open() at
> >    process initialization, we can create hardware
> >    breakpoint/watchpoint events that are propagated automatically
> >    to all threads in a process. As far as we are aware, today no
> >    existing kernel facility (such as ptrace) allows us to set up
> >    process-wide watchpoints with minimal overheads (that are
> >    comparable to mprotect() of whole pages).
>
> This would be doable much more simply with an API to set a
> breakpoint. All the machinery exists except the actual user API.

Isn't perf_event_open() that API?

A new user API implementation will either be a thin wrapper around
perf events or reinvent half of perf events to deal with managing
watchpoints across a set of tasks (process-wide or some subset).

It's not just breakpoints though.

> > [3] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
> >
> > 2. Other low-overhead error detectors that rely on detecting
> >    accesses to certain memory locations or code, process-wide and
> >    also only in a specific set of subtasks or threads.
> >
> > Other example use-cases we found potentially interesting:
> >
> > 3. Code hot patching without full stop-the-world. Specifically, by
> >    setting a code breakpoint to entry to the patched routine, then
> >    send signals to threads and check that they are not in the
> >    routine, but without stopping them further. If any of the
> >    threads will enter the routine, it will receive SIGTRAP and
> >    pause.
>
> Cute.
>
> >
> > 4. Safepoints without mprotect(). Some Java implementations use
> >    "load from a known memory location" as a safepoint. When threads
> >    need to be stopped, the page containing the location is
> >    mprotect()ed and threads get a signal. This can be replaced with
> >    a watchpoint, which does not require a whole page nor DTLB
> >    shootdowns.
>
> I’m skeptical. Propagating a hardware breakpoint to all threads
> involves IPIs and horribly slow writes to DR1 (or 2, 3, or 4) and
> DR7. A TLB flush can be accelerated using paravirt or hypothetical
> future hardware. Or real live hardware on ARM64.
>
> (The hypothetical future hardware is almost present on Zen 3. A bit
> of work is needed on the hardware end to make it useful.)

Fair enough. Although watchpoints can be much more fine-grained than
an mprotect() which then also has downsides (checking if the accessed
memory was actually the bytes we're interested in). Maybe we should
also ask CPU vendors to give us better watchpoints (perhaps start with
more of them, and easier to set in batch)? We still need a user space
API...

Thanks,
-- Marco

> >
> > 5. Tracking data flow globally.
> >
> > 6. Threads receiving signals on performance events to
> >    throttle/unthrottle themselves.
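
For readers unfamiliar with the setup that motivation 1 describes, here
is a minimal sketch, assuming a Linux system with the kernel UAPI
headers installed. It fills a perf_event_attr for a hardware write
watchpoint that child tasks inherit, and opens it with
perf_event_open(). The helper names watchpoint_attr() and
watchpoint_open() are hypothetical, and the signal-on-event option this
series proposes is deliberately left out, since its interface is not
spelled out in this mail.

```c
#include <assert.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe a hardware write watchpoint covering
 * `len` bytes at `addr`. Only long-standing perf_event_attr fields are
 * used; nothing here requests signal delivery.
 */
static struct perf_event_attr watchpoint_attr(const void *addr, unsigned int len)
{
	struct perf_event_attr attr;

	/* HW_BREAKPOINT_LEN_{1,2,4,8} are numerically 1, 2, 4 and 8. */
	assert(len == 1 || len == 2 || len == 4 || len == 8);

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_BREAKPOINT;
	attr.size = sizeof(attr);
	attr.bp_type = HW_BREAKPOINT_W;   /* trigger on writes */
	attr.bp_addr = (uintptr_t)addr;
	attr.bp_len = len;
	attr.sample_period = 1;           /* treat every hit as a sample */
	attr.inherit = 1;                 /* child tasks created later inherit it */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;
	return attr;
}

/*
 * Hypothetical helper: open the event for the calling task on any CPU.
 * glibc provides no perf_event_open() wrapper, so use syscall(2).
 * Returns a file descriptor, or -1 with errno set.
 */
static int watchpoint_open(struct perf_event_attr *attr)
{
	return (int)syscall(SYS_perf_event_open, attr, (pid_t)0, -1, -1, 0UL);
}
```

Opening can fail, for example with EACCES under a restrictive
perf_event_paranoid setting or with ENOSPC when no hardware breakpoint
slot is free, so callers should check the returned descriptor before
relying on the watchpoint.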