Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1728174ybt; Thu, 18 Jun 2020 16:03:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyJRWv0tpqMMWp2hjNzQHRnlPBRYaUkWXgks8sfX5UHvrXopO8lnHN3rlyJWuSSeaxRmE89 X-Received: by 2002:a17:906:360b:: with SMTP id q11mr986699ejb.290.1592521403238; Thu, 18 Jun 2020 16:03:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592521403; cv=none; d=google.com; s=arc-20160816; b=VcpYt4vy/U9ZcSSCm8APJJ+8gVQDw3zcUEQ6Wphpxtbk8cVLSGKw6RJXgeFav/Load tnInV9+scX05C/We/rQVReiepfNXVyPm9wj6lZunAMCVxe/iAkvoxdjUwA1SX9DSNttv HzgwSjBH4Fcooya6pDDXJNIbINSmBJIRqpFe5c5izaR9DebxYfBqbjCW1VnwGRVnGJnG ptPYlHL5cPhP0J1xjQC4pHpHs/14tGMvx9Moid2aBnsg92khBAOUk44GsV3D2e2o9mlc 3MgUBtNnodPJjvwAtKUtQbaVqanz3ow7ynnaFIug71PvRKurGiupNL3k0vb5Mke2D6lu 1GBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:from:subject:mime-version :message-id:date:dkim-signature; bh=Ij26VoyzhYGvLjAG21TvPgeZf+MNFhcxJkH7FR67BL4=; b=alDPqVw68BMfptA+MU5cFCdCxMICYr29xHm5H2H2GGzESMRvQzKgvCUsXU2YskGd83 QW/JWz6v1ck79BBIIjOI0Y8zNKN9yfmy9M4qUrpBPBqluj+G04fRJ5wHsDnZTVLrkNRW zp8bvLLBK6h9RlFTMpygp39tgLUtYOiszavCJvbdl6nUAj6LP6F0aKg2UEaUcWQjcXTF ZnL7JFf/a2rv47ugh0T9y+42gCrzQchlzSkJUBCIRJ0myOw2AR+RgTJth0jlF6xY4b7m c348lYcIHiXtMF8kX5BT23ETw94SosMwxh6z2F77BrN0RykVIDyLRAWGGijj17IGorOK 9cCA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=guONlS+z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dk3si2592718edb.580.2020.06.18.16.03.01; Thu, 18 Jun 2020 16:03:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=guONlS+z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731914AbgFRWWg (ORCPT + 99 others); Thu, 18 Jun 2020 18:22:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728071AbgFRWWf (ORCPT ); Thu, 18 Jun 2020 18:22:35 -0400 Received: from mail-qt1-x849.google.com (mail-qt1-x849.google.com [IPv6:2607:f8b0:4864:20::849]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2EB45C06174E for ; Thu, 18 Jun 2020 15:22:34 -0700 (PDT) Received: by mail-qt1-x849.google.com with SMTP id l26so5563544qtr.14 for ; Thu, 18 Jun 2020 15:22:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:message-id:mime-version:subject:from:to:cc; bh=Ij26VoyzhYGvLjAG21TvPgeZf+MNFhcxJkH7FR67BL4=; b=guONlS+zDvIwKKVsdiiPKNH7/eiTgu2dN+5fua4h7UfSizaHOrh8PHvEFcbioXU0UE s48UzvSNhVC6diNscyJjApImeqZRrbqiKamhkJipt5yFN2Kn8NDTYNWwObqjqxlh3tAw QHa0E/9PRSzQDQYhqBbFLle/SXijKCDIA0zzWc3XHyAGq5qVvT8cyZ0rGKKypA+/bQDO hUAa51FtVs70IN0xsiqETb3j9NfIC6deYVTKb2RtekE4VwtYibPz8Tj4lR6ck1pfQIv8 RtW2uLTOACEyOBvfTQJF4CqCiSIEJlTMR1AUn9awEU+Ez1WvYKQHHOuW7rpaY6v8UDGD 9ZSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:mime-version:subject:from:to:cc; bh=Ij26VoyzhYGvLjAG21TvPgeZf+MNFhcxJkH7FR67BL4=; b=rvKjFQxCtRl2TuRu2Vm929EnYC0LMPNyn0nomGPChuw4eJINcDexrBLdHd+qAt7hJg ksGmBbQbkNGEqJZp6pryxdSGH5VtZS/JAByEw6+I/3wCekaF21qbQvgfphxNjO5VF0OE VzbTxw+TY2h3+TVNhhrKcgZP7bsHhwhq4hetuZVLeyzybQyp8TUkEvndA9ndLbX/Zz98 Jv49C+CPh3ETSrCMsUFRWdXzovBqEvpbSWdMHM2lESRZXyNfkf7Nv59Dp41RukxqvR4G J8zLc1OZaAKKnUb9EQ8RG/9rKQlCHxDLM5Uux+ftymp6sW4KSvrXxu5pzCy0yUkDSW6g Ufzw== X-Gm-Message-State: AOAM531+zm65M4JYy7LqNntuHM7TWk2zpnYVL878YJuKa1Z63hwu6/Oe GhA9LgCzMbSctARjcSBBRbLCnb0xrfGAgwQ3SbpN X-Received: by 2002:a0c:e385:: with SMTP id a5mr6091856qvl.218.1592518953173; Thu, 18 Jun 2020 15:22:33 -0700 (PDT) Date: Thu, 18 Jun 2020 15:22:24 -0700 Message-Id: <20200618222225.102337-1-axelrasmussen@google.com> Mime-Version: 1.0 X-Mailer: git-send-email 2.27.0.111.gc72c7da667-goog Subject: [RFC PATCH v3 0/1] Add rwsem "contended hook" API and mmap_lock histograms From: Axel Rasmussen To: Michel Lespinasse , Peter Zijlstra , David Howells Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Jonathan Adams , David Rientjes , Ying Han , Axel Rasmussen Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The overall goal of this patch is to add tracepoints around mmap_lock acquisition. This will let us collect latency histograms, so we can see how long we block for in the contended case. Our goal is to collect this data across all of production at Google, so low overhead is critical. I'm sending this RFC for feedback on the changes to rwsem.{h,c} and lockdep.h in particular. I'll describe reasoning for the down_write case, for brevity. We want to measure the time lock acquisition takes. Naively, this is: u64 start = sched_clock(); down_write(/* ... */); trace(sched_clock() - start); My measurements show that this adds ~5-6% overhead to building a kernel on a test machine [1]. This level of overhead is unacceptably high. My measurements show that only instrumenting the contended case lowers overhead to < 1%. Naively, we can instrument only the contended case like this: if (!down_write_trylock(/* ... */)) /* Time and call down_write as before. */ However, in the case where `_trylock` succeeds, we have lost the lockdep annotations (e.g. around ordering) `down_write` would normally include. (Granted, we don't run with lockdep in production, but debug builds do.) Assuming we need lower overhead, we aren't okay with losing lock annotations, and we reject various alternatives to this patch: - Making rwsem.c's __down_write and __down_write_trylock public, so mmap_lock.c could construct its own version of LOCK_CONTENDED with tracepoint calls. - Having mmap_lock.c reach into rwsem.c's internals with "extern" forward declarations for these functions (and removing "static inline"). - Somehow adding the instrumentation directly to rwsem.c (either affecting all locks, or polluting it some other way). The remaining alternative, I think, is what this patch proposes: add API surface to rwsem.h which allows callers to provide instrumentation callbacks which are invoked in the contended case. [1]: For measuring the overhead of the instrumentation, I've been timing a defconfig kernel build. The numbers above come from a KVM instance with 4 CPUs + 32G RAM, running 5.8-rc1 with this patch applied and a histogram trigger configured for the acquire_returned tracepoint. My test script is simple: for (( i=0; i<5; ++i)); do make mrproper > /dev/null || exit 1 make defconfig > /dev/null || exit 1 sync || exit 1 echo 3 > /proc/sys/vm/drop_caches || exit 1 /usr/bin/time make -j5 > /dev/null done The numbers I'm giving above are computed as: (avg of 5 runs with this hist trigger enabled) / (avg on 5.8-rc1). Axel Rasmussen (1): mmap_lock: add tracepoints around mmap_lock acquisition include/linux/lockdep.h | 47 ++++++ include/linux/mmap_lock.h | 27 ++- include/linux/rwsem.h | 12 ++ include/trace/events/mmap_lock.h | 76 +++++++++ kernel/locking/rwsem.c | 64 +++++++ mm/Kconfig | 19 +++ mm/Makefile | 1 + mm/mmap_lock.c | 281 +++++++++++++++++++++++++++++++ 8 files changed, 526 insertions(+), 1 deletion(-) create mode 100644 include/trace/events/mmap_lock.h create mode 100644 mm/mmap_lock.c -- 2.27.0.111.gc72c7da667-goog