Date: Fri, 9 Oct 2020 15:05:24 -0700
In-Reply-To: <20201009220524.485102-1-axelrasmussen@google.com>
Message-Id: <20201009220524.485102-3-axelrasmussen@google.com>
Mime-Version: 1.0
References: <20201009220524.485102-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.28.0.1011.ga647a8990f-goog
Subject: [PATCH v3 2/2] mmap_lock: add tracepoints around lock acquisition
From: Axel Rasmussen <axelrasmussen@google.com>
To: Steven Rostedt, Ingo Molnar, Andrew Morton, Michel Lespinasse,
	Vlastimil Babka, Daniel Jordan, Laurent Dufour, Axel Rasmussen,
	Jann Horn, Chinwen Chang
Cc: Yafang Shao, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

The goal of these tracepoints is to be able to debug lock contention
issues. This lock is acquired on most (all?) mmap / munmap / page fault
operations, so a multi-threaded process which does a lot of these can
experience significant contention.

We trace just before we start acquisition, when the acquisition returns
(whether it succeeded or not), and when the lock is released (or
downgraded). The events are broken out by lock type (read / write).

The events are also broken out by memcg path. For container-based
workloads, users often think of several processes in a memcg as a
single logical "task", so collecting statistics at this level is
useful.

The end goal is to get latency information. This isn't directly
included in the trace events. Instead, users are expected to compute
the time between "start locking" and "acquire returned", using e.g.
synthetic events or BPF. The benefit we get from this is simpler code.

Because we use tracepoint_enabled() to decide whether or not to trace,
this patch has effectively no overhead unless tracepoints are enabled
at runtime. If tracepoints are enabled, there is a performance impact,
but how much depends on exactly what e.g. the BPF program does.
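As an illustration of the BPF approach (this sketch is not part of the
patch; the per-thread pairing, map layout, and libbpf section names are
assumptions about one possible tool, not something this series
provides), the wait time could be computed along these lines:

  // SPDX-License-Identifier: GPL-2.0
  // Hypothetical BPF-side sketch: pair mmap_lock_start_locking with
  // mmap_lock_acquire_returned per thread and report the wait time.
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 10240);
          __type(key, __u32);   /* thread id */
          __type(value, __u64); /* start timestamp, ns */
  } start SEC(".maps");

  SEC("tracepoint/mmap_lock/mmap_lock_start_locking")
  int on_start(void *ctx)
  {
          __u32 tid = (__u32)bpf_get_current_pid_tgid();
          __u64 ts = bpf_ktime_get_ns();

          /* Remember when this thread began waiting for the lock. */
          bpf_map_update_elem(&start, &tid, &ts, BPF_ANY);
          return 0;
  }

  SEC("tracepoint/mmap_lock/mmap_lock_acquire_returned")
  int on_acquire_returned(void *ctx)
  {
          __u32 tid = (__u32)bpf_get_current_pid_tgid();
          __u64 *tsp = bpf_map_lookup_elem(&start, &tid);

          if (tsp) {
                  /* A real tool would aggregate into a histogram
                   * rather than printing each sample. */
                  bpf_printk("mmap_lock wait: %llu ns",
                             bpf_ktime_get_ns() - *tsp);
                  bpf_map_delete_elem(&start, &tid);
          }
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";

The same delta could instead be computed with an ftrace synthetic event
that matches the two tracepoints and records the timestamp difference
as a histogram, with no BPF program involved.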
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 include/linux/mmap_lock.h        | 93 ++++++++++++++++++++++++++++++--
 include/trace/events/mmap_lock.h | 70 ++++++++++++++++++++++++
 mm/Makefile                      |  2 +-
 mm/mmap_lock.c                   | 87 ++++++++++++++++++++++++++++++
 4 files changed, 246 insertions(+), 6 deletions(-)
 create mode 100644 include/trace/events/mmap_lock.h
 create mode 100644 mm/mmap_lock.c

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0707671851a8..6586b42b4faa 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -1,11 +1,63 @@
 #ifndef _LINUX_MMAP_LOCK_H
 #define _LINUX_MMAP_LOCK_H
 
+#include <linux/lockdep.h>
+#include <linux/mm_types.h>
 #include <linux/mmdebug.h>
+#include <linux/mmu_notifier.h>
+#include <linux/rwsem.h>
+#include <linux/tracepoint-defs.h>
 
 #define MMAP_LOCK_INITIALIZER(name) \
 	.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
 
+DECLARE_TRACEPOINT(mmap_lock_start_locking);
+DECLARE_TRACEPOINT(mmap_lock_acquire_returned);
+DECLARE_TRACEPOINT(mmap_lock_released);
+
+#ifdef CONFIG_TRACING
+
+void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write);
+void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
+					   bool success);
+void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write);
+
+static inline void __mmap_lock_trace_start_locking(struct mm_struct *mm,
+						   bool write)
+{
+	if (tracepoint_enabled(mmap_lock_start_locking))
+		__mmap_lock_do_trace_start_locking(mm, write);
+}
+
+static inline void __mmap_lock_trace_acquire_returned(struct mm_struct *mm,
+						      bool write, bool success)
+{
+	if (tracepoint_enabled(mmap_lock_acquire_returned))
+		__mmap_lock_do_trace_acquire_returned(mm, write, success);
+}
+
+static inline void __mmap_lock_trace_released(struct mm_struct *mm, bool write)
+{
+	if (tracepoint_enabled(mmap_lock_released))
+		__mmap_lock_do_trace_released(mm, write);
+}
+
+#else /* !CONFIG_TRACING */
+
+static inline void __mmap_lock_trace_start_locking(struct mm_struct *mm,
+						   bool write)
+{
+}
+
+static inline void __mmap_lock_trace_acquire_returned(struct mm_struct *mm,
+						      bool write, bool success)
+{
+}
+
+static inline void __mmap_lock_trace_released(struct mm_struct *mm, bool write)
+{
+}
+
+#endif /* CONFIG_TRACING */
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -13,58 +65,88 @@ static inline void mmap_init_lock(struct mm_struct *mm)
 
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
+	__mmap_lock_trace_start_locking(mm, true);
 	down_write(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
 static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
 {
+	__mmap_lock_trace_start_locking(mm, true);
 	down_write_nested(&mm->mmap_lock, subclass);
+	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
 static inline int mmap_write_lock_killable(struct mm_struct *mm)
 {
-	return down_write_killable(&mm->mmap_lock);
+	int ret;
+
+	__mmap_lock_trace_start_locking(mm, true);
+	ret = down_write_killable(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, true, ret == 0);
+	return ret;
 }
 
 static inline bool mmap_write_trylock(struct mm_struct *mm)
 {
-	return down_write_trylock(&mm->mmap_lock) != 0;
+	bool ret;
+
+	__mmap_lock_trace_start_locking(mm, true);
+	ret = down_write_trylock(&mm->mmap_lock) != 0;
+	__mmap_lock_trace_acquire_returned(mm, true, ret);
+	return ret;
 }
 
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	up_write(&mm->mmap_lock);
+	__mmap_lock_trace_released(mm, true);
 }
 
 static inline void mmap_write_downgrade(struct mm_struct *mm)
 {
 	downgrade_write(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, false, true);
 }
 
 static inline void mmap_read_lock(struct mm_struct *mm)
 {
+	__mmap_lock_trace_start_locking(mm, false);
 	down_read(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, false, true);
 }
 
 static inline int mmap_read_lock_killable(struct mm_struct *mm)
 {
-	return down_read_killable(&mm->mmap_lock);
+	int ret;
+
+	__mmap_lock_trace_start_locking(mm, false);
+	ret = down_read_killable(&mm->mmap_lock);
+	__mmap_lock_trace_acquire_returned(mm, false, ret == 0);
+	return ret;
 }
 
 static inline bool mmap_read_trylock(struct mm_struct *mm)
 {
-	return down_read_trylock(&mm->mmap_lock) != 0;
+	bool ret;
+
+	__mmap_lock_trace_start_locking(mm, false);
+	ret = down_read_trylock(&mm->mmap_lock) != 0;
+	__mmap_lock_trace_acquire_returned(mm, false, ret);
+	return ret;
 }
 
 static inline void mmap_read_unlock(struct mm_struct *mm)
 {
 	up_read(&mm->mmap_lock);
+	__mmap_lock_trace_released(mm, false);
 }
 
 static inline bool mmap_read_trylock_non_owner(struct mm_struct *mm)
 {
-	if (down_read_trylock(&mm->mmap_lock)) {
+	if (mmap_read_trylock(mm)) {
 		rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
+		__mmap_lock_trace_released(mm, false);
 		return true;
 	}
 	return false;
@@ -73,6 +155,7 @@ static inline bool mmap_read_trylock_non_owner(struct mm_struct *mm)
 static inline void mmap_read_unlock_non_owner(struct mm_struct *mm)
 {
 	up_read_non_owner(&mm->mmap_lock);
+	__mmap_lock_trace_released(mm, false);
 }
 
 static inline void mmap_assert_locked(struct mm_struct *mm)
diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap_lock.h
new file mode 100644
index 000000000000..ca652b52510e
--- /dev/null
+++ b/include/trace/events/mmap_lock.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mmap_lock
+
+#if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MMAP_LOCK_H
+
+#include <linux/tracepoint.h>
+#include <linux/types.h>
+
+struct mm_struct;
+
+DECLARE_EVENT_CLASS(
+	mmap_lock_template,
+
+	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write,
+		bool success),
+
+	TP_ARGS(mm, memcg_path, write, success),
+
+	TP_STRUCT__entry(
+		__field(struct mm_struct *, mm)
+		__string(memcg_path, memcg_path)
+		__field(bool, write)
+		__field(bool, success)
+	),
+
+	TP_fast_assign(
+		__entry->mm = mm;
+		__assign_str(memcg_path, memcg_path);
+		__entry->write = write;
+		__entry->success = success;
+	),
+
+	TP_printk(
+		"mm=%p memcg_path=%s write=%s success=%s\n",
+		__entry->mm,
+		__get_str(memcg_path),
+		__entry->write ? "true" : "false",
"true" : "false") + ); + +DEFINE_EVENT(mmap_lock_template, mmap_lock_start_locking, + + TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write, + bool success), + + TP_ARGS(mm, memcg_path, write, success) +); + +DEFINE_EVENT(mmap_lock_template, mmap_lock_acquire_returned, + + TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write, + bool success), + + TP_ARGS(mm, memcg_path, write, success) +); + +DEFINE_EVENT(mmap_lock_template, mmap_lock_released, + + TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write, + bool success), + + TP_ARGS(mm, memcg_path, write, success) +); + +#endif /* _TRACE_MMAP_LOCK_H */ + +/* This part must be outside protection */ +#include diff --git a/mm/Makefile b/mm/Makefile index d5649f1c12c0..1a7ea212fd8b 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -52,7 +52,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ mm_init.o percpu.o slab_common.o \ compaction.o vmacache.o \ interval_tree.o list_lru.o workingset.o \ - debug.o gup.o $(mmu-y) + debug.o gup.o mmap_lock.o $(mmu-y) # Give 'page_alloc' its own module-parameter namespace page-alloc-y := page_alloc.o diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c new file mode 100644 index 000000000000..b849287bd12a --- /dev/null +++ b/mm/mmap_lock.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 +#define CREATE_TRACE_POINTS +#include + +#include +#include +#include +#include +#include +#include +#include + +/* + * We have to export these, as drivers use mmap_lock, and our inline functions + * in the header check if the tracepoint is enabled. They can't be GPL, as e.g. + * the nvidia driver is an existing caller of this code. + */ +EXPORT_SYMBOL(__tracepoint_mmap_lock_start_locking); +EXPORT_SYMBOL(__tracepoint_mmap_lock_acquire_returned); +EXPORT_SYMBOL(__tracepoint_mmap_lock_released); + +#ifdef CONFIG_MEMCG + +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path); + +/* + * Write the given mm_struct's memcg path to a percpu buffer, and return a + * pointer to it. If the path cannot be determined, the buffer will contain the + * empty string. + * + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be + * disabled by the caller before calling us, and re-enabled only after the + * caller is done with the pointer. + */ +static const char *get_mm_memcg_path(struct mm_struct *mm) +{ + struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm); + + if (memcg != NULL && likely(memcg->css.cgroup != NULL)) { + char *buf = this_cpu_ptr(trace_memcg_path); + + cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL); + return buf; + } + return ""; +} + +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \ + do { \ + if (trace_mmap_lock_##type##_enabled()) { \ + get_cpu(); \ + trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \ + ##__VA_ARGS__); \ + put_cpu(); \ + } \ + } while (0) + +#else /* !CONFIG_MEMCG */ + +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \ + trace_mmap_lock_##type(mm, "", ##__VA_ARGS__) + +#endif /* CONFIG_MEMCG */ + +/* + * Trace calls must be in a separate file, as otherwise there's a circular + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h. 
+ */
+
+void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
+{
+	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write, true);
+}
+EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
+
+void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
+					   bool success)
+{
+	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
+}
+EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
+
+void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
+{
+	TRACE_MMAP_LOCK_EVENT(released, mm, write, true);
+}
+EXPORT_SYMBOL(__mmap_lock_do_trace_released);
-- 
2.28.0.1011.ga647a8990f-goog