To: Axel Rasmussen, Steven Rostedt, Ingo Molnar, Andrew Morton, Michel Lespinasse, Daniel Jordan, Jann Horn, Chinwen Chang, Davidlohr Bueso, David Rientjes
Cc: Yafang Shao, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20201020184746.300555-1-axelrasmussen@google.com> <20201020184746.300555-2-axelrasmussen@google.com>
From: Vlastimil Babka
Subject: Re: [PATCH v4 1/1] mmap_lock: add tracepoints around lock acquisition
Date: Fri, 23 Oct 2020 15:59:59 +0200
In-Reply-To: <20201020184746.300555-2-axelrasmussen@google.com>

On 10/20/20 8:47 PM, Axel Rasmussen wrote:
> The goal of these tracepoints is to be able to debug lock contention
> issues.
> This lock is acquired on most (all?) mmap / munmap / page fault
> operations, so a multi-threaded process which does a lot of these can
> experience significant contention.
>
> We trace just before we start acquisition, when the acquisition returns
> (whether it succeeded or not), and when the lock is released (or
> downgraded). The events are broken out by lock type (read / write).
>
> The events are also broken out by memcg path. For container-based
> workloads, users often think of several processes in a memcg as a single
> logical "task", so collecting statistics at this level is useful.
>
> The end goal is to get latency information. This isn't directly included
> in the trace events. Instead, users are expected to compute the time
> between "start locking" and "acquire returned", using e.g. synthetic
> events or BPF. The benefit we get from this is simpler code.
>
> Because we use tracepoint_enabled() to decide whether or not to trace,
> this patch has effectively no overhead unless tracepoints are enabled at
> runtime. If tracepoints are enabled, there is a performance impact, but
> how much depends on exactly what e.g. the BPF program does.
>
> Reviewed-by: Michel Lespinasse
> Acked-by: Yafang Shao
> Acked-by: David Rientjes
> Signed-off-by: Axel Rasmussen

All seems fine to me, except I started to wonder...

> +
> +#ifdef CONFIG_MEMCG
> +
> +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
> +
> +/*
> + * Write the given mm_struct's memcg path to a percpu buffer, and return a
> + * pointer to it. If the path cannot be determined, the buffer will contain the
> + * empty string.
> + *
> + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> + * disabled by the caller before calling us, and re-enabled only after the
> + * caller is done with the pointer.

Is this enough? What if we fill the buffer and then an interrupt comes and the
handler calls here again?
We overwrite the buffer and potentially report a wrong cgroup after the
execution resumes? If nothing worse can happen (are interrupts disabled while
the ftrace code is copying from the buffer?), then it's probably ok?

> + */
> +static const char *get_mm_memcg_path(struct mm_struct *mm)
> +{
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> +
> +	if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
> +		char *buf = this_cpu_ptr(trace_memcg_path);
> +
> +		cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
> +		return buf;
> +	}
> +	return "";
> +}
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> +	do { \
> +		get_cpu(); \
> +		trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \
> +				       ##__VA_ARGS__); \
> +		put_cpu(); \
> +	} while (0)
> +
> +#else /* !CONFIG_MEMCG */
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> +	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> +
> +#endif /* CONFIG_MEMCG */
> +
> +/*
> + * Trace calls must be in a separate file, as otherwise there's a circular
> + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> + */
> +
> +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write);
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
> +
> +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
> +					    bool success)
> +{
> +	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
> +
> +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(released, mm, write);
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>
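
To make the interrupt question above concrete, here is a small userspace
sketch, not kernel code. It models one CPU's buffer as a plain static array
and models the interrupt as an ordinary function call made between filling
the buffer and copying from it. Every name in it (shared_buf, nested_buf,
nesting_level, fake_irq_*, trace_*, PATH_MAX_LEN, MAX_NESTING) is
hypothetical and does not appear in the patch. The second half shows one
possible direction, a buffer per nesting level, purely as an assumption for
illustration and not as the patch's or the reviewer's proposed fix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_MAX_LEN 64	/* stand-in for MAX_FILTER_STR_VAL */
#define MAX_NESTING 4	/* hypothetical number of nested contexts */

/* One shared buffer, standing in for a single CPU's trace_memcg_path. */
static char shared_buf[PATH_MAX_LEN];

/* Hypothetical alternative: one buffer per nesting level on this "CPU". */
static char nested_buf[MAX_NESTING][PATH_MAX_LEN];
static int nesting_level;

/* Mimics get_mm_memcg_path(): fill the shared buffer, return a pointer. */
static const char *get_path_shared(const char *cgroup)
{
	snprintf(shared_buf, sizeof(shared_buf), "%s", cgroup);
	return shared_buf;
}

/* Simulated interrupt handler that reaches the same tracepoint. */
static void fake_irq_shared(void)
{
	(void)get_path_shared("/irq-cgroup");
}

/*
 * "Task context": fill the buffer, get interrupted, then copy the path
 * into out, as the trace code would copy it into its own storage.
 */
static void trace_shared(const char *cgroup, char *out, size_t len)
{
	const char *path = get_path_shared(cgroup);

	fake_irq_shared();		/* interrupt arrives here */
	snprintf(out, len, "%s", path);	/* copies the overwritten path */
}

/* Same as get_path_shared(), but picks a buffer by nesting level. */
static const char *get_path_nested(const char *cgroup)
{
	char *buf;

	assert(nesting_level < MAX_NESTING);
	buf = nested_buf[nesting_level];
	snprintf(buf, PATH_MAX_LEN, "%s", cgroup);
	return buf;
}

/* The simulated interrupt runs one nesting level deeper. */
static void fake_irq_nested(void)
{
	nesting_level++;
	(void)get_path_nested("/irq-cgroup");
	nesting_level--;
}

static void trace_nested(const char *cgroup, char *out, size_t len)
{
	const char *path = get_path_nested(cgroup);

	fake_irq_nested();		/* uses a different buffer */
	snprintf(out, len, "%s", path);	/* task's path is intact */
}
```

Calling trace_shared("/task-cgroup", out, sizeof(out)) leaves the interrupt's
path in out, which is the wrong-cgroup report described above, while
trace_nested() with the same arguments leaves the task's own path. Whether
anything worse than a wrong string can happen in the real code depends on
what the tracing core does while copying, which this sketch does not model.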