Date: Thu, 25 Feb 2021 16:46:30 -0500 (EST)
From: Mathieu Desnoyers
To: rostedt
Cc: Michael Jeanson, linux-kernel, Peter Zijlstra, Alexei Starovoitov,
    Yonghong Song, paulmck, Ingo Molnar, acme, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, "Joel Fernandes, Google", bpf
Message-ID: <1130245502.6977.1614289590089.JavaMail.zimbra@efficios.com>
In-Reply-To: <20210224131405.20d64b49@gandalf.local.home>
References: <20210218222125.46565-1-mjeanson@efficios.com>
    <20210223211639.670db85c@gandalf.local.home>
    <083bce0f-bd66-ab83-1211-be9838499b45@efficios.com>
    <915297635.2997.1614185975415.JavaMail.zimbra@efficios.com>
    <20210224131405.20d64b49@gandalf.local.home>
Subject: Re: [RFC PATCH 0/6] [RFC] Faultable tracepoints (v2)
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Feb 24, 2021, at 1:14 PM, rostedt rostedt@goodmis.org wrote:

> On Wed, 24 Feb 2021 11:59:35 -0500 (EST)
> Mathieu Desnoyers wrote:
>>
>> As a prototype solution, what I've done currently is to copy the user-space
>> data into a kmalloc'd buffer in a preparation step before disabling preemption
>> and copying data over into the per-cpu buffers.
>> It works, but I think we should
>> be able to do it without the needless copy.
>>
>> What I have in mind as an efficient solution (not implemented yet) for the LTTng
>> kernel tracer goes as follows:
>>
>> #define COMMIT_LOCAL 0
>> #define COMMIT_REMOTE 1
>>
>> - faultable probe is called from system call tracepoint
>>   [ preemption/blocking/migration is allowed ]
>> - probe code calculate the length which needs to be reserved to store the event
>>   (e.g. user strlen),
>>
>> - preempt disable -> [ preemption/blocking/migration is not allowed from here ]
>> - reserve_cpu = smp_processor_id()
>> - reserve space in the ring buffer for reserve_cpu
>>   [ from that point on, we have _exclusive_ access to write into the ring buffer
>>   "slot" from any cpu until we commit. ]
>> - preempt enable -> [ preemption/blocking/migration is allowed from here ]
>>
>
> So basically the commit position here doesn't move until this task is
> scheduled back in and the commit (remote or local) is updated.

Indeed.

> To put it in terms of the ftrace ring buffer, where we have both a commit
> page and a commit index, and it only gets moved by the first one to start a
> commit stack (that is, interrupts that interrupted a write will not
> increment the commit).

The tricky part for ftrace is its reliance on the fact that the concurrent
users of the per-cpu ring buffer are all nested contexts. LTTng does not
assume that and has been designed to be used both in kernel and user-space:
lttng-modules and lttng-ust share a lot of ring buffer code. Therefore, LTTng's
ring buffer supports preemption/migration of concurrent contexts.

The fact that LTTng uses local-atomic-ops on its kernel ring buffers is just
an optimization on an overall ring buffer design meant to allow preemption.
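To make the reserve/commit flow quoted above more concrete, here is a minimal
user-space sketch in C. It is not LTTng code, only an illustration under
stated assumptions: struct cpu_buf, current_cpu, reserve(), commit() and
trace_event() are hypothetical names, migration is simulated by assigning
current_cpu, wrap-around is not handled, and the final commit step (local vs.
remote) is my assumption of how the scheme would complete, since the quoted
steps stop at "preempt enable".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define COMMIT_LOCAL  0
#define COMMIT_REMOTE 1

#define NR_CPUS  2
#define BUF_SIZE 64

/* Hypothetical per-cpu buffer; not the LTTng data structure. */
struct cpu_buf {
	atomic_size_t reserve_pos;	/* next free byte in data[] */
	atomic_size_t commit_count;	/* bytes fully written so far */
	char data[BUF_SIZE];
};

static struct cpu_buf bufs[NR_CPUS];
static int current_cpu;		/* stand-in for smp_processor_id() */

/*
 * Reserve len bytes on the current cpu's buffer. In the kernel, this is
 * the only step that would run with preemption disabled.
 */
static int reserve(size_t len, int *reserve_cpu, size_t *offset)
{
	struct cpu_buf *b = &bufs[current_cpu];
	size_t old = atomic_load(&b->reserve_pos);

	do {
		if (old + len > BUF_SIZE)
			return -1;	/* sketch only: no wrap-around */
	} while (!atomic_compare_exchange_weak(&b->reserve_pos, &old,
					       old + len));
	*reserve_cpu = current_cpu;
	*offset = old;
	return 0;
}

/* Commit len bytes to the buffer that holds the reservation. */
static int commit(int reserve_cpu, size_t len)
{
	atomic_fetch_add(&bufs[reserve_cpu].commit_count, len);
	return reserve_cpu == current_cpu ? COMMIT_LOCAL : COMMIT_REMOTE;
}

/*
 * Trace one string event. migrate_to simulates the scheduler moving the
 * task to another cpu between reserve and commit.
 */
static int trace_event(const char *user_str, int migrate_to)
{
	size_t len = strlen(user_str) + 1;	/* computed while faultable */
	int reserve_cpu;
	size_t offset;

	assert(migrate_to >= 0 && migrate_to < NR_CPUS);
	if (reserve(len, &reserve_cpu, &offset))
		return -1;
	/* Preemption is allowed again from here: simulate a migration. */
	current_cpu = migrate_to;
	/* The slot is exclusively ours, whichever cpu we now run on. */
	memcpy(bufs[reserve_cpu].data + offset, user_str, len);
	return commit(reserve_cpu, len);
}
```

With current_cpu set to 0, calling trace_event("hello", 1) reserves and
commits in CPU 0's buffer even though the copy runs "on" CPU 1, and it
returns COMMIT_REMOTE.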
> Now, I'm not sure how LTTng does it, but I could see issues for ftrace to
> try to move the commit pointer (the pointer to the new commit page), as the
> design is currently dependent on the fact that it can't happen while
> commits are taken place.

Indeed, what makes it easy for LTTng is that its ring buffer has been designed
to support preemption/migration from the ground up.

> Are the pages of the LTTng indexed by an array of pages?

Yes, they are. Handling the initial page allocation and then the tracer copy
of data to/from the ring buffer pages is the responsibility of the LTTng lib
ring buffer "backend". The LTTng lib ring buffer backend is somewhat similar
to a page table done in software, where the top level of the page table can be
dynamically updated when doing flight recorder tracing.

It is however completely separate from the space reservation/commit scheme,
which is handled by the lib ring buffer "frontend".

The algorithm I described in my prior email is specifically targeted at the
"frontend" layer, leaving the "backend" unchanged.

For some reason, I suspect the ftrace ring buffer combines those two layers
into a single algorithm, which may have its advantages, but seems to
strengthen its dependency on having only nested contexts share a given
per-cpu ring buffer.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com