Date: Wed, 10 Apr 2024 13:56:12 -0400
From: Steven Rostedt
To: Andrew Morton, "Liam R. Howlett", Vlastimil Babka, Lorenzo Stoakes
Cc: Vincent Donnefort, mhiramat@kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com, kernel-team@android.com, rdunlap@infradead.org
Subject: Re: [PATCH v20 0/5] Introducing trace buffer mapping by user-space
Message-ID: <20240410135612.5dc362e3@gandalf.local.home>
In-Reply-To: <20240406173649.3210836-1-vdonnefort@google.com>
References: <20240406173649.3210836-1-vdonnefort@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Andrew, et al.

Linus said it's a hard requirement that this code gets an Acked-by (or
Reviewed-by) from the memory sub-maintainers before he will accept it.
He was upset that we faulted in pages one at a time instead of mapping it
in one go:

  https://lore.kernel.org/all/CAHk-=wh5wWeib7+kVHpBVtUn7kx7GGadWqb5mW5FYTdewEfL=w@mail.gmail.com/

Could you take a look at patches 1-3 to make sure they look sane from a
memory management point of view?

I really want this applied in the next merge window.

Thanks!

-- Steve

On Sat, 6 Apr 2024 18:36:44 +0100
Vincent Donnefort wrote:

> The tracing ring-buffers can be stored on disk or sent to network
> without any copy via splice. However the latter doesn't allow real time
> processing of the traces. A solution is to give userspace direct access
> to the ring-buffer pages via a mapping. An application can now become a
> consumer of the ring-buffer, in a similar fashion to what trace_pipe
> offers.
>
> Support for this new feature can already be found in libtracefs from
> version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.
>
> Vincent
>
> v19 -> v20:
>   * Fix typos in documentation.
>   * Remove useless mmap open and fault callbacks.
>   * add mm.h include for vm_insert_pages
>
> v18 -> v19:
>   * Use VM_PFNMAP and vm_insert_pages
>   * Allocate ring-buffer subbufs with __GFP_COMP
>   * Pad the meta-page with the zero-page to align on the subbuf_order
>   * Extend the ring-buffer test with mmap() dedicated suite
>
> v17 -> v18:
>   * Fix lockdep_assert_held
>   * Fix spin_lock_init typo
>   * Fix CONFIG_TRACER_MAX_TRACE typo
>
> v16 -> v17:
>   * Documentation and comments improvements.
>   * Create get/put_snapshot_map() for clearer code.
>   * Replace kzalloc with kcalloc.
>   * Fix -ENOMEM handling in rb_alloc_meta_page().
>   * Move flush(cpu_buffer->reader_page) behind the reader lock.
>   * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
>     mutex. (removes READ_ONCE/WRITE_ONCE accesses).
>
> v15 -> v16:
>   * Add comment for the dcache flush.
>   * Remove now unnecessary WRITE_ONCE for the meta-page.
>
> v14 -> v15:
>   * Add meta-page and reader-page flush. Intends to fix the mapping
>     for VIVT and aliasing-VIPT data caches.
>   * -EPERM on VM_EXEC.
>   * Fix build warning !CONFIG_TRACER_MAX_TRACE.
>
> v13 -> v14:
>   * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
>   * on unmap, sync meta-page teardown with the reader_lock instead of
>     the synchronize_rcu.
>   * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
>     (intends to fix a lockdep issue)
>   * Add kerneldoc for flags and Reserved fields.
>   * Add kselftest for snapshot/map mutual exclusion.
>
> v12 -> v13:
>   * Swap subbufs_{touched,lost} for Reserved fields.
>   * Add a flag field in the meta-page.
>   * Fix CONFIG_TRACER_MAX_TRACE.
>   * Rebase on top of trace/urgent.
>   * Add a comment for try_unregister_trigger()
>
> v11 -> v12:
>   * Fix code sample mmap bug.
>   * Add logging in sample code.
>   * Reset tracer in selftest.
>   * Add a refcount for the snapshot users.
>   * Prevent mapping when there are snapshot users and vice versa.
>   * Refine the meta-page.
>   * Fix types in the meta-page.
>   * Collect Reviewed-by.
>
> v10 -> v11:
>   * Add Documentation and code sample.
>   * Add a selftest.
>   * Move all the update to the meta-page into a single
>     rb_update_meta_page().
>   * rb_update_meta_page() is now called from
>     ring_buffer_map_get_reader() to fix NOBLOCK callers.
>   * kerneldoc for struct trace_meta_page.
>   * Add a patch to zero all the ring-buffer allocations.
>
> v9 -> v10:
>   * Refactor rb_update_meta_page()
>   * In-loop declaration for foreach_subbuf_page()
>   * Check for cpu_buffer->mapped overflow
>
> v8 -> v9:
>   * Fix the unlock path in ring_buffer_map()
>   * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
>   * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a)
>
> v7 -> v8:
>   * Drop the subbufs renaming into bpages
>   * Use subbuf as a name when relevant
>
> v6 -> v7:
>   * Rebase onto lore.kernel.org/lkml/20231215175502.106587604@goodmis.org/
>   * Support for subbufs
>   * Rename subbufs into bpages
>
> v5 -> v6:
>   * Rebase on next-20230802.
>   * (unsigned long) -> (void *) cast for virt_to_page().
>   * Add a wait for the GET_READER_PAGE ioctl.
>   * Move writer fields update (overrun/pages_lost/entries/pages_touched)
>     in the irq_work.
>   * Rearrange id in struct buffer_page.
>   * Rearrange the meta-page.
>   * ring_buffer_meta_page -> trace_buffer_meta_page.
>   * Add meta_struct_len into the meta-page.
>
> v4 -> v5:
>   * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3)
>
> v3 -> v4:
>   * Add to the meta-page:
>        - pages_lost / pages_read (allow to compute how full is the
>          ring-buffer)
>        - read (allow to compute how many entries can be read)
>        - A reader_page struct.
>   * Rename ring_buffer_meta_header -> ring_buffer_meta
>   * Rename ring_buffer_get_reader_page -> ring_buffer_map_get_reader_page
>   * Properly consume events on ring_buffer_map_get_reader_page() with
>     rb_advance_reader().
>
> v2 -> v3:
>   * Remove data page list (for non-consuming read)
>     ** Implies removing order > 0 meta-page
>   * Add a new meta page field ->read
>   * Rename ring_buffer_meta_page_header into ring_buffer_meta_header
>
> v1 -> v2:
>   * Hide data_pages from the userspace struct
>   * Fix META_PAGE_MAX_PAGES
>   * Support for order > 0 meta-page
>   * Add missing page->mapping.
>
> Vincent Donnefort (5):
>   ring-buffer: allocate sub-buffers with __GFP_COMP
>   ring-buffer: Introducing ring-buffer mapping functions
>   tracing: Allow user-space mapping of the ring-buffer
>   Documentation: tracing: Add ring-buffer mapping
>   ring-buffer/selftest: Add ring-buffer mapping test
>
>  Documentation/trace/index.rst                 |   1 +
>  Documentation/trace/ring-buffer-map.rst       | 106 +++++
>  include/linux/ring_buffer.h                   |   6 +
>  include/uapi/linux/trace_mmap.h               |  48 +++
>  kernel/trace/ring_buffer.c                    | 403 +++++++++++++++++-
>  kernel/trace/trace.c                          | 113 ++++-
>  kernel/trace/trace.h                          |   1 +
>  tools/testing/selftests/ring-buffer/Makefile  |   8 +
>  tools/testing/selftests/ring-buffer/config    |   2 +
>  .../testing/selftests/ring-buffer/map_test.c  | 302 +++++++++++++
>  10 files changed, 979 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/trace/ring-buffer-map.rst
>  create mode 100644 include/uapi/linux/trace_mmap.h
>  create mode 100644 tools/testing/selftests/ring-buffer/Makefile
>  create mode 100644 tools/testing/selftests/ring-buffer/config
>  create mode 100644 tools/testing/selftests/ring-buffer/map_test.c
>
>
> base-commit: 7604256cecef34a82333d9f78262d3180f4eb525