Date: Wed, 29 Mar 2023 11:32:34 -0400
From: Steven Rostedt
To: Vincent Donnefort
Cc: mhiramat@kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v2 1/2] ring-buffer: Introducing ring-buffer mapping functions
Message-ID: <20230329113234.3285209c@gandalf.local.home>
In-Reply-To:
References: <20230322102244.3239740-1-vdonnefort@google.com> <20230322102244.3239740-2-vdonnefort@google.com> <20230328224411.0d69e272@gandalf.local.home>
 <20230329070353.1e1b443b@gandalf.local.home> <20230329085106.046a8991@rorschach.local.home> <20230329091107.408d63a8@rorschach.local.home> <20230329093602.2b3243f0@rorschach.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 29 Mar 2023 14:55:41 +0100
Vincent Donnefort wrote:

> > Yes, in fact it shouldn't need to call the ioctl until after it read it.
> >
> > Maybe, we should have the ioctl take a parameter of how much was read?
> > To prevent races?
>
> Races would only be with other consuming readers. In that case we'd probably
> have many other problems anyway as I suppose nothing would prevent another one
> of swapping the page while our userspace reader is still processing it?

I'm not worried about user space readers. I'm worried about writers, as the
ioctl will set reader_page->read = reader_page->commit. Between the time the
reader last read and stopped, and the time it called the ioctl, a writer
could fill the page, and then the ioctl may even swap the page. By passing in
the read amount, the ioctl will know if it needs to keep the same page or not.

>
> I don't know if this is worth splitting the ABI between the meta-page and the
> ioctl parameters for this?
>
> Or maybe we should say the meta-page contains things modified by the writer and
> parameters modified by the reader are passed by the get_reader_page ioctl i.e.
> the reader page ID and cpu_buffer->reader_page->read?
> (for the hyp tracing, we
> have up to 4 registers for the HVC which would replace in our case the ioctl)

I don't think we need the reader_page id, as that should never move without
reader involvement. If there's more than one reader, that's up to the
readers to keep track of each other, not the kernel.

Which BTW, the more I look at doing this without ioctls, the more I think we
may need to update things slightly differently.

I would keep the current approach, but for clarification of terminology, we
have:

 meta_data - the data that holds information that is shared between user and
             kernel space.

 data_pages - this is a separate mapping that holds the mapped ring buffer
              pages. In user space, this is one contiguous array and also
              holds the reader page.

 data_index - This is an array of what the writer sees. It maps the index
              into data_pages[] of where to find the mapped pages. It does
              not contain the reader page. We currently map this with the
              meta_data, but that's not a requirement (although we may
              continue to do so).

I'm thinking that we make the data_index[] elements into a structure:

	struct trace_map_data_index {
		int	idx;	/* index into data_pages[] */
		int	cnt;	/* counter updated by writer */
	};

The cnt is initialized to zero when initially mapped.

Instead of having the bpage->id = index into data_pages[], have it equal the
index into data_index[].
The cpu_buffer->reader_page->id = -1;

 meta_data->reader_page = index into data_pages[] of reader page

The swapping of the head page would look something like this:

static inline void
rb_meta_page_head_swap(struct ring_buffer_per_cpu *cpu_buffer)
{
	struct ring_buffer_meta_page *meta = cpu_buffer->meta_page;
	int head_page;

	if (!READ_ONCE(cpu_buffer->mapped))
		return;

	head_page = meta->data_pages[meta->hdr.data_page_head];
	meta->data_pages[meta->hdr.data_page_head] = meta->hdr.reader_page;
	meta->hdr.reader_page = head_page;
	meta->data_pages[head_page]->id = -1;
}

As hdr.data_page_head would be an index into data_index[] and not
data_pages[].

The fact that bpage->id points to the data_index[] and not the data_pages[]
means that the writer can easily get to that index, and modify the count.
That way, in rb_tail_page_update() (between cmpxchgs) we can do something
like:

	if (cpu_buffer->mapped) {
		meta = cpu_buffer->meta_page;
		meta->data_index[next_page->id].cnt++;
	}

And this will allow the reader to know if the current page it is on just got
overwritten by the writer, by doing:

	prev_id = meta->data_index[this_page].cnt;
	smp_rmb();
	/* read event (copy it, whatever) */
	smp_rmb();
	if (prev_id != meta->data_index[this_page].cnt)
		/* read data may be corrupted, abort it */

Does this make sense?

-- Steve
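
For illustration, the data_index[] / cnt bookkeeping described above can be
modelled in plain user-space C. This is only a sketch under assumptions, not
the kernel code: struct demo_meta, the demo_*() helpers, NR_PAGES and
PAGE_BYTES are made-up names, cnt is declared atomic_int (the mail uses a
plain int), and a C11 acquire fence stands in for smp_rmb(). It is meant to
be exercised single-threaded, so it shows the index swap and the counter
check but does not demonstrate the memory ordering itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define NR_PAGES   4	/* hypothetical: pages visible to the writer */
#define PAGE_BYTES 64	/* hypothetical: tiny "page" for the demo */

/* Same fields as the proposed struct; atomic cnt is this sketch's choice. */
struct trace_map_data_index {
	int		idx;	/* index into data_pages[] */
	atomic_int	cnt;	/* counter updated by writer */
};

/* Hypothetical stand-in for the shared meta data. */
struct demo_meta {
	struct trace_map_data_index data_index[NR_PAGES];
	int reader_page;	/* index into data_pages[] of reader page */
	int data_page_head;	/* index into data_index[] */
};

/* One contiguous array that also holds the reader page (last slot). */
char data_pages[NR_PAGES + 1][PAGE_BYTES];

void demo_init(struct demo_meta *meta)
{
	for (int i = 0; i < NR_PAGES; i++) {
		meta->data_index[i].idx = i;
		atomic_init(&meta->data_index[i].cnt, 0);
	}
	meta->reader_page = NR_PAGES;
	meta->data_page_head = 0;
	memset(data_pages, 0, sizeof(data_pages));
}

/* Swap the head page with the reader page; only indexes move. */
void demo_head_swap(struct demo_meta *meta)
{
	struct trace_map_data_index *head =
		&meta->data_index[meta->data_page_head];
	int head_page = head->idx;

	head->idx = meta->reader_page;
	meta->reader_page = head_page;
}

/* Writer moving its tail onto data_index[slot]: bump the counter. */
void demo_writer_enter(struct demo_meta *meta, int slot)
{
	atomic_fetch_add(&meta->data_index[slot].cnt, 1);
}

/* Reader, before copying events: sample the counter. */
int demo_read_begin(struct demo_meta *meta, int slot)
{
	int prev = atomic_load(&meta->data_index[slot].cnt);

	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	return prev;
}

/* Reader, after copying: true if the copied data must be discarded. */
bool demo_read_retry(struct demo_meta *meta, int slot, int prev)
{
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	return prev != atomic_load(&meta->data_index[slot].cnt);
}
```

A reader in this model would call demo_read_begin(), copy the events out of
data_pages[meta->data_index[slot].idx], and then drop the copy if
demo_read_retry() returns true. As in the mail, the check only notices a
writer that moved onto the page between the two samples of cnt.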