Received: by 2002:a05:6358:795:b0:dc:4c66:fc3e with SMTP id n21csp1684861rwj; Sun, 30 Oct 2022 05:00:48 -0700 (PDT) X-Google-Smtp-Source: AMsMyM76bbRbBrzfGpTe78RQUbaEu4vWofgD5GLWIglXuRaZbAfrio1TFDzmhpO5QdsEAHEJiHJs X-Received: by 2002:a17:907:a053:b0:7ad:4a55:4ce0 with SMTP id gz19-20020a170907a05300b007ad4a554ce0mr7558091ejc.541.1667131247883; Sun, 30 Oct 2022 05:00:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667131247; cv=none; d=google.com; s=arc-20160816; b=kd/MlrNwlR/gYdJFnce6uSW4RICnpqGZPZ0Ns9xdgbBk2B2hUzEUYwWi2piD+Lkkm/ TaClv5js/NjwzGMUOQ0osSH3AX4ShxhS1kvaOGT/FfSoz5QPxKq3vmSws8sLVY0O5X8P gHLlzqEHkxBFNgBR5aNVcMfRGXMImC7BZXO3O6uyRBKGL3QiZoK40iOQsPjKcTTXUidl GsGVvSNYFUDxoNtZ2xkbKGzqnrE/BTsStL32NaS0kynCxLXov+exP2A06W/RdrzGVCzy VLaiOqGr67Wc3n1pkWzA/EsSFqss/8BqpoTugCNS5HcCbf08EKLJJNTfpUJuATdnqMYv uxLA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:references :cc:to:from:content-language:subject:user-agent:mime-version:date :message-id:dkim-signature; bh=by4aPSWwqAsh8nUTsMZ9TAXtUDX14naKNqFGb9pSybY=; b=ndyfDJJapWt/9u4Q2WAJ7fKZKYxOPa+/QO2KrqfsTT9Bc/UpisYlC2cz75deH+WToh O1iJpYDBIJVrKZSJTnsZjcqk8HEISd7vn/u+DBjTBoc2wqnLspXOO4MlvfMJA5OkMI04 ns5VfOKOq6/+tKf6bNWf8eua2IBGSF8haq5iHdNIW/yiAiyp8w5NjGEscgdC7ATcgDbk ezgmRgvi5Wuj/Eyjq42S8swIKR+ewvC8JLCjXbW8abNUgqcxOUiRVDJInhWrgsugqJlh tZZ5YZSLSZy20+m1X6NViMYeKwteYcobiqBukwNdIuLfCU5yAM5CfYt/onzLIJrEUaTU x6Ow== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=m05x1tg3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d25-20020a056402079900b0046137ed86fasi3988567edy.138.2022.10.30.05.00.23; Sun, 30 Oct 2022 05:00:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=m05x1tg3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229783AbiJ3Lpj (ORCPT + 99 others); Sun, 30 Oct 2022 07:45:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49072 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229497AbiJ3Lpi (ORCPT ); Sun, 30 Oct 2022 07:45:38 -0400 Received: from smtpout.efficios.com (smtpout.efficios.com [IPv6:2607:5300:203:5aae::31e5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87F12247; Sun, 30 Oct 2022 04:45:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1667130333; bh=FHitCiRYLg3puiEvplBQIr9bBOi1r3xMt4XKwVmIP8g=; h=Date:Subject:From:To:Cc:References:In-Reply-To:From; b=m05x1tg3pEuo/oPHR9i9LBvx/H9XPzFxZuoXHoykJODzLaTiN9jxTYa8gNr3y+IZx 1o+FrV56VZC5w2/Kk3oeW4Ym/6efOWNl8k+TVisujboc3dl7/tBypOFQhAnpEmXbq7 byU/30qF5O81i2DGSkOZzZxnRBEQ1ctUDUb/aIwkseZrHtCEZGeDaL0lC2UniEgGeD T2vMSdK1nIMmQRqmglhusvu0DSHAJeoR3jraeDZ+2QB7fQrAOgRDo5kpmiL5dwk/R0 NaHJw73t05wOS+S0cFpoFh+CS3y3gdYJ04KcCMDEh8mGgF+rBFq1krW6PJKZxVjAoy QCfPIirceTzZA== Received: from [10.1.0.216] (192-222-188-69.qc.cable.ebox.net [192.222.188.69]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4N0ZGj4MxQzZ6W; Sun, 30 Oct 2022 07:45:33 -0400 (EDT) Message-ID: Date: Sun, 30 Oct 2022 07:45:33 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.0 Subject: Re: [RFC PATCH 2/2] tracing/user_events: Fixup enable faults asyncly Content-Language: en-US From: Mathieu Desnoyers To: Beau Belgrave Cc: rostedt@goodmis.org, mhiramat@kernel.org, dcook@linux.microsoft.com, alanau@linux.microsoft.com, linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org References: <20221027224011.2075-1-beaub@linux.microsoft.com> <20221027224011.2075-3-beaub@linux.microsoft.com> <20221028224256.GA202@W11-BEAU-MD.localdomain> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022-10-29 10:40, Mathieu Desnoyers wrote: > On 2022-10-28 18:42, Beau Belgrave wrote: >> On Fri, Oct 28, 2022 at 06:19:10PM -0400, Mathieu Desnoyers wrote: >>> On 2022-10-27 18:40, Beau Belgrave wrote: >>>> When events are enabled within the various tracing facilities, such as >>>> ftrace/perf, the event_mutex is held. As events are enabled pages are >>>> accessed. We do not want page faults to occur under this lock. Instead >>>> queue the fault to a workqueue to be handled in a process context safe >>>> way without the lock. >>>> >>>> The enable address is disabled while the async fault-in occurs. This >>>> ensures that we don't attempt fault-in more than is necessary. Once the >>>> page has been faulted in, the address write is attempted again. If the >>>> page couldn't fault-in, then we wait until the next time the event is >>>> enabled to prevent any potential infinite loops. >>> >>> I'm also unclear about how the system call initiating the enabled state >>> change is delayed (or not) when a page fault is queued. >>> >> >> It's not, if needed we could call schedule_delayed_work(). However, I >> don't think we need it. When pin_user_pages_remote is invoked, it's with >> FOLL_NOFAULT. This will tell us if we need to fault pages in, we then >> call fixup_user_fault with unlocked value passed. This will cause the >> fixup to retry and get the page in. >> >> It's called out in the comments for this exact purpose (lucked out >> here): >> mm/gup.c >>   * This is meant to be called in the specific scenario where for >> locking reasons >>   * we try to access user memory in atomic context (within a >> pagefault_disable() >>   * section), this returns -EFAULT, and we want to resolve the user >> fault before >>   * trying again. >> >> The fault in happens in a workqueue, this is the same way KVM does it's >> async page fault logic, so it's not a new pattern. As soon as the >> fault-in is done, we have a page we should be able to use, so we >> re-attempt the write immediately. If the write fails, another queue >> happens and we could loop, but not like the unmap() case I replied with >> earlier. >> >>> I would expect that when a page fault is needed, after enqueuing work >>> to the >>> worker thread, the system call initiating the state change would somehow >>> wait for a completion (after releasing the user events mutex). That >>> completion would be signaled by the worker thread either if the page >>> fault >>> fails, or if the state change is done. >>> >> >> I didn't see a generic fault-in + notify anywhere, do you know of one I >> could use? Otherwise, it seems the pattern used is check fault, fault-in >> via workqueue, re-attempt. > > I don't think we're talking about the same thing. I'll try stating my > concern in a different way. > > user_event_enabler_write() calls user_event_enabler_queue_fault() when > it cannot perform the enabled state update immediately (because a page > fault is needed). > > But then AFAIU it returns immediately to the caller. The caller could > very well expect that the event has been enabled, as requested, > immediately when the enabler write returns. The fact that enabling the > event can be delayed for an arbitrary amount of time due to page faults > means that we have no hard guarantee that the event is enabled as > requested upon return to the caller. > > I'd like to add a completion there, to be waited for after > user_event_enabler_queue_fault(), but before returning from > user_event_enabler_create(). Waiting for the completion should be done > without any mutexes held, so after releasing event_mutex. > > The completion would be placed on the stack of > user_event_enabler_create(), and a reference to the completion would be > placed in the queued fault request. The completion notification would be > emitted by the worker thread either when enabling is done, or if a page > fault fails. > > See include/linux/completion.h > > Thoughts ? Actually, after further thinking, I wonder if we need a worker thread at all to perform the page faults. Could we simply handle the page faults from user_event_enabler_create() by releasing the mutex and re-trying ? From what contexts is user_event_enabler_create() called ? (any locks taken ? system call context ? irqs and preemption enabled or disabled ?) Thanks, Mathieu > > Thanks, > > Mathieu > > >> >>> Thoughts ? >>> >>> Thanks, >>> >>> Mathieu >>> >> >> Thanks, >> -Beau > -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com