Subject: Re: [RFC PATCH 3/3 v0.2] sched/umcg: RFC: implement UMCG syscalls
From: Thierry Delisle
Date: Mon, 12 Jul 2021 17:44:18 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

> sys_umcg_wait without next_tid puts the task in UMCG_IDLE state; wake
> wakes it. These are standard sched operations. If they are emulated
> via futexes, fast context switching will require something like
> FUTEX_SWAP that was NACKed last year.

I understand these wait and wake semantics and the need for the fast
context switch (swap). As I see it, you need 3 operations:

- SWAP: context-switch directly to a different thread, no scheduler involved
- WAIT: block the current thread, go back to the server thread
- WAKE: unblock the target thread, add it to the scheduler, e.g. through
        idle_workers_ptr

There is no existing syscall to handle SWAP, so I agree sys_umcg_wait is
needed for this to work. However, sys_futex already exists to handle WAIT
and WAKE. When a worker calls either sys_futex WAIT or sys_umcg_wait with
next_tid == NULL, in both cases the worker blocks, swaps to the server and
waits for FUTEX_WAKE or UMCG_WAIT_WAKE_ONLY respectively. It is not
obvious to me that there would be a performance difference, and the
semantics seem the same to me.
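To make the comparison concrete, here is a minimal sketch of the
WAIT/WAKE half expressed with the existing futex syscall. The helper
names (worker_wait, worker_wake, park) are mine, not from the patch; the
UMCG equivalents would be sys_umcg_wait with next_tid == NULL and
UMCG_WAIT_WAKE_ONLY.

#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex(uint32_t *uaddr, int op, uint32_t val)
{
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Worker side: park on a per-worker futex word. */
static void worker_wait(_Atomic uint32_t *park)
{
        atomic_store(park, 0);
        /* Sleep until a waker flips the word; retry on spurious wake. */
        while (atomic_load(park) == 0)
                futex((uint32_t *)park, FUTEX_WAIT, 0);
}

/* Waker side: unblock one parked worker. */
static void worker_wake(_Atomic uint32_t *park)
{
        atomic_store(park, 1);
        futex((uint32_t *)park, FUTEX_WAKE, 1);
}

What this sketch cannot express is the SWAP half: there is no futex
operation that hands the CPU directly to the woken thread.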
So what I am asking is: is UMCG_WAIT_WAKE_ONLY needed? Is the idea to
support workers directly context-switching among each other, without
involving server threads and without going through idle_servers_ptr? If
so, can you explain some of the intended state transitions in this case?

> > However, I do not understand how the userspace is expected to use it. I also
> > do not understand if these link fields form a stack or a queue and where is
> > the head.
>
> When a server has nothing to do (no work to run), it is put into IDLE
> state and added to the list. The kernel wakes an IDLE server if a
> blocked worker unblocks.

From the code in umcg_wq_worker_running (Step 3), I am guessing users are
expected to provide a global head somewhere in memory, and
umcg_task.idle_servers_ptr points to that head of the list for all
workers. Servers are then added in user space using
atomic_stack_push_user. Is this correct? I did not find any documentation
on the list head.

I like the idea that each worker thread points to a given list: it opens
the possibility of separate containers with their own independent servers,
workers and scheduling. However, it seems the list itself could be
implemented using existing kernel APIs, for example a futex or an eventfd.
Like so:

struct umcg_task {
     [...]
     /**
      * @idle_futex_ptr: pointer to a futex used by idle server threads.
      *
      * When waking a worker, the kernel decrements the pointed-to futex
      * value if it is non-zero and wakes a server if the decrement
      * occurred.
      *
      * Server threads that have no work to do should increment the
      * futex value and FUTEX_WAIT on it.
      */
     uint64_t    idle_futex_ptr;    /* r/w */
     [...]
} __attribute__((packed, aligned(8 * sizeof(__u64))));

I believe the futex approach, like the list, has the advantage that when
there are no idle servers, checking for an idle server requires no
locking. I do not know if that can be achieved with an eventfd.
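For completeness, here is how I picture both sides of that proposal, with
the waker's decrement shown in user space purely for illustration (in the
proposal it would be done by the kernel). The counter convention and the
names (idle_count, server_idle, consume_idle_server) are only
illustrative, not part of the patch.

#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The idle "list" degenerates to a single counter of idle servers. */
static _Atomic uint32_t idle_count;

/* Server with no work: publish our idleness, then sleep on the counter. */
static void server_idle(void)
{
        uint32_t seen = atomic_fetch_add(&idle_count, 1) + 1;

        /*
         * Sleeps only while the counter still holds the value we saw;
         * a real implementation would recheck on return to distinguish
         * "our slot was consumed" from a spurious wake-up.
         */
        syscall(SYS_futex, (uint32_t *)&idle_count, FUTEX_WAIT, seen,
                NULL, NULL, 0);
}

/* Waker: consume one idle slot if there is one, waking one server. */
static int consume_idle_server(void)
{
        uint32_t old = atomic_load(&idle_count);

        while (old != 0) {
                if (atomic_compare_exchange_weak(&idle_count, &old, old - 1)) {
                        syscall(SYS_futex, (uint32_t *)&idle_count,
                                FUTEX_WAKE, 1, NULL, NULL, 0);
                        return 1;       /* an idle server was woken */
                }
        }
        return 0;       /* no idle server; nothing to lock or wake */
}

The empty case is the one I care about: consume_idle_server sees zero and
returns without touching any lock, which is the same property the
atomic_stack list gives you.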