From: Peter Oskolkov
Date: Mon, 26 Jul 2021 09:44:27 -0700
Subject: Re: [RFC PATCH 4/4 v0.3] sched/umcg: RFC: implement UMCG syscalls
To: Thierry Delisle
Cc: Peter Oskolkov, Andrei Vagin, Ben Segall, Jann Horn, Jim Newsome,
    Joel Fernandes, linux-api@vger.kernel.org, Linux Kernel Mailing List,
    Ingo Molnar, Peter Zijlstra, Paul Turner, Thomas Gleixner, Peter Buhr
References: <20210716184719.269033-5-posk@google.com>
    <2c971806-b8f6-50b9-491f-e1ede4a33579@uwaterloo.ca>
    <5790661b-869c-68bd-86fa-62f580e84be1@uwaterloo.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 23, 2021 at 12:06 PM Thierry Delisle wrote:
>
> > In my tests reclaimed nodes have their next pointers immediately set
> > to point to the list head. If the kernel gets a node with its @next
> > pointing to something else, then yes, things break down (the kernel
> > kills the process); this has happened occasionally when I had a bug in
> > the userspace code.
>
> I believe that approach is fine for production, but for testing it may
> not detect some bugs. For example, it may not detect the race I detail
> below.

While I think I have the idle servers list working, I now believe that
what peterz@ was suggesting is not much slower in the common case
(many idle workers; few, if any, idle servers) than having a list of
idle servers exposed to the kernel. I think having a single idle
server at head, not a list, is enough: when a worker is added to the
idle workers list, a single idle server at head, if present, can be
"popped" and woken. The userspace can maintain the list of idle
servers itself; having the kernel wake only one is enough, as the
woken server will pop all idle workers and decide whether any other
servers are needed to process the newly available work.

[...]

> > Workers are trickier, as they can be woken by signals and then block
> > again, but stray signals are so bad here that I'm thinking of actually
> > not letting sleeping workers wake on signals. Other than signals
> > waking queued/unqueued idle workers, are there any other potential
> > races here?
>
> Timeouts on blocked threads is virtually the same as a signal I think. I
> can see that both could lead to attempts at waking workers that are not
> blocked.

I've got preemption working well enough to warrant a new RFC patchset
(also have timeouts done, but these were easy). I'll clean things up,
change the idle servers logic to expose only one idle server to the
kernel, not a list, add some additional documentation (state
transitions, userspace code snippets, etc.) and will post the v0.4 RFC
patchset to LKML later this week.

[...]
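
For illustration only, the single-idle-server idea described above
could be sketched in plain userspace C11 along these lines. This is
not code from the patchset: the struct layouts, the function names
and the "wakeups" counter that stands in for a real wakeup are all
hypothetical, and the kernel side is merely simulated by an ordinary
function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the UMCG ABI. */
struct worker {
    int id;
    struct worker *next;
};

struct server {
    int id;
    int wakeups;   /* counts simulated wakeups */
};

static _Atomic(struct worker *) idle_workers; /* lock-free LIFO of idle workers */
static _Atomic(struct server *) idle_server;  /* single slot; NULL if none */

/* Server side: advertise itself as the one idle server. */
static void server_publish_idle(struct server *s)
{
    atomic_store(&idle_server, s);
}

/* Simulated kernel side: queue an idle worker, then wake at most one server. */
static struct server *worker_became_idle(struct worker *w)
{
    struct worker *head = atomic_load(&idle_workers);

    do {
        w->next = head;   /* head is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(&idle_workers, &head, w));

    /* "Pop" the single idle server, if one is present. */
    struct server *s = atomic_exchange(&idle_server, (struct server *)NULL);
    if (s)
        s->wakeups++;     /* stands in for an actual wakeup */
    return s;
}

/* Server side: take the whole idle workers list in one step. */
static struct worker *server_pop_all_idle_workers(void)
{
    return atomic_exchange(&idle_workers, (struct worker *)NULL);
}

/* Single-threaded walk-through; returns how many workers the server got. */
static int demo_count_collected(void)
{
    struct server srv = { .id = 0, .wakeups = 0 };
    struct worker w1 = { .id = 1 }, w2 = { .id = 2 };
    int n = 0;

    server_publish_idle(&srv);
    assert(worker_became_idle(&w1) == &srv); /* first idle worker wakes it */
    assert(worker_became_idle(&w2) == NULL); /* slot is empty: nobody to wake */

    for (struct worker *w = server_pop_all_idle_workers(); w; w = w->next)
        n++;
    return n;
}
```

In this sketch only one server is ever woken per batch of idle
workers; it then drains the whole list and would decide in userspace
whether further servers are needed. The sketch does not model
blocking. A real server would presumably have to re-check the idle
workers list after publishing itself and before going to sleep, so
that a worker queued in between is not missed.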