Message-ID: <969ccc0f-d909-4b45-908e-e98279777733@metaparadigm.com>
Date: Wed, 3 Apr 2024 09:53:34 +1300
X-Mailing-List: linux-kernel@vger.kernel.org
To: Linus Torvalds, Jens Axboe, Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
From: Michael Clark
Organization: Metaparadigm
Subject: user-space concurrent pipe buffer scheduler interactions

Folks,

I am working on a low latency cross-platform concurrent pipe buffer using C11 threads and atomics. It is portable code using a polyfill on Windows that wraps the intrinsics that Microsoft provides. There is a detailed write up with implementation details, source code, tests and benchmark results in the URL here:

- https://github.com/michaeljclark/cpipe/

I have been eagerly following the work of Jens on io_uring which is why I am including him as he may be interested in these scheduler findings, because I am currently using busy memory polling for synchronization.
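
For readers unfamiliar with the technique, here is a rough sketch of what busy memory polling with C11 atomics can look like. It is not the cpipe implementation (see the repository above for that). It is a much simpler single-producer/single-consumer queue, and every name in it (`ring`, `ring_try_push`, `ring_pop_spin`, `RING_SLOTS`) is hypothetical. It assumes free-running unsigned counters that are allowed to wrap, with the slot index taken by masking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPSC queue for illustration only; not the cpipe design. */
#define RING_SLOTS 8u /* power of two, so masking is equivalent to modulo */

typedef struct {
    _Atomic uint32_t head; /* items consumed so far; free-running, may wrap */
    _Atomic uint32_t tail; /* items produced so far; free-running, may wrap */
    uint64_t slots[RING_SLOTS];
} ring;

static inline void ring_init(ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side: returns false (without blocking) when the ring is full. */
static inline bool ring_try_push(ring *r, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    /* Unsigned subtraction gives the fill level even after the counters wrap. */
    if ((uint32_t)(tail - head) == RING_SLOTS)
        return false;
    r->slots[tail & (RING_SLOTS - 1u)] = value;
    /* Release: the slot write becomes visible before the new tail does. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false (without blocking) when the ring is empty. */
static inline bool ring_try_pop(ring *r, uint64_t *value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (tail == head)
        return false;
    *value = r->slots[head & (RING_SLOTS - 1u)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Busy polling: spin on shared memory instead of sleeping in the kernel. */
static inline void ring_push_spin(ring *r, uint64_t value)
{
    while (!ring_try_push(r, value)) {
        /* queue full: keep polling the consumer's head counter */
    }
}

static inline uint64_t ring_pop_spin(ring *r)
{
    uint64_t value;
    while (!ring_try_pop(r, &value)) {
        /* queue empty: keep polling the producer's tail counter */
    }
    return value;
}
```

A producer thread would call `ring_push_spin` and a consumer thread `ring_pop_spin`. Neither makes a system call, so a waiting thread shows up as a fully busy CPU as far as the scheduler is concerned.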

The reason I am writing here is that I think I now have a pretty decent test case for comparing the Windows and Linux schedulers side by side. Let's just say it has been an eye-opening process, and I think folks here might be interested in what I am seeing and what we would predict should happen based on Amdahl's Law and low-level cache ping-pong on atomics.

Let me cut to the chase. What I am observing is that when I add threads on Windows, performance increases, but when I add threads on Linux, performance decreases. I don't know exactly why. I am wondering if Windows is doing some topologically affine scheduling, or if it is using performance counters to intuit scheduling decisions. I have checked the codegen and it is basically two LOCK CMPXCHG instructions.

I ran bare-metal tests on Kaby Lake and Skylake processors on both OSes:

- `Windows 11 Version 23H2 Build 22631.3296`
- `Linux 6.5.0-25-generic #25~22.04.1-Ubuntu`

In any case, here are the numbers. I will let them speak for themselves:

# Minimum Latency (nanoseconds)

|                      | cpipe win11 | cpipe linux | linux pipes |
|:---------------------|------------:|------------:|------------:|
| Kaby Lake (i7-8550U) |      ~219ns |      ~362ns |     ~7692ns |
| Skylake (i9-7980XE)  |      ~404ns |      ~425ns |     ~9183ns |

# Message Rate (messages per second)

|                      | cpipe win11 | cpipe linux | linux pipes |
|:---------------------|------------:|------------:|------------:|
| Kaby Lake (i7-8550U) |       4.55M |       2.71M |     129.62K |
| Skylake (i9-7980XE)  |       2.47M |       2.35M |     108.89K |

# Bandwidth 32KB buffer (1-thread)

|                      | cpipe win11 | cpipe linux | linux pipes |
|:---------------------|------------:|------------:|------------:|
| Kaby Lake (i7-8550U) |  2.91GB/sec |  1.36GB/sec |  1.72GB/sec |
| Skylake (i9-7980XE)  |  2.98GB/sec |  1.44GB/sec |  1.67GB/sec |

# Bandwidth 32KB buffer (4-threads)

|                      | cpipe win11 | cpipe linux |
|:---------------------|------------:|------------:|
| Kaby Lake (i7-8550U) |  5.56GB/sec |  0.79GB/sec |
| Skylake (i9-7980XE)  |  7.11GB/sec |  0.89GB/sec |

I think we have a very useful test case here for the Linux scheduler. I have been working on a generalization of a memory-polled user-space queue, and this is about the 5th iteration, in which I have been very careful to treat modulo arithmetic and overflow as the normal case.

I know it is a little unfair to compare latency with Linux pipes, and we also waste a lot of time spinning when the queue is full. This is where we would really like to use something like SENDUIPI, UMONITOR and UMWAIT, but I don't have access to silicon that supports those yet.

Regards,
Michael Clark