Subject: Re: Kernel Benchmarking
From: Matthieu Baerts
To: Linus Torvalds, Michael Larabel
Cc: Matthew Wilcox, Amir Goldstein, Ted Ts'o, Andreas Dilger, Ext4 Developers List, Jan Kara, linux-fsdevel
Date: Mon, 14 Sep 2020 22:21:42 +0200
X-Mailing-List: linux-ext4@vger.kernel.org

Hello everyone,

On 14/09/2020 19:47, Linus Torvalds wrote:
> Michael et al,
> Ok, I redid my failed "hybrid mode" patch from scratch (original
> patch never sent out, I never got it to a working point).
>
> Having learnt from my mistake, this time instead of trying to mix the
> old and the new code, instead I just extended the new code, and wrote
> a _lot_ of comments about it.
>
> I also made it configurable, using a "page_lock_unfairness" knob,
> which this patch defaults to 1000 (which is basically infinite).
> That's just a value that says how many times we'll try the old unfair
> case, so "1000" means "we'll re-queue up to a thousand times before we
> say enough is enough" and zero is the fair mode that shows the
> performance problems.

Thank you for the new patch and for all the work everybody has done
around it!

Sorry to jump into this thread, but I wanted to share my issue, which is
also linked to the same commit:

    2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")

I have a simple test environment[1] using Docker and virtme[2], with an
almost default kernel config, that validates some tests for the MPTCP
Upstream project[3]. Some of these tests are using a modified version of
packetdrill[4].

Recently, some of these packetdrill tests have been failing after
2 minutes (timeout) instead of being executed in a few seconds
(~6 seconds). No packets are even exchanged during these two minutes.

I did a git bisect and it also pointed me to 2a9127fcf229. I can run the
same test 10 times without any issue with the parent commit (v5.8 tag),
but with 2a9127fcf229, I have a timeout most of the time.

Of course, when I try to add some debug info on the userspace or
kernelspace side, I can no longer reproduce the timeout issue. But
without debug, it is easy for me to validate whether the issue is there
or not.

My issue doesn't seem to be linked to a small file that needs to be read
multiple times on a FS.
Only a few bytes should be transferred with packetdrill, but when there
is a timeout, it happens even before that: I don't see any transferred
packets when the issue occurs. I don't think packetdrill does a lot of
IO before transferring a few packets to a "tun" interface, but I didn't
analyse further.

With your new patch and the default value, I no longer have the issue.

> I've only (lightly) tested those two extremes, I think the interesting
> range is likely in the 1-5 range.
>
> So you can do
>
>     echo 0 > /proc/sys/vm/page_lock_unfairness
>     .. run test ..
>
> and you should get the same numbers as without this patch (within
> noise, of course).

On my side, I have the issue with 0. So that seems good, because it is
expected!

> Or do
>
>     echo 5 > /proc/sys/vm/page_lock_unfairness
>     .. run test ..
>
> and get numbers for "we accept some unfairness, but if we have to
> requeue more than five times, we force the fair mode".

Already with 1, it is fine on my side: no more timeout! Same with 5. I
am not checking the performance, only the fact that I can run
packetdrill without a timeout. With 1 and 5, the tests finish in a
normal time, that's really good. I didn't have any timeout in 10 runs,
each of them started from a fresh VM.

Patch tested with success!

I would be glad to help by validating new modifications or providing new
info. My setup is also easy to put in place: a Docker image is built
with all the required tools to start a VM just like the one I have. All
scripts are in a public repository[1].

Please tell me if I can help!

Cheers,
Matt

[1] https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/virtme.sh
    and
    https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/Dockerfile.virtme.sh
[2] https://git.kernel.org/pub/scm/utils/kernel/virtme/virtme.git
[3] https://github.com/multipath-tcp/mptcp_net-next/wiki
[4] https://github.com/multipath-tcp/packetdrill
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
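
For anyone wanting to repeat the comparison above (knob values 0, 1 and 5, ten runs each), it could be scripted roughly as follows. This is only a sketch under assumptions: the knob path is the one from the patch, but TEST_CMD, RUNS, LIMIT and the count_timeouts helper are hypothetical placeholders, and unlike the runs described above it does not restart a fresh VM between runs.

```shell
#!/bin/sh
# Sketch: count how many runs of a test command time out for each
# page_lock_unfairness value. KNOB defaults to the sysctl path named in
# the patch; TEST_CMD, RUNS and LIMIT are hypothetical placeholders.
KNOB="${KNOB:-/proc/sys/vm/page_lock_unfairness}"
TEST_CMD="${TEST_CMD:-true}"   # replace with the real test invocation
RUNS="${RUNS:-10}"
LIMIT="${LIMIT:-120}"          # seconds before a run counts as a timeout

count_timeouts() {
    # $1 = unfairness value written to the knob (needs root on a real system)
    echo "$1" > "$KNOB" || return 1
    timeouts=0
    i=0
    while [ "$i" -lt "$RUNS" ]; do
        # timeout(1) exits with status 124 when it had to stop the command
        timeout "$LIMIT" sh -c "$TEST_CMD" >/dev/null 2>&1
        [ "$?" -eq 124 ] && timeouts=$((timeouts + 1))
        i=$((i + 1))
    done
    echo "unfairness=$1 timeouts=$timeouts/$RUNS"
}

for value in 0 1 5; do
    count_timeouts "$value"
done
```

On a kernel without the patch the knob does not exist, so the write fails and no runs are counted for that value.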