Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp1069580yba; Fri, 26 Apr 2019 13:33:48 -0700 (PDT) X-Google-Smtp-Source: APXvYqzdq3++J30099WUjEuEuwAABx2Z9ly6gKz0BRo8Kes8adPaark3+TAoK4FNdahOYTVZ/SFX X-Received: by 2002:a17:902:f08a:: with SMTP id go10mr47025034plb.121.1556310828659; Fri, 26 Apr 2019 13:33:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1556310828; cv=none; d=google.com; s=arc-20160816; b=ImnO2MrnxUJacR7Glcm5MGtOyqLlFLc/iyA4tMZwF94KHsp41/UELfbS1W9moyhVNl vSnweUzZDqz8Iq8+NjPVb+uIOfY0lV9wDxWOrTWdfkTnYKIbaspXUrztpXig9fKjVajn jmBd6Nt5BYAv0XreezYl9WxibWNX3WXDfujqfj8h04Q5iiGZwyWjB6o6QyXMRPOC//RL Qz4tHDKbeAxuUDCO823zYfomCNQBmavGQufc0LgFX6UafvvOniVC8x9qDTcOl2Ca1Ojb cd27Pvy3UHodvBa6eeIISMKg4D7clGtS3cF1U7KZmxQFe6Bi5ddA5v8M3yohZUGp8ON0 q8aA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=NRhCrgdcVtWSzkm+pdTS9sMArbFGolgpvwXOxwSyggQ=; b=qbjFb/v7MRGlH1vc6O9+OF3TgesNwVT36rrRjub2IGF319meyASM/5uxUSvXs8DY+H EeUcOFE9EP+NJZCImU6lXhrnGMgqdRl/mrXq0rGmssfbgac9vT6DtAwXSWOiDmAyhUp5 xB7devZ8pdgh1jjiCjTfKfZ8dLljhr8D0ZOuCzmg3/Dxl2Ij9PFpnxeUWBfbpi2bKQD4 B4fe5Nfpg57SSdjqYb0x1ZuHnVbiUKPthv1+ZKoxQt/3I/J3thBuv6w9zJTNH9CeDvOO FtjeluEX8EcGcmWHMIhxSt9i2DL5NUvEWK57vvHsqciDuq9E02hwPY1Pz4uAuHXuJEU/ yiUg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=nB+PjGkc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z143si27731712pfc.64.2019.04.26.13.33.32; Fri, 26 Apr 2019 13:33:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=nB+PjGkc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726314AbfDZUbS (ORCPT + 99 others); Fri, 26 Apr 2019 16:31:18 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:43754 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726077AbfDZUbS (ORCPT ); Fri, 26 Apr 2019 16:31:18 -0400 Received: by mail-pg1-f195.google.com with SMTP id z9so2119751pgu.10 for ; Fri, 26 Apr 2019 13:31:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=NRhCrgdcVtWSzkm+pdTS9sMArbFGolgpvwXOxwSyggQ=; b=nB+PjGkczsije8B8z+PDeI9W+4LEbce9e8Ktw9RPJx6QlQnXgZNKNdCKnWfM9DMk3m E1pA9YyO6kCQskSJNpGC3FBzccvhaYYzoyaUvO3uyAbu/miryUuxnj4OrZeYxXF3NIlP ffFU4pOj4trayV/gF0F/7yqY/xaDmLZGgogpk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=NRhCrgdcVtWSzkm+pdTS9sMArbFGolgpvwXOxwSyggQ=; b=ChzpVRY1ZP5n4tceypJSI8SaOAgzTHwFKt0mo2Q9elAaRuuQVZQqvyT+KUe8qpSRBr aGgoC/ZZ4eRanDz60ABcsTQMTWmtQjz3Np1mZL99O54KucI/KeEykzbM3P20vh74/K4Y iytMVJMhH73LuyMvqBs9pqkM0QbtzFir+HCCS8hhZ4ZHqe65sRc6LwWiFXodtRXtBLyF TPcEuk/zcnKYRzW8v62Okig0ADMtb32KEFZjU2HTZ9LixcxXIDHnDYpdwz2AZuU9LclF Ppfphbx4ZmFinOkSFLSXAlugNgLK0xQeBysO0oyAEyJXyrolQ/0mB0D01l07p1/MKMhf EUpg== X-Gm-Message-State: APjAAAXTnqg2/OW/71lAUwJJEIE+zGHDk1i75Hc8BCJ32ToS/iumlPzC uttrPkLc8pckRK1/WTLXqIfQHg== X-Received: by 2002:a63:6e01:: with SMTP id j1mr45913798pgc.442.1556310676949; Fri, 26 Apr 2019 13:31:16 -0700 (PDT) Received: from localhost ([2620:15c:6:12:9c46:e0da:efbf:69cc]) by smtp.gmail.com with ESMTPSA id u17sm35108711pfn.19.2019.04.26.13.31.15 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 26 Apr 2019 13:31:15 -0700 (PDT) Date: Fri, 26 Apr 2019 16:31:14 -0400 From: Joel Fernandes To: Daniel Colascione Cc: Christian Brauner , linux-kernel , Andrew Morton , Arnd Bergmann , "Eric W. Biederman" , Greg Kroah-Hartman , Ingo Molnar , Jann Horn , Jann Horn , Jonathan Kowalski , Android Kernel Team , "open list:KERNEL SELFTEST FRAMEWORK" , Andy Lutomirski , Michal Hocko , "Peter Zijlstra (Intel)" , Steven Rostedt , Serge Hallyn , Shuah Khan , Sandeep Patil , Stephen Rothwell , Suren Baghdasaryan , Thomas Gleixner , Tim Murray , Linus Torvalds , Tycho Andersen Subject: Re: [PATCH v1 2/2] Add selftests for pidfd polling Message-ID: <20190426203114.GA185553@google.com> References: <20190425190010.46489-1-joel@joelfernandes.org> <20190425190010.46489-2-joel@joelfernandes.org> <20190425212917.yotnir4uqgpnh764@brauner.io> <20190426172602.GD261279@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 26, 2019 at 12:35:40PM -0700, Daniel Colascione wrote: > On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes wrote: > > On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote: > > > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner wrote: > > > > This timing-based testing seems kinda odd to be honest. Can't we do > > > > something better than this? > > > > > > Agreed. Timing-based tests have a substantial risk of becoming flaky. > > > We ought to be able to make these tests fully deterministic and not > > > subject to breakage from odd scheduling outcomes. We don't have > > > sleepable events for everything, granted, but sleep-waiting on a > > > condition with exponential backoff is fine in test code. In general, > > > if you start with a robust test, you can insert a sleep(100) anywhere > > > and not break the logic. Violating this rule always causes pain sooner > > > or later. > > > > I prefer if you can be more specific about how to redesign the test. Please > > go through the code and make suggestions there. The tests have not been flaky > > in my experience. > > You've been running them in an ideal environment. One would hope for a reliable test environment. > > In this case, we want to make sure that the poll unblocks at the right "time" > > that is when the non-leader thread exits, and not when the leader thread > > exits (test 1), or when the non-leader thread exits and not when the same > > non-leader previous did an execve (test 2). > > Instead of sleeping, you want to wait for some condition. Right now, > in a bunch of places, the test does something like this: > > do_something() > sleep(SOME_TIMEOUT) > check(some_condition()) No. I don't have anything like "some_condition()". My some_condition() is just the difference in time. > > You can replace each of these clauses with something like this: > > do_something() > start_time = now() > while(!some_condition() && now() - start_time < LONG_TIMEOUT) > sleep(SHORT_DELAY) > check(some_condition()) > > This way, you're insensitive to timing, up to LONG_TIMEOUT (which can > be something like a minute). Yes, you can always write > sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT > (which should be tens of seconds) would make the test take far too > long to run in the happy case. Yes, but try implementing some_condition() :-). It is easy to talk in the abstract, I think it would be more productive if you can come up with an implementation/patchh of the test itself and send a patch for that. I know you wrote some pseudocode below, but it is a complex reimplementation that I don't think will make the test more robust. I mean reading /proc/pid stat? yuck :) You are welcome to send a patch though if you have a better implementation. > Note that this code is fine: > > check(!some_condition()) > sleep(SOME_REASONABLE_TIMEOUT) > check(!some_condition()) > > It's okay to sleep for a little while and check that something did > *not* happen, but it's not okay for the test to *fail* due to > scheduling delays. The difference is that As I said, multi-second scheduling delay are really unacceptable anyway. I bet many kselftest may fail on a "bad" system like that way, that does not mean the test is flaky. If there are any reports in the future that the test fails or is flaky, I am happy to address them at that time. The tests work and catch bugs reliably as I have seen. We could also make the test task as RT if scheduling class is a concern. I don't think its worth bikeshedding about hypothetical issues. thanks, - Joel