Date: Sat, 25 Jul 2020 14:19:46 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds
cc: Oleg Nesterov, Hugh Dickins, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen, Michal Hocko
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
References: <20200723124749.GA7428@redhat.com> <20200724152424.GC17209@redhat.com> <20200725101445.GB3870@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 25 Jul 2020, Linus Torvalds wrote:
> On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
> >
> > Heh. I too thought about this. And just in case, your patch looks correct
> > to me. But I can't really comment this behavioural change. Perhaps it
> > should come in a separate patch?
>
> We could do that. At the same time, I think both parts change how the
> waitqueue works that it might as well just be one "fix page_bit_wait
> waitqueue usage".
>
> But let's wait to see what Hugh's numbers say.

Oh no, no no: sorry for getting your hopes up there, I won't come up
with any numbers more significant than "0 out of 10" machines crashed.
I know it would be *really* useful if I could come up with performance
comparisons, or steer someone else to do so: but I'm sorry, cannot.

Currently it's actually 1 out of 10 machines crashed, for the same
driverland issue seen last time, maybe it's a bad machine; and another
1 out of the 10 machines went AWOL for unknown reasons, but probably
something outside the kernel got confused by the stress. No reason
to suspect your changes at all (but some unanalyzed "failure"s, of
dubious significance, accumulating like last time).

I'm optimistic: nothing has happened to warn us off your changes.

And on Fri, 24 Jul 2020, Linus Torvalds had written:
> So the loads you are running are known to have sensitivity to this
> particular area, and are why you've done your patches to the page wait
> bit code?

Yes. It's a series of nineteen ~hour-long tests, of which about five
exhibited wake_up_page_bit problems in the past, and one has remained
intermittently troublesome that way.

Intermittently: usually it does get through, so getting through
yesterday and today won't even tell us that your changes fixed it -
that we shall learn over time later.

Hugh