Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp2462050pxa; Fri, 7 Aug 2020 11:42:25 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwGIIqctabpjWm9q84E9BXfnzAP6qg38zMNqPhwtY3cdK93kh0M26ogrjcCq067coOXan7b X-Received: by 2002:a17:906:1f8b:: with SMTP id t11mr10961106ejr.32.1596825744862; Fri, 07 Aug 2020 11:42:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1596825744; cv=none; d=google.com; s=arc-20160816; b=EW69WmQIAVX017/0HElejsq2gefw7Fd2wrsS6VWSj5BqhwPpRqRG3LhNZgvnIaj+up l7q4KGcPyfJMvMxa7O+TNoJZhdG/0gcwKrzu+XPH5tI2kNha1OnVgDykgDu5A1O/cRjG Vohp1idnUFVzCkor9hD25Bzx5SVwkEfoldHl+1C2NKtaULXV+sijdqrBijKGaQjSYiD9 4nJJDwWd5XGoexjfgliksWA9CFa4Sy1/z6SPZHmxR6okAgvYC8/C/zuV10Jr3nkV7Wzm FMsJn0PFVNjdn5u+4M4ynIW1oaQa9oHOie3xm/xms39Cks17sX4lPkGX7BDlBucc1Wq3 ueeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=qmBoGKJ/6jf0yN39yT89ZvHLfPlQIChbVPrk/otNWQ0=; b=McBgL8U9AYQ1qq6dPc5jqy7qe9nz2rGVsMS7WX3VCe5zvvCn3jYzsgm3Nkjg6Ujl1H QHG6I+9f3+bQCDG+OwGEEC+MIa5LiQduCmsz1Z/jn4qLPd37qT2ZdhDm/6e3uY1IJJm4 XZDvsGrOmV5E3tkhNrh8uGToQCKbITzFlmmTVp4Xqm8DwetL3YtnkeQjunLf5lR6Ii7R Wq249i9qA2z71tRjat1agrHHIbIEjes92wWQZAIzlaTo0LTqfwQh7FjcaaAo6jVN5CqI ZieAu/21F+EtDLCRbmErd5WYPdW6JaZKtD0yS/i+63bS2xctbpdZIvlu2ClH8N3JcBED dthw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b="LN/kimao"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f9si5999425edm.167.2020.08.07.11.42.01; Fri, 07 Aug 2020 11:42:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b="LN/kimao"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726078AbgHGSl0 (ORCPT + 99 others); Fri, 7 Aug 2020 14:41:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725934AbgHGSl0 (ORCPT ); Fri, 7 Aug 2020 14:41:26 -0400 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34123C061756 for ; Fri, 7 Aug 2020 11:41:26 -0700 (PDT) Received: by mail-oi1-x244.google.com with SMTP id z22so2815270oid.1 for ; Fri, 07 Aug 2020 11:41:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=qmBoGKJ/6jf0yN39yT89ZvHLfPlQIChbVPrk/otNWQ0=; b=LN/kimaoPxNG7QYyWZoeFQ1GljX2ckB7ujRpdZqKUZo6aEhUnGhDNdRNJYX9qgjG50 4J9MMVXR4T16g8OfbfsNmZ7KbRjMI8bitZ8HVwidzNEB77ivrSeKq4ExqpcLBD9w4xsh VQIoy25TaJ8G28LKYf84f7MUAUWYAz0xElhDKICbvdAPuaAbLNIBSjV8uMEPmwcFHsa6 5tja8W0ULyNw+pVogR6MlHqXoZ+3rRASVIod19W+z6Zcj7jzuwg6F7Lqy8xXoY/kCdeC n8527o34/Ze9xkLcAR1IF4jmwXtejTM9Qqr4EY0/t8By+h6els1583FfG5EevsFFBMj+ UaYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=qmBoGKJ/6jf0yN39yT89ZvHLfPlQIChbVPrk/otNWQ0=; b=TjBOFvbfzJKPkg+2wd3YD5lOKdDxZyictdqSgGZkrP0no/RwbFocnS4kcLUYCeR0EM nyFLpfvNBAA88VMy4IN/C4EdaH1H/7pcyvnZ5RLwNRvWbW91MdFoIe+EM34vhx5beFdp 1l44oua45jYDBDnGn+Ehj7h3b6FCCtiiZRQzKdq4ZSGaKntSdmqn4jetitRMNFSETHug j7005NK5zoWCXkUtJA6LVcuxLZIuLb8BvNMGW6OJE8E9/s2G2d0+1WifH01jaWiiWwNm i8AqSZvQWplW9sYfiRSHsc7/s++U+ML2sK/ZGrurxDksNsfs4rmNS2/swkDuPFF3dNIZ ej1w== X-Gm-Message-State: AOAM530VheBQNiI7SoGn7R4x1bpaKbqEaXzzAiBTdDVzbr3ChJm6SRo7 lamAsD7WF9IDsJSHpec9AIYk5A== X-Received: by 2002:aca:d88a:: with SMTP id p132mr13091890oig.95.1596825684270; Fri, 07 Aug 2020 11:41:24 -0700 (PDT) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id k25sm1798842otb.1.2020.08.07.11.41.21 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Fri, 07 Aug 2020 11:41:22 -0700 (PDT) Date: Fri, 7 Aug 2020 11:41:09 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Matthew Wilcox , Hugh Dickins , Oleg Nesterov , Michal Hocko , Linux-MM , LKML , Andrew Morton , Tim Chen , Michal Hocko , Greg KH Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page In-Reply-To: Message-ID: References: <20200724152424.GC17209@redhat.com> <20200725101445.GB3870@redhat.com> <20200806180024.GB17456@casper.infradead.org> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 6 Aug 2020, Linus Torvalds wrote: > On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote: > > > > It wasn't clear to me whether Hugh thought it was an improvement or > > not that trylock was now less likely to jump the queue. There're > > the usual "fair is the opposite of throughput" kind of arguments. I don't know. I'm inclined to think that keeping just a small chance of jumping the queue is probably good; but pay no attention to me, that's an opinion based on ignorance. Thanks for mentioning the lucky lock_page(): I was quite wrong to single out trylock_page() - I suppose its lack of a slow path just helped it spring to mind more easily. > > Yeah, it could go either way. But on the whole, if the lock bit is > getting any contention, I think we'd rather have it be fair for > latency reasons. > > That said, I'm not convinced about my patch, and I actually threw it > away without even testing it (sometimes I keep patches around in my > private tree for testing, and they can live there for months or even > years when I wonder if they are worth it, but this time I didn't > bother to go to the trouble). > > If somebody is interested in pursuing this, I think that patch might > be a good starting point (and it _might_ even work), but it seemed to > be too subtle to really worry about unless somebody finds an actual > acute reason for it. It is an attractive idea, passing the baton from one to the next; and a logical destination from where you had already arrived. Maybe somebody, not me, should pursue it. I tried to ignore it, but eventually succumbed to temptation, and ran it overnight at home (my usual tmpfs swapping), and on one machine at work (fork/exit stress etc). It did need one fixup, below: then home testing went fine. But the fork/exit stress hit that old familiar unlock_page wake_up_page_bit softlockup that your existing patch has appeared to fix. I can't afford to investigate further (though might regret that: I keep wondering if the small dose of unfairness in your existing patch is enough to kick the test when it otherwise would hang, and this new stricter patch be pointing to that). My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus your two patches): I did not have in the io_uring changes from the current tree. In glancing there, I noticed one new and one previous instance of SetPageWaiters() *after* __add_wait_queue_entry_tail(): are those correct? > > I think the existing patch narrowing the window is good, and it > clearly didn't hurt throughput (although that was almost certainly for > other reasons). Agreed. > > Linus > --- linus/mm/filemap.c 2020-08-07 02:08:13.727709186 -0700 +++ hughd/mm/filemap.c 2020-08-07 02:16:10.960331473 -0700 @@ -1044,7 +1044,7 @@ static int wake_page_function(wait_queue return ret; } -static int wake_up_page_bit(struct page *page, int bit_nr) +static void wake_up_page_bit(struct page *page, int bit_nr, bool exclude) { wait_queue_head_t *q = page_waitqueue(page); struct wait_page_key key; @@ -1096,15 +1096,28 @@ static int wake_up_page_bit(struct page * That's okay, it's a rare case. The next waker will clear it. */ } + + /* + * If we hoped to pass PG_locked on to the next locker, but found + * no exclusive taker, then we must clear it before dropping q->lock. + * Otherwise unlock_page() can race trylock_page_bit_common(), and if + * PageWaiters was already set from before, then cmpxchg sees no + * difference to send it back to wake_up_page_bit(). + * + * We may have already dropped and retaken q->lock on the way here, but + * I think this works out because new entries are always added at tail. + */ + if (exclude && !exclusive) + clear_bit(bit_nr, &page->flags); + spin_unlock_irqrestore(&q->lock, flags); - return exclusive; } static void wake_up_page(struct page *page, int bit) { if (!PageWaiters(page)) return; - wake_up_page_bit(page, bit); + wake_up_page_bit(page, bit, false); } /* @@ -1339,8 +1352,8 @@ void unlock_page(struct page *page) * spinlock wake_up_page_bit() will do. */ smp_mb__before_atomic(); - if (wake_up_page_bit(page, PG_locked)) - return; + wake_up_page_bit(page, PG_locked, true); + return; } new = cmpxchg_release(&page->flags, flags, flags & ~(1 << PG_locked)); if (likely(new == flags))