From: Hugh Dickins
Date: Thu, 23 Jul 2020 20:45:34 -0700
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
To: Linus Torvalds
Cc: Oleg Nesterov, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen
References: <20200721063258.17140-1-mhocko@kernel.org> <20200723124749.GA7428@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 23, 2020 at 5:47 PM Linus Torvalds wrote:
>
> On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins wrote:
> >
> > I say that for full disclosure, so you don't wrack your brains
> > too much, when it may still turn out to be a screwup on my part.
>
> Sounds unlikely.
>
> If that patch applied even reasonably closely, I don't see how you'd
> see a list corruption that wasn't due to the patch.
>
> You'd have had to use the wrong spinlock by mistake due to munging it,
> or something crazy like that.
>
> The main list-handling change is
>
>  (a) open-coding of that finish_wait()
>
>  (b) slightly different heuristics for removal in the wakeup function
>
> where (a) was because my original version of finishing the wait needed
> to do that return code checking.
>
> So a normal "finish_wait()" just does
>
>     list_del_init(&wait->entry);
>
> where-as my open-coded one replaced that with
>
>     if (!list_empty(&wait->entry)) {
>         list_del(&wait->entry);
>         ret = -EINTR;
>     }
>
> and apart from that "set return to -EINTR because nobody woke us up",
> it also uses just a regular "list_del()" rather than a
> "list_del_init()". That causes the next/prev field to be poisoned
> rather than re-initialized. But that second change was because the
> list entry is on the stack, and we're not touching it any more and are
> about to return, so I removed the "init" part.
>
> Anyway, (a) really looks harmless. Unless the poisoning now triggers
> some racy debug test that had been hidden by the "init". Hmm.
>
> In contrast, (b) means that the likely access patterns of irqs
> removing the wait entry from the list might be very different from
> before. The old "autoremove" semantics would only remove the entry
> from the list when try_to_wake_up() actually woke things up. Now, a
> successful bit state _always_ removes it, which was kind of the point.
> But it might cause very different list handling patterns.
>
> All the actual list handling looks "obviously safe" because it's
> protected by the spinlock, though...
>
> If you do get oopses with the new patch too, try to send me a copy,
> and maybe I'll stare at exactly where it happens register contents and
> go "aah".

This new version is doing much better: many hours to go, but all
machines have got beyond the danger point where yesterday's version
was crashing - phew!

Hugh
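
For anyone following the list_del() versus list_del_init() point quoted above, here is a minimal userspace sketch of the difference. It is not the kernel code and not the patch under discussion: the struct and helpers are simplified stand-ins, the poison values are illustrative placeholders rather than the kernel's LIST_POISON1/LIST_POISON2, the name finish_wait_open_coded() is hypothetical, and the wait-queue spinlock that protects the real list is omitted entirely.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison markers only (assumption, not the kernel's values). */
#define SKETCH_POISON1 ((struct list_head *)0x100)
#define SKETCH_POISON2 ((struct list_head *)0x122)

/* Make an entry point at itself: "on no list". */
static inline void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* True when the node is self-linked (an empty head or a re-initialized entry). */
static inline bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static inline void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink the entry from its neighbours without touching its own pointers. */
static inline void unlink_entry(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Unlink and poison: any later use of the entry's pointers is a bug. */
static inline void list_del(struct list_head *entry)
{
    unlink_entry(entry);
    entry->next = SKETCH_POISON1;
    entry->prev = SKETCH_POISON2;
}

/* Unlink and re-initialize: list_empty(entry) is true afterwards. */
static inline void list_del_init(struct list_head *entry)
{
    unlink_entry(entry);
    init_list_head(entry);
}

/*
 * Open-coded finish step in the style quoted above: if the entry is still
 * queued, nobody woke us, so remove it and report -EINTR. The list_empty()
 * check is only meaningful if a waker that removed the entry earlier
 * re-initialized it (list_del_init) rather than poisoning it.
 */
static inline int finish_wait_open_coded(struct list_head *entry)
{
    int ret = 0;

    if (!list_empty(entry)) {
        list_del(entry);
        ret = -EINTR;
    }
    return ret;
}
```

In this sketch, an entry that is still queued yields -EINTR and is left poisoned, while an entry a waker already removed with list_del_init() yields 0 and is left alone. That mirrors why the poisoned variant is only safe when the on-stack entry is never inspected again.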