Date: Tue, 21 Jul 2020 17:49:39 +0200
From: Michal Hocko
To: Linus Torvalds
Cc: Linux-MM, LKML, Andrew Morton, Tim Chen
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Message-ID: <20200721154939.GO4061@dhcp22.suse.cz>
References: <20200721063258.17140-1-mhocko@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 21-07-20 08:33:33, Linus Torvalds wrote:
> On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko wrote:
> >
> > The lockup is in page_unlock in do_read_fault and I suspect that this is
> > yet another effect of a very long waitqueue chain which has been
> > addressed by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
> > wake_up_page_bit") previously.
>
> Hmm.
>
> I do not believe that you can actually get to the point where you have
> a million waiters and it takes 20+ seconds to wake everybody up.

I was really surprised as well!

> More likely, it's actually *caused* by that commit 11a19c7b099f, and
> what might be happening is that other CPU's are just adding new
> waiters to the list *while* we're waking things up, because somebody
> else already got the page lock again.
>
> Humor me.. Does something like this work instead? It's
> whitespace-damaged because of just a cut-and-paste, but it's entirely
> untested, and I haven't really thought about any memory ordering
> issues, but I think it's ok.
>
> The logic is that anybody who called wake_up_page_bit() _must_ have
> cleared that bit before that.
> So if we ever see it set again (and
> memory ordering doesn't matter), then clearly somebody else got access
> to the page bit (whichever it was), and we should not
>
>  (a) waste time waking up people who can't get the bit anyway
>
>  (b) be in a livelock where other CPU's continually add themselves to
>      the wait queue because somebody else got the bit.
>
> and it's that (b) case that I think happens for you.
>
> NOTE! Totally UNTESTED patch follows. I think it's good, but maybe
> somebody sees some problem with this approach?

I can ask them to give it a try.
-- 
Michal Hocko
SUSE Labs