From: Linus Torvalds
Date: Tue, 22 Aug 2017 12:15:20 -0700
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
To: Peter Zijlstra
Cc: "Liang, Kan", Mel Gorman, Mel Gorman, "Kirill A. Shutemov",
    Tim Chen, Ingo Molnar, Andi Kleen, Andrew Morton, Johannes Weiner,
    Jan Kara, linux-mm, Linux Kernel Mailing List
In-Reply-To: <20170822185624.GN32112@worktop.programming.kicks-ass.net>

On Tue, Aug 22, 2017 at 11:56 AM, Peter Zijlstra wrote:
>
> Won't we now prematurely terminate the wait when we get a spurious
> wakeup?

I think there's two answers to that:

 (a) do we even care?

 (b) what spurious wakeup?

The "do we even care" question is because wait_on_page_bit() by
definition isn't really serializing.
And I'm not even talking about memory ordering, although that is true
too - I'm talking just fundamentally, that by definition when we're not
locking, by the time wait_on_page_bit() returns to the caller, the bit
could obviously have changed again.

So I think wait_on_page_bit() is by definition not really guaranteeing
that the bit really is clear. And I don't think we really have cases
that matter.

But if we do - say, 'fsync()' waiting for writeback on a page to
finish - where would you get spurious wakeups from? They normally
happen when we have nested waiting (eg a page fault happens while we
have other wait queues active), and I'm not seeing that being an issue
here.

That said, I do think we might want to perhaps make a "careful" vs
"just wait a bit" version of this if the patch works out. The patch is
primarily for testing this particular case. I actually think it's
probably ok in general, but maybe there really is some special case
that could have multiple wakeup sources and it needs to see *this*
particular one.

(We could perhaps handle that case by checking "is the wait-queue empty
now" instead, and just get rid of the re-arming, not break out of the
loop immediately after the io_schedule()).

                Linus
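
The "careful" vs "just wait a bit" distinction discussed above can be
illustrated with a small user-space simulation. This is only a sketch
and not the kernel patch under discussion: struct fake_page, struct
wake_event, fake_sleep(), wait_careful() and wait_a_bit() are all
hypothetical names invented for the example, and fake_sleep() merely
stands in for io_schedule() by consuming one scripted wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page flag bit; not a kernel structure. */
struct fake_page {
	bool bit_set;		/* e.g. "page is under writeback" */
};

/* One scripted wakeup; clears_bit == false models a spurious wakeup. */
struct wake_event {
	bool clears_bit;
};

/*
 * Stand-in for io_schedule(): consume the next scripted wakeup.
 * Returns false when no wakeups are left (the simulation ends).
 */
bool fake_sleep(struct fake_page *page, const struct wake_event *ev,
		size_t nr, size_t *pos)
{
	assert(page && pos);
	if (*pos >= nr)
		return false;
	if (ev[*pos].clears_bit)
		page->bit_set = false;
	(*pos)++;
	return true;
}

/*
 * "Careful" variant: re-check the bit after every wakeup and go back
 * to sleep while it is still set. Returns the number of sleeps taken.
 */
size_t wait_careful(struct fake_page *page, const struct wake_event *ev,
		    size_t nr)
{
	size_t pos = 0, sleeps = 0;

	while (page->bit_set && fake_sleep(page, ev, nr, &pos))
		sleeps++;
	return sleeps;
}

/*
 * "Just wait a bit" variant: sleep at most once and return after the
 * first wakeup, whether or not the bit was actually cleared.
 */
size_t wait_a_bit(struct fake_page *page, const struct wake_event *ev,
		  size_t nr)
{
	size_t pos = 0;

	if (page->bit_set && fake_sleep(page, ev, nr, &pos))
		return 1;
	return 0;
}
```

With a script of two spurious wakeups followed by a real one,
wait_careful() sleeps three times and returns with the bit clear, while
wait_a_bit() returns after the first wakeup with the bit still set.
That is the premature termination Peter asks about; whether any caller
actually cares is the open question in the mail.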