From: Dan Williams
Date: Wed, 3 Jul 2019 14:28:41 -0700
Subject: Re: [PATCH] dax: Fix missed PMD wakeups
To: Matthew Wilcox
Cc: linux-fsdevel, Jan Kara, Boaz Harrosh, stable, Robert Barror,
 Seema Pandit, linux-nvdimm, Linux Kernel Mailing List
References: <156213869409.3910140.7715747316991468148.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20190703121743.GH1729@bombadil.infradead.org>
 <20190703195302.GJ1729@bombadil.infradead.org>
In-Reply-To: <20190703195302.GJ1729@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 3, 2019 at 12:53 PM Matthew Wilcox wrote:
>
> On Wed, Jul 03, 2019 at 10:01:37AM -0700, Dan Williams wrote:
> > On Wed, Jul 3, 2019 at 5:17 AM Matthew Wilcox wrote:
> > >
> > > On Wed, Jul 03, 2019 at 12:24:54AM -0700, Dan Williams wrote:
> > > > This fix may increase waitqueue contention, but a fix for that is saved
> > > > for a larger rework. In the meantime this fix is suitable for -stable
> > > > backports.
> > >
> > > I think this is too big for what it is; just the two-line patch to stop
> > > incorporating the low bits of the PTE would be more appropriate.
> >
> > Sufficient, yes, "appropriate", not so sure. All those comments about
> > pmd entry size are stale after this change.
>
> But then they'll have to be put back in again. This seems to be working
> for me, although I doubt I'm actually hitting the edge case that rocksdb
> hits:

Seems to be holding up under testing here, a couple comments...

>
> diff --git a/fs/dax.c b/fs/dax.c
> index 2e48c7ebb973..e77bd6aef10c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -198,6 +198,10 @@ static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all)
>   * if it did.
>   *
>   * Must be called with the i_pages lock held.
> + *
> + * If the xa_state refers to a larger entry, then it may return a locked
> + * smaller entry (eg a PTE entry) without waiting for the smaller entry
> + * to be unlocked.
>   */
>  static void *get_unlocked_entry(struct xa_state *xas)
>  {
> @@ -211,7 +215,8 @@ static void *get_unlocked_entry(struct xa_state *xas)
>  	for (;;) {
>  		entry = xas_find_conflict(xas);
>  		if (!entry || WARN_ON_ONCE(!xa_is_value(entry)) ||
> -				!dax_is_locked(entry))
> +				!dax_is_locked(entry) ||
> +				dax_entry_order(entry) < xas_get_order(xas))

Doesn't this potentially allow a locked entry to be returned for a
caller that expects all value entries are unlocked?

>  			return entry;
>
>  		wq = dax_entry_waitqueue(xas, entry, &ewait.key);
> @@ -253,8 +258,12 @@ static void wait_entry_unlocked(struct xa_state *xas, void *entry)
>
>  static void put_unlocked_entry(struct xa_state *xas, void *entry)
>  {
> -	/* If we were the only waiter woken, wake the next one */
> -	if (entry)
> +	/*
> +	 * If we were the only waiter woken, wake the next one.
> +	 * Do not wake anybody if the entry is locked; that indicates
> +	 * we weren't woken.
> +	 */
> +	if (entry && !dax_is_locked(entry))
>  		dax_wake_entry(xas, entry, false);
>  }
>
> diff --git a/include/linux/xarray.h b/include/linux/xarray.h
> index 052e06ff4c36..b17289d92af4 100644
> --- a/include/linux/xarray.h
> +++ b/include/linux/xarray.h
> @@ -1529,6 +1529,27 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
>  #endif
>  }
>
> +/**
> + * xas_get_order() - Get the order of the entry being operated on.
> + * @xas: XArray operation state.
> + *
> + * Return: The order of the entry.
> + */
> +static inline unsigned int xas_get_order(const struct xa_state *xas)
> +{
> +	unsigned int order = xas->xa_shift;
> +
> +#ifdef CONFIG_XARRAY_MULTI
> +	unsigned int sibs = xas->xa_sibs;
> +
> +	while (sibs) {
> +		order++;
> +		sibs /= 2;
> +	}

Use ilog2() here?
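
To make the ilog2() question concrete, here is a rough userspace sketch, not kernel code and untested against the real tree. It assumes xa_sibs is always of the form 2^k - 1, in which case the loop adds k to the order and so would ilog2(sibs + 1). Note that a bare ilog2(sibs) would be off by one and is not meaningful for sibs == 0. The names ilog2_sketch(), order_by_loop(), order_by_ilog2() and check_equivalence() are hypothetical stand-ins for illustration only.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's ilog2(); only valid for n > 0. */
unsigned int ilog2_sketch(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* The open-coded loop from the quoted patch, with shift and sibs passed in. */
unsigned int order_by_loop(unsigned int shift, unsigned int sibs)
{
	unsigned int order = shift;

	while (sibs) {
		order++;
		sibs /= 2;
	}
	return order;
}

/* Assumption: sibs is 2^k - 1, so sibs + 1 is a power of two. */
unsigned int order_by_ilog2(unsigned int shift, unsigned int sibs)
{
	return shift + ilog2_sketch(sibs + 1);
}

/* Compare both forms over a small range of k, for two example shifts. */
void check_equivalence(void)
{
	unsigned int k;

	for (k = 0; k < 6; k++) {
		unsigned int sibs = (1U << k) - 1;

		assert(order_by_loop(0, sibs) == order_by_ilog2(0, sibs));
		assert(order_by_loop(6, sibs) == 6 + k);
	}
}
```

If xa_sibs could ever hold a value that is not 2^k - 1, the two forms would diverge, so the loop would be the safer choice in that case.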