From: Dan Williams
Date: Fri, 28 Jun 2019 09:39:01 -0700
Subject: Re: [PATCH] filesystem-dax: Disable PMD support
To: Matthew Wilcox
Cc: linux-nvdimm, Jan Kara, stable, Robert Barror, Seema Pandit, linux-fsdevel, Linux Kernel Mailing List

On Fri, Jun 28, 2019 at 9:37 AM Matthew Wilcox wrote:
>
> On Thu, Jun 27, 2019 at 07:39:37PM -0700, Dan Williams wrote:
> > On Thu, Jun 27, 2019 at 12:59 PM Matthew Wilcox wrote:
> > > On Thu, Jun 27, 2019 at 12:09:29PM -0700, Dan Williams wrote:
> > > > > This bug feels like we failed to unlock, or unlocked the wrong entry,
> > > > > and this hunk in the bisected commit looks suspect to me. Why do we
> > > > > still need to drop the lock now that the radix_tree_preload() calls
> > > > > are gone?
> > > >
> > > > Never mind, unmap_mapping_pages() takes a sleeping lock, but then I
> > > > wonder why we don't restart the lookup like the old implementation.
> > >
> > > If something can remove a locked entry, then that would seem like the
> > > real bug. Might be worth inserting a lookup there to make sure that it
> > > hasn't happened, I suppose?
> >
> > Nope, I added a check; we do in fact get the same locked entry back
> > after dropping the lock.
>
> Okay, good, glad to have ruled that out.
>
> > The deadlock revolves around the mmap_sem. One thread holds it for
> > read and then gets stuck indefinitely in get_unlocked_entry(). Once
> > that happens, another rocksdb thread tries to mmap and gets stuck
> > trying to take the mmap_sem for write. Then all new readers, including
> > ps and top, which try to access a remote vma, get queued behind
> > that write.
> >
> > It could also be the case that we're missing a wakeup.
>
> That was the conclusion I came to: that one thread holding the mmap_sem
> for read isn't being woken up when it should be. Just need to find it...
> obviously it's something to do with the PMD entries.

Can you explain to me one more time (yes, I'm slow on the uptake on this)
the difference between xas_load() and xas_find_conflict(), and why it's OK
for dax_lock_page() to use xas_load() while grab_mapping_entry() uses
xas_find_conflict()?
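The stall chain described above (one reader stuck, a writer queued, then every new reader queued behind that writer) hinges on the semaphore refusing new readers once a writer is waiting. As a rough illustration only, here is a toy, single-threaded model of that writer-preferring policy. All names (toy_rwsem, toy_try_down_read, and so on) are hypothetical; this is not the kernel's rw_semaphore, which also spins optimistically and hands off ownership in more involved ways.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a writer-preferring reader/writer
 * semaphore. All names are hypothetical; this is not the kernel's rwsem. */
struct toy_rwsem {
	int readers;         /* readers currently holding the lock */
	bool writer_active;  /* a writer currently holds the lock */
	int writers_waiting; /* writers queued for the lock */
};

/* A new reader succeeds only if no writer holds OR is queued for the
 * lock; returning false stands for "this reader now sleeps". */
static bool toy_try_down_read(struct toy_rwsem *s)
{
	if (s->writer_active || s->writers_waiting > 0)
		return false;
	s->readers++;
	return true;
}

/* A writer needs the lock exclusively; otherwise it joins the queue
 * (returning false stands for "this writer now sleeps"). */
static bool toy_down_write(struct toy_rwsem *s)
{
	if (s->writer_active || s->readers > 0) {
		s->writers_waiting++;
		return false;
	}
	s->writer_active = true;
	return true;
}

/* When the last reader leaves, ownership passes to a queued writer. */
static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	if (--s->readers == 0 && s->writers_waiting > 0) {
		s->writers_waiting--;
		s->writer_active = true;
	}
}

/* Releasing the write lock hands off to the next queued writer, if any. */
static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->writer_active);
	s->writer_active = false;
	if (s->writers_waiting > 0) {
		s->writers_waiting--;
		s->writer_active = true;
	}
}
```

In this model, if the first reader (the thread stuck in get_unlocked_entry()) never reaches toy_up_read(), the queued writer (the mmap() caller) never acquires the lock, and every later toy_try_down_read() (ps, top) is refused. So a single missed wakeup for one reader is enough to wedge all users of the semaphore.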
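The "missing wakeup" hypothesis is about an invariant: every path that clears an entry's lock state must wake the threads sleeping on it, or those sleepers wait forever. A minimal userspace analogue using POSIX threads is sketched below. The names (toy_entry, toy_wait_unlocked, etc.) are hypothetical, and this is not the kernel's DAX entry-locking or wait-queue code; it only shows the wait-loop-plus-wakeup pattern that a waiter such as get_unlocked_entry() depends on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogue of a lockable entry with sleepers (hypothetical
 * names; not the kernel's DAX entry locking). */
struct toy_entry {
	pthread_mutex_t mu;
	pthread_cond_t unlocked; /* broadcast whenever 'locked' is cleared */
	bool locked;
};

#define TOY_ENTRY_INIT { .mu = PTHREAD_MUTEX_INITIALIZER, \
			 .unlocked = PTHREAD_COND_INITIALIZER, .locked = false }

/* Sleep until the entry is unlocked. The condition is re-checked in a
 * loop so spurious or stale wakeups are harmless. */
static void toy_wait_unlocked(struct toy_entry *e)
{
	pthread_mutex_lock(&e->mu);
	while (e->locked)
		pthread_cond_wait(&e->unlocked, &e->mu);
	pthread_mutex_unlock(&e->mu);
}

/* Wait for the entry to become free, then take it. */
static void toy_lock_entry(struct toy_entry *e)
{
	pthread_mutex_lock(&e->mu);
	while (e->locked)
		pthread_cond_wait(&e->unlocked, &e->mu);
	e->locked = true;
	pthread_mutex_unlock(&e->mu);
}

/* Every path that clears 'locked' must wake sleepers. Leaving out the
 * broadcast here is the "missing wakeup": waiters would sleep forever. */
static void toy_unlock_entry(struct toy_entry *e)
{
	pthread_mutex_lock(&e->mu);
	assert(e->locked); /* catch unlocking an entry that isn't locked */
	e->locked = false;
	pthread_cond_broadcast(&e->unlocked);
	pthread_mutex_unlock(&e->mu);
}

/* Thread body for a waiter, for use with pthread_create(). */
static void *toy_waiter(void *arg)
{
	toy_wait_unlocked(arg);
	return NULL;
}
```

Because the waiter tests the flag under the mutex before sleeping, it cannot miss an unlock that happens before it starts waiting; the only way for it to hang is an unlock path that clears the flag without the broadcast, or a path that never clears it at all.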