Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp669669ybi; Wed, 3 Jul 2019 02:49:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqxKfX0JPrRhjX0rzcnD64Ksl1WUs27dT0TPQx64HrRlB5yxrydh2kCjUQTk0TFPlfJarFc9 X-Received: by 2002:a17:902:9b94:: with SMTP id y20mr41240773plp.260.1562147352246; Wed, 03 Jul 2019 02:49:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1562147352; cv=none; d=google.com; s=arc-20160816; b=opY3di+2Nlp421t9On2J+ictj7kLG9i1+OWrpUA4ELPOVRWCEFv720rmc1FPFd3Tfw Apz3ysAh8eAJdDP+7MAv2FS9bvlua/L0vZdNLVhE6y7XOXUBHedCJuGkDnLN2XLHgNje hVkOXD02gwgDdGJsCK3nRFJOs/TdQWbu9hOCvhFhLof+Aqev+Ci15Bo2gPaziF1g+3zl 0huqvwO61Xsd+8vW0nYuJj4saaCeoIHGxTvgm9V36hfLL1bibSV45FpdY17kQvNNjc6M ICOgruvd+aSqcx4aNkQ8kMYSmgzLTunbOJzNj5EGcgc7DIzUM7nDrFU0ayfiHbBW/LrN d2lw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=Rdn4uPWS3ZShLQ5CaHdyqI5Byh3UZqmARe/7VHccLHo=; b=MlJWkHIv/VagHVtg27brllYda+/LKQ+joK9d0NG1fjxKZuz1RI7j+QmqkE5MIGJ7Q1 nb5Uv46KrnDUuwqJQW1W2EitEVf29mnK9qT6WTS+t7YU7rif/Ktb+3BuUoKIOElZ0RfC ll2Na8aYbidMYzWGdwsSC+2nr2NGhzLQa1CaEEYLGK982NFACqkYkB5WHgO1G5JC//Cd elLe+JW4cofsJAzhDtz3EkOKAlDESO48ui4aJ0FcX3iP0kax0NaIoPpcNANUUgBWn9X0 mGPSp5Hw/ezrR6VAxFR+k6O85Dfz3gZI6i/sW4leAKP244OYn5GDivfTV2O5fL4n4V6F WEaw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f38si1828283plb.99.2019.07.03.02.48.57; Wed, 03 Jul 2019 02:49:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727101AbfGCJrz (ORCPT + 99 others); Wed, 3 Jul 2019 05:47:55 -0400 Received: from mx2.suse.de ([195.135.220.15]:47630 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727056AbfGCJrx (ORCPT ); Wed, 3 Jul 2019 05:47:53 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7ADFDB15C; Wed, 3 Jul 2019 09:47:51 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 285CE1E433D; Mon, 1 Jul 2019 14:11:19 +0200 (CEST) Date: Mon, 1 Jul 2019 14:11:19 +0200 From: Jan Kara To: Matthew Wilcox Cc: Dan Williams , Seema Pandit , linux-nvdimm , Linux Kernel Mailing List , stable , Robert Barror , linux-fsdevel , Jan Kara Subject: Re: [PATCH] filesystem-dax: Disable PMD support Message-ID: <20190701121119.GE31621@quack2.suse.cz> References: <20190627195948.GB4286@bombadil.infradead.org> <20190629160336.GB1180@bombadil.infradead.org> <20190630152324.GA15900@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190630152324.GA15900@bombadil.infradead.org> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun 30-06-19 08:23:24, Matthew Wilcox wrote: > On Sun, Jun 30, 2019 at 01:01:04AM -0700, Dan Williams wrote: > > @@ -215,7 +216,7 @@ static wait_queue_head_t > > *dax_entry_waitqueue(struct xa_state *xas, > > * queue to the start of that PMD. This ensures that all offsets in > > * the range covered by the PMD map to the same bit lock. > > */ > > - if (dax_is_pmd_entry(entry)) > > + //if (dax_is_pmd_entry(entry)) > > index &= ~PG_PMD_COLOUR; > > key->xa = xas->xa; > > key->entry_start = index; > > Hah, that's a great naive fix! Thanks for trying that out. > > I think my theory was slightly mistaken, but your fix has the effect of > fixing the actual problem too. > > The xas->xa_index for a PMD is going to be PMD-aligned (ie a multiple of > 512), but xas_find_conflict() does _not_ adjust xa_index (... which I > really should have mentioned in the documentation). So we go to sleep > on the PMD-aligned index instead of the index of the PTE. Your patch > fixes this by using the PMD-aligned index for PTEs too. > > I'm trying to come up with a clean fix for this. Clearly we > shouldn't wait for a PTE entry if we're looking for a PMD entry. > But what should get_unlocked_entry() return if it detects that case? > We could have it return an error code encoded as an internal entry, > like grab_mapping_entry() does. Or we could have it return the _locked_ > PTE entry, and have callers interpret that. > > At least get_unlocked_entry() is static, but it's got quite a few callers. > Trying to discern which ones might ask for a PMD entry is a bit tricky. > So this seems like a large patch which might have bugs. Yeah. So get_unlocked_entry() is used in several cases: 1) Case where we already have entry at given index but it is locked and we need it unlocked so that we can do our thing `(dax_writeback_one(), dax_layout_busy_page()). 2) Case where we want any entry covering given index (in __dax_invalidate_entry()). This is essentially the same as case 1) since we have already looked up the entry (just didn't propagate that information from mm/truncate.c) - we want any unlocked entry covering given index. 3) Cases where we really want entry at given index and we have some entry order constraints (dax_insert_pfn_mkwrite(), grab_mapping_entry()). Honestly I'd make the rule that get_unlocked_entry() returns entry of any order that is covering given index. I agree it may be unnecessarily waiting for PTE entry lock for the case where in case 3) we are really looking only for PMD entry but that seems like a relatively small cost for the simplicity of the interface. BTW, looking into the xarray code, I think I found another difference between the old radix tree code and the new xarray code that could cause issues. In the old radix tree code if we tried to insert PMD entry but there was some PTE entry in the covered range, we'd get EEXIST error back and the DAX fault code relies on this. I don't see how similar behavior is achieved by xas_store()... Honza -- Jan Kara SUSE Labs, CR