Date: Wed, 11 Sep 2019 21:40:02 -0700
From: Davidlohr Bueso
To: Matthew Wilcox
Cc: Mike Kravetz, Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Alexander Viro, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD
Message-ID: <20190912044002.xp3c7jbpbmq4dbz6@linux-p48b>
In-Reply-To: <20190912034143.GJ29434@bombadil.infradead.org>
References: <20190911150537.19527-1-longman@redhat.com> <20190911150537.19527-6-longman@redhat.com> <20190912034143.GJ29434@bombadil.infradead.org>
On Wed, 11 Sep 2019, Matthew Wilcox wrote:

>On Wed, Sep 11, 2019 at 08:26:52PM -0700, Mike Kravetz wrote:
>> All this got me wondering if we really need to take i_mmap_rwsem in write
>> mode here. We are not changing the tree, only traversing it looking for
>> a suitable vma.
>>
>> Unless I am missing something, the hugetlb code only ever takes the semaphore
>> in write mode; never read. Could this have been the result of changing the
>> tree semaphore to read/write? Instead of analyzing all the code, the easiest
>> and safest thing would have been to take all accesses in write mode.
>
>I was wondering the same thing. It was changed here:
>
>commit 83cde9e8ba95d180eaefefe834958fbf7008cf39
>Author: Davidlohr Bueso
>Date:   Fri Dec 12 16:54:21 2014 -0800
>
>    mm: use new helper functions around the i_mmap_mutex
>
>    Convert all open coded mutex_lock/unlock calls to the
>    i_mmap_[lock/unlock]_write() helpers.
>
>and a subsequent patch said:
>
>    This conversion is straightforward. For now, all users take the write
>    lock.
>
>There were subsequent patches which changed a few places
>c8475d144abb1e62958cc5ec281d2a9e161c1946
>1acf2e040721564d579297646862b8ea3dd4511b
>d28eb9c861f41aa2af4cfcc5eeeddff42b13d31e
>874bfcaf79e39135cd31e1cfc9265cf5222d1ec3
>3dec0ba0be6a532cac949e02b853021bf6d57dad
>
>but I don't know why this one wasn't changed.

I cannot recall why huge_pmd_share() was not changed along with the other
callers that don't modify the interval tree. Looking at the function, I agree
that this could be shared; in fact, this lock is much less involved than its
anon_vma counterpart, last I checked (perhaps with the exception of
take_rmap_locks()).

>
>(I was also wondering about caching a potentially sharable page table
>in the address_space to avoid having to walk the VMA tree at all if that
>one happened to be sharable).

I also think the right solution lies within mm, rather than adding a new API
to rwsem, plus the extra complexity/overhead to osq, _just_ for this case.
We've managed to not need timeout extensions in our locking primitives thus
far, which is a good thing IMO.

Thanks,
Davidlohr