Date: Wed, 25 Sep 2019 10:35:57 +0200
From: Peter Zijlstra
To: Dave Chinner
Cc: Waiman Long, Ingo Molnar, Will Deacon, Alexander Viro, Mike Kravetz,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Davidlohr Bueso
Subject: Re: [PATCH 0/5] hugetlbfs: Disable PMD sharing for large systems
Message-ID: <20190925083557.GA4553@hirez.programming.kicks-ass.net>
References: <20190911150537.19527-1-longman@redhat.com> <20190913015043.GF27547@dread.disaster.area>
In-Reply-To: <20190913015043.GF27547@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 13, 2019 at 11:50:43AM +1000, Dave Chinner wrote:
> On Wed, Sep 11, 2019 at 04:05:32PM +0100, Waiman Long wrote:
> > A customer with large SMP systems (up to 16 sockets) running an application
> > that uses a large amount of static hugepages (~500-1500GB) is experiencing
> > random multisecond delays. These delays were caused by the long time it
> > took to scan the VMA interval tree with mmap_sem held.
> >
> > To fix this problem while preserving existing behavior as much as
> > possible, we need to allow a timeout in down_write() and disable PMD
> > sharing when it is taking too long to do so.
> > Since a transaction can
> > involve touching multiple huge pages, timing out for each of the huge
> > page interactions does not completely solve the problem. So a threshold
> > is set to completely disable PMD sharing if too many timeouts happen.
> >
> > The first 4 patches of this 5-patch series add a new
> > down_write_timedlock() API which accepts a timeout argument and returns
> > true if locking is successful or false otherwise. It works more or less
> > like down_write_trylock(), but the calling thread may sleep.
>
> Just on general principle, this is a non-starter. If a lock is being
> held too long, then whatever the lock is protecting needs fixing.
> Adding timeouts to locks and sysctls to tune them is not a viable
> solution to address latencies caused by algorithm scalability
> issues.

I very much agree here. Lock functions with timeouts are a sign of
horrific design.
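
For anyone following along, the scheme in the quoted cover letter can be
sketched in user space roughly as below. This is an illustration only, not
code from the patch series: it uses POSIX pthread_rwlock_timedwrlock() as a
stand-in for the proposed down_write_timedlock(), and struct share_state,
write_lock_timed(), try_share() and both constants are hypothetical names
and values made up for the example. It assumes Linux/glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define SHARE_TIMEOUT_MS   10  /* hypothetical per-attempt lock budget */
#define SHARE_MAX_TIMEOUTS 3   /* hypothetical threshold before giving up */

/* Hypothetical state: one lock plus the timeout bookkeeping. */
struct share_state {
    pthread_rwlock_t lock;
    int timeouts;           /* number of timed-out lock attempts so far */
    bool sharing_disabled;  /* set once the threshold is reached */
};

/* Take the write lock, giving up after timeout_ms; true on success. */
static bool write_lock_timed(pthread_rwlock_t *lock, long timeout_ms)
{
    struct timespec deadline;

    /* pthread_rwlock_timedwrlock() takes an absolute CLOCK_REALTIME time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_rwlock_timedwrlock(lock, &deadline) == 0;
}

/* Returns true if the shared path was taken, false if the caller must
 * fall back to the unshared path. */
static bool try_share(struct share_state *s)
{
    if (s->sharing_disabled)
        return false;

    if (!write_lock_timed(&s->lock, SHARE_TIMEOUT_MS)) {
        /* Too many failed attempts: stop trying to share at all. */
        if (++s->timeouts >= SHARE_MAX_TIMEOUTS)
            s->sharing_disabled = true;
        return false;
    }

    /* ... the work protected by the lock would go here ... */

    int rc = pthread_rwlock_unlock(&s->lock);
    assert(rc == 0);
    (void)rc;
    return true;
}
```

Note that nothing here shortens the time the lock is held. The timeout and
the threshold only decide when the waiter gives up, which is the objection
raised above.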