From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Alexander Viro, Mike Kravetz
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Davidlohr Bueso, Waiman Long
Subject: [PATCH 0/5] hugetlbfs: Disable PMD sharing for large systems
Date: Wed, 11 Sep 2019 16:05:32 +0100
Message-Id: <20190911150537.19527-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

A customer running large SMP systems (up to 16 sockets) with an
application that uses a large amount of static hugepages (~500-1500GB)
is experiencing random multisecond delays. These delays were caused by
the long time it took to scan the VMA interval tree with mmap_sem held.

To fix this problem while preserving existing behavior as much as
possible, we need to allow a timeout in down_write() and disable PMD
sharing when the lock takes too long to acquire. Since a transaction
can involve touching multiple huge pages, timing out on each individual
huge page interaction does not completely solve the problem. So a
threshold is set to completely disable PMD sharing if too many timeouts
happen.

The first 4 patches of this 5-patch series add a new
down_write_timedlock() API which accepts a timeout argument and returns
true if locking is successful or false otherwise. It works more or less
like down_write_trylock(), but the calling thread may sleep.

The last patch implements the timeout mechanism as described above.
With the patched kernel installed, the customer confirmed that the
problem was gone.

Waiman Long (5):
  locking/rwsem: Add down_write_timedlock()
  locking/rwsem: Enable timeout check when spinning on owner
  locking/osq: Allow early break from OSQ
  locking/rwsem: Enable timeout check when staying in the OSQ
  hugetlbfs: Limit wait time when trying to share huge PMD

 include/linux/fs.h                |   7 ++
 include/linux/osq_lock.h          |  13 +--
 include/linux/rwsem.h             |   4 +-
 kernel/locking/lock_events_list.h |   1 +
 kernel/locking/mutex.c            |   2 +-
 kernel/locking/osq_lock.c         |  12 +-
 kernel/locking/rwsem.c            | 183 +++++++++++++++++++++++++-----
 mm/hugetlb.c                      |  24 +++-
 8 files changed, 201 insertions(+), 45 deletions(-)

-- 
2.18.1
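
For readers who want to see the shape of the mechanism described in the
cover letter, here is a rough userspace sketch in C. It is not the
kernel code from this series. A spinning C11 atomic_flag stands in for
the rwsem, whereas the cover letter says the real down_write_timedlock()
may sleep. The names write_timedlock(), try_share(), share_ctx,
SHARE_TIMEOUT_NS and SHARE_MAX_TIMEOUTS are all made up for
illustration, and the timeout and threshold values are arbitrary. It
only shows the two ideas: a lock attempt that gives up after a timeout,
and a counter that disables sharing for good once too many attempts
have timed out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical tunables; these are NOT values taken from the patches. */
#define SHARE_TIMEOUT_NS   1000000LL /* give up on one attempt after 1 ms */
#define SHARE_MAX_TIMEOUTS 3         /* disable sharing after 3 timeouts */

struct share_ctx {
	atomic_flag lock;             /* userspace stand-in for the rwsem */
	atomic_int  timeouts;         /* lock attempts that timed out so far */
	atomic_bool sharing_disabled; /* set once the threshold is reached */
};

static void share_ctx_init(struct share_ctx *ctx)
{
	atomic_flag_clear(&ctx->lock);
	atomic_init(&ctx->timeouts, 0);
	atomic_init(&ctx->sharing_disabled, false);
}

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Behaves like a trylock that keeps retrying until the timeout expires.
 * Returns true if the lock was taken, false on timeout. This sketch
 * spins instead of sleeping.
 */
static bool write_timedlock(atomic_flag *lock, int64_t timeout_ns)
{
	int64_t deadline = now_ns() + timeout_ns;

	do {
		if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
			return true;
	} while (now_ns() < deadline);

	return false;
}

static void write_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Returns true if sharing was set up under the lock. A false return
 * means the caller should fall back to the unshared path.
 */
static bool try_share(struct share_ctx *ctx)
{
	assert(ctx != NULL);

	if (atomic_load(&ctx->sharing_disabled))
		return false;

	if (!write_timedlock(&ctx->lock, SHARE_TIMEOUT_NS)) {
		/* Too many timeouts: stop trying to share altogether. */
		if (atomic_fetch_add(&ctx->timeouts, 1) + 1 >= SHARE_MAX_TIMEOUTS)
			atomic_store(&ctx->sharing_disabled, true);
		return false;
	}

	/* ... the interval tree scan would run here, under the lock ... */
	write_unlock(&ctx->lock);
	return true;
}
```

A caller would run share_ctx_init() once and then call try_share() on
each fault, using the unshared path whenever it returns false. Once the
threshold is hit, try_share() returns immediately without touching the
lock, which is the point of the threshold: later callers no longer pay
even the bounded wait.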