From: Ray Fucillo
To: "linux-kernel@vger.kernel.org"
Subject: scalability regressions related to hugetlb_fault() changes
Date: Thu, 24 Mar 2022 20:12:35 +0000

In moving to newer versions of the kernel, our customers have experienced dramatic new scalability problems in our database application, InterSystems IRIS. Our research has narrowed this down to new processes that attach to the database's shared memory segment and then incur very long delays (in some cases ~100ms!) acquiring the lock via i_mmap_lock_read() in hugetlb_fault() as they fault in a huge page for the first time. The addition of this lock in hugetlb_fault() matches the versions where we see this problem. It's not just slowing the new process that incurs the delay, but also backing up other processes if the page fault occurs inside a critical section within the database application.

Is there something that can be improved here?

The read locks in hugetlb_fault() contend with write locks that seem to be taken in very common application code paths: shmat(), process exit, fork() (not vfork()), shmdt(), presumably others. So contention on the read side in hugetlb_fault() turns out to be common. When the system is loaded, there will be many new processes faulting in pages that may block the write lock, which in turn blocks more readers in fault behind it, and so on...

I don't think there's any support for shared page tables in hugetlb to avoid the faults altogether.
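
The pile-up described above (readers holding up a writer, which then holds up later readers) can be illustrated with a toy model. This is only a sketch under stated assumptions: it models a strict-FIFO reader-writer lock, which is a simplification and not the kernel's rwsem implementation, and the names (Request, fifo_rwlock_waits) and all timings are hypothetical values chosen for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    kind: str     # "read" (e.g. a page fault) or "write" (e.g. shmat/fork/exit)
    arrival: int  # arrival time, arbitrary units
    hold: int     # how long the lock is held once acquired


def fifo_rwlock_waits(requests):
    """Return {name: wait time} under a strict-FIFO reader-writer lock.

    Assumed rules: a reader starts once every writer queued ahead of it
    has released; a writer starts once every request ahead of it has released.
    """
    last_writer_end = 0    # release time of the latest writer seen so far
    latest_reader_end = 0  # latest release time among readers seen so far
    waits = {}
    for req in sorted(requests, key=lambda r: r.arrival):
        if req.kind == "read":
            start = max(req.arrival, last_writer_end)
            latest_reader_end = max(latest_reader_end, start + req.hold)
        else:
            start = max(req.arrival, last_writer_end, latest_reader_end)
            last_writer_end = start + req.hold
        waits[req.name] = start - req.arrival
    return waits


# Hypothetical timeline: reader A is mid-fault when writer W arrives;
# readers B and C arrive after W and must queue behind it.
requests = [
    Request("A", "read", 0, 5),
    Request("W", "write", 1, 2),
    Request("B", "read", 2, 5),
    Request("C", "read", 3, 5),
]
print(fifo_rwlock_waits(requests))
```

In this model B and C would have run concurrently with A had no writer been queued; with W in the queue they wait for both A's hold and W's hold to finish, which is the shape of the delay reported above.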

Switching to 1GB huge pages instead of 2MB is a good mitigation that reduces the frequency of faults, but it is not a complete solution.

Thanks for considering.

Ray