From: Mina Almasry
Date: Thu, 26 Sep 2019 17:55:29 -0700
Subject: Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits
To: Mike Kravetz, Tejun Heo
Cc: David Rientjes, Aneesh Kumar, shuah, Shakeel Butt, Greg Thelen, Andrew Morton, khalid.aziz@oracle.com, open list, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org, Michal Koutný
References: <20190919222421.27408-1-almasrymina@google.com> <3c73d2b7-f8d0-16bf-b0f0-86673c3e9ce3@oracle.com> <8f7db4f1-9c16-def5-79dc-d38d6b9d150e@oracle.com>

On Thu, Sep 26, 2019 at 2:23 PM Mike Kravetz wrote:
>
> On 9/26/19 12:28 PM, David Rientjes wrote:
> > On Tue, 24 Sep 2019, Mina Almasry wrote:
> >
> >>> I personally prefer the one counter approach only for the reason that it
> >>> exposes less information about hugetlb reservations. I was not around
> >>> for the introduction of hugetlb reservations, but I have fixed several
> >>> issues having to do with reservations. IMO, reservations should be hidden
> >>> from users as much as possible. Others may disagree.
> >>>
> >>> I really hope that Aneesh will comment. He added the existing hugetlb
> >>> cgroup code.
> >>> I was not involved in that effort, but it looks like there
> >>> might have been some thought given to reservations in early versions of
> >>> that code. It would be interesting to get his perspective.
> >>>
> >>> Changes included in patch 4 (disable region_add file_region coalescing)
> >>> would be needed in a one counter approach as well, so I do plan to
> >>> review those changes.
> >>
> >> OK, FWIW, the 1 counter approach should be sufficient for us, so I'm
> >> not really opposed. David, maybe chime in if you see a problem here?
> >> From the perspective of hiding reservations from the user as much as
> >> possible, it is an improvement.
> >>
> >> I'm only wary about changing the behavior of the current controller and
> >> having that regress applications. I'm hoping you and Aneesh can shed
> >> light on this.
> >>
> >
> > I think neither Aneesh nor myself are going to be able to provide a
> > complete answer on the use of hugetlb cgroup today, anybody could be using
> > it without our knowledge and that opens up the possibility that combining
> > the limits would adversely affect a real system configuration.
>
> I agree that nobody can provide complete information on hugetlb cgroup usage
> today. My interest was in anything Aneesh could remember about development
> of the current cgroup code. It 'appears' that the idea of including
> reservations or mmap ranges was considered or at least discussed. But, those
> discussions happened more than 7 years ago and my searches are not providing
> a complete picture. My hope was that Aneesh may remember those discussions.
>
> > If that is a possibility, I think we need to do some due diligence and try
> > to deprecate allocation limits if possible. One of the benefits to
> > separate limits is that we can make reasonable steps to deprecating the
> > actual allocation limits, if possible: we could add warnings about the
> > deprecation of allocation limits and see if anybody complains.
> >
> > That could take the form of two separate limits or a tunable in the root
> > hugetlb cgroup that defines whether the limits are for allocation or
> > reservation.
> >
> > Combining them in the first pass seems to be very risky and could cause
> > pain for users that will not detect this during an rc cycle and will
> > report the issue only when their distro gets it. Then we are left with no
> > alternative other than stable backports and the separation of the limits
> > anyway.
>
> I agree that changing behavior of the existing controller is too risky.
> Such a change is likely to break someone.

I'm glad we're converging on keeping the existing behavior unchanged.

> The more I think about it, the
> best way forward will be to retain the existing controller and create a
> new controller that satisfies the new use cases.

My guess is that a new controller needs to support cgroup-v2, which is
fine. But can a new controller also support v1? Or is there a requirement
that new controllers support *only* v2? I need whatever solution we land on
to work on v1. I've added Tejun in the hope that he can comment on this.

> The question remains as
> to what that new controller will be. Does it control reservations only?
> Is it a combination of reservations and allocations? If a combined
> controller will work for new use cases, that would be my preference. Of
> course, I have not prototyped such a controller so there may be issues when
> we get into the details. For a reservation only or combined controller,
> the region_* changes proposed by Mina would be used.

Provided we keep the existing controller untouched, should the new
controller track:

1. only reservations, or
2. both reservations and allocations for which no reservations exist
   (such as the MAP_NORESERVE case)?

I like the 'both' approach. It seems to me that such a counter would work
automatically, regardless of whether or not the application allocates
hugetlb memory with NORESERVE.
NORESERVE allocations cannot cut into reserved hugetlb pages, correct? If
so, then applications that allocate with NORESERVE will receive SIGBUS when
they hit their limit, and applications that allocate without NORESERVE may
get an error at mmap() time but will always stay within their limits while
they access the mmap'd memory, correct? So the 'both' counter looks like a
one-size-fits-all solution.

I think the only sticking point left is whether a new controller can
support both cgroup-v2 and cgroup-v1. Once I get confirmation on that, I'll
send a patchset.

> --
> Mike Kravetz
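The following is a toy userspace model of the 'both' counter. It is meant only to make the semantics asked about above concrete. It is not the existing hugetlb_cgroup code or the planned patchset, and every name in it is hypothetical. The model assumes a single per-cgroup usage value, counted in huge pages. A reserved mapping is charged for its whole range at mmap() time. A NORESERVE mapping is charged one page at a time at fault time. A refused charge is modeled as an mmap() failure in the first case and as SIGBUS in the second.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model (not kernel code) of the proposed "both" counter:
 * one per-cgroup usage value, in huge pages, covering reservations plus
 * allocations that have no reservation (the MAP_NORESERVE case).
 */
struct toy_hugetlb_cg {
	long limit;
	long usage; /* reservations + unreserved (NORESERVE) allocations */
};

enum toy_result {
	TOY_OK,
	TOY_MMAP_FAILS,   /* limit hit while reserving at mmap() time */
	TOY_FAULT_SIGBUS, /* limit hit while faulting a NORESERVE page */
};

/* Charge 'pages' against the limit; refuse the whole charge if it would exceed it. */
static bool toy_try_charge(struct toy_hugetlb_cg *cg, long pages)
{
	assert(pages > 0);
	if (cg->usage + pages > cg->limit)
		return false;
	cg->usage += pages;
	return true;
}

/*
 * Mapping without NORESERVE: the whole range is reserved and charged up
 * front, so the limit is enforced at mmap() time. Later faults on this
 * range consume the reservation and are not charged again.
 */
static enum toy_result toy_mmap_reserved(struct toy_hugetlb_cg *cg, long pages)
{
	return toy_try_charge(cg, pages) ? TOY_OK : TOY_MMAP_FAILS;
}

/*
 * Fault on a NORESERVE mapping: nothing was charged at mmap() time, so the
 * page is charged now. It can only use headroom not already taken by
 * reservations; running out is what the model reports as SIGBUS.
 */
static enum toy_result toy_fault_noreserve(struct toy_hugetlb_cg *cg)
{
	return toy_try_charge(cg, 1) ? TOY_OK : TOY_FAULT_SIGBUS;
}
```

Consider a limit of 4 huge pages. A 3-page reserved mmap() succeeds. The first NORESERVE fault also succeeds, bringing usage to 4. The next NORESERVE fault is refused, which is the SIGBUS case. A further 1-page reserved mmap() then fails. In this model, NORESERVE faults only ever consume headroom left over by reservations, and a reserved mapping either fails up front or never hits the limit later.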