References: <20190808194002.226688-1-almasrymina@google.com> <20190809112738.GB13061@blackbody.suse.cz>
In-Reply-To: <20190809112738.GB13061@blackbody.suse.cz>
From: Mina Almasry
Date: Fri, 9 Aug 2019 11:05:43 -0700
Subject: Re: [RFC PATCH] hugetlbfs: Add hugetlb_cgroup reservation limits
To: Michal Koutný
Cc: mike.kravetz@oracle.com, shuah, David Rientjes, Shakeel Butt, Greg Thelen, akpm@linux-foundation.org, khalid.aziz@oracle.com, open list, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 9, 2019 at 4:27 AM Michal Koutný wrote:
>
> (+CC cgroups@vger.kernel.org)
>
> On Thu, Aug 08, 2019 at 12:40:02PM -0700, Mina Almasry wrote:
> > We have developers interested in using hugetlb_cgroups, and they have expressed
> > dissatisfaction regarding this behavior.
> I assume you still want to enforce a limit on a particular group and the
> application must be able to handle resource scarcity (but better
> notified than SIGBUS).
>
> > Alternatives considered:
> > [...]
> (I did not try that but) have you considered:
> 3) MAP_POPULATE while you're making the reservation,

I have tried this, and the behaviour is not great. Basically, if
userspace mmaps more memory than its cgroup limit allows with
MAP_POPULATE, the kernel will reserve the total amount requested by
userspace, fault in up to the cgroup limit, and then SIGBUS the task
when it tries to access the rest of its 'reserved' memory.

So for example:
- if /proc/sys/vm/nr_hugepages == 10, and
- your cgroup limit is 5 pages, and
- you mmap(MAP_POPULATE) 7 pages.

Then the kernel will reserve 7 pages, fault in 5 of those 7 pages, and
SIGBUS you when you try to access the remaining 2 pages. So the problem
persists. Folks would still like to know at mmap time that they are
crossing the limits.

> 4) Using multiple hugetlbfs mounts with respective limits.
>

I assume you mean the size= option on the hugetlbfs mount. This would
only limit hugetlb memory usage via the hugetlbfs mount. Tasks can
still allocate hugetlb memory without any mount via mmap(MAP_HUGETLB)
and the shmget/shmat APIs, and all these calls will deplete the global,
shared hugetlb memory pool.

> > Caveats:
> > 1. This support is implemented for cgroups-v1. I have not tried
> >    hugetlb_cgroups with cgroups v2, and AFAICT it's not supported yet.
> >    This is largely because we use cgroups-v1 for now.
> Adding something new into v1 without v2 counterpart, is making migration
> harder, that's one of the reasons why v1 API is rather frozen now. (I'm
> not sure whether current hugetlb controller fits into v2 at all though.)
>

In my estimation it's maybe fine to make this change in v1 because, as
far as I understand, hugetlb_cgroups are a little-used feature of the
kernel (although we see it getting requested) and hugetlb_cgroups
aren't supported in v2 yet, and I don't *think* this change makes it
any harder to port hugetlb_cgroups to v2.
But, like I said, if there is consensus that this must not be checked
in unless hugetlb_cgroups v2 support is added alongside, I can take a
look at that.

> Michal
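
For reference, the arithmetic of the MAP_POPULATE example above can be
written down as a toy model. This is only a sketch in Python, not kernel
code: the function name mmap_populate, the PopulateOutcome type, and the
ENOMEM check against the global pool are illustrative assumptions, and
the model ignores every other user of the hugetlb pool.

```python
from dataclasses import dataclass


@dataclass
class PopulateOutcome:
    reserved: int      # pages reserved at mmap() time
    faulted: int       # pages actually faulted in by MAP_POPULATE
    sigbus_pages: int  # reserved pages whose first access would SIGBUS


def mmap_populate(nr_hugepages: int, cgroup_limit: int, requested: int) -> PopulateOutcome:
    """Hypothetical model of the behaviour described in this mail."""
    if requested > nr_hugepages:
        # Assumption: the reservation itself fails when the global pool
        # cannot cover it (other pool users are ignored here).
        raise MemoryError("ENOMEM: global hugetlb pool too small")
    # The reservation is charged against the global pool only, so it
    # succeeds even when it exceeds the cgroup limit.
    reserved = requested
    # Faulting is what the cgroup limit constrains.
    faulted = min(requested, cgroup_limit)
    return PopulateOutcome(reserved, faulted, reserved - faulted)


outcome = mmap_populate(nr_hugepages=10, cgroup_limit=5, requested=7)
print(outcome)
```

With the numbers from the example (10 pages in the pool, a cgroup limit
of 5, a 7-page mapping), the model gives 7 pages reserved, 5 faulted in,
and 2 left that would SIGBUS on access. The point is that nothing fails
at mmap time.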