From: Shakeel Butt
Date: Wed, 15 Aug 2018 10:32:41 -0700
Subject: Re: [RFC PATCH 1/2] mm: rework memcg kernel stack accounting
To: Roman Gushchin
Cc: luto@amacapital.net, Johannes Weiner, Linux MM, LKML, kernel-team@fb.com,
    Michal Hocko, luto@kernel.org, Konstantin Khlebnikov, Tejun Heo
In-Reply-To: <20180815172557.GC26330@castle.DHCP.thefacebook.com>
References: <20180815003620.15678-1-guro@fb.com>
    <20180815163923.GA28953@cmpxchg.org>
    <20180815165513.GA26330@castle.DHCP.thefacebook.com>
    <2393E780-2B97-4BEE-8374-8E9E5249E5AD@amacapital.net>
    <20180815172557.GC26330@castle.DHCP.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 15, 2018 at 10:26 AM Roman Gushchin wrote:
>
> On Wed, Aug 15, 2018 at 10:12:42AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Aug 15, 2018, at 9:55 AM, Roman Gushchin wrote:
> > >
> > >> On Wed, Aug 15, 2018 at 12:39:23PM -0400, Johannes Weiner wrote:
> > >>> On Tue, Aug 14, 2018 at 05:36:19PM -0700, Roman Gushchin wrote:
> > >>> @@ -224,9 +224,14 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> > >>>                 return s->addr;
> > >>>         }
> > >>>
> > >>> +        /*
> > >>> +         * Allocated stacks are cached and later reused by new threads,
> > >>> +         * so memcg accounting is performed manually on assigning/releasing
> > >>> +         * stacks to tasks. Drop __GFP_ACCOUNT.
> > >>> +         */
> > >>>          stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
> > >>>                                       VMALLOC_START, VMALLOC_END,
> > >>> -                                     THREADINFO_GFP,
> > >>> +                                     THREADINFO_GFP & ~__GFP_ACCOUNT,
> > >>>                                       PAGE_KERNEL,
> > >>>                                       0, node, __builtin_return_address(0));
> > >>>
> > >>> @@ -246,12 +251,41 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> > >>>  #endif
> > >>>  }
> > >>>
> > >>> +static void memcg_charge_kernel_stack(struct task_struct *tsk)
> > >>> +{
> > >>> +#ifdef CONFIG_VMAP_STACK
> > >>> +        struct vm_struct *vm = task_stack_vm_area(tsk);
> > >>> +
> > >>> +        if (vm) {
> > >>> +                int i;
> > >>> +
> > >>> +                for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> > >>> +                        memcg_kmem_charge(vm->pages[i], __GFP_NOFAIL,
> > >>> +                                          compound_order(vm->pages[i]));
> > >>> +
> > >>> +                /* All stack pages belong to the same memcg. */
> > >>> +                mod_memcg_page_state(vm->pages[0], MEMCG_KERNEL_STACK_KB,
> > >>> +                                     THREAD_SIZE / 1024);
> > >>> +        }
> > >>> +#endif
> > >>> +}
> > >>
> > >> Before this change, the memory limit can fail the fork, but afterwards
> > >> fork() can grow memory consumption unimpeded by the cgroup settings.
> > >>
> > >> Can we continue to use try_charge() here and fail the fork?
> > >
> > > We can, but I'm not convinced we should.
> > >
> > > Kernel stack is relatively small, and it's already allocated at this point.
> > > So IMO exceeding the memcg limit for 1-2 pages isn't worse than
> > > adding complexity and handle this case (e.g. uncharge partially
> > > charged stack). Do you have an example, when it does matter?
> >
> > What bounds it to just a few pages? Couldn’t there be lots of forks in
> > flight that all hit this path? It’s unlikely, and there are surely easier
> > DoS vectors, but still.
>
> Because any following memcg-aware allocation will fail.
> There is also the pid cgroup controller which can be used to limit the number
> of forks.
>
> Anyway, I'm ok to handle this case and fail fork,
> if you think it does matter.

Roman, before adding more changes, do benchmark this. Maybe disabling
the stack caching for CONFIG_MEMCG is much cleaner.

Shakeel