Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp4358175pxv; Tue, 20 Jul 2021 01:40:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwkFkwgVlHtYnpcdQJjQHqPQGOIvmdynpQKyVbTwJllQtl2ctjTMSWq8lNi58yCvcvBFcVN X-Received: by 2002:a92:cf05:: with SMTP id c5mr19110192ilo.196.1626770405885; Tue, 20 Jul 2021 01:40:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626770405; cv=none; d=google.com; s=arc-20160816; b=QmF1lgafD+cGNXTG41fqHAvEY1SlZmifYn/dgcqWe+p3wkmwSCDYP94ReLHW1ENt0P Fy2bxR+94Aozs+m1vOiOev9ACc+Aounnvko7GdTdKpPmohPROxypfb/cDuZ/wf+LMhoo Ktkp9VKPsBecebzkLStbwL1vSN3pHfXkGTiA6kYEoJo2IBRLZjUzbjFjBYvrWQgrH74q 8Ri5PeccjdbDl4lu8l3JJ855Fr8hJJPEmMSU/7pR7xFjWvg/07/m0L15POuQC+w0lC16 eNxFwV8HyEnN/rRU1wAshGEED5qg2oQv6PtA+hZ7KFwIQN2oSZG6JZ1/CRZsR99+Gxbk GBOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=GCfwuXuzQlyWhoFqI2bAb5C2O3u0V1H4t9sAk8gyaiU=; b=OFXfNuOc64t7AeXHmlblUPrv2OI3W6lePCP5gTz+Z0cDmwVy+cBgJKyl6pUSjUV26U RBbpviknP+dB1tCIXbINIodtUWjORjP6mOS/hRNkP1yNbkO4hyPQM1eE9PlxZ86lLlKi Fy7C6XWVrCUhHl9yO3YbmPE5PrgYHThi+WEvG17KdNT/eBO/cSgHG8w4a5wSYY5wAKRN Tg5AXrh9KBu+7HBh0Pe3V13M2zw8VGIFLFlj0Uk3oQUIp0huCltNRnGp9a/YXpqpOdOr wZvqKBR+5dhYIMJexYzpmkLiey/o2Oa/amAhOQGuX250MjzTPlI/BGevefXkdenrvucb YdvQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@virtuozzo.com header.s=relay header.b=o2x2uM9k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=virtuozzo.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h21si10730252ila.73.2021.07.20.01.39.54; Tue, 20 Jul 2021 01:40:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@virtuozzo.com header.s=relay header.b=o2x2uM9k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=virtuozzo.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234902AbhGTH4v (ORCPT + 99 others); Tue, 20 Jul 2021 03:56:51 -0400 Received: from relay.sw.ru ([185.231.240.75]:47922 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234449AbhGTHzA (ORCPT ); Tue, 20 Jul 2021 03:55:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=Content-Type:MIME-Version:Date:Message-ID:From: Subject; bh=GCfwuXuzQlyWhoFqI2bAb5C2O3u0V1H4t9sAk8gyaiU=; b=o2x2uM9kxU9Ii8mcM ngv33zeb7qweb0P3mK2R8eAhFqaxQt6ZcEaksmqdwoZYzXdctpdlMovEX+oZSjSFpM8uxJ6gLHMkO Mu0Ncl43GqGbeSSpHpKFig1cIERZbIr86yVht+RnKUlNXocFFbYkh9yY1cEeu5YE7PC2U+Mx7Foi8 =; Received: from [10.93.0.56] by relay.sw.ru with esmtp (Exim 4.94.2) (envelope-from ) id 1m5lDk-004YDr-BU; Tue, 20 Jul 2021 11:35:20 +0300 Subject: Re: [PATCH v5 13/16] memcg: enable accounting for signals To: "Eric W. Biederman" Cc: Andrew Morton , cgroups@vger.kernel.org, Michal Hocko , Shakeel Butt , Johannes Weiner , Vladimir Davydov , Roman Gushchin , Jens Axboe , Oleg Nesterov , linux-kernel@vger.kernel.org References: <87k0lmryyp.fsf@disp2133> From: Vasily Averin Message-ID: <56816a9d-c2e5-127d-4d90-5d7d17782c8a@virtuozzo.com> Date: Tue, 20 Jul 2021 11:35:19 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <87k0lmryyp.fsf@disp2133> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 7/19/21 8:32 PM, Eric W. Biederman wrote: > Vasily Averin writes: > >> When a user send a signal to any another processes it forces the kernel >> to allocate memory for 'struct sigqueue' objects. The number of signals >> is limited by RLIMIT_SIGPENDING resource limit, but even the default >> settings allow each user to consume up to several megabytes of memory. >> Moreover, an untrusted admin inside container can increase the limit or >> create new fake users and force them to sent signals. > > Not any more. Currently the number of sigqueue objects is limited > by the rlimit of the creator of the user namespace of the container. > >> It makes sense to account for these allocations to restrict the host's >> memory consumption from inside the memcg-limited container. > > Does it? Why? The given justification appears to have bit-rotted > since -rc1. Could you please explain what was changed in rc1? From my POV accounting is required to help OOM-killer to select proper target. > I know a lot of these things only really need a limit just to catch a > program that starts malfunctioning. If that is indeed the case > reasonable per-resource limits are probably better than some great big > group limit that can be exhausted with any single resource in the group. > > Is there a reason I am not aware of that where it makes sense to group > all of the resources together and only count the number of bytes > consumed? Any new limits: a) should be set properly depending on huge number of incoming parameters. b) should properly notify about hits c) should be updated properly after b) d) do a)-c) automatically if possible In past OpenVz had own accounting subsystem, user beancounters (UBC). It accounted and limited 20+ resources per-container: numfiles, file locks, signals, netfilter rules, socket buffers and so on. I assume you want to do something similar, so let me share our experience. We had a lot of problems with UBC: - it's quite hard to set up the limit. Why it's good to consume N entities of some resource but it's bad to consume N+1 ones? per-process? per-user? per-thread? per-task? per-namespace? if nested? per-container? per-host? To answer the questions host admin should have additional knowledge and skills. - Ok, we have set all limits. Some application hits it and fails. It's quite hard to understand that application hits the limit, and failed due to this reason. From users point of view, if some application does not work (stable enough) inside container => containers are guilty. - It's quite hard to understand that failed application just want to increase limit X up to N entities. As result both host admins and container users was unhappy. So after years of such fights we decided just to limit accounted memory instead. Anyway, OOM-killer must know who consumed memory to select proper target. Thank you, vasily Averin