From: Abel Wu
To: Roman Gushchin, Michal Hocko
Cc: Chuyi Zhou, hannes@cmpxchg.org, ast@kernel.org, daniel@iogearbox.net,
    andrii@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
    robin.lu@bytedance.com
Subject: Re: Re: [RFC PATCH 0/5] mm: Select victim memcg using BPF_OOM_POLICY
Date: Tue, 1 Aug 2023 14:53:03 +0800
References: <20230727073632.44983-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/28/23 12:30 PM, Roman Gushchin wrote:
> On Thu, Jul 27, 2023 at 10:15:16AM +0200, Michal Hocko wrote:
>> On Thu 27-07-23 15:36:27, Chuyi Zhou wrote:
>>> This patchset tries to add a new bpf prog type and use it to select
>>> a victim memcg when global OOM is invoked. The main motivation is
>>> the need for customizable OOM victim selection functionality so that
>>> we can protect more important apps from the OOM killer.
>>
>> This is rather modest to give an idea how the whole thing is supposed to
>> work. I have looked through patches very quickly but there is no overall
>> design described anywhere either.
>>
>> Please could you give us a high level design description and reasoning
>> why certain decisions have been made? e.g. why is this limited to the
>> global oom situation, why is the BPF program forced to operate on memcgs
>> as entities etc...
>> Also it would be very helpful to call out limitations of the BPF
>> program, if there are any.
>
> One thing I realized recently: we don't have to make a victim selection
> during the OOM, we [almost always] can do it in advance.

I agree. We take precautions against memory shortage on over-committed
machines through oomd-like userspace tools, to mitigate possible SLO
violations on important services. The kernel OOM-killer in our scenario
works as a last resort, since userspace tools are not that reliable.
IMHO it would be useful for the kernel to provide such flexibility.

>
> Kernel OOM's must guarantee the forward progress under heavy memory pressure
> and it creates a lot of limitations on what can and what can't be done in
> these circumstances.
>
> But in practice most policies except maybe those which aim to catch very fast
> memory spikes rely on things which are fairly static: a logical importance of
> several workloads in comparison to some other workloads, "age", memory footprint
> etc.
>
> So I wonder if the right path is to create a kernel interface which allows
> to define an OOM victim (maybe several victims, also depending on if it's
> a global or a memcg oom) and update it periodically from userspace.

Something like [1] proposed by Chuyi? IIUC there is still a lack of
triggers to invoke the procedure so that we can actually do this in advance.

[1] https://lore.kernel.org/lkml/f8f44103-afba-10ee-b14b-a8e60a7f33d8@bytedance.com/

Thanks & Best,
	Abel
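
For illustration only, here is a minimal Python sketch of the "select the
victim in advance" idea quoted above: a userspace agent ranks workloads by
fairly static attributes (importance, age, memory footprint) and would
recompute the ranking periodically. The Workload record, its fields, and the
ordering rule are all hypothetical and are not taken from the patchset. How
the result would be handed to the kernel is the interface under discussion
and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical per-memcg record kept by a userspace agent."""
    name: str
    importance: int   # higher means more important (assumed to be admin-set)
    age_seconds: int  # time since the workload started
    rss_bytes: int    # current memory footprint


def rank_victims(workloads):
    """Order workloads from most to least preferred OOM victim.

    Assumed policy: least important first, then youngest,
    then largest memory footprint.
    """
    return sorted(
        workloads,
        key=lambda w: (w.importance, w.age_seconds, -w.rss_bytes),
    )


def select_victims(workloads, count=1):
    """Return the names of the `count` most preferred victims."""
    return [w.name for w in rank_victims(workloads)[:count]]


# Example data; names and numbers are made up for the sketch.
workloads = [
    Workload("db", importance=10, age_seconds=86400, rss_bytes=8 << 30),
    Workload("batch", importance=1, age_seconds=600, rss_bytes=2 << 30),
    Workload("cache", importance=1, age_seconds=3600, rss_bytes=4 << 30),
]
victims = select_victims(workloads, count=2)
```

Because the inputs change slowly, such a ranking could be refreshed on a
timer rather than computed under memory pressure, which is the point of
doing the selection in advance.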