Date: Mon, 31 Jul 2023 15:23:58 +0200
From: Michal Hocko
To: Chuyi Zhou
Cc: hannes@cmpxchg.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, muchun.song@linux.dev, zhengqi.arch@bytedance.com
Subject: Re: [RFC PATCH 0/5] mm: Select victim memcg using BPF_OOM_POLICY
References: <20230727073632.44983-1-zhouchuyi@bytedance.com> <7347aad5-f25c-6b76-9db5-9f1be3a9f303@bytedance.com>

On Mon 31-07-23 14:00:22, Chuyi Zhou wrote:
> Hello, Michal
>
> On 2023/7/28 01:23, Michal Hocko wrote:
[...]
> > This sounds like a very specific oom policy and that is fine. But the
> > interface shouldn't be bound to any concepts like priorities let alone
> > be bound to memcg based selection.
> > Ideally the BPF program should get
> > the oom_control as an input and either get a hook to kill a process or, if
> > that is not possible, return an entity to kill (either a process or a
> > set of processes).
>
> Here are two interfaces I can think of. I was wondering if you could give me
> some feedback.
>
> 1. Add a new hook in select_bad_process(). A program attached to it would
> return a set of pids or cgroup_ids pre-selected by a user-defined policy,
> as suggested by Roman. Then we could use oom_evaluate_task to find the final
> victim among them. It's user-friendly and lets us offload the OOM policy to
> userspace.
>
> 2. Add a new hook in oom_evaluate_task() that returns a score overriding the
> default oom_badness return value. The simplest way to use this is to protect
> certain processes by setting the minimum score.
>
> Of course if you have a better idea, please let me know.

Hooking into oom_evaluate_task seems the least disruptive to the existing
oom killer implementation. I would start by playing with that and see
whether useful oom policies could be defined this way. I am not sure what
the best way is to communicate user input so that a BPF program can
consume it, though. The interface should be generic enough that it doesn't
really pre-define any specific class of policies. Maybe we can add
something completely opaque to each memcg/task? Does the BPF infrastructure
allow anything like that already?
-- 
Michal Hocko
SUSE Labs
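For illustration only, the second option (a hook in oom_evaluate_task() that overrides the oom_badness() score) can be modelled in plain userspace C. This is a hypothetical sketch, not kernel or BPF code: struct task, its user_cookie field (standing in for the "completely opaque" per-task value discussed above), the oom_score_hook type, protect_tagged() and select_victim() are all invented names. The model treats LONG_MIN as "never select this task", so a hook protects a task by returning the minimum score.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a task as seen by the OOM killer. */
struct task {
    int pid;
    long badness;     /* score the default oom_badness() would have produced */
    long user_cookie; /* hypothetical opaque per-task value set from userspace */
};

/* Hypothetical hook type: receives the default score, returns the score to use. */
typedef long (*oom_score_hook)(const struct task *t, long default_points);

/* Example policy: tasks tagged with cookie 1 get the minimum score (protected). */
long protect_tagged(const struct task *t, long default_points)
{
    return t->user_cookie == 1 ? LONG_MIN : default_points;
}

/*
 * Simplified model of the victim selection loop: every task is evaluated,
 * the optional hook may override its score, and the highest score wins.
 * LONG_MIN means "never select". Returns NULL if no task is eligible.
 */
const struct task *select_victim(const struct task *tasks, size_t n,
                                 oom_score_hook hook)
{
    const struct task *chosen = NULL;
    long chosen_points = LONG_MIN;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        long points = tasks[i].badness;

        if (hook != NULL)
            points = hook(&tasks[i], points);
        if (points == LONG_MIN)
            continue; /* protected or unkillable: skip */
        if (chosen == NULL || points > chosen_points) {
            chosen = &tasks[i];
            chosen_points = points;
        }
    }
    return chosen;
}
```

For example, given three tasks with default scores 500, 300 and 100 where only the 500-point task has user_cookie == 1, select_victim(tasks, 3, NULL) picks the 500-point task, while select_victim(tasks, 3, protect_tagged) skips it and picks the 300-point one. Limiting the hook to a single score override leaves the selection loop itself unchanged, which is the "least disruptive" property argued for above.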