Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp2707432ybl; Thu, 29 Aug 2019 11:45:37 -0700 (PDT) X-Google-Smtp-Source: APXvYqyU1vJnfw8q1xCSGWzyrgmJ/K7GzzeSYp2QipNmq/LGKdjGmn7XtRUCfKOezSr8KNO9ku+p X-Received: by 2002:a17:902:4581:: with SMTP id n1mr11748578pld.310.1567104337407; Thu, 29 Aug 2019 11:45:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567104337; cv=none; d=google.com; s=arc-20160816; b=i+MoZ0bSNM5gaHYEheuCV7fFvKDU7itZwYb1a3IaEuOuUJjLWGmPtNPS64jsEvPDzx YU9+cMpHezwB1DaRXgHfpKcKLEzhFGu43Z/CWZtRu7re1MzEWH/nPdEbOXuZCXx8diF6 ZAPXcNPowHPtxax6LzymKHTBcNi4iHKxYiP3pgTzh2RWJ4aT9jcEz0tei2cmTJs7LZoC O2H8RvJBeqRipOLQhRP2+BX3X9fjfgNFT+V+R092KNqo8ewQezhs6eflpj5jiwSdwFRd +Po1E302lPpF1Um01oe8+XmsvGH+Vh+mXjCvlyEXH2BB8+kxfKHrfld17rL5YSULmIp+ nwmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:date:cc:to:from:subject:message-id :dkim-signature; bh=QDRi/Xd9XfnkQ5sOxj7EVRPPb6YjOhNvBvKil4zJw/0=; b=EBfyUumi+Jb1WYcFzuiSm0FZh7T4LH/55FiwId96OO9KX54z72E/pgK2XX7MoErVNs d25pR+HpyFhkDiGbfTxzuCci/wBhYQeJUc48dL0apxo/2R8Xmjwozhf1Fdwb4Od3RyWz jDPxylBZ2kjU6elWHGhjIqSNxnvTKvAUozqhM88ayHkwa01tLwiAfNnPhUZhYfYomamY guTHATLgMyrKQW0T05YEErblXfTSNR4+tLiD8N9e9hTRQtjs/AQz5xmWZCA6MHHWH/AJ HixkLi7QvmnwmpruFTIiP1gggNPteoKiJ7nYBuJVgGupPag5ZMQEAqp8BQoC75eZdmq6 LsaQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=dl9qohKF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s20si2634439pgi.499.2019.08.29.11.45.22; Thu, 29 Aug 2019 11:45:37 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=dl9qohKF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728644AbfH2SoH (ORCPT + 99 others); Thu, 29 Aug 2019 14:44:07 -0400 Received: from mail-qk1-f195.google.com ([209.85.222.195]:45607 "EHLO mail-qk1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728595AbfH2SoF (ORCPT ); Thu, 29 Aug 2019 14:44:05 -0400 Received: by mail-qk1-f195.google.com with SMTP id m2so3852381qki.12 for ; Thu, 29 Aug 2019 11:44:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=message-id:subject:from:to:cc:date:in-reply-to:references :mime-version:content-transfer-encoding; bh=QDRi/Xd9XfnkQ5sOxj7EVRPPb6YjOhNvBvKil4zJw/0=; b=dl9qohKF185MJTvsc/xIx5y2bsPkyUgeQyPr5ffE0hZoRxnas8wTSWNEjE47IMjSQt ClxH/8EVTw9Ka+9M0wwX+9B/tIxd5c9tbC4bJVP6ZwyImFCUqm28A7X+yPLCqQJFBBAV 3DR877YtrSKP77RLHhkN9aaVXjjdSnAuMQixQQV6IvlbzoGtwvQjTLJTjwZMLpdTQDgx ewvCMg70knxrpix/UuQ1dzA8dmHi5pooHb91SW/yA0O3o0XTnVkP1+7DUQVBU/n2yWRZ YvrEv8swmwRvi3EEp2XB0vErOGppjRla558g1r6lB4UaSNmoXh8pnI+WpxL6iTq7jsfA L4Hg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:mime-version:content-transfer-encoding; bh=QDRi/Xd9XfnkQ5sOxj7EVRPPb6YjOhNvBvKil4zJw/0=; b=CJlcpzquIv0katJMVe48iu3EiJFELHGJKDJKjXySaFxgjDbdPVFVHwThp5L5QaioB9 ZYLyhv0AYL5MA58EPlAldVMN8T0hwX5t6B7NEjV3vBmx+VEj2BebvwFnrRIBozW7lGqQ vR+zpw4BYJs1xx1i0LePafKadkw5SaFhsT39RjAy41aje+OnYZrpCLAOZj6nhNvA+J9M LQmlre4Fciz0+mk77H9B+UE5WfrabJvJG8q0jqSTDQ/939pmIOgyuGFbf3tKALr7uuW2 8y48XohkNkFkfJD7Lw4xsOcUKeujW01IzVaco8VV8iV4701O6vqNsHGQQ9TwoxiCFA4j 8ang== X-Gm-Message-State: APjAAAXwf3O3wztAO1WRD1fc1fhn/pE3OFpKV7bUVIyidymxSjh0ZXNC zOZeHqBpWmHOHr7lF2hpgp7w/w== X-Received: by 2002:a37:98f:: with SMTP id 137mr11278917qkj.188.1567104244278; Thu, 29 Aug 2019 11:44:04 -0700 (PDT) Received: from dhcp-41-57.bos.redhat.com (nat-pool-bos-t.redhat.com. [66.187.233.206]) by smtp.gmail.com with ESMTPSA id u7sm1494346qkj.113.2019.08.29.11.44.01 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 29 Aug 2019 11:44:03 -0700 (PDT) Message-ID: <1567104241.5576.30.camel@lca.pw> Subject: Re: [PATCH 00/10] OOM Debug print selection and additional information From: Qian Cai To: Edward Chron Cc: Michal Hocko , Tetsuo Handa , Andrew Morton , Roman Gushchin , Johannes Weiner , David Rientjes , Shakeel Butt , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ivan Delalande Date: Thu, 29 Aug 2019 14:44:01 -0400 In-Reply-To: References: <20190826193638.6638-1-echron@arista.com> <20190827071523.GR7538@dhcp22.suse.cz> <20190828065955.GB7386@dhcp22.suse.cz> <20190829071105.GQ28313@dhcp22.suse.cz> <297cf049-d92e-f13a-1386-403553d86401@i-love.sakura.ne.jp> <20190829115608.GD28313@dhcp22.suse.cz> <1567093344.5576.23.camel@lca.pw> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.22.6 (3.22.6-10.el7) Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 2019-08-29 at 09:09 -0700, Edward Chron wrote: > > Feel like you are going in circles to "sell" without any new information. If > > you > > need to deal with OOM that often, it might also worth working with FB on > > oomd. > > > > https://github.com/facebookincubator/oomd > > > > It is well-known that kernel OOM could be slow and painful to deal with, so > > I > > don't buy-in the argument that kernel OOM recover is better/faster than a > > kdump > > reboot. > > > > It is not unusual that when the system is triggering a kernel OOM, it is > > almost > > trashed/dead. Although developers are working hard to improve the recovery > > after > > OOM, there are still many error-paths that are not going to survive which > > would > > leak memories, introduce undefined behaviors, corrupt memory etc. > > But as you have pointed out many people are happy with current OOM processing > which is the report and recovery so for those people a kdump reboot is > overkill. > Making the OOM report at least optionally a bit more informative has value. > Also > making sure it doesn't produce excessive output is desirable. > > I do agree for developers having to have all the system state a kdump > provides that > and as long as you can reproduce the OOM event that works well. But > that is not the > common case as has already been discussed. > > Also, OOM events that are due to kernel bugs could leak memory and over time > and cause a crash, true. But that is not what we typically see. In > fact we've had > customers come back and report issues on systems that have been in continuous > operation for years. No point in crashing their system. Linux if > properly maintained > is thankfully quite stable. But OOMs do happen and root causing them to > prevent > future occurrences is desired. This is not what I meant. After an OOM event happens, many kernel memory allocations could fail. Since very few people are testing those error-paths due to allocation failures, it is considered one of those most buggy areas in the kernel. Developers have mostly been focus on making sure the kernel OOM should not happen in the first place. I still think the time is better spending on improving things like eBPF, oomd and kdump etc to solve your problem, but leave the kernel OOM report code alone.