From: Juha-Matti Tilli
Date: Wed, 10 Apr 2019 18:01:17 +0300
Subject: Re: [PATCH] net: add big honking pfmemalloc OOM warning
To: Eric Dumazet
Cc: LKML, netdev, Rafael Aquini, Murphy Zhou, Yongcheng Yang, Jianhong Yin
References: <20190410101947.8603-1-juha-matti.tilli@foreca.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 10, 2019 at 5:16 PM Eric Dumazet wrote:
> If NFS sessions hang, then there is a bug to eventually root cause and fix.
>
> Just telling the user : Increase the limit is the same thing than admitting :
>
> Our limit system or TCP or NFS stacks are broken and unable to
> recover, so lets disable the limit system and work around a more
> serious bug.
>
> Maybe the bug is in a NIC driver, please share more details before
> adding yet another noisy signal in syslog
>
> SNMP counters are per netns, and more useful in the modern computing
> era, where a host is shared by many different containers.

Any idea where the bug might be?

It can't be in NFS, because I have observed it to be a TCP-level
issue. NFS would be working just fine if TCP worked, but the underlying
TCP connection does not work unless we bump up vm.min_free_kbytes.

It could be in ixgbe, because the incoming SKB gets pfmemalloc pages
for some reason, and that happens repeatedly for a duration of 5-10
minutes for every single retransmit, until the condition clears. Ping
is working just fine at the time the NFS connection is stuck.
I think these 63-queue NICs use a different queue for ping than for the
TCP NFS connection. I think there is some code in ixgbe for not reusing
pfmemalloc pages, but it seems every packet nevertheless gets a
pfmemalloc page in the queue that is used for TCP NFS. Might the cause
be that if ixgbe gets the pages in large batches, it gets multiple
pfmemalloc pages at a time, and then every packet is dropped until all
the pfmemalloc pages run out (not being reused)?

It could also be in the default value of vm.min_free_kbytes, but I'm
not experienced enough in Linux kernel internals to adjust the complex
calculations. Just saying that 90 MB sounds ridiculously low on a
256 GB NUMA machine.

Are you of the opinion that Intel, as the developer of ixgbe, should be
informed?

Anyway, I posted more details to the mailing lists about a week ago;
search for "NFS hang, sk_drops increasing, segs_in not, pfmemalloc
suspected" in the mailing lists, or use this direct link:
https://lkml.org/lkml/2019/4/3/682

The current situation is that we've been running the production system
for 2 weeks with a bumped-up vm.min_free_kbytes and no NFS hangs.
Before the bump, we had approximately one hang per day, so without it,
those 2 weeks would have seen approximately 14 NFS hangs.

To me, this OOM condition seems to be global, so having it per-netns
offers no clear benefit. Or is vm.min_free_kbytes tunable per
container?

BR, Juha-Matti
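
A rough back-of-the-envelope check of the two figures above, as a
sketch only. It assumes 1 GB = 1024 MB, and it assumes the hangs arrive
independently at a steady rate of about one per day (a Poisson model,
which is an assumption and not something measured). The function names
are made up for illustration.

```python
import math


def reserve_fraction(min_free_mb: float, ram_gb: float) -> float:
    """Fraction of total RAM covered by the reserve (1 GB = 1024 MB)."""
    return min_free_mb / (ram_gb * 1024)


def prob_no_hangs(rate_per_day: float, days: float) -> float:
    """P(zero events) under an assumed Poisson process: exp(-rate * days)."""
    return math.exp(-rate_per_day * days)


if __name__ == "__main__":
    # 90 MB reserve on a 256 GB machine, as quoted in the mail.
    frac = reserve_fraction(90, 256)
    print(f"reserve is {frac:.4%} of RAM")

    # About one hang per day before the bump, then 14 days with none.
    p = prob_no_hangs(1.0, 14)
    print(f"chance of 14 hang-free days by luck alone: {p:.1e}")
```

Under these assumptions the reserve is roughly 0.03% of RAM, and 14
hang-free days by pure chance would have a probability on the order of
1e-6, so the bump is very likely what made the hangs stop. That says
nothing about where the underlying bug is.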