Subject: Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace
From: David Ahern
To: "Eric W. Biederman"
Cc: Cong Wang, David Miller, Linux Kernel Network Developers, nikita.leshchenko@oracle.com, Roopa Prabhu, Stephen Hemminger, Ido Schimmel, Jiri Pirko, Saeed Mahameed, Alexander Aring, linux-wpan@vger.kernel.org, NetFilter, LKML
Date: Mon, 13 Aug 2018 15:48:17 -0600
In-Reply-To: <87o9evt9a6.fsf@xmission.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/25/18 1:17 PM, Eric W. Biederman wrote:
> David Ahern writes:
>
>> On 7/25/18 11:38 AM, Eric W. Biederman wrote:
>>>
>>> Absolutely NOT. Global thresholds are exactly correct given the fact
>>> you are running on a single kernel.
>>>
>>> Memory is not free (even though we are swimming in enough of it that
>>> memory rarely matters). One of the few remaining challenges for
>>> containers is finding ways to limit resources in such a way that one
>>> application does not mess things up for another container during
>>> ordinary usage.
>>>
>>> It looks like the neighbour tables absolutely are that kind of problem,
>>> because the artificial limits are too strict. Completely giving up on
>>> limits does not seem the right approach either. We need to fix the
>>> limits we have (perhaps making them go away entirely), not just apply
>>> a band-aid. Let's get to the bottom of this and make the system better.
>>
>> Eric: yes, they all share the global resource of memory, and there
>> should be limits on how many entries a remote entity can create.
>>
>> Network namespaces can provide a separation such that one namespace does
>> not disrupt networking in another. It is absolutely appropriate to do
>> so. Your rigid stance is inconsistent given the basic meaning of a
>> network namespace and the parallels to this same problem: bridges,
>> vxlans, and ip fragments. Only neighbor tables are not per-device or per
>> namespace; your insistence on global limits is missing the mark and wrong.
>
> That is not what I said. Let me rephrase and see if you understand.
>
> The problem appears to be one of lots of devices. Fundamentally, if you
> use lots of network devices today, unless you adjust gc_thresh3 you will
> run out of neighbour table entries.
>
> The problem has a bigger scope than what you are looking at.
>
> If you fix the core problem you won't see the problem in the context
> of network namespaces either.
>
> Default limits should be something that will never be hit unless
> something goes crazy. We are hitting them. Therefore, by definition,
> there is a bug in these limits.

I disagree that the problem is a global limit. It is trivial for users
to increase gc_thresh3. That does not solve the fundamental problem.

>
> And yes, there is absolutely a place for global limits on things like
> inodes, file descriptors, etc., that do not care about which part of the
> kernel you are in. However, hitting those limits in normal operation is
> a bug.
>
> We have ourselves a bug.

I agree we have a bug; we disagree on what that bug is.

I am just back from vacation and have re-read your responses. Nowhere do
you acknowledge the fundamental point of this patch set: adding a new
neighbor entry in one namespace can evict an entry in another namespace,
or, worse, networking in one namespace can fail due to table overflow
caused by entries from another.

That is a real problem. It is not a matter of increasing the default
gc_thresh3 to some number N; it is ensuring that regardless of the value
of gc_thresh3 one namespace is not affected by another.

You created network namespaces, and they provide isolation (essentially
separate tables) for devices, FIB entries, sockets, etc., but you argue
against completing the task with separate neighbor tables, which is very
strange given the impact (completely broken networking).

>
> Eric
>
> p.s. I wrote the definition of network namespaces and it absolutely does
> have room for global limits. One of the things Linus has periodically
> yelled at me about is that there are not enough of them.
>