Date: Wed, 21 Nov 2018 14:26:20 +0100 (CET)
From: Thomas Gleixner
To: Josh Hunt
cc: saeedm@mellanox.com, linux-kernel@vger.kernel.org, "Ozen, Gurhan"
Subject: Re: vector space exhaustion on 4.14 LTS kernels
In-Reply-To: <598457c6-4bea-50f5-efe9-6a2af3405ff5@akamai.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Josh,

On Mon, 19 Nov 2018, Josh Hunt wrote:
> We have a class of machines that appear to be exhausting the vector space on
> cpus 0 and 1 which causes some breakage later on when trying to set the
> affinity. The boxes are running the 4.14 LTS kernel.
>
> [ 39.531385] __assign_irq_vector: irq:512 cpu:128 searched:00,00000001
> vector:00,00000000 continue
> [ 39.531386] apic_set_affinity: irq:512 mask:00,00000001 err:-28
>
> The affinity values:
>
> root@172.25.48.208:/proc/irq/512# grep . *
> affinity_hint:00,00000001
> effective_affinity:00,00000004
> effective_affinity_list:2
> grep: mlx5_comp0@pci:0000:65:00.1: Is a directory
> node:0
> smp_affinity:ff,ffffffff
> smp_affinity_list:0-39
> spurious:count 3
> spurious:unhandled 0
> spurious:last_unhandled 0 ms
>
> I noticed your change, a0c9259dc4e1 "irq/matrix: Spread interrupts on
> allocation", and this sounds like what we're hitting. Booting 4.19 does not
> have this problem. I haven't booted 4.15 yet, but can do it to confirm the
> above commit is what resolves this.

Might be, but in 4.15 the whole vector allocation got rewritten. One of the
reasons was the exhaustion issue. Some of that is caused by massive
over-allocation by certain device drivers. The new allocator mechanism
handles that way better.

> Since 4.14 doesn't have the matrix allocator it's not a trivial backport. I
> was wondering a) if you agree with my assessment and b) if there's any plans
> on resolving this on the 4.14 allocator? If not I can attempt to backport the
> idea to 4.14 to spread the interrupts around on allocation.

No plans. Good luck with trying to fix that on the 4.14 code. I'd recommend
switching to 4.19 LTS :)

Thanks,

	tglx
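
For readers following the quoted /proc/irq/512 output, here is a minimal Python sketch, not part of the original thread, of how the hex affinity masks map to CPU lists and what err:-28 stands for. The helper names parse_cpumask and format_cpu_list are hypothetical and chosen for illustration. The only assumption about the kernel is its usual mask format: comma-separated 32-bit hex words, most significant word first.

```python
import errno


def parse_cpumask(mask):
    """Turn a /proc/irq/<n>/smp_affinity style mask into a sorted CPU list."""
    # Dropping the commas leaves one large hexadecimal number in which
    # bit N is set when CPU N is part of the mask.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def format_cpu_list(cpus):
    """Collapse a sorted CPU list into the 'a-b,c' form used by *_list files."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    # Mask values copied from the quoted report.
    for name, mask in [
        ("affinity_hint", "00,00000001"),
        ("effective_affinity", "00,00000004"),
        ("smp_affinity", "ff,ffffffff"),
    ]:
        print(name, mask, "->", format_cpu_list(parse_cpumask(mask)))

    # err:-28 in the log is a negated errno value.
    print("err:-28 ->", errno.errorcode[28])
```

Read this way, the requested mask 00,00000001 is CPU 0 only, while the interrupt is effectively on CPU 2, and -28 is ENOSPC, which fits the report that no free vector was found on the requested CPU.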