From: Ivan Babrou
Date: Tue, 10 Dec 2019 13:32:21 -0800
Subject: Lock contention around unix_gc_lock
To: linux-kernel
Cc: "David S. Miller", hare@suse.com, axboe@kernel.dk, allison@lohutok.net, tglx@linutronix.de, Linux Kernel Network Developers

Hello,

We're seeing very high contention on unix_gc_lock when a bug in an application makes it stop reading incoming messages that carry inflight unix sockets. We churn through a lot of unix sockets, and with 96 logical CPUs in the system the spinlock gets very hot. With 1024 inflight unix sockets, which is the default RLIMIT_NOFILE, I was able to halve overall system throughput. That is not great for isolation: one user should not be able to affect the whole system this much. One might even consider it a DoS vector.
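To make the scenario concrete, below is a minimal, hypothetical sketch of the pattern that produces inflight unix sockets (it is not our actual application, just an illustration, and the names and counts are made up): a unix socket fd is queued in an SCM_RIGHTS message on another unix socket whose receiving end never reads it.

/*
 * Hypothetical reproducer sketch, illustrative only: every unix socket fd
 * queued in an unread SCM_RIGHTS message counts as "in flight" and bumps
 * unix_tot_inflight, which is what the GC paths below key off of.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define NPAIRS   16     /* carrier socketpairs, so no single sndbuf fills up */
#define PER_PAIR 64     /* 16 * 64 = 1024 in-flight references */

int main(void)
{
        int payload[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, payload))
                return perror("socketpair"), 1;

        for (int i = 0; i < NPAIRS; i++) {
                int carrier[2];

                if (socketpair(AF_UNIX, SOCK_STREAM, 0, carrier))
                        return perror("socketpair"), 1;

                for (int j = 0; j < PER_PAIR; j++) {
                        char dummy = 'x';
                        union {
                                char buf[CMSG_SPACE(sizeof(int))];
                                struct cmsghdr align;
                        } u;
                        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
                        struct msghdr msg = {
                                .msg_iov = &iov, .msg_iovlen = 1,
                                .msg_control = u.buf,
                                .msg_controllen = sizeof(u.buf),
                        };
                        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

                        cmsg->cmsg_level = SOL_SOCKET;
                        cmsg->cmsg_type = SCM_RIGHTS;
                        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
                        memcpy(CMSG_DATA(cmsg), &payload[0], sizeof(int));

                        /* carrier[1] is never read, so payload[0] stays in flight */
                        if (sendmsg(carrier[0], &msg, 0) < 0)
                                return perror("sendmsg"), 1;
                }
                /* keep the carrier fds open on purpose; closing them would
                 * eventually let the GC release the queued references
                 */
        }

        pause();        /* hold the in-flight references while other work runs */
        return 0;
}

Each queued reference only goes away once the message is actually received or the carrier sockets are closed and collected, which is why a reader that stops reading keeps the inflight count pinned.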
A lot of time is spent in _raw_spin_unlock_irqrestore, reached via wait_for_unix_gc, which in turn is called unconditionally from unix_stream_sendmsg:

ffffffff9f64f3ea _raw_spin_unlock_irqrestore+0xa
ffffffff9eea6ab0 prepare_to_wait_event+0x70
ffffffff9f5a4ac6 wait_for_unix_gc+0x76
ffffffff9f5a182c unix_stream_sendmsg+0x3c
ffffffff9f4bb7f9 sock_sendmsg+0x39

* https://elixir.bootlin.com/linux/v4.19.80/source/net/unix/af_unix.c#L1849

Even more time is spent waiting on the spinlock because of the call to unix_gc from unix_release_sock, where the condition is having any inflight sockets whatsoever:

ffffffff9eeb1758 queued_spin_lock_slowpath+0x158
ffffffff9f5a4718 unix_gc+0x38
ffffffff9f5a28f3 unix_release_sock+0x2b3
ffffffff9f5a2929 unix_release+0x19
ffffffff9f4b902d __sock_release+0x3d
ffffffff9f4b90a1 sock_close+0x11

* https://elixir.bootlin.com/linux/v4.19.80/source/net/unix/af_unix.c#L586

Should this condition take the number of inflight sockets into account, just like unix_stream_sendmsg does via wait_for_unix_gc? The static number of inflight sockets that triggers a GC from wait_for_unix_gc could also be scaled with system size rather than being a hardcoded value, as sketched below.

I know our case is a pathological one, but it sounds like the scalability of this garbage collection could be better, especially on systems with a large number of CPUs.
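Concretely, something along these lines is what I have in mind. This is an untested sketch against the v4.19-era net/unix code, the helper name and scaling factor are made up purely to illustrate the idea, and I'm assuming the existing send-path trigger is the hardcoded UNIX_INFLIGHT_TRIGGER_GC constant in net/unix/garbage.c:

/* net/unix/garbage.c: derive the GC trigger from system size instead of a
 * fixed constant (hypothetical helper, illustrative scaling)
 */
static unsigned long unix_gc_trigger(void)
{
        /* e.g. grow the trigger with the CPU count so a single task holding
         * RLIMIT_NOFILE worth of inflight fds cannot keep every CPU spinning
         * on unix_gc_lock
         */
        return UNIX_INFLIGHT_TRIGGER_GC * max(1U, num_online_cpus() / 16);
}

/* net/unix/af_unix.c, in unix_release_sock(): gate the GC on a threshold of
 * inflight sockets instead of "any inflight socket at all"
 */
        if (unix_tot_inflight > unix_gc_trigger())
                unix_gc();              /* Garbage collect fds */

The exact threshold and scaling are obviously up for debate; the point is that both the send-path trigger and the release-path condition could be made proportional to the machine rather than constants.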