Date: Fri, 30 Mar 2018 12:09:51 -0700
From: Davidlohr Bueso
To: "Eric W. Biederman", manfred@colorfullife.com
Cc: Linux Containers, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    khlebnikov@yandex-team.ru, prakash.sangappa@oracle.com, luto@kernel.org,
    akpm@linux-foundation.org, oleg@redhat.com, serge.hallyn@ubuntu.com,
    esyr@redhat.com, jannh@google.com, linux-security-module@vger.kernel.org,
    Pavel Emelyanov, Nagarathnam Muthusamy
Subject: Re: [REVIEW][PATCH 11/11] ipc/sem: Fix semctl(..., GETPID, ...)
         between pid namespaces
Message-ID: <20180330190951.nfcdwuzp42bl2lfy@linux-n805>
References: <87vadmobdw.fsf_-_@xmission.com>
            <20180323191614.32489-11-ebiederm@xmission.com>
            <20180329005209.fnzr3hzvyr4oy3wi@linux-n805>
In-Reply-To: <20180329005209.fnzr3hzvyr4oy3wi@linux-n805>

On Wed, 28 Mar 2018, Davidlohr Bueso wrote:

>On Fri, 23 Mar 2018, Eric W. Biederman wrote:
>
>>Today the last process to update a semaphore is remembered and
>>reported in the pid namespace of that process. If there are processes
>>in any other pid namespace querying that process id with GETPID the
>>result will be unusable nonsense as it does not make any sense in
>>your own pid namespace.
>
>Yeah, that sounds pretty wrong.
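
(For reference, the query at issue is the semctl() GETPID command. Below
is a minimal, self-contained userspace sketch of it; the setup is
illustrative only and is not taken from the patch or from either
benchmark.)

    /*
     * Illustrative sketch: create a semaphore, bump it once, then ask
     * who touched it last via semctl(..., GETPID, ...).  The value
     * returned is the sempid the kernel recorded for the last updater;
     * without the fix, a reader in a different pid namespace gets a
     * number it cannot translate.
     */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    int main(void)
    {
        int semid = semget(IPC_PRIVATE, 1, 0600);
        if (semid < 0) {
            perror("semget");
            return 1;
        }

        struct sembuf up = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
        if (semop(semid, &up, 1) < 0) {          /* update the semaphore */
            perror("semop");
            return 1;
        }

        int last = semctl(semid, 0, GETPID);     /* pid of last updater */
        printf("last updater as seen from this pid namespace: %d\n", last);

        semctl(semid, 0, IPC_RMID);              /* clean up */
        return 0;
    }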
>
>>
>>Due to ipc_update_pid I don't think you will be able to get System V
>>ipc semaphores into a troublesome cache line ping-pong. Using struct
>>pids from separate processes is not a problem because they do not
>>share a cache line. Using a struct pid from different threads of the
>>same process is unlikely to be a problem as the reference count
>>update can be avoided.
>>
>>Further, Linux futexes are a much better tool for the job of mutual
>>exclusion between processes than System V semaphores. So I expect
>>programs that are performance limited by their interprocess mutual
>>exclusion primitive will be using futexes.
>
>You would be wrong. There are plenty of real workloads out there
>that do not use futexes and do care about performance; in the end,
>futexes are only good for the uncontended case, and they can also
>destroy NUMA boxes if you consider the global hash table. Experience
>has shown me that sysvipc sems are still quite commonly used.
>
>>
>>So while it is possible that enhancing the storage of the last
>>process of a System V semaphore from an integer to a struct pid
>>will cause a performance regression because of the effect
>>of frequently updating the pid reference count, I don't expect
>>that to happen in practice.
>
>How's that? Now, thanks to ipc_update_pid(), for each semop the user
>passes, perform_atomic_semop() will do two atomic updates in the cases
>where multiple processes are updating the sem. This is not uncommon.
>
>Could you please provide some numbers.
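
(To make the concern concrete: as I read the series, the helper that
replaces the plain integer store is roughly the conditional
get_pid()/put_pid() update sketched below. This is a simplified reading
of the kernel-internal helper, not a verbatim copy of the patch.)

    /*
     * Sketch of the ipc_update_pid() pattern under discussion: when the
     * new updater differs from the recorded one, take a reference on the
     * new struct pid and drop the old one, i.e. two atomic refcount
     * operations per update (struct pid, get_pid() and put_pid() are the
     * kernel primitives from <linux/pid.h>).
     */
    static inline void ipc_update_pid(struct pid **pos, struct pid *pid)
    {
        struct pid *old = *pos;

        if (pid != old) {
            *pos = get_pid(pid);    /* atomic refcount increment */
            put_pid(old);           /* atomic refcount decrement */
        }
    }

When the same process updates the semaphore repeatedly the pointers
match and the refcounting is skipped; the interesting case is multiple
processes hammering a single semaphore, which is what the runs below
target.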
I ran this on a 40-core (no HT) Westmere with two benchmarks.

The first is Manfred's sysvsem lockunlock[1] program, which uses
_processes_ to, well, lock and unlock the semaphore. The options are a
little unconventional: to keep the critical region small and the
lock+unlock frequency high, I added busy_in=busy_out=10. Similarly, to
get the worst case scenario and have everyone update the same
semaphore, a single one is used. Here are the results (pretty low
stddev from run to run) for doing 100,000 lock+unlock operations:

- 1 proc:
   * vanilla: total execution time: 0.110638 seconds for 100000 loops
   * dirty:   total execution time: 0.120144 seconds for 100000 loops

- 2 proc:
   * vanilla: total execution time: 0.379756 seconds for 100000 loops
   * dirty:   total execution time: 0.477778 seconds for 100000 loops

- 4 proc:
   * vanilla: total execution time: 6.749710 seconds for 100000 loops
   * dirty:   total execution time: 4.651872 seconds for 100000 loops

- 8 proc:
   * vanilla: total execution time: 5.558404 seconds for 100000 loops
   * dirty:   total execution time: 7.143329 seconds for 100000 loops

- 16 proc:
   * vanilla: total execution time: 9.016398 seconds for 100000 loops
   * dirty:   total execution time: 9.412055 seconds for 100000 loops

- 32 proc:
   * vanilla: total execution time: 9.694451 seconds for 100000 loops
   * dirty:   total execution time: 9.990451 seconds for 100000 loops

- 64 proc:
   * vanilla: total execution time: 9.844984 seconds for 100032 loops
   * dirty:   total execution time: 10.016464 seconds for 100032 loops

Lower task counts show pretty massive performance hits of ~9%, ~25% and
~30% for single, two and four/eight processes. As more are added, I
guess the overhead tends to disappear since, for one, there is a lot
more locking contention going on.

The second workload I ran this patch on was Chris Mason's
sem-scalebench[2] program, which uses _threads_ for the sysvsem option
(this benchmark is more about semaphores as a concept than about
sysvsem specifically). Dealing with a single semaphore and increasing
thread counts we get:

sembench-sem
                               vanilla                 dirty
Hmean     sembench-sem-2     286272.00 (   0.00%)   288232.00 (   0.68%)
Hmean     sembench-sem-8     510966.00 (   0.00%)   494375.00 (  -3.25%)
Hmean     sembench-sem-12    435753.00 (   0.00%)   465328.00 (   6.79%)
Hmean     sembench-sem-21    448144.00 (   0.00%)   462091.00 (   3.11%)
Hmean     sembench-sem-30    479519.00 (   0.00%)   471295.00 (  -1.72%)
Hmean     sembench-sem-48    533270.00 (   0.00%)   542525.00 (   1.74%)
Hmean     sembench-sem-79    510218.00 (   0.00%)   528392.00 (   3.56%)

Unsurprisingly, the thread case shows no overhead (and yes, it even
does better at times, but that is still noise). Similarly, when
completely abusing the system and running 64*NCPUS threads, there is
pretty much no difference:

                   vanilla       dirty
User               1865.99     1819.75
System            35080.97    35396.34
Elapsed            3602.03     3560.50

So, at least for a large box, this patch hurts the cases where there is
low to medium cpu usage (no more than ~8 processes on a 40-core box) in
a non-trivial way. For more processes it doesn't matter. We can confirm
that the case for threads is irrelevant. While I'm not happy about the
30% regression, I guess we can live with this.

Manfred, any thoughts?

Thanks,
Davidlohr

[1] https://github.com/manfred-colorfu/ipcscale/blob/master/sem-lockunlock.c
[2] https://github.com/davidlohr/sembench-ng/blob/master/sembench.c
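
(As a footnote on [1]: the lock/unlock it times is, at its core, a
semop() decrement/increment pair on a single semaphore, roughly as
sketched below. This is an illustration of the pattern, not code taken
from the benchmark; it assumes the usual <sys/ipc.h>/<sys/sem.h>
includes and a semaphore initialized to 1, and busy_in/busy_out
presumably control short busy loops inside and outside the critical
section.)

    /* One benchmark-style iteration, given a semid from semget().
     * Error handling omitted for brevity. */
    static void lock_unlock_once(int semid)
    {
        struct sembuf lock   = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        struct sembuf unlock = { .sem_num = 0, .sem_op =  1, .sem_flg = 0 };

        semop(semid, &lock, 1);      /* P(): take the semaphore */
        /* critical section goes here */
        semop(semid, &unlock, 1);    /* V(): release the semaphore */
        /* non-critical work goes here */
    }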