From: Hideaki Kimura
Date: Wed, 26 Aug 2015 15:53:26 -0700
To: Jason Low, Oleg Nesterov, Andrew Morton
CC: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, "Paul E. McKenney", linux-kernel@vger.kernel.org, Frederic Weisbecker, Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton
Subject: Re: [PATCH 0/3] timer: Improve itimers scalability
Message-ID: <55DE4366.9080104@hpe.com>
In-Reply-To: <1440626847.23728.122.camel@j-VirtualBox>

Sure, let me elaborate.

Executive summary:
  Yes, enabling a process-wide timer on such a large machine is not wise, but sometimes users/applications cannot avoid it.

The issue was actually observed not in the database itself but in a common library it links to: gperftools.

The database itself is optimized for many cores/sockets, so of course it avoids process-wide timers and other unscalable things. It just links to libprofiler for an optional feature that profiles performance bottlenecks only when the user turns it on.
We of course avoid turning the feature on except while we debug/tune the database.

However, libprofiler sets the timer even when the client program doesn't invoke any of its functions: libprofiler does it when the shared library is loaded. We asked the developer of libprofiler to change the behavior, but it seems there is a reason to keep it:
  https://code.google.com/p/gperftools/issues/detail?id=133

Based on this, I think there are two reasons why we should ameliorate this issue in the kernel layer.

1. In this particular case, it's hard to prevent or even detect the issue in user space.

We (a team of low-level database and kernel experts) in fact spent a huge amount of time just figuring out what the bottleneck was, because nothing measurable happens in user space. I pulled out countless hairs.

Also, the user has to de-link the library from the application to prevent the itimer installation. Imagine a case where the software is proprietary. It won't fly.

2. This is just one example. There could be many other binaries/libraries that do similar things somewhere in a complex software stack.

Today we haven't heard of many such cases, but people will start hitting it once hundreds to thousands of cores become common.

After applying this patchset, we have observed that the performance hit almost completely went away, at least on 240 cores. So, it's quite beneficial in the real world.

--
Hideaki Kimura
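
For illustration only, here is a minimal sketch of the pattern described in this message: code that arms a process-wide profiling itimer as a side effect of being loaded, plus a check for whether such a timer is currently armed. It uses Python's standard `signal` module (Unix only). The function names are hypothetical, and the arming function is only a stand-in for what a profiling library might do at load time; it is not libprofiler's actual code.

```python
import signal


def arm_process_wide_timer(interval_s=0.01):
    """Hypothetical stand-in for a library arming ITIMER_PROF at load time."""
    # The default action for SIGPROF terminates the process, so ignore it here.
    signal.signal(signal.SIGPROF, signal.SIG_IGN)
    # Arm a repeating timer that counts CPU time consumed by the whole process.
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)


def prof_timer_armed():
    """Return True if a process-wide ITIMER_PROF is currently armed."""
    remaining, interval = signal.getitimer(signal.ITIMER_PROF)
    return remaining > 0.0 or interval > 0.0


def disarm_process_wide_timer():
    """Disarm ITIMER_PROF by setting its value to zero."""
    signal.setitimer(signal.ITIMER_PROF, 0.0)


if __name__ == "__main__":
    print("armed before:", prof_timer_armed())
    arm_process_wide_timer()
    print("armed after:", prof_timer_armed())
    disarm_process_wide_timer()
```

A check like `prof_timer_armed()` only works from inside the affected process, which is part of why the problem is hard to spot from the outside.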