Date: Fri, 1 Apr 2016 13:55:40 +0300
From: Vladimir Davydov
To: Peter Zijlstra
CC: Andrew Morton, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	David Rientjes, Johannes Weiner, Michal Hocko, ...
Subject: Re: [PATCH -mm v2 3/3] slub: make dead caches discard free slabs immediately
Message-ID: <20160401105539.GA6610@esperanza>
In-Reply-To: <20160401090441.GD12845@twins.programming.kicks-ass.net>
References: <6eecfafdc6c3dcbb98d2176cdebcb65abbc180b4.1422461573.git.vdavydov@parallels.com>
 <20160401090441.GD12845@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset="us-ascii"
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 01, 2016 at 11:04:41AM +0200, Peter Zijlstra wrote:
> On Wed, Jan 28, 2015 at 07:22:51PM +0300, Vladimir Davydov wrote:
> > +++ b/mm/slub.c
> > @@ -2007,6 +2007,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >  	int pages;
> >  	int pobjects;
> >  
> > +	preempt_disable();
> >  	do {
> >  		pages = 0;
> >  		pobjects = 0;
> > @@ -2040,6 +2041,14 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >  
> >  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
> >  								!= oldpage);
> > +	if (unlikely(!s->cpu_partial)) {
> > +		unsigned long flags;
> > +
> > +		local_irq_save(flags);
> > +		unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
> > +		local_irq_restore(flags);
> > +	}
> > +	preempt_enable();
> >  #endif
> >  }
> >  
> > @@ -3369,7 +3378,7 @@ EXPORT_SYMBOL(kfree);
> >   * being allocated from last increasing the chance that the last objects
> >   * are freed in them.
> >   */
> > -int __kmem_cache_shrink(struct kmem_cache *s)
> > +int __kmem_cache_shrink(struct kmem_cache *s, bool deactivate)
> >  {
> >  	int node;
> >  	int i;
> > @@ -3381,14 +3390,26 @@ int __kmem_cache_shrink(struct kmem_cache *s)
> >  	unsigned long flags;
> >  	int ret = 0;
> >  
> > +	if (deactivate) {
> > +		/*
> > +		 * Disable empty slabs caching. Used to avoid pinning offline
> > +		 * memory cgroups by kmem pages that can be freed.
> > +		 */
> > +		s->cpu_partial = 0;
> > +		s->min_partial = 0;
> > +
> > +		/*
> > +		 * s->cpu_partial is checked locklessly (see put_cpu_partial),
> > +		 * so we have to make sure the change is visible.
> > +		 */
> > +		kick_all_cpus_sync();
> > +	}
> 
> Argh! what the heck! and without a single mention in the changelog.

This function is only called when a memory cgroup is removed, which is
rather a rare event. I didn't think it would cause any pain. Sorry.

> Why are you spraying IPIs across the entire machine? Why isn't
> synchronize_sched() good enough, that would allow you to get rid of the
> local_irq_save/restore as well.

synchronize_sched() is slower. Calling it for every per-memcg kmem cache
would slow down cleanup on cgroup removal. The latter is async, so I'm
not sure whether that would actually be a problem, though. I think we
can try to replace kick_all_cpus_sync() with synchronize_sched() here
(a rough sketch is appended below).

Regarding local_irq_save/restore: synchronize_sched() wouldn't allow us
to get rid of them, because unfreeze_partials() must be called with irqs
disabled.

Come to think of it, kick_all_cpus_sync() is used as a memory barrier
here, to make sure that once it has finished, all cpus will use the new
->cpu_partial value. That makes me wonder if we could replace it with a
simple smp_mb(). I mean, this_cpu_cmpxchg(), which is used by
put_cpu_partial() to add a page to the per-cpu partial list, must issue
a full memory barrier (am I correct?), so we have two possibilities:

Case 1: smp_mb() is executed before this_cpu_cmpxchg() is called on
another cpu executing put_cpu_partial(). In this case, put_cpu_partial()
will see cpu_partial == 0 and hence call the second unfreeze_partials(),
flushing the per-cpu partial list.

Case 2: smp_mb() is executed after this_cpu_cmpxchg(). Then
__kmem_cache_shrink() -> flush_all() -> has_cpu_slab() should see
(thanks to the barriers) that there is a slab on the per-cpu list and so
flush it (provided it hasn't already been flushed by put_cpu_partial()).

In any case, after __kmem_cache_shrink() has finished, we are guaranteed
not to have any slabs on the per-cpu partial lists.

Does it make sense?

Thanks,
Vladimir
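
P.S. To make the synchronize_sched() idea concrete, here is a rough,
untested sketch of the deactivation path from the patch above with the
IPI broadcast swapped out. Treat it as illustration only; whether the
latency of synchronize_sched() is acceptable on this path would still
need measuring:

	if (deactivate) {
		/*
		 * Disable caching of empty slabs so that pages pinning a
		 * dead memory cgroup can actually be freed.
		 */
		s->cpu_partial = 0;
		s->min_partial = 0;

		/*
		 * put_cpu_partial() runs with preemption disabled, i.e.
		 * inside an RCU-sched read-side critical section.  By the
		 * time synchronize_sched() returns, every in-flight
		 * put_cpu_partial() that could have seen the old
		 * ->cpu_partial value has completed, and any later caller
		 * will observe cpu_partial == 0 and unfreeze its partial
		 * list right away - no IPI to every cpu required.
		 */
		synchronize_sched();
	}

The local_irq_save/restore pair in put_cpu_partial() would stay as is,
since unfreeze_partials() still has to run with irqs disabled.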