Date: Mon, 23 Jul 2018 14:25:31 +0200
From: Dominique Martinet
To: Greg Kurz
Cc: Matthew Wilcox, v9fs-developer@lists.sourceforge.net, Latchesar Ionkov,
    Eric Van Hensbergen, Ron Minnich, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests
Message-ID: <20180723122531.GA9773@nautica>
References: <20180711210225.19730-1-willy@infradead.org>
 <20180711210225.19730-6-willy@infradead.org>
 <20180718100554.GA21781@nautica>
 <20180723135220.08ec45bf@bahia>
In-Reply-To: <20180723135220.08ec45bf@bahia>

Greg Kurz wrote on Mon, Jul 23, 2018:
> The patch is quite big and I'm not sure I can find time to review it
> carefully, but I'll try to help anyway.
No worries, thanks for this already.

> > Sorry for coming back to this patch now, I just noticed something that's
> > actually probably a fairly big hit on performance...
> >
> > While the slab is just as good as the array for the request itself, this
> > makes every single request allocate "fcalls" everytime instead of
> > reusing a cached allocation.
> > The default msize is 8k and these allocs probably are fairly efficient,
> > but some transports like RDMA allow to increase this to up to 1MB... And
>
> It can be even bigger with virtio:
>
> #define VIRTQUEUE_NUM	128
>
> 	.maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
>
> On a typical ppc64 server class setup with 64KB pages, this is nearly 8MB.

I don't think I'll be able to test 64KB pages, and it's "just" 500k with
4K pages, so I'll go with IB.
I just finished reinstalling my IB-enabled VMs; next is getting some iops
test running (dbench maybe) so I can get figures for the different models
and evaluate their impact.

> > One thing is that the buffers are all going to be the same size for a
> > given client (.... except virtio zc buffers, I wonder what I'm missing
> > or why that didn't blow up before?)
>
> ZC allocates a 4KB buffer, which is more than enough to hold the 7-byte 9P
> header and the "dqd" part of all messages that may use ZC, ie, 16 bytes.
> So I'm not sure to catch what could blow up.

ZC requests won't blow up, but from what I can see with the current (old)
request cache array, if a ZC request uses a not-yet-used tag it will
allocate a new 4k buffer, and if a normal request then reuses that tag it
will get that 4k buffer instead of an msize-sized one.
On the client side the request would be posted with req->rc->capacity,
which would correctly be 4k, but I'm not sure what would happen if qemu
tried to write more than that size into the request.

> > It's a shame because I really like that patch, I'll try to find time to
> > run some light benchmark with varying msizes eventually but I'm not sure
> > when I'll find time for that... Hopefully before the 4.19 merge window!
> >
>
> Yeah, the open-coded cache we have now really obfuscates things.
>
> Maybe have a per-client kmem_cache object for non-ZC requests with
> size msize [*], and a global kmem_cache object for ZC requests with
> fixed size P9_ZC_HDR_SZ.
>
> [*] the server can require a smaller msize during version negotiation,
> so maybe we should change the kmem_cache object in this case.

Yeah, if we're going to want to accommodate non-power-of-two buffers, I
think we'll need a separate kmem_cache for them.
The ZC requests could be made exactly 4k and those could come from a
regular kmalloc just fine; it looks like trying to create a kmem_cache of
that size would just return the same cache kmalloc uses anyway, so it's
probably easier to fall back to kmalloc when the requested allocation
size doesn't match what we were hoping for.

I'll try to get figures for the various approaches before the 4.19 merge
window opens, but it's getting close...

-- 
Dominique
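
A minimal sketch of the idea floated above (a per-client kmem_cache for
msize-sized fcall buffers, with plain kmalloc as the fallback for the
fixed 4k zero-copy buffers), not the actual patch; the names
p9_client_bufs, p9_fcall_alloc and p9_fcall_free are invented for the
example. For scale, the virtio cap quoted above works out to
PAGE_SIZE * (VIRTQUEUE_NUM - 3) = 4096 * 125 = 512000 bytes with 4K pages
(the "500k" mentioned) and 65536 * 125 = 8192000 bytes, i.e. nearly 8MB,
with 64KB pages.

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/types.h>

/*
 * Hypothetical sketch: per-client cache for msize-sized fcall buffers,
 * plain kmalloc for anything else (e.g. the fixed 4k ZC buffers).
 */
struct p9_client_bufs {
	struct kmem_cache *fcall_cache;
	size_t msize;
};

static int p9_bufs_init(struct p9_client_bufs *c, size_t msize)
{
	c->msize = msize;
	/*
	 * msize can be large (up to ~500k with virtio, 1MB with RDMA),
	 * so the point of the cache is to reuse buffers instead of
	 * doing a large allocation for every request.
	 */
	c->fcall_cache = kmem_cache_create("9p-fcall", msize, 0, 0, NULL);
	return c->fcall_cache ? 0 : -ENOMEM;
}

static void *p9_fcall_alloc(struct p9_client_bufs *c, size_t size)
{
	/* msize-sized buffers come from the per-client cache... */
	if (size == c->msize)
		return kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
	/* ...everything else (e.g. 4k ZC buffers) falls back to kmalloc */
	return kmalloc(size, GFP_NOFS);
}

static void p9_fcall_free(struct p9_client_bufs *c, void *buf, size_t size)
{
	if (size == c->msize)
		kmem_cache_free(c->fcall_cache, buf);
	else
		kfree(buf);
}

static void p9_bufs_exit(struct p9_client_bufs *c)
{
	kmem_cache_destroy(c->fcall_cache);
}

Dispatching on the requested size keeps the 4k ZC buffers out of the
per-client cache without having to track which allocator produced each
buffer; per Greg's [*] note, a smaller msize negotiated with the server
would still mean recreating (or simply bypassing) the cache.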