Date: Mon, 8 Jun 2015 09:34:37 -0400
From: Jeff Layton
To: Trond Myklebust
Cc: Shirley Ma, "J. Bruce Fields", Linux NFS Mailing List
Subject: Re: [RFC PATCH V3 0/7] nfsd/sunrpc: prepare nfsd to add workqueue support
Message-ID: <20150608093437.3bb9b028@synchrony.poochiereds.net>
References: <557525DD.6070300@oracle.com>

On Mon, 8 Jun 2015 09:15:28 -0400
Trond Myklebust wrote:

> On Mon, Jun 8, 2015 at 1:19 AM, Shirley Ma wrote:
> > This patchset was originally written by Jeff Layton to add support
> > for a workqueue-based nfsd. I am helping with stability testing and
> > performance analysis. Some workloads benefit from the global
> > threading mode, and some benefit from workqueue mode; I am still
> > investigating how to make workqueue mode perform as well as global
> > threading mode. I am splitting the patchset into two parts: one
> > that prepares nfsd for workqueue mode, and one that adds workqueue
> > mode. The test results show that the first part doesn't cause much
> > performance change; the results are within run-to-run variation.
>
> As stated in the original emails, Primary Data's internal testing of
> these patches showed that there is a significant difference. We had
> 48 virtual clients running on 7 ESX hypervisors with 10GigE NICs
> against a hardware NFSv3 server with a 40GigE NIC. The clients were
> doing 4k aio/dio reads+writes in a 70/30 mix.
> At the time, we saw roughly a 50% decrease, with measured standard
> deviations of the order of a few percent, when comparing performance
> as measured in IOPS between the existing code and the workqueue code.
>
> Testing showed the workqueue performance was relatively improved
> when we upped the block size to 256k (with lower IOPS counts). That
> would indicate that the workqueues are failing to scale correctly
> for the high-IOPS (i.e. high thread count) case.
>
> Trond

Yes. Also, I did some work with tracepoints to see if I could figure
out where the extra latency was coming from. At the time, the data I
collected showed that the workqueue server was processing incoming
frames _faster_ than the thread-based one. So my (untested) theory is
that the workqueue server is somehow adding latency to interrupt
processing for the network card or disk.

The test I was using at the time was just to dd a bunch of sequential
reads and writes to a file. I figured that was as simple a test case
as you could get...

In any case, the patches that Shirley is proposing here don't actually
add any of the workqueue-based server code. They are all just
preparatory patches to add an ops structure to the svc_serv and move a
bunch of disparate function pointers into it. I think that's a
reasonable set to take, even if we don't take the workqueue code
itself, as it cleans up the RPC server API considerably (IMNSHO).

-- 
Jeff Layton
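
[Editor's note: the "ops structure" refactoring described above follows
a common kernel pattern. Below is a minimal, illustrative sketch of the
shape of such a change; the field and callback names are hypothetical
and are not taken from the actual patches:

/*
 * Illustrative sketch only -- not the actual patch contents.
 * The idea is to collect the per-service callbacks that were
 * previously passed around as individual function pointers into
 * one ops table hung off the svc_serv.
 */
struct svc_serv;	/* forward declarations */
struct net;
struct module;

struct svc_serv_ops {
	/* shut the service down (hypothetical callback) */
	void		(*svo_shutdown)(struct svc_serv *serv,
					struct net *net);
	/* main function each service thread runs (hypothetical) */
	int		(*svo_function)(void *data);
	/* module that owns the service (hypothetical) */
	struct module	*svo_module;
};

struct svc_serv {
	/* ...existing fields elided... */
	const struct svc_serv_ops *sv_ops;	/* per-service callbacks */
};

Callers would then go through serv->sv_ops rather than carrying each
pointer separately, which is what later makes it possible to swap in a
workqueue-backed implementation without touching every call site.]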
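[Editor's note: as background, "workqueue mode" here means replacing
the fixed pool of nfsd threads with work items on a kernel workqueue.
A hand-wavy sketch of the general shape, using the stock workqueue
API; svc_handle_xprt() and svc_wq are hypothetical stand-ins, not the
actual patch code:

#include <linux/workqueue.h>

struct svc_xprt;			/* opaque transport handle */

static struct workqueue_struct *svc_wq;	/* hypothetical, e.g. created
					 * with alloc_workqueue("svc",
					 * WQ_UNBOUND, 0) at init time */

/* hypothetical helper that receives and processes one request */
static void svc_handle_xprt(struct svc_xprt *xprt);

/* Hypothetical per-transport work item (sketch only). */
struct svc_xprt_work {
	struct work_struct	work;
	struct svc_xprt		*xprt;
};

static void svc_xprt_work_fn(struct work_struct *work)
{
	struct svc_xprt_work *xw =
		container_of(work, struct svc_xprt_work, work);

	svc_handle_xprt(xw->xprt);
}

/* Instead of waking a dedicated nfsd thread, queue the transport to
 * an unbound workqueue and let the kernel scale the worker pool. */
static void svc_queue_xprt(struct svc_xprt_work *xw)
{
	INIT_WORK(&xw->work, svc_xprt_work_fn);
	queue_work(svc_wq, &xw->work);
}

An unbound workqueue lets the scheduler place workers on any CPU,
which is the kind of behavior that could plausibly interact with IRQ
processing in the way Jeff speculates above.]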