Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp7961199rwi; Tue, 25 Oct 2022 00:06:02 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4uiE7lId3V31vf+n5mrspYt5eWvNjHMsYEg2I/vsFSV8us7npyZdG++LYLQFUSglvcf8c1 X-Received: by 2002:a17:906:99c5:b0:73d:70c5:1a4f with SMTP id s5-20020a17090699c500b0073d70c51a4fmr30033779ejn.302.1666681562240; Tue, 25 Oct 2022 00:06:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666681562; cv=none; d=google.com; s=arc-20160816; b=EWIc9hpv4B8XB9j5l0Ru0/8XnSDqR9yhMgKZZAWzuMZMVDZ9BwXH3tuOsg6PbNF5Eu FzeU2dnrSCfBPE7D9jIaqg7lEb313lTa86ydl/6TpK05Mzl1pGMFagAmSutfQiTHizpG g+o0Heyg5fLWSrQ0px7gVuoKpw1ot6B5wFn4viYyxjwBQZwRVOf00NlskDdQgUvcL+DX niL8OOEVnfAoXmnO4v8Y0ypEA4yUxCv0prFux+gXVFfSV5QNUZmEH27WE8Q5+hqM0IrR W7H9ShFn7tgkklaB3MxzZOlULbQ1LfPLA8njP9/DSHUWDHo8dWgjiH+b26Jxq3Ja1X66 ksJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=52Ackc+HIaGbbEoFOOYpWlvR8CA50d8AxFJw/tKl/6o=; b=em3VNk6/4QNn7kHxGstuH1uTH9jLvtasq2/9Hl9chl2K969pzoGl/13pGr2v/e2eru eavuq18AY/bUAfhNmenblPYxyyadCDtvkcmwMCZq6pN6DnGSUbU4Ge6jgPnxO4mZUjOF XfduuGaO9Y2keFKVDbhaOK9wPpfzYSpsH1iqlrzhHNiIjp3b86H0ZujzPXTh0ebaGMdH +WCWw2YgX2Ys5/jcspwWuJ9r7VipMTZ9S57DdDzByfpZBDPlOzwSXXKQk4fuoz32N8i4 yfcifbsO89eAbtAM9eQehcshYaL2cHaj32shiqzBKiMVIfgnODTVs9PRI9BmMnSvWMuN QdEg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=NaYQSyho; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ds9-20020a170907724900b0077547abf08fsi2321673ejc.169.2022.10.25.00.05.37; Tue, 25 Oct 2022 00:06:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=NaYQSyho; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231378AbiJYGsw (ORCPT + 99 others); Tue, 25 Oct 2022 02:48:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230327AbiJYGsq (ORCPT ); Tue, 25 Oct 2022 02:48:46 -0400 Received: from mail-yb1-xb32.google.com (mail-yb1-xb32.google.com [IPv6:2607:f8b0:4864:20::b32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA4EC148FE1 for ; Mon, 24 Oct 2022 23:48:44 -0700 (PDT) Received: by mail-yb1-xb32.google.com with SMTP id m125so304633ybb.6 for ; Mon, 24 Oct 2022 23:48:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=52Ackc+HIaGbbEoFOOYpWlvR8CA50d8AxFJw/tKl/6o=; b=NaYQSyho5xlX02NRoH9tbD+t+kKtnlqtqMXq5Ed43u5oNQu94haBkDyUViIk6HGKYh XMV2Q36m3/TVoduSMKWQ3xdPW/hGah60R1EdncGCUnUeuTn4WNyl3H/cM/8INSifyHRs CXEVqiaozScxXpXRyq/HNKNYVX0PCLSEP9W62KfSqYfuJHuwzrYjziaCk0Wz5b+y+dAv W7nQQ23hG2BVLPIYV8szZ8kah2yr+d24dF7ZLwxrOpo6F+Q/n+QXkVhWVH/xc2kumZsd UO3IRB3HRtmddGJgFzy5cQ4cCwIZv+yloKpSGIrGozuGth3FwuI++loODE/glk9X0mny ayyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=52Ackc+HIaGbbEoFOOYpWlvR8CA50d8AxFJw/tKl/6o=; b=N23KiHm+P1xwtK2Z2t3BDL1BNHphuOZyBUygSnZIUcbjRJ+NGRU6wZp8bLPafsY0Ww rVskHw7JpwepePqgbbL59Iu+pFPZAObKwxwb8mWiRWAPEFFwZvjwlWSOazbQweASeE6l D5OfSnk1O3sq5B7/bMlxZRcfWq02GrhO5NB8FYvyU6DBtGYOqvAQO6S8VI7RGTTGOSTF 1EYkf49sCvIOot65ZocAO/QhYVbca4SMnwa7J5y9RoOJnK3Aj3tiGDoBu9hVj7nxO8Fm WZFqg4NxtFW8ZoIaMx67fRMucHPAaVS+6q0diFbpeLr8XHzgdoaSKzAjP/ZvkSA55sde 4kdg== X-Gm-Message-State: ACrzQf2T2rAxjEqJfchZn9F9nB043wvF7Z+RrKDOGUuIi1ffcaRsAi0k m7tpshpV42ZerjTY+KeKPd7ZkQjdiGw/SCYM7jY= X-Received: by 2002:a25:81d2:0:b0:6ca:1fc9:de7c with SMTP id n18-20020a2581d2000000b006ca1fc9de7cmr23655368ybm.13.1666680524020; Mon, 24 Oct 2022 23:48:44 -0700 (PDT) MIME-Version: 1.0 References: <20221020153540.8226-1-ubizjak@gmail.com> <220dabe6-dac1-8ca7-0134-e6e89c923d81@nvidia.com> In-Reply-To: <220dabe6-dac1-8ca7-0134-e6e89c923d81@nvidia.com> From: Uros Bizjak Date: Tue, 25 Oct 2022 08:48:32 +0200 Message-ID: Subject: Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head To: Chaitanya Kulkarni Cc: "linux-nvme@lists.infradead.org" , "linux-kernel@vger.kernel.org" , Christoph Hellwig , Sagi Grimberg Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 25, 2022 at 2:47 AM Chaitanya Kulkarni wrote: > > On 10/20/22 08:35, Uros Bizjak wrote: > > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in > > nvmet_update_sq_head. x86 CMPXCHG instruction returns success in ZF flag, so > > this change saves a compare after cmpxchg (and related move instruction in > > front of cmpxchg). > > > > Is it worth a share delts of assembly instructions of the changes above? > as developers on block mailing list are sharing the delta between before > and after patch including the assembly. The difference in the assembly of nvmet_update_sq_head function is: before: 0000000000001d30 : 1d30: 48 8b 4f 10 mov 0x10(%rdi),%rcx 1d34: 49 89 f8 mov %rdi,%r8 1d37: 0f b7 71 1a movzwl 0x1a(%rcx),%esi 1d3b: 66 85 f6 test %si,%si 1d3e: 75 14 jne 1d54 1d40: 49 8b 40 08 mov 0x8(%r8),%rax 1d44: 8b 51 1c mov 0x1c(%rcx),%edx 1d47: 66 89 50 08 mov %dx,0x8(%rax) 1d4b: e9 00 00 00 00 jmpq 1d50 1d4c: R_X86_64_PLT32 __x86_return_thunk-0x4 1d50: 0f b7 71 1a movzwl 0x1a(%rcx),%esi 1d54: 8b 79 1c mov 0x1c(%rcx),%edi 1d57: 31 d2 xor %edx,%edx 1d59: 8d 47 01 lea 0x1(%rdi),%eax 1d5c: f7 f6 div %esi 1d5e: 89 f8 mov %edi,%eax 1d60: f0 0f b1 51 1c lock cmpxchg %edx,0x1c(%rcx) 1d65: 49 8b 48 10 mov 0x10(%r8),%rcx 1d69: 39 c7 cmp %eax,%edi 1d6b: 75 e3 jne 1d50 1d6d: 49 8b 40 08 mov 0x8(%r8),%rax 1d71: 8b 51 1c mov 0x1c(%rcx),%edx 1d74: 66 89 50 08 mov %dx,0x8(%rax) 1d78: e9 00 00 00 00 jmpq 1d7d 1d79: R_X86_64_PLT32 __x86_return_thunk-0x4 1d7d: after: 0000000000001d30 : 1d30: 48 8b 4f 10 mov 0x10(%rdi),%rcx 1d34: 0f b7 51 1a movzwl 0x1a(%rcx),%edx 1d38: 66 85 d2 test %dx,%dx 1d3b: 74 1e je 1d5b 1d3d: 8b 71 1c mov 0x1c(%rcx),%esi 1d40: 44 0f b7 c2 movzwl %dx,%r8d 1d44: 8d 46 01 lea 0x1(%rsi),%eax 1d47: 31 d2 xor %edx,%edx 1d49: 41 f7 f0 div %r8d 1d4c: 89 f0 mov %esi,%eax 1d4e: f0 0f b1 51 1c lock cmpxchg %edx,0x1c(%rcx) 1d53: 48 8b 4f 10 mov 0x10(%rdi),%rcx 1d57: 89 c6 mov %eax,%esi 1d59: 75 10 jne 1d6b 1d5b: 48 8b 47 08 mov 0x8(%rdi),%rax 1d5f: 8b 51 1c mov 0x1c(%rcx),%edx 1d62: 66 89 50 08 mov %dx,0x8(%rax) 1d66: e9 00 00 00 00 jmpq 1d6b 1d67: R_X86_64_PLT32 __x86_return_thunk-0x4 1d6b: 0f b7 51 1a movzwl 0x1a(%rcx),%edx 1d6f: eb cf jmp 1d40 1d71: You can see that in addition to the smaller size of the function, the load of req->sq->size at 1d6b got moved to a cold path. As the main benefit, the load at 1d3d is now out of the loop, and the value in %esi is now provided by cmpxchg insn itself at 1d4e (plus move at 1d57). Unfortunately, the division clobbers %eax, so some reg-reg moves are necessary. Note also that the compare at 1d69 is now gone. > I also hope that you have tested this change with blktests nvme. No, I didn't test the patch that thoroughly, but the change is the same as some similar recent changes in the generic code, so I confirmed the patch by inspecting asm code. OTOH, the kernel booted from a NVME SSD. > Either way:- > > Reviewed-by: ChaItanya Kulkarni Thanks! Uros.