From: Wen Gu
To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc:
wintera@linux.ibm.com, schnelle@linux.ibm.com, gbayer@linux.ibm.com, pasic@linux.ibm.com, alibuda@linux.alibaba.com, tonylu@linux.alibaba.com, dust.li@linux.alibaba.com, guwen@linux.alibaba.com, linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v4 00/18] net/smc: implement virtual ISM extension and loopback-ism
Date: Sun, 24 Sep 2023 23:16:35 +0800
Message-Id: <1695568613-125057-1-git-send-email-guwen@linux.alibaba.com>

Hi, all

# Background

SMC-D is currently used on IBM Z with the ISM function to optimize the
network interconnect for intra-CPC communication. Inspired by this, we
try to make SMC-D available on non-s390 architectures through a
software-simulated virtual ISM device, such as the loopback-ism device
introduced here, to accelerate inter-process or inter-container
communication within the same OS.

# Design

This patch set includes 4 parts:

 - Patch #1-#3: decouple ISM device hard code from SMC-D stack.
 - Patch #4-#8: implement virtual ISM extension defined in SMCv2.1.
 - Patch #9-#13: implement loopback-ism device.
 - Patch #14-#18: memory copy optimization for the case using loopback.

The loopback-ism device is designed as a kernel device and is not limited
to a specific net namespace. Both ends of an inter-process connection
(1/1' in the diagram below) or an inter-container connection (2/2' in the
diagram below) will find that the peer shares the same loopback-ism
device during the CLC handshake.
Then the loopback-ism device will be chosen.

Container 1 (ns1)                              Container 2 (ns2)
+-----------------------------------------+    +-------------------------+
| +-------+      +-------+      +-------+ |    |        +-------+        |
| | App A |      | App B |      | App C | |    |        | App D |<-+     |
| +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
|     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
|  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
|     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
+---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
             |   |           |                                  |
Kernel       |   |           |                                  |
+----+-------v---+-----------v----------------------------------+---+----+
|    |                             TCP                              |    |
|    |                                                              |    |
|    +--------------------------------------------------------------+    |
|                                                                        |
|                           +--------------+                             |
|                           | smc loopback |                             |
+---------------------------+--------------+-----------------------------+

The loopback-ism device allocates RMBs and sndbufs for each connection
peer and 'moves' data from the sndbuf at one end to the RMB at the other
end. Since communication occurs within the same kernel, the sndbuf can be
mapped to the peer RMB so that the data copy in the loopback-ism case can
be avoided.
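As a rough illustration of the two data paths just described, here is a
userspace C sketch. It is not code from this patch set: struct demo_buf
and the demo_* helpers are hypothetical names chosen for the example, and
real SMC-D buffers also involve cursors and locking that are left out.
demo_move_data() models the copy path, while demo_attach() models mapping
the sndbuf onto the peer RMB, so a write to the sndbuf is already visible
in the RMB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of one buffer; not a kernel structure. */
struct demo_buf {
	uint8_t *data;	/* backing memory */
	size_t len;	/* size of the buffer in bytes */
	int borrowed;	/* nonzero if data aliases memory owned elsewhere */
};

/* Allocate a zeroed buffer that owns its memory. Returns 0 on success. */
static int demo_buf_alloc(struct demo_buf *b, size_t len)
{
	b->data = calloc(1, len);
	b->len = b->data ? len : 0;
	b->borrowed = 0;
	return b->data ? 0 : -1;
}

/* Release a buffer; borrowed memory is left to its owner. */
static void demo_buf_free(struct demo_buf *b)
{
	if (!b->borrowed)
		free(b->data);
	b->data = NULL;
	b->len = 0;
	b->borrowed = 0;
}

/* Copy path: copy len bytes from src into the peer RMB at offset off. */
static int demo_move_data(struct demo_buf *rmb, size_t off,
			  const void *src, size_t len)
{
	if (!rmb->data || off > rmb->len || len > rmb->len - off)
		return -1;	/* would write outside the RMB */
	memcpy(rmb->data + off, src, len);
	return 0;
}

/* Mapped path: make the sndbuf alias the peer RMB so no copy is needed. */
static int demo_attach(struct demo_buf *sndbuf,
		       const struct demo_buf *peer_rmb)
{
	assert(sndbuf != peer_rmb);
	if (!peer_rmb->data)
		return -1;
	if (!sndbuf->borrowed)
		free(sndbuf->data);	/* drop the private sndbuf memory */
	sndbuf->data = peer_rmb->data;
	sndbuf->len = peer_rmb->len;
	sndbuf->borrowed = 1;
	return 0;
}
```

In this model the copy path costs one memcpy() per send, whereas after
demo_attach() the sender writes directly into the memory the receiver
reads, which is the effect the sndbuf-to-RMB mapping aims for.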

Container 1 (ns1)                              Container 2 (ns2)
+-----------------------------------------+    +-------------------------+
| +-------+      +-------+      +-------+ |    |        +-------+        |
| | App A |      | App B |      | App C | |    |        | App D |        |
| +-------+      +--^----+      +-------+ |    |        +---^---+        |
|       |           |               |     |    |            |            |
|   (1) |      (1') |           (2) |     |    |       (2') |            |
|       |           |               |     |    |            |            |
+-------|-----------|---------------|-----+    +------------|------------+
        |           |               |                       |
Kernel  |           |               |                       |
+-------|-----------|---------------|-----------------------|------------+
| +-----v-+      +-------+      +---v---+               +-------+        |
| | snd A |-+    | RMB B |<--+  | snd C |-+          +->| RMB D |        |
| +-------+ |    +-------+   |  +-------+ |          |  +-------+        |
| +-------+ |    +-------+   |  +-------+ |          |  +-------+        |
| | RMB A | |    | snd B |   |  | RMB C | |          |  | snd D |        |
| +-------+ |    +-------+   |  +-------+ |          |  +-------+        |
|           |               +-------------v+         |                   |
|           +-------------->| smc loopback |---------+                   |
+---------------------------+--------------+-----------------------------+

# Benchmark Test

 * Test environments:
      - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
      - SMC sndbuf/RMB size 1MB.

 * Test object:
      - TCP: run on TCP loopback.
      - domain: run on UNIX domain.
      - SMC lo: run on SMC loopback device.

1. ipc-benchmark (see [1])

 - ./ -c 1000000 -s 100

                            TCP               SMC-lo
Message rate (msg/s)      81539      151251(+85.50%)

2. sockperf

 - serv: taskset -c sockperf sr --tcp
 - clnt: taskset -c sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30

                            TCP               SMC-lo
Bandwidth(MBps)         5313.66     8270.51(+55.65%)
Latency(us)               5.806       3.207(-44.76%)

3. nginx/wrk

 - serv: nginx
 - clnt: wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80

                            TCP               SMC-lo
Requests/s            194641.79   258656.13(+32.89%)

4. redis-benchmark

 - serv: redis-server
 - clnt: redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024

                            TCP               SMC-lo
GET(Requests/s)        85855.34   115640.35(+34.69%)
SET(Requests/s)        86337.15   118203.30(+36.90%)

[1] https://github.com/goldsborough/ipc-bench

v3->v4:
 - Fix build warning of patch#12 about:
   dmb_node->dma_addr = (dma_addr_t)dmb_node->cpu_addr;
 - Replace kzalloc with vzalloc to alloc DMB in loopback-ism, which
   may cause throughput or QPS to drop by 2% to 8%. see:
   https://lore.kernel.org/netdev/d6facfd5-e083-ffc7-05e5-2e8f3ef17735@linux.alibaba.com/
 - Add SMC_LO in Kconfig to turn on/off loopback-ism.
 - Make extension GID cleaner.

v2->v3:
 - Fix build warning of patch#1 and patch#10.

v1->v2:
 - Fix build error on s390 arch.

Wen Gu (18):
  net/smc: decouple ism_dev from SMC-D device dump
  net/smc: decouple ism_dev from SMC-D DMB registration
  net/smc: extract v2 check helper from SMC-D device registration
  net/smc: support SMCv2.x supplemental features negotiation
  net/smc: reserve CHID range for SMC-D virtual device
  net/smc: extend GID to 128bits only for virtual ISM device
  net/smc: disable SEID on non-s390 architecture
  net/smc: enable virtual ISM device feature bit
  net/smc: introduce SMC-D loopback device
  net/smc: implement ID-related operations of loopback
  net/smc: implement some unsupported operations of loopback
  net/smc: implement DMB-related operations of loopback
  net/smc: register loopback device as SMC-Dv2 device
  net/smc: add operation for getting DMB attribute
  net/smc: add operations for DMB attach and detach
  net/smc: avoid data copy from sndbuf to peer RMB in SMC-D
  net/smc: modify cursor update logic when sndbuf mapped to RMB
  net/smc: add interface implementation of loopback device

 drivers/s390/net/ism_drv.c    |  20 +-
 include/net/smc.h             |  32 ++-
 include/uapi/linux/smc.h      |   3 +
 include/uapi/linux/smc_diag.h |   2 +
 net/smc/Kconfig               |  13 ++
 net/smc/Makefile              |   2 +-
 net/smc/af_smc.c              |  88 ++++++--
 net/smc/smc.h                 |   7 +
 net/smc/smc_cdc.c             |  56 ++++-
 net/smc/smc_cdc.h             |   1 +
 net/smc/smc_clc.c             |  64 ++++--
 net/smc/smc_clc.h             |  10 +-
 net/smc/smc_core.c            | 111 +++++++++-
 net/smc/smc_core.h            |   9 +-
 net/smc/smc_diag.c            |  11 +-
 net/smc/smc_ism.c             | 100 ++++++---
 net/smc/smc_ism.h             |  24 ++-
 net/smc/smc_loopback.c        | 489 ++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_loopback.h        |  54 +++++
 net/smc/smc_pnet.c            |   4 +-
 20 files changed, 996 insertions(+), 104 deletions(-)
 create mode 100644 net/smc/smc_loopback.c
 create mode 100644 net/smc/smc_loopback.h

-- 
1.8.3.1