Message-ID: <76b53c2d-5596-44da-b759-e5e94571d401@linux.ibm.com>
Date: Fri, 16 Feb 2024 15:09:42 +0100
Subject: Re: [PATCH net-next 00/15] net/smc: implement loopback-ism used by SMC-D
To: Wen Gu, wintera@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
 agordeev@linux.ibm.com, davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, jaka@linux.ibm.com, Gerd Bayer
Cc: borntraeger@linux.ibm.com, svens@linux.ibm.com, alibuda@linux.alibaba.com,
 tonylu@linux.alibaba.com, linux-s390@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <20240111120036.109903-1-guwen@linux.alibaba.com>
From: Wenjia Zhang
In-Reply-To: <20240111120036.109903-1-guwen@linux.alibaba.com>

On 11.01.24 13:00, Wen Gu wrote:
> This patch set acts as the second part of the new version of [1] (The first
> part can be referred from [2]), the updated things of this version are listed
> at the end.
>
> # Background
>
> SMC-D is now used in IBM z with ISM function to optimize network interconnect
> for intra-CPC communications. Inspired by this, we try to make SMC-D available
> on the non-s390 architecture through a software-implemented virtual ISM device,
> that is the loopback-ism device here, to accelerate inter-process or
> inter-containers communication within the same OS instance.
>
> # Design
>
> This patch set includes 3 parts:
>
>  - Patch #1-#2: some prepare work for loopback-ism.
>  - Patch #3-#9: implement loopback-ism device.
>  - Patch #10-#15: memory copy optimization for loopback scenario.
>
> The loopback-ism device is designed as a ISMv2 device and not be limited to
> a specific net namespace, ends of both inter-process connection (1/1' in diagram
> below) or inter-container connection (2/2' in diagram below) can find the same
> available loopback-ism and choose it during the CLC handshake.
>
> Container 1 (ns1)                              Container 2 (ns2)
> +-----------------------------------------+    +-------------------------+
> | +-------+      +-------+     +-------+  |    |        +-------+        |
> | | App A |      | App B |     | App C |  |    |        | App D |<-+     |
> | +-------+      +---^---+     +-------+  |    |        +-------+  |(2') |
> |     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
> |  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
> |     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
> +---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
>              |   |           |                                  |
> Kernel       |   |           |                                  |
> +----+-------v---+-----------v----------------------------------+---+----+
> |    |                             TCP                              |    |
> |    |                                                              |    |
> |    +--------------------------------------------------------------+    |
> |                                                                        |
> |                           +--------------+                             |
> |                           | smc loopback |                             |
> +---------------------------+--------------+-----------------------------+
>
> loopback-ism device creates DMBs (shared memory) for each connection peer.
> Since data transfer occurs within the same kernel, the sndbuf of each peer
> is only a descriptor and point to the same memory region as peer DMB, so that
> the data copy from sndbuf to peer DMB can be avoided in loopback-ism case.
>
> Container 1 (ns1)                              Container 2 (ns2)
> +-----------------------------------------+    +-------------------------+
> | +-------+                               |    |        +-------+        |
> | | App C |-----+                         |    |        | App D |        |
> | +-------+     |                         |    |        +-^-----+        |
> |               |                         |    |          |              |
> |           (2) |                         |    |     (2') |              |
> |               |                         |    |          |              |
> +---------------|-------------------------+    +----------|--------------+
>                 |                                         |
> Kernel          |                                         |
> +---------------|-----------------------------------------|--------------+
> | +--------+ +--v-----+                            +--------+ +--------+ |
> | |dmb_desc| |snd_desc|                            |dmb_desc| |snd_desc| |
> | +-----|--+ +--|-----+                            +-----|--+ +--------+ |
> | +-----|--+    |                                  +-----|--+            |
> | | DMB C  |    +----------------------------------| DMB D  |            |
> | +--------+                                       +--------+            |
> |                                                                        |
> |                           +--------------+                             |
> |                           | smc loopback |                             |
> +---------------------------+--------------+-----------------------------+
>
> # Benchmark Test
>
>  * Test environments:
>       - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
>       - SMC sndbuf/DMB size 1MB.
>       - /sys/devices/virtual/smc/loopback-ism/dmb_copy is set to default 0,
>         which means sndbuf and DMB are merged and no data copied between them.
>       - /sys/devices/virtual/smc/loopback-ism/dmb_type is set to default 0,
>         which means DMB is physically contiguous buffer.
>
>  * Test object:
>       - TCP: run on TCP loopback.
>       - SMC lo: run on SMC loopback device.
>
> 1. ipc-benchmark (see [3])
>
>  - ./ -c 1000000 -s 100
>
>                         TCP              SMC-lo
> Message
> rate (msg/s)          80636      149515(+85.42%)
>
> 2. sockperf
>
>  - serv: taskset -c sockperf sr --tcp
>  - clnt: taskset -c sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
>
>                         TCP              SMC-lo
> Bandwidth(MBps)     4909.36     8197.57(+66.98%)
> Latency(us)           6.098       3.383(-44.52%)
>
> 3. nginx/wrk
>
>  - serv: nginx
>  - clnt: wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
>
>                         TCP              SMC-lo
> Requests/s        181685.74   246447.77(+35.65%)
>
> 4. redis-benchmark
>
>  - serv: redis-server
>  - clnt: redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
>
>                         TCP              SMC-lo
> GET(Requests/s)    85855.34   118553.64(+38.09%)
> SET(Requests/s)    86824.40   125944.58(+45.06%)
>
>
> Change log:
>
> v1->RFC:
> - Patch #9: merge rx_bytes and tx_bytes as xfer_bytes statistics:
>   /sys/devices/virtual/smc/loopback-ism/xfer_bytes
> - Patch #10: add support_dmb_nocopy operation to check if SMC-D device supports
>   merging sndbuf with peer DMB.
> - Patch #13 & #14: introduce loopback-ism device control of DMB memory type and
>   control of whether to merge sndbuf and DMB. They can be respectively set by:
>   /sys/devices/virtual/smc/loopback-ism/dmb_type
>   /sys/devices/virtual/smc/loopback-ism/dmb_copy
>   The motivation for these two control is that a performance bottleneck was
>   found when using vzalloced DMB and sndbuf is merged with DMB, and there are
>   many CPUs and CONFIG_HARDENED_USERCOPY is set [4]. The bottleneck is caused
>   by the lock contention in vmap_area_lock [5] which is involved in memcpy_from_msg()
>   or memcpy_to_msg(). Currently, Uladzislau Rezki is working on mitigating the
>   vmap lock contention [6]. It has significant effects, but using virtual memory
>   still has additional overhead compared to using physical memory.
>   So this new version provides controls of dmb_type and dmb_copy to suit
>   different scenarios.
> - Some minor changes and comments improvements.
>
> RFC->old version([1]):
> Link: https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/
> - Patch #1: improve the loopback-ism dump, it shows as follows now:
>   # smcd d
>   FID  Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
>   0000 0     loopback-ism  ffff   No        0
> - Patch #3: introduce the smc_ism_set_v2_capable() helper and set
>   smc_ism_v2_capable when ISMv2 or virtual ISM is registered,
>   regardless of whether there is already a device in smcd device list.
> - Patch #3: loopback-ism will be added into /sys/devices/virtual/smc/loopback-ism/.
> - Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
>   to activate or deactivate the loopback-ism.
> - Patch #9: introduce the statistics of loopback-ism by
>   /sys/devices/virtual/smc/loopback-ism/{{tx|rx}_bytes|dmbs_cnt}.
> - Some minor changes and comments improvements.
>
> [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
> [2] https://lore.kernel.org/netdev/20231219142616.80697-1-guwen@linux.alibaba.com/
> [3] https://github.com/goldsborough/ipc-bench
> [4] https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
> [5] https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/
> [6] https://lore.kernel.org/all/20240102184633.748113-1-urezki@gmail.com/
>
> Wen Gu (15):
>   net/smc: improve SMC-D device dump for virtual ISM
>   net/smc: decouple specialized struct from SMC-D DMB registration
>   net/smc: introduce virtual ISM device loopback-ism
>   net/smc: implement ID-related operations of loopback-ism
>   net/smc: implement some unsupported operations of loopback-ism
>   net/smc: implement DMB-related operations of loopback-ism
>   net/smc: register loopback-ism into SMC-D device list
>   net/smc: introduce loopback-ism runtime switch
>   net/smc: introduce loopback-ism statistics attributes
>   net/smc: add operations to merge sndbuf with peer DMB
>   net/smc: attach or detach ghost sndbuf to peer DMB
>   net/smc: adapt cursor update when sndbuf and peer DMB are merged
>   net/smc: introduce loopback-ism DMB type control
>   net/smc: introduce loopback-ism DMB data copy control
>   net/smc: implement DMB-merged operations of loopback-ism
>
>  drivers/s390/net/ism_drv.c |   2 +-
>  include/net/smc.h          |   7 +-
>  net/smc/Kconfig            |  13 +
>  net/smc/Makefile           |   2 +-
>  net/smc/af_smc.c           |  28 +-
>  net/smc/smc_cdc.c          |  58 ++-
>  net/smc/smc_cdc.h          |   1 +
>  net/smc/smc_core.c         |  61 +++-
>  net/smc/smc_core.h         |   1 +
>  net/smc/smc_ism.c          |  71 +++-
>  net/smc/smc_ism.h          |   5 +
>  net/smc/smc_loopback.c     | 718 +++++++++++++++++++++++++++++++++++++
>  net/smc/smc_loopback.h     |  88 +++++
>  13 files changed, 1026 insertions(+), 29 deletions(-)
>  create mode 100644 net/smc/smc_loopback.c
>  create mode 100644 net/smc/smc_loopback.h
>

Hi Wen,

Thank you again for your patience!

You can find our comments under the respective patches. We still have
some thoughts about the file hierarchy in sysfs and the attribute names,
and we need to invest a bit more time in investigating them.

Thanks,
Gerd & Wenjia
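
For anyone following the sysfs discussion, the attributes named in the
quoted cover letter could be inspected with a small script along these
lines. This is only a sketch: it assumes the attribute paths exactly as
posted in this version of the series (active, dmb_type, dmb_copy,
xfer_bytes and dmbs_cnt under /sys/devices/virtual/smc/loopback-ism/),
and those may still change given the open questions about hierarchy and
names. The helper name read_loopback_ism_attrs and its base_dir parameter
are made up for illustration and are not part of the patch set.

```python
from pathlib import Path

# Paths and attribute names as quoted in the cover letter for this version
# of the series; they are an assumption for any other kernel or revision.
LOOPBACK_ISM_DIR = Path("/sys/devices/virtual/smc/loopback-ism")
ATTRS = ("active", "dmb_type", "dmb_copy", "xfer_bytes", "dmbs_cnt")


def read_loopback_ism_attrs(base_dir=LOOPBACK_ISM_DIR, attrs=ATTRS):
    """Return {attribute: text} for every attribute file that can be read.

    Hypothetical helper: missing or unreadable attributes are skipped, so
    an empty dict means no loopback-ism device was found under base_dir.
    """
    values = {}
    for name in attrs:
        try:
            values[name] = (Path(base_dir) / name).read_text().strip()
        except OSError:
            # Attribute absent, or not readable by this user: skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_loopback_ism_attrs().items():
        print(f"{name}: {value}")
```

Passing the directory as a parameter keeps the sketch usable if the
hierarchy ends up somewhere else after the review.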