Received: by 2002:a05:7208:9594:b0:7e:5202:c8b4 with SMTP id gs20csp1077821rbb; Sun, 25 Feb 2024 19:05:31 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCX+77hugxF6rnrV4Hxh0uKi/dyVl1mW8Z20bJVb+gpAAfxXhbbkFcU1mY3KW5/5HGgkj6a3cnWm4qfgMMfvf5ukxUAJ2KqRUpHeiHrrlw== X-Google-Smtp-Source: AGHT+IHoJqpBV+BkT+gfZkL9JLm00CPBSm5daQvRSq4/MLQrSR440wJvfKZvJHpI5CaNp1VkBr7g X-Received: by 2002:a62:cd4e:0:b0:6e3:79ba:6eed with SMTP id o75-20020a62cd4e000000b006e379ba6eedmr7021314pfg.13.1708916731619; Sun, 25 Feb 2024 19:05:31 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708916731; cv=pass; d=google.com; s=arc-20160816; b=Bjo216EXAxbMlDkrIzUo1GPj0Hq4ntuWUbgNPWW/I80fUkcS/UBdmZ0c09etZgPiq2 AlC17RitDCoMZommGH/+8gAlKzjgysbYn0VRihhM7sPeyvhBP9TkHUPJZ50bN1I5J30x C/vk6vg1Qc9j20ryZVB+t2Y96EFFpCBg10gcq1rcol/hIVd7drNo0tQMs+2FRD8a0WnH 8USgkgPcLJz0Q9ByYhI7bloz9fNtFWCZ7nI31Ua9Via6j/8cCCibFVv36jZfgvNd8kPp 5r2hdnzIVOAOTqO3jk9rQ7Vpo4L+7vWu9v4gN3auOPzrWNkEp5gnwPFGDcexcITRsogA 1hNQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:list-unsubscribe:list-subscribe:list-id :precedence:date:message-id:dkim-signature; bh=8fHEj3Es1WjucFLsLgdW49Mkhy0Q0HKu/7afnZ+AaHQ=; fh=R2LJav7NUBRX2CoWPDckhn3zRz8dtV6E4Qed6PygHq8=; b=MhobsXxE9y2ANAFV98/UlhcS4sO175mQu44KE0VYa5vSzRKG3SHoWtsGtZLkbvDFby SK//VS6FKITXFmeMtiulYy1uXLZQg3OL8N1j40sBGFiuasvwnKU4JWdgsbOGkd1hyZuF K/LPiavBNV5PtMc4UODGHhq3w2i2MOQP4DMaJlyyVmpTx5G2AD6fP8v4h/sw/4YmfOwJ +BddW/bgowkFTAmTTeMsnpmUNTBbQ4U6gkN67tD/h3bys6jOAdoTQKBraJQLeaTT+X9k Bw4pUtsRFD/hD2kRajVa09LF2tWPFbSON48e+xtYkd8fNK/XfIaUMLYHBAuLo2esKGNZ zVCQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@linux.alibaba.com header.s=default header.b=kLcVw91j; arc=pass (i=1 spf=pass spfdomain=linux.alibaba.com dkim=pass dkdomain=linux.alibaba.com dmarc=pass fromdomain=linux.alibaba.com); spf=pass (google.com: domain of linux-kernel+bounces-80442-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-80442-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.alibaba.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id n31-20020a63591f000000b005dc49c4dfc7si2922604pgb.471.2024.02.25.19.05.31 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 25 Feb 2024 19:05:31 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-80442-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.alibaba.com header.s=default header.b=kLcVw91j; arc=pass (i=1 spf=pass spfdomain=linux.alibaba.com dkim=pass dkdomain=linux.alibaba.com dmarc=pass fromdomain=linux.alibaba.com); spf=pass (google.com: domain of linux-kernel+bounces-80442-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-80442-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.alibaba.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 4FC6D280F4C for ; Mon, 26 Feb 2024 03:05:11 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 601311401B; Mon, 26 Feb 2024 03:05:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="kLcVw91j" Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C9AA3C8C7; Mon, 26 Feb 2024 03:04:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.132 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708916699; cv=none; b=BSEt57m0dxQekucnIG7birg9L0AajabOHSQaNAozjXnTafBfbb1k4V7wNB6gyCbI7BNbLmxMyjJiesjpJb2I1lR2Vo5qGXCyPIz7WPaqJXYWwS3JHFmOKXx3CnspLN5Uk4YHpo2mvd8eqhu2zTIH3MkcgtD+klI3cuGjRBtbcOY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708916699; c=relaxed/simple; bh=7DPdQp+crI03h1yJx2LmjB97PDiC+mM7c78GGmSvO3s=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=kC2MY79QMYRgkRM4UTSmcP1mGrqFiWF07AswbUZbYkj/mGKa0Z27ATI0XFGIa36gCgU+DbhhOw8HycxYFxkGhL+zUto5YJb9h2eHpuufAdknYO3t2Mj8Qu9LHShR7+PRXLv6m26gRNGGVU2pqCVa203rQv2iDlc2AmZJvXUMsxI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=kLcVw91j; arc=none smtp.client-ip=115.124.30.132 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708916693; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=8fHEj3Es1WjucFLsLgdW49Mkhy0Q0HKu/7afnZ+AaHQ=; b=kLcVw91j+1601OCE5O3fVMYSJUhyPt/uhiZGr6V73/plm4M5fXUqx3uRs7J/5hTIReOLsMljNdZtGlqZ5hubLv2iStOpso07wBaVrymT4OBZxtuY+vJhweXxvb3TovigJw0ULHvfffgt7rEzLiAqpH08qsLOPlaAY9EOJ9OssBc= X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045192;MF=guwen@linux.alibaba.com;NM=1;PH=DS;RN=18;SR=0;TI=SMTPD_---0W1AK4gS_1708916691; Received: from 30.221.129.59(mailfrom:guwen@linux.alibaba.com fp:SMTPD_---0W1AK4gS_1708916691) by smtp.aliyun-inc.com; Mon, 26 Feb 2024 11:04:52 +0800 Message-ID: Date: Mon, 26 Feb 2024 11:04:51 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH net-next 06/15] net/smc: implement DMB-related operations of loopback-ism To: Wenjia Zhang , wintera@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, jaka@linux.ibm.com, Gerd Bayer Cc: borntraeger@linux.ibm.com, svens@linux.ibm.com, alibuda@linux.alibaba.com, tonylu@linux.alibaba.com, linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org References: <20240111120036.109903-1-guwen@linux.alibaba.com> <20240111120036.109903-7-guwen@linux.alibaba.com> <2fe9e5e0-aa5a-41e8-a2b3-80db0208cfa9@linux.ibm.com> From: Wen Gu In-Reply-To: <2fe9e5e0-aa5a-41e8-a2b3-80db0208cfa9@linux.ibm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 2024/2/23 22:12, Wenjia Zhang wrote: > > > On 20.02.24 02:55, Wen Gu wrote: >> >> >> On 2024/2/16 22:13, Wenjia Zhang wrote: >>> >>> >>> On 11.01.24 13:00, Wen Gu wrote: >>>> This implements DMB (un)registration and data move operations of >>>> loopback-ism device. >>>> >>>> Signed-off-by: Wen Gu >>>> --- >>>>   net/smc/smc_cdc.c      |   6 ++ >>>>   net/smc/smc_cdc.h      |   1 + >>>>   net/smc/smc_loopback.c | 133 ++++++++++++++++++++++++++++++++++++++++- >>>>   net/smc/smc_loopback.h |  13 ++++ >>>>   4 files changed, 150 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c >>>> index 3c06625ceb20..c820ef197610 100644 >>>> --- a/net/smc/smc_cdc.c >>>> +++ b/net/smc/smc_cdc.c >>>> @@ -410,6 +410,12 @@ static void smc_cdc_msg_recv(struct smc_sock *smc, struct smc_cdc_msg *cdc) >>>>   static void smcd_cdc_rx_tsklet(struct tasklet_struct *t) >>>>   { >>>>       struct smc_connection *conn = from_tasklet(conn, t, rx_tsklet); >>>> + >>>> +    smcd_cdc_rx_handler(conn); >>>> +} >>>> + >>>> +void smcd_cdc_rx_handler(struct smc_connection *conn) >>>> +{ >>>>       struct smcd_cdc_msg *data_cdc; >>>>       struct smcd_cdc_msg cdc; >>>>       struct smc_sock *smc; >>>> diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h >>>> index 696cc11f2303..11559d4ebf2b 100644 >>>> --- a/net/smc/smc_cdc.h >>>> +++ b/net/smc/smc_cdc.h >>>> @@ -301,5 +301,6 @@ int smcr_cdc_msg_send_validation(struct smc_connection *conn, >>>>                    struct smc_wr_buf *wr_buf); >>>>   int smc_cdc_init(void) __init; >>>>   void smcd_cdc_rx_init(struct smc_connection *conn); >>>> +void smcd_cdc_rx_handler(struct smc_connection *conn); >>>>   #endif /* SMC_CDC_H */ >>>> diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c >>>> index 353d4a2d69a1..f72e7b24fc1a 100644 >>>> --- a/net/smc/smc_loopback.c >>>> +++ b/net/smc/smc_loopback.c >>>> @@ -15,11 +15,13 @@ >>>>   #include >>>>   #include >>>> +#include "smc_cdc.h" >>>>   #include "smc_ism.h" >>>>   #include "smc_loopback.h" >>>>   #if IS_ENABLED(CONFIG_SMC_LO) >>>>   #define SMC_LO_V2_CAPABLE    0x1 /* loopback-ism acts as ISMv2 */ >>>> +#define SMC_DMA_ADDR_INVALID    (~(dma_addr_t)0) >>>>   static const char smc_lo_dev_name[] = "loopback-ism"; >>>>   static struct smc_lo_dev *lo_dev; >>>> @@ -50,6 +52,97 @@ static int smc_lo_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid, >>>>       return 0; >>>>   } >>>> +static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb, >>>> +                   void *client_priv) >>>> +{ >>>> +    struct smc_lo_dmb_node *dmb_node, *tmp_node; >>>> +    struct smc_lo_dev *ldev = smcd->priv; >>>> +    int sba_idx, order, rc; >>>> +    struct page *pages; >>>> + >>>> +    /* check space for new dmb */ >>>> +    for_each_clear_bit(sba_idx, ldev->sba_idx_mask, SMC_LO_MAX_DMBS) { >>>> +        if (!test_and_set_bit(sba_idx, ldev->sba_idx_mask)) >>>> +            break; >>>> +    } >>>> +    if (sba_idx == SMC_LO_MAX_DMBS) >>>> +        return -ENOSPC; >>>> + >>>> +    dmb_node = kzalloc(sizeof(*dmb_node), GFP_KERNEL); >>>> +    if (!dmb_node) { >>>> +        rc = -ENOMEM; >>>> +        goto err_bit; >>>> +    } >>>> + >>>> +    dmb_node->sba_idx = sba_idx; >>>> +    order = get_order(dmb->dmb_len); >>>> +    pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | >>>> +                __GFP_NOMEMALLOC | __GFP_COMP | >>>> +                __GFP_NORETRY | __GFP_ZERO, >>>> +                order); >>>> +    if (!pages) { >>>> +        rc = -ENOMEM; >>>> +        goto err_node; >>>> +    } >>>> +    dmb_node->cpu_addr = (void *)page_address(pages); >>>> +    dmb_node->len = dmb->dmb_len; >>>> +    dmb_node->dma_addr = SMC_DMA_ADDR_INVALID; >>>> + >>>> +again: >>>> +    /* add new dmb into hash table */ >>>> +    get_random_bytes(&dmb_node->token, sizeof(dmb_node->token)); >>>> +    write_lock(&ldev->dmb_ht_lock); >>>> +    hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_node->token) { >>>> +        if (tmp_node->token == dmb_node->token) { >>>> +            write_unlock(&ldev->dmb_ht_lock); >>>> +            goto again; >>>> +        } >>>> +    } >>>> +    hash_add(ldev->dmb_ht, &dmb_node->list, dmb_node->token); >>>> +    write_unlock(&ldev->dmb_ht_lock); >>>> + >>> The write_lock_irqsave()/write_unlock_irqrestore() and read_lock_irqsave()/read_unlock_irqrestore()should be used >>> instead of write_lock()/write_unlock() and read_lock()/read_unlock() in order to keep the lock irq-safe. >>> >> >> dmb_ht_lock won't be hold in an interrupt or sockirq context. The dmb_{register|unregister}, >> dmb_{attach|detach} and data_move are all on the process context. So I think write_(un)lock >> and read_(un)lock is safe here. > > right, it is not directly hold in a interrupt context, but it has a dependency on conn->send_lock as you wrote below, > which requires irq-safe lock. And this matches our finding from a test: > > ===================================================== > WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected > 6.8.0-rc4-00787-g8eb4d2392609 #2 Not tainted > ----------------------------------------------------- > smcapp/33802 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: > 00000000a2fc0330 (&ldev->dmb_ht_lock){++++}-{2:2}, at: smc_lo_move_data+0x84/0x1d0 [> > and this task is already holding: > 00000000e4df6f28 (&smc->conn.send_lock){+.-.}-{2:2}, at: smc_tx_sndbuf_nonempty+0xaa> > which would create a new lock dependency: > (&smc->conn.send_lock){+.-.}-{2:2} -> (&ldev->dmb_ht_lock){++++}-{2:2} > but this new dependency connects a SOFTIRQ-irq-safe lock: > (&smc->conn.send_lock){+.-.}-{2:2} > I understand, thank you Wenjia. I will fix it in the next version. >> >>>> +    dmb->sba_idx = dmb_node->sba_idx; >>>> +    dmb->dmb_tok = dmb_node->token; >>>> +    dmb->cpu_addr = dmb_node->cpu_addr; >>>> +    dmb->dma_addr = dmb_node->dma_addr; >>>> +    dmb->dmb_len = dmb_node->len; >>>> + >>>> +    return 0; >>>> + >>>> +err_node: >>>> +    kfree(dmb_node); >>>> +err_bit: >>>> +    clear_bit(sba_idx, ldev->sba_idx_mask); >>>> +    return rc; >>>> +} >>>> + >>>> +static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb) >>>> +{ >>>> +    struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node; >>>> +    struct smc_lo_dev *ldev = smcd->priv; >>>> + >>>> +    /* remove dmb from hash table */ >>>> +    write_lock(&ldev->dmb_ht_lock); >>>> +    hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) { >>>> +        if (tmp_node->token == dmb->dmb_tok) { >>>> +            dmb_node = tmp_node; >>>> +            break; >>>> +        } >>>> +    } >>>> +    if (!dmb_node) { >>>> +        write_unlock(&ldev->dmb_ht_lock); >>>> +        return -EINVAL; >>>> +    } >>>> +    hash_del(&dmb_node->list); >>>> +    write_unlock(&ldev->dmb_ht_lock); >>>> + >>>> +    clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask); >>>> +    kfree(dmb_node->cpu_addr); >>>> +    kfree(dmb_node); >>>> + >>>> +    return 0; >>>> +} >>>> + >>>>   static int smc_lo_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id) >>>>   { >>>>       return -EOPNOTSUPP; >>>> @@ -76,6 +169,38 @@ static int smc_lo_signal_event(struct smcd_dev *dev, struct smcd_gid *rgid, >>>>       return 0; >>>>   } >>>> +static int smc_lo_move_data(struct smcd_dev *smcd, u64 dmb_tok, >>>> +                unsigned int idx, bool sf, unsigned int offset, >>>> +                void *data, unsigned int size) >>>> +{ >>>> +    struct smc_lo_dmb_node *rmb_node = NULL, *tmp_node; >>>> +    struct smc_lo_dev *ldev = smcd->priv; >>>> + >>>> +    read_lock(&ldev->dmb_ht_lock); >>>> +    hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) { >>>> +        if (tmp_node->token == dmb_tok) { >>>> +            rmb_node = tmp_node; >>>> +            break; >>>> +        } >>>> +    } >>>> +    if (!rmb_node) { >>>> +        read_unlock(&ldev->dmb_ht_lock); >>>> +        return -EINVAL; >>>> +    } >>>> +    read_unlock(&ldev->dmb_ht_lock); >>>> + >>>> +    memcpy((char *)rmb_node->cpu_addr + offset, data, size); >>>> + >>> >>> Should this read_unlock be placed behind memcpy()? >>> >> >> dmb_ht_lock is used to ensure safe access to the DMB hash table of loopback-ism. >> The DMB hash table could be accessed by all the connections on loopback-ism, so >> it should be protected. >> >> But a certain DMB is only used by one connection, and the move_data process is >> protected by conn->send_lock (see smcd_tx_sndbuf_nonempty()), so the memcpy(rmb_node) >> here is safe and no race with other. >> >> Thanks! >> > sounds reasonable. >>> <...>