From: David Dai
To: jhs@mojatatu.com, xiyou.wangcong@gmail.com, jiri@resnulli.us, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: zdai@us.ibm.com, zdai@linux.vnet.ibm.com
Subject: [v1] iproute2: police: support 64bit rate and peakrate in tc utility
Date: Wed, 28 Aug 2019 17:52:56 -0500
Message-Id: <1567032776-1118-1-git-send-email-zdai@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.1
X-Mailing-List: linux-kernel@vger.kernel.org

A high speed adapter such as the Mellanox CX-5 card can reach up to 100
Gbits per second of bandwidth. htb already supports a 64bit rate in the
tc utility. However, the police action rate and peakrate are still
limited to 32bit values (up to 32 Gbits per second). By using the two new
kernel attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64, tc can
break the 32bit limit while keeping backward binary compatibility.

Tested-by: David Dai
Signed-off-by: David Dai
---
 include/uapi/linux/pkt_cls.h |    2 +
 tc/m_police.c                |   64 +++++++++++++++++++++++++++--------------
 tc/tc_core.c                 |   29 +++++++++++++++++++
 tc/tc_core.h                 |    3 ++
 4 files changed, 76 insertions(+), 22 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index b057aee..eb4ea4d 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -159,6 +159,8 @@ enum {
 	TCA_POLICE_AVRATE,
 	TCA_POLICE_RESULT,
 	TCA_POLICE_TM,
+	TCA_POLICE_RATE64,
+	TCA_POLICE_PEAKRATE64,
 	TCA_POLICE_PAD,
 	__TCA_POLICE_MAX
 #define TCA_POLICE_RESULT TCA_POLICE_RESULT
diff --git a/tc/m_police.c b/tc/m_police.c
index 862a39f..abdbcce 100644
--- a/tc/m_police.c
+++ b/tc/m_police.c
@@ -71,6 +71,7 @@ static int act_parse_police(struct action_util *a, int *argc_p, char ***argv_p,
 	unsigned int linklayer = LINKLAYER_ETHERNET; /* Assume ethernet */
 	int Rcell_log = -1, Pcell_log = -1;
 	struct rtattr *tail;
+	__u64 rate64 = 0, prate64 = 0;
 
 	if (a) /* new way of doing things */
 		NEXT_ARG();
@@ -121,11 +122,11 @@ static int act_parse_police(struct action_util *a, int *argc_p, char ***argv_p,
 			}
 		} else if (strcmp(*argv, "rate") == 0) {
 			NEXT_ARG();
-			if (p.rate.rate) {
+			if (rate64) {
 				fprintf(stderr, "Double \"rate\" spec\n");
 				return -1;
 			}
-			if (get_rate(&p.rate.rate, *argv)) {
+			if (get_rate64(&rate64, *argv)) {
 				explain1("rate");
 				return -1;
 			}
@@ -141,11 +142,11 @@ static int act_parse_police(struct action_util *a, int *argc_p, char ***argv_p,
 			}
 		} else if (matches(*argv, "peakrate") == 0) {
 			NEXT_ARG();
-			if (p.peakrate.rate) {
+			if (prate64) {
 				fprintf(stderr, "Double \"peakrate\" spec\n");
 				return -1;
 			}
-			if (get_rate(&p.peakrate.rate, *argv)) {
+			if (get_rate64(&prate64, *argv)) {
 				explain1("peakrate");
 				return -1;
 			}
@@ -189,23 +190,23 @@ action_ctrl_ok:
 	if (!ok)
 		return -1;
 
-	if (p.rate.rate && avrate)
+	if (rate64 && avrate)
 		return -1;
 
 	/* Must at least do late binding, use TB or ewma policing */
-	if (!p.rate.rate && !avrate && !p.index) {
+	if (!rate64 && !avrate && !p.index) {
 		fprintf(stderr, "\"rate\" or \"avrate\" MUST be specified.\n");
 		return -1;
 	}
 
 	/* When the TB policer is used, burst is required */
-	if (p.rate.rate && !buffer && !avrate) {
+	if (rate64 && !buffer && !avrate) {
 		fprintf(stderr, "\"burst\" requires \"rate\".\n");
 		return -1;
 	}
 
-	if (p.peakrate.rate) {
-		if (!p.rate.rate) {
+	if (prate64) {
+		if (!rate64) {
 			fprintf(stderr, "\"peakrate\" requires \"rate\".\n");
 			return -1;
 		}
@@ -215,22 +216,24 @@ action_ctrl_ok:
 		}
 	}
 
-	if (p.rate.rate) {
+	if (rate64) {
+		p.rate.rate = (rate64 >= (1ULL << 32)) ? ~0U : rate64;
 		p.rate.mpu = mpu;
 		p.rate.overhead = overhead;
-		if (tc_calc_rtable(&p.rate, rtab, Rcell_log, mtu,
-				   linklayer) < 0) {
+		if (tc_calc_rtable_64(&p.rate, rtab, Rcell_log, mtu,
+				   linklayer, rate64) < 0) {
 			fprintf(stderr, "POLICE: failed to calculate rate table.\n");
 			return -1;
 		}
-		p.burst = tc_calc_xmittime(p.rate.rate, buffer);
+		p.burst = tc_calc_xmittime(rate64, buffer);
 	}
 	p.mtu = mtu;
-	if (p.peakrate.rate) {
+	if (prate64) {
+		p.peakrate.rate = (prate64 >= (1ULL << 32)) ? ~0U : prate64;
 		p.peakrate.mpu = mpu;
 		p.peakrate.overhead = overhead;
-		if (tc_calc_rtable(&p.peakrate, ptab, Pcell_log, mtu,
-				   linklayer) < 0) {
+		if (tc_calc_rtable_64(&p.peakrate, ptab, Pcell_log, mtu,
+				   linklayer, prate64) < 0) {
 			fprintf(stderr, "POLICE: failed to calculate peak rate table.\n");
 			return -1;
 		}
@@ -238,10 +241,16 @@ action_ctrl_ok:
 
 	tail = addattr_nest(n, MAX_MSG, tca_id);
 	addattr_l(n, MAX_MSG, TCA_POLICE_TBF, &p, sizeof(p));
-	if (p.rate.rate)
+	if (rate64) {
 		addattr_l(n, MAX_MSG, TCA_POLICE_RATE, rtab, 1024);
-	if (p.peakrate.rate)
+		if (rate64 >= (1ULL << 32))
+			addattr64(n, MAX_MSG, TCA_POLICE_RATE64, rate64);
+	}
+	if (prate64) {
 		addattr_l(n, MAX_MSG, TCA_POLICE_PEAKRATE, ptab, 1024);
+		if (prate64 >= (1ULL << 32))
+			addattr64(n, MAX_MSG, TCA_POLICE_PEAKRATE64, prate64);
+	}
 	if (avrate)
 		addattr32(n, MAX_MSG, TCA_POLICE_AVRATE, avrate);
 	if (presult)
@@ -268,6 +277,7 @@ static int print_police(struct action_util *a, FILE *f, struct rtattr *arg)
 	struct rtattr *tb[TCA_POLICE_MAX+1];
 	unsigned int buffer;
 	unsigned int linklayer;
+	__u64 rate64, prate64;
 
 	if (arg == NULL)
 		return 0;
@@ -286,16 +296,26 @@ static int print_police(struct action_util *a, FILE *f, struct rtattr *arg)
 #endif
 	p = RTA_DATA(tb[TCA_POLICE_TBF]);
 
+	rate64 = p->rate.rate;
+	if (tb[TCA_POLICE_RATE64] &&
+	    RTA_PAYLOAD(tb[TCA_POLICE_RATE64]) >= sizeof(rate64))
+		rate64 = rta_getattr_u64(tb[TCA_POLICE_RATE64]);
+
 	fprintf(f, " police 0x%x ", p->index);
-	fprintf(f, "rate %s ", sprint_rate(p->rate.rate, b1));
-	buffer = tc_calc_xmitsize(p->rate.rate, p->burst);
+	fprintf(f, "rate %s ", sprint_rate(rate64, b1));
+	buffer = tc_calc_xmitsize(rate64, p->burst);
 	fprintf(f, "burst %s ", sprint_size(buffer, b1));
 	fprintf(f, "mtu %s ", sprint_size(p->mtu, b1));
 	if (show_raw)
 		fprintf(f, "[%08x] ", p->burst);
 
-	if (p->peakrate.rate)
-		fprintf(f, "peakrate %s ", sprint_rate(p->peakrate.rate, b1));
+	prate64 = p->peakrate.rate;
+	if (tb[TCA_POLICE_PEAKRATE64] &&
+	    RTA_PAYLOAD(tb[TCA_POLICE_PEAKRATE64]) >= sizeof(prate64))
+		prate64 = rta_getattr_u64(tb[TCA_POLICE_PEAKRATE64]);
+
+	if (prate64)
+		fprintf(f, "peakrate %s ", sprint_rate(prate64, b1));
 
 	if (tb[TCA_POLICE_AVRATE])
 		fprintf(f, "avrate %s ",
diff --git a/tc/tc_core.c b/tc/tc_core.c
index 8eb1122..498d35d 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -152,6 +152,35 @@ int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab,
 	return cell_log;
 }
 
+int tc_calc_rtable_64(struct tc_ratespec *r, __u32 *rtab,
+		   int cell_log, unsigned int mtu,
+		   enum link_layer linklayer, __u64 rate)
+{
+	int i;
+	unsigned int sz;
+	__u64 bps = rate;
+	unsigned int mpu = r->mpu;
+
+	if (mtu == 0)
+		mtu = 2047;
+
+	if (cell_log < 0) {
+		cell_log = 0;
+		while ((mtu >> cell_log) > 255)
+			cell_log++;
+	}
+
+	for (i = 0; i < 256; i++) {
+		sz = tc_adjust_size((i + 1) << cell_log, mpu, linklayer);
+		rtab[i] = tc_calc_xmittime(bps, sz);
+	}
+
+	r->cell_align = -1;
+	r->cell_log = cell_log;
+	r->linklayer = (linklayer & TC_LINKLAYER_MASK);
+	return cell_log;
+}
+
 /*
    stab[pkt_len>>cell_log] = pkt_xmit_size>>size_log
  */
diff --git a/tc/tc_core.h b/tc/tc_core.h
index bd4a99f..40520e7 100644
--- a/tc/tc_core.h
+++ b/tc/tc_core.h
@@ -21,6 +21,9 @@ unsigned tc_calc_xmittime(__u64 rate, unsigned size);
 unsigned tc_calc_xmitsize(__u64 rate, unsigned ticks);
 int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab,
 		   int cell_log, unsigned mtu, enum link_layer link_layer);
+int tc_calc_rtable_64(struct tc_ratespec *r, __u32 *rtab,
+		   int cell_log, unsigned mtu, enum link_layer link_layer,
+		   __u64 rate);
 int tc_calc_size_table(struct tc_sizespec *s, __u16 **stab);
 int tc_setup_estimator(unsigned A, unsigned time_const,
 		       struct tc_estimator *est);
-- 
1.7.1
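
The compatibility scheme in this patch can be illustrated with a minimal
standalone sketch. This is not code from iproute2: the helper names
legacy_rate() and needs_rate64_attr() are hypothetical and only mirror the
expressions the patch uses when it fills the 32bit rate field and decides
whether to emit TCA_POLICE_RATE64 / TCA_POLICE_PEAKRATE64. Rates are
assumed to be expressed in bytes per second.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of iproute2): the value that would be
 * stored in the legacy 32bit rate field for a given 64bit rate.
 * Rates that do not fit saturate to 0xffffffff, mirroring
 * "(rate64 >= (1ULL << 32)) ? ~0U : rate64" in the patch.
 */
static uint32_t legacy_rate(uint64_t rate64)
{
	return rate64 >= (1ULL << 32) ? ~0U : (uint32_t)rate64;
}

/*
 * Hypothetical helper: nonzero when the extra 64bit attribute would be
 * sent, i.e. only when the rate does not fit in 32 bits. Smaller rates
 * keep using the legacy field alone.
 */
static int needs_rate64_attr(uint64_t rate64)
{
	return rate64 >= (1ULL << 32);
}
```

For example, 100 Gbit/s is 12500000000 bytes per second, so the legacy
field saturates at 0xffffffff and the 64bit attribute carries the real
value, while 10 Gbit/s (1250000000 bytes per second) fits in the legacy
field and needs no extra attribute.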