From: Barry Song
CC: Tian Tao, Jonathan Cameron, Barry Song
Subject: [PATCH v6 1/4] cpumask: introduce cpumap_print_to_buf to support large bitmask and list
Date: Fri, 9 Jul 2021 19:55:41 +1200
Message-ID: <20210709075544.11412-2-song.bao.hua@hisilicon.com>
X-Mailer: git-send-email 2.21.0.windows.1
In-Reply-To: <20210709075544.11412-1-song.bao.hua@hisilicon.com>
References:
 <20210709075544.11412-1-song.bao.hua@hisilicon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tian Tao

The existing cpumap_print_to_pagebuf() is used by cpu topology and other
drivers to export hexadecimal bitmask and decimal list to userspace through
the sysfs ABI.

Right now, those drivers use a normal attribute for this kind of ABI. A
normal attribute typically has a show entry as below:

static ssize_t example_dev_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	...
	return cpumap_print_to_pagebuf(true, buf, &pmu_mmdc->cpu);
}

The show entry of an attribute has no offset and count parameters, which
means the file is limited to one page only.

The cpumap_print_to_pagebuf() API works well for this kind of normal
attribute, which has a buf parameter but no offset or count:

static inline ssize_t
cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
{
	return bitmap_print_to_pagebuf(list, buf, cpumask_bits(mask),
				       nr_cpu_ids);
}

The problem is that once we have many cpus, the bitmask or list may grow
beyond one page. Especially for list, it could be as complex as
0,3,5,7,9,...... We have no simple way to know its exact size.

It turns out bin_attribute is a way to break this limit. bin_attribute
has a show entry as below:

static ssize_t
example_bin_attribute_show(struct file *filp, struct kobject *kobj,
		struct bin_attribute *attr, char *buf,
		loff_t offset, size_t count)
{
	...
}

With the new offset and count parameters, the sysfs ABI is able to support
files larger than one page. For example, offset could be >= 4096.
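To make the offset/count mechanism concrete, here is a minimal userspace sketch, not kernel code. It assumes a helper named read_window() that mimics what memory_read_from_buffer() does in the kernel, and a hypothetical read_in_chunks() that plays the role of a reader issuing successive reads with a growing offset. Both names are made up for illustration and are not part of this patch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Userspace stand-in (an assumption, not the kernel implementation) for
 * memory_read_from_buffer(): copy at most `count` bytes of `from`, which
 * holds `available` bytes, starting at offset `pos`, into `to`.
 * Returns the number of bytes copied, 0 at end of data, -1 if pos < 0.
 */
static ssize_t read_window(char *to, size_t count, long pos,
			   const char *from, size_t available)
{
	if (pos < 0)
		return -1;
	if ((size_t)pos >= available || count == 0)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	return (ssize_t)count;
}

/*
 * Hypothetical reader: drain `src` through a window of `chunk` bytes,
 * the way repeated reads with an increasing offset would drain a
 * bin_attribute. Returns the total number of bytes stored in `dst`.
 */
static size_t read_in_chunks(char *dst, size_t dst_size,
			     const char *src, size_t src_len, size_t chunk)
{
	size_t total = 0;

	while (total < dst_size) {
		size_t want = chunk;
		ssize_t n;

		if (want > dst_size - total)
			want = dst_size - total;
		n = read_window(dst + total, want, (long)total, src, src_len);
		if (n <= 0)
			break;
		total += (size_t)n;
	}
	return total;
}
```

With a 4096-byte window, a 10000-byte string is returned in three reads at offsets 0, 4096 and 8192, which is the access pattern a one-page show entry cannot serve.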
This patch introduces cpumap_print_to_buf() so that those drivers can move
to bin_attribute to support large bitmask and list. As a result, we have to
pass the corresponding parameters from bin_attribute to this new API.

Signed-off-by: Tian Tao
Reviewed-by: Jonathan Cameron
Cc: Andrew Morton
Cc: Andy Shevchenko
Cc: Randy Dunlap
Cc: Stefano Brivio
Cc: Alexander Gordeev
Cc: "Ma, Jianpeng"
Cc: Yury Norov
Cc: Valentin Schneider
Cc: Peter Zijlstra
Cc: Daniel Bristot de Oliveira
Signed-off-by: Barry Song
---
-v6:
  -minor cleanup doc according to Andy Shevchenko's comment;
  -take bitmap_print_to_buf back according to Yury Norov's comment and
   fix the documents;
  -Sorry, Yury, I don't think it is doable to move memory allocation to
   drivers.
   Considering a driver like topology.c, we have M CPUs and each CPU has
   N various nodes like core_siblings, package_cpus, die_cpus etc, we
   can't know the size of each node of each CPU in advance. The best time
   to get the size of each node is really when users read the sysfs node.
   Otherwise, we have to scan M*N nodes in drivers in advance to figure
   out the exact size of buffers we need.
   On the other hand, it is very tricky to ask a bundle of drivers to
   find a proper place to save the pointer of allocated buffers so that
   they can be reused in a second read of the same bin_attribute node.
   And I doubt it is really useful to save the address of buffers
   somewhere. Immediately freeing it seems to be a correct option to
   avoid runtime waste of memory. We can't predict when users will read
   the topology ABI and which node users will read.
   Finally, reading topology wouldn't be a really cpu-bound thing in
   user applications; this kind of ABI can hardly be a performance
   bottleneck. Users use numactl and lstopo commands to read ABIs but
   nobody will do it again and again. And a normal application won't
   read topology repeatedly. So the overhead caused by malloc/free in
   the new bitmap API doesn't really matter.
   If we really want a place to reuse the buffer and avoid malloc/free,
   it seems this should be done in some common place rather than each
   driver. Still, it is hard to find the best place.

   Thanks for the comments of Yury and Andy in v5.

 include/linux/bitmap.h  |  2 ++
 include/linux/cpumask.h | 24 +++++++++++++++++++++++
 lib/bitmap.c            | 43 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index a36cfcec4e77..0de6effa2797 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -226,6 +226,8 @@ void bitmap_copy_le(unsigned long *dst, const unsigned long *src, unsigned int n
 unsigned int bitmap_ord_to_pos(const unsigned long *bitmap, unsigned int ord, unsigned int nbits);
 int bitmap_print_to_pagebuf(bool list, char *buf, const unsigned long *maskp,
 			    int nmaskbits);
+int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
+			int nmaskbits, loff_t off, size_t count);
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
 #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bfc4690de4f4..8a89d133fa2d 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -983,6 +983,30 @@ cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
 				      nr_cpu_ids);
 }
 
+/**
+ * cpumap_print_to_buf - copies the cpumask into the buffer
+ * @list: indicates whether the cpumap must be list
+ *      true:  print in decimal list format
+ *      false: print in hexadecimal bitmask format
+ * @mask: the cpumask to copy
+ * @buf: the buffer to copy into
+ * @off: in the string from which we are copying, We copy to @buf
+ * @count: the maximum number of bytes to print
+ *
+ * The function copies the cpumask into the buffer either as comma-separated
+ * list of cpus or hex values of cpumask; Typically used by bin_attribute to
+ * export cpumask bitmask and list ABI.
+ *
+ * Returns the length of how many bytes have been copied.
+ */
+static inline ssize_t
+cpumap_print_to_buf(bool list, char *buf, const struct cpumask *mask,
+		loff_t off, size_t count)
+{
+	return bitmap_print_to_buf(list, buf, cpumask_bits(mask),
+				   nr_cpu_ids, off, count);
+}
+
 #if NR_CPUS <= BITS_PER_LONG
 #define CPU_MASK_ALL							\
 (cpumask_t) { {								\
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 9401d39e4722..c64baa3a8606 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -487,6 +487,49 @@ int bitmap_print_to_pagebuf(bool list, char *buf, const unsigned long *maskp,
 }
 EXPORT_SYMBOL(bitmap_print_to_pagebuf);
 
+/**
+ * bitmap_print_to_buf - convert bitmap to list or hex format ASCII string
+ * @list: indicates whether the bitmap must be list
+ *      true:  print in decimal list format
+ *      false: print in hexadecimal bitmask format
+ * @buf: buffer into which string is placed
+ * @maskp: pointer to bitmap to convert
+ * @nmaskbits: size of bitmap, in bits
+ * @off: in the string from which we are copying, We copy to @buf
+ * @count: the maximum number of bytes to print
+ *
+ * The role of cpumap_print_to_buf() and cpumap_print_to_pagebuf() is similar,
+ * the difference is that bitmap_print_to_pagebuf() mainly serves sysfs
+ * attribute with the assumption the destination buffer is exactly one page
+ * aligned with PAGE_SIZE and it won't be more than one page, thus,
+ * bitmap_print_to_pagebuf() needs neither offset to copy from nor count
+ * which is the length we are going to copy. cpumap_print_to_buf(), on the
+ * other hand, mainly serves bin_attribute which doesn't work with exact
+ * one page, and it has explicit parameters like "offset" to copy from and
+ * "count" bytes to copy. So cpumap_print_to_buf() can break the size limit
+ * of converted decimal list and hexadecimal bitmask. And buf doesn't have
+ * to be exactly one page.
+ *
+ * Returns the number of characters actually printed to @buf
+ */
+int bitmap_print_to_buf(bool list, char *buf, const unsigned long *maskp,
+			int nmaskbits, loff_t off, size_t count)
+{
+	const char *fmt = list ? "%*pbl\n" : "%*pb\n";
+	ssize_t size;
+	void *data;
+
+	data = kasprintf(GFP_KERNEL, fmt, nmaskbits, maskp);
+	if (!data)
+		return -ENOMEM;
+
+	size = memory_read_from_buffer(buf, count, &off, data, strlen(data) + 1);
+	kfree(data);
+
+	return size;
+}
+EXPORT_SYMBOL(bitmap_print_to_buf);
+
 /*
  * Region 9-38:4/10 describes the following bitmap structure:
  * 0 9 12 18 38 N
-- 
2.25.1
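
bitmap_print_to_buf() above avoids guessing the output size by letting kasprintf() size and allocate the whole string before a window of it is copied out. The following is a rough userspace analogue of that "measure, allocate, format" step for the list format, offered only as a sketch. The helpers bit_is_set(), emit_list() and format_list() are hypothetical names, and the output only approximates what the kernel's "%*pbl" produces for simple cases.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if bit `i` is set in `bits`, an array of unsigned long words. */
static int bit_is_set(const unsigned long *bits, unsigned int i)
{
	const unsigned int w = 8 * sizeof(unsigned long);

	return (int)((bits[i / w] >> (i % w)) & 1UL);
}

/*
 * Emit runs of set bits as "a" or "a-b", comma separated, followed by a
 * newline. If `out` is NULL nothing is stored and only the length is
 * computed. Returns the string length, excluding the terminating NUL.
 */
static size_t emit_list(char *out, size_t size,
			const unsigned long *bits, unsigned int nbits)
{
	size_t len = 0;
	unsigned int i = 0;
	char piece[32];
	int n;

	while (i < nbits) {
		unsigned int start;

		if (!bit_is_set(bits, i)) {
			i++;
			continue;
		}
		start = i;
		while (i < nbits && bit_is_set(bits, i))
			i++;
		if (start == i - 1)
			n = snprintf(piece, sizeof(piece), "%s%u",
				     len ? "," : "", start);
		else
			n = snprintf(piece, sizeof(piece), "%s%u-%u",
				     len ? "," : "", start, i - 1);
		if (out && len + (size_t)n < size)
			memcpy(out + len, piece, (size_t)n);
		len += (size_t)n;
	}
	if (out && len + 1 < size)
		out[len] = '\n';
	len++;
	if (out && len < size)
		out[len] = '\0';
	return len;
}

/* First pass measures, second pass fills a buffer of exactly that size. */
static char *format_list(const unsigned long *bits, unsigned int nbits)
{
	size_t len = emit_list(NULL, 0, bits, nbits);
	char *s = malloc(len + 1);

	if (!s)
		return NULL;
	emit_list(s, len + 1, bits, nbits);
	return s;
}
```

The caller frees the returned string. Eight contiguous bits collapse into a single range, while four alternating bits need four separate entries, which is why the list length depends on the bit pattern and not only on the number of CPUs.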