Received: by 2002:a05:7412:ba23:b0:fa:4c10:6cad with SMTP id jp35csp4975rdb; Wed, 17 Jan 2024 15:04:47 -0800 (PST) X-Google-Smtp-Source: AGHT+IGfVTBsD/RwNuoXsOn76WxppLwi7aUSMt0iX3M4oZlQMFsuqMTvzkm4dQIE/DwSuO6BhscE X-Received: by 2002:a05:6214:20ee:b0:681:797d:992b with SMTP id 14-20020a05621420ee00b00681797d992bmr2116823qvk.101.1705532687519; Wed, 17 Jan 2024 15:04:47 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705532687; cv=pass; d=google.com; s=arc-20160816; b=PT5InqA5wZIy+CIsarUSuUbPlITfGHiNur0G9O6PCKOnVbnJw5XmgVpoOAojIEJ+8/ 0QE9HN/uuNzYxss3lXZ37jDWAGtQOPBkChltq6GNWURI6eBuFbvQkEkiMbd9W+q6+HS8 PTBRs8p/b0KQjxQBCdgpN226Nhr2oxWzHirJVwg2TP81CTAYwGBx+wyrbaLSJIIr/kKU N7g3jrEfdNedLGdO/zzEgjpihF3RMi9IBhUSdL8SZO3T/evm84C9LxSC62/0mfexTv/d pYeaGbkVVUIH22jtCpM74ArmgUJ3YA71DEqQ0bRsQoMoSpxFLcKzNp7enmnqZ0cZw8dM Dftg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id:dkim-signature; bh=sKiOkgpupfu/xZ9c0nKrOyJ57HTRtTh2I6NcqVJ0RnM=; fh=1aofuOYPgaN/GgpBzUhBk7UYuvo2RiYHO/HF9UXg4F4=; b=a8FZxGujWzvpDl1nKdtW9zT3bIleNJ90TUbChwX/2sryfGZywcc80e3UqWYaCl5lF0 CAzQymyC+/ft0eVNgUcGe9QmsDQ3b2FaSzAfG4Y9FfMcbuQ5hRqdobDCKzmfVoc3SQWq ggxRs2CpMg+fAiUJXPZ07aAUJcGUGk9/XCrTQ97IZmoqW7bEfUETYX6ImFVZWGfCsSRn igy8ymIY1zUsWFQzXQb261bZtoimbeclAZ68FeT2omRLwlPlOGaT4skjNmEI8TutFUeT eGwzoZVn/PWGY2WhSWVlgLcWwxH3Tp8r8IQFddr4ZItWe+IDUmv5lxDbfv9djuKwDOp9 Nx5Q== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel-dk.20230601.gappssmtp.com header.s=20230601 header.b=k+GkIeKE; arc=pass (i=1 spf=pass spfdomain=kernel.dk dkim=pass dkdomain=kernel-dk.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-29532-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-29532-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id t4-20020a0c8d84000000b0067f7155921csi12706654qvb.285.2024.01.17.15.04.47 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jan 2024 15:04:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-29532-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel-dk.20230601.gappssmtp.com header.s=20230601 header.b=k+GkIeKE; arc=pass (i=1 spf=pass spfdomain=kernel.dk dkim=pass dkdomain=kernel-dk.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-29532-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-29532-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 24DC31C20A1F for ; Wed, 17 Jan 2024 23:04:47 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 5265B28DAD; Wed, 17 Jan 2024 23:04:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="k+GkIeKE" Received: from mail-io1-f51.google.com (mail-io1-f51.google.com [209.85.166.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EB6924B36 for ; Wed, 17 Jan 2024 23:04:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.51 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705532668; cv=none; b=VzBnKqh8GhJvldkQlYm0SdcJ1mDb02nmhm6Ttv+dmyIzL1d6RyUWz+IH9Dgr8wsL5ctxGExNp1wtzw19JtVikiI64VyBY/YVPzPCK6LUoQpM1aasYc3zscuMfWqRnCqhnu6xckVYFFjByeBKCimrkZcnJTWZHnHFtFzQS9VrEFk= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705532668; c=relaxed/simple; bh=nIy8rn2sGR7SXFwc4gKM4r8s3RaW0NqpXIxGO0LCCnY=; h=Received:DKIM-Signature:X-Google-DKIM-Signature: X-Gm-Message-State:X-Google-Smtp-Source:X-Received:Received: Message-ID:Date:MIME-Version:User-Agent:Subject:Content-Language: To:Cc:References:From:In-Reply-To:Content-Type: Content-Transfer-Encoding; b=IY9X2ByhBI8rXqGItF4S/zYaAQ4pYpU9kcIUscMTlmY1z9MU9yfXFXy4KO2a1w7UzLqWLvoJ6lncfzjEKQ3aI7+o1v+rKdll0ZiL2adNDzUsm2QP+hsoHSoSVziiQDI939HILwEMHVVDcTwfkBpp8OaDAvhwgY7XzIZGb8QBHS4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=k+GkIeKE; arc=none smtp.client-ip=209.85.166.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Received: by mail-io1-f51.google.com with SMTP id ca18e2360f4ac-7bb06f56fe9so111254039f.0 for ; Wed, 17 Jan 2024 15:04:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1705532663; x=1706137463; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=sKiOkgpupfu/xZ9c0nKrOyJ57HTRtTh2I6NcqVJ0RnM=; b=k+GkIeKEvVE2429hv2MldemFuNZhPw825xqk8LcstJ0oCQZwHAOdw95MKPrsg2nZrv 8vWp453T9KwTbTR+ao3syDa1iE0tqAJ3hYh4a3GnLfGQl5byNgcAP9K1XIseihni1rbB vWplUTd+uBwI8BFo0TADaKeS0Nwc5WmqIsmN0RuUujg8+E5SSqOdPfUUvVB1JKDcocjZ Is25ESyzwWqncUgPWS4cIMTMl+VWOmEfs6Kc6cQpGcH45KfqkPY+2M636xD+B68hiuin j3Es1ef5Ui3xkBXLEhLaUltX+nVWuTyxze3bZOQ8GBHI1umGfoVVqdrufMA6qAerficW te/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705532663; x=1706137463; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=sKiOkgpupfu/xZ9c0nKrOyJ57HTRtTh2I6NcqVJ0RnM=; b=nQzzLv9ioggPfVlngVUF5Nplidklyn0xwGYQhFnl39Ix+brM7Jz8i7b2Tu42LK11wA XHGOo0TSJGeI/DC/5iBS5CdoC+qpXZk9YsYDYPKKs7jG1IrvGt8plNc3Y7s6LOGHQB0U l1vo7nLG79tNaUT2k9UVK3P66EPBQyI+IqdLdG6z7urZ9tYoalgr1M8QVodeiGGLOzey nhmsQShP2mZRS81JsNzDT33xGCAfGId/m810WVlRctAszoS+o/FZAMRPZ19aI8alCkRq flHFmJBGQsn2n5SLo6e3C7/g+f57mmoweofkF4C7hL1uRzCRPQv12W9mXRmBAAg63zAV 5OnA== X-Gm-Message-State: AOJu0Yz1ERFNAsYql+Pu+sjHoWKPUXu7QXP8PkV4DGL+z7fTWa9wg8Ne WRYgwkOHjDdmzYeZ+47pZQWNB26ozS4SIsSNCrJu+faMTyIBe8d1YI7B11lnUNhDJ2jtIigi1i+ PUms= X-Received: by 2002:a5e:d70c:0:b0:7be:edbc:629f with SMTP id v12-20020a5ed70c000000b007beedbc629fmr94956iom.0.1705532663485; Wed, 17 Jan 2024 15:04:23 -0800 (PST) Received: from [192.168.1.116] ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id s11-20020a056638258b00b0046dda6b83c1sm670895jat.25.2024.01.17.15.04.22 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 17 Jan 2024 15:04:22 -0800 (PST) Message-ID: <0a626e1a-939a-44e5-bb82-0275c19f7143@kernel.dk> Date: Wed, 17 Jan 2024 16:04:22 -0700 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6] io_uring: Statistics of the true utilization of sq threads. Content-Language: en-US To: Xiaobing Li Cc: asml.silence@gmail.com, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, kun.dou@samsung.com, peiwei.li@samsung.com, joshi.k@samsung.com, kundan.kumar@samsung.com, wenwen.chen@samsung.com, ruyi.zhang@samsung.com References: <20240117083706.11766-1-xiaobing.li@samsung.com> From: Jens Axboe In-Reply-To: <20240117083706.11766-1-xiaobing.li@samsung.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 1/17/24 1:37 AM, Xiaobing Li wrote: > On 1/12/24 2:58 AM, Jens Axboe wrote: >> On 1/11/24 6:12 PM, Xiaobing Li wrote: >>> On 1/10/24 16:15 AM, Jens Axboe wrote: >>>> On 1/10/24 2:05 AM, Xiaobing Li wrote: >>>>> On 1/5/24 04:02 AM, Pavel Begunkov wrote: >>>>>> On 1/3/24 05:49, Xiaobing Li wrote: >>>>>>> On 12/30/23 9:27 AM, Pavel Begunkov wrote: >>>>>>>> Why it uses jiffies instead of some task run time? >>>>>>>> Consequently, why it's fine to account irq time and other >>>>>>>> preemption? (hint, it's not) >>>>>>>> >>>>>>>> Why it can't be done with userspace and/or bpf? Why >>>>>>>> can't it be estimated by checking and tracking >>>>>>>> IORING_SQ_NEED_WAKEUP in userspace? >>>>>>>> >>>>>>>> What's the use case in particular? Considering that >>>>>>>> one of the previous revisions was uapi-less, something >>>>>>>> is really fishy here. Again, it's a procfs file nobody >>>>>>>> but a few would want to parse to use the feature. >>>>>>>> >>>>>>>> Why it just keeps aggregating stats for the whole >>>>>>>> life time of the ring? If the workload changes, >>>>>>>> that would either totally screw the stats or would make >>>>>>>> it too inert to be useful. That's especially relevant >>>>>>>> for long running (days) processes. There should be a >>>>>>>> way to reset it so it starts counting anew. >>>>>>> >>>>>>> Hi, Jens and Pavel, >>>>>>> I carefully read the questions you raised. >>>>>>> First of all, as to why I use jiffies to statistics time, it >>>>>>> is because I have done some performance tests and found that >>>>>>> using jiffies has a relatively smaller loss of performance >>>>>>> than using task run time. Of course, using task run time is >>>>>> >>>>>> How does taking a measure for task runtime looks like? I expect it to >>>>>> be a simple read of a variable inside task_struct, maybe with READ_ONCE, >>>>>> in which case the overhead shouldn't be realistically measurable. Does >>>>>> it need locking? >>>>> >>>>> The task runtime I am talking about is similar to this: >>>>> start = get_system_time(current); >>>>> do_io_part(); >>>>> sq->total_time += get_system_time(current) - start; >>>> >>>> Not sure what get_system_time() is, don't see that anywhere. >>>> >>>>> Currently, it is not possible to obtain the execution time of a piece of >>>>> code by a simple read of a variable inside task_struct. >>>>> Or do you have any good ideas? >>>> >>>> I must be missing something, because it seems like all you need is to >>>> read task->stime? You could possible even make do with just logging busy >>>> loop time, as getrusage(RUSAGE_THREAD, &stat) from userspace would then >>>> give you the total time. >>>> >>>> stat.ru_stime would then be the total time, the thread ran, and >>>> 1 - (above_busy_stime / stat.ru_stime) would give you the time the >>>> percentage of time the thread ran and did useful work (eg not busy >>>> looping. >>> >>> getrusage can indeed get the total time of the thread, but this >>> introduces an extra function call, which is relatively more >>> complicated than defining a variable. In fact, recording the total >>> time of the loop and the time of processing the IO part can achieve >>> our observation purpose. Recording only two variables will have less >>> impact on the existing performance, so why not choose a simpler and >>> effective method. >> >> I'm not opposed to exposing both of them, it does make the API simpler. >> If we can call it an API... I think the main point was using task->stime >> for it rather than jiffies etc. > > Hi, Jens and Pavel > I modified the code according to your opinions. > > I got the total time of the sqpoll thread through getrusage. > eg? > > fdinfo.c: > +long sq_total_time = 0; > +long sq_work_time = 0; > if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) { > struct io_sq_data *sq = ctx->sq_data; > > sq_pid = sq->task_pid; > sq_cpu = sq->sq_cpu; > + struct rusage r; > + getrusage(sq->thread, RUSAGE_SELF, &r); > + sq_total_time = r.ru_stime.tv_sec * 1000000 + r.ru_stime.tv_usec; > + sq_work_time = sq->work_time; > } > > seq_printf(m, "SqThread:\t%d\n", sq_pid); > seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu); > +seq_printf(m, "SqTotalTime:\t%ldus\n", sq_total_time); > +seq_printf(m, "SqWorkTime:\t%ldus\n", sq_work_time); > seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files); > > The working time of the sqpoll thread is obtained through ktime_get(). > eg? > > sqpoll.c: > +ktime_t start, diff; > +start = ktime_get(); > list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) { > int ret = __io_sq_thread(ctx, cap_entries); > > if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list))) > sqt_spin = true; > } > if (io_run_task_work()) > sqt_spin = true; > > +diff = ktime_sub(ktime_get(), start); > +if (sqt_spin == true) > + sqd->work_time += ktime_to_us(diff); > > The test results are as follows: > Every 2.0s: cat /proc/9230/fdinfo/6 | grep -E Sq > SqMask: 0x3 > SqHead: 3197153 > SqTail: 3197153 > CachedSqHead: 3197153 > SqThread: 9231 > SqThreadCpu: 11 > SqTotalTime: 92215321us > SqWorkTime: 15106578us > > Do you think this solution work? Possibly, can you send an actual patch? Would be easier to review that way. Bonus points for crafting test cases that can help vet that it calculates the right thing (eg test case that does 50% idle, 25% idle, 75% idle, that kind of thing). -- Jens Axboe