Subject: Re: [RFC 0/3] cpuidle: add poll_source API and virtio vq polling
To: Stefan Hajnoczi
Cc: linux-kernel@vger.kernel.org, Daniel Lezcano, Stefano Garzarella,
    Ming Lei, "Michael S. Tsirkin", Marcelo Tosatti, Jens Axboe,
    linux-block@vger.kernel.org, "Rafael J. Wysocki",
    virtualization@lists.linux-foundation.org, linux-pm@vger.kernel.org,
    Christoph Hellwig
References: <20210713161906.457857-1-stefanha@redhat.com>
 <1008dee4-fce1-2462-1520-f5432bc89a07@redhat.com>
From: Jason Wang
Message-ID: <869a993d-a1b0-1c39-d081-4cdd2b71041f@redhat.com>
Date: Thu, 22 Jul 2021 17:04:57 +0800

On 2021/7/21 5:41 PM, Stefan Hajnoczi wrote:
> On Wed, Jul 21, 2021 at 11:29:55AM +0800, Jason Wang wrote:
>> On 2021/7/14 12:19 AM, Stefan Hajnoczi wrote:
>>> These patches are not polished yet but I would like to request
>>> feedback on this approach and share performance results with you.
>>>
>>> Idle CPUs tentatively enter a busy wait loop before halting when the
>>> cpuidle haltpoll driver is enabled inside a virtual machine. This
>>> reduces wakeup latency for events that occur soon after the vCPU
>>> becomes idle.
>>>
>>> This patch series extends the cpuidle busy wait loop with the new
>>> poll_source API so drivers can participate in polling. Such
>>> polling-aware drivers disable their device's irq during the busy
>>> wait loop to avoid the cost of interrupts. This reduces latency
>>> further than regular cpuidle haltpoll, which still relies on irqs.
>>>
>>> Virtio drivers are modified to use the poll_source API so all virtio
>>> device types get this feature. The following virtio-blk fio
>>> benchmark results show the improvement:
>>>
>>>                IOPS (numjobs=4, iodepth=1, 4 virtqueues)
>>>                  before     poll_source       io_poll
>>> 4k randread      167102   186049 (+11%)   186654 (+11%)
>>> 4k randwrite     162204   181214 (+11%)   181850 (+12%)
>>> 4k randrw        159520   177071 (+11%)   177928 (+11%)
>>>
>>> The comparison against io_poll shows that cpuidle poll_source
>>> achieves equivalent performance to the block layer's io_poll feature
>>> (which I implemented in a separate patch series [1]).
>>>
>>> The advantage of poll_source is that applications do not need to
>>> explicitly set the RWF_HIPRI I/O request flag. The poll_source
>>> approach is attractive because few applications actually use
>>> RWF_HIPRI and it takes advantage of CPU cycles we would have spent
>>> in cpuidle haltpoll anyway.
>>>
>>> The current series does not improve virtio-net. I haven't
>>> investigated deeply, but it is possible that NAPI and poll_source do
>>> not combine. See the final patch for a starting point on making the
>>> two work together.
>>>
>>> I have not tried this on bare metal but it might help there too. The
>>> cost of disabling a device's irq must be less than the savings from
>>> avoiding irq handling for this optimization to make sense.
>>>
>>> [1] https://lore.kernel.org/linux-block/20210520141305.355961-1-stefanha@redhat.com/
>>
>> Hi Stefan:
>>
>> Some questions:
>>
>> 1) What's the advantage of introducing polling at the virtio level
>> instead of doing it in each subsystem? Polling at the virtio level
>> may only work well if all (or most) of the devices are virtio.
> I'm not sure I understand the question. cpuidle haltpoll benefits all
> devices today, except it incurs interrupt latency. The poll_source API
> eliminates the interrupt latency for drivers that can disable device
> interrupts cheaply.
>
> This patch adds poll_source to core virtio code so that all virtio
> drivers get this feature for free. No driver-specific changes are
> needed.
>
> If you mean networking, block layer, etc by "subsystems" then there's
> nothing those subsystems can do to help. Whether poll_source can be
> used depends on the specific driver, not the subsystem. If you
> consider drivers/virtio/ a subsystem, then that's exactly what the
> patch series is doing.

I meant that if we choose to use idle polling, we have several choices:

1) bus level (e.g. virtio)
2) subsystem level (e.g. networking and block)

I'm not sure which one is better.
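To make the bus-level option concrete, here is a rough sketch of the
shape such a hook takes, modelled on the series' poll_source naming.
The fields and signatures below are an illustrative reconstruction,
not code copied from the patches:

/*
 * Illustrative sketch of a poll_source-style hook (reconstructed,
 * not quoted from the series). One source is registered per device
 * queue whose irq is bound to a single CPU; the cpuidle haltpoll
 * loop walks that CPU's sources while the CPU is idle.
 */
#include <linux/list.h>

struct poll_source;

struct poll_source_ops {
	/* idle loop starts polling: disable the device irq here */
	void (*start)(struct poll_source *src);
	/* CPU is about to halt: re-enable the device irq */
	void (*stop)(struct poll_source *src);
	/* called repeatedly in the busy-wait window: check the
	 * device (e.g. a virtqueue) for completed work */
	void (*poll)(struct poll_source *src);
};

struct poll_source {
	const struct poll_source_ops *ops;
	struct list_head node;	/* per-CPU list walked by the idle loop */
	int cpu;		/* CPU whose idle loop polls this source */
};

int poll_source_register(struct poll_source *src);
int poll_source_unregister(struct poll_source *src);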
>
>> 2) What's the advantage of using cpuidle instead of using a thread
>> (and leveraging the scheduler)?
> In order to combine with the existing cpuidle infrastructure. No new
> polling loop is introduced and no additional CPU cycles are spent on
> polling.
>
> If cpuidle itself is converted to threads then poll_source would
> automatically operate in a thread too, but this patch series doesn't
> change how the core cpuidle code works.

The networking subsystem can already use NAPI busy polling from
process context, which means the time spent polling is accounted to a
task and can be managed by the scheduler. I'm not sure it's a good
idea to poll drivers for a specific bus in the general cpuidle layer.
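For reference, the existing process-context mechanism referred to here
is socket busy polling. A minimal userspace sketch (the 50-microsecond
budget is an arbitrary example value):

/* Enable NAPI busy polling on a socket from process context. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int usecs = 50;	/* spin in the kernel for up to 50us per receive */
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/*
	 * SO_BUSY_POLL makes blocking receives poll the NIC's NAPI
	 * context from the caller's own task context, so the cycles
	 * are charged to this thread and the scheduler can preempt it.
	 */
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &usecs, sizeof(usecs)) < 0)
		perror("setsockopt(SO_BUSY_POLL)");

	close(fd);
	return 0;
}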
Another question: were those numbers measured on an APICv-capable
machine? Virtio-net has had tx interrupts enabled for two years now,
and we don't see much of a difference when measuring on an APICv host.

>
>> 3) Any reason this is virtio_pci specific rather than a general
>> virtio one?
> Good idea, it is possible to move the virtio_pci changes into
> virtio.c.
>
> Other transports can't use this feature yet though. Only virtio_pci
> supports vq irq affinity. But the code can be generic, and if other
> transports ever support vq irq affinity they'll get it for free.

Yes.

Thanks

>
>> (Btw, do we need to cc scheduler guys?)
> I'm not sure. This patch series doesn't change how cpuidle interacts
> with the scheduler. The cpuidle maintainers can pull in more people,
> if necessary.
>
> Stefan
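For reference, the per-vq affinity capability mentioned above is
already expressed through virtio_config_ops (virtio_pci implements
get_vq_affinity; other transports currently leave it unset), so a
transport-generic gate could be sketched along these lines. The helper
name is hypothetical:

#include <linux/cpumask.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

/*
 * Hypothetical helper: only poll a virtqueue from a given CPU's idle
 * loop when the transport reports that the vq's irq is bound to that
 * CPU. Transports without get_vq_affinity simply never poll.
 */
static bool vq_can_poll_on_cpu(struct virtio_device *vdev, int index,
			       int cpu)
{
	const struct cpumask *mask;

	if (!vdev->config->get_vq_affinity)
		return false;

	mask = vdev->config->get_vq_affinity(vdev, index);
	return mask && cpumask_test_cpu(cpu, mask);
}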