From: Keiichi Watanabe
Date: Wed, 8 Aug 2018 23:02:21 +0900
Subject: Re: [RFC PATCH v1] media: uvcvideo: Cache URB header data before processing
To: kieran.bingham@ideasonboard.com
Cc: Laurent Pinchart, Tomasz Figa, Linux Kernel Mailing List,
 Mauro Carvalho Chehab, Linux Media Mailing List, Douglas Anderson,
 Alan Stern, ezequiel@collabora.com, matwey@sai.msu.ru

Hi Kieran,

On Wed, Aug 8, 2018 at 10:07 PM Kieran Bingham wrote:
>
> Hi All,
>
> On 08/08/18 13:45, Keiichi Watanabe wrote:
> > Hi Laurent, Kieran, Tomasz,
> >
> > Thank you for the reviews and suggestions.
> > I want to do additional measurements to improve the performance.
> >
> > Let me clarify my understanding:
> > Currently, if the platform doesn't support coherent DMA (e.g. ARM),
> > urb_buffer is allocated by usb_alloc_coherent with the
> > URB_NO_TRANSFER_DMA_MAP flag instead of by kmalloc.
> > This is because we want to avoid frequent DMA mappings, which are
> > generally expensive. However, memory allocated this way is not cached.
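
(For reference, the two allocation strategies being compared look
roughly like the sketch below. This is a simplified illustration with
a made-up function name, not the actual uvc_video.c code.)

#include <linux/slab.h>
#include <linux/usb.h>

/*
 * Sketch: usb_alloc_coherent() avoids a DMA map/unmap on every URB,
 * but on non-coherent platforms it returns uncached memory, so CPU
 * reads of the payload are slow. kmalloc() returns cached memory,
 * but the USB core must then map the buffer for DMA on every
 * submission.
 */
static void *alloc_urb_buffer_sketch(struct usb_device *udev,
                                     struct urb *urb, size_t size,
                                     gfp_t gfp_flags)
{
#ifndef CONFIG_DMA_NONCOHERENT
        void *mem = usb_alloc_coherent(udev, size, gfp_flags,
                                       &urb->transfer_dma);

        if (mem)
                urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
        return mem;
#else
        return kmalloc(size, gfp_flags);
#endif
}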
> >
> > So, we wonder whether using usb_alloc_coherent is really faster.
> > In other words, we want to know which is better:
> > "no DMA mapping/uncached memory" vs. "frequent DMA mapping/cached memory".
> >
> > For this purpose, I'm planning to measure performance on ARM
> > Chromebooks under the following conditions:
> >
> > 1. The current implementation with Kieran's patches
> > 2. 1. + my patch
> > 3. Using kmalloc instead
> >
> > 1 and 2 are the same conditions I reported in the first mail of this thread.
> > For condition 3, I only have to add "#define CONFIG_DMA_NONCOHERENT"
> > at the beginning of uvc_video.c.
>
> I'd be interested in numbers/performance both with and without my async
> patches, if possible, too.
>
> The async path can easily be disabled temporarily with the following
> change (perhaps this should be a module option?):
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index 8bb6e90f3483..f9fbdc9bfa4b 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -1505,7 +1505,8 @@ static void uvc_video_complete(struct urb *urb)
>          }
>
>          INIT_WORK(&uvc_urb->work, uvc_video_copy_data_work);
> -        queue_work(stream->async_wq, &uvc_urb->work);
> +//      queue_work(stream->async_wq, &uvc_urb->work);
> +        uvc_video_copy_data_work(&uvc_urb->work);
>  }
>
>  /*
>
> I do suspect that even with cached buffers, we should still consider
> the async patches to move the memcpy out of interrupt context.

That sounds worth checking. I'll run tests with and without your async
patches under both cached and uncached memory allocation conditions.
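
(One way to collect such per-callback numbers is to timestamp the
completion handler, as in the untested sketch below;
uvc_video_complete_body() is a hypothetical stand-in for the existing
handler body, not a real function in the driver.)

#include <linux/printk.h>
#include <linux/timekeeping.h>
#include <linux/usb.h>

/* Untested sketch: log the elapsed time of each completion callback. */
static void uvc_video_complete(struct urb *urb)
{
        u64 start = ktime_get_ns();

        uvc_video_complete_body(urb);   /* hypothetical: the real handler */

        pr_debug("uvc: completion handler took %llu ns\n",
                 ktime_get_ns() - start);
}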

Best regards,
Keiichi

> --
> Regards
>
> Kieran
>
> >
> > Does this plan sound reasonable?
> >
> > Best regards,
> > Keiichi
> >
> > On Wed, Aug 8, 2018 at 5:42 PM Laurent Pinchart wrote:
> >>
> >> Hi Tomasz,
> >>
> >> On Wednesday, 8 August 2018 07:08:59 EEST Tomasz Figa wrote:
> >>> On Tue, Jul 31, 2018 at 1:00 AM Laurent Pinchart wrote:
> >>>> On Wednesday, 27 June 2018 13:34:08 EEST Keiichi Watanabe wrote:
> >>>>> On some platforms with non-coherent DMA (e.g. ARM), USB drivers use
> >>>>> uncached memory allocation methods. In such situations, it sometimes
> >>>>> takes a long time to access URB buffers. This can cause video
> >>>>> flickering if the resolution is high and the USB controller has a
> >>>>> very tight time limit (e.g. dwc2). To avoid this problem, we copy
> >>>>> the header data from the (uncached) URB buffer into a (cached)
> >>>>> local buffer.
> >>>>>
> >>>>> This change should make the elapsed time of the interrupt handler
> >>>>> shorter on platforms with non-coherent DMA. We measured the elapsed
> >>>>> time of each callback of uvc_video_complete without/with this patch
> >>>>> while capturing Full HD video at
> >>>>> https://webrtc.github.io/samples/src/content/getusermedia/resolution/.
> >>>>> I tested it on top of Kieran Bingham's asynchronous UVC series:
> >>>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg128359.html
> >>>>> The test device was a Jerry Chromebook (RK3288) with a Logitech
> >>>>> Brio 4K. I collected data for 5 seconds (around 480 callbacks in
> >>>>> this case). The following results show that this patch makes
> >>>>> uvc_video_complete about 2x faster:
> >>>>>
> >>>>>             | average | median  | min     | max      | standard deviation
> >>>>> w/o caching | 45319ns | 40250ns | 33834ns | 142625ns | 16611ns
> >>>>> w/ caching  | 20620ns | 19250ns | 12250ns | 56583ns  | 6285ns
> >>>>>
> >>>>> In addition, we confirmed that this patch doesn't make things worse
> >>>>> on a coherent-DMA architecture by performing the same measurements
> >>>>> on a Broadwell Chromebox with the same camera:
> >>>>>
> >>>>>             | average | median  | min     | max      | standard deviation
> >>>>> w/o caching | 21026ns | 21424ns | 12263ns | 23956ns  | 1932ns
> >>>>> w/ caching  | 20728ns | 20398ns | 8922ns  | 45120ns  | 3368ns
> >>>>
> >>>> This is very interesting, and it seems related to
> >>>> https://patchwork.kernel.org/patch/10468937/. You might have seen
> >>>> that discussion, as you got CC'ed at some point.
> >>>>
> >>>> I wonder whether performance couldn't be further improved by
> >>>> allocating the URB buffers cached, as that would speed up the
> >>>> memcpy() as well. Have you tested that by any chance?
> >>>
> >>> We haven't measured it, but the issue being solved here was indeed
> >>> significantly reduced by using cached URB buffers, even without
> >>> Kieran's async series. After we discovered the latter, we just
> >>> backported it and decided to further tweak the last remaining bit, to
> >>> avoid playing too much with the DMA API in code used in production on
> >>> several different platforms (including both ARM and x86).
> >>>
> >>> If you think we could change the driver to use cached buffers instead
> >>> (as the pwc driver does, mentioned in another thread), I wouldn't
> >>> have anything against it, obviously.
> >>
> >> I think there's a chance that performance could be further improved.
> >> Furthermore, it would lead to simpler code, as we wouldn't need to
> >> deal with caching headers manually. I would however like to see
> >> numbers before making a decision.
> >>
> >> --
> >> Regards,
> >>
> >> Laurent Pinchart
>
> --
> Regards
> --
> Kieran
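
(For readers joining the thread here, the header-caching idea being
measured boils down to the pattern sketched below. The function name
and buffer handling are illustrative, not the literal patch code.)

#include <linux/string.h>
#include <linux/types.h>

/*
 * Sketch: copy the UVC payload header out of the (possibly uncached)
 * URB buffer into a cached local buffer once, then parse every header
 * field from the cached copy instead of re-reading uncached memory.
 */
static void decode_header_sketch(const u8 *urb_mem, unsigned int len)
{
        u8 header[256];         /* bHeaderLength is one byte, so <= 255 */
        unsigned int hdr_len;

        if (len < 2)
                return;

        hdr_len = urb_mem[0];   /* first byte is bHeaderLength */
        if (hdr_len < 2 || hdr_len > len)
                return;

        /* One bulk read from uncached memory instead of many field reads. */
        memcpy(header, urb_mem, hdr_len);

        /* ... parse bmHeaderInfo, PTS and SCR from header[], not urb_mem ... */
}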