Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp4121879imm; Mon, 30 Jul 2018 09:02:06 -0700 (PDT) X-Google-Smtp-Source: AAOMgpdm8xK92ryQbyV2Nw1UEVU8BLUflgCFewa4VLrBGmFACfytFL5r6stOAJV7WmvB2Hu7xVRY X-Received: by 2002:a63:4283:: with SMTP id p125-v6mr17204894pga.142.1532966526060; Mon, 30 Jul 2018 09:02:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1532966526; cv=none; d=google.com; s=arc-20160816; b=GiyLnHGCSwQBNoNREbF8Sg7H8pz8zrXr3U4g6xVYXk2sI3Yz4SLkpuOhlrpA6P3KHB HRf+Ob8xBBURkSW2qQHgIiCayww/EeBkdAGQ9tAgCKgJNmOFTU79DlgD2j0AGKg/Ya/c xFelC76EtI7gkqT84LPVhSHYl053UNYYFDXZiD72buvVsTxOF3/c8mylTnHiw1RSwlhf iD4LtUiasTQISTi+namNeuEoSLOgGQUoX5Z/Xo0RsgREDpapZcvucMELCA7yXsKhQHrV l4ZYylwn1AQ1MCOEH4y1lTMOPVzZmu9hVG8IPbqZZNDQI59O0jfGM2MKf2uWJqPeGI+y nDZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:organization:message-id:date:subject:cc:to :from:dkim-signature:arc-authentication-results; bh=rn9AdybIjKjFAou6iybpaoBswsq4giWWCHrDVyFFQZc=; b=R2QTWSMZjsEgxUj710qOROCTZiLlgtMkuqZuCQgp4jCkKdgypZa6WvjGlVAu+EItV4 lZ6Qgx4pL8+kX3jsAywCwXc25C4uclLa25t3QjZ6TKt1tmmhtw5msZ3Yf8Tb9NZSMsUw euftZAvnPs/AEdKZj4eWINS6ouHRlx4xCBSmq1F51tc0+dQz0VJubuECVgSy5akPMTZs /b0osN+hpCBvwkloAsxmr/m083z5ZOzJJphtu6kZQo6WchOOj47rzpmHYD7+A4mCRody ykAQBTe8qQLTAYkr+0fGiZ6A9syS/vHE7NiR3d+3GDwc0yUzvrwdEO2yXF/UD9xh3CwY PaOQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass (test mode) header.i=@ideasonboard.com header.s=mail header.b=Y0HQTMb5; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x11-v6si9730815pln.130.2018.07.30.09.01.51; Mon, 30 Jul 2018 09:02:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass (test mode) header.i=@ideasonboard.com header.s=mail header.b=Y0HQTMb5; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727474AbeG3Rfr (ORCPT + 99 others); Mon, 30 Jul 2018 13:35:47 -0400 Received: from perceval.ideasonboard.com ([213.167.242.64]:41564 "EHLO perceval.ideasonboard.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726661AbeG3Rfr (ORCPT ); Mon, 30 Jul 2018 13:35:47 -0400 Received: from avalon.localnet (dfj612ybrt5fhg77mgycy-3.rev.dnainternet.fi [IPv6:2001:14ba:21f5:5b00:2e86:4862:ef6a:2804]) by perceval.ideasonboard.com (Postfix) with ESMTPSA id 459F31AC2; Mon, 30 Jul 2018 18:00:07 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com; s=mail; t=1532966408; bh=X0zIpPEKJU7zIjw8TQJ9/3iFjD+xa6McIS8b8ybZUzc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Y0HQTMb5XhV+RZ4TBNgc3Qbtpp2pxJYym1Q3xdycKFniyuCg8LShmUzHzDkSAvw0D F9VUMoQwPnLzMsRfxwvW4jIJ649cRTXTfoq3v0AtDvmYi3RbdDpRn6JZtJK3cnXxBs L+tvi6ILk8e56pzXireQD6ScozRcuZGRE0AGDxcE= From: Laurent Pinchart To: Keiichi Watanabe Cc: linux-kernel@vger.kernel.org, Mauro Carvalho Chehab , linux-media@vger.kernel.org, kieran.bingham@ideasonboard.com, tfiga@chromium.org, dianders@chromium.org, Alan Stern , Ezequiel Garcia , "Matwey V. Kornilov" Subject: Re: [RFC PATCH v1] media: uvcvideo: Cache URB header data before processing Date: Mon, 30 Jul 2018 19:00:46 +0300 Message-ID: <11886963.8nkeRH3xvi@avalon> Organization: Ideas on Board Oy In-Reply-To: <20180627103408.33003-1-keiichiw@chromium.org> References: <20180627103408.33003-1-keiichiw@chromium.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Keiichi, (CC'ing Alan, Ezequiel and Matwey) Thank you for the patch. On Wednesday, 27 June 2018 13:34:08 EEST Keiichi Watanabe wrote: > On some platforms with non-coherent DMA (e.g. ARM), USB drivers use > uncached memory allocation methods. In such situations, it sometimes > takes a long time to access URB buffers. This can be a cause of video > flickering problems if a resolution is high and a USB controller has > a very tight time limit. (e.g. dwc2) To avoid this problem, we copy > header data from (uncached) URB buffer into (cached) local buffer. > > This change should make the elapsed time of the interrupt handler > shorter on platforms with non-coherent DMA. We measured the elapsed > time of each callback of uvc_video_complete without/with this patch > while capturing Full HD video in > https://webrtc.github.io/samples/src/content/getusermedia/resolution/. > I tested it on the top of Kieran Bingham's Asynchronous UVC series > https://www.mail-archive.com/linux-media@vger.kernel.org/msg128359.html. > The test device was Jerry Chromebook (RK3288) with Logitech Brio 4K. > I collected data for 5 seconds. (There were around 480 callbacks in > this case.) The following result shows that this patch makes > uvc_video_complete about 2x faster. > > | average | median | min | max | standard deviation > > w/o caching| 45319ns | 40250ns | 33834ns | 142625ns| 16611ns > w/ caching| 20620ns | 19250ns | 12250ns | 56583ns | 6285ns > > In addition, we confirmed that this patch doesn't make it worse on > coherent DMA architecture by performing the same measurements on a > Broadwell Chromebox with the same camera. > > | average | median | min | max | standard deviation > > w/o caching| 21026ns | 21424ns | 12263ns | 23956ns | 1932ns > w/ caching| 20728ns | 20398ns | 8922ns | 45120ns | 3368ns This is very interesting, and it seems related to https:// patchwork.kernel.org/patch/10468937/. You might have seen that discussion as you got CC'ed at some point. I wonder whether performances couldn't be further improved by allocating the URB buffers cached, as that would speed up the memcpy() as well. Have you tested that by any chance ? > Signed-off-by: Keiichi Watanabe > --- > > After applying 6 patches in > https://www.mail-archive.com/linux-media@vger.kernel.org/msg128359.html, > I measured elapsed time by adding the following code to > /drivers/media/usb/uvc/uvc_video.c > > @@ -XXXX,6 +XXXX,9 @@ static void uvc_video_complete(struct urb *urb) > struct uvc_video_queue *queue = &stream->queue; > struct uvc_buffer *buf = NULL; > int ret; > + ktime_t start, end; > + int elapsed_time; > + start = ktime_get(); > switch (urb->status) { > case 0: > > @@ -XXXX,6 +XXXX,10 @@ static void uvc_video_complete(struct urb *urb) > > INIT_WORK(&uvc_urb->work, uvc_video_copy_data_work); > queue_work(stream->async_wq, &uvc_urb->work); > + > + end = ktime_get(); > + elapsed_time = ktime_to_ns(ktime_sub(end, start)); > + pr_err("elapsed time: %d ns", elapsed_time); > } > > /* > > > drivers/media/usb/uvc/uvc_video.c | 92 +++++++++++++++---------------- > 1 file changed, 43 insertions(+), 49 deletions(-) > diff --git a/drivers/media/usb/uvc/uvc_video.c > b/drivers/media/usb/uvc/uvc_video.c index a88b2e51a666..ff2eddc55530 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -391,36 +391,15 @@ static inline ktime_t uvc_video_get_time(void) > > static void > uvc_video_clock_decode(struct uvc_streaming *stream, struct uvc_buffer > *buf, - const u8 *data, int len) > + const u8 *data, int len, unsigned int header_size, > + bool has_pts, bool has_scr) > { > struct uvc_clock_sample *sample; > - unsigned int header_size; > - bool has_pts = false; > - bool has_scr = false; > unsigned long flags; > ktime_t time; > u16 host_sof; > u16 dev_sof; > > - switch (data[1] & (UVC_STREAM_PTS | UVC_STREAM_SCR)) { > - case UVC_STREAM_PTS | UVC_STREAM_SCR: > - header_size = 12; > - has_pts = true; > - has_scr = true; > - break; > - case UVC_STREAM_PTS: > - header_size = 6; > - has_pts = true; > - break; > - case UVC_STREAM_SCR: > - header_size = 8; > - has_scr = true; > - break; > - default: > - header_size = 2; > - break; > - } > - > /* Check for invalid headers. */ > if (len < header_size) > return; > @@ -717,11 +696,10 @@ void uvc_video_clock_update(struct uvc_streaming > *stream, */ > > static void uvc_video_stats_decode(struct uvc_streaming *stream, > - const u8 *data, int len) > + const u8 *data, int len, > + unsigned int header_size, bool has_pts, > + bool has_scr) > { > - unsigned int header_size; > - bool has_pts = false; > - bool has_scr = false; > u16 uninitialized_var(scr_sof); > u32 uninitialized_var(scr_stc); > u32 uninitialized_var(pts); > @@ -730,25 +708,6 @@ static void uvc_video_stats_decode(struct uvc_streaming > *stream, stream->stats.frame.nb_packets == 0) > stream->stats.stream.start_ts = ktime_get(); > > - switch (data[1] & (UVC_STREAM_PTS | UVC_STREAM_SCR)) { > - case UVC_STREAM_PTS | UVC_STREAM_SCR: > - header_size = 12; > - has_pts = true; > - has_scr = true; > - break; > - case UVC_STREAM_PTS: > - header_size = 6; > - has_pts = true; > - break; > - case UVC_STREAM_SCR: > - header_size = 8; > - has_scr = true; > - break; > - default: > - header_size = 2; > - break; > - } > - > /* Check for invalid headers. */ > if (len < header_size || data[0] < header_size) { > stream->stats.frame.nb_invalid++; > @@ -957,10 +916,41 @@ static void uvc_video_stats_stop(struct uvc_streaming > *stream) * to be called with a NULL buf parameter. uvc_video_decode_data > and * uvc_video_decode_end will never be called with a NULL buffer. > */ > +static void uvc_video_decode_header_size(const u8 *data, int *header_size, > + bool *has_pts, bool *has_scr) > +{ > + switch (data[1] & (UVC_STREAM_PTS | UVC_STREAM_SCR)) { > + case UVC_STREAM_PTS | UVC_STREAM_SCR: > + *header_size = 12; > + *has_pts = true; > + *has_scr = true; > + break; > + case UVC_STREAM_PTS: > + *header_size = 6; > + *has_pts = true; > + break; > + case UVC_STREAM_SCR: > + *header_size = 8; > + *has_scr = true; > + break; > + default: > + *header_size = 2; > + } > +} > + > static int uvc_video_decode_start(struct uvc_streaming *stream, > - struct uvc_buffer *buf, const u8 *data, int len) > + struct uvc_buffer *buf, const u8 *urb_data, > + int len) > { > u8 fid; > + u8 data[12]; > + unsigned int header_size; > + bool has_pts = false, has_scr = false; > + > + /* Cache the header since urb_data is uncached memory. The size of > + * header is at most 12 bytes. > + */ > + memcpy(data, urb_data, min(len, 12)); > > /* Sanity checks: > * - packet must be at least 2 bytes long > @@ -983,8 +973,12 @@ static int uvc_video_decode_start(struct uvc_streaming > *stream, uvc_video_stats_update(stream); > } > > - uvc_video_clock_decode(stream, buf, data, len); > - uvc_video_stats_decode(stream, data, len); > + uvc_video_decode_header_size(data, &header_size, &has_pts, &has_scr); > + > + uvc_video_clock_decode(stream, buf, data, len, header_size, has_pts, > + has_scr); > + uvc_video_stats_decode(stream, data, len, header_size, has_pts, > + has_scr); > > /* Store the payload FID bit and return immediately when the buffer is > * NULL. > -- > 2.18.0.rc2.346.g013aa6912e-goog -- Regards, Laurent Pinchart