Date: Mon, 2 Dec 2019 12:02:59 -0500 (EST)
From: Alan Stern
To: Michael Olbrich
cc: Thinh Nguyen, Felipe Balbi, "linux-usb@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "kernel@pengutronix.de",
    Greg Kroah-Hartman, Laurent Pinchart
Subject: Re: [PATCH 2/2] usb: dwc3: gadget: restart the transfer if a isoc request is queued too late
In-Reply-To: <20191202154120.o4yt266cat6uzxd7@pengutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2 Dec 2019, Michael Olbrich wrote:

> > > My current test-case is video frames with 450kB on average at 30fps. This
> > > currently results in ~10% CPU load for the threaded interrupt handler.
> > > At least in my test, filling the actual video data into the frame has very
> > > little impact.
> > > So if I reserve 900kB to support occasionally larger video
> > > frames, then I expect that this CPU load will almost double in all cases,
> > > not just when the video frames are larger.
> >
> > This is the sort of thing you need to confirm by experimenting. It is
> > not at all clear that doubling the interrupt rate will also double the
> > CPU load, especially if half of the interrupts don't require the CPU to
> > do much work.
>
> As I noted before, actually filling in the video data is only a small part
> of the measured CPU load. To put it in more precise numbers:
>
> From my limited understanding, there are 8000 interrupts per second
> regardless of the bandwidth (or maybe less for very low bandwidth
> configurations?).
> Just queuing requests without any content (so skipping the buffer handling
> and 'encode()' in uvc_video_complete()) results in ~17% CPU load.

Can you tell dwc3 to request only 4000 interrupts per second? Or 2000?
Or even 60?

> If I fill in the data for a video stream with ~1.8MB per frame and 30 fps
> (and empty requests for the rest) then the CPU load goes up to ~19.5%.

I assume you're talking about a SuperSpeed USB connection here, since
high speed can handle no more than 0.8 MB per frame at 30 fps.

> This number remains the same for different bandwidths (and therefore
> different request sizes and a different zero-length request percentage).
>
> With my patches the CPU load changes as expected. The 2.5% to fill the data
> remains and the rest goes down with fewer interrupts.
>
> I am hoping that more batching will help here a bit. But either way the
> overhead of queuing zero-length requests is significant.

Hmmm. It may be that my thinking has been prejudiced by a
host-centered attitude, and things may work out differently on the
gadget side.

There's one aspect I'm not clear on.
Suppose you decide to skip submitting requests for USB microframes
5 - 7 (at 8000 microframes per second, as you mentioned) and then
submit a request to be queued for microframe 8. How does the UDC
driver know which microframe to use for that request? Does it always
use the first available microframe?

> My problem is this:
> Let's assume (for simplicity) that I have a video stream that fills
> almost the full available bandwidth. And to reduce the bandwidth, two
> interrupts will be requested for each video frame.

Let's assume the video frame rate is 30 fps. Since you say almost the
full available bandwidth is in use, the number of USB microframes per
video frame must be just under 8000 / 30, or around 266.

> - When the first frame arrives, the whole frame is queued.

Requiring 266 microframes, let's say starting at microframe 0.

> - When the first half of the frame is transmitted the first interrupt
> arrives.

That would be at microframe 133.

> - The second frame has not yet arrived so half a frame worth of 0-length
> requests are queued.

133 microframes, extending out to microframe 399.

> - When the second half of the frame is transmitted the second interrupt
> arrives.

At the end of microframe 266.

> - The second frame has still not yet arrived so another half a frame worth
> of 0-length requests are queued.

At this point, it has been about 266/8000 s = 33.25 ms since the first
video frame arrived. But at a rate of 30 fps, the interval between
frames should be 33.33 ms. So either something is wrong or else the
second frame is just about to arrive. (Since the frame rate is
slightly lower than the rate of transmission of the USB data, it's
inevitable that every so often you'll get an interrupt just before a
frame arrives.)

> - Immediately afterwards the second frame arrives from userspace. At this
> point, almost one frame worth of 0-length requests are queued so the
> second frame will have an extra latency of almost one frame.

I see.
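
The figures used in this reply can be re-derived with a few lines of
arithmetic. This is only a sketch: the variable names are made up for
illustration, and the high-speed limit assumes the USB 2.0 maximum of
three 1024-byte isochronous transactions per microframe, which the
thread itself does not spell out.

```python
# Illustrative arithmetic only; names are hypothetical, numbers are from the thread.
MICROFRAMES_PER_SEC = 8000
FPS = 30

# A stream using almost the full bandwidth needs just under 8000/30 microframes.
frame_uframes = MICROFRAMES_PER_SEC // FPS      # 266
half_uframes = frame_uframes // 2               # 133

first_irq = half_uframes                        # first interrupt, microframe 133
padding_end = frame_uframes + half_uframes      # zero-length requests reach 399
second_irq = frame_uframes                      # second interrupt, microframe 266

usb_time_ms = frame_uframes / MICROFRAMES_PER_SEC * 1000   # 33.25 ms
video_period_ms = 1000 / FPS                               # about 33.33 ms
slack_ms = video_period_ms - usb_time_ms                   # about 0.08 ms

# Assumption: high-speed isochronous limit of 3 x 1024 bytes per microframe.
hs_bytes_per_video_frame = 3 * 1024 * MICROFRAMES_PER_SEC / FPS

print(first_irq, second_irq, padding_end)
print(round(usb_time_ms, 2), round(video_period_ms, 2), round(slack_ms, 3))
print(hs_bytes_per_video_frame / 1e6, "MB per video frame at high speed")
```

The small slack is why the second interrupt lands just before the next
video frame shows up.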
There is another way to approach this, however. When you get a USB
interrupt and no video data is available, nothing says you have to
queue an entire frame (or half-frame) worth of 0-length packets.
Instead, queue a request containing 2 ms (for example) of 0-length
packets. Presumably the next video frame will arrive during those 2
ms (since it's _expected_ to arrive in no more than 0.08 ms), and from
then on you'll be okay until once again the USB data has caught up to
the video data.

On the other hand, I don't know how the UVC driver manages the
alignment between USB packets and video frames on the host end.
Perhaps it would prefer something slightly different -- that would be
another good question to ask the UVC maintainer (CC'ed).

> With my patch this does not happen and the transmission of the second
> frame starts immediately.
> The question is, can we do better than that? What could be done in the
> driver if it knows that 0-length requests must be transmitted because no
> new data is available?

It does seem reasonable to require that when a UDC driver encounters a
gap in an isochronous stream, it should associate the next request with
the first available microframe. If dwc3 did that then you wouldn't
have to worry about filling extra space with 0-length packets.

But if we make this change, it should be documented in an appropriate
place and it should apply to all UDC drivers -- not just dwc3.

Alan Stern
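
As a rough illustration of the 2 ms suggestion above, the extra
latency can be compared for different amounts of 0-length padding.
This sketch assumes a simplified model in which queued requests are
sent strictly in order, one per microframe, so a video frame that
arrives right after the padding was queued waits behind all of it.
The function names are hypothetical and not part of any driver.

```python
MICROFRAME_US = 125  # one USB microframe at 8000 microframes per second


def uframes_for(ms: float) -> int:
    """Number of whole microframes that fit into the given time in ms."""
    return int(ms * 1000 // MICROFRAME_US)


def worst_case_delay_ms(padding_uframes: int) -> float:
    """Delay seen by a video frame that arrives just after the padding
    was queued (assumed model: it waits behind every 0-length request)."""
    return padding_uframes * MICROFRAME_US / 1000


full_frame_padding = 266            # two half-frames of padding, as quoted
short_padding = uframes_for(2.0)    # the 2 ms example

print(worst_case_delay_ms(full_frame_padding), "ms with a full frame of padding")
print(worst_case_delay_ms(short_padding), "ms with 2 ms of padding")
```

Under this model the padding length is a direct upper bound on the
added latency, which is why a short padding request looks attractive
when the next frame is expected almost immediately.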