Date: Fri, 22 May 2015 09:58:10 -0400 (EDT)
From: Frediano Ziglio
To: Christophe Fergeau
Cc: spice-devel@lists.freedesktop.org, David Airlie, dri-devel@lists.freedesktop.org, Dave Airlie, linux-kernel@vger.kernel.org
Message-ID: <1702166382.3247503.1432303090162.JavaMail.zimbra@redhat.com>
In-Reply-To: <20150522115805.GR20750@edamame.cdg.redhat.com>
References: <1591625424.1112688.1432028990916.JavaMail.zimbra@redhat.com> <163000764.1114319.1432029294390.JavaMail.zimbra@redhat.com> <20150522115805.GR20750@edamame.cdg.redhat.com>
Subject: Re: [Spice-devel] [PATCH] Do not loop on ERESTARTSYS using interruptible waits

>
> Hey,
>
> On Tue, May 19, 2015 at 05:54:54AM -0400, Frediano Ziglio wrote:
> > This problem happens when using KMS surfaces with the QXL driver.
> > To reproduce it easily, use KDE Plasma (which uses surfaces a lot) and
> > make sure you are using KMS surfaces (the QXL driver on Fedora/RedHat
> > has a patch to stop using them). Open some complex application like
> > LibreOffice and after a while your machine gets stuck using 100% CPU
> > on Xorg.
> > The problem occurs because interruptible waits are used when creating
> > new surfaces; however, instead of returning ERESTARTSYS back to
> > userspace, the code loops, and the wait routines keep returning
> > ERESTARTSYS once the signal is pending.
> > Under out-of-memory conditions the TTM module tries to move objects to
> > system memory, and QXL ensures the surface is updated before the move.
> > The fix handles this case differently, using a non-interruptible wait
> > so that the wait functions wait instead of returning ERESTARTSYS.
> > Note that when the loop occurs, the driver sends a lot of update
> > requests, causing more CPU usage on the QEMU side too.
> >
> > Signed-off-by: Frediano Ziglio
> > ---
> >  qxl/qxl_cmd.c   | 12 +++---------
> >  qxl/qxl_drv.h   |  2 +-
> >  qxl/qxl_ioctl.c |  2 +-
> >  3 files changed, 5 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drivers/gpu/drm/qxl/qxl_cmd.c b/qxl/qxl_cmd.c
> > index 9782364..bd5404e 100644
> > --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> > +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> > @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev, uint8_t val, long port)
> >  {
> >          int ret;
> >
> > -restart:
> >          ret = wait_for_io_cmd_user(qdev, val, port, false);
> > -        if (ret == -ERESTARTSYS)
> > -                goto restart;
>
> I think this one is not directly related to the fix, but can be removed
> because wait_for_io_cmd_user(qdev, val, port, false); will call
> wait_event_timeout() which cannot return ERESTARTSYS? Or was this loop
> causing issues too?
>

Yes, but it has the same issue: it retries until ERESTARTSYS is gone.
It is perhaps not broken currently, but it is prone to the same issue.

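To illustrate why retrying on ERESTARTSYS is fragile, here is a minimal
user-space sketch. It is not driver code: model_wait(), retry_loop(),
signal_pending_flag, polls_until_done and MODEL_ERESTARTSYS are all
hypothetical stand-ins invented for this example, and the retry cap exists
only so that the demonstration terminates. The assumption being modelled is
that an interruptible wait returns immediately for as long as a signal is
pending, and the signal stays pending until control returns to userspace.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the driver. */
#define MODEL_ERESTARTSYS 512   /* local constant for the demo only */

static bool signal_pending_flag;  /* models "a signal is pending for the task" */
static int polls_until_done;      /* models the device needing time to finish */

/* Returns 0 once the command completed, -MODEL_ERESTARTSYS if interrupted. */
static int model_wait(bool intr)
{
        for (;;) {
                /* An interruptible wait bails out while a signal is pending. */
                if (intr && signal_pending_flag)
                        return -MODEL_ERESTARTSYS;
                if (polls_until_done == 0)
                        return 0;
                polls_until_done--;
        }
}

/*
 * The pattern the patch removes: retry while the wait reports
 * -ERESTARTSYS. Nothing inside the loop clears the pending signal, so
 * without the cap (added here only for the demo) it would spin forever.
 * Returns the number of retries performed.
 */
static int retry_loop(int max_retries)
{
        int retries = 0;

        while (model_wait(true) == -MODEL_ERESTARTSYS) {
                if (++retries >= max_retries)
                        break;
        }
        return retries;
}
```

With signal_pending_flag set, retry_loop(1000) uses up all 1000 retries
without the modelled device making any progress, whereas a single
model_wait(false) call simply waits and returns 0.
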
> >  }
> >
> >  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> > -                           const struct qxl_rect *area)
> > +                           const struct qxl_rect *area, bool intr)
> >  {
> >          int surface_id;
> >          uint32_t surface_width, surface_height;
> > @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> >          mutex_lock(&qdev->update_area_mutex);
> >          qdev->ram_header->update_area = *area;
> >          qdev->ram_header->update_surface = surface_id;
> > -        ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> > +        ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
> >          mutex_unlock(&qdev->update_area_mutex);
> >          return ret;
> >  }
> > @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct qxl_bo *surf)
> >          rect.right = surf->surf.width;
> >          rect.top = 0;
> >          rect.bottom = surf->surf.height;
> > -retry:
> > -        ret = qxl_io_update_area(qdev, surf, &rect);
> > -        if (ret == -ERESTARTSYS)
> > -                goto retry;
> > +        ret = qxl_io_update_area(qdev, surf, &rect, false);
>
> My understanding is that the fix is this hunk? If so, this could be made
> more obvious with an intermediate commit adding the 'bool intr' arg to
> qxl_io_update_area and only calling it with 'true' in the appropriate
> places.
> This code path is only triggered from qxl_surface_evict() which I assume
> is not necessarily easily interruptible, so this change makes sense to
> me. However it would be much better to get a review from Dave Airlie ;)
>
> Christophe
>

Are you asking whether just removing the loop would fix the issue?
So you are proposing a first patch that adds the argument, always passing
true, and another that changes some calls to false?
That makes sense, but the loop should still be removed.

Frediano