Date: Fri, 22 May 2015 09:58:10 -0400 (EDT)
From: Frediano Ziglio <fziglio@redhat.com>
To: Christophe Fergeau <cfergeau@redhat.com>
Cc: spice-devel@lists.freedesktop.org, David Airlie <airlied@linux.ie>,
        dri-devel@lists.freedesktop.org, Dave Airlie <airlied@redhat.com>,
        linux-kernel@vger.kernel.org
Message-ID: <1702166382.3247503.1432303090162.JavaMail.zimbra@redhat.com>
In-Reply-To: <20150522115805.GR20750@edamame.cdg.redhat.com>
References: <1591625424.1112688.1432028990916.JavaMail.zimbra@redhat.com> <163000764.1114319.1432029294390.JavaMail.zimbra@redhat.com> <20150522115805.GR20750@edamame.cdg.redhat.com>
Subject: Re: [Spice-devel] [PATCH] Do not loop on ERESTARTSYS using
 interruptible waits
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Thread-Topic: Do not loop on ERESTARTSYS using interruptible waits
Thread-Index: gvT6H1BTa4iSAvJ0xmwzRrAwDuQDuw==
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4147
Lines: 100


> 
> Hey,
> 
> On Tue, May 19, 2015 at 05:54:54AM -0400, Frediano Ziglio wrote:
> > This problem happens when using KMS surfaces with the QXL driver.
> > To reproduce it easily, use KDE Plasma (which uses surfaces a lot) and
> > make sure you are using KMS surfaces (the QXL driver on Fedora/RedHat
> > has a patch to stop using them). Open a complex application such as
> > LibreOffice and after a while the machine gets stuck with Xorg at 100% CPU.
> > The problem occurs because, when creating new surfaces, interruptible
> > waits are used; however, instead of returning ERESTARTSYS back to
> > userspace, the driver loops, and the wait routines keep returning
> > ERESTARTSYS for as long as the signal is pending.
> > Under out-of-memory conditions the TTM module tries to move objects to
> > system memory, and QXL makes sure the surface is updated before the move.
> > The fix handles this case differently by using a non-interruptible
> > wait, so the wait functions actually wait instead of returning ERESTARTSYS.
> > Note that when the loop occurs the driver sends a lot of update
> > requests, causing more CPU usage on the Qemu side too.
> > 
> > Signed-off-by: Frediano Ziglio <fziglio@redhat.com>
> > ---
> >  qxl/qxl_cmd.c   | 12 +++---------
> >  qxl/qxl_drv.h   |  2 +-
> >  qxl/qxl_ioctl.c |  2 +-
> >  3 files changed, 5 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
> > index 9782364..bd5404e 100644
> > --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> > +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> > @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev, uint8_t val, long port)
> >  {
> >  	int ret;
> >  
> > -restart:
> >  	ret = wait_for_io_cmd_user(qdev, val, port, false);
> > -	if (ret == -ERESTARTSYS)
> > -		goto restart;
> 
> I think this one is not directly related to the fix, but can be removed
> because wait_for_io_cmd_user(qdev, val, port, false); will call
> wait_event_timeout() which cannot return ERESTARTSYS? Or was this loop
> causing issues too?
> 

Yes, but it follows the same pattern: it retries until ERESTARTSYS goes away.
It is probably not broken currently, but it is prone to the same issue.

> >  }
> >  
> >  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> > -			const struct qxl_rect *area)
> > +			const struct qxl_rect *area, bool intr)
> >  {
> >  	int surface_id;
> >  	uint32_t surface_width, surface_height;
> > @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> >  	mutex_lock(&qdev->update_area_mutex);
> >  	qdev->ram_header->update_area = *area;
> >  	qdev->ram_header->update_surface = surface_id;
> > -	ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> > +	ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
> >  	mutex_unlock(&qdev->update_area_mutex);
> >  	return ret;
> >  }
> > @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct qxl_bo *surf)
> >  	rect.right = surf->surf.width;
> >  	rect.top = 0;
> >  	rect.bottom = surf->surf.height;
> > -retry:
> > -	ret = qxl_io_update_area(qdev, surf, &rect);
> > -	if (ret == -ERESTARTSYS)
> > -		goto retry;
> > +	ret = qxl_io_update_area(qdev, surf, &rect, false);
> 
> My understanding is that the fix is this hunk? If so, this could be made
> more obvious with an intermediate commit adding the 'bool intr' arg to
> qxl_io_update_area and only calling it with 'true' in the appropriate
> places.
> This code path is only triggered from qxl_surface_evict() which I assume
> is not necessarily easily interruptible, so this change makes sense to
> me. However it would be much better to get a review from Dave Airlie ;)
> 
> Christophe
> 

Are you asking whether just removing the loop would fix the issue?
So you are proposing a first patch that adds the argument, always passing true, and a second one that changes some calls to false? That makes sense, but the loop should still be removed.

Frediano