Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp267168ybz; Tue, 28 Apr 2020 23:17:12 -0700 (PDT) X-Google-Smtp-Source: APiQypLkrsulhy130/wDCUzy2CJiRGFSoo/yl2sX4Qz14rkHUgcGRIaNeQQKkuB9xzJEaySu8eGD X-Received: by 2002:a17:906:35cd:: with SMTP id p13mr1230986ejb.206.1588141032033; Tue, 28 Apr 2020 23:17:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588141032; cv=none; d=google.com; s=arc-20160816; b=jwHh5C4N/F6CCC3S2eDhVL9xYP51i4Gjb+j49Wk7uu2EuTanrqmNNBp2VXQd5EH5WT UyhlbhkmCE5JdBb6T1yqxlrrjM+PjiKee2usAU0c+s51Jpt5vZ/SmTZ3lXzX8RAwTKLB +pL4D3e0olgqdqwdKOjCM5wftJo81lHE4ZSGdeejakGlmBTL/J7XCa0nRju3Kf/PnMgD welDC4WSTAb031hyWMEl1R+gVjjFbD+yWs9/33hoEM6+gBzYerVAvJTldtWjyItbpOfz nZDy8hxxo6oqbe6HTwVweDQD+5ef6ovd5hHoavF+YD+5xTKGsEJEx5hl+ehehqKGHrlz faZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:cc:from:references:to:subject; bh=gKNl0KCS3OsKDDBDxai7Ybrr5cIoY9fp4T25lFeobgk=; b=uzoepajUa5vzSzbM6llAFPTPC+EbP6cBNmF23RF6Xmyg7mKWpXL1LA1gPggiZyMNFt MolTTB0RZ7mxabRz6TOvF/AE61XU/xREurvkMhjcM49yBDYq9fn3zTOkrHoDb3wGi9rv rhbB/aJMMsGHYaqBJQrDB7wUzxaDKpnLPI+qo3hvNGZ+v5DcZWS0k5FsNYbjyXgYMS/D ZmQe+5/lXC1aryQlXowb+1axxc2q92mgZEpZU5XfekEkeM2+wUoWYf//+tTMzinsz/Qv qbcQ60dNZfO3v8m7CLz+kOqjnxGfGGbatsFFyTSr904wj3LWTIyLCZIVSeViftHkCcHq aTog== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=virtuozzo.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u18si3297944ejx.502.2020.04.28.23.16.48; Tue, 28 Apr 2020 23:17:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=virtuozzo.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726436AbgD2GNI (ORCPT + 99 others); Wed, 29 Apr 2020 02:13:08 -0400 Received: from relay.sw.ru ([185.231.240.75]:56440 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726181AbgD2GNH (ORCPT ); Wed, 29 Apr 2020 02:13:07 -0400 Received: from vvs-ws.sw.ru ([172.16.24.21]) by relay.sw.ru with esmtp (Exim 4.92.3) (envelope-from ) id 1jTfxf-0005DQ-K4; Wed, 29 Apr 2020 09:12:47 +0300 Subject: Re: [PATCH 1/1] drm/qxl: add mutex_lock/mutex_unlock to ensure the order in which resources are rele To: Gerd Hoffmann References: <20200421084300.zggroiptwbrblzqy () sirius ! home ! kraxel ! org> From: Vasily Averin Cc: Caicai , Dave Airlie , David Airlie , virtualization@lists.linux-foundation.org, dri-devel@lists.freedesktop.org, "linux-kernel@vger.kernel.org\"" , Zhangyueqian , "Denis V. Lunev" Message-ID: Date: Wed, 29 Apr 2020 09:12:46 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <20200421084300.zggroiptwbrblzqy () sirius ! home ! kraxel ! org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/21/20 11:43 AM, Gerd Hoffmann wrote: > On Sat, Apr 18, 2020 at 02:39:17PM +0800, Caicai wrote: >> When a qxl resource is released, the list that needs to be released is >> fetched from the linked list ring and cleared. When you empty the list, >> instead of trying to determine whether the ttm buffer object for each >> qxl in the list is locked, you release the qxl object and remove the >> element from the list until the list is empty. It was found that the >> linked list was cleared first, and that the lock on the corresponding >> ttm Bo for the QXL had not been released, so that the new qxl could not >> be locked when it used the TTM. > > So the dma_resv_reserve_shared() call in qxl_release_validate_bo() is > unbalanced? Because the dma_resv_unlock() call in > qxl_release_fence_buffer_objects() never happens due to > qxl_release_free_list() clearing the list beforehand? Is that correct? we observe similar issue: RHEL7 guests crashes in qxl_draw_opaque_fb() qxl_release_fence_buffer_objects() crashdump investigation shows that qlx_object was freed and reused, so its original content was re-written. At the same time qxl_device have empty release_idr, ant there are no allocated qxl_bo_list entries. i.e. qxl_release_free was really called. > The only way I see for this to happen is that the guest is preempted > between qxl_push_{cursor,command}_ring_release() and > qxl_release_fence_buffer_objects() calls. The host can complete the qxl > command then, signal the guest, and the IRQ handler calls > qxl_release_free_list() before qxl_release_fence_buffer_objects() runs. We think the same: qxl_release was freed by garbage collector before original thread had called qxl_release_fence_buffer_objects(). > Looking through the code I think it should be safe to simply swap the > qxl_release_fence_buffer_objects() + > qxl_push_{cursor,command}_ring_release() calls to close that race > window. Can you try that and see if it fixes the bug for you? I'm going to prepare and test such patch but I have one question here: qxl_push_*_ring_release can be called with interruptible=true and fail How to correctly handle this case? Is the hunk below correct from your POV? --- a/drivers/gpu/drm/qxl/qxl_ioctl.c +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c @@ -261,12 +261,8 @@ static int qxl_process_single_command(struct qxl_device *qdev, apply_surf_reloc(qdev, &reloc_info[i]); } + qxl_release_fence_buffer_objects(release); ret = qxl_push_command_ring_release(qdev, release, cmd->type, true); - if (ret) - qxl_release_backoff_reserve_list(release); <<<< ???? - else - qxl_release_fence_buffer_objects(release); - out_free_bos: out_free_release: Thank you, Vasily Averin