Received: by 10.213.65.68 with SMTP id h4csp493065imn; Fri, 23 Mar 2018 08:58:08 -0700 (PDT) X-Google-Smtp-Source: AG47ELv9YdWe5mvo0uVmX1RG3DxWQ4hHEUj5xKet3BF+iJxFfy0joPROJ9kZu5tS0zhBgmzYjpxz X-Received: by 10.99.140.87 with SMTP id q23mr21404536pgn.258.1521820688108; Fri, 23 Mar 2018 08:58:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521820688; cv=none; d=google.com; s=arc-20160816; b=UetDo1rG1Cc0gWr+LWSOzsF0D2RikPlV9wegPCSYL/VO6f+50u3kFDL6auL7MhyWTb c9fx12zWG5yzyRZSPhG8c523hb+TF6055Y+3C5697GxIweBEYGkZTCGs0hSY2FxCOkg3 DMUZU9rKoPzrTHvqy6CN8d4lmReXevYJgwJi/CcvyuBYRJKnRuJUSHuz/ZKlc+/Lr2HR ncsKPjwypqFlZ9b98HfFQfG7iNi0LUZNhIg9CavlSHUpD5Ghx3CbPD3BDmSFE8f95AX/ 7rghCPZPvF8G2zhrfU+SSkLP2wGJWrGtoO9dryrUi+NXDzF0TFuqGeeqswHUQ/22X4/T BbAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:to:subject:dkim-signature :arc-authentication-results; bh=LEMmWUSzVazMOiweJodGeivQR0w/sSJWPv36AinS2Yo=; b=bOZkM6gF2c/SRc6HOuUIa9e2lkCQLr70i8rndihvjoV9zxSi5iXRst66UqSk/7nMpv hKm9Id5IBQIAz5iAYihGN+4fVjPtEmI9Q46CyDf3MNwXFXredccblHFKkl0j3Od6FxUu mXKwbxGLCN+AHIpK16CbJ+UjWfsmDMd0YgUKzyeqUcoBVr0XJhPx9/an+u9H6T1Cv6Fb khJuCvefaumiAVeo0n4Iun3x6L7+jeISL6r5yvRzDZycPdWi1Bxd6ycoGREach9KLgIi ZyWXiZYmUkJj6UcCpindGIXiUIc/0rwftZEmdJVl7aa+2gs3Q9SycvI9/yYB/27IDq6U /bNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=jgcEiBeu; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j11-v6si8417733plk.378.2018.03.23.08.57.22; Fri, 23 Mar 2018 08:58:08 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=jgcEiBeu; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751841AbeCWPyz (ORCPT + 99 others); Fri, 23 Mar 2018 11:54:55 -0400 Received: from mail-lf0-f51.google.com ([209.85.215.51]:41818 "EHLO mail-lf0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750789AbeCWPyx (ORCPT ); Fri, 23 Mar 2018 11:54:53 -0400 Received: by mail-lf0-f51.google.com with SMTP id o102-v6so18983034lfg.8 for ; Fri, 23 Mar 2018 08:54:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding:content-language; bh=LEMmWUSzVazMOiweJodGeivQR0w/sSJWPv36AinS2Yo=; b=jgcEiBeu+Dl5mZATsHSya814tlA9Tm9QpsbbZsc+8LHm5TQIGBIvpAIwb+eKtW7JIC iz9S5CSr5zgwoiWyAdwG2y0qxtVvtZ9vW1iylyF6CpVBXYnApdAZhAycsbwXhppUYyWY baB5QaNAObszWe1G8vL8l5cKm53ahZhwjjQS/uoW431g+92srU6R2vf4YW2vBumlWnpT zVubElyBZjpid9wyAKt1BuAC00G3YQ3hWVVhfXm0oLMwGQu2rJ5jvPHrBVqOMZILZLLZ LPWArkni43wUV2XrpKRcde4K6tjKnfZw0hUwrbiNEl8Y1tZkRV6NVbeKYZvH+U0dkYer C7nQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=LEMmWUSzVazMOiweJodGeivQR0w/sSJWPv36AinS2Yo=; b=XNjSG307Z0ZNcBpuPJHb6IUWIMUleYPbOkjW8LCIUj0ZVJ50D7kUB6Nh8UZGGXwf3W 2x5eAqqkMxupLW4z2gJ6vOmP9MgMYqZMKF1w5PopYxrtH5iPvHaZvBtTAYpdzR8FuhxE l4GhnV8zjybD2WhDen026HUUan6rv5+ifT3DtDFQgY7PZ7Lkg/eIbP0L6dv453d1JeBG vbdVWAdSmdRlWida8tCqCyecA+39BG2nDTWH8z3p3J3nq+AgSNbQE+UlZSWHlpXAH8LS i5l+yiUX7s4WI8e/Umw0c/UsrJWF1cZvtUUZW4eWn5ovr79+MuO0rSOuIx7baW4Z0Zaa bwRg== X-Gm-Message-State: AElRT7FRojHDwTUO5mZ1OyCdvbj0uznxGIGL8+Cvv2WkOMczjsRDYtNi iWGli5MT3WcS65S0Kzh1ee79WA== X-Received: by 10.46.18.70 with SMTP id t67mr19098433lje.137.1521820492155; Fri, 23 Mar 2018 08:54:52 -0700 (PDT) Received: from [10.17.6.156] (ll-54.209.223.85.sovam.net.ua. [85.223.209.54]) by smtp.googlemail.com with ESMTPSA id c7sm1934972ljk.51.2018.03.23.08.54.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 23 Mar 2018 08:54:51 -0700 (PDT) Subject: Re: [PATCH v3] drm/xen-front: Add support for Xen PV display frontend To: xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, airlied@linux.ie, daniel.vetter@intel.com, seanpaul@chromium.org, gustavo@padovan.org, jgross@suse.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, Oleksandr Andrushchenko References: <1521644293-14612-1-git-send-email-andr2000@gmail.com> <1521644293-14612-2-git-send-email-andr2000@gmail.com> <20180322075648.GI14155@phenom.ffwll.local> From: Oleksandr Andrushchenko Message-ID: <888a2381-3c83-cb23-2854-6add90f2f493@gmail.com> Date: Fri, 23 Mar 2018 17:54:49 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180322075648.GI14155@phenom.ffwll.local> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > My apologies, but I found a few more things that look strange and should > be cleaned up. Sorry for this iterative review approach, but I think we're > slowly getting there. Thank you for reviewing! > Cheers, Daniel > >> --- >> >> +static int xen_drm_drv_dumb_create(struct drm_file *filp, >> + struct drm_device *dev, struct drm_mode_create_dumb *args) >> +{ >> + struct xen_drm_front_drm_info *drm_info = dev->dev_private; >> + struct drm_gem_object *obj; >> + int ret; >> + >> + ret = xen_drm_front_gem_dumb_create(filp, dev, args); >> + if (ret) >> + goto fail; >> + >> + obj = drm_gem_object_lookup(filp, args->handle); >> + if (!obj) { >> + ret = -ENOENT; >> + goto fail_destroy; >> + } >> + >> + drm_gem_object_unreference_unlocked(obj); > You can't drop the reference while you keep using the object, someone else > might sneak in and destroy your object. The unreference always must be > last. Will fix, thank you >> + >> + /* >> + * In case of CONFIG_DRM_XEN_FRONTEND_CMA gem_obj is constructed >> + * via DRM CMA helpers and doesn't have ->pages allocated >> + * (xendrm_gem_get_pages will return NULL), but instead can provide >> + * sg table >> + */ >> + if (xen_drm_front_gem_get_pages(obj)) >> + ret = xen_drm_front_dbuf_create_from_pages( >> + drm_info->front_info, >> + xen_drm_front_dbuf_to_cookie(obj), >> + args->width, args->height, args->bpp, >> + args->size, >> + xen_drm_front_gem_get_pages(obj)); >> + else >> + ret = xen_drm_front_dbuf_create_from_sgt( >> + drm_info->front_info, >> + xen_drm_front_dbuf_to_cookie(obj), >> + args->width, args->height, args->bpp, >> + args->size, >> + xen_drm_front_gem_get_sg_table(obj)); >> + if (ret) >> + goto fail_destroy; >> + > The above also has another race: If you construct an object, then it must > be fully constructed by the time you publish it to the wider world. In gem > this is done by calling drm_gem_handle_create() - after that userspace can > get at your object and do nasty things with it in a separate thread, > forcing your driver to Oops if the object isn't fully constructed yet. > > That means you need to redo this code here to make sure that the gem > object is fully set up (including pages and sg tables) _before_ anything > calls drm_gem_handle_create(). You are correct, I need to rework this code > > This probably means you also need to open-code the cma side, by first > calling drm_gem_cma_create(), then doing any additional setup, and finally > doing the registration to userspace with drm_gem_handle_create as the very > last thing. Although I tend to avoid open-coding, but this seems the necessary measure here > > Alternativet is to do the pages/sg setup only when you create an fb (and > drop the pages again when the fb is destroyed), but that requires some > refcounting/locking in the driver. Not sure this will work: nothing prevents you from attaching multiple FBs to a single dumb handle So, not only ref-counting should be done here, but I also need to check if the dumb buffer, we are attaching to, has been created already So, I will rework with open-coding some stuff from CMA helpers > > Aside: There's still a lot of indirection and jumping around which makes > the code a bit hard to follow. Probably I am not sure of which indirection we are talking about, could you please specifically mark those annoying you? > >> + >> +static void xen_drm_drv_release(struct drm_device *dev) >> +{ >> + struct xen_drm_front_drm_info *drm_info = dev->dev_private; >> + struct xen_drm_front_info *front_info = drm_info->front_info; >> + >> + drm_atomic_helper_shutdown(dev); >> + drm_mode_config_cleanup(dev); >> + >> + xen_drm_front_evtchnl_free_all(front_info); >> + dbuf_free_all(&front_info->dbuf_list); >> + >> + drm_dev_fini(dev); >> + kfree(dev); >> + >> + /* >> + * Free now, as this release could be not due to rmmod, but >> + * due to the backend disconnect, making drm_info hang in >> + * memory until rmmod >> + */ >> + devm_kfree(&front_info->xb_dev->dev, front_info->drm_info); >> + front_info->drm_info = NULL; >> + >> + /* Tell the backend we are ready to (re)initialize */ >> + xenbus_switch_state(front_info->xb_dev, XenbusStateInitialising); > This needs to be in the unplug code. Yes that means you'll have multiple > drm_devices floating around, but that's how hotplug works. That would also > mean that you need to drop the front_info pointer from the backend at > unplug time. > > If you don't like those semantics then the only other option is to never > destroy the drm_device, but only mark the drm_connector as disconnected > when the xenbus backend is gone. But this half-half solution here where > you hotunplug the drm_device but want to keep it around still doesn't work > from a livetime pov. I'll try to play with this: on backend disconnect I will do the following:     drm_dev_unplug(dev)     xen_drm_front_evtchnl_free_all(front_info);     dbuf_free_all(&front_info->dbuf_list);     devm_kfree(&front_info->xb_dev->dev, front_info->drm_info);     front_info->drm_info = NULL;     xenbus_switch_state(front_info->xb_dev, XenbusStateInitialising); on drm_driver.release callback:     drm_atomic_helper_shutdown(dev);     drm_mode_config_cleanup(dev);     drm_dev_fini(dev);     kfree(dev); Does the above make sense? >> +static struct xenbus_driver xen_driver = { >> + .ids = xen_driver_ids, >> + .probe = xen_drv_probe, >> + .remove = xen_drv_remove, > I still don't understand why you have both the remove and fini versions of > this. See other comments, I think the xenbus vs. drm_device lifetime stuff > still needs to be cleaned up some more. This shouldn't be that hard > really. > > Or maybe I'm just totally misunderstanding this frontend vs. backend split > in xen, so if you have a nice gentle intro text for why that exists, it > might help. Probably misunderstanding comes from the fact that it is possible if backend dies it may still have its XenBus state set to connected, thus displback_disconnect callback will never be called. For that reason on rmmod I call fini for the DRM driver to destroy it. >> + /* >> + * pflip_timeout is set to current jiffies once we send a page flip and >> + * reset to 0 when we receive frame done event from the backed. >> + * It is checked during drm_connector_helper_funcs.detect_ctx to detect >> + * time-outs for frame done event, e.g. due to backend errors. >> + * >> + * This must be protected with front_info->io_lock, so races between >> + * interrupt handler and rest of the code are properly handled. >> + */ >> + unsigned long pflip_timeout; >> + >> + bool conn_connected; > I'm pretty sure this doesn't work. Especially the check in display_check > confuses me, if there's ever an error then you'll never ever be able to > display anything again, except when someone disables the display. That was the idea to allow dummy user-space to get an error in display_check and close, going through display_disable. Yes, compositors will die in this case. > If you want to signal errors with the output then this must be done > through the new link-status property and > drm_mode_connector_set_link_status_property. Rejecting kms updates in > display_check with -EINVAL because the hw has a temporary issue is kinda > not cool (because many compositors just die when this happens). I thought > we agreed already to remove that, sorry for not spotting that in the > previous version. Unfortunatelly, there is little software available which will benefit from this out of the box. I am specifically interested in embedded use-cases, e.g. Android (DRM HWC2 - doesn't support hotplug, HWC1.4 doesn't support link status), Weston (no device hotplug, but connectors and outputs). Other software, like kmscube, modetest will not handle that as well. So, such software will hang forever until killed. > > Some of the conn_connected checks also look a bit like they should be > replaced by drm_dev_is_unplugged instead, but I'm not sure. I believe you are talking about drm_simple_display_pipe_funcs? Do you mean I have to put drm_dev_is_unplugged in display_enable, display_disable and display_update callbacks? >> +static int connector_detect(struct drm_connector *connector, >> + struct drm_modeset_acquire_ctx *ctx, >> + bool force) >> +{ >> + struct xen_drm_front_drm_pipeline *pipeline = >> + to_xen_drm_pipeline(connector); >> + struct xen_drm_front_info *front_info = pipeline->drm_info->front_info; >> + unsigned long flags; >> + >> + /* check if there is a frame done event time-out */ >> + spin_lock_irqsave(&front_info->io_lock, flags); >> + if (pipeline->pflip_timeout && >> + time_after_eq(jiffies, pipeline->pflip_timeout)) { >> + DRM_ERROR("Frame done event timed-out\n"); >> + >> + pipeline->pflip_timeout = 0; >> + pipeline->conn_connected = false; >> + xen_drm_front_kms_send_pending_event(pipeline); >> + } >> + spin_unlock_irqrestore(&front_info->io_lock, flags); > If you want to check for timeouts please use a worker, don't piggy-pack on > top of the detect callback. Ok, will have a dedicated work for that. The reasons why I put this into the detect callback were: - the periodic worker is already there, and I do nothing heavy   in this callback - if frame done has timed out it most probably means that   backend has gone, so 10 sec period of detect timeout is also ok: thus I don't   need to schedule a work each page flip which could be a bit costly So, probably I will also need a periodic work (or kthread/timer) for frame done time-outs >> +static int connector_mode_valid(struct drm_connector *connector, >> + struct drm_display_mode *mode) >> +{ >> + struct xen_drm_front_drm_pipeline *pipeline = >> + to_xen_drm_pipeline(connector); >> + >> + if (mode->hdisplay != pipeline->width) >> + return MODE_ERROR; >> + >> + if (mode->vdisplay != pipeline->height) >> + return MODE_ERROR; >> + >> + return MODE_OK; >> +} > mode_valid on the connector only checks probe modes. Since that is > hardcoded this doesn't do much, which means userspace can give you a wrong > mode, and you fall over. Agree, I will remove this callback completely: I have drm_connector_funcs.fill_modes == drm_helper_probe_single_connector_modes, so it will only pick my single hardcoded mode from drm_connector_helper_funcs.get_modes callback (connector_get_modes). > > You need to use one of the other mode_valid callbacks instead, > drm_simple_display_pipe_funcs has the one you should use. > Not sure I understand why do I need to provide a callback here? For simple KMS the drm_simple_kms_crtc_mode_valid callback is used, which always returns MODE_OK if there is no .mode_valid set for the pipe. As per my understanding drm_simple_kms_crtc_mode_valid is only called for modes, which were collected by drm_helper_probe_single_connector_modes, so I assume each time .validate_mode is called it can only have my hardcoded mode to validate? >> + >> +static int display_check(struct drm_simple_display_pipe *pipe, >> + struct drm_plane_state *plane_state, >> + struct drm_crtc_state *crtc_state) >> +{ >> + struct xen_drm_front_drm_pipeline *pipeline = >> + to_xen_drm_pipeline(pipe); >> + >> + return pipeline->conn_connected ? 0 : -EINVAL; > As mentioned, this -EINVAL here needs to go. Since you already have a > mode_valid callback you can (should) drop this one here entirely. Not sure how mode_valid is relevant to this code [1]: This function is called in the check phase of an atomic update, specifically when the underlying plane is checked. But, anyways: the reason for this callback and it returning -EINVAL is primarialy for a dumb user-space which cannot handle hotplug events. But, as you mentioned before, it will make most compositors die, so I will remove this Thank you for reviewing, Oleksandr [1] https://elixir.bootlin.com/linux/v4.16-rc6/source/include/drm/drm_simple_kms_helper.h#L59