Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp767985ybe; Thu, 5 Sep 2019 05:47:22 -0700 (PDT) X-Google-Smtp-Source: APXvYqy8RZHViqVOIptipNq9otYLHMsNb0rHjMWySBnu2O7pnbeVV9hLGSogzf0XqT2dtGJWfz5m X-Received: by 2002:a17:90a:c086:: with SMTP id o6mr3591492pjs.24.1567687642514; Thu, 05 Sep 2019 05:47:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567687642; cv=none; d=google.com; s=arc-20160816; b=cTrueR0lkkwvBiTh0Hqc+yOpAvD5qYUsRB5CZoouWhlkNPp/6kdtrCacEvMU35Qu3c 68GQcwTaRqcju9OlAvHnvL1VYJzBFWIF7zDhLdxGSTJOc/6QajPup7xc/tVUnlupqUMo DVJw0Rrq9rYjGoHYdMijtHDclzym1e/bxS7M1b7zvSFRW8Y+iKlPV0WeN+dX1iNwSswe utYpsp7fFNAbI3i3dyLc+z4/4wvmIH+e+M8kXIe/lTGT27PYjctI8FQLGjzEcOb0YZEc d85LV4wxYIslgaIlg0Jk922MzbDsXIzx08TfACDDhM0zK95czuoXI/eY465hQPDMN8Jn 1cMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=NIpCc3agBfyn/hYlVZDi+zpXjMl7RIQUehfUI+jQDVc=; b=BY50lPpl8TTAD3JLTet00ImusSaqbrodv1klzEY9zu0CuskO463yDyvb6NcXqIVjI+ onXDZuNRXTqdLQnht/1Im9XuAeQXZN5Wi4nN1m5gvzWyZlXCEi1XrrQ0JnbuipNC6efz 2XHjN7WNY53h0QJ5g76RLDf9c2cnH5xJrcUwp8KUYEKHiY26GAHf/uvnjksBwn9Pu596 avKjnNEt8OEsOvC9VP0BxAQ6KTeYRrfnY5UuIXKxAKaU1AKE8gGElVVnXHSJehw/Sx1R Uqaq4hKafl1bKAN+4H79PcnFZQ9Xm1gDWeIhcG0zxQx/pxqNGHg/LfgC3CUXR+fzlHdA syyg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y63si1670419pgd.403.2019.09.05.05.47.06; Thu, 05 Sep 2019 05:47:22 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387560AbfIEKrt (ORCPT + 99 others); Thu, 5 Sep 2019 06:47:49 -0400 Received: from mga17.intel.com ([192.55.52.151]:3200 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731775AbfIEKrs (ORCPT ); Thu, 5 Sep 2019 06:47:48 -0400 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Sep 2019 03:47:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,470,1559545200"; d="scan'208";a="177265125" Received: from shbuild999.sh.intel.com (HELO localhost) ([10.239.147.113]) by orsmga008.jf.intel.com with ESMTP; 05 Sep 2019 03:47:45 -0700 Date: Thu, 5 Sep 2019 18:48:11 +0800 From: Feng Tang To: Daniel Vetter Cc: Dave Airlie , Stephen Rothwell , "Chen, Rong A" , Michel D?nzer , Linux Kernel Mailing List , dri-devel , Thomas Zimmermann , LKP Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression Message-ID: <20190905104811.GF5541@shbuild999.sh.intel.com> References: <2e1b4d65-d477-f571-845d-fa0a670859af@suse.de> <20190904062716.GC5541@shbuild999.sh.intel.com> <72c33bf1-9184-e24a-c084-26d9c8b6f9b7@suse.de> <20190904083558.GD5541@shbuild999.sh.intel.com> <20190905065917.GE5541@shbuild999.sh.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote: > On Thu, Sep 5, 2019 at 8:58 AM Feng Tang wrote: > > > > Hi Vetter, > > > > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote: > > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie wrote: > > > > > > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter wrote: > > > > > > > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > > > > > > > > > > > Hi Daniel, > > > > > > > > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann wrote: > > > > > > > > > > > > > > > > Hi > > > > > > > > > > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > > > > > > >> Thank you for testing. But don't get too excited, because the patch > > > > > > > > >> simulates a bug that was present in the original mgag200 code. A > > > > > > > > >> significant number of frames are simply skipped. That is apparently the > > > > > > > > >> reason why it's faster. > > > > > > > > > > > > > > > > > > Thanks for the detailed info, so the original code skips time-consuming > > > > > > > > > work inside atomic context on purpose. Is there any space to optmise it? > > > > > > > > > If 2 scheduled update worker are handled at almost same time, can one be > > > > > > > > > skipped? > > > > > > > > > > > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling > > > > > > > > the worker before a previous instance started, will not create a second > > > > > > > > instance. The worker's instance will complete all pending updates. So in > > > > > > > > some way, skipping workers already happens. > > > > > > > > > > > > > > So I think that the most often fbcon update from atomic context is the > > > > > > > blinking cursor. If you disable that one you should be back to the old > > > > > > > performance level I think, since just writing to dmesg is from process > > > > > > > context, so shouldn't change. > > > > > > > > > > > > Hmm, then for the old driver, it should also do the most update in > > > > > > non-atomic context? > > > > > > > > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs > > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > > > > > > the cache setting of DRM shadow buffer? say the orginal code use a > > > > > > cachable buffer? > > > > > > > > > > Hm, that would indicate the write-combining got broken somewhere. This > > > > > should definitely be faster. Also we shouldn't transfer the hole > > > > > thing, except when scrolling ... > > > > > > > > First rule of fbcon usage, you are always effectively scrolling. > > > > > > > > Also these devices might be on a PCIE 1x piece of wet string, not sure > > > > if the numbers reflect that. > > > > > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and > > > overhead not entirely out of the question that 150MB/s is actually the > > > hw limit. If it's really pcie 1x 1.0, no idea where to check that. > > > Also might be worth to double-check that the gpu pci bar is listed as > > > wc in debugfs/x86/pat_memtype_list. > > > > Here is some dump of the device info and the pat_memtype_list, while it is > > running other 0day task: > > Looks all good, I guess Dave is right with this probably only being a > real slow, real old pcie link, plus maybe some inefficiencies in the > mapping. Your 150MB/s, was that just the copy, or did you include all > the setup/map/unmap/teardown too in your measurement in the trace? Following is the breakdown, the 19240 us is the memory copy time The drm_fb_helper_dirty_work() calls sequentially 1. drm_client_buffer_vmap (290 us) 2. drm_fb_helper_dirty_blit_real (19240 us) 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver 4. drm_client_buffer_vunmap (215 us) Thanks, Feng > -Daniel > > > > > controller info > > ================= > > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller]) > > Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1) > > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- > Interrupt: pin A routed to IRQ 16 > > NUMA node: 0 > > Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M] > > Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K] > > Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M] > > Expansion ROM at 000c0000 [disabled] [size=128K] > > Capabilities: [dc] Power Management version 2 > > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > > Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00 > > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us > > ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- > > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- > > MaxPayload 128 bytes, MaxReadReq 128 bytes > > DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend- > > LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > > Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit- > > Address: 00000000 Data: 0000 > > Kernel driver in use: mgag200 > > Kernel modules: mgag200 > > > > > > Related pat setting > > =================== > > uncached-minus @ 0xc0000000-0xc0001000 > > uncached-minus @ 0xc0000000-0xd0000000 > > uncached-minus @ 0xc0008000-0xc0009000 > > uncached-minus @ 0xc0009000-0xc000a000 > > uncached-minus @ 0xc0010000-0xc0011000 > > uncached-minus @ 0xc0011000-0xc0012000 > > uncached-minus @ 0xc0012000-0xc0013000 > > uncached-minus @ 0xc0013000-0xc0014000 > > uncached-minus @ 0xc0018000-0xc0019000 > > uncached-minus @ 0xc0019000-0xc001a000 > > uncached-minus @ 0xc001a000-0xc001b000 > > write-combining @ 0xd0000000-0xd0300000 > > write-combining @ 0xd0000000-0xd1000000 > > uncached-minus @ 0xd1800000-0xd1804000 > > uncached-minus @ 0xd1900000-0xd1980000 > > uncached-minus @ 0xd1980000-0xd1981000 > > uncached-minus @ 0xd1a00000-0xd1a80000 > > uncached-minus @ 0xd1a80000-0xd1a81000 > > uncached-minus @ 0xd1f10000-0xd1f11000 > > uncached-minus @ 0xd1f11000-0xd1f12000 > > uncached-minus @ 0xd1f12000-0xd1f13000 > > > > Host bridge info > > ================ > > 00:00.0 Host bridge: Intel Corporation Device 7853 > > Subsystem: Intel Corporation Device 0000 > > Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ SERR- > Interrupt: pin A routed to IRQ 0 > > NUMA node: 0 > > Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00 > > DevCap: MaxPayload 128 bytes, PhantFunc 0 > > ExtTag- RBE+ > > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > > MaxPayload 128 bytes, MaxReadReq 128 bytes > > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- > > LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us > > ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+ > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > > LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- > > RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible- > > RootCap: CRSVisible- > > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > > DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd- > > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd- > > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- > > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > > Compliance De-emphasis: -6dB > > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- > > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > > Capabilities: [e0] Power Management version 3 > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > > Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c > > Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c > > Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a > > Capabilities: [250 v1] #19 > > Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 > > Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 > > > > > > Thanks, > > Feng > > > > > > > > > > -Daniel > > > -- > > > Daniel Vetter > > > Software Engineer, Intel Corporation > > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch