Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754679AbbERSZv (ORCPT ); Mon, 18 May 2015 14:25:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35346 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754209AbbERSZu (ORCPT ); Mon, 18 May 2015 14:25:50 -0400 Message-ID: <555A2EAC.9000302@redhat.com> Date: Mon, 18 May 2015 20:25:48 +0200 From: Denys Vlasenko User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: =?UTF-8?B?Q2hyaXN0aWFuIEvDtm5pZw==?= CC: Alex Deucher , linux-kernel@vger.kernel.org Subject: Re: [PATCH] radeon: Shrink radeon_ring_write() References: <1431971955-31231-1-git-send-email-dvlasenk@redhat.com> <1431971955-31231-2-git-send-email-dvlasenk@redhat.com> <555A2B4E.1090309@amd.com> In-Reply-To: <555A2B4E.1090309@amd.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5091 Lines: 132 On 05/18/2015 08:11 PM, Christian König wrote: > De-duplicating the error message is probably a good idea, > but are the remaining code changes really necessary for the size reduction? The conversion from if (ring->count_dw <= 0) error; ... ring->count_dw--; to if (--ring->count_dw < 0) error; ... eliminates one access to ring->count_dw. Conversion from ring->ring[ring->wptr++] = v; ring->wptr &= ring->ptr_mask; to ring->ring[ring->wptr] = v; ring->wptr = (ring->wptr + 1) & ring->ptr_mask; ideally (with infinitely smart compiler) shouldn't be necessary, but at least for my gcc version it eliminates one additional insn: gcc copies ring->wptr to a scratch reg, then increments it, then uses scratch reg for addressing... > On 18.05.2015 19:59, Denys Vlasenko wrote: >> Inlined radeon_ring_write() has 729 callers, which amounts to about 50000 >> bytes of code. however, deinlining it is probably too much >> of a performance impact. >> >> This patch shrinks slow path a bit and optimizes fast path. >> Comparison of generated machine code is below: >> >> old___________________________________ new____________________________ >> 55 push %rbp 55 push %rbp >> 4889e5 mov %rsp,%rbp ff4f38 decl 0x38(%rdi) >> 4154 push %r12 4889e5 mov %rsp,%rbp >> 4189f4 mov %esi,%r12d 4154 push %r12 >> 53 push %rbx 4189f4 mov %esi,%r12d >> 837f3800 cmpl $0x0,0x38(%rdi) 53 push %rbx >> 4889fb mov %rdi,%rbx 4889fb mov %rdi,%rbx >> 7f0e jg <.Lbl> 7905 jns <.Lbl> >> 48c7c78f51a785 mov $message,%rdi >> 31c0 xor %eax,%eax >> e89306f9ff call e8cbffffff call >> .Lbl: >> 8b4328 mov 0x28(%rbx),%eax 8b5328 mov 0x28(%rbx),%edx >> 488b5308 mov 0x8(%rbx),%rdx 488b4308 mov 0x8(%rbx),%rax >> 89c1 mov %eax,%ecx 488d0490 lea (%rax,%rdx,4),%rax >> ffc0 inc %eax >> 488d148a lea (%rdx,%rcx,4),%rdx 448920 mov %r12d,(%rax) >> 448922 mov %r12d,(%rdx) 8b4328 mov 0x28(%rbx),%eax >> 234354 and 0x54(%rbx),%eax ff4b34 decl 0x34(%rbx) >> ff4b38 decl 0x38(%rbx) ffc0 inc %eax >> ff4b34 decl 0x34(%rbx) 234354 and 0x54(%rbx),%eax >> 894328 mov %eax,0x28(%rbx) 894328 mov %eax,0x28(%rbx) >> 5b pop %rbx 5b pop %rbx >> 415c pop %r12 415c pop %r12 >> 5d pop %rbp 5d pop %rbp >> >> This shaves off more than 10 kbytes of code off the kernel: >> >> text data bss dec hex filename >> 85657104 22294872 20627456 128579432 7a9f768 vmlinux.before >> 85646544 22294872 20627456 128568872 7a9ce28 vmlinux >> >> Signed-off-by: Denys Vlasenko >> Cc: Christian König >> Cc: Alex Deucher >> Cc: linux-kernel@vger.kernel.org >> --- >> drivers/gpu/drm/radeon/radeon.h | 11 +++++------ >> drivers/gpu/drm/radeon/radeon_ring.c | 5 +++++ >> 2 files changed, 10 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h >> index bb6b25c..9106873 100644 >> --- a/drivers/gpu/drm/radeon/radeon.h >> +++ b/drivers/gpu/drm/radeon/radeon.h >> @@ -2658,14 +2658,13 @@ void radeon_atombios_fini(struct radeon_device *rdev); >> * >> * Write a value to the requested ring buffer (all asics). >> */ >> +void radeon_ring_overflow(void); >> static inline void radeon_ring_write(struct radeon_ring *ring, uint32_t v) >> { >> - if (ring->count_dw <= 0) >> - DRM_ERROR("radeon: writing more dwords to the ring than expected!\n"); >> - >> - ring->ring[ring->wptr++] = v; >> - ring->wptr &= ring->ptr_mask; >> - ring->count_dw--; >> + if (--ring->count_dw < 0) >> + radeon_ring_overflow(); >> + ring->ring[ring->wptr] = v; >> + ring->wptr = (ring->wptr + 1) & ring->ptr_mask; >> ring->ring_free_dw--; >> } >> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c >> index 2456f69..8204c23 100644 >> --- a/drivers/gpu/drm/radeon/radeon_ring.c >> +++ b/drivers/gpu/drm/radeon/radeon_ring.c >> @@ -126,6 +126,11 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi >> return 0; >> } >> +void radeon_ring_overflow(void) >> +{ >> + DRM_ERROR("radeon: writing more dwords to the ring than expected!\n"); >> +} >> + >> /** >> * radeon_ring_lock - lock the ring and allocate space on it >> * > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/