Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965096AbcKXMb3 (ORCPT ); Thu, 24 Nov 2016 07:31:29 -0500 Received: from mail3.start.ca ([64.140.120.243]:52445 "EHLO mail3.start.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S938696AbcKXMb0 (ORCPT ); Thu, 24 Nov 2016 07:31:26 -0500 Subject: Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable To: Hayes Wang , "netdev@vger.kernel.org" References: <1394712342-15778-226-Taiwan-albertk@realtek.com> <1394712342-15778-227-Taiwan-albertk@realtek.com> <0835B3720019904CB8F7AA43166CEEB20105013E@RTITMBSV03.realtek.com.tw> <0835B3720019904CB8F7AA43166CEEB201050B7E@RTITMBSV03.realtek.com.tw> <0835B3720019904CB8F7AA43166CEEB2010517CE@RTITMBSV03.realtek.com.tw> <0835B3720019904CB8F7AA43166CEEB201051D51@RTITMBSV03.realtek.com.tw> <95fa9f67-3af6-6749-0e2b-c95406486f7d@pobox.com> Cc: nic_swsd , "linux-kernel@vger.kernel.org" , "linux-usb@vger.kernel.org" From: Mark Lord Message-ID: Date: Thu, 24 Nov 2016 07:31:17 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <95fa9f67-3af6-6749-0e2b-c95406486f7d@pobox.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2273 Lines: 58 On 16-11-23 02:29 PM, Mark Lord wrote: > On 16-11-23 10:12 AM, Hayes Wang wrote: >> Mark Lord [mlord@pobox.com] >> [...] >>> What does this code do: >> >>>> static void r8153_set_rx_early_size(struct r8152 *tp) >>>> { >>>> u32 mtu = tp->netdev->mtu; >>>> u32 ocp_data = (agg_buf_sz - mtu - VLAN_ETH_HLEN - VLAN_HLEN) / 4; >>>> >>>> ocp_write_word(tp, MCU_TYPE_USB, USB_RX_EARLY_SIZE, ocp_data); >>>> } >> >> This only works for RTL8153. However, what you use is RTL8152. >> It is like delay completion. It is used to reduce the loading of CPU >> by letting a transfer contain more data to reduce the number of >> transfers. >> >>> How is ocp_data used by the hardware? >>> Shouldn't the calculation also include sizeof(rx_desc) in there somewhere? >> >> The algorithm is from our hw engineers, and it should be >> >> (agg_buf_sz - packet size) / 8 >> >> You could refer to commit a59e6d815226 ("r8152: correct the rx early size"). > > Thanks. > > Right now I am working quite hard trying to narrow things down exactly. > You are correct that the driver does appear to be careful about accesses > beyond the filled portion of a URB buffer -- for some reason I thought > the original driver had issues there, but looking again it does not seem to. > > One idea that is now looking more likely: > Things could be suffering from speculative CPU accesses to RAM > (the system here has non-coherent d-cache/RAM). > This could incorrectly pre-load data from adjacent URB buffers > into the d-cache, creating coherency issues. I am testing now > with cacheline-sized guard zones between the buffers to see if > that is the issue or not. Nope. Guard zones did not fix it, so it's probably not a prefetch issue. Oddly, adding a couple of memory barriers to specific places in the driver does help, A LOT. Still not 100%, but it did pass 1800 reboot tests over night with only three bad rx_desc's reported. That's a new record here for the driver using kmalloc'd buffers, and put reliability on par with using non-cacheable buffers. Any way we look at it though, the chip/driver are simply unreliable, and relying upon hardware checksums (which fail due to the driver looking at garbage rather than the checksum bits) leads to data corruption. Cheers