Subject: Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
To: Hayes Wang <hayeswang@realtek.com>,
        "netdev@vger.kernel.org" <netdev@vger.kernel.org>
References: <1394712342-15778-226-Taiwan-albertk@realtek.com>
 <1394712342-15778-227-Taiwan-albertk@realtek.com>
 <0835B3720019904CB8F7AA43166CEEB20105013E@RTITMBSV03.realtek.com.tw>
 <d683c019-4e0f-6fe6-368c-c4fc86c72fe6@pobox.com>
 <0835B3720019904CB8F7AA43166CEEB201050B7E@RTITMBSV03.realtek.com.tw>
 <b9809516-d036-bfc3-b7a3-6563033ec957@pobox.com>
 <0835B3720019904CB8F7AA43166CEEB2010517CE@RTITMBSV03.realtek.com.tw>
 <b61a0e10-c737-8912-3e4e-393f4eb32077@pobox.com>
 <0835B3720019904CB8F7AA43166CEEB201051D51@RTITMBSV03.realtek.com.tw>
 <95fa9f67-3af6-6749-0e2b-c95406486f7d@pobox.com>
Cc: nic_swsd <nic_swsd@realtek.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>
From: Mark Lord <mlord@pobox.com>
Message-ID: <baf5246d-9d8a-4029-6823-350ed561fd33@pobox.com>
Date: Thu, 24 Nov 2016 07:31:17 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <95fa9f67-3af6-6749-0e2b-c95406486f7d@pobox.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2273
Lines: 58

On 16-11-23 02:29 PM, Mark Lord wrote:
> On 16-11-23 10:12 AM, Hayes Wang wrote:
>> Mark Lord [mlord@pobox.com]
>> [...]
>>> What does this code do:
>>
>>>> static void r8153_set_rx_early_size(struct r8152 *tp)
>>>> {
>>>>        u32 mtu = tp->netdev->mtu;
>>>>        u32 ocp_data = (agg_buf_sz - mtu - VLAN_ETH_HLEN - VLAN_HLEN) / 4;
>>>>
>>>>        ocp_write_word(tp, MCU_TYPE_USB, USB_RX_EARLY_SIZE, ocp_data);
>>>> }
>>
>> This only works for RTL8153. However, what you use is RTL8152.
>> It is like delay completion. It is used to reduce the loading of CPU
>> by letting a transfer contain more data to reduce the number of
>> transfers.
>>
>>> How is ocp_data used by the hardware?
>>> Shouldn't the calculation also include sizeof(rx_desc) in there somewhere?
>>
>> The algorithm is from our hw engineers, and it should be
>>
>>    (agg_buf_sz - packet size) / 8
>>
>> You could refer to commit a59e6d815226 ("r8152: correct the rx early size").
>
> Thanks.
>
> Right now I am working quite hard trying to narrow things down exactly.
> You are correct that the driver does appear to be careful about accesses
> beyond the filled portion of a URB buffer -- for some reason I thought
> the original driver had issues there, but looking again it does not seem to.
>
> One idea that is now looking more likely:
> Things could be suffering from speculative CPU accesses to RAM
> (the system here has non-coherent d-cache/RAM).
> This could incorrectly pre-load data from adjacent URB buffers
> into the d-cache, creating coherency issues.  I am testing now
> with cacheline-sized guard zones between the buffers to see if
> that is the issue or not.

Nope.  Guard zones did not fix it, so it's probably not a prefetch issue.
Oddly, adding a couple of memory barriers to specific places in the driver
does help, A LOT.  Still not 100%, but it did pass 1800 reboot tests over night
with only three bad rx_desc's reported.

That's a new record here for the driver using kmalloc'd buffers,
and put reliability on par with using non-cacheable buffers.

Any way we look at it though, the chip/driver are simply unreliable,
and relying upon hardware checksums (which fail due to the driver
looking at garbage rather than the checksum bits) leads to data corruption.

Cheers