Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752248AbbBXBvl (ORCPT ); Mon, 23 Feb 2015 20:51:41 -0500 Received: from SpacedOut.fries.net ([67.64.210.234]:52966 "EHLO SpacedOut.fries.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751312AbbBXBvk (ORCPT ); Mon, 23 Feb 2015 20:51:40 -0500 X-Greylist: delayed 813 seconds by postgrey-1.27 at vger.kernel.org; Mon, 23 Feb 2015 20:51:39 EST Date: Mon, 23 Feb 2015 19:37:52 -0600 From: David Fries To: Thorsten Bschorr Cc: linux-kernel@vger.kernel.org, Evgeniy Polyakov Subject: Re: Fwd: w1/slaves/w1_therm: null-ptr access of sl->family_data Message-ID: <20150224013752.GA25909@spacedout.fries.net> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.3.9 (SpacedOut.fries.net [127.0.0.1]); Mon, 23 Feb 2015 19:37:55 -0600 (CST) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4733 Lines: 116 Thorsten, Are you planning on submitting a patch? FYI, I load the one wire module with wire search_count=1 in /etc/modules which tells it to search the bus once only. That works for me as devices don't come and go on my bus, and it isn't scanning the bus every few seconds just to find out nothing changed. If you are finding devices vanish even when they aren't maybe it will be useful to you as well, though the bigger problem might be why they are vanishing in the first place. On Mon, Feb 23, 2015 at 06:09:16PM +0100, Thorsten Bschorr wrote: > Hi, > > I have observed a null-pointer access in w1/slaves/w1_therm on my Raspberry > Pi running 3.18.5 with several DS18S20 temperature sensors. I have already > discussed the problem with Evgeniy. > > @Evgeniy & David: sorry for re-sending my message, my email > accidentally contained a HTML part. > > > > w1_therm: w1_slave_show checks if the sensor uses external power. If this > is the case, the mutex on dev->bus_mutex is unlocked while waiting 750 ms > for the sensor to convert the temperature. Before reading the temperature, > the mutex is again locked. > > During this sleep-time, the sensor could get detached, for example by w1.c: > w1_search_process_cb not finding the sensor (*): > !test_bit(W1_SLAVE_ACTIVE, &sl->flags) ==true. > This triggers w1_slave_detach, and hence w1_therm_remove_slave frees and > nulls sl->family_data. > > w1_slave_show does not check if familiy_data is null after re-acquiring the > bus_mutex resulting in a null-ptr access when storing data. > > After I've added checks for family_data!=0, I did not observe any more > crashes; the other data of struct w1_slave seem to be valid as long as any > thread executes w1_slave_show. > > > I have added debug-output to w1.c and w1_therm.c, here's a log of a > potential crash: > > [184199.510227] w1_master_driver w1_bus_master2: w1_search_process_cb, > !W1_SLAVE_ACTIVE, calling w1_slave_detach > [184199.510276] w1_slave_driver 10-000802d9c9e4: w1_slave_detach > destroy_now 1 > [184199.510297] w1_slave_driver 10-000802d9c9e4: w1_unref_slave refcnt 0 > [184199.510321] w1_slave_driver 10-000802d9c9e4: w1_unref_slave: detaching > 10-000802d9c9e4 [d5fb8800]. > [184199.510347] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -> > w1_family_notify > [184199.510365] w1_slave_driver 10-000802d9c9e4: w1_family_notify calling > remove_slave > [184199.510382] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave > [184199.510400] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave > refcnt -1 > [184200.049745] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after > sleep), family_data==0 > [184200.137133] w1_slave_driver 10-000802d9c9e4: w1_slave_show (before > sleep), family_data==0 > [184200.889551] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after > sleep), family_data==0 > [184200.930866] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after > sleep), family_data==0 > [184200.930907] w1_slave_driver 10-000802d9c9e4: Read failed CRC check > [184200.931002] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -> > device_unregister > [184200.931169] w1 w1_unref_slave -> kfree > > Note: When this crash happened, multiple threads were reading the sensor. > > > I could trigger the problem several times, and each time device_unregister > in w1_unref_slave was executed *after* w1_slave_show. In one case, the > logged time-difference between the first family_data==0 message and > device_unregister was about 8 seconds! > I have not observed a w1_slave_show call after w1_unref_slave (as long as > the device was not re-attached again). > > >From my observation, the w1_slave data seem to be valid as long as > w1_slave_show is executed. > My guess is that the call of sysfs_remove_groups in w1_family_notify hits a > mutex (I did not add debug output here). > > > (*) On my tiny raspberry, this happens from time to time with high CPU and > external disc load (timing and/or electrical issues); it seems that the > sensor does not respond in time to the (periodic) search. > > > Please email me if you need further information. > > > > Best regards, > Thorsten Bschorr > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/