DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=message-id:date:from:to:subject:cc:in-reply-to:mime-version
         :content-type:content-transfer-encoding:content-disposition
         :references;
        b=sy7aX7D+nk8soZlTgX2315/b2Q7nQjcZrlybmUjZ+hS9QBGaE+e/UNgQYk80tIB/9k
         c6C80Cat9/9rP1xUBY+7ql6qVtctwvpsdhgBezCuTMWYYuZOpvjqTuy2mCBk/uMAPhXo
         BCHF9/S22dgPmRaaEMiw/+I1W34mdpS7+LC/E=
Message-ID: <b9df5fa10808112021v7bb48c9ejefab284bea82c837@mail.gmail.com>
Date: Tue, 12 Aug 2008 09:21:49 +0600
From: "Rakib Mullick" <rakib.mullick@gmail.com>
To: "Max Krasnyansky" <maxk@qualcomm.com>
Subject: Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (take 4)
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <48A0B16B.2080801@qualcomm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <1218490433-10576-1-git-send-email-maxk@qualcomm.com>
	 <48A0B16B.2080801@qualcomm.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 24215
Lines: 588

On 8/12/08, Max Krasnyansky <maxk@qualcomm.com> wrote:
>
>
>  Max Krasnyansky wrote:
>  > This is an updated version of my previous cpuset patch on top of
>  > the latest mainline git.
>  > The patch fixes CPU hotplug handling issues in the current cpusets code.
>  > Namely circular locking in rebuild_sched_domains() and unsafe access to
>  > the cpu_online_map in the cpuset cpu hotplug handler.
>
> Minor correction. I meant unsafe access to the cpu_online_map in the _memory_
>
> hotplug handler.
>
>
>  >
>  > This version includes changes suggested by Paul Jackson (naming, comments,
>  > style, etc). I also got rid of the separate workqueue thread because it is
>  > now safe to call get_online_cpus() from workqueue callbacks.
>  >
>  > --
>  > Here are some more details.
>  >
>  > rebuild_sched_domains() is the only way to rebuild sched domains
>  > correctly based on the current cpuset settings. What this means
>  > is that we need to be able to call it from different contexts,
>  > like cpu hotplug for example.
>  > Also latest scheduler code in -tip now calls rebuild_sched_domains()
>  > directly from functions like arch_reinit_sched_domains().
>  >
>  > In order to support that properly we need to rework cpuset locking
>  > rules to avoid circular dependencies, which is what this patch does.
>  > New lock nesting rules are explained in the comments.
>  > We can now safely call rebuild_sched_domains() from virtually any
>  > context. The only requirement is that it needs to be called under
>  > get_online_cpus(). This allows cpu hotplug handlers and the scheduler
>  > to call rebuild_sched_domains() directly.
>  > The rest of the cpuset code now offloads sched domains rebuilds to
>  > a workqueue (async_rebuild_sched_domains()).
>  >
>  > This version of the patch addresses comments from the previous review.
>  > I fixed all miss-formated comments and trailing spaces.
>  >
>  > I also factored out the code that builds domain masks and split up CPU and
>  > memory hotplug handling. This was needed to simplify locking, to avoid unsafe
>  > access to the cpu_online_map from mem hotplug handler, and in general to make
>  > things cleaner.
>  >
>  > The patch passes moderate testing (building kernel with -j 16, creating &
>  > removing domains and bringing cpus off/online at the same time) on the
>  > quad-core2 based machine.
>  > It passes lockdep checks, even with preemptable RCU enabled.
>  > This time I also tested in with suspend/resume path and everything is working
>  > as expected.
>  >
>  > Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
>  > Cc: mingo@elte.hu
>  > Cc: pj@sgi.com
>  > Cc: menage@google.com
>  > Cc: a.p.zijlstra@chello.nl
>  > Cc: vegard.nossum@gmail.com
>  > ---
>  >  kernel/cpuset.c |  312 ++++++++++++++++++++++++++++++++-----------------------
>  >  1 files changed, 182 insertions(+), 130 deletions(-)
>  >
>  > diff --git a/kernel/cpuset.c b/kernel/cpuset.c
>  > index d5ab79c..f227bc1 100644
>  > --- a/kernel/cpuset.c
>  > +++ b/kernel/cpuset.c
>  > @@ -14,6 +14,8 @@
>  >   *  2003-10-22 Updates by Stephen Hemminger.
>  >   *  2004 May-July Rework by Paul Jackson.
>  >   *  2006 Rework by Paul Menage to use generic cgroups
>  > + *  2008 Rework of the scheduler domains and CPU hotplug handling
>  > + *       by Max Krasnyansky
>  >   *
>  >   *  This file is subject to the terms and conditions of the GNU General Public
>  >   *  License.  See the file COPYING in the main directory of the Linux
>  > @@ -236,9 +238,11 @@ static struct cpuset top_cpuset = {
>  >
>  >  static DEFINE_MUTEX(callback_mutex);
>  >
>  > -/* This is ugly, but preserves the userspace API for existing cpuset
>  > +/*
>  > + * This is ugly, but preserves the userspace API for existing cpuset
>  >   * users. If someone tries to mount the "cpuset" filesystem, we
>  > - * silently switch it to mount "cgroup" instead */
>  > + * silently switch it to mount "cgroup" instead
>  > + */
>  >  static int cpuset_get_sb(struct file_system_type *fs_type,
>  >                        int flags, const char *unused_dev_name,
>  >                        void *data, struct vfsmount *mnt)
>  > @@ -473,10 +477,9 @@ static int validate_change(const struct cpuset *cur, const struct cpuset *trial)
>  >  }
>  >
>  >  /*
>  > - * Helper routine for rebuild_sched_domains().
>  > + * Helper routine for generate_sched_domains().
>  >   * Do cpusets a, b have overlapping cpus_allowed masks?
>  >   */
>  > -
>  >  static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
>  >  {
>  >       return cpus_intersects(a->cpus_allowed, b->cpus_allowed);
>  > @@ -518,26 +521,15 @@ update_domain_attr_tree(struct sched_domain_attr *dattr, struct cpuset *c)
>  >  }
>  >
>  >  /*
>  > - * rebuild_sched_domains()
>  > - *
>  > - * This routine will be called to rebuild the scheduler's dynamic
>  > - * sched domains:
>  > - * - if the flag 'sched_load_balance' of any cpuset with non-empty
>  > - *   'cpus' changes,
>  > - * - or if the 'cpus' allowed changes in any cpuset which has that
>  > - *   flag enabled,
>  > - * - or if the 'sched_relax_domain_level' of any cpuset which has
>  > - *   that flag enabled and with non-empty 'cpus' changes,
>  > - * - or if any cpuset with non-empty 'cpus' is removed,
>  > - * - or if a cpu gets offlined.
>  > - *
>  > - * This routine builds a partial partition of the systems CPUs
>  > - * (the set of non-overlappping cpumask_t's in the array 'part'
>  > - * below), and passes that partial partition to the kernel/sched.c
>  > - * partition_sched_domains() routine, which will rebuild the
>  > - * schedulers load balancing domains (sched domains) as specified
>  > - * by that partial partition.  A 'partial partition' is a set of
>  > - * non-overlapping subsets whose union is a subset of that set.
>  > + * generate_sched_domains()
>  > + *
>  > + * This function builds a partial partition of the systems CPUs
>  > + * A 'partial partition' is a set of non-overlapping subsets whose
>  > + * union is a subset of that set.
>  > + * The output of this function needs to be passed to kernel/sched.c
>  > + * partition_sched_domains() routine, which will rebuild the scheduler's
>  > + * load balancing domains (sched domains) as specified by that partial
>  > + * partition.
>  >   *
>  >   * See "What is sched_load_balance" in Documentation/cpusets.txt
>  >   * for a background explanation of this.
>  > @@ -547,13 +539,7 @@ update_domain_attr_tree(struct sched_domain_attr *dattr, struct cpuset *c)
>  >   * domains when operating in the severe memory shortage situations
>  >   * that could cause allocation failures below.
>  >   *
>  > - * Call with cgroup_mutex held.  May take callback_mutex during
>  > - * call due to the kfifo_alloc() and kmalloc() calls.  May nest
>  > - * a call to the get_online_cpus()/put_online_cpus() pair.
>  > - * Must not be called holding callback_mutex, because we must not
>  > - * call get_online_cpus() while holding callback_mutex.  Elsewhere
>  > - * the kernel nests callback_mutex inside get_online_cpus() calls.
>  > - * So the reverse nesting would risk an ABBA deadlock.
>  > + * Must be called with cgroup_lock held.
>  >   *
>  >   * The three key local variables below are:
>  >   *    q  - a linked-list queue of cpuset pointers, used to implement a
>  > @@ -588,10 +574,10 @@ update_domain_attr_tree(struct sched_domain_attr *dattr, struct cpuset *c)
>  >   *   element of the partition (one sched domain) to be passed to
>  >   *   partition_sched_domains().
>  >   */
>  > -
>  > -void rebuild_sched_domains(void)
>  > +static int generate_sched_domains(cpumask_t **domains,
>  > +                     struct sched_domain_attr **attributes)
>  >  {
>  > -     LIST_HEAD(q);           /* queue of cpusets to be scanned*/
>  > +     LIST_HEAD(q);           /* queue of cpusets to be scanned */
>  >       struct cpuset *cp;      /* scans q */
>  >       struct cpuset **csa;    /* array of all cpuset ptrs */
>  >       int csn;                /* how many cpuset ptrs in csa so far */
>  > @@ -601,23 +587,26 @@ void rebuild_sched_domains(void)
>  >       int ndoms;              /* number of sched domains in result */
>  >       int nslot;              /* next empty doms[] cpumask_t slot */
>  >
>  > -     csa = NULL;
>  > +     ndoms = 0;
>  >       doms = NULL;
>  >       dattr = NULL;
>  > +     csa = NULL;
>  >
>  >       /* Special case for the 99% of systems with one, full, sched domain */
>  >       if (is_sched_load_balance(&top_cpuset)) {
>  > -             ndoms = 1;
>  >               doms = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
>  >               if (!doms)
>  > -                     goto rebuild;
>  > +                     goto done;
>  > +
>  >               dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>  >               if (dattr) {
>  >                       *dattr = SD_ATTR_INIT;
>  >                       update_domain_attr_tree(dattr, &top_cpuset);
>  >               }
>  >               *doms = top_cpuset.cpus_allowed;
>  > -             goto rebuild;
>  > +
>  > +             ndoms = 1;
>  > +             goto done;
>  >       }
>  >
>  >       csa = kmalloc(number_of_cpusets * sizeof(cp), GFP_KERNEL);
>  > @@ -680,61 +669,141 @@ restart:
>  >               }
>  >       }
>  >
>  > -     /* Convert <csn, csa> to <ndoms, doms> */
>  > +     /*
>  > +      * Now we know how many domains to create.
>  > +      * Convert <csn, csa> to <ndoms, doms> and populate cpu masks.
>  > +      */
>  >       doms = kmalloc(ndoms * sizeof(cpumask_t), GFP_KERNEL);
>  > -     if (!doms)
>  > -             goto rebuild;
>  > +     if (!doms) {
>  > +             ndoms = 0;
>  > +             goto done;
>  > +     }
>  > +
>  > +     /*
>  > +      * The rest of the code, including the scheduler, can deal with
>  > +      * dattr==NULL case. No need to abort if alloc fails.
>  > +      */
>  >       dattr = kmalloc(ndoms * sizeof(struct sched_domain_attr), GFP_KERNEL);
>  >
>  >       for (nslot = 0, i = 0; i < csn; i++) {
>  >               struct cpuset *a = csa[i];
>  > +             cpumask_t *dp;
>  >               int apn = a->pn;
>  >
>  > -             if (apn >= 0) {
>  > -                     cpumask_t *dp = doms + nslot;
>  > -
>  > -                     if (nslot == ndoms) {
>  > -                             static int warnings = 10;
>  > -                             if (warnings) {
>  > -                                     printk(KERN_WARNING
>  > -                                      "rebuild_sched_domains confused:"
>  > -                                       " nslot %d, ndoms %d, csn %d, i %d,"
>  > -                                       " apn %d\n",
>  > -                                       nslot, ndoms, csn, i, apn);
>  > -                                     warnings--;
>  > -                             }
>  > -                             continue;
>  > +             if (apn < 0) {
>  > +                     /* Skip completed partitions */
>  > +                     continue;
>  > +             }
>  > +
>  > +             dp = doms + nslot;
>  > +
>  > +             if (nslot == ndoms) {
>  > +                     static int warnings = 10;
>  > +                     if (warnings) {
>  > +                             printk(KERN_WARNING
>  > +                              "rebuild_sched_domains confused:"
>  > +                               " nslot %d, ndoms %d, csn %d, i %d,"
>  > +                               " apn %d\n",
>  > +                               nslot, ndoms, csn, i, apn);
>  > +                             warnings--;
>  >                       }
>  > +                     continue;
>  > +             }
>  >
>  > -                     cpus_clear(*dp);
>  > -                     if (dattr)
>  > -                             *(dattr + nslot) = SD_ATTR_INIT;
>  > -                     for (j = i; j < csn; j++) {
>  > -                             struct cpuset *b = csa[j];
>  > -
>  > -                             if (apn == b->pn) {
>  > -                                     cpus_or(*dp, *dp, b->cpus_allowed);
>  > -                                     b->pn = -1;
>  > -                                     if (dattr)
>  > -                                             update_domain_attr_tree(dattr
>  > -                                                                + nslot, b);
>  > -                             }
>  > +             cpus_clear(*dp);
>  > +             if (dattr)
>  > +                     *(dattr + nslot) = SD_ATTR_INIT;
>  > +             for (j = i; j < csn; j++) {
>  > +                     struct cpuset *b = csa[j];
>  > +
>  > +                     if (apn == b->pn) {
>  > +                             cpus_or(*dp, *dp, b->cpus_allowed);
>  > +                             if (dattr)
>  > +                                     update_domain_attr_tree(dattr + nslot, b);
>  > +
>  > +                             /* Done with this partition */
>  > +                             b->pn = -1;
>  >                       }
>  > -                     nslot++;
>  >               }
>  > +             nslot++;
>  >       }
>  >       BUG_ON(nslot != ndoms);
>  >
>  > -rebuild:
>  > -     /* Have scheduler rebuild sched domains */
>  > +done:
>  > +     kfree(csa);
>  > +
>  > +     *domains    = doms;
>  > +     *attributes = dattr;
>  > +     return ndoms;
>  > +}
>  > +
>  > +/*
>  > + * Rebuild scheduler domains.
>  > + *
>  > + * Call with neither cgroup_mutex held nor within get_online_cpus().
>  > + * Takes both cgroup_mutex and get_online_cpus().
>  > + *
>  > + * Cannot be directly called from cpuset code handling changes
>  > + * to the cpuset pseudo-filesystem, because it cannot be called
>  > + * from code that already holds cgroup_mutex.
>  > + */
>  > +static void do_rebuild_sched_domains(struct work_struct *unused)
>  > +{
>  > +     struct sched_domain_attr *attr;
>  > +     cpumask_t *doms;
>  > +     int ndoms;
>  > +
>  >       get_online_cpus();
>  > -     partition_sched_domains(ndoms, doms, dattr);
>  > +
>  > +     /* Generate domain masks and attrs */
>  > +     cgroup_lock();
>  > +     ndoms = generate_sched_domains(&doms, &attr);
>  > +     cgroup_unlock();
>  > +
>  > +     /* Have scheduler rebuild the domains */
>  > +     partition_sched_domains(ndoms, doms, attr);
>  > +
>  >       put_online_cpus();
>  > +}
>  >
>  > -done:
>  > -     kfree(csa);
>  > -     /* Don't kfree(doms) -- partition_sched_domains() does that. */
>  > -     /* Don't kfree(dattr) -- partition_sched_domains() does that. */
>  > +static DECLARE_WORK(rebuild_sched_domains_work, do_rebuild_sched_domains);
>  > +
>  > +/*
>  > + * Rebuild scheduler domains, asynchronously via workqueue.
>  > + *
>  > + * If the flag 'sched_load_balance' of any cpuset with non-empty
>  > + * 'cpus' changes, or if the 'cpus' allowed changes in any cpuset
>  > + * which has that flag enabled, or if any cpuset with a non-empty
>  > + * 'cpus' is removed, then call this routine to rebuild the
>  > + * scheduler's dynamic sched domains.
>  > + *
>  > + * The rebuild_sched_domains() and partition_sched_domains()
>  > + * routines must nest cgroup_lock() inside get_online_cpus(),
>  > + * but such cpuset changes as these must nest that locking the
>  > + * other way, holding cgroup_lock() for much of the code.
>  > + *
>  > + * So in order to avoid an ABBA deadlock, the cpuset code handling
>  > + * these user changes delegates the actual sched domain rebuilding
>  > + * to a separate workqueue thread, which ends up processing the
>  > + * above do_rebuild_sched_domains() function.
>  > + */
>  > +static void async_rebuild_sched_domains(void)
>  > +{
>  > +     schedule_work(&rebuild_sched_domains_work);
>  > +}
>  > +
>  > +/*
>  > + * Accomplishes the same scheduler domain rebuild as the above
>  > + * async_rebuild_sched_domains(), however it directly calls the
>  > + * rebuild routine synchronously rather than calling it via an
>  > + * asynchronous work thread.
>  > + *
>  > + * This can only be called from code that is not holding
>  > + * cgroup_mutex (not nested in a cgroup_lock() call.)
>  > + */
>  > +void rebuild_sched_domains(void)
>  > +{
>  > +     do_rebuild_sched_domains(NULL);
>  >  }
>  >
>  >  /**
>  > @@ -863,7 +932,7 @@ static int update_cpumask(struct cpuset *cs, const char *buf)
>  >               return retval;
>  >
>  >       if (is_load_balanced)
>  > -             rebuild_sched_domains();
>  > +             async_rebuild_sched_domains();
>  >       return 0;
>  >  }
>  >
>  > @@ -1090,7 +1159,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
>  >       if (val != cs->relax_domain_level) {
>  >               cs->relax_domain_level = val;
>  >               if (!cpus_empty(cs->cpus_allowed) && is_sched_load_balance(cs))
>  > -                     rebuild_sched_domains();
>  > +                     async_rebuild_sched_domains();
>  >       }
>  >
>  >       return 0;
>  > @@ -1131,7 +1200,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
>  >       mutex_unlock(&callback_mutex);
>  >
>  >       if (cpus_nonempty && balance_flag_changed)
>  > -             rebuild_sched_domains();
>  > +             async_rebuild_sched_domains();
>  >
>  >       return 0;
>  >  }
>  > @@ -1492,6 +1561,9 @@ static u64 cpuset_read_u64(struct cgroup *cont, struct cftype *cft)
>  >       default:
>  >               BUG();
>  >       }
>  > +
>  > +     /* Unreachable but makes gcc happy */
>  > +     return 0;
>  >  }
>  >
>  >  static s64 cpuset_read_s64(struct cgroup *cont, struct cftype *cft)
>  > @@ -1504,6 +1576,9 @@ static s64 cpuset_read_s64(struct cgroup *cont, struct cftype *cft)
>  >       default:
>  >               BUG();
>  >       }
>  > +
>  > +     /* Unrechable but makes gcc happy */
>  > +     return 0;
>  >  }
>  >
>  >
>  > @@ -1692,15 +1767,9 @@ static struct cgroup_subsys_state *cpuset_create(
>  >  }
>  >
>  >  /*
>  > - * Locking note on the strange update_flag() call below:
>  > - *
>  >   * If the cpuset being removed has its flag 'sched_load_balance'
>  >   * enabled, then simulate turning sched_load_balance off, which
>  > - * will call rebuild_sched_domains().  The get_online_cpus()
>  > - * call in rebuild_sched_domains() must not be made while holding
>  > - * callback_mutex.  Elsewhere the kernel nests callback_mutex inside
>  > - * get_online_cpus() calls.  So the reverse nesting would risk an
>  > - * ABBA deadlock.
>  > + * will call async_rebuild_sched_domains().
>  >   */
>  >
>  >  static void cpuset_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
>  > @@ -1719,7 +1788,7 @@ static void cpuset_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
>  >  struct cgroup_subsys cpuset_subsys = {
>  >       .name = "cpuset",
>  >       .create = cpuset_create,
>  > -     .destroy  = cpuset_destroy,
>  > +     .destroy = cpuset_destroy,
>  >       .can_attach = cpuset_can_attach,
>  >       .attach = cpuset_attach,
>  >       .populate = cpuset_populate,
>  > @@ -1811,7 +1880,7 @@ static void move_member_tasks_to_cpuset(struct cpuset *from, struct cpuset *to)
>  >  }
>  >
>  >  /*
>  > - * If common_cpu_mem_hotplug_unplug(), below, unplugs any CPUs
>  > + * If CPU and/or memory hotplug handlers, below, unplug any CPUs
>  >   * or memory nodes, we need to walk over the cpuset hierarchy,
>  >   * removing that CPU or node from all cpusets.  If this removes the
>  >   * last CPU or node from a cpuset, then move the tasks in the empty
>  > @@ -1903,35 +1972,6 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
>  >  }
>  >
>  >  /*
>  > - * The cpus_allowed and mems_allowed nodemasks in the top_cpuset track
>  > - * cpu_online_map and node_states[N_HIGH_MEMORY].  Force the top cpuset to
>  > - * track what's online after any CPU or memory node hotplug or unplug event.
>  > - *
>  > - * Since there are two callers of this routine, one for CPU hotplug
>  > - * events and one for memory node hotplug events, we could have coded
>  > - * two separate routines here.  We code it as a single common routine
>  > - * in order to minimize text size.
>  > - */
>  > -
>  > -static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
>  > -{
>  > -     cgroup_lock();
>  > -
>  > -     top_cpuset.cpus_allowed = cpu_online_map;
>  > -     top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
>  > -     scan_for_empty_cpusets(&top_cpuset);
>  > -
>  > -     /*
>  > -      * Scheduler destroys domains on hotplug events.
>  > -      * Rebuild them based on the current settings.
>  > -      */
>  > -     if (rebuild_sd)
>  > -             rebuild_sched_domains();
>  > -
>  > -     cgroup_unlock();
>  > -}
>  > -
>  > -/*
>  >   * The top_cpuset tracks what CPUs and Memory Nodes are online,
>  >   * period.  This is necessary in order to make cpusets transparent
>  >   * (of no affect) on systems that are actively using CPU hotplug
>  > @@ -1939,40 +1979,52 @@ static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
>  >   *
>  >   * This routine ensures that top_cpuset.cpus_allowed tracks
>  >   * cpu_online_map on each CPU hotplug (cpuhp) event.
>  > + *
>  > + * Called within get_online_cpus().  Needs to call cgroup_lock()
>  > + * before calling generate_sched_domains().
>  >   */
>  > -
>  > -static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
>  > +static int cpuset_track_online_cpus(struct notifier_block *unused_nb,
>  >                               unsigned long phase, void *unused_cpu)
>  >  {
>  > +     struct sched_domain_attr *attr;
>  > +     cpumask_t *doms;
>  > +     int ndoms;
>  > +
>  >       switch (phase) {
>  > -     case CPU_UP_CANCELED:
>  > -     case CPU_UP_CANCELED_FROZEN:
>  > -     case CPU_DOWN_FAILED:
>  > -     case CPU_DOWN_FAILED_FROZEN:
>  >       case CPU_ONLINE:
>  >       case CPU_ONLINE_FROZEN:
>  >       case CPU_DEAD:
>  >       case CPU_DEAD_FROZEN:
>  > -             common_cpu_mem_hotplug_unplug(1);
>  >               break;
>  > +
>  >       default:
>  >               return NOTIFY_DONE;
>  >       }
>  >
>  > +     cgroup_lock();
>  > +     top_cpuset.cpus_allowed = cpu_online_map;
>  > +     scan_for_empty_cpusets(&top_cpuset);
>  > +     ndoms = generate_sched_domains(&doms, &attr);
>  > +     cgroup_unlock();
>  > +
>  > +     /* Have scheduler rebuild the domains */
>  > +     partition_sched_domains(ndoms, doms, attr);
>  > +
>  >       return NOTIFY_OK;
>  >  }
>  >
>  >  #ifdef CONFIG_MEMORY_HOTPLUG
>  >  /*
>  >   * Keep top_cpuset.mems_allowed tracking node_states[N_HIGH_MEMORY].
>  > - * Call this routine anytime after you change
>  > - * node_states[N_HIGH_MEMORY].
>  > - * See also the previous routine cpuset_handle_cpuhp().
>  > + * Call this routine anytime after node_states[N_HIGH_MEMORY] changes.
>  > + * See also the previous routine cpuset_track_online_cpus().
>  >   */
>  > -
>  >  void cpuset_track_online_nodes(void)
>  >  {
>  > -     common_cpu_mem_hotplug_unplug(0);
>  > +     cgroup_lock();
>  > +     top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
>  > +     scan_for_empty_cpusets(&top_cpuset);
>  > +     cgroup_unlock();
>  >  }
>  >  #endif
>  >
>  > @@ -1987,7 +2039,7 @@ void __init cpuset_init_smp(void)
>  >       top_cpuset.cpus_allowed = cpu_online_map;
>  >       top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
>  >
>  > -     hotcpu_notifier(cpuset_handle_cpuhp, 0);
>  > +     hotcpu_notifier(cpuset_track_online_cpus, 0);
>  >  }
>  >
   One more thing Max, while you are allocating memory for "dattr"  I
think it  needs to check that it is successful or not . I've shown it
on one of my previous patch on 7th
Aug.
>  >  /**
>  --
>
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>  the body of a message to majordomo@vger.kernel.org
>  More majordomo info at  http://vger.kernel.org/majordomo-info.html
>  Please read the FAQ at  http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/