FOLLY: Thread pools & Executors



Thread pools & Executors



Run your concurrent code in a performant way



How do I use the thread pools? 



Wangle provides two concrete thread pools (IOThreadPoolExecutor and
CPUThreadPoolExecutor), as well as building them in as part of a complete
async framework. Generally, you might want to grab the global executor and
use it with a future, like this:



auto f = someFutureFunction().via(getCPUExecutor()).then(...)



Or maybe you need to construct a thrift/memcache client, and
need an event base:



auto f = getClient(getIOExecutor()->getEventBase())->callSomeFunction(args...)
             .via(getCPUExecutor())
             .then([](Result r){ /* ... do something with result */ });



vs. C++11's std::launch 



The current C++11 std::launch only has two modes: async or deferred. In a
production system, neither is what you want: async will launch a new thread
for every call, without limit, while deferred will defer the work lazily and
then run it synchronously in the current thread when the result is needed.



Wangle's thread pools always launch work as soon as possible, and have limits
on the maximum number of tasks / threads allowed, so we will never use more
threads than absolutely needed.



Why do we need yet another set of thread pools? 



Unfortunately, none of the existing thread pools had every feature needed -
things based on pipes are too slow, and several older ones didn't support
std::function.



Why do we need several different types of thread pools? 



If you want epoll support, you need an fd - eventfd is the latest
notification hotness. Unfortunately, an active fd triggers all the epoll
loops it is in, leading to a thundering herd - so if you want a fair queue
you need to use some kind of semaphore. Unfortunately, semaphores can't be
put in epoll loops, so they are incompatible with IO. Fortunately, you
usually want to separate the IO and CPU-bound work anyway to give stronger
tail-latency guarantees on IO.



IOThreadPoolExecutor 


  • Uses eventfd for notification, and waking an epoll loop.

  • There is one queue per thread/epoll.

  • If the thread is already running and not waiting on epoll, we don't make any
    additional syscalls to wake up the loop, just put the new task in the
    queue.

  • If any thread has been waiting for more than a few seconds, its stack is
    madvised away. Currently, however, tasks are scheduled round-robin on the
    queues, so unless there is no work at all, this isn't very effective.

  • ::getEventBase() will return an EventBase you can schedule IO work on directly, chosen
    round-robin.

  • Since there is one queue per thread, there is hardly any contention on
    the queues, so a simple spinlock around a std::deque is used for the
    tasks. There is no max queue size.



CPUThreadPoolExecutor 




  • A single queue backed by folly's LifoSem and MPMCQueue. Since there is
    only one queue, contention can be quite high: all the worker threads and
    all the producer threads hit the same queue. MPMCQueue excels in this
    situation, and it dictates a max queue size.

  • LifoSem wakes up threads in LIFO order.

  • Inactive threads have their stacks madvised away. This works quite well
    in combination with LifoSem: it almost doesn't matter if more threads
    than necessary are specified at startup.

  • stop() will finish all outstanding tasks at exit

  • Supports priorities - priorities are implemented as multiple queues - each worker
    thread checks the highest priority queue first. Threads themselves don't
    have priorities set.



ThreadPoolExecutor 

Base class that contains the thread startup/shutdown/stats logic.



Observers 

An observer interface is provided to listen for thread start/stop events.
This is useful for creating objects that should be one-per-thread; if
threads are added to or removed from the thread pool, the observer keeps
them working correctly.



Stats 

PoolStats are provided to get task count, running time, etc.

