{ "cells": [ { "cell_type": "markdown", "id": "cloudy-garage", "metadata": { "tags": [] }, "source": [ "# Comparing threading and multiprocessing" ] }, { "cell_type": "code", "execution_count": 1, "id": "radical-fisher", "metadata": { "tags": [] }, "outputs": [], "source": [ "import time\n", "\n", "import multiprocessing as mp\n", "from multiprocessing import Pool as ProcessPool\n", "from multiprocessing.pool import ThreadPool\n", "\n", "import pandas as pd\n", "\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "id": "crazy-organic", "metadata": { "tags": [] }, "outputs": [], "source": [ "sns.set_context('talk')" ] }, { "cell_type": "markdown", "id": "lucky-sudan", "metadata": { "tags": [] }, "source": [ "## Introduction\n", "\n", "Python offers several ways to run code concurrently or in parallel.\n", "\n", "The `threading` module runs all threads inside a single process, which requires less overhead and allows for more efficient sharing of memory. However, in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound threads are not truly parallel: a thread mostly makes progress while the others are waiting, for example on I/O.\n", "\n", "The `multiprocessing` module runs the work in separate processes, which the operating system can schedule on multiple CPU cores, so the code can actually execute at the same time." ] }, { "cell_type": "markdown", "id": "north-final", "metadata": { "tags": [] }, "source": [ "## Preparations" ] }, { "cell_type": "markdown", "id": "considered-length", "metadata": { "tags": [] }, "source": [ "To investigate the differences between threading and multiprocessing,\n", "we will simulate work for each data point and record the time at which it finished.\n", "\n", "Due to some [design choices](https://bugs.python.org/issue25053), we have to import the worker function from a separate module for the `ProcessPool` to work." 
] }, { "cell_type": "code", "execution_count": 3, "id": "ruled-exclusive", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting worker.py\n" ] } ], "source": [ "%%writefile worker.py\n", "\n", "import time\n", "\n", "\n", "def worker(data):\n", "    tmp = []\n", "    for i in data:\n", "        # simulate CPU load with a busy loop\n", "        for _ in range(1_000_000):\n", "            pass\n", "\n", "        # record the time at which this data point finished\n", "        tmp.append(time.time())\n", "    return tmp" ] }, { "cell_type": "code", "execution_count": 4, "id": "immediate-white", "metadata": { "tags": [] }, "outputs": [], "source": [ "from worker import worker" ] }, { "cell_type": "code", "execution_count": 5, "id": "cutting-freeware", "metadata": { "tags": [] }, "outputs": [], "source": [ "data = list(range(20))\n", "num = mp.cpu_count()" ] }, { "cell_type": "markdown", "id": "horizontal-arnold", "metadata": { "tags": [] }, "source": [ "## Computations" ] }, { "cell_type": "markdown", "id": "valid-header", "metadata": { "tags": [] }, "source": [ "We run the `worker` function on the dataset once per executor, first in a thread pool and then in a process pool, each with `num` workers." 
] }, { "cell_type": "code", "execution_count": 6, "id": "laden-albania", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.79 s, sys: 131 ms, total: 3.92 s\n", "Wall time: 3.86 s\n" ] } ], "source": [ "%%time\n", "with ThreadPool(num) as p:\n", "    thread_result = p.map(worker, [data] * num)" ] }, { "cell_type": "code", "execution_count": 7, "id": "committed-volunteer", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 26 ms, sys: 56.7 ms, total: 82.7 ms\n", "Wall time: 1.4 s\n" ] } ], "source": [ "%%time\n", "with ProcessPool(num) as p:\n", "    process_result = p.map(worker, [data] * num)" ] }, { "cell_type": "markdown", "id": "convertible-slovakia", "metadata": { "tags": [] }, "source": [ "Next, we store the result in a dataframe." ] }, { "cell_type": "code", "execution_count": 8, "id": "brilliant-concord", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
|  | variable | value | type |\n", "
|---|---|---|---|\n", "
| 0 | job 0 | 0.105189 | thread |\n", "
| 1 | job 0 | 0.316431 | thread |\n", "
| 2 | job 0 | 0.592710 | thread |\n", "
| 3 | job 0 | 0.704985 | thread |\n", "
| 4 | job 0 | 0.822085 | thread |\n", "