{ "cells": [ { "cell_type": "markdown", "id": "behind-maine", "metadata": { "tags": [] }, "source": [ "# The Python Almanac\n", "\n", "The world of Python packages is adventurous and can be confusing at times.\n", "Here, I try to aggregate and showcase a diverse set of Python packages which have become useful at some point." ] }, { "cell_type": "markdown", "id": "outstanding-packing", "metadata": { "tags": [] }, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "lasting-scott", "metadata": { "tags": [] }, "source": [ "### Installing Python\n", "\n", "Normally you should use your system's package manager.\n", "In case of problems, try [pyenv](https://github.com/pyenv/pyenv):\n", "\n", "```bash\n", "$ pyenv versions\n", "$ pyenv install \n", "$ pyenv global \n", "```\n", "\n", "This will install the specified Python version to `$(pyenv root)/versions`." ] }, { "cell_type": "markdown", "id": "radical-temple", "metadata": { "tags": [] }, "source": [ "### Installing packages\n", "\n", "Python packages can be easily installed from [PyPI (Python Package Index)](https://pypi.org/):\n", "\n", "```bash\n", "$ pip install --user (local install does not clash with system packages)\n", "```\n", "\n", "Using `--user` will install the package only for the current user. This is good if multiple users need different package versions, but can lead to redundant installations.\n", "\n", "To install from a git repository, use the following command:\n", "\n", "```bash\n", "$ pip install --user -U git+https://github.com//@\n", "```" ] }, { "cell_type": "markdown", "id": "respiratory-hayes", "metadata": { "tags": [] }, "source": [ "### Package management\n", "\n", "While packages can be installed globally or user-specific, it often makes sense to create project-specific virtual environments.\n", "\n", "This can be easily accomplished using [venv](https://docs.python.org/3/library/venv.html):\n", "\n", "```bash\n", "$ python -m venv my_venv\n", "$ . 
my_venv/bin/activate\n", "$ pip install \n", "```" ] }, { "cell_type": "markdown", "id": "recovered-heather", "metadata": { "tags": [] }, "source": [ "## Software development" ] }, { "cell_type": "markdown", "id": "excess-weight", "metadata": { "tags": [] }, "source": [ "### Package Distribution\n", "\n", "Use `setuptools`.\n", "[poetry](https://github.com/sdispater/poetry) handles many otherwise slightly annoying things:\n", "```\n", "$ poetry init/add/install/run/publish\n", "```\n", "\n", "CI encapsulation: [tox](https://github.com/tox-dev/tox).\n", "\n", "Keeping track of version numbers can be achieved using [bump2version](https://github.com/c4urself/bump2version).\n", "\n", "Transform between various project file formats using [dephell](https://github.com/dephell/dephell)." ] }, { "cell_type": "markdown", "id": "descending-portrait", "metadata": { "tags": [] }, "source": [ "### Testing\n", "\n", "Set up testing using [pytest](https://github.com/pytest-dev/pytest). It has a wide range of useful features, such as fixtures (modularized per-test setup code) and test parametrization (quickly execute the same test for multiple inputs)." 
] }, { "cell_type": "code", "execution_count": 1, "id": "yellow-decade", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing /tmp/tests.py\n" ] } ], "source": [ "%%writefile /tmp/tests.py\n", "\n", "import os\n", "import pytest\n", "\n", "\n", "@pytest.fixture(scope='session')\n", "def custom_directory(tmp_path_factory):\n", " return tmp_path_factory.mktemp('workflow_test')\n", "\n", "\n", "def test_fixture_execution(custom_directory):\n", " assert os.path.isdir(custom_directory)\n", "\n", "\n", "@pytest.mark.parametrize('expression_str,result', [\n", " ('2+2', 4), ('2*2', 4), ('2**2', 4)\n", "])\n", "def test_expression_evaluation(expression_str, result):\n", " assert eval(expression_str) == result" ] }, { "cell_type": "code", "execution_count": 2, "id": "saved-karma", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\r\n", "platform darwin -- Python 3.8.6, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- /Users/kimja/.pyenv/versions/3.8.6/bin/python3.8\r\n", "cachedir: .pytest_cache\r\n", "rootdir: /tmp\r\n", "plugins: anyio-2.0.2\r\n", "\u001b[1mcollecting ... 
\u001b[0m" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\r", "collected 4 items \u001b[0m\r\n", "\r\n", "../../../../../../../../tmp/tests.py::test_fixture_execution \u001b[32mPASSED\u001b[0m\u001b[32m [ 25%]\u001b[0m\r\n", "../../../../../../../../tmp/tests.py::test_expression_evaluation[2+2-4] \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\r\n", "../../../../../../../../tmp/tests.py::test_expression_evaluation[2*2-4] \u001b[32mPASSED\u001b[0m\u001b[32m [ 75%]\u001b[0m\r\n", "../../../../../../../../tmp/tests.py::test_expression_evaluation[2**2-4] \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\r\n", "\r\n", "\u001b[32m============================== \u001b[32m\u001b[1m4 passed\u001b[0m\u001b[32m in 0.03s\u001b[0m\u001b[32m ===============================\u001b[0m\r\n" ] } ], "source": [ "!pytest -v /tmp/tests.py" ] }, { "cell_type": "markdown", "id": "rental-berlin", "metadata": { "tags": [] }, "source": [ "### Linting/Formatting\n", "\n", "Linters and code formatters improve the quality of your Python code by conducting a static analysis and flagging issues.\n", "\n", "* [flake8](https://github.com/PyCQA/flake8): Catch various common errors and adhere to PEP8. Supports many [plugins](https://github.com/DmytroLitvinov/awesome-flake8-extensions).\n", "* [pylint](https://github.com/PyCQA/pylint): Looks for even more sources of code smell.\n", "* [black](https://github.com/psf/black): \"*the* uncompromising Python code formatter\".\n", "\n", "While there can be a considerable overlap between the tools' outputs, each offers its own advantages and they can typically be used together." 
] }, { "cell_type": "markdown", "id": "foreign-october", "metadata": { "tags": [] }, "source": [ "### Profiling\n", "\n", "Code profiling tools are a great way of finding parts of your code which can be optimized.\n", "They come in various flavors:\n", "\n", "* [line_profiler](https://github.com/pyutils/line_profiler): which parts of the code require most execution time\n", "* [memory_profiler](https://github.com/pythonprofilers/memory_profiler): which parts of the code consume the most memory" ] }, { "cell_type": "markdown", "id": "spiritual-transportation", "metadata": { "tags": [] }, "source": [ "Consider the following script (note the `@profile` decorator):" ] }, { "cell_type": "code", "execution_count": 3, "id": "expanded-wedding", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing /tmp/script.py\n" ] } ], "source": [ "%%writefile /tmp/script.py\n", "\n", "@profile\n", "def main():\n", " # takes a long time\n", " for _ in range(100_000):\n", " 1337**42\n", "\n", " # requires a lot of memory\n", " arr = [1] * 1_000_000\n", " \n", "main()" ] }, { "cell_type": "markdown", "id": "trying-polyester", "metadata": { "tags": [] }, "source": [ "#### line_profiler" ] }, { "cell_type": "code", "execution_count": 4, "id": "pressing-norfolk", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wrote profile results to /tmp/script.py.lprof\r\n", "Timer unit: 1e-06 s\r\n", "\r\n", "Total time: 0.13024 s\r\n", "File: /tmp/script.py\r\n", "Function: main at line 2\r\n", "\r\n", "Line # Hits Time Per Hit % Time Line Contents\r\n", "==============================================================\r\n", " 2 @profile\r\n", " 3 def main():\r\n", " 4 # takes a long time\r\n", " 5 100001 33213.0 0.3 25.5 for _ in range(100_000):\r\n", " 6 100000 93762.0 0.9 72.0 1337**42\r\n", " 7 \r\n", " 8 # requires a lot of memory\r\n", " 9 1 3265.0 3265.0 2.5 arr = [1] * 1_000_000\r\n", "\r\n" ] } ], 
"source": [ "!kernprof -l -v -o /tmp/script.py.lprof /tmp/script.py" ] }, { "cell_type": "markdown", "id": "adolescent-times", "metadata": { "tags": [] }, "source": [ "#### memory_profiler" ] }, { "cell_type": "code", "execution_count": 5, "id": "associate-cookie", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Filename: /tmp/script.py\r\n", "\r\n", "Line # Mem usage Increment Occurences Line Contents\r\n", "============================================================\r\n", " 2 37.668 MiB 37.668 MiB 1 @profile\r\n", " 3 def main():\r\n", " 4 # takes a long time\r\n", " 5 37.668 MiB 0.000 MiB 100001 for _ in range(100_000):\r\n", " 6 37.668 MiB 0.000 MiB 100000 1337**42\r\n", " 7 \r\n", " 8 # requires a lot of memory\r\n", " 9 45.301 MiB 7.633 MiB 1 arr = [1] * 1_000_000\r\n", "\r\n", "\r\n" ] } ], "source": [ "!python3 -m memory_profiler /tmp/script.py" ] }, { "cell_type": "markdown", "id": "engaged-mexican", "metadata": { "tags": [] }, "source": [ "### Debugging" ] }, { "cell_type": "markdown", "id": "lyric-determination", "metadata": { "tags": [] }, "source": [ "#### Raw Python\n", "\n", "[ipdb](https://github.com/gotcha/ipdb) is a useful Python command-line debugger.\n", "To invoke it, simply put `import ipdb; ipdb.set_trace()` in your code.\n", "Starting with Python 3.7, you can also write `breakpoint()`. This honors the `PYTHONBREAKPOINT` environment variable.\n", "To automatically start the debugger when an error occurs, run your script with `python -m ipdb -c continue