{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1-final" }, "orig_nbformat": 2, "kernelspec": { "name": "python3", "display_name": "Python 3", "language": "python" } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Anonymization Primitives aka Diffix Toolbox\n", "\n", "This notebook will showcase current features of the Diffix PostgreSQL extension.\n", "\n", "The functionality presented below is the raw interface to Diffix, intended only for people who know what they're doing.\n", "I like to think of it as a \"Diffix Level 0\" or \"Diffix Toolbox\".\n", "\n", "To run this notebook locally you need `psycopg2`, `ipython-sql`, and of course a PostgreSQL database with the extension installed.\n", "\n", "Let's connect to a database that has `pg_diffix` installed." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Connected to database.\n" ] } ], "source": [ "%load_ext sql\n", "%sql postgresql://edon:secret@localhost:5432/diffixdb\n", "%config SqlMagic.displaycon = False # No connection string in output.\n", "%config SqlMagic.feedback = False # No Done, rows affected.\n", "# Make sure pg_diffix is installed and loaded.\n", "%sql load 'pg_diffix'\n", "print('Connected to database.')" ] }, { "source": [ "## Setting up the dataset\n", "\n", "Let's create a table `purchases` with some columns." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[]" ] }, "metadata": {}, "execution_count": 2 } ], "source": [ "%%sql\n", "DROP TABLE IF EXISTS purchases;\n", "CREATE TABLE purchases (\n", " uid INTEGER,\n", " product TEXT,\n", " brand TEXT,\n", " price FLOAT\n", ");" ] }, { "source": [ "We'll use this helper function to insert many copies of a row at once." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def insert(how_many, row):\n", " uid, product, brand, price = row\n", " for i in range(how_many):\n", " %sql \\\n", " INSERT INTO purchases (uid, product, brand, price) \\\n", " VALUES (:uid, :product, :brand, :price)" ] }, { "source": [ "Let's insert some data." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Done\n" ] } ], "source": [ "# User 1\n", "insert(3, [ 1, 'phone', 'apple', 1000 ])\n", "insert(1, [ 1, 'laptop', 'apple', 2000 ])\n", "insert(1, [ 1, 'chair', 'ikea', 50 ])\n", "\n", "# User 2\n", "insert(1, [ 2, 'phone', 'samsung', 600 ])\n", "insert(2, [ 2, 'laptop', 'dell', 1000 ])\n", "insert(1, [ 2, 'chair', 'ikea', 50 ])\n", "\n", "# User 3\n", "insert(1, [ 3, 'phone', 'apple', 900 ])\n", "insert(7, [ 3, 'phone', 'samsung', 400 ])\n", "insert(1, [ 3, 'chair', 'ikea', 50 ])\n", "\n", "# User 4\n", "insert(2, [ 4, 'phone', 'samsung', 500 ])\n", "insert(1, [ 4, 'laptop', 'hp', 1000 ])\n", "insert(1, [ 4, 'chair', 'ikea', 50 ])\n", "\n", "# User 5\n", "insert(1, [ 5, 'phone', 'apple', 1000 ])\n", "insert(1, [ 5, 'laptop', 'hp', 1000 ])\n", "insert(1, [ 5, 'chair', 'ikea', 50 ])\n", "insert(1, [ 5, 'candy', 'milka', 1 ])\n", "\n", "# User 6\n", "insert(2, [ 6, 'phone', 'apple', 800 ])\n", "insert(1, [ 6, 'laptop', 'asus', 1000 ])\n", "insert(4, [ 6, 'candy', 'milka', 1 ])\n", "insert(2, [ 6, 'candy', 'kitkat', 1 ])\n", "insert(1, [ 6, 'candy', 'snickers', 1 ])\n", "insert(2, [ 6, 'candy', 'mars', 1 ])\n", "insert(1, [ 6, 'candy', 'm&m', 1 ])\n", "\n", "# User 7\n", "insert(1, [ 7, 'phone', 'samsung', 500 ])\n", "insert(1, [ 7, 'laptop', 'hp', 1000 ])\n", "insert(1, [ 7, 'chair', 'ikea', 50 ])\n", "insert(1, [ 7, 'rocket', 'spacex', 1e9 ])\n", "\n", "print('Done')" ] }, { "source": [ "Let's get an overview of products and how many times they have been purchased.\n", "No anonymization yet - these are regular counts." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[('candy', 11, 2),\n", " ('chair', 6, 6),\n", " ('laptop', 7, 6),\n", " ('phone', 18, 7),\n", " ('rocket', 1, 1)]" ], "text/html": "
product | \nsales | \nunique_buyers | \n
---|---|---|
candy | \n11 | \n2 | \n
chair | \n6 | \n6 | \n
laptop | \n7 | \n6 | \n
phone | \n18 | \n7 | \n
rocket | \n1 | \n1 | \n
product | \ntrue_count | \ndiffix_count | \n
---|---|---|
laptop | \n7 | \n7 | \n
chair | \n6 | \n5 | \n
phone | \n18 | \n12 | \n
candy | \n11 | \n3 | \n
rocket | \n1 | \n2 | \n
product | \nexplain_diffix_count | \n
---|---|
laptop | \nuniq=6, seed=e02c556f5e0f top=[2➔2, 1➔1 | 4➔1, 5➔1, 6➔1, 7➔1] true=7, flat=6, final=7 | \n
chair | \nuniq=6, seed=d6ce455f4e08 top=[1➔1, 2➔1 | 3➔1, 4➔1, 5➔1, 7➔1] true=6, flat=6, final=5 | \n
phone | \nuniq=7, seed=c33a827d3c7e top=[3➔8, 1➔3 | 4➔2, 6➔2, 2➔1, 5➔1, 7➔1] true=18, flat=10, final=12 | \n
candy | \nuniq=2, seed=f7f7a7d92a5a top=[6➔10 | 5➔1] true=11, flat=2, final=3 | \n
rocket | \nuniq=1, seed=842059fe8157 top=[7➔1] true=1, flat=0, final=2 | \n
product | \ndiffix_count | \n
---|---|
laptop | \n7 | \n
chair | \n5 | \n
phone | \n12 | \n
product | \nexplain_diffix_lcf | \n
---|---|
laptop | \nuniq=6, seed=fbd63230ac41 thresh=4, pass=true | \n
chair | \nuniq=6, seed=630013b163d5 thresh=5, pass=true | \n
phone | \nuniq=7, seed=9e8c7f3f93c7 thresh=4, pass=true | \n
candy | \nuniq=2, seed=ebb4a2ec3ca5 thresh=5, pass=false | \n
rocket | \nuniq=1, seed=bff99cd5977f thresh=5, pass=false | \n
suppressed | \n
---|
2 | \n
product | \nbrands | \n
---|---|
candy | \nkitkat, mars, milka, m&m, snickers | \n
chair | \nikea | \n
laptop | \napple, asus, dell, hp | \n
phone | \napple, samsung | \n
rocket | \nspacex | \n
product | \ndiffix_count | \n
---|---|
laptop | \n4 | \n
chair | \n2 | \n
phone | \n2 | \n
candy | \n9 | \n
rocket | \n2 | \n
product | \ntrue_count | \nby_uid | \nby_brand | \n
---|---|---|---|
laptop | \n7 | \n7 | \n4 | \n
chair | \n6 | \n5 | \n2 | \n
phone | \n18 | \n12 | \n2 | \n
candy | \n11 | \n3 | \n9 | \n
rocket | \n1 | \n2 | \n2 | \n
product | \nby_uid | \nby_brand | \n
---|---|---|
laptop | \nuniq=6, seed=e02c556f5e0f top=[2➔2, 1➔1 | 4➔1, 5➔1, 6➔1, 7➔1] true=7, flat=6, final=7 | \n uniq=4, seed=097d1f3ed09b top=[hp➔3, dell➔2 | apple➔1, asus➔1] true=7, flat=4, final=4 | \n
chair | \nuniq=6, seed=d6ce455f4e08 top=[1➔1, 2➔1 | 3➔1, 4➔1, 5➔1, 7➔1] true=6, flat=6, final=5 | \n uniq=1, seed=741a3c802039 top=[ikea➔6] true=6, flat=0, final=2 | \n
phone | \nuniq=7, seed=c33a827d3c7e top=[3➔8, 1➔3 | 4➔2, 6➔2, 2➔1, 5➔1, 7➔1] true=18, flat=10, final=12 | \n uniq=2, seed=9c4771a1c67f top=[samsung➔11, apple➔7 | ] true=18, flat=0, final=2 | \n
candy | \nuniq=2, seed=f7f7a7d92a5a top=[6➔10 | 5➔1] true=11, flat=2, final=3 | \n uniq=5, seed=ccbd9692e7da top=[milka➔5 | kitkat➔2, mars➔2, snickers➔1, m&m➔1] true=11, flat=8, final=9 | \n
rocket | \nuniq=1, seed=842059fe8157 top=[7➔1] true=1, flat=0, final=2 | \n uniq=1, seed=c8d7aed4b124 top=[spacex➔1 | ] true=1, flat=0, final=2 | \n
product | \nby_uid | \nby_brand | \n
---|---|---|
None | \nuniq=7, seed=c33a827d3c7e top=[6➔13, 3➔9 | 1➔5, 2➔4, 4➔4, 5➔4, 7➔4] true=43, flat=29, final=31 | \n uniq=12, seed=6f888ae1c421 top=[samsung➔11 | apple➔8, ikea➔6, milka➔5, hp➔3, dell➔2] true=43, flat=37, final=37 | \n
laptop | \nuniq=6, seed=e02c556f5e0f top=[2➔2, 1➔1 | 4➔1, 5➔1, 6➔1, 7➔1] true=7, flat=6, final=7 | \n uniq=4, seed=097d1f3ed09b top=[hp➔3, dell➔2 | apple➔1, asus➔1] true=7, flat=4, final=4 | \n
chair | \nuniq=6, seed=d6ce455f4e08 top=[1➔1, 2➔1 | 3➔1, 4➔1, 5➔1, 7➔1] true=6, flat=6, final=5 | \n uniq=1, seed=741a3c802039 top=[ikea➔6] true=6, flat=0, final=2 | \n
phone | \nuniq=7, seed=c33a827d3c7e top=[3➔8, 1➔3 | 4➔2, 6➔2, 2➔1, 5➔1, 7➔1] true=18, flat=10, final=12 | \n uniq=2, seed=9c4771a1c67f top=[samsung➔11, apple➔7 | ] true=18, flat=0, final=2 | \n
candy | \nuniq=2, seed=f7f7a7d92a5a top=[6➔10 | 5➔1] true=11, flat=2, final=3 | \n uniq=5, seed=ccbd9692e7da top=[milka➔5 | kitkat➔2, mars➔2, snickers➔1, m&m➔1] true=11, flat=8, final=9 | \n
rocket | \nuniq=1, seed=842059fe8157 top=[7➔1] true=1, flat=0, final=2 | \n uniq=1, seed=c8d7aed4b124 top=[spacex➔1 | ] true=1, flat=0, final=2 | \n
product | \nanon_count | \n
---|---|
laptop | \n4 | \n
phone | \n2 | \n
true_count | \nanon_count | \n
---|---|
43 | \n31 | \n
pg_diffix.noise_seed | \n
---|
diffix | \n
explain_diffix_lcf | \n
---|
uniq=7, seed=307da96580c4 thresh=3, pass=true | \n
explain_diffix_lcf | \n
---|
uniq=7, seed=e41c45431789 thresh=4, pass=true | \n
pg_diffix.noise_seed | \n
---|
diffix | \n
name | \nsetting | \nshort_desc | \n
---|---|---|
pg_diffix.low_count_threshold_max | \n5 | \nMaximum low count threshold (inclusive). | \n
pg_diffix.low_count_threshold_min | \n2 | \nMinimum low count threshold (inclusive). | \n
pg_diffix.noise_seed | \ndiffix | \nSeed used for initializing noise layers. | \n
pg_diffix.noise_sigma | \n1 | \nStandard deviation of noise added to aggregates. | \n
pg_diffix.outlier_count_max | \n2 | \nMaximum outlier count (inclusive). | \n
pg_diffix.outlier_count_min | \n1 | \nMinimum outlier count (inclusive). | \n
pg_diffix.top_count_max | \n6 | \nMaximum top contributors count (inclusive). | \n
pg_diffix.top_count_min | \n4 | \nMinimum top contributors count (inclusive). | \n