PostgreSQL Anonymizer 1.0: Privacy By Design For Postgres
================================================================================
Limoges, France, May 17th, 2022
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[Static Masking] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
NOTE: **This release is considered ready for production**
Implementing "Privacy By Design" with PostgreSQL
--------------------------------------------------------------------------------
4 years after the introduction of the GDPR, its application remains complex for
many companies and organizations. In particular, implementing the "privacy by
design" principle remains a headache... How can we write the data protection
rules directly during the design of an application ?
The vast majority of the current anonymization tools work outside the database,
using the same approach that ETL tools. As a result, the responsibility for
writing anonymization policy is usually assigned to production DBAs.
The PostgreSQL Anonymizer extension introduces a different approach as it seeks
to involve developers and architects early on, during the preliminary design
steps, by declaring the masking rules using SQL, directly inside the database
model itself, in the same way as an integrity constraint or an index !
When a developper wants to add a new column to a table, she/he usually
defines a few rules and restrictions that are enforced for this column.
With PostgreSQL Anonymizer, she/he can also declare that this column contains
personnal information and write a masking rule to describe how the data
will be transformed during the anonymization process.
The extension offers a panel of masking techniques: randomization, noise,
faking, partial destruction, pseudonymization, generalization, etc.
For Thierry Aimé who works at the Office of Architecture and Standards in the
French Public Finances Directorate General (DGFiP), the extensions plays a key
role in the data protection policy :
> « With PostgreSQL Anonymizer we integrate, from the design of the database,
> the principle that outside production the data must be anonymized. Thus we can
> inforce the RGPD rules, without affecting the quality of the tests during
> version upgrades for example. »
Here's a basic example:
CREATE TABLE player(
id SERIAL,
lastname TEXT,
birth DATE,
points INT
);
SECURITY LABEL FOR anon ON COLUMN player.lastname
IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN player.birth
IS 'MASKED WITH VALUE NULL';
Alternatively, if the column can be declared as an [indirect identifier], then
the production DBA will be able to use a [K Anonymity] function to check that
there's no risk of [singling out] an individual inside the dataset.
[indirect identifier]: https://labkey.med.ualberta.ca/labkey/_webdav/REDCap%20Support/@wiki/identifiers/identifiers.html?listing=html
[K Anonymity]: https://postgresql-anonymizer.readthedocs.io/en/latest/generalization/#k-anonymity
[singling out]: https://www.cnil.fr/en/sheet-ndeg1-identify-personal-data
Data protection is a team effort ! Every person involved in the lifecycle
of application should be concerned. With that mindset, the PostgreSQL
Anonymizer extension provides tools for developpers and DBAs and help them
to implement the data masking rules early on, thus respecting the
"Privacy by Design" principle.
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat, CentOS and Rocky Linux systems, you can install it directly from the
[official PostgreSQL RPM repository]:
dnf install postgresql_anonymizer14
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation:
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
PostgreSQL Anonymizer was backed financially by the following entities :
* The French Public Finances Directorate General (DGFiP)
* BioMerieux, a world leader in the field of in vitro diagnostics
Many thanks to them for their help and feedback.
This project includes code, bugfixes, documentation, code reviews and ideas from
[dozens of contributors]. This version 1.0 is a great occasion to show our
gratitude to them!
[dozens of contributors]: https://gitlab.com/dalibo/postgresql_anonymizer/-/blob/master/AUTHORS.md
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.12: Release Candidate 2
================================================================================
Limoges, France, April 10, 2022
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[Static Masking] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Towards version 1.0 and beyond !
--------------------------------------------------------------------------------
This releases is focused on fixing remaining bugs and removing obsoletes parts
of the code.
4 New Parameters !
--------------------------------------------------------------------------------
This version introduces 4 new GUC parameters:
* anon.maskschema
* anon.sourceschema
* anon.algorithm
* anon.salt
Those values were previously stored in 2 tables `anon.config` and `anon.secret`.
They both are obsolete now.
Read the "Configure" section below for more details:
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat, CentOS and Rocky Linux systems, you can install it directly from the
[official PostgreSQL RPM repository]:
dnf install postgresql_anonymizer14
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation:
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes, documentation, code reviews and ideas from
Radek Salač and others we may have missed.
Many thanks to them!
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.11: Release Candidate 1
================================================================================
Paris, France, March 31, 2022
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[Static Masking] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Towards version 1.0 and beyond !
--------------------------------------------------------------------------------
This releases does not introduce major changes. It is focused on fixing
remaining bugs and removing obsoletes parts of the code.
In previous versions we allowed users de declare masking rules using a `COMMENT`
statement. This is not supported anymore. Use `SECURITY LABEL FOR anon` to
declare your rules... And use `COMMENT` to declare comments :)
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat, CentOS and Rocky Linux systems, you can install it directly from the
[official PostgreSQL RPM repository]:
dnf install postgresql_anonymizer14
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation:
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes, documentation, code reviews and ideas from
Christophe Courtois, Hrvoje Pavlinovic, Mike Tefft, Cristian Gomez Portes and
others we may have missed.
Many thanks to them!
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
[Frédéric Yhuel]: https://www.dalibo.com/en/equipe#frederic
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.10: An improved engine and a brand new tutorial
================================================================================
Paris, France, March 14, 2022
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[Static Masking] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Many improvements and a better dynamic masking engine
--------------------------------------------------------------------------------
This version is a major step toward the upcoming version 1.0 which will be
considered as production ready.
The main changes are focused on the `pg_dump_anon.sh` wrapper which is now
faster and more accurate.
The dynamic masking engine has been simplified and improved. The change is not
visible for end users but the inner workings are now more robust, in particular
the main event trigger that hides the real data to the masked roles.
A brand new tutorial
--------------------------------------------------------------------------------
Anonymization can be a difficult task and sometimes we just don't know where
to start ! In addition to the [project documentation], we are now publishing
a complete 4-hours workshop with practical examples, a series of exercises and
their solutions.
The result is a 50 pages document designed as a kickstarter to help you discover
how you can use PostgreSQL Anonymizer to protect the privacy and comply to the
GDPR requirements.
The tutorial is available here:
https://dali.bo/howto_anon_handout
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat, CentOS and Rocky Linux systems, you can install it directly from the
[official PostgreSQL RPM repository]:
dnf install postgresql_anonymizer14
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation:
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes, documentation, code reviews and ideas from
Be Hai Tran, Florent Jardin, Yann B., Christophe Courtois, Vito Botta,
Cristiano S., Adrien S., Justin Wei (and others we may have missed).
Many thanks to them!
A final special thanks goes to [Frédéric Yhuel] for his work on the pl/pgsql code
and the documentation !
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
[Frédéric Yhuel]: https://www.dalibo.com/en/equipe#frederic
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.9: Trusted Schemas and Support for PostgreSQL 14
================================================================================
Paris, France, July 2nd, 2021
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[Static Masking] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Reject masking filters if they don't belong to a trusted schema
--------------------------------------------------------------------------------
With `PostgreSQL Anonymizer`, the database owner can define custom masking
filters that would hide sensitive information based on internal business rules,
for instance a specific masking function that would remove names and phone numbers
from a JSON document.
For security reasons, the database administrator may want to restrict this
feature by accepting only the masking filters located inside a **trusted** schema.
To activate this security barrier, the administrator can simply enable a GUC
option called `anon.restrict_to_trusted_schemas`:
ALTER SYSTEM SET anon.restrict_to_trusted_schemas = on;
And then declare which schemas are trusted:
SECURITY LABEL FOR anon ON SCHEMA foo IS 'TRUSTED';
By default, the schemas `pg_catalog` and `anon` are trusted. The `public` schema
is not trusted (and it should never be...).
IMPORTANT: Activating this parameter may break some pre-existing masking rules!
If that's the case, the database administrator may have to move some custom
masking functions inside a trusted schema. For now, this parameter is disabled
by default. However it will be set to 'on' by default in future versions.
Users are strongly encouraged to activate this option as soon as possible.
Warning: Support for Amazon RDS is now deprecated
--------------------------------------------------------------------------------
As announced in the previous version, we made the difficult choice to drop the
so-called `standalone installation method`. In practice, the `anon_standalone.sql`
file will not evolve anymore.
As a collateral effect, this means the extension won't work on most of the
Postgres-as-a-Service platforms, such as Amazon RDS, unless they decide to
actively support it.
If privacy and anonymity are a concern to you, we encourage you to contact the
customer services of these platforms and ask them if they plan to add this
extension to their catalog.
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat, CentOS and Rocky Linux systems, you can install it directly from the
[official PostgreSQL RPM repository]:
yum install postgresql_anonymizer12
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation:
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
> **WARNING:** The project is still under active development and should be
> used carefully.
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes and ideas from Carlos Medeiros, Devrim Gündüz,
Andreas D, Thibaut Madelaine.
Many thanks to them!
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard] and [Frédéric Yhuel]
This is an open project, contributions are welcome. We need your feedback and
ideas! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
[Frédéric Yhuel]: https://www.dalibo.com/equipe#frederic
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.8: Masking foreign tables and partitions
================================================================================
Paris, France, February 8, 2021
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[Static Masking] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[Static Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/static_masking/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Improve performances and randomness
--------------------------------------------------------------------------------
Our latest efforts were focused on performance and improving the `shuffle`
algorithm and `fake` data filters. You may also notice that the `anonymize_database`
function is much faster.
Support for foreign tables and partitions
--------------------------------------------------------------------------------
You can now declare masking rules on partitions, inherited tables and foreign tables.
However keep in mind that the masking rules are **NOT INHERITED**. If you have
split a table into multiple partitions, you need to declare the masking rules for
each partition.
Warning: Support for Amazon RDS will be deprecated in the next version
--------------------------------------------------------------------------------
This extension was never really intended to work on Database As A Service platforms
(such as Amazon RDS or Google Cloud SQL). It just happens to work currently using
the `standalone installation` method but **we will no longer actively support it**.
In practice, the `anon_standalone.sql` file will not evolve anymore.
In future versions, we will introduce features that will force us to deprecate
this method. If privacy and anonymity are a concern to you, we encourage you to
contact the customer services of these platforms and ask them if they plan to
add this extension to their catalog.
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat / CentOS systems, you can install it from the
[official PostgreSQL RPM repository]:
yum install postgresql_anonymizer12
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation :
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
> **WARNING:** The project is still under active development and should be
> used carefully.
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes and ideas from Rushal Verma, Paul Bonaud,
Dmitry Fomin, Rodrigo Otsuka , Nicolas Peltier, Matthieu Larcher and others.
Many thanks to them!
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.7: Generic Hashing and Advanced Faking
================================================================================
Eymoutiers, France, September 25, 2020
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[In-Place Anonymization] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions] such as Substitution, Randomization, Faking,
Pseudonymization, Partial Scrambling, Shuffling, Noise Addition and
Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[In-Place Anonymization]: https://postgresql-anonymizer.readthedocs.io/en/latest/in_place_anonymization/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Generic Hashing
--------------------------------------------------------------------------------
In theory, hashing is not a valid anonymization technique, however in practice
it is sometimes necessary to generate a determinist hash of the original data.
For instance, when a pair of primary key / foreign key is a "natural key",
it may contain actual information (like a customer number containing a birth
date or something similar).
Hashing such columns allows to keep referential integrity intact even for
relatively unusual source data. Therefore, the extension provides 2 masking
functions:
* `anon.hash(value)` will return a text hash of the value using a secret salt
and a secret hash algorithm (see below)
* `anon.digest(value,salt,algorithm)` lets you choose a salt and the hash
algorithm you want to use
By default a random secret salt is generated when the extension is initialized
and the default hash algortihm is `sha512`. You can change that if needed.
Keep in mind that hashing is a form of Pseudonymization. This means that the
real data can be rebuilt using the hashed value and the masking function. If an
attacker gets access to these elements, he or she can easily re-identify
some persons using `brute force` or `dictionary` attacks. Therefore, **the
salt and the algorithm used to hash the data must be protected with the
same level of security that the original dataset.**
Many thanks to Gunnar "Nick" Bluth for his help on this feature !
Advanced Faking
-------------------------------------------------------------------------------
Generating fake data is a complex topic. The anon extension offers a set of
basic faking functions but for more advanced faking methods, in particular
if you are looking for **localized fake data**, take a look at
[PostgreSQL Faker], an extension based upon the well-known [Faker python library].
[PostgreSQL Faker]: https://gitlab.com/dalibo/postgresql_faker
[Faker python library]: https://faker.readthedocs.io
This extension provides an advanced faking engine with localisation support
For example:
CREATE SCHEMA faker;
CREATE EXTENSION faker SCHEMA faker;
SELECT faker.faker('de_DE');
SELECT faker.first_name_female();
first_name_female
-------------------
Mirja
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and further versions.
On Red Hat / CentOS systems, you can install it from the
[official PostgreSQL RPM repository]:
yum install postgresql_anonymizer12
Then load the extension with:
ALTER DATABASE foo SET session_preload_libraries = 'anon';
Create the extension inside the database:
CREATE EXTENSION anon CASCADE;
And finally, initialize the extension
SELECT anon.init();
For other systems, check out the [install] documentation :
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
> **WARNING:** The project is at an early stage of development and should be
> used carefully.
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes and ideas from Gunnar "Nick" Bluth,
Yann Robin, Christophe Courtois, Nikolay Samokhvalov.
Many thanks to them !
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas ! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.6: Pseudonymization and Improved Anonymous Exports
================================================================================
Eymoutiers, France, Mars 5, 2020
`PostgreSQL Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[In-Place Anonymization] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions]: Substitution, Randomization, Faking, Pseudonymization,
Partial Scrambling, Shuffling, Noise Addition and Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[In-Place Anonymization]: https://postgresql-anonymizer.readthedocs.io/en/latest/in_place_anonymization/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Pseudonymization
--------------------------------------------------------------------------------
[Pseudonymization] functions are similar to the faking functions in the sense
that they generate realistic values. The main difference is that pseudonymization
is deterministic : the functions will always return the same fake value based on
a seed and an optional salt.
# SELECT anon.pseudo_email('Alice','salt123');
pseudo_email
-------------------
fcadell56@ucoz.ru
# SELECT anon.pseudo_email('Alice','salt123');
pseudo_email
-------------------
fcadell56@ucoz.ru
**WARNING** : Pseudonymization is often confused with anonymization but in fact
they serve 2 different purposes. With pseudonymization, the real data can be
rebuilt using the pseudo data and the masking rule. If an attacker
gets access to these elements, he or she can easily re-identify some people
using `brute force` or `dictionary` attacks. Therefore, you should protect any
pseudonymized data with the same level of security that the original dataset.
The GDPR makes it very clear that personal data which have undergone
pseudonymization are still considered to be personal information.
[Pseudonymization]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/#pseudonymization
Improved Anonymous Exports
--------------------------------------------------------------------------------
The `anon.dump()` function was slow and unpractical. It is now deprecated and
replace by a tool named `pg_dump_anon` that you can use like the regular `pg_dump`
command:
pg_dump_anon -h localhost -U bob mydb > anonymous_dump.sql
It uses the same connections parameters that `pg_dump`. The PostgreSQL
environment variables ($PGHOST, PGUSER, etc.) and `.pgpass` are supported.
However the `plain` format is the only supported format. The other formats
(`custom`, `dir` and `tar`) are not supported.
Detecting Hidden Identifiers
--------------------------------------------------------------------------------
This extension makes it very easy to declare masking rules. But of course when
you're creating an anonymization strategy, the hard part is to scan the database
model to find which columns contains direct and indirect identifiers and then
decide how these identifiers should be masked.
We now provide a `detect()` function that will search for common identifiers
names based on a dictionary. For now, 2 dictionaries are available: English
('en_US') and French ('fr_FR'). By default the English dictionary is used:
# SELECT anon.detect('en_US');
table_name | column_name | identifiers_category | direct
------------+----------------+----------------------+--------
customer | CreditCard | creditcard | t
customer | id | account_id | t
vendor | Firstname | firstname | t
The identifier categories are based on the [HIPAA classification].
[HIPAA classification]: https://www.luc.edu/its/aboutits/itspoliciesguidelines/hipaainformation/18hipaaidentifiers/
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and later.
On Red Hat / CentOS systems, you can install it from the
[official PostgreSQL RPM repository]:
yum install postgresql_anonymizer12
Then add 'anon' in the `shared_preload_libraries` parameter of your
`postgresql.conf` file. And restart your instance.
For other systems, check out the [install] documentation :
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
> **WARNING:** The project is at an early stage of development and should be
> used carefully.
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code, bugfixes and ideas from Sebastien Delobel, Sam
Buckingham, Thomas Clark, Joe Auty, Pierre-Henri Dubois Amy and Olleg Samoylov.
Many thanks to them !
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas ! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.5: Generalization and k-anonymity
================================================================================
Eymoutiers, France, November 6, 2019
`Postgresql Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
The extension supports 3 different anonymization strategies: [Dynamic Masking],
[In-Place Anonymization] and [Anonymous Dumps]. It also offers a large choice of
[Masking Functions]: Substitution, Randomization, Faking, Partial Scrambling,
Shuffling, Noise Addition and Generalization.
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[In-Place Anonymization]: https://postgresql-anonymizer.readthedocs.io/en/latest/in_place_anonymization/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
Generalization
--------------------------------------------------------------------------------
The idea of generalization is to replace data with a broader, less accurate
value. For instance, instead of saying "Bob is 28 years old", you can say
"Bob is between 20 and 30 years old". This is interesting for analytics because
the data remains true while avoiding the risk of re-identification.
PostgreSQL can handle generalization very easily with the [RANGE] data types,
a very poweful way to store and manipulate a set of values contained between
a lower and an upper bound.
[RANGE]: https://www.postgresql.org/docs/current/rangetypes.html
Here's a basic table containing medical data:
SELECT * FROM patient;
ssn | firstname | zipcode | birth | disease
-------------+-----------+---------+------------+---------------
253-51-6170 | Alice | 47012 | 1989-12-29 | Heart Disease
091-20-0543 | Bob | 42678 | 1979-03-22 | Allergy
565-94-1926 | Caroline | 42678 | 1971-07-22 | Heart Disease
510-56-7882 | Eleanor | 47909 | 1989-12-15 | Acne
We want the anonymized data to remain **true** because it will be
used for statistics. We can build a view upon this table to remove
useless columns and generalize the indirect identifiers (zipcode and
birthday):
CREATE MATERIALIZED VIEW generalized_patient AS
SELECT
'REDACTED'::TEXT AS firstname,
anon.generalize_int4range(zipcode,1000) AS zipcode,
anon.generalize_daterange(birth,'decade') AS birth,
disease
FROM patient;
This will give us a less accurate view of the data:
SELECT * FROM generalized_patient;
firstname | zipcode | birth | disease
-----------+---------------+-------------------------+---------------
REDACTED | [47000,48000) | [1980-01-01,1990-01-01) | Heart Disease
REDACTED | [42000,43000) | [1970-01-01,1980-01-01) | Allergy
REDACTED | [42000,43000) | [1970-01-01,1980-01-01) | Heart Disease
REDACTED | [47000,48000) | [1980-01-01,1990-01-01) | Acne
k-anonymity
--------------------------------------------------------------------------------
k-anonymity is an industry-standard term used to describe a property of an
anonymized dataset. The k-anonymity principle states that within a
given dataset, any anonymized individual cannot be distinguished from at
least `k-1` other individuals. In other words, k-anonymity might be described
as a "hiding in the crowd" guarantee. A low value of `k` indicates there's a risk
of re-identification using linkage with other data sources.
You can evaluate the k-anonymity factor of a table in 2 steps :
Step 1: First defined the columns that are [indirect idenfiers] ( also known
as "quasi identifers") like this:
SECURITY LABEL FOR anon ON COLUMN generalized_patient.zipcode
IS 'INDIRECT IDENTIFIER';
SECURITY LABEL FOR anon ON COLUMN generalized_patient.birth
IS 'INDIRECT IDENTIFIER';
Step 2: Once the indirect identifiers are declared :
SELECT anon.k_anonymity('generalized_patient')
In the example above, the k-anonymity factor of the `generalized_patient`
materialized view is `2`.
Lorem Ipsum
--------------------------------------------------------------------------------
For TEXT and VARCHAR columns, you can now use the classic [Lorem Ipsum]
generator:
* `anon.lorem_ipsum()` returns 5 paragraphs
* `anon.lorem_ipsum(2)` returns 2 paragraphs
* `anon.lorem_ipsum( paragraphs := 4 )` returns 4 paragraphs
* `anon.lorem_ipsum( words := 20 )` returns 20 words
* `anon.lorem_ipsum( characters := 7 )` returns 7 characters
[Lorem Ipsum]: https://lipsum.com
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and later.
On Red Hat / CentOS systems, you can install it from the
[official PostgreSQL RPM repository]:
yum install postgresql_anonymizer12
Then add 'anon' in the `shared_preload_libraries` parameter of your
`postgresql.conf` file. And restart your instance.
For other system, check out the [install] documentation :
https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
> **WARNING:** The project is at an early stage of development and should be
> used carefully.
[official PostgreSQL RPM repository]: https://yum.postgresql.org/
[install]: https://postgresql-anonymizer.readthedocs.io/en/latest/INSTALL/
Thanks
--------------------------------------------------------------------------------
This release includes code and ideas from Travis Miller, Jan Birk and Olleg
Samoylov. Many thanks to them !
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas ! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.4 : Declare Masking Rules With Security Labels
================================================================================
Eymoutiers, October 14, 2019
`Postgresql Anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
This new version introduces a major change of syntax. In the previous versions,
the data masking rules were declared with column comments. They are now defined
by using [security labels]:
[security labels]: https://www.postgresql.org/docs/current/sql-security-label.html
SECURITY LABEL FOR anon
ON COLUMN customer.lastname
IS 'MASKED WITH FUNCTION anon.fake_last_name()'
The previous syntax is still supported and backward compatibility is maintained.
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and later.
It requires extension named [tsm_system_rows] (available in the `contrib`
package) and an extension called [ddlx] (available via [PGXN]) :
pgxn install ddlx
pgxn install postgresql_anonymizer
Then add 'anon' in the `shared_preload_libraries` parameter of your
`postgresql.conf` file. And restart your instance.
> **WARNING:** The project is at an early stage of development and should be used
> carefully.
[tsm_system_rows]: https://www.postgresql.org/docs/current/tsm-system-rows.html
[ddlx]: https://github.com/lacanoid/pgddl
[PGXN]: https://pgxn.org/
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas ! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
PostgreSQL Anonymizer 0.3 : In-Place Masking and Anonymous Dumps
================================================================================
Paris, August 26, 2019
`postgresql_anonymizer` is an extension that hides or replaces personally
identifiable information (PII) or commercially sensitive data from a PostgreSQL
database.
Firts of all, you declare a list of [Masking Rules] directly inside the database
model with SQL comments like this :
COMMENT ON COLUMN users.name IS 'MASKED WITH FUNCTION md5(name)';
Once the masking rules are declared, anonymization can be acheived in 3
different ways:
* [Anonymous Dumps]: Simply export the masked data into an SQL file
* [In-Place Anonymization]: Remove the sensible data according to the rules
* [Dynamic Masking]: Hide sensible data, only for the masked users
In addition, various [Masking Functions] are available : randomization, faking,
partial scrambling, shuffling, noise, etc... You can also user your own custom
function !
For more detail, please take a look at the documention:
https://postgresql-anonymizer.readthedocs.io/
[Masking Rules]: https://postgresql-anonymizer.readthedocs.io/en/latest/declare_masking_rules/
[Masking Functions]: https://postgresql-anonymizer.readthedocs.io/en/latest/masking_functions/
[Anonymous Dumps]: https://postgresql-anonymizer.readthedocs.io/en/latest/anonymous_dumps/
[In-Place Anonymization]: https://postgresql-anonymizer.readthedocs.io/en/latest/in_place_anonymization/
[Dynamic Masking]: https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and later.
It requires extension named [tsm_system_rows] (available in the `contrib`
package) and an extension called [ddlx] (available via [PGXN]) :
pgxn install ddlx
pgxn install postgresql_anonymizer
> **WARNING:** The project is at an early stage of development and should be used
> carefully.
[tsm_system_rows]: https://www.postgresql.org/docs/current/tsm-system-rows.html
[ddlx]: https://github.com/lacanoid/pgddl
[PGXN]: https://pgxn.org/
How to contribute
--------------------------------------------------------------------------------
PostgreSQL Anonymizer is part of the [Dalibo Labs] initiative. It is mainly
developed by [Damien Clochard].
This is an open project, contributions are welcome. We need your feedback and
ideas ! Let us know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
https://gitlab.com/dalibo/postgresql_anonymizer/issues?label_name%5B%5D=Junior+Jobs
[Dalibo Labs]: https://labs.dalibo.com
[Damien Clochard]: https://www.dalibo.com/en/equipe#daamien
--------------------------------------------------------------------------------
Introducing PostgreSQL Anonymizer 0.2.1 !
================================================================================
Paris, october 29, 2018
`postgresql_anonymizer` is an extension to mask or replace personally identifiable
information (PII) or commercially sensitive data from a PostgreSQL database.
The projet is aiming toward a **declarative approach** of anonymization. This
means we're trying to extend PostgreSQL's Data Definition Language (DDL) in
order to specify the anonymization strategy inside the table definition itself.
The extension can be used to put dynamic masks on certain users or permanently
modify sensitive data. Various masking techniques are available : randomization,
partial scrambling, custom rules, etc.
This tool is distributed under the PostgreSQL licence and the code is here:
https://gitlab.com/daamien/postgresql_anonymizer
Example
--------------------------------------------------------------------------------
Imagine a `people` table
=# SELECT * FROM people;
id | name | phone
------+----------------+------------
T800 | Schwarzenegger | 0609110911
### STEP 1 : Activate the masking engine
=# CREATE EXTENSION IF NOT EXISTS anon CASCADE;
=# SELECT anon.mask_init();
### STEP 2 : Declare a masked user
=# CREATE ROLE skynet;
=# COMMENT ON ROLE skynet IS 'MASKED';
### STEP 3 : Declare the masking rules
=# COMMENT ON COLUMN people.name
-# IS 'MASKED WITH FUNCTION anon.random_last_name()';
=# COMMENT ON COLUMN people.phone
-# IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';
### STEP 4 : Connect with the masked user
=# \! psql test -U skynet -c 'SELECT * FROM people;'
id | name | phone
------+----------+------------
T800 | Nunziata | 06******11
How to Install
--------------------------------------------------------------------------------
This extension is officially supported on PostgreSQL 9.6 and later.
It should also work on PostgreSQL 9.5 with a bit of hacking.
It requires an extension named `tsm_system_rows`, which is delivered by the
postgresql-contrib package of the main linux distributions
You can install it with `pgxn` or build from source it like any other
extenstion.
**WARNING:** The project is at an early stage of development and should be used carefully.
How to contribute
--------------------------------------------------------------------------------
I'd like to thanks all my wonderful colleagues at [Dalibo] for their support
and especially Thibaut Madelaine for the initial ideas.
This is an open project, contributions are welcome. I need your feedback and
ideas ! Let me know what you think of this tool, how it fits your needs and
what features are missing.
If you want to help, you can find a list of `Junior Jobs` here:
[Dalibo]: https://dalibo.com