This is the documentation for the asr.database.duplicates recipe. This recipe consists of a single instruction, main, which is documented below.

Run this recipe through the command-line interface

$ asr run asr.database.duplicates

or as a Python module

$ python -m asr.database.duplicates



asr.database.duplicates.main(database, databaseout=None, filterstring='<=natoms,<energy', comparison_keys='', rmsd_tol=0.3, skip_distance_calc=False)

Filter out duplicates of a database.

Parameters

  • database (db-connection) – Database to be analyzed for duplicates.

  • databaseout (str) – Filename of new database with duplicates removed.

  • filterstring (str) – Comma-separated string of filters. A simple filter could be ‘<energy’, which only picks a material if no other material with lower energy exists (in other words: choose the lowest-energy materials). ‘<’ means ‘smallest’. The accepted operators are {‘<=’, ‘>=’, ‘>’, ‘<’, ‘==’}. Several filters can be combined into a more complex filter, e.g., ‘<energy,<=natoms’ means that a material is only picked if no other material exists with lower energy AND fewer or the same number of atoms.

  • comparison_keys (str) – Comma-separated string of keys that must be identical between two rows for them to be compared, e.g., ‘magstate,natoms’.

  • rmsd_tol (float) – Tolerance on the RMSD between two materials for them to be considered duplicates.

  • skip_distance_calc (bool) – If True, only use the reduced formula and comparison_keys to match structures, and skip calculating distances between structures. The output RMSDs will then be 0 for matching structures.
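The filterstring rule described above can be sketched in plain Python. This is a minimal illustration of the documented behaviour, not the recipe's own implementation. It makes three assumptions: each candidate is a plain dict holding the filter keys, the filter is applied among the members of one group of duplicates, and the helper names parse_filterstring and pick_materials are hypothetical. A material is kept unless some other material satisfies every filter at the same time.

```python
import operator

# Operators accepted in a filterstring, per the parameter description.
# Two-character operators come first so that '<=' is not parsed as '<'
# (dicts keep insertion order in Python 3.7+).
OPERATORS = {
    '<=': operator.le,
    '>=': operator.ge,
    '==': operator.eq,
    '<': operator.lt,
    '>': operator.gt,
}


def parse_filterstring(filterstring):
    """Hypothetical helper: '<=natoms,<energy' -> [(le, 'natoms'), (lt, 'energy')]."""
    filters = []
    for item in filterstring.split(','):
        item = item.strip()
        for symbol, func in OPERATORS.items():
            if item.startswith(symbol):
                filters.append((func, item[len(symbol):]))
                break
        else:
            raise ValueError(f'No valid operator in filter {item!r}')
    return filters


def pick_materials(rows, filterstring):
    """Hypothetical helper: keep rows that no *other* row beats on all filters at once."""
    filters = parse_filterstring(filterstring)
    picked = []
    for i, row in enumerate(rows):
        # 'other' beats 'row' if op(other[key], row[key]) holds for every filter.
        beaten = any(
            all(func(other[key], row[key]) for func, key in filters)
            for j, other in enumerate(rows) if j != i
        )
        if not beaten:
            picked.append(row)
    return picked


# Toy group of duplicates with made-up values.
group = [
    {'uid': 'a', 'natoms': 3, 'energy': -10.0},
    {'uid': 'b', 'natoms': 6, 'energy': -20.5},
    {'uid': 'c', 'natoms': 3, 'energy': -10.2},
]
print([r['uid'] for r in pick_materials(group, '<energy')])          # → ['b']
print([r['uid'] for r in pick_materials(group, '<=natoms,<energy')])  # → ['b', 'c']
```

With the combined default filter, ‘c’ survives because the only material with lower energy (‘b’) has more atoms, so no single material beats ‘c’ on both conditions.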

Returns

  • duplicate_groups: Dict containing all duplicate groups. The key of each group is the uid of the prioritized candidate of the group.
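With skip_distance_calc=True, matching relies only on the reduced formula and the comparison_keys. The sketch below groups rows that way and builds a dictionary keyed like duplicate_groups. It is a hypothetical illustration, not ASR code, and rests on several assumptions:

- Rows are plain dicts with 'uid', 'symbols' and 'energy' entries.
- The prioritized candidate is simply the lowest-energy member. This matches filterstring=‘<energy’ only when energies are distinct.
- Each value is the list of member uids.
- Groups with a single member are left out.

```python
from collections import Counter, defaultdict
from functools import reduce
from math import gcd


def reduced_formula(symbols):
    """Hashable reduced formula, e.g. ['Mo', 'S', 'S', 'Mo', 'S', 'S'] -> (('Mo', 1), ('S', 2))."""
    counts = Counter(symbols)
    divisor = reduce(gcd, counts.values())
    return tuple(sorted((el, n // divisor) for el, n in counts.items()))


def group_duplicates(rows, comparison_keys=''):
    """Hypothetical helper: group by reduced formula plus comparison keys (no RMSD)."""
    keys = [k for k in comparison_keys.split(',') if k]
    groups = defaultdict(list)
    for row in rows:
        signature = (reduced_formula(row['symbols']),) + tuple(row[k] for k in keys)
        groups[signature].append(row)

    duplicate_groups = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # assumption: a lone structure forms no duplicate group
        # Stand-in for the filterstring: prioritize the lowest-energy member.
        best = min(members, key=lambda r: r['energy'])
        duplicate_groups[best['uid']] = [r['uid'] for r in members]
    return duplicate_groups


# Toy rows with made-up energies.
rows = [
    {'uid': 'MoS2-1', 'symbols': ['Mo', 'S', 'S'], 'magstate': 'NM', 'energy': -21.0},
    {'uid': 'MoS2-2', 'symbols': ['Mo', 'S', 'S'] * 2, 'magstate': 'NM', 'energy': -42.3},
    {'uid': 'MoS2-3', 'symbols': ['Mo', 'S', 'S'], 'magstate': 'FM', 'energy': -20.9},
    {'uid': 'WS2-1', 'symbols': ['W', 'S', 'S'], 'magstate': 'NM', 'energy': -24.0},
]
print(group_duplicates(rows, 'magstate'))  # → {'MoS2-2': ['MoS2-1', 'MoS2-2']}
print(group_duplicates(rows))  # → {'MoS2-2': ['MoS2-1', 'MoS2-2', 'MoS2-3']}
```

Adding ‘magstate’ to comparison_keys splits the ferromagnetic MoS2 row into its own group, which then drops out because it has no partner.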

Return type