asr.database.duplicates
Summary
This is the documentation for the asr.database.duplicates recipe.
This recipe consists of a single instruction, asr.database.duplicates.main, described under Steps.
Run this recipe through the CLI:
$ asr run asr.database.duplicates
or as a Python module:
$ python -m asr.database.duplicates
Steps
asr.database.duplicates
- asr.database.duplicates.main(database, databaseout=None, filterstring='<=natoms,<energy', comparison_keys='', rmsd_tol=0.3, skip_distance_calc=False)
Filter out duplicate materials in a database.
- Parameters
database (db-connection) – Database to be analyzed for duplicates.
databaseout (str) – Filename of new database with duplicates removed.
filterstring (str) – Comma-separated string of filters. A simple filter could be ‘<energy’, which only picks a material if no other material with lower energy exists (in other words: choose the lowest-energy material). Here ‘<’ means ‘smallest’. The accepted operators are {‘<=’, ‘>=’, ‘>’, ‘<’, ‘==’}. Filters can be combined to construct more complex ones, e.g., ‘<energy,<=natoms’ means that a material is only picked if no other material exists with lower energy AND the same number of atoms or fewer.
comparison_keys (str) – Comma-separated string of keys that must be identical between rows for them to be compared, e.g. ‘magstate,natoms’.
rmsd_tol (float) – Tolerance on the RMSD between materials for them to be considered duplicates.
skip_distance_calc (bool) – If true, only use the reduced formula and comparison_keys to match structures, and skip calculating distances between structures. The output RMSDs will be 0 for matching structures.
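The filterstring semantics described above can be illustrated with a small, self-contained sketch. It is not the recipe's actual implementation: the helper names parse_filterstring and pick, and the plain-dict rows, are hypothetical and exist only to show how a comma-separated filter such as ‘<=natoms,<energy’ could be read as "reject a material if some other material beats it on every filter".

```python
import operator

# Operators named in the filterstring description. Two-character symbols
# come first so that '<=' is matched before '<'.
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_filterstring(filterstring):
    """Split e.g. '<=natoms,<energy' into [(le, 'natoms'), (lt, 'energy')]."""
    filters = []
    for item in filterstring.split(","):
        item = item.strip()
        for symbol, func in OPERATORS.items():
            if item.startswith(symbol):
                filters.append((func, item[len(symbol):]))
                break
        else:
            raise ValueError(f"Unknown operator in filter: {item!r}")
    return filters


def pick(rows, filterstring):
    """Return the rows that no other row beats on all filters at once."""
    filters = parse_filterstring(filterstring)
    picked = []
    for candidate in rows:
        beaten = any(
            all(op(other[key], candidate[key]) for op, key in filters)
            for other in rows
            if other is not candidate
        )
        if not beaten:
            picked.append(candidate)
    return picked


# Hypothetical rows; in practice these values would come from database rows.
rows = [
    {"uid": "a", "natoms": 2, "energy": -1.0},
    {"uid": "b", "natoms": 2, "energy": -1.5},
    {"uid": "c", "natoms": 4, "energy": -3.0},
]
picked_uids = [row["uid"] for row in pick(rows, "<=natoms,<energy")]
lowest_energy_uids = [row["uid"] for row in pick(rows, "<energy")]
```

In this toy data, row "a" is rejected under ‘<=natoms,<energy’ because row "b" has the same number of atoms and a lower energy, while "c" survives because no other row has a lower energy. With ‘<energy’ alone, only "c" remains.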
- Returns
- Keys:
duplicate_groups: Dict containing all duplicate groups. The key of each group is the uid of the prioritized candidate of the group.
- Return type
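As a rough illustration of how comparison_keys and the duplicate_groups result relate, the sketch below groups rows by formula and comparison keys only, which corresponds loosely to the skip_distance_calc=True case. Everything here is an assumption made for illustration: the function group_duplicates, the row fields (formula is taken to be already reduced), the choice of lowest energy as the prioritized candidate, and the use of a list of uids as the value of each group. The actual recipe may structure its output differently.

```python
from collections import defaultdict


def group_duplicates(rows, comparison_keys=""):
    """Group rows sharing a formula and identical comparison-key values."""
    keys = [key for key in comparison_keys.split(",") if key]

    # Bucket rows by (formula, value of each comparison key).
    buckets = defaultdict(list)
    for row in rows:
        signature = (row["formula"],) + tuple(row[key] for key in keys)
        buckets[signature].append(row)

    duplicate_groups = {}
    for members in buckets.values():
        if len(members) < 2:
            continue  # a single row has no duplicates
        # Assumption: the prioritized candidate is the lowest-energy member.
        best = min(members, key=lambda row: row["energy"])
        duplicate_groups[best["uid"]] = sorted(row["uid"] for row in members)
    return duplicate_groups


# Hypothetical rows standing in for database entries.
rows = [
    {"uid": "a", "formula": "MoS2", "magstate": "NM", "energy": -1.0},
    {"uid": "b", "formula": "MoS2", "magstate": "NM", "energy": -1.5},
    {"uid": "c", "formula": "MoS2", "magstate": "FM", "energy": -1.2},
    {"uid": "d", "formula": "WS2", "magstate": "NM", "energy": -2.0},
]
groups_by_magstate = group_duplicates(rows, comparison_keys="magstate")
groups_formula_only = group_duplicates(rows)
```

With comparison_keys set to ‘magstate’, row "c" is kept out of the MoS2 group because its magnetic state differs; without comparison keys, all three MoS2 rows fall into one group keyed by "b".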