datacompy_pandas_dataframe_comparison_with_join_key.py

python

Compares two pandas DataFrames on a shared join key with datacompy and prints a summary report.

Source: capitalone.github.io

import pandas as pd
import datacompy

# Left-hand frame: the row with id 16 has column_a == 2.
df1 = pd.DataFrame(
    {
        "id": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        "column_a": [1, 1, 1, 1, 1, 1, 2, 1, 1, 1],
        "column_b": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    }
)

# Right-hand frame: same ids, but column_a is 1 in every row.
df2 = pd.DataFrame(
    {
        "id": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        "column_a": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        "column_b": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    }
)

compare = datacompy.Compare(
    df1,
    df2,
    join_columns="id",  # can also be a list, like ["id", "key"]
    abs_tol=0,  # optional, absolute tolerance between two values
    rel_tol=0,  # optional, relative tolerance between two values
    df1_name="Original",  # optional, name of the left side
    df2_name="New",  # optional, name of the right side
)

# report() returns a human-readable summary as a string; print it to stdout
print(compare.report())
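
To make the idea of comparing on a join key more concrete, here is a rough pandas-only sketch of the same kind of check. It is not how datacompy is implemented, only an illustration under simplifying assumptions: a single value column, exact equality, and no tolerance handling. The small frames and the variable names (`merged`, `only_in_df1`, `only_in_df2`, `mismatched_ids`) are made up for this example.

```python
import pandas as pd

# Hypothetical toy frames: id 12 exists only on the left, id 13 only on the
# right, and id 11 has a different column_a value on each side.
df1 = pd.DataFrame({"id": [10, 11, 12], "column_a": [1, 2, 1]})
df2 = pd.DataFrame({"id": [10, 11, 13], "column_a": [1, 1, 1]})

# Outer-join on the key. indicator=True adds a `_merge` column recording
# whether each row came from the left frame, the right frame, or both.
merged = df1.merge(
    df2, on="id", how="outer", suffixes=("_df1", "_df2"), indicator=True
)

# Rows whose key appears in only one of the two frames.
only_in_df1 = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
only_in_df2 = merged.loc[merged["_merge"] == "right_only", "id"].tolist()

# Among keys present in both frames, find rows where the values differ.
both = merged[merged["_merge"] == "both"]
mismatched_ids = both.loc[
    both["column_a_df1"] != both["column_a_df2"], "id"
].tolist()

print("only in df1:", only_in_df1)
print("only in df2:", only_in_df2)
print("mismatched:", mismatched_ids)
```

A sketch like this only covers exact matches on one column; the `abs_tol` and `rel_tol` options in the snippet above exist precisely because real numeric data often needs a tolerance.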