spacy_alignments_token_index_mapping_quickstart.py
This quickstart demonstrates how to align two different tokenizations o
import spacy_alignments as tokenizations

tokens_a = ["robust", "tokenizer", "alignments"]
tokens_b = ["ro", "bust", "token", "izer", "align", "ments"]

a2b, b2a = tokenizations.get_alignments(tokens_a, tokens_b)

print(f"Indices from tokens_a to tokens_b: {a2b}")
# [[0, 1], [2, 3], [4, 5]]
print(f"Indices from tokens_b to tokens_a: {b2a}")
# [[0], [0], [1], [1], [2], [2]]

# Usage example:
# tokens_a[0] maps to tokens_b[0] and tokens_b[1]
assert "".join([tokens_b[i] for i in a2b[0]]) == tokens_a[0]
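
A common use for the a2b mapping is carrying per-token labels from one tokenization over to the other. The sketch below is one possible way to do that. It hard-codes the a2b value printed in the quickstart so it runs without the library installed. The helper name project_labels, its default parameter, and the example labels are assumptions made for illustration and are not part of spacy_alignments.

```python
# Alignment copied from the quickstart output (tokens_a -> tokens_b).
tokens_a = ["robust", "tokenizer", "alignments"]
tokens_b = ["ro", "bust", "token", "izer", "align", "ments"]
a2b = [[0, 1], [2, 3], [4, 5]]


def project_labels(labels_a, a2b, n_b, default="O"):
    """Hypothetical helper: copy each tokens_a label onto its aligned tokens_b indices.

    Positions in tokens_b that no tokens_a entry aligns to keep `default`.
    """
    labels_b = [default] * n_b
    for label, b_indices in zip(labels_a, a2b):
        for j in b_indices:
            labels_b[j] = label
    return labels_b


# Illustrative labels, one per entry of tokens_a (made up for this example).
labels_a = ["ADJ", "NOUN", "NOUN"]
labels_b = project_labels(labels_a, a2b, len(tokens_b))
print(list(zip(tokens_b, labels_b)))
```

If two entries of tokens_a align to the same index in tokens_b, the later label overwrites the earlier one in this sketch. A real pipeline may want a different tie-breaking rule.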