Coreference resolution is the task of figuring out which words in a text refer to the same real-world entity. When a paragraph says “Sarah joined the team. She immediately impressed everyone,” you need to know that “She” means “Sarah.” Without this step, downstream tasks like entity extraction, summarization, and knowledge graph construction all suffer from fragmented entity references.

The fastest way to get coreference resolution running in Python is with coreferee, a spaCy pipeline component. Here is the minimal setup:

pip install spacy coreferee
python3 -m spacy download en_core_web_lg
python3 -m coreferee install en
import spacy
import coreferee

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("coreferee")

doc = nlp(
    "When Sarah joined the company, she was nervous. "
    "But her manager welcomed her warmly. "
    "He told her the team was glad to have her."
)

doc._.coref_chains.print()

This prints the detected coreference chains with token indices and the text they refer to. Each chain groups all mentions of the same entity together.

Setting Up the Coreferee Pipeline and Resolving Clusters

Coreferee works as a spaCy pipeline component that combines neural networks with rule-based heuristics. It supports English, French, German, and Polish. The library detects coreference chains and exposes them through spaCy’s extension attributes.

Here is how to inspect and use the chains programmatically:

import spacy
import coreferee

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("coreferee")

text = (
    "Microsoft announced its quarterly earnings yesterday. "
    "The company beat analyst expectations. "
    "Satya Nadella said he was pleased with the results. "
    "He credited the cloud division for its strong performance."
)

doc = nlp(text)

# Print all coreference chains
doc._.coref_chains.print()

# Iterate over each chain and its mentions
for chain in doc._.coref_chains:
    mentions = []
    for mention in chain:
        mention_tokens = [doc[idx].text for idx in mention.token_indexes]
        mentions.append(" ".join(mention_tokens))
    most_specific_idx = chain.most_specific_mention_index
    referent_tokens = [doc[idx].text for idx in chain[most_specific_idx].token_indexes]
    referent = " ".join(referent_tokens)
    print(f"Chain: {mentions} -> resolves to '{referent}'")

# Resolve a specific pronoun
# Find what "he" (token at some index) refers to
for token in doc:
    if token.text.lower() == "he":
        resolved = doc._.coref_chains.resolve(token)
        if resolved:
            print(f"'{token.text}' at index {token.i} refers to: {[t.text for t in resolved]}")

The most_specific_mention_index property gives you the index of the most descriptive mention in the chain, which is typically the full noun phrase rather than a pronoun. The resolve() method traverses chains to find the most specific referent for any given token.

Replacing Pronouns with Their Referents

A common use case is rewriting text with pronouns replaced by the entities they refer to. This makes text self-contained for downstream processing:

def resolve_references(doc):
    """Replace pronouns with their most specific referent."""
    token_replacements = {}

    for chain in doc._.coref_chains:
        most_specific_idx = chain.most_specific_mention_index
        referent_mention = chain[most_specific_idx]
        referent_text = " ".join([doc[idx].text for idx in referent_mention.token_indexes])

        for mention_idx, mention in enumerate(chain):
            if mention_idx == most_specific_idx:
                continue
            # Only replace single-token pronouns for simplicity
            if len(mention.token_indexes) == 1:
                token_idx = mention.token_indexes[0]
                token_replacements[token_idx] = referent_text

    resolved_parts = []
    for token in doc:
        replacement = token_replacements.get(token.i, token.text)
        # Preserve the original spacing so punctuation stays attached
        resolved_parts.append(replacement + token.whitespace_)

    return "".join(resolved_parts)


doc = nlp(
    "Satya Nadella announced the new product. "
    "He said it would change the industry."
)
print(resolve_references(doc))

This outputs text where “He” gets replaced with “Satya Nadella” and “it” gets replaced with “the new product,” making every sentence independently understandable.

Transformer-Based Coreference with Maverick

For higher accuracy, especially on complex documents, you can use Maverick, a state-of-the-art transformer-based coreference model from SapienzaNLP. It was published at ACL 2024 and outperforms models with up to 13 billion parameters while using only around 500 million.

pip install maverick-coref
from maverick import Maverick

# Load the OntoNotes-trained model (best general-purpose accuracy)
model = Maverick(
    hf_name_or_path="sapienzanlp/maverick-mes-ontonotes",
    device="cpu"  # use "cuda:0" if you have a GPU
)

text = (
    "Barack Obama traveled to Rome last week. "
    "The former president visited the Colosseum. "
    "He said the city reminded him of his first trip to Europe."
)

result = model.predict(text)

# Print the clusters with their text mentions
for i, cluster in enumerate(result["clusters_text_mentions"]):
    print(f"Cluster {i}: {cluster}")

# Print token-level offsets for each cluster
for i, cluster in enumerate(result["clusters_token_offsets"]):
    print(f"Cluster {i} offsets: {cluster}")

Maverick returns clusters as lists of mention spans. Each cluster groups all text spans that refer to the same entity. The clusters_text_mentions field gives you the actual strings, and clusters_token_offsets gives you the start/end token positions.
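If you need the mention strings yourself, the token offsets can be mapped back to text. Here is a minimal sketch using a mock result dict shaped like Maverick's output; double-check against your installed version whether end offsets are inclusive, as assumed here:

```python
# Mock result shaped like Maverick's output (assumption: (start, end)
# token offsets are inclusive -- verify against your version).
result = {
    "tokens": ["Barack", "Obama", "traveled", "to", "Rome", ".",
               "He", "visited", "the", "Colosseum", "."],
    "clusters_token_offsets": [
        [(0, 1), (6, 6)],   # Barack Obama / He
        [(4, 4)],           # Rome
    ],
}

def offsets_to_mentions(tokens, clusters):
    """Turn (start, end) token offsets into mention strings per cluster."""
    return [
        [" ".join(tokens[start:end + 1]) for start, end in cluster]
        for cluster in clusters
    ]

print(offsets_to_mentions(result["tokens"], result["clusters_token_offsets"]))
# [['Barack Obama', 'He'], ['Rome']]
```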

Three pretrained models are available depending on your domain:

Model                    Dataset     F1     Singletons
maverick-mes-ontonotes   OntoNotes   83.6   No
maverick-mes-litbank     LitBank     78.0   Yes
maverick-mes-preco       PreCo       87.4   Yes

Use ontonotes for general text, litbank for literary or narrative content, and preco for broad coverage with singleton mentions.
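To make that choice explicit in code, here is a small hypothetical helper (pick_model and the domain labels are our own naming convention; the Hugging Face model IDs come from the table above):

```python
# Hypothetical helper for choosing a Maverick checkpoint by domain.
# The domain keys are illustrative; the HF IDs match the table above.
MAVERICK_MODELS = {
    "general": "sapienzanlp/maverick-mes-ontonotes",
    "literary": "sapienzanlp/maverick-mes-litbank",
    "broad": "sapienzanlp/maverick-mes-preco",
}

def pick_model(domain: str, need_singletons: bool = False) -> str:
    """Return an HF model ID; fall back to preco when singletons are required."""
    if need_singletons and domain == "general":
        # The ontonotes model does not predict singletons by default
        return MAVERICK_MODELS["broad"]
    return MAVERICK_MODELS.get(domain, MAVERICK_MODELS["general"])

print(pick_model("general"))                        # sapienzanlp/maverick-mes-ontonotes
print(pick_model("general", need_singletons=True))  # sapienzanlp/maverick-mes-preco
```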

Improving Downstream Tasks with Coref Resolution

Coreference resolution acts as a preprocessing step that makes every other NLP task work better. Here are two concrete patterns.

Better Entity Extraction

Named entity recognition often misses the connection between "Google," "the company," "it," and "they." By resolving coreferences first, you get a complete picture of every entity mention:

import spacy
import coreferee

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("coreferee")

text = (
    "Google announced a new AI chip yesterday. "
    "The company said it would be available in Q3. "
    "Sundar Pichai presented it at the keynote."
)

doc = nlp(text)

# Build entity-to-mentions mapping using coref chains
entity_mentions = {}
for chain in doc._.coref_chains:
    most_specific_idx = chain.most_specific_mention_index
    referent_tokens = chain[most_specific_idx].token_indexes
    referent = " ".join([doc[idx].text for idx in referent_tokens])

    all_mentions = []
    for mention in chain:
        mention_text = " ".join([doc[idx].text for idx in mention.token_indexes])
        all_mentions.append(mention_text)

    entity_mentions[referent] = all_mentions

for entity, mentions in entity_mentions.items():
    print(f"{entity}: {mentions}")

Cleaner Summarization Input

Feed resolved text to a summarizer so it does not have to guess what pronouns mean:

import spacy
import coreferee
from transformers import pipeline

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("coreferee")

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "Dr. Elena Vasquez published her findings on Monday. "
    "She demonstrated that her new method reduces errors by 40%. "
    "The researcher presented her work at NeurIPS. "
    "Her colleagues praised the results."
)

doc = nlp(text)
resolved_text = resolve_references(doc)  # using the function from earlier

print("Original:", text)
print("Resolved:", resolved_text)

# The summarizer now gets unambiguous input
summary = summarizer(resolved_text, max_length=50, min_length=20, do_sample=False)
print("Summary:", summary[0]["summary_text"])

The resolved text replaces “She,” “her,” and “The researcher” with “Dr. Elena Vasquez,” giving the summarizer explicit entity references instead of ambiguous pronouns.

Handling Edge Cases

Cataphora (Forward References)

Cataphora is when a pronoun appears before its referent: “Before he left, John locked the door.” Coreferee handles this because it analyzes the full document before resolving chains, not just left-to-right context. No special configuration is needed, but be aware that cataphoric references have lower accuracy than standard anaphoric ones.

Nested References

Nested references occur when one coreference chain contains mentions that overlap with another chain. For example: “The CEO of Google said his company would invest more in AI. He confirmed it at the conference.” Here “his company” contains a possessive pronoun that itself refers to the CEO, while “his company” as a whole refers to Google.

Coreferee resolves these by tracking chains independently. Each chain handles one entity. You can traverse multiple chains to fully resolve nested references:

doc = nlp(
    "The CEO of Anthropic announced their new model. "
    "He said it would be released next month. "
    "The company confirmed the timeline."
)

for chain in doc._.coref_chains:
    most_specific_idx = chain.most_specific_mention_index
    referent = " ".join([doc[idx].text for idx in chain[most_specific_idx].token_indexes])
    for mention in chain:
        span = " ".join([doc[idx].text for idx in mention.token_indexes])
        if span != referent:
            print(f"'{span}' -> '{referent}'")

Long Documents

Both coreferee and Maverick have practical limits on document length. Coreferee processes the full spaCy Doc object, so it scales with your available memory. Maverick’s transformer backbone has a context window constraint.

For long documents, chunk the text at paragraph or section boundaries, process each chunk independently, and then merge the results. Make sure your chunks overlap by a sentence or two so entities that cross boundaries still get resolved:

def process_long_document(text, nlp, max_chars=5000, overlap_chars=500):
    """Process long documents by chunking with overlap."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        # Try to break at a sentence boundary
        if end < len(text):
            last_period = text.rfind(".", start, end)
            if last_period > start + overlap_chars:
                end = last_period + 1
        chunks.append(text[start:end])
        start = end - overlap_chars if end < len(text) else end

    all_chains = []
    for chunk in chunks:
        doc = nlp(chunk)
        for chain in doc._.coref_chains:
            chain_mentions = []
            for mention in chain:
                tokens = " ".join([doc[idx].text for idx in mention.token_indexes])
                chain_mentions.append(tokens)
            all_chains.append(chain_mentions)

    return all_chains
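process_long_document returns per-chunk chains without merging them. One simple way to merge is a string-overlap heuristic: treat chains from different chunks that share a surface form as the same entity. A minimal sketch (merge_chains is our own helper; exact-string matching can conflate distinct entities with identical names):

```python
def merge_chains(all_chains):
    """Merge chains from different chunks that share a mention string.

    String-overlap heuristic only: chains sharing any surface form are
    assumed to describe the same entity, which can wrongly merge
    distinct entities that happen to share a name.
    """
    merged = []  # list of sets of mention strings
    for chain in all_chains:
        chain_set = set(chain)
        # Absorb every existing merged chain that overlaps this one
        overlapping = [m for m in merged if m & chain_set]
        for m in overlapping:
            chain_set |= m
            merged.remove(m)
        merged.append(chain_set)
    return [sorted(m) for m in merged]

chunk_chains = [
    ["Sarah", "she", "her"],   # from chunk 1
    ["Sarah", "She"],          # from chunk 2 (overlap region)
    ["the manager", "He"],     # from chunk 2
]
print(merge_chains(chunk_chains))
# [['Sarah', 'She', 'her', 'she'], ['He', 'the manager']]
```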

Common Errors and Fixes

ModuleNotFoundError: No module named 'coreferee' after installing. This usually means pip installed the package into a different Python environment than the one running your script. Install with python3 -m pip install coreferee so pip and your interpreter match, then run python3 -m coreferee install en to download the language-specific model.

ValueError: [E002] Can't find factory for 'coreferee'. This happens when you forget to import coreferee before calling add_pipe (the import is what registers the factory with spaCy), or when the spaCy and coreferee versions are incompatible. Coreferee is tested with spaCy 3.0 through 3.5. Check your spaCy version with python3 -m spacy info and pin a compatible coreferee version.

Poor resolution accuracy with en_core_web_sm. Coreferee relies heavily on the quality of spaCy’s POS tagger, parser, and NER. The small model (sm) lacks the accuracy coreferee needs. Always use en_core_web_lg or en_core_web_trf for production-quality results. The trf model gives the best accuracy but is slower.

RuntimeError: CUDA out of memory when running Maverick. The DeBERTa-large backbone needs around 4 GB of VRAM. If you hit memory limits, set device="cpu" or use a smaller batch size. CPU inference is slower but works on any machine.

Coreferee returns empty chains. This usually means the text is too short or lacks any anaphoric references. Coreferee intentionally avoids false positives, so if it cannot confidently resolve a reference, it skips it. Also check that your text actually contains pronouns or noun phrases that need resolution.
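Before digging deeper into empty chains, a quick sanity check can confirm the input even contains pronoun-like tokens. A naive, spaCy-free sketch (the pronoun list is illustrative, not exhaustive):

```python
# Quick sanity check: does the text contain any pronouns at all?
# Surface-form check only; a real pipeline should use POS tags instead.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "its", "their", "theirs"}

def has_pronouns(text: str) -> bool:
    """Return True if any whitespace-delimited word is a known pronoun."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return bool(words & PRONOUNS)

print(has_pronouns("Sarah joined the team."))        # False
print(has_pronouns("Sarah joined. She impressed."))  # True
```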

Maverick predict() returns no clusters. Make sure you are using the right model for your data. The ontonotes model does not predict singletons by default. If you need single-mention entities, either pass singletons=True or use the preco or litbank model variants.