The Quick Version

Label Studio is an open-source data labeling platform that handles text, images, audio, and video annotation. You set up a project, define a labeling interface, import data, and annotators label it through a web UI. The real power comes from ML-assisted pre-labeling — a model generates initial labels and humans correct them, cutting labeling time by 50-80%.

pip install label-studio
label-studio start --port 8080

That launches the Label Studio UI at http://localhost:8080. Create an account, copy an API key from your account settings (Account & Settings → Access Token), then set up your first project programmatically:

from label_studio_sdk import Client

ls = Client(url="http://localhost:8080", api_key="your-api-key")

# Create a project for sentiment classification
project = ls.start_project(
    title="Product Review Sentiment",
    label_config="""
    <View>
      <Text name="text" value="$review_text"/>
      <Choices name="sentiment" toName="text" choice="single">
        <Choice value="Positive"/>
        <Choice value="Negative"/>
        <Choice value="Neutral"/>
      </Choices>
    </View>
    """,
)

# Import data
tasks = [
    {"data": {"review_text": "This product exceeded my expectations!"}},
    {"data": {"review_text": "Terrible quality, broke after one day."}},
    {"data": {"review_text": "It works as described. Nothing special."}},
]
project.import_tasks(tasks)

print(f"Project created: {project.id} with {len(tasks)} tasks")

Labeling Interfaces for Different Data Types

Label Studio uses XML-based config to define what annotators see and how they label:

# Image classification
image_config = """
<View>
  <Image name="image" value="$image_url"/>
  <Choices name="category" toName="image" choice="single">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
    <Choice value="Other"/>
  </Choices>
</View>
"""

# Object detection (bounding boxes)
detection_config = """
<View>
  <Image name="image" value="$image_url"/>
  <RectangleLabels name="label" toName="image">
    <Label value="Car" background="red"/>
    <Label value="Person" background="blue"/>
    <Label value="Bicycle" background="green"/>
  </RectangleLabels>
</View>
"""

# Named entity recognition
ner_config = """
<View>
  <Labels name="label" toName="text">
    <Label value="Person" background="#FF0000"/>
    <Label value="Organization" background="#00FF00"/>
    <Label value="Location" background="#0000FF"/>
    <Label value="Date" background="#FFA500"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""

# Text summarization (input + output)
summary_config = """
<View>
  <Text name="document" value="$text"/>
  <TextArea name="summary" toName="document"
            placeholder="Write a summary..."
            maxSubmissions="1" rows="4"/>
</View>
"""
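The same XML approach covers audio. A sketch for an audio transcription interface, assuming your task data carries an audio_url field (verify tag names against the tag reference for your Label Studio version):

```python
# Audio transcription — $audio_url is an assumed field name in your task data
audio_config = """
<View>
  <Audio name="audio" value="$audio_url"/>
  <TextArea name="transcription" toName="audio"
            placeholder="Transcribe the audio..." rows="4"/>
</View>
"""
print("config length:", len(audio_config))
```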

ML-Assisted Pre-Labeling

The biggest time saver: use a model to generate initial labels, then have humans verify and correct them. Build an ML backend that Label Studio calls automatically.

# ml_backend.py
from label_studio_ml.model import LabelStudioMLBase
from transformers import pipeline

class SentimentPreLabeler(LabelStudioMLBase):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )

    def predict(self, tasks, **kwargs):
        """Generate pre-annotations for new tasks."""
        predictions = []
        for task in tasks:
            text = task["data"]["review_text"]
            result = self.model(text)[0]

            # Map model output to Label Studio format. Note: this SST-2
            # model is binary, so "Neutral" is never suggested here.
            label = "Positive" if result["label"] == "POSITIVE" else "Negative"

            predictions.append({
                "result": [{
                    "from_name": "sentiment",
                    "to_name": "text",
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
                "score": result["score"],
            })

        return predictions

    def fit(self, event, data, **kwargs):
        """Retrain when new annotations are submitted (optional)."""
        # Collect newly annotated data and fine-tune
        pass
# Run the ML backend
pip install label-studio-ml
label-studio-ml init my_backend --script ml_backend.py
label-studio-ml start my_backend --port 9090

Connect it to your project in the Label Studio UI: Settings → Machine Learning → Add Model → http://localhost:9090.

Now every new task gets pre-labeled automatically. Annotators see the model’s suggestion and either accept it (one click) or correct it.
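If running a live ML backend is more than you need, predictions can also be attached at import time: a task imported with a predictions key shows up pre-labeled in the UI. A sketch using the sentiment project's field names (treat the exact schema as something to verify against the pre-annotation docs for your version):

```python
# Pre-annotated task — from_name/to_name must match the labeling config,
# exactly as with the ML backend above.
prelabeled_task = {
    "data": {"review_text": "Great value for the price."},
    "predictions": [{
        "result": [{
            "from_name": "sentiment",  # Choices name in the config
            "to_name": "text",         # Text name in the config
            "type": "choices",
            "value": {"choices": ["Positive"]},
        }],
        "score": 0.97,
    }],
}
# project.import_tasks([prelabeled_task])  # assumes the project from earlier
print(prelabeled_task["predictions"][0]["result"][0]["value"]["choices"][0])  # → Positive
```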

Bulk Import and Export

For large datasets, use the SDK to import data in bulk and export annotations:

from label_studio_sdk import Client
import json

ls = Client(url="http://localhost:8080", api_key="your-api-key")
project = ls.get_project(project_id=1)

# Bulk import from a JSON file
with open("reviews_10k.json") as f:
    tasks = [{"data": {"review_text": item["text"]}} for item in json.load(f)]

# Import in batches to avoid timeouts
batch_size = 500
for i in range(0, len(tasks), batch_size):
    batch = tasks[i:i + batch_size]
    project.import_tasks(batch)
    print(f"Imported {min(i + batch_size, len(tasks))}/{len(tasks)}")

# Export completed annotations
annotations = project.export_tasks(export_type="JSON")
print(f"Exported {len(annotations)} annotated tasks")

# Convert to training format
training_data = []
for task in annotations:
    if task.get("annotations"):
        annotation = task["annotations"][0]  # first annotator's label
        for result in annotation["result"]:
            if result["type"] == "choices":
                training_data.append({
                    "text": task["data"]["review_text"],
                    "label": result["value"]["choices"][0],
                })

print(f"Training examples: {len(training_data)}")
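From here, a common next step is persisting the examples as JSONL, one record per line, which most fine-tuning tools accept. A minimal sketch with stand-in data in place of the list built above:

```python
import json

# Stand-in for the training_data list built from exported annotations
training_data = [
    {"text": "This product exceeded my expectations!", "label": "Positive"},
    {"text": "Terrible quality, broke after one day.", "label": "Negative"},
]

# One JSON object per line
with open("train.jsonl", "w") as f:
    for example in training_data:
        f.write(json.dumps(example) + "\n")

with open("train.jsonl") as f:
    lines = f.readlines()
print(f"Wrote {len(lines)} training examples")
```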

Quality Control with Review Workflows

Set up a review process where a senior annotator checks a sample of labels:

import random

def setup_review_queue(project, sample_rate: float = 0.2):
    """Flag a percentage of completed tasks for review."""
    tasks = project.get_tasks()
    completed = [t for t in tasks if t.get("annotations")]

    review_sample = random.sample(
        completed,
        k=int(len(completed) * sample_rate),
    )

    for task in review_sample:
        # Add a review flag (using task metadata)
        project.update_task(task["id"], meta={"needs_review": True})

    print(f"Flagged {len(review_sample)}/{len(completed)} tasks for review")

# Agreement metrics
def compute_agreement(project) -> dict:
    """Compute inter-annotator agreement for multi-annotator projects."""
    tasks = project.get_tasks()
    agreements = []

    for task in tasks:
        annotations = task.get("annotations", [])
        if len(annotations) < 2:
            continue

        labels = []
        for ann in annotations:
            for result in ann["result"]:
                if result["type"] == "choices":
                    labels.append(result["value"]["choices"][0])

        if len(labels) >= 2:
            agreements.append(labels[0] == labels[1])

    if not agreements:
        return {"agreement_rate": 0, "n_tasks": 0}

    return {
        "agreement_rate": sum(agreements) / len(agreements),
        "n_tasks": len(agreements),
    }

stats = compute_agreement(project)
print(f"Agreement: {stats['agreement_rate']:.2%} across {stats['n_tasks']} tasks")

Aim for 90%+ agreement on simple classification tasks. Below 80% usually means your labeling guidelines are ambiguous — clarify them before continuing.
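Note that raw agreement is inflated by chance: with three balanced classes, two annotators guessing randomly already agree about a third of the time. Cohen's kappa corrects for this; a self-contained sketch for the two-annotator case (the sample pairs are illustrative):

```python
from collections import Counter

def cohens_kappa(pairs):
    """Chance-corrected agreement for two annotators.
    pairs: list of (label_a, label_b) tuples, one per task."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    # Marginal label frequencies per annotator give the chance-agreement term
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(freq_a) | set(freq_b)
    )
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

pairs = [("Positive", "Positive"), ("Negative", "Positive"),
         ("Neutral", "Neutral"), ("Positive", "Positive")]
print(f"Kappa: {cohens_kappa(pairs):.2f}")  # → Kappa: 0.56
```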

Automating the Full Pipeline

Connect Label Studio to your training pipeline so new annotations automatically trigger model retraining:

import time
from label_studio_sdk import Client

def annotation_pipeline(project_id: int, check_interval: int = 300):
    """Poll for new annotations and trigger retraining."""
    ls = Client(url="http://localhost:8080", api_key="your-api-key")
    project = ls.get_project(project_id)
    last_count = 0
    last_retrain = 0

    while True:
        annotations = project.export_tasks(export_type="JSON")
        completed = [t for t in annotations if t.get("annotations")]
        current_count = len(completed)

        if current_count > last_count:
            new_count = current_count - last_count
            print(f"{new_count} new annotations. Total: {current_count}")

            # Retrain once at least 50 new annotations have accumulated since
            # the last retrain; a modulo check can miss counts that jump past
            # a multiple of 50 between polls.
            if current_count >= 100 and current_count - last_retrain >= 50:
                print("Triggering model retrain...")
                training_data = export_training_data(completed)  # your conversion function
                # retrain_model(training_data)  # your training function
                print("Retrain complete. Updating ML backend...")
                last_retrain = current_count

            last_count = current_count

        time.sleep(check_interval)
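Polling works, but for lower latency Label Studio can also push events to your service via webhooks (project Settings → Webhooks). A minimal stdlib receiver sketch — the "action" field and the ANNOTATION_CREATED event name follow the webhook docs, but treat the payload shape as an assumption and inspect a real event first:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_event(payload: dict) -> bool:
    """Return True when an event should trigger a retrain check."""
    return payload.get("action") == "ANNOTATION_CREATED"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body Label Studio POSTs for each event
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if handle_event(payload):
            print("New annotation — queue a retrain check")
        self.send_response(200)
        self.end_headers()

# HTTPServer(("", 9091), WebhookHandler).serve_forever()  # uncomment to run
```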

Common Errors and Fixes

Label Studio UI shows “No tasks” after import

Check the data format. Tasks must be a list of dicts with a "data" key. The values inside data must match the value attributes in your labeling config (e.g., $review_text maps to data.review_text).

ML backend returns predictions but they don’t show up

Verify the prediction format matches your labeling config. The from_name and to_name fields must exactly match the name attributes in your XML config. Enable “Show predictions” in the project settings.

Slow loading with 10K+ tasks

Label Studio loads task lists in pages. For large projects, use filters and pagination in the UI. For API access, fetch pages with project.get_paginated_tasks(page=1, page_size=100) instead of loading every task at once with get_tasks().
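A small generator makes the paging transparent to the caller. Sketch only — the return shape (a dict with a "tasks" list) is an assumption to verify against your SDK version:

```python
def iter_tasks(project, page_size=100):
    """Yield tasks one page at a time instead of loading everything."""
    page = 1
    while True:
        response = project.get_paginated_tasks(page=page, page_size=page_size)
        tasks = response.get("tasks", [])
        if not tasks:
            break  # past the last page
        yield from tasks
        page += 1
```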

Annotations don’t export in the right format

Use the export_type parameter: "JSON" for full Label Studio format, "CSV" for flat files, "COCO" for object detection, "CONLL2003" for NER. Each format restructures the data for its target framework.

Multiple annotators disagree on labels

This isn’t a bug — it’s data quality signal. Compute agreement metrics, identify confusing examples, and improve your annotation guidelines. For production, use majority voting (3 annotators, take the most common label) for ambiguous cases.
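A minimal majority-vote sketch over the annotation structure used throughout this post — it returns None when no label wins an outright majority, which you can treat as a signal to escalate the task for review:

```python
from collections import Counter

def majority_label(task):
    """Return the majority-vote label, or None if no outright majority."""
    labels = [
        result["value"]["choices"][0]
        for ann in task.get("annotations", [])
        for result in ann["result"]
        if result["type"] == "choices"
    ]
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None  # None = escalate

task = {"annotations": [
    {"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]},
    {"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]},
    {"result": [{"type": "choices", "value": {"choices": ["Neutral"]}}]},
]}
print(majority_label(task))  # → Positive
```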