The Quick Version
Label Studio is an open-source data labeling platform that handles text, image, audio, and video annotation. You set up a project, define a labeling interface, import data, and annotators label it through a web UI. The real power comes from ML-assisted pre-labeling — a model generates initial labels and humans correct them, often cutting labeling time by 50-80%.
```shell
pip install label-studio
label-studio start --port 8080
```
That launches the Label Studio UI at http://localhost:8080. Create an account, then set up your first project programmatically:
```python
from label_studio_sdk import Client

ls = Client(url="http://localhost:8080", api_key="your-api-key")

# Create a project for sentiment classification
project = ls.start_project(
    title="Product Review Sentiment",
    label_config="""
    <View>
      <Text name="text" value="$review_text"/>
      <Choices name="sentiment" toName="text" choice="single">
        <Choice value="Positive"/>
        <Choice value="Negative"/>
        <Choice value="Neutral"/>
      </Choices>
    </View>
    """,
)

# Import data
tasks = [
    {"data": {"review_text": "This product exceeded my expectations!"}},
    {"data": {"review_text": "Terrible quality, broke after one day."}},
    {"data": {"review_text": "It works as described. Nothing special."}},
]
project.import_tasks(tasks)

print(f"Project created: {project.id} with {len(tasks)} tasks")
```
Labeling Interfaces for Different Data Types
Label Studio uses XML-based config to define what annotators see and how they label:
```python
# Image classification
image_config = """
<View>
  <Image name="image" value="$image_url"/>
  <Choices name="category" toName="image" choice="single">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
    <Choice value="Other"/>
  </Choices>
</View>
"""

# Object detection (bounding boxes)
detection_config = """
<View>
  <Image name="image" value="$image_url"/>
  <RectangleLabels name="label" toName="image">
    <Label value="Car" background="red"/>
    <Label value="Person" background="blue"/>
    <Label value="Bicycle" background="green"/>
  </RectangleLabels>
</View>
"""

# Named entity recognition
ner_config = """
<View>
  <Labels name="label" toName="text">
    <Label value="Person" background="#FF0000"/>
    <Label value="Organization" background="#00FF00"/>
    <Label value="Location" background="#0000FF"/>
    <Label value="Date" background="#FFA500"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""

# Text summarization (input + output)
summary_config = """
<View>
  <Text name="document" value="$text"/>
  <TextArea name="summary" toName="document"
            placeholder="Write a summary..."
            maxSubmissions="1" rows="4"/>
</View>
"""
```
ML-Assisted Pre-Labeling
The biggest time saver: use a model to generate initial labels, then have humans verify and correct them. Build an ML backend that Label Studio calls automatically.
```python
# ml_backend.py
from label_studio_ml.model import LabelStudioMLBase
from transformers import pipeline


class SentimentPreLabeler(LabelStudioMLBase):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )

    def predict(self, tasks, **kwargs):
        """Generate pre-annotations for new tasks."""
        predictions = []
        for task in tasks:
            text = task["data"]["review_text"]
            result = self.model(text)[0]
            # Map model output to Label Studio format
            # (the SST-2 model is binary, so it never suggests "Neutral")
            label = "Positive" if result["label"] == "POSITIVE" else "Negative"
            predictions.append({
                "result": [{
                    "from_name": "sentiment",
                    "to_name": "text",
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
                "score": result["score"],
            })
        return predictions

    def fit(self, event, data, **kwargs):
        """Retrain when new annotations are submitted (optional)."""
        # Collect newly annotated data and fine-tune
        pass
```
```shell
# Run the ML backend
pip install label-studio-ml
label-studio-ml init sentiment_backend --script ml_backend.py
label-studio-ml start sentiment_backend --port 9090
```
Connect it to your project in the Label Studio UI: Settings → Machine Learning → Add Model → http://localhost:9090.
Now every new task gets pre-labeled automatically. Annotators see the model’s suggestion and either accept it (one click) or correct it.
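If you already have model outputs computed offline, you can skip the live backend entirely and attach predictions at import time. A sketch of the pre-annotation task format (the model version tag and score here are illustrative, not required values):

```python
# Tasks can carry a "predictions" list in the same result format the
# ML backend returns; Label Studio shows these as suggestions.
tasks_with_predictions = [
    {
        "data": {"review_text": "Great value for the price."},
        "predictions": [{
            "model_version": "distilbert-sst2-v1",  # illustrative tag
            "score": 0.97,
            "result": [{
                "from_name": "sentiment",
                "to_name": "text",
                "type": "choices",
                "value": {"choices": ["Positive"]},
            }],
        }],
    },
]
# project.import_tasks(tasks_with_predictions)  # same call as plain tasks
print(tasks_with_predictions[0]["predictions"][0]["result"][0]["value"])
```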
Bulk Import and Export
For large datasets, use the SDK to import data in bulk and export annotations:
```python
from label_studio_sdk import Client
import json

ls = Client(url="http://localhost:8080", api_key="your-api-key")
project = ls.get_project(1)

# Bulk import from a JSON file
with open("reviews_10k.json") as f:
    tasks = [{"data": {"review_text": item["text"]}} for item in json.load(f)]

# Import in batches to avoid timeouts
batch_size = 500
for i in range(0, len(tasks), batch_size):
    batch = tasks[i:i + batch_size]
    project.import_tasks(batch)
    print(f"Imported {min(i + batch_size, len(tasks))}/{len(tasks)}")

# Export completed annotations
annotations = project.export_tasks(export_type="JSON")
print(f"Exported {len(annotations)} annotated tasks")

# Convert to training format
training_data = []
for task in annotations:
    if task.get("annotations"):
        annotation = task["annotations"][0]  # first annotator's label
        for result in annotation["result"]:
            if result["type"] == "choices":
                training_data.append({
                    "text": task["data"]["review_text"],
                    "label": result["value"]["choices"][0],
                })
print(f"Training examples: {len(training_data)}")
```
Quality Control with Review Workflows
Set up a review process where a senior annotator checks a sample of labels:
```python
import random


def setup_review_queue(project, sample_rate: float = 0.2):
    """Flag a percentage of completed tasks for review."""
    tasks = project.get_tasks()
    completed = [t for t in tasks if t.get("annotations")]
    review_sample = random.sample(
        completed,
        k=int(len(completed) * sample_rate),
    )
    for task in review_sample:
        # Add a review flag (using task metadata)
        project.update_task(task["id"], meta={"needs_review": True})
    print(f"Flagged {len(review_sample)}/{len(completed)} tasks for review")


# Agreement metrics
def compute_agreement(project) -> dict:
    """Compute inter-annotator agreement for multi-annotator projects."""
    tasks = project.get_tasks()
    agreements = []
    for task in tasks:
        annotations = task.get("annotations", [])
        if len(annotations) < 2:
            continue
        labels = []
        for ann in annotations:
            for result in ann["result"]:
                if result["type"] == "choices":
                    labels.append(result["value"]["choices"][0])
        if len(labels) >= 2:
            agreements.append(labels[0] == labels[1])
    if not agreements:
        return {"agreement_rate": 0, "n_tasks": 0}
    return {
        "agreement_rate": sum(agreements) / len(agreements),
        "n_tasks": len(agreements),
    }


stats = compute_agreement(project)
print(f"Agreement: {stats['agreement_rate']:.2%} across {stats['n_tasks']} tasks")
```
Aim for 90%+ agreement on simple classification tasks. Below 80% usually means your labeling guidelines are ambiguous — clarify them before continuing.
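Raw percent agreement can look healthy simply because one label dominates. A chance-corrected metric such as Cohen's kappa is more honest; a minimal sketch for the two-annotator case, taking a list of (annotator_a, annotator_b) label pairs:

```python
from collections import Counter

def cohens_kappa(pairs):
    """Chance-corrected agreement for two annotators over label pairs."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    # Marginal label frequencies for each annotator
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    # Expected agreement if both annotators labeled at random
    # according to their own marginal distributions
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(freq_a) | set(freq_b)
    )
    return (observed - expected) / (1 - expected)

pairs = [("Positive", "Positive"), ("Negative", "Negative"),
         ("Neutral", "Positive"), ("Positive", "Positive")]
print(f"Kappa: {cohens_kappa(pairs):.2f}")  # Kappa: 0.56
```

Here 75% raw agreement shrinks to a kappa of 0.56 once the skew toward "Positive" is accounted for.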
Automating the Full Pipeline
Connect Label Studio to your training pipeline so new annotations automatically trigger model retraining:
```python
import time
from label_studio_sdk import Client


def annotation_pipeline(project_id: int, check_interval: int = 300):
    """Poll for new annotations and trigger retraining."""
    ls = Client(url="http://localhost:8080", api_key="your-api-key")
    project = ls.get_project(project_id)
    last_count = 0
    while True:
        annotations = project.export_tasks(export_type="JSON")
        completed = [t for t in annotations if t.get("annotations")]
        current_count = len(completed)
        if current_count > last_count:
            new_count = current_count - last_count
            print(f"{new_count} new annotations. Total: {current_count}")
            if current_count >= 100 and current_count % 50 == 0:
                print("Triggering model retrain...")
                training_data = export_training_data(completed)  # your conversion helper
                # retrain_model(training_data)  # your training function
                print("Retrain complete. Updating ML backend...")
            last_count = current_count
        time.sleep(check_interval)
```
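The `export_training_data` helper is left to you; one possible sketch that reuses the conversion loop from the export section above:

```python
def export_training_data(completed_tasks):
    """Flatten annotated tasks into (text, label) training pairs."""
    training_data = []
    for task in completed_tasks:
        annotation = task["annotations"][0]  # first annotator's label
        for result in annotation["result"]:
            if result["type"] == "choices":
                training_data.append({
                    "text": task["data"]["review_text"],
                    "label": result["value"]["choices"][0],
                })
    return training_data

# Minimal exported-task shape, as returned by export_tasks("JSON")
sample = [{"data": {"review_text": "Love it"},
           "annotations": [{"result": [{"type": "choices",
                                        "value": {"choices": ["Positive"]}}]}]}]
print(export_training_data(sample))  # [{'text': 'Love it', 'label': 'Positive'}]
```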
Common Errors and Fixes
Label Studio UI shows “No tasks” after import
Check the data format. Tasks must be a list of dicts with a "data" key. The values inside data must match the value attributes in your labeling config (e.g., $review_text maps to data.review_text).
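A quick pre-import sanity check can catch both mistakes; `valid_task` is a throwaway helper invented here, not SDK API:

```python
# Correct: list of dicts, each with a "data" key whose fields match
# the $variables in the labeling config ($review_text -> review_text)
good = [{"data": {"review_text": "Works great."}}]

# Common mistakes that produce "No tasks":
bad_flat = [{"review_text": "Works great."}]    # missing "data" wrapper
bad_key = [{"data": {"text": "Works great."}}]  # key doesn't match $review_text

def valid_task(task, required_keys=("review_text",)):
    data = task.get("data")
    return isinstance(data, dict) and all(k in data for k in required_keys)

print([valid_task(t) for t in good + bad_flat + bad_key])  # [True, False, False]
```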
ML backend returns predictions but they don’t show up
Verify the prediction format matches your labeling config. The from_name and to_name fields must exactly match the name attributes in your XML config. Enable “Show predictions” in the project settings.
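One way to catch mismatches before they fail silently is to parse the config and compare names. A standalone sketch (not part of the SDK), using the sentiment config from the quick-start example:

```python
import xml.etree.ElementTree as ET

label_config = """
<View>
  <Text name="text" value="$review_text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
  </Choices>
</View>
"""

def control_names(config_xml):
    """Collect every name= attribute defined in the labeling config."""
    root = ET.fromstring(config_xml)
    return {el.get("name") for el in root.iter() if el.get("name")}

prediction_result = {"from_name": "sentiment", "to_name": "text",
                     "type": "choices", "value": {"choices": ["Positive"]}}

names = control_names(label_config)
ok = prediction_result["from_name"] in names and prediction_result["to_name"] in names
print(ok)  # True
```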
Slow loading with 10K+ tasks
Label Studio loads task lists in pages. For large projects, use filters and pagination in the UI. For API access, use project.get_paged_tasks(page=1, page_size=100) instead of loading all tasks at once.
Annotations don’t export in the right format
Use the export_type parameter: "JSON" for full Label Studio format, "CSV" for flat files, "COCO" for object detection, "CONLL2003" for NER. Each format restructures the data for its target framework.
Multiple annotators disagree on labels
This isn’t a bug — it’s a data quality signal. Compute agreement metrics, identify confusing examples, and improve your annotation guidelines. For production, use majority voting (e.g., three annotators, take the most common label) for ambiguous cases.
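For the majority-vote step, a minimal sketch over the exported task format:

```python
from collections import Counter

def majority_label(task):
    """Most common choice across a task's annotations (choices-type only)."""
    labels = [
        result["value"]["choices"][0]
        for ann in task.get("annotations", [])
        for result in ann["result"]
        if result["type"] == "choices"
    ]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]

# Three annotators, one disagreement
task = {"annotations": [
    {"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]},
    {"result": [{"type": "choices", "value": {"choices": ["Neutral"]}}]},
    {"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]},
]}
print(majority_label(task))  # Positive
```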