Install the SDK and Get an API Key

pip install -U google-genai

Get a free API key from Google AI Studio. Set it as an environment variable:

export GEMINI_API_KEY="your-api-key-here"

The SDK reads GEMINI_API_KEY automatically. You can also pass it explicitly when creating the client. Python 3.9+ is required.

The package name is google-genai, not google-generativeai. The old google-generativeai package hit end-of-life in November 2025. If you’re following an older tutorial that starts with import google.generativeai as genai, stop and switch to the new SDK.

Your First API Call

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how Python generators work in three sentences.",
)

print(response.text)

That’s the entire pattern. The client.models.generate_content() method is the workhorse for everything – text, images, PDFs, function calling. The model parameter takes a model name as a string. As of early 2026, your best options are:

  • gemini-2.5-flash – Fast, cheap, great for most tasks. This is what you should default to.
  • gemini-2.5-pro – More capable for complex reasoning, code analysis, and multi-document tasks. Costs more.
  • gemini-2.5-flash-lite – Cheapest option, best for high-volume classification or summarization.

Google also has gemini-3-flash-preview and gemini-3-pro-preview in preview. They’re more powerful, but their APIs may change before general availability.

Configuring Generation Parameters

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a haiku about distributed systems.",
    config=types.GenerateContentConfig(
        temperature=0.7,
        top_p=0.95,
        top_k=40,
        max_output_tokens=256,
    ),
)

print(response.text)

Set temperature=0 for deterministic output (useful for extraction tasks). The types module holds all the configuration classes you’ll need throughout the SDK.

Streaming Responses

For chatbots and real-time UIs, stream tokens as they arrive instead of waiting for the full response:

from google import genai

client = genai.Client()

for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Explain the CAP theorem and why it matters for microservices.",
):
    print(chunk.text, end="", flush=True)

print()

The generate_content_stream() method returns an iterator of chunks. Each chunk has a .text attribute with the latest tokens. Streaming doesn’t change the total generation time, but the user sees output immediately.

Multi-Turn Conversations

The SDK has a built-in chat session that tracks conversation history for you:

from google import genai

client = genai.Client()

chat = client.chats.create(model="gemini-2.5-flash")

response = chat.send_message("What are the main differences between TCP and UDP?")
print(response.text)

response = chat.send_message("Which one should I use for a real-time game?")
print(response.text)

response = chat.send_message("Show me a minimal UDP socket server in Python.")
print(response.text)

The chat object accumulates messages automatically. You don’t need to manually track and pass the conversation history on each call. This also works with streaming via chat.send_message_stream().

You can seed a chat with history if you’re restoring a previous session:

from google import genai
from google.genai import types

client = genai.Client()

history = [
    types.Content(role="user", parts=[types.Part.from_text(text="What is Redis?")]),
    types.Content(role="model", parts=[types.Part.from_text(text="Redis is an in-memory data store used as a cache, message broker, and database.")]),
]

chat = client.chats.create(model="gemini-2.5-flash", history=history)
response = chat.send_message("How do I persist data in Redis?")
print(response.text)

Multimodal Input: Images and PDFs

Gemini is natively multimodal. You can send images, PDFs, audio, and video alongside text in a single request.

Analyzing a Local Image

from google import genai
from google.genai import types

client = genai.Client()

with open("architecture-diagram.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        types.Part.from_text(text="Describe this architecture diagram. List every service and how they connect."),
    ],
)

print(response.text)

Uploading and Analyzing a PDF

For larger files, upload them first with the Files API:

from google import genai

client = genai.Client()

uploaded_file = client.files.upload(file="quarterly-report.pdf")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        "Summarize the key financial metrics from this report.",
        uploaded_file,
    ],
)

print(response.text)

The Files API handles files up to 2GB. Uploaded files are available for 48 hours. This is the easiest way to process PDFs, long videos, or audio files that would be too large to send inline.

Function Calling

Function calling lets Gemini invoke your Python functions to fetch real-time data or perform actions. The SDK can handle the full loop automatically.

Automatic Function Calling

Pass Python functions directly as tools. The SDK extracts the schema from type hints and docstrings, calls your function when the model requests it, and sends the result back:

from google import genai
from google.genai import types

def get_stock_price(ticker: str) -> dict:
    """Get the current stock price for a given ticker symbol.

    Args:
        ticker: Stock ticker symbol, e.g. GOOGL, AAPL.

    Returns:
        Dictionary with the current price and currency.
    """
    # Replace with a real API call
    prices = {"GOOGL": 185.32, "AAPL": 242.15, "MSFT": 420.50}
    price = prices.get(ticker.upper(), 0.0)
    return {"ticker": ticker.upper(), "price": price, "currency": "USD"}

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the current price of Google stock?",
    config=types.GenerateContentConfig(
        tools=[get_stock_price],
    ),
)

print(response.text)
# "The current stock price of Google (GOOGL) is $185.32 USD."

The SDK automatically calls get_stock_price when Gemini requests it, then sends the result back to the model to produce a natural language answer. No manual loop needed.

Manual Function Calling

If you need control over function execution (for validation, logging, or async calls), disable automatic calling:

from google import genai
from google.genai import types

def search_database(query: str, limit: int = 10) -> dict:
    """Search the product database.

    Args:
        query: Search query string.
        limit: Maximum number of results to return.
    """
    return {"results": [f"Product matching '{query}'"], "count": 1}

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Find me wireless headphones under $100",
    config=types.GenerateContentConfig(
        tools=[search_database],
        automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
    ),
)

# Inspect the function call the model wants to make
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(f"Function: {part.function_call.name}")
        print(f"Args: {part.function_call.args}")

This gives you a chance to validate arguments, add logging, or route the call to an async worker before sending the result back.

Structured Output with Pydantic

When you need JSON that matches a specific schema – for data extraction, API responses, or pipeline steps – use structured output with Pydantic models:

from google import genai
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating out of 10")
    summary: str = Field(description="One-sentence summary of the review")
    recommended: bool = Field(description="Whether the reviewer recommends it")

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Review the movie Inception (2010) by Christopher Nolan.",
    config={
        "response_mime_type": "application/json",
        "response_schema": MovieReview,
    },
)

review = MovieReview.model_validate_json(response.text)
print(f"{review.title} ({review.year}): {review.rating}/10")
print(f"Recommended: {review.recommended}")
print(f"Summary: {review.summary}")

The model is constrained to output valid JSON matching your schema. No more parsing broken JSON or hoping the model follows your prompt instructions. This works with enums, nested models, lists, and optional fields.

For simpler cases, you can skip Pydantic and pass a dictionary schema directly. But Pydantic gives you automatic validation and type safety, so use it.

Safety Settings

Gemini has built-in content safety filters that can block requests or responses. The defaults are reasonable, but you may need to adjust them for applications that handle sensitive content legitimately:

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain the historical context of propaganda techniques.",
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            ),
        ],
    ),
)

print(response.text)

The available categories are HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_DANGEROUS_CONTENT, and HARM_CATEGORY_CIVIC_INTEGRITY. Thresholds range from BLOCK_NONE (most permissive) through BLOCK_ONLY_HIGH and BLOCK_MEDIUM_AND_ABOVE to BLOCK_LOW_AND_ABOVE (strictest). Even with BLOCK_NONE, Google’s internal filters may still block certain content.

If your request gets blocked, response.text will be empty and candidates[0].finish_reason will be SAFETY. Check response.candidates[0].safety_ratings to see which category triggered the block.

Common Errors and Fixes

ModuleNotFoundError: No module named 'google.genai'

ModuleNotFoundError: No module named 'google.genai'

You installed the wrong package. The correct command is pip install google-genai. If you previously had google-generativeai installed, uninstall it first to avoid conflicts:

pip uninstall google-generativeai
pip install -U google-genai

400 INVALID_ARGUMENT: API key not valid

google.genai.errors.ClientError: 400 INVALID_ARGUMENT. API key not valid. Please pass a valid API key.

Your API key is missing, expired, or malformed. Verify it’s set correctly:

echo $GEMINI_API_KEY

If you’re passing it explicitly, make sure there are no trailing spaces or newline characters. Generate a new key at Google AI Studio if needed.

429 RESOURCE_EXHAUSTED

google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED. Resource has been exhausted (e.g. check quota).

You’ve hit a rate limit. The free tier has generous but real limits – 15 requests per minute (RPM) for gemini-2.5-flash and 2 RPM for gemini-2.5-pro. Solutions:

  • Retry with exponential backoff, waiting progressively longer between attempts.
  • Switch to a smaller model (gemini-2.5-flash-lite) for high-volume workloads.
  • Upgrade to a paid plan for higher quotas.
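A minimal retry wrapper with exponential backoff might look like this. It's a sketch: the rate-limit check here is a string heuristic, and you should adapt it to the exact exception your code sees (e.g. google.genai.errors):

```python
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry `call` with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as e:
            # Heuristic: treat any error mentioning 429 / RESOURCE_EXHAUSTED as a rate limit
            if "429" not in str(e) and "RESOURCE_EXHAUSTED" not in str(e):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
    raise RuntimeError("Rate limit persisted after retries")

# Usage (assumes an existing `client`):
# response = with_backoff(lambda: client.models.generate_content(
#     model="gemini-2.5-flash", contents="..."))
```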

Empty response with finish_reason: SAFETY

Your prompt or the generated response triggered a safety filter. Check which category caused it:

if not response.text:
    for rating in response.candidates[0].safety_ratings:
        print(f"{rating.category}: {rating.probability}")

Adjust safety settings to be more permissive for that category, or rephrase your prompt to avoid triggering the filter.

TypeError: generate_content() got an unexpected keyword argument

You’re mixing up the old google-generativeai SDK patterns with the new google-genai SDK. The old SDK used model.generate_content() on a model object. The new SDK uses client.models.generate_content() on the client. Check your imports – you should have from google import genai, not import google.generativeai as genai.

Gemini vs. Claude vs. GPT

Pick the right model for the job:

  • Gemini 2.5 Flash wins on price and speed. Google’s free tier is the most generous of the three providers. For prototyping, high-volume extraction, or budget-constrained projects, it’s the obvious choice. Native multimodal support (images, video, audio, PDFs in a single call) is smoother than competitors.
  • Claude (via the Anthropic SDK) is strongest at long-context analysis, nuanced writing, and instruction following. If your app needs to process 100k+ token documents or produce carefully structured output, Claude is worth the premium.
  • GPT-4o (via the OpenAI SDK) has the broadest ecosystem. More third-party tools, more fine-tuning options, more community resources. If you need function calling with complex tool chains or are already deep in the OpenAI ecosystem, stay there.

All three are excellent. For most new projects in early 2026, start with Gemini Flash for cost, evaluate Claude for quality-sensitive tasks, and use GPT-4o when ecosystem integration matters most.