Image Analysis Agent

AI applications often need agents that can analyze and understand visual content. Timbal lets you create image analysis agents that identify objects, describe scenes, and answer questions about an image passed in the prompt alongside text.

Prerequisites

This example uses an OpenAI model (openai/gpt-4o). Add OPENAI_API_KEY to your .env file.

.env
OPENAI_API_KEY=your_api_key_here
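
Before calling the agent, it can help to confirm that the key is actually visible to the running process. The following is a minimal sketch using only the standard library. `require_env` is a hypothetical helper and not part of Timbal, and it assumes the .env file has already been loaded into the environment, which depends on your setup.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a clear message instead of a later authentication error
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at startup surfaces a missing key before any request is sent.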

Creating an agent

Create a simple agent that analyzes images to identify objects, describe scenes, and answer questions about visual content.

from timbal.core import Agent

image_analysis_agent = Agent(
    name="image-analysis",
    description="Analyzes images to identify objects and describe scenes",
    system_prompt="""You can view an image and identify objects, describe scenes, and answer questions about the content.
You can also determine species of animals and describe locations in the image.""",
    model="openai/gpt-4o",
)

Creating a function

This function provides a sample image URL for testing the agent's image analysis capabilities.

import random

def get_sample_image() -> str:
    """Get a sample image URL for testing image analysis."""
    sample_images = [
        "https://images.unsplash.com/photo-1441974231531-c6227db76b6e?w=800",  # Forest
        "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800",  # Mountains
        "https://images.unsplash.com/photo-1541961017774-22349e4a1262?w=800",  # Bird
        "https://images.unsplash.com/photo-1558618666-fcd25c85cd64?w=800",  # Cat
    ]
    return random.choice(sample_images)
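
Because `random.choice` returns a different image on each run, test results can be hard to reproduce. One possible variant, sketched here under the assumption that you want repeatable picks, takes an optional seed. `pick_sample_image` is a hypothetical name introduced for illustration.

```python
import random
from typing import Optional, Sequence


def pick_sample_image(urls: Sequence[str], seed: Optional[int] = None) -> str:
    """Pick one URL; the same seed and list always yield the same choice."""
    if not urls:
        raise ValueError("urls must not be empty")
    # A local generator leaves the global random state untouched
    rng = random.Random(seed)
    return rng.choice(urls)
```

Passing a fixed seed in tests and omitting it elsewhere keeps the random behavior of the original function available.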

Example usage

Use the agent directly by calling it with a prompt message that includes an image.

import asyncio

from timbal.types.file import File

async def main():
    image_url = get_sample_image()
    # Create a message with image and text for the agent
    prompt = [
        File.validate(image_url),
        "Analyze this image and identify the main objects or subjects. "
        "If there are animals, provide their common name and scientific name. "
        "Also describe the location or setting in one or two short sentences.",
    ]
    # Call the image analysis agent directly
    response = await image_analysis_agent(prompt=prompt).collect()
    # Extract the text response
    response_text = response.output.content[0].text
    print(response_text)

if __name__ == "__main__":
    asyncio.run(main())
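
The example reads only `content[0].text`, which fails if the first block has no text and ignores any later blocks. A more defensive extraction might look like the sketch below. It assumes, based only on the example above, that `response.output.content` is a sequence of blocks in which text blocks expose a `.text` attribute. `collect_text` is a hypothetical helper, not a Timbal function.

```python
from typing import Any, Iterable


def collect_text(content: Iterable[Any]) -> str:
    """Join the `text` attribute of every content block that has a non-empty one."""
    parts = []
    for block in content:
        # Blocks without a string `text` attribute are skipped
        text = getattr(block, "text", None)
        if isinstance(text, str) and text:
            parts.append(text)
    return "\n".join(parts)
```

With this helper, the extraction step could become `response_text = collect_text(response.output.content)`.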