Squads

Understanding team-based organization with squads

A Squad is a logical team of AI agents working together. Squads help you organize your agents by purpose, environment, or function, enabling effective collaboration and management.

What is a Squad?

A squad represents:

  • An environment (Production, Staging, Development)
  • A functional team (DevOps, Security, SRE)
  • A project or application team
  • Any logical grouping that makes sense for your organization

Why Use Squads?

Organization

Group related agents by purpose:

  • Keep all production monitoring agents in one squad
  • Give each environment or function its own squad
  • Make teams easy to navigate and manage

Collaboration

Enable agents to work together:

  • Agents in the same squad share context
  • Agents coordinate on complex investigations
  • Agents combine specialized capabilities

Isolation

Keep different teams separate:

  • Production squad isolated from staging
  • Development experiments don't affect production
  • Clear boundaries between contexts

Access Control

Manage permissions at the squad level:

  • Team members see only their squads
  • Different access levels per squad
  • Audit trails per team

Flexibility

Organize by what makes sense:

  • By environment (Prod, Staging, Dev)
  • By function (DevOps, Security, Compliance)
  • By application (API Team, Frontend Team)
  • By region (US-East Squad, EU-West Squad)

Squad Organization Patterns

By Environment

The most common pattern:

Squad              Purpose                     Agents
Production SRE     Live production monitoring  Web Monitor, DB Guardian, Security Audit
Staging Team       Pre-production testing      Test Agent, Integration Monitor
Development Squad  Development support         Dev Helper, Build Monitor

By Function

Organize by team responsibility:

Squad           Purpose                    Agents
DevOps Team     Infrastructure management  Deployment Agent, Config Monitor
Security Squad  Security monitoring        Vulnerability Scanner, Audit Agent
Database Team   Database operations        DB Monitor, Query Optimizer

By Application

For multi-application organizations:

Squad          Purpose                 Agents
API Squad      API service monitoring  API Monitor, Performance Tracker
Frontend Team  Frontend monitoring     Web Monitor, Asset Optimizer
ML Pipeline    ML operations           Model Monitor, Data Validator

By Region

For geographically distributed operations:

Squad          Purpose              Agents
US-East Squad  US East operations   Regional Monitor, Failover Agent
EU-West Squad  European operations  EU Monitor, Compliance Agent

Combined

Combine patterns as needed:

Squad                     Purpose
Production API - US-East  API production in US East
Staging Frontend          Frontend staging environment
Dev ML Pipeline           ML development team

Squad Lifecycle

Creating a Squad

  1. Navigate to Squads in the dashboard
  2. Click "Create Squad"
  3. Enter name, environment, and purpose
  4. Configure optional AI settings
  5. Save the squad

Squad States

State     Description
Active    Normal operation, agents can be created
Paused    No new operations, existing agents preserved
Archived  Historical reference, read-only
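
The three states can be read as a small permission matrix. The following is a minimal sketch in Python, not an OpsSquad API: `SquadState`, `can_create_agents`, and `is_read_only` are hypothetical names, and treating Paused as "no new agents" is an assumption based on the "No new operations" description.

```python
from enum import Enum


class SquadState(Enum):
    """Hypothetical model of the three squad states."""
    ACTIVE = "Active"
    PAUSED = "Paused"
    ARCHIVED = "Archived"


def can_create_agents(state: SquadState) -> bool:
    # Only an Active squad is described as allowing agent creation.
    # Assumption: Paused ("no new operations") also blocks new agents.
    return state is SquadState.ACTIVE


def is_read_only(state: SquadState) -> bool:
    # An Archived squad is kept for historical reference only.
    return state is SquadState.ARCHIVED
```

For example, `can_create_agents(SquadState.PAUSED)` evaluates to `False`, while existing agents in a paused squad are left untouched by this model.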

Deleting a Squad

Warning: You must delete or move all agents in a squad before deleting the squad itself.

  1. Remove or reassign all agents
  2. Go to squad settings
  3. Click "Delete Squad"
  4. Confirm deletion
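
The "empty before delete" rule can be sketched as a simple guard. This is an illustrative in-memory model only; `Squad`, `SquadNotEmptyError`, `move_agents`, and `delete_squad` are hypothetical names and do not describe how the platform implements deletion.

```python
from dataclasses import dataclass, field


class SquadNotEmptyError(Exception):
    """Raised when deleting a squad that still contains agents."""


@dataclass
class Squad:
    name: str
    agents: list[str] = field(default_factory=list)


def move_agents(source: Squad, target: Squad) -> None:
    # Step 1: reassign every agent so the source squad ends up empty.
    target.agents.extend(source.agents)
    source.agents.clear()


def delete_squad(squads: dict[str, Squad], name: str) -> None:
    # Refuse to delete while agents remain, mirroring the warning above.
    squad = squads[name]
    if squad.agents:
        raise SquadNotEmptyError(
            f"{name} still has {len(squad.agents)} agent(s)"
        )
    del squads[name]
```

In this sketch, calling `delete_squad` on a populated squad raises `SquadNotEmptyError`; calling `move_agents` first and then `delete_squad` succeeds.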

Managing Squad Agents

Adding Agents

  1. Navigate to your squad
  2. Click "Create Agent"
  3. Define agent name and purpose
  4. Configure system prompt and capabilities
  5. Link to a deployed node (or link later)

Agent Organization

Within a squad, agents can:

  • Work independently on different tasks
  • Collaborate on complex investigations
  • Share context and findings
  • Be reassigned to different nodes as needed

Squad Configuration

Configure squad-level settings:

  • Environment: Production, Staging, Development
  • Purpose: Team description and goals
  • AI Settings: Shared prompts and behavior
  • Access Control: Who can manage the squad
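
As a rough illustration, the four settings map naturally onto a small record type. The sketch below is an assumption about shape only: `SquadConfig`, `Environment`, `shared_prompt`, and `managers` are hypothetical names, not fields of a documented OpsSquad schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Environment(Enum):
    PRODUCTION = "Production"
    STAGING = "Staging"
    DEVELOPMENT = "Development"


@dataclass
class SquadConfig:
    name: str
    environment: Environment                          # Environment
    purpose: str                                      # Purpose
    shared_prompt: str = ""                           # AI Settings (hypothetical field)
    managers: set[str] = field(default_factory=set)   # Access Control (hypothetical field)

    def can_manage(self, user: str) -> bool:
        # Only users listed as managers may change the squad.
        return user in self.managers


config = SquadConfig(
    name="Production SRE",
    environment=Environment.PRODUCTION,
    purpose="Live production monitoring",
    managers={"alice"},
)
```

Here `config.can_manage("alice")` is `True` and `config.can_manage("bob")` is `False`.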

Squad Health

Health Indicators

Status     Meaning
Healthy    All agents active and linked
Degraded   Some agents unlinked or nodes offline
Unhealthy  Most agents unavailable
Empty      No agents in squad
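
One way to reason about these statuses is as a function of how many agents are unavailable. The sketch below is hypothetical: `AgentStatus` and `squad_health` are illustrative names, and interpreting "most" as "more than half" is an assumption, since no exact threshold is stated here.

```python
from dataclasses import dataclass


@dataclass
class AgentStatus:
    name: str
    linked: bool
    node_online: bool

    @property
    def available(self) -> bool:
        # An agent counts as available only if it is linked to an online node.
        return self.linked and self.node_online


def squad_health(agents: list[AgentStatus]) -> str:
    if not agents:
        return "Empty"
    unavailable = sum(1 for agent in agents if not agent.available)
    if unavailable == 0:
        return "Healthy"
    # Assumption: "most agents unavailable" means strictly more than half.
    if unavailable * 2 > len(agents):
        return "Unhealthy"
    return "Degraded"
```

With three agents of which one is unlinked, this sketch reports "Degraded"; with two of three unavailable, it reports "Unhealthy".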

Health Monitoring

The platform tracks:

  • Agent link status (linked/unlinked)
  • Node connectivity (online/offline)
  • Agent activity and responsiveness
  • Command execution success rate

Alerting

Configure alerts for:

  • Agents becoming unlinked
  • Linked nodes going offline
  • Squad health status changes
  • Failed command executions
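
The first two alert conditions amount to detecting a change between two observations of the same agent. The following is a minimal sketch under that assumption; `AgentSnapshot` and `detect_alerts` are hypothetical names, and the alert messages are illustrative rather than actual platform output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSnapshot:
    linked: bool
    node_online: bool


def detect_alerts(
    before: dict[str, AgentSnapshot],
    after: dict[str, AgentSnapshot],
) -> list[str]:
    """Compare two snapshots keyed by agent name and list what got worse."""
    alerts: list[str] = []
    for name, now in sorted(after.items()):
        prev = before.get(name)
        if prev is None:
            continue  # Newly created agent: nothing to compare against.
        if prev.linked and not now.linked:
            alerts.append(f"{name}: agent became unlinked")
        if now.linked and prev.node_online and not now.node_online:
            alerts.append(f"{name}: linked node went offline")
    return alerts
```

Only transitions produce alerts in this sketch, so an agent that was already unlinked in both snapshots does not fire again.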

Infrastructure Flexibility

Squads are infrastructure-agnostic. Agents can link to nodes running on:

Infrastructure    Example
Virtual Machines  AWS EC2, GCP Compute Engine, Azure VMs
Bare Metal        Dedicated servers, on-premise hardware
Containers        Docker hosts, Kubernetes nodes
Edge Devices      Raspberry Pi, IoT gateways
Hybrid            Mix of cloud, on-premise, and edge

The flexibility of the node-linking architecture allows you to:

  • Reassign agents to different servers without recreating them
  • Scale infrastructure independently from agent configuration
  • Move agents between environments by relinking nodes
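
The key idea is that an agent's definition and its node link are separate pieces of state, so relinking changes only the link. A minimal sketch, assuming a hypothetical `Agent` record with a `node_id` field (these names and the node IDs are illustrative, not an OpsSquad API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    name: str
    system_prompt: str
    node_id: Optional[str] = None  # None means the agent is unlinked


def relink(agent: Agent, new_node_id: Optional[str]) -> Optional[str]:
    """Point the agent at a different node and return the previous node ID."""
    previous = agent.node_id
    agent.node_id = new_node_id
    return previous


agent = Agent("Web Monitor", "Watch the web tier", node_id="node-a")
old_node = relink(agent, "node-b")
```

After the call, `agent.node_id` is `"node-b"` while the name and system prompt are unchanged, which is what makes it possible to move an agent without recreating it.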

Multi-Squad Strategies

Environment Promotion

Organize squads by environment, then promote changes through the squad environments in order, for example from Development to Staging to Production.

Functional Teams

Organize by responsibility:

  • DevOps Squad - Infrastructure and deployment
  • Security Squad - Security monitoring and compliance
  • SRE Squad - Reliability and performance
  • Database Squad - Database operations

Regional Operations

Squads for different regions:

  • US-East Squad - Primary US operations
  • EU-West Squad - European operations
  • APAC Squad - Asia-Pacific operations

Each squad has agents linked to regional nodes.

Best Practices

Naming Conventions

Use clear, descriptive squad names:

  • Include environment: Production SRE, Staging DevOps
  • Include function: Security Team, Database Squad
  • Be specific: API Production Monitoring, Frontend Development

Example: Production API - Security Squad

Documentation

Document your squads:

  • Purpose and responsibilities
  • Team members with access
  • Related systems and services
  • Escalation procedures
  • Contact information

Agent Management

Within squads:

  • Create agents with clear, specific purposes
  • Link agents to appropriate nodes
  • Monitor agent and node health
  • Regularly review and update agent configurations

Limits

Consider subscription limits:

  • Number of squads allowed
  • Number of agents per squad
  • Number of nodes you can deploy
  • API rate limits

Cleanup

Regularly review squads:

  • Archive unused squads
  • Remove obsolete agents
  • Unlink agents from decommissioned nodes
  • Clean up test and development squads
