Agents

An Agent is an autonomous AI entity that works as part of a squad. Agents are the core of OpsSquad.ai's capabilities, enabling AI-powered diagnostics, monitoring, and incident response through their linked nodes.

What is an Agent?

An agent is:

Autonomous - Can operate independently through linked nodes
AI-Powered - Understands natural language requests
Verified - Executes commands with permission and SLM guardrails
Squad-Based - Works as part of a team organized by purpose
Node-Linked - Connects to physical/virtual infrastructure for execution

Agent Capabilities

System Diagnostics

Agents can investigate system state:

Process monitoring (CPU, memory usage)
Disk space and I/O analysis
Network connectivity and traffic
Service health and status
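The documentation does not say which commands an agent runs for these checks. As a rough illustration of the kind of data involved, the following Python sketch reads a few comparable figures (CPU count, load average, disk usage) with the standard library. `system_snapshot` is a hypothetical helper, not part of OpsSquad.ai, and `os.getloadavg` is available only on Unix-like systems.

```python
import os
import shutil


def system_snapshot(path: str = "/") -> dict:
    """Collect a small, read-only snapshot of system state (hypothetical helper)."""
    load1, load5, load15 = os.getloadavg()  # 1/5/15-minute load averages; Unix only
    disk = shutil.disk_usage(path)          # named tuple (total, used, free) in bytes
    return {
        "cpu_count": os.cpu_count(),
        "load_avg": (load1, load5, load15),
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


if __name__ == "__main__":
    for key, value in system_snapshot().items():
        print(f"{key}: {value}")
```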

Log Analysis

Agents can analyze logs:

Search for patterns and errors
Correlate events across time
Identify anomalies
Summarize recent activity
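A minimal Python sketch of pattern search combined with a crude anomaly check: count ERROR and FATAL lines per minute, then flag minutes whose count reaches a threshold. The log line format (`2024-01-15T10:30:05Z ERROR message`), the function names, and the threshold of 3 are assumptions for illustration; a fixed threshold is the simplest possible anomaly rule and is not necessarily how agents detect anomalies.

```python
import re
from collections import Counter

# Assumed line format: "<YYYY-MM-DDTHH:MM:SSZ> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}Z\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def errors_per_minute(lines, levels=("ERROR", "FATAL")):
    """Count lines at the given levels, bucketed by minute ("YYYY-MM-DDTHH:MM")."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") in levels:
            counts[match.group("minute")] += 1
    return counts


def anomalous_minutes(counts, threshold=3):
    """Flag minutes whose error count reaches a fixed threshold."""
    return sorted(minute for minute, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    logs = [
        "2024-01-15T10:30:01Z INFO service started",
        "2024-01-15T10:30:05Z ERROR db timeout",
        "2024-01-15T10:30:07Z ERROR db timeout",
        "2024-01-15T10:30:09Z ERROR db timeout",
        "2024-01-15T10:31:00Z WARN slow query",
    ]
    print(anomalous_minutes(errors_per_minute(logs)))  # ['2024-01-15T10:30']
```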

Service Management

With approval, agents can:

Check service status
Restart services
View container states
Manage processes

Custom Commands

Agents can execute authorized commands:

Run diagnostics scripts
Execute health checks
Gather metrics
Perform routine maintenance

Agent Lifecycle

Agents go through distinct states:

State	Description	Node Requirement
Created	Agent exists in squad, not linked	None
Linked	Connected to a node	Has node assignment
Active	Receiving and processing requests	Node must be online
Unlinked	Disconnected from node	None
Inactive	Suspended but linked	Node assignment preserved
Deleted	Removed from squad	None
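Read as invariants, the Node Requirement column says which states need a node assignment and when that node must also be online. A minimal Python sketch of that check, treating "None" as "no requirement"; the `AgentState` enum and `satisfies_node_requirement` helper are hypothetical names.

```python
from enum import Enum
from typing import Optional


class AgentState(Enum):
    CREATED = "Created"
    LINKED = "Linked"
    ACTIVE = "Active"
    UNLINKED = "Unlinked"
    INACTIVE = "Inactive"
    DELETED = "Deleted"


# States whose Node Requirement column is not "None".
NEEDS_NODE_ASSIGNMENT = {AgentState.LINKED, AgentState.ACTIVE, AgentState.INACTIVE}


def satisfies_node_requirement(state: AgentState, node_id: Optional[str], node_online: bool) -> bool:
    """Check an agent's node assignment against its lifecycle state."""
    if state in NEEDS_NODE_ASSIGNMENT and node_id is None:
        return False  # Linked, Active, and Inactive all keep a node assignment
    if state is AgentState.ACTIVE and not node_online:
        return False  # Active additionally requires the node to be online
    return True  # every requirement for this state is met
```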

State Transitions

From	To	How
Created	Linked	Link to deployed node
Linked	Active	Node comes online
Active	Unlinked	Remove node link
Active	Inactive	Pause agent
Inactive	Active	Resume agent
Linked	Linked	Reassign to different node
Any	Deleted	Delete agent
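The transition table can be encoded directly as data. In this Python sketch (the `transition` function is hypothetical), any move not listed is rejected, including moves the table is silent on, such as Unlinked to Linked. Treating Deleted as final is an assumption; the table does not state it explicitly.

```python
# Allowed transitions from the table; "Any -> Deleted" is handled separately below.
TRANSITIONS = {
    ("Created", "Linked"): "Link to deployed node",
    ("Linked", "Active"): "Node comes online",
    ("Active", "Unlinked"): "Remove node link",
    ("Active", "Inactive"): "Pause agent",
    ("Inactive", "Active"): "Resume agent",
    ("Linked", "Linked"): "Reassign to different node",
}

STATES = {"Created", "Linked", "Active", "Unlinked", "Inactive", "Deleted"}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError if the table does not allow the move."""
    if current not in STATES or target not in STATES:
        raise ValueError(f"unknown state: {current!r} -> {target!r}")
    if current == "Deleted":
        raise ValueError("a deleted agent cannot change state")  # assumption: Deleted is final
    if target == "Deleted" or (current, target) in TRANSITIONS:
        return target
    raise ValueError(f"transition {current} -> {target} is not listed")
```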

Agent Architecture

Components

Agents are cloud-based AI entities that coordinate with nodes: the agent runs in the cloud, and its linked node executes commands on the server.

Agent-Node Relationship

Agent Creation - Created in squad through dashboard
Node Linking - Linked to a deployed node
Command Flow - Agent sends commands to linked node
Execution - Node executes commands on server
Response - Results returned to agent for processing
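To make these steps concrete, here is an in-process Python simulation of one round trip. `Agent`, `Node`, and the stubbed `execute` output are hypothetical; in the real system the agent runs in the cloud and the request reaches the node over the network rather than through a method call. What the sketch illustrates is that each command carries a `request_id`, which the agent uses to match the response to the request it sent.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a deployed node."""
    node_id: str

    def execute(self, request: dict) -> dict:
        # Execution: a real node would run the command on its server.
        # This stub only describes what it would have run.
        output = "ran " + " ".join([request["command"], *request["args"]])
        return {"request_id": request["request_id"], "output": output}


@dataclass
class Agent:
    """Hypothetical agent that tracks in-flight requests by request_id."""
    agent_id: str
    node: Optional[Node] = None
    pending: dict = field(default_factory=dict)

    def link(self, node: Node) -> None:
        self.node = node  # Node Linking

    def run(self, command: str, args: list) -> str:
        if self.node is None:
            raise RuntimeError("agent is not linked to a node")
        request_id = f"req_{uuid.uuid4().hex[:6]}"
        request = {"request_id": request_id, "command": command, "args": args}
        self.pending[request_id] = request        # Command Flow: remember what was sent
        response = self.node.execute(request)     # Execution on the linked node
        self.pending.pop(response["request_id"])  # Response: correlate; KeyError if unknown
        return response["output"]


if __name__ == "__main__":
    agent = Agent("agent_xyz789")
    agent.link(Node("node_abc123"))
    print(agent.run("ps", ["aux", "--sort=-%cpu"]))  # ran ps aux --sort=-%cpu
```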

Node Execution Layer

The linked node provides:

MCP Shell Server for secure command execution
Security validation before execution
Resource management for processes
Output capture and streaming
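The validation and execution steps might look roughly like the following Python sketch: check the command against an allowlist, run it without a shell, enforce a timeout, and capture stdout and stderr. The allowlist, function names, and 10-second default are assumptions; the document does not describe the MCP Shell Server's actual rules. This sketch also collects output after the command exits rather than streaming it; streaming would read from `subprocess.Popen` incrementally.

```python
import subprocess

# Hypothetical allowlist; the node's real validation rules are not documented here.
ALLOWED_COMMANDS = {"ps", "df", "uptime", "echo"}


def validate(command: str) -> None:
    """Security validation: refuse anything not on the allowlist."""
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command}")


def execute(command: str, args: list, timeout: float = 10.0) -> dict:
    """Validate, then run the command without a shell and capture its output."""
    validate(command)
    # An argument list (no shell=True) means no shell expansion, pipes, or `;` chaining.
    completed = subprocess.run(
        [command, *args],
        capture_output=True,  # output capture (collected after exit, not streamed)
        text=True,
        timeout=timeout,      # on expiry the child is killed and TimeoutExpired is raised
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```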

Agent Types

Agents can be specialized for different roles within your squad:

Type	Focus	Example Use Cases
System	OS-level diagnostics	Monitor CPU, memory, disk usage
Database	Database operations	Check DB health, query performance
Container	Container management	Monitor Docker/K8s deployments
Security	Security monitoring	Audit logs, check vulnerabilities
Custom	Specialized tasks	Your specific workflows

Agent-Node Communication

Protocol

Nodes communicate with the platform using JSON messages over TCP. For example, a `COMMAND` message sent from the platform to a node:

{
  "type": "COMMAND",
  "node_id": "node_abc123",
  "agent_id": "agent_xyz789",
  "request_id": "req_123456",
  "timestamp": "2024-01-15T10:30:00Z",
  "payload": {
    "command": "ps",
    "args": ["aux", "--sort=-%cpu"]
  }
}
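A minimal Python sketch of building, serializing, and checking a `COMMAND` message with the fields shown in the example. The document does not say how messages are delimited on the TCP stream, so newline-delimited JSON is assumed here; `make_command`, `encode`, and `decode` are hypothetical names.

```python
import json
import uuid
from datetime import datetime, timezone


def make_command(node_id: str, agent_id: str, command: str, args: list) -> dict:
    """Build a COMMAND message with the same fields as the example message."""
    return {
        "type": "COMMAND",
        "node_id": node_id,
        "agent_id": agent_id,
        "request_id": f"req_{uuid.uuid4().hex[:6]}",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "payload": {"command": command, "args": args},
    }


def encode(message: dict) -> bytes:
    # Assumption: one JSON object per line (newline-delimited framing).
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode(line: bytes) -> dict:
    """Parse one framed message and check the fields every message is assumed to carry."""
    message = json.loads(line.decode("utf-8"))
    for key in ("type", "node_id", "request_id", "timestamp"):
        if key not in message:
            raise ValueError(f"missing field: {key}")
    return message
```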

Message Types

Type	Direction	Purpose
`REGISTER`	Node → Platform	Initial authentication
`HEARTBEAT`	Both	Keep-alive signal
`COMMAND`	Platform → Node	Command request from agent
`RESPONSE`	Node → Platform	Command result to agent
`ERROR`	Node → Platform	Error notification
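The Direction column doubles as a validation rule. This hypothetical Python dispatcher, running on the platform side, rejects messages that arrive in the wrong direction (for example, a `COMMAND` sent by a node) and routes the rest. The returned action strings are placeholders, not documented platform behavior.

```python
NODE_TO_PLATFORM = "node->platform"
PLATFORM_TO_NODE = "platform->node"

# Direction rules from the Message Types table.
ALLOWED_DIRECTIONS = {
    "REGISTER": {NODE_TO_PLATFORM},
    "HEARTBEAT": {NODE_TO_PLATFORM, PLATFORM_TO_NODE},
    "COMMAND": {PLATFORM_TO_NODE},
    "RESPONSE": {NODE_TO_PLATFORM},
    "ERROR": {NODE_TO_PLATFORM},
}


def handle_on_platform(message: dict) -> str:
    """Route a message the platform received from a node (placeholder actions)."""
    msg_type = message.get("type")
    if NODE_TO_PLATFORM not in ALLOWED_DIRECTIONS.get(msg_type, set()):
        raise ValueError(f"{msg_type!r} is not valid in the node -> platform direction")
    if msg_type == "REGISTER":
        return "authenticate node"
    if msg_type == "HEARTBEAT":
        return "mark node online"
    if msg_type == "RESPONSE":
        return f"deliver result for {message['request_id']} to agent"
    # Only ERROR remains at this point.
    return f"record error for {message.get('request_id', 'unknown request')}"
```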

Security

All communication is:

Authenticated - Using node tokens
Validated - Request ID correlation
Timestamped - Replay attack prevention
Logged - Complete audit trail
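These properties are listed without implementation details. The following Python sketch shows one common way to combine three of them on the receiving side: a constant-time token comparison, a timestamp acceptance window, and a record of recently seen `request_id` values so a captured message cannot be replayed. `ReplayGuard`, the 300-second window, and reusing `request_id` for duplicate detection are all assumptions; audit logging is omitted.

```python
import hmac
import time
from datetime import datetime, timezone
from typing import Optional

MAX_SKEW_SECONDS = 300  # hypothetical acceptance window for timestamps


class ReplayGuard:
    """Receiving-side checks for token, timestamp, and duplicate request IDs."""

    def __init__(self, expected_token: str):
        self.expected_token = expected_token.encode("utf-8")
        self.seen = {}  # request_id -> arrival time (Unix seconds)

    def check(self, message: dict, token: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Authenticated: constant-time comparison of the presented node token.
        if not hmac.compare_digest(token.encode("utf-8"), self.expected_token):
            raise PermissionError("invalid node token")
        # Timestamped: reject messages too far from the receiver's clock.
        sent = datetime.strptime(message["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        sent_ts = sent.replace(tzinfo=timezone.utc).timestamp()
        if abs(now - sent_ts) > MAX_SKEW_SECONDS:
            raise ValueError("timestamp outside acceptance window")
        # Replay: each request_id is accepted once. Entries older than twice the
        # window are dropped, since a replay that late would fail the timestamp check.
        self.seen = {rid: t for rid, t in self.seen.items() if now - t <= 2 * MAX_SKEW_SECONDS}
        if message["request_id"] in self.seen:
            raise ValueError("duplicate request_id (possible replay)")
        self.seen[message["request_id"]] = now
```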

Resource Usage

Resource consumption for agents and their nodes:

Component	Resource Usage
Agent (Cloud)	Serverless, scales automatically
Node (Server)	30-50 MB memory, <1% CPU when idle
Network	Minimal (heartbeats + commands)
Disk	Config + logs on server

When nodes execute commands, usage temporarily increases based on the command.