autoglue/agents.md
Copilot 6878bd875b Add agents.md architecture documentation (#685)
* Initial plan

* Add comprehensive agents.md documentation

Co-authored-by: allanice001 <700853+allanice001@users.noreply.github.com>

* Update agents.md to address code review feedback

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-02-19 17:03:21 +00:00

Autoglue Repository Architecture & Agents

Overview

Autoglue is a Kubernetes cluster management platform, written in Go, that manages the lifecycle of K3s clusters across GlueOps-supported cloud providers. It exposes a REST API for cluster provisioning, configuration, and management, and is accompanied by a web UI and a Terraform provider.

Repository Structure

autoglue/
├── cmd/                    # CLI commands
├── internal/              # Internal packages
│   ├── api/              # HTTP API routes and middleware
│   ├── app/              # Application setup
│   ├── auth/             # Authentication logic
│   ├── bg/               # Background job workers
│   ├── common/           # Common utilities
│   ├── config/           # Configuration management
│   ├── db/               # Database operations
│   ├── handlers/         # HTTP request handlers
│   ├── keys/             # Cryptographic key management
│   ├── mapper/           # Data mapping utilities
│   ├── models/           # Database models
│   ├── utils/            # Utility functions
│   ├── version/          # Version information
│   └── web/              # Web UI integration
├── sdk/                   # Generated SDKs
│   └── ts/               # TypeScript SDK
├── ui/                    # Frontend application (React)
├── docs/                  # OpenAPI/Swagger documentation
├── postgres/              # PostgreSQL configuration
├── main.go                # Application entry point
├── schema.sql             # Database schema
└── docker-compose.yml     # Development environment

Core Components

1. API Layer (internal/api)

The API layer provides RESTful endpoints for managing cloud resources:

  • Authentication (mount_auth_routes.go): OAuth/OIDC integration, JWT tokens
  • Clusters (mount_cluster_routes.go): Kubernetes cluster management
  • Servers (mount_server_routes.go): Server resource management
  • SSH Keys (mount_ssh_routes.go): SSH key generation and management
  • DNS (mount_dns_routes.go): DNS record management
  • Load Balancers (mount_load_balancer_routes.go): Load balancer configuration
  • Node Pools (mount_node_pool_routes.go): Worker node pool management
  • Credentials (mount_credential_routes.go): Cloud provider credentials
  • Organizations (mount_org_routes.go): Multi-tenant organization management

Middleware:

  • Request logging with zerolog
  • Rate limiting (1000 requests/minute per IP)
  • CORS handling
  • Security headers
  • Authentication/Authorization
  • Request body size limits (10MB)
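
The rate-limit and body-size entries above can be sketched with the standard library alone. This is a minimal illustration, not Autoglue's actual middleware: the `IPLimiter` type, its fixed-window strategy, and the 429 response are assumptions made for the example. Only the limits themselves (1000 requests/minute per IP, 10MB bodies) come from the list above.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// window counts the requests one client has made in the current period.
type window struct {
	start time.Time
	count int
}

// IPLimiter is a hypothetical fixed-window rate limiter keyed by client IP.
// It never evicts idle clients, so a real deployment would need cleanup.
type IPLimiter struct {
	mu      sync.Mutex
	limit   int
	period  time.Duration
	clients map[string]*window
}

// NewIPLimiter allows up to limit requests per minute for each IP.
func NewIPLimiter(limit int) *IPLimiter {
	return &IPLimiter{limit: limit, period: time.Minute, clients: map[string]*window{}}
}

// Allow reports whether ip may make another request in its current window.
func (l *IPLimiter) Allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	w, ok := l.clients[ip]
	if !ok || now.Sub(w.start) >= l.period {
		// First request from this IP, or the previous window has expired.
		l.clients[ip] = &window{start: now, count: 1}
		return true
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

// Middleware rejects over-limit clients with 429 and caps request bodies
// at maxBody bytes before passing the request on.
func (l *IPLimiter) Middleware(maxBody int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		if !l.Allow(ip) {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		r.Body = http.MaxBytesReader(w, r.Body, maxBody)
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := NewIPLimiter(2) // tiny limit so the rejection is visible
	fmt.Println(l.Allow("10.0.0.1"), l.Allow("10.0.0.1"), l.Allow("10.0.0.1"))
}
```

With the documented values, the handler would be wrapped as `NewIPLimiter(1000).Middleware(10<<20, handler)`.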

2. Background Jobs (internal/bg)

Autoglue uses the Archer job queue system with PostgreSQL-backed job persistence.

Active Job Workers:

Worker             Purpose                                        Timeout
-----------------  ---------------------------------------------  --------------------------
bootstrap_bastion  Provision and configure bastion host servers   Configurable (default 60s)
archer_cleanup     Clean up old job records                       5 minutes
tokens_cleanup     Purge expired refresh tokens                   5 minutes
db_backup_s3       Backup database to S3                          15 minutes
dns_reconcile      Synchronize DNS records with Route53           2 minutes
org_key_sweeper    Remove expired organization API keys           5 minutes
cluster_action     Execute cluster lifecycle actions              Configurable

Planned Job Workers (Currently Disabled):

The following workers exist in the codebase but are currently commented out:

  • prepare_cluster - Prepare infrastructure for cluster deployment
  • cluster_setup - Initial cluster configuration
  • cluster_bootstrap - Full Kubernetes cluster bootstrapping process

Configuration:

  • archer.instances: Number of worker instances (default: 1)
  • archer.timeoutSec: Job timeout in seconds (default: 60)
  • archer.cleanup_retain_days: Job retention period (default: 7 days)
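
As a rough illustration of how these three settings and their defaults fit together, the sketch below models them as a Go struct. The `ArcherConfig` type, its field names, and the helper methods are hypothetical. Only the key names and default values are taken from the list above.

```go
package main

import (
	"fmt"
	"time"
)

// ArcherConfig mirrors the archer.* settings listed above.
// The type and field names are illustrative, not Autoglue's own.
type ArcherConfig struct {
	Instances         int // archer.instances
	TimeoutSec        int // archer.timeoutSec
	CleanupRetainDays int // archer.cleanup_retain_days
}

// WithDefaults replaces unset (non-positive) fields with the documented defaults.
func (c ArcherConfig) WithDefaults() ArcherConfig {
	if c.Instances <= 0 {
		c.Instances = 1
	}
	if c.TimeoutSec <= 0 {
		c.TimeoutSec = 60
	}
	if c.CleanupRetainDays <= 0 {
		c.CleanupRetainDays = 7
	}
	return c
}

// JobTimeout converts the per-job timeout to a time.Duration.
func (c ArcherConfig) JobTimeout() time.Duration {
	return time.Duration(c.TimeoutSec) * time.Second
}

// RetainFor converts the retention period to a time.Duration.
func (c ArcherConfig) RetainFor() time.Duration {
	return time.Duration(c.CleanupRetainDays) * 24 * time.Hour
}

func main() {
	cfg := ArcherConfig{Instances: 4}.WithDefaults()
	fmt.Println(cfg.Instances, cfg.JobTimeout(), cfg.RetainFor()) // 4 1m0s 168h0m0s
}
```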

3. Data Models (internal/models)

Core Models:

  • User - User accounts with OAuth integration
  • Organization - Multi-tenant organizations
  • Membership - User-organization relationships
  • ApiKey - API authentication tokens (user and org-level)
  • OrganizationKey - Organization-level credentials with auto-expiry
  • Cluster - Kubernetes cluster definitions
  • NodePool - Worker node group configurations
  • Server - Individual server instances
  • SshKey - SSH keypair management with encryption
  • LoadBalancer - Load balancer configurations
  • Domain - DNS domain management
  • Credential - Cloud provider API credentials (AWS, etc.)
  • Job - Background job queue records
  • SigningKey - JWT signing keys with rotation
  • RefreshToken - OAuth refresh token storage
  • MasterKey - Master encryption key for data at rest
  • Label, Annotation, Taint - Kubernetes resource metadata

4. Handlers (internal/handlers)

Request handlers implement business logic for API endpoints:

  • auth.go - OAuth flows, token issuance
  • clusters.go - Cluster CRUD operations
  • servers.go - Server provisioning
  • ssh_keys.go - SSH key generation with Ed25519/RSA support
  • dns.go - DNS record management via Route53
  • load_balancers.go - Load balancer configuration
  • node_pools.go - Node pool management with labels/annotations/taints
  • credentials.go - Cloud credential storage
  • orgs.go - Organization management
  • me.go - Current user information
  • me_keys.go - User API key management
  • health.go - Health check endpoints
  • version.go - Version information
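
To make the Ed25519 path of SSH key generation concrete, here is a standard-library-only sketch. It is not the code in ssh_keys.go: the function name `GenerateEd25519SSHKey` is hypothetical, and the private key is emitted as PKCS#8 PEM rather than OpenSSH's own private-key format, which is an assumption made to avoid external packages. The public key uses the standard `ssh-ed25519` authorized_keys encoding.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/x509"
	"encoding/base64"
	"encoding/binary"
	"encoding/pem"
	"fmt"
)

// sshString encodes b as an SSH wire-format string:
// a 4-byte big-endian length followed by the bytes.
func sshString(b []byte) []byte {
	out := make([]byte, 4+len(b))
	binary.BigEndian.PutUint32(out, uint32(len(b)))
	copy(out[4:], b)
	return out
}

// GenerateEd25519SSHKey returns an authorized_keys line and a PKCS#8 PEM
// private key for a freshly generated Ed25519 keypair.
func GenerateEd25519SSHKey(comment string) (authorizedKey string, privatePEM []byte, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return "", nil, err
	}

	// Public key blob: string "ssh-ed25519" followed by string <32-byte key>.
	blob := append(sshString([]byte("ssh-ed25519")), sshString(pub)...)
	authorizedKey = "ssh-ed25519 " + base64.StdEncoding.EncodeToString(blob)
	if comment != "" {
		authorizedKey += " " + comment
	}

	der, err := x509.MarshalPKCS8PrivateKey(priv)
	if err != nil {
		return "", nil, err
	}
	privatePEM = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der})
	return authorizedKey, privatePEM, nil
}

func main() {
	pub, priv, err := GenerateEd25519SSHKey("autoglue-example")
	if err != nil {
		panic(err)
	}
	fmt.Println(pub)
	fmt.Printf("private key PEM: %d bytes\n", len(priv))
}
```

In a setup like the one described here, the private key bytes would be encrypted before being stored rather than printed or logged.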

5. Security & Encryption

Cryptography:

  • Master Key: AES-256-GCM encryption for root secrets
  • Organization Keys: Per-org encryption keys derived from master key
  • SSH Keys: Secure generation and encrypted storage
  • JWT Tokens: RS256 signing with key rotation
  • API Keys: Argon2id hashing for token storage
  • At-Rest Encryption: All sensitive data (kubeconfigs, credentials, SSH keys) is encrypted before it is stored
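
The master-key and per-organization-key entries above can be sketched as follows. AES-256-GCM is named in the list. The derivation step is not specified there, so the HMAC-SHA256 construction in `DeriveOrgKey` is purely an assumption for illustration, as are all function names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// DeriveOrgKey derives a 32-byte per-organization key from the master key.
// Assumption: HMAC-SHA256 stands in for whatever derivation Autoglue uses.
func DeriveOrgKey(master []byte, orgID string) []byte {
	mac := hmac.New(sha256.New, master)
	mac.Write([]byte("org-key:" + orgID))
	return mac.Sum(nil)
}

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt seals plaintext and returns nonce || ciphertext || tag.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst prepends it to the sealed output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the key or data is wrong.
func Decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	master := make([]byte, 32)
	if _, err := rand.Read(master); err != nil {
		panic(err)
	}
	orgKey := DeriveOrgKey(master, "org-a")
	sealed, err := Encrypt(orgKey, []byte("kubeconfig contents"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(orgKey, sealed)
	fmt.Println(string(plain), err)
}
```

Because each organization gets its own derived key, data sealed for one organization cannot be opened with another organization's key.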

Authentication Methods:

  1. OAuth/OIDC (Google Workspace integration)
  2. Bearer tokens (JWT)
  3. Organization Key/Secret pairs
  4. User API keys

6. CLI Commands (cmd)

  • serve - Start the API server (default command)
  • keys generate - Generate JWT signing keys
  • encrypt create-master - Create master encryption key
  • db - Database management utilities
  • version - Display version information

7. Integration Points

Cloud Providers:

  • AWS (Route53 for DNS, S3 for backups)
  • Support for multi-cloud credentials

External Services:

  • PostgreSQL (primary data store)
  • S3-compatible storage (backups)
  • OAuth providers (Google)

SDKs:

  • TypeScript SDK (sdk/ts/) - Generated from OpenAPI spec
  • Go SDK (consumed via module alias) - Used by external integrations

External Integrations:

  • Terraform Provider - Separate repository providing IaC support for Autoglue resources

Development Workflow

Prerequisites

  • Go 1.25.4+
  • Docker & Docker Compose
  • PostgreSQL (via docker-compose)
  • Node.js (for UI development)

Setup

# 1. Configure environment
cp .env.example .env

# 2. Start database
docker compose up -d

# 3. Generate JWT keys
go run . keys generate

# 4. Create master encryption key
go run . encrypt create-master

# 5. Update OpenAPI docs and SDKs
make swagger
make sdk-all

# 6. Start API server with embedded UI
go run .

Build & Test

# Build application
go build -o autoglue .

# Run tests
go test ./...

# Build UI
make ui

Note: The Terraform provider is maintained in a separate repository.

API Architecture

Request Flow

Client → CORS → Rate Limit → Logger → Auth → Handler → DB/Jobs → Response
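
The ordering in this flow can be expressed as a small middleware chain. The sketch below uses placeholder stages that only record their names, so it shows the composition order and nothing about what each real stage does. `Chain`, `stage`, and `TraceOrder` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps a handler with additional behaviour.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that the first middleware listed runs first.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// stage returns a placeholder middleware that records its name in trace.
func stage(name string, trace *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// TraceOrder sends one request through the chain and reports the order
// in which the stages ran.
func TraceOrder() string {
	var trace []string
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "handler")
		w.WriteHeader(http.StatusOK)
	})
	h := Chain(final,
		stage("cors", &trace),
		stage("ratelimit", &trace),
		stage("logger", &trace),
		stage("auth", &trace),
	)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return strings.Join(trace, " > ")
}

func main() {
	fmt.Println(TraceOrder()) // cors > ratelimit > logger > auth > handler
}
```

Wrapping in reverse order is what lets callers list the stages in the same order they appear in the flow.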

Authentication Flow

  1. User logs in via OAuth (Google)
  2. Backend validates token with provider
  3. JWT access token issued (short-lived)
  4. Refresh token stored in DB
  5. Organization context is taken from the X-Org-ID header on each request
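
Step 5 can be sketched as a middleware that copies the header into the request context. The names `RequireOrg` and `OrgFromContext` are hypothetical, and rejecting a missing header with 400 is an assumption. A real implementation would also verify that the authenticated user belongs to the organization.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// orgKey is an unexported context key type, which avoids collisions
// with keys set by other packages.
type orgKey struct{}

// OrgFromContext returns the organization ID stored by RequireOrg, if any.
func OrgFromContext(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(orgKey{}).(string)
	return id, ok
}

// RequireOrg reads X-Org-ID and makes it available to downstream handlers.
// Assumption: requests without the header are rejected with 400.
func RequireOrg(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimSpace(r.Header.Get("X-Org-ID"))
		if id == "" {
			http.Error(w, "missing X-Org-ID header", http.StatusBadRequest)
			return
		}
		ctx := context.WithValue(r.Context(), orgKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// ResolveOrg runs one request through RequireOrg and returns the org ID
// the inner handler saw, together with the response status code.
func ResolveOrg(header string) (string, int) {
	var seen string
	h := RequireOrg(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen, _ = OrgFromContext(r.Context())
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if header != "" {
		req.Header.Set("X-Org-ID", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return seen, rec.Code
}

func main() {
	fmt.Println(ResolveOrg("org-123")) // org-123 200
	fmt.Println(ResolveOrg(""))        //  400
}
```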

Job Execution Flow

  1. Handler enqueues job via Jobs.Enqueue()
  2. Archer worker picks up job from PostgreSQL
  3. Worker executes task with timeout
  4. Result stored in jobs table
  5. Retries on failure (configurable)
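
Steps 3 and 5 (a per-attempt timeout plus retries) can be sketched without any queue library. This is not Archer's API: `Task`, `RunWithRetry`, and `FlakyTask` are hypothetical names, and a real worker would also persist results and typically wait between attempts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Task is a unit of background work that should honour ctx cancellation.
type Task func(ctx context.Context) error

// RunWithRetry runs task up to maxAttempts times, giving each attempt its
// own timeout. It returns the number of attempts made and the last error.
func RunWithRetry(task Task, timeoutSec, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSec)*time.Second)
		err = task(ctx)
		cancel() // release the timer as soon as the attempt finishes
		if err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}

// FlakyTask returns a task that fails its first n calls and then succeeds.
func FlakyTask(n int) Task {
	calls := 0
	return func(ctx context.Context) error {
		calls++
		if err := ctx.Err(); err != nil {
			return err // the attempt's deadline has already passed
		}
		if calls <= n {
			return errors.New("transient failure")
		}
		return nil
	}
}

func main() {
	attempts, err := RunWithRetry(FlakyTask(2), 60, 3)
	fmt.Println(attempts, err) // 3 <nil>
}
```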

Database Schema

Key Tables:

  • users - User accounts
  • accounts - OAuth provider linkage
  • organizations - Tenant isolation
  • memberships - User-org relationships
  • api_keys - Authentication tokens
  • clusters - K8s cluster definitions
  • node_pools - Worker node groups
  • servers - Compute instances
  • ssh_keys - SSH keypair storage
  • load_balancers - LB configurations
  • domains - DNS domains
  • credentials - Cloud API credentials
  • jobs - Background job queue
  • signing_keys - JWT key rotation
  • refresh_tokens - OAuth token storage
  • master_keys - Encryption key hierarchy

Configuration

Environment variables (.env):

  • DATABASE_URL - PostgreSQL connection string
  • JWT_PRIVATE_ENC_KEY - JWT private key encryption
  • GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET - OAuth
  • ALLOWED_ORIGINS - CORS configuration
  • archer.* - Job queue settings
  • AWS credentials for Route53/S3

Deployment

Docker

docker build -t autoglue .
docker run -p 8080:8080 --env-file .env autoglue

Production Considerations

  • Database connection pooling
  • Rate limiting configuration
  • CORS allowed origins
  • JWT key rotation schedule
  • Backup retention policies
  • Worker instance scaling
  • Monitoring and alerting

API Documentation

  • Swagger UI: http://localhost:8080/swagger/index.html
  • OpenAPI Spec: docs/openapi.yaml
  • SDK Documentation: sdk/ts/README.md

Testing

The repository includes:

  • Unit tests for handlers (*_test.go)
  • Test utilities (internal/testutil/)
  • Integration tests with embedded PostgreSQL

Run tests:

go test ./internal/handlers/
go test -v ./...

Key Features

  1. Multi-tenancy: Organization-based resource isolation
  2. Encryption at Rest: All sensitive data encrypted per-org
  3. Async Job Processing: Background tasks with retry logic
  4. API Key Management: User-level and organization-level API keys, alongside OAuth/OIDC and JWT bearer tokens
  5. SSH Key Generation: Automated keypair creation (RSA/Ed25519)
  6. DNS Automation: Route53 integration for DNS records
  7. Kubernetes Management: Cluster lifecycle automation
  8. Terraform Provider: Infrastructure-as-Code support
  9. Web UI: React-based management interface
  10. OpenAPI/Swagger: Auto-generated API documentation

Architecture Patterns

  • Repository Pattern: Data access abstraction via GORM
  • Dependency Injection: Dependencies passed to handlers
  • Middleware Chain: Request processing pipeline
  • Job Queue: Async processing with Archer
  • Multi-tenant: Organization-scoped data isolation
  • Encryption: Key hierarchy (master → org → resource)
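
The dependency-injection and organization-scoping patterns can be shown together in a few lines. Everything here is illustrative: `ClusterStore`, `memoryStore`, and `ClusterHandler` are hypothetical, and an in-memory map stands in for the GORM-backed data access mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ClusterStore is a hypothetical data-access interface scoped by organization.
type ClusterStore interface {
	ListByOrg(orgID string) []string
}

// memoryStore is an in-memory stand-in for a database-backed store.
type memoryStore map[string][]string

func (m memoryStore) ListByOrg(orgID string) []string { return m[orgID] }

// ClusterHandler receives its store from the caller instead of creating it,
// so tests can substitute a fake implementation.
type ClusterHandler struct {
	Store ClusterStore
}

// List writes the cluster names for the requesting organization as JSON.
func (h ClusterHandler) List(w http.ResponseWriter, r *http.Request) {
	names := h.Store.ListByOrg(r.Header.Get("X-Org-ID"))
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(names)
}

// ListFor calls the handler on behalf of orgID and returns the response body.
func ListFor(orgID string) string {
	h := ClusterHandler{Store: memoryStore{
		"org-a": {"prod", "staging"},
		"org-b": {"dev"},
	}}
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("X-Org-ID", orgID)
	rec := httptest.NewRecorder()
	h.List(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(ListFor("org-a")) // ["prod","staging"]
	fmt.Print(ListFor("org-b")) // ["dev"]
}
```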

Future Enhancements

Based on commented-out code and the repository structure:

  • Full cluster provisioning automation
  • Additional cloud provider support
  • Enhanced monitoring and observability
  • Cluster backup and restore
  • Advanced RBAC controls
  • Custom resource definitions

Resources