mirror of
https://github.com/GlueOps/autoglue.git
synced 2026-02-19 23:50:06 +01:00
Add agents.md architecture documentation (#685)
* Initial plan
* Add comprehensive agents.md documentation
  Co-authored-by: allanice001 <700853+allanice001@users.noreply.github.com>
* Update agents.md to address code review feedback
  Co-authored-by: allanice001 <700853+allanice001@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: allanice001 <700853+allanice001@users.noreply.github.com>
340
agents.md
Normal file
@@ -0,0 +1,340 @@
# Autoglue Repository Architecture & Agents

## Overview

Autoglue is a Kubernetes cluster management platform built with Go that manages the lifecycle of K3s clusters across GlueOps-supported cloud providers. It provides a REST API for cluster provisioning, configuration, and management, along with a web UI and Terraform provider.

## Repository Structure

```
autoglue/
├── cmd/                  # CLI commands
├── internal/             # Internal packages
│   ├── api/              # HTTP API routes and middleware
│   ├── app/              # Application setup
│   ├── auth/             # Authentication logic
│   ├── bg/               # Background job workers
│   ├── common/           # Common utilities
│   ├── config/           # Configuration management
│   ├── db/               # Database operations
│   ├── handlers/         # HTTP request handlers
│   ├── keys/             # Cryptographic key management
│   ├── mapper/           # Data mapping utilities
│   ├── models/           # Database models
│   ├── utils/            # Utility functions
│   ├── version/          # Version information
│   └── web/              # Web UI integration
├── sdk/                  # Generated SDKs
│   └── ts/               # TypeScript SDK
├── ui/                   # Frontend application (React)
├── docs/                 # OpenAPI/Swagger documentation
├── postgres/             # PostgreSQL configuration
├── main.go               # Application entry point
├── schema.sql            # Database schema
└── docker-compose.yml    # Development environment
```

## Core Components

### 1. API Layer (`internal/api`)

The API layer provides RESTful endpoints for managing cloud resources:

- **Authentication** (`mount_auth_routes.go`): OAuth/OIDC integration, JWT tokens
- **Clusters** (`mount_cluster_routes.go`): Kubernetes cluster management
- **Servers** (`mount_server_routes.go`): Server resource management
- **SSH Keys** (`mount_ssh_routes.go`): SSH key generation and management
- **DNS** (`mount_dns_routes.go`): DNS record management
- **Load Balancers** (`mount_load_balancer_routes.go`): Load balancer configuration
- **Node Pools** (`mount_node_pool_routes.go`): Worker node pool management
- **Credentials** (`mount_credential_routes.go`): Cloud provider credentials
- **Organizations** (`mount_org_routes.go`): Multi-tenant organization management

**Middleware:**
- Request logging with zerolog
- Rate limiting (1000 requests/minute per IP)
- CORS handling
- Security headers
- Authentication/Authorization
- Request body size limits (10MB)
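
To make two of the middleware items above concrete, here is a minimal standard-library sketch of a body size cap and a per-IP rate limit. It is not Autoglue's implementation: `limitBody`, `perMinuteLimiter`, and `rateLimit` are hypothetical names, the fixed-window counting strategy is an assumption, and the demo uses a limit of 2 instead of the documented 1000 so the effect is visible in three requests.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// maxBodyBytes mirrors the 10MB request body limit listed above.
const maxBodyBytes = 10 << 20

// limitBody caps how much of the request body a handler may read.
func limitBody(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
		next.ServeHTTP(w, r)
	})
}

// perMinuteLimiter is a hypothetical fixed-window counter keyed by client IP.
type perMinuteLimiter struct {
	mu     sync.Mutex
	limit  int
	start  time.Time
	counts map[string]int
}

func newPerMinuteLimiter(limit int) *perMinuteLimiter {
	return &perMinuteLimiter{limit: limit, start: time.Now(), counts: map[string]int{}}
}

// allow counts one request for ip and reports whether it is within the limit.
func (l *perMinuteLimiter) allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if time.Since(l.start) >= time.Minute {
		// A new window begins: forget all previous counts.
		l.start, l.counts = time.Now(), map[string]int{}
	}
	l.counts[ip]++
	return l.counts[ip] <= l.limit
}

// rateLimit answers 429 Too Many Requests once an IP exceeds the limit.
func rateLimit(l *perMinuteLimiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // fall back to the raw address
		}
		if !l.allow(ip) {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// A limit of 2 keeps the demo short; the document states 1000 per minute.
	h := rateLimit(newPerMinuteLimiter(2), limitBody(ok))
	for i := 0; i < 3; i++ {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
		fmt.Println(rec.Code) // prints 200, 200, then 429
	}
}
```

A fixed window is the simplest strategy to reason about; a token bucket or sliding window would smooth bursts at window boundaries, and the real middleware may use either.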

### 2. Background Jobs (`internal/bg`)

Autoglue uses the [Archer](https://github.com/dyaksa/archer) job queue system with PostgreSQL-backed job persistence.

**Active Job Workers:**

| Worker | Purpose | Timeout |
|--------|---------|---------|
| `bootstrap_bastion` | Provision and configure bastion host servers | Configurable (default 60s) |
| `archer_cleanup` | Clean up old job records | 5 minutes |
| `tokens_cleanup` | Purge expired refresh tokens | 5 minutes |
| `db_backup_s3` | Backup database to S3 | 15 minutes |
| `dns_reconcile` | Synchronize DNS records with Route53 | 2 minutes |
| `org_key_sweeper` | Remove expired organization API keys | 5 minutes |
| `cluster_action` | Execute cluster lifecycle actions | Configurable |

**Planned Job Workers (Currently Disabled):**

The following workers exist in the codebase but are currently commented out:
- `prepare_cluster` - Prepare infrastructure for cluster deployment
- `cluster_setup` - Initial cluster configuration
- `cluster_bootstrap` - Full Kubernetes cluster bootstrapping process

**Configuration:**
- `archer.instances`: Number of worker instances (default: 1)
- `archer.timeoutSec`: Job timeout in seconds (default: 60)
- `archer.cleanup_retain_days`: Job retention period (default: 7 days)
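
The three settings and their defaults can be pictured as a small struct. This is only a sketch of how such defaults might be applied: `ArcherConfig`, `withDefaults`, and `jobTimeout` are hypothetical names, and the real configuration loader in `internal/config` may be structured differently.

```go
package main

import (
	"fmt"
	"time"
)

// ArcherConfig is a hypothetical struct mirroring the settings listed above.
type ArcherConfig struct {
	Instances         int // archer.instances
	TimeoutSec        int // archer.timeoutSec
	CleanupRetainDays int // archer.cleanup_retain_days
}

// withDefaults fills unset (zero or negative) fields with the documented defaults.
func (c ArcherConfig) withDefaults() ArcherConfig {
	if c.Instances <= 0 {
		c.Instances = 1
	}
	if c.TimeoutSec <= 0 {
		c.TimeoutSec = 60
	}
	if c.CleanupRetainDays <= 0 {
		c.CleanupRetainDays = 7
	}
	return c
}

// jobTimeout converts the configured seconds into a time.Duration.
func (c ArcherConfig) jobTimeout() time.Duration {
	return time.Duration(c.TimeoutSec) * time.Second
}

func main() {
	// Only the instance count is overridden; the rest fall back to defaults.
	cfg := ArcherConfig{Instances: 4}.withDefaults()
	fmt.Println(cfg.Instances, cfg.jobTimeout(), cfg.CleanupRetainDays) // prints: 4 1m0s 7
}
```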

### 3. Data Models (`internal/models`)

**Core Models:**

- `User` - User accounts with OAuth integration
- `Organization` - Multi-tenant organizations
- `Membership` - User-organization relationships
- `ApiKey` - API authentication tokens (user and org-level)
- `OrganizationKey` - Organization-level credentials with auto-expiry
- `Cluster` - Kubernetes cluster definitions
- `NodePool` - Worker node group configurations
- `Server` - Individual server instances
- `SshKey` - SSH keypair management with encryption
- `LoadBalancer` - Load balancer configurations
- `Domain` - DNS domain management
- `Credential` - Cloud provider API credentials (AWS, etc.)
- `Job` - Background job queue records
- `SigningKey` - JWT signing keys with rotation
- `RefreshToken` - OAuth refresh token storage
- `MasterKey` - Master encryption key for data at rest
- `Label`, `Annotation`, `Taint` - Kubernetes resource metadata

### 4. Handlers (`internal/handlers`)

Request handlers implement business logic for API endpoints:

- `auth.go` - OAuth flows, token issuance
- `clusters.go` - Cluster CRUD operations
- `servers.go` - Server provisioning
- `ssh_keys.go` - SSH key generation with Ed25519/RSA support
- `dns.go` - DNS record management via Route53
- `load_balancers.go` - Load balancer configuration
- `node_pools.go` - Node pool management with labels/annotations/taints
- `credentials.go` - Cloud credential storage
- `orgs.go` - Organization management
- `me.go` - Current user information
- `me_keys.go` - User API key management
- `health.go` - Health check endpoints
- `version.go` - Version information

### 5. Security & Encryption

**Cryptography:**
- **Master Key**: AES-256-GCM encryption for root secrets
- **Organization Keys**: Per-org encryption keys derived from master key
- **SSH Keys**: Secure generation and encrypted storage
- **JWT Tokens**: RS256 signing with key rotation
- **API Keys**: Argon2id hashing for token storage
- **At-Rest Encryption**: All sensitive data (kubeconfigs, credentials, SSH keys) is encrypted at rest
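
As an illustration of the AES-256-GCM primitive named above, the following sketch encrypts and decrypts a value with Go's standard library. The function names `seal` and `open`, and the choice to prepend the nonce to the ciphertext, are assumptions of this example; how Autoglue stores nonces and derives per-organization keys is not described here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD and rejects keys that are not 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the random nonce to the result.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext and tag right after it.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then authenticates and decrypts the remainder.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo key; a real key must come from a secure source
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("kubeconfig contents"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err)
}
```

Because GCM is authenticated, `open` returns an error if any byte of the stored value has been altered, which is why it suits secrets kept in a database.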

**Authentication Methods:**
1. OAuth/OIDC (Google Workspace integration)
2. Bearer tokens (JWT)
3. Organization Key/Secret pairs
4. User API keys

### 6. CLI Commands (`cmd`)

- `serve` - Start the API server (default command)
- `keys generate` - Generate JWT signing keys
- `encrypt create-master` - Create master encryption key
- `db` - Database management utilities
- `version` - Display version information

### 7. Integration Points

**Cloud Providers:**
- AWS (Route53 for DNS, S3 for backups)
- Support for multi-cloud credentials

**External Services:**
- PostgreSQL (primary data store)
- S3-compatible storage (backups)
- OAuth providers (Google)

**SDKs:**
- TypeScript SDK (`sdk/ts/`) - Generated from OpenAPI spec
- Go SDK (consumed via module alias) - Used by external integrations

**External Integrations:**
- Terraform Provider - Separate repository providing IaC support for Autoglue resources

## Development Workflow

### Prerequisites
- Go 1.25.4+
- Docker & Docker Compose
- PostgreSQL (via docker-compose)
- Node.js (for UI development)

### Setup
```bash
# 1. Configure environment
cp .env.example .env

# 2. Start database
docker compose up -d

# 3. Generate JWT keys
go run . keys generate

# 4. Create master encryption key
go run . encrypt create-master

# 5. Update OpenAPI docs and SDKs
make swagger
make sdk-all

# 6. Start API server with embedded UI
go run .
```

### Build & Test
```bash
# Build application
go build -o autoglue .

# Run tests
go test ./...

# Build UI
make ui
```

**Note:** The Terraform provider is maintained in a separate repository.

## API Architecture

### Request Flow
```
Client → CORS → Rate Limit → Logger → Auth → Handler → DB/Jobs → Response
```
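
The ordering in the request flow can be sketched as a chain of `net/http` wrappers. The `chain` and `trace` helpers below are hypothetical and the middlewares are placeholders that only record their names; the sketch shows how the listed order maps to wrapping order, not how Autoglue's router is actually assembled.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// middleware wraps a handler with extra behavior.
type middleware func(http.Handler) http.Handler

// chain wraps h so that the first middleware listed runs first.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace returns a placeholder middleware that records its name when invoked.
func trace(name string, order *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// runOrder sends one request through the chain and returns the visit order.
func runOrder() []string {
	var order []string
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		order = append(order, "Handler")
		w.WriteHeader(http.StatusOK)
	})
	h := chain(handler,
		trace("CORS", &order),
		trace("Rate Limit", &order),
		trace("Logger", &order),
		trace("Auth", &order),
	)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return order
}

func main() {
	fmt.Println(runOrder()) // prints [CORS Rate Limit Logger Auth Handler]
}
```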

### Authentication Flow
1. The user logs in via OAuth (Google).
2. The backend validates the token with the provider.
3. A short-lived JWT access token is issued.
4. A refresh token is stored in the database.
5. The organization context is taken from the `X-Org-ID` header.
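
The last step, reading the organization context from `X-Org-ID`, might look like the middleware below. This is a sketch under assumptions: `requireOrg`, `orgFrom`, and `call` are hypothetical names, and responding with 400 when the header is missing is this example's choice, not documented Autoglue behavior. Token validation (steps 1 to 4) is deliberately left out.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// orgKey is an unexported context key, so other packages cannot collide with it.
type orgKey struct{}

// requireOrg copies the X-Org-ID header into the request context.
// Rejecting a missing header with 400 is an assumption of this sketch.
func requireOrg(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		org := r.Header.Get("X-Org-ID")
		if org == "" {
			http.Error(w, "missing X-Org-ID header", http.StatusBadRequest)
			return
		}
		ctx := context.WithValue(r.Context(), orgKey{}, org)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// orgFrom returns the organization ID stored by requireOrg, or "" if absent.
func orgFrom(ctx context.Context) string {
	org, _ := ctx.Value(orgKey{}).(string)
	return org
}

// call sends one request with the given org ID and returns status and body.
func call(orgID string) (int, string) {
	h := requireOrg(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, orgFrom(r.Context()))
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if orgID != "" {
		req.Header.Set("X-Org-ID", orgID)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := call("org-123")
	fmt.Println(code, body) // prints: 200 org-123
	code, _ = call("")
	fmt.Println(code) // prints: 400
}
```

A real implementation would also need to check that the authenticated user is a member of the requested organization before trusting the header.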

### Job Execution Flow
1. A handler enqueues a job via `Jobs.Enqueue()`.
2. An Archer worker picks up the job from PostgreSQL.
3. The worker executes the task with a timeout.
4. The result is stored in the `jobs` table.
5. Failed jobs are retried (configurable).
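
Steps 3 and 5 (execute with a timeout, retry on failure) can be illustrated with plain `context` code. This sketch does not use Archer's API, which is not shown in this document; `task`, `runWithRetry`, and `flakyTask` are hypothetical names, and the lack of backoff between attempts is a simplification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// task is a unit of background work that must honor context cancellation.
type task func(ctx context.Context) error

// runWithRetry runs t up to attempts times, each under its own timeout.
// It returns the number of attempts made and the last error, if any.
func runWithRetry(t task, attempts, timeoutSec int) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSec)*time.Second)
		err = t(ctx)
		cancel() // release the timer before the next attempt
		if err == nil {
			return i, nil
		}
	}
	return attempts, err
}

// flakyTask returns a demo task that fails the first n calls, then succeeds.
func flakyTask(n int) task {
	calls := 0
	return func(ctx context.Context) error {
		if err := ctx.Err(); err != nil {
			return err // timed out or cancelled before doing any work
		}
		calls++
		if calls <= n {
			return errors.New("transient failure")
		}
		return nil
	}
}

func main() {
	used, err := runWithRetry(flakyTask(2), 5, 60)
	fmt.Println(used, err) // prints: 3 <nil>
}
```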

## Database Schema

**Key Tables:**
- `users` - User accounts
- `accounts` - OAuth provider linkage
- `organizations` - Tenant isolation
- `memberships` - User-org relationships
- `api_keys` - Authentication tokens
- `clusters` - K8s cluster definitions
- `node_pools` - Worker node groups
- `servers` - Compute instances
- `ssh_keys` - SSH keypair storage
- `load_balancers` - LB configurations
- `domains` - DNS domains
- `credentials` - Cloud API credentials
- `jobs` - Background job queue
- `signing_keys` - JWT key rotation
- `refresh_tokens` - OAuth token storage
- `master_keys` - Encryption key hierarchy

## Configuration

Environment variables (`.env`):
- `DATABASE_URL` - PostgreSQL connection string
- `JWT_PRIVATE_ENC_KEY` - Key used to encrypt JWT private keys
- `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` - Google OAuth client credentials
- `ALLOWED_ORIGINS` - CORS allowed origins
- `archer.*` - Job queue settings
- AWS credentials for Route53/S3
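
A loader for two of these variables could be sketched as follows. The `Config` struct and `load` function are hypothetical, treating `DATABASE_URL` as required is an assumption, and so is the comma-separated format for `ALLOWED_ORIGINS`; check `.env.example` for the actual expectations.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Config holds a subset of the environment settings listed above.
type Config struct {
	DatabaseURL    string
	AllowedOrigins []string
}

// load reads settings through getenv so callers can inject values in tests.
func load(getenv func(string) string) (Config, error) {
	url := getenv("DATABASE_URL")
	if url == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	// Assumption: ALLOWED_ORIGINS is a comma-separated list.
	var origins []string
	for _, o := range strings.Split(getenv("ALLOWED_ORIGINS"), ",") {
		if o = strings.TrimSpace(o); o != "" {
			origins = append(origins, o)
		}
	}
	return Config{DatabaseURL: url, AllowedOrigins: origins}, nil
}

func main() {
	cfg, err := load(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Avoid printing the connection string: it usually contains a password.
	fmt.Println("allowed origins:", len(cfg.AllowedOrigins))
}
```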

## Deployment

### Docker
```bash
docker build -t autoglue .
docker run -p 8080:8080 --env-file .env autoglue
```

### Production Considerations
- Database connection pooling
- Rate limiting configuration
- CORS allowed origins
- JWT key rotation schedule
- Backup retention policies
- Worker instance scaling
- Monitoring and alerting

## API Documentation

- **Swagger UI**: `http://localhost:8080/swagger/index.html`
- **OpenAPI Spec**: `docs/openapi.yaml`
- **SDK Documentation**: `sdk/ts/README.md`

## Testing

The repository includes:
- Unit tests for handlers (`*_test.go`)
- Test utilities (`internal/testutil/`)
- Integration tests with embedded PostgreSQL

Run tests:
```bash
go test ./internal/handlers/
go test -v ./...
```

## Key Features

1. **Multi-tenancy**: Organization-based resource isolation
2. **Encryption at Rest**: All sensitive data encrypted per-org
3. **Async Job Processing**: Background tasks with retry logic
4. **API Key Management**: Multiple authentication methods
5. **SSH Key Generation**: Automated keypair creation (RSA/Ed25519)
6. **DNS Automation**: Route53 integration for DNS records
7. **Kubernetes Management**: Cluster lifecycle automation
8. **Terraform Provider**: Infrastructure-as-Code support
9. **Web UI**: React-based management interface
10. **OpenAPI/Swagger**: Auto-generated API documentation

## Architecture Patterns

- **Repository Pattern**: Data access abstraction via GORM
- **Dependency Injection**: Dependencies passed to handlers
- **Middleware Chain**: Request processing pipeline
- **Job Queue**: Async processing with Archer
- **Multi-tenant**: Organization-scoped data isolation
- **Encryption**: Key hierarchy (master → org → resource)
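
Three of these patterns (repository, dependency injection, organization-scoped access) fit in one small sketch. All names here (`Cluster`, `ClusterStore`, `memStore`, `ClusterHandler`) are hypothetical and simplified; the in-memory store stands in for a GORM-backed one so the example runs without a database.

```go
package main

import "fmt"

// Cluster is a simplified stand-in for the real model.
type Cluster struct {
	ID, OrgID, Name string
}

// ClusterStore is a hypothetical data-access interface (repository pattern).
type ClusterStore interface {
	ListByOrg(orgID string) []Cluster
}

// memStore is an in-memory implementation used in place of a database.
type memStore struct{ rows []Cluster }

// ListByOrg returns only rows belonging to orgID (organization-scoped access).
func (m memStore) ListByOrg(orgID string) []Cluster {
	var out []Cluster
	for _, c := range m.rows {
		if c.OrgID == orgID {
			out = append(out, c)
		}
	}
	return out
}

// ClusterHandler receives its store from the caller (dependency injection).
type ClusterHandler struct{ store ClusterStore }

// names lists the cluster names visible to one organization.
func (h ClusterHandler) names(orgID string) []string {
	var names []string
	for _, c := range h.store.ListByOrg(orgID) {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	h := ClusterHandler{store: memStore{rows: []Cluster{
		{ID: "1", OrgID: "org-a", Name: "alpha"},
		{ID: "2", OrgID: "org-b", Name: "beta"},
	}}}
	fmt.Println(h.names("org-a")) // prints [alpha]
}
```

Because the handler depends only on the interface, tests can swap in a fake store, and every query path is forced to carry an organization ID.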

## Future Enhancements

Based on commented code and structure:
- Full cluster provisioning automation
- Additional cloud provider support
- Enhanced monitoring and observability
- Cluster backup and restore
- Advanced RBAC controls
- Custom resource definitions

## Resources

- **GitHub**: https://github.com/GlueOps/autoglue
- **Production API**: https://autoglue.glueopshosted.com/api/v1
- **Pre-prod API**: https://autoglue.glueopshosted.rocks/api/v1
- **Staging API**: https://autoglue.apps.nonprod.earth.onglueops.rocks/api/v1