---
name: database-optimization
description: Expert in SQL/NoSQL database performance optimization, query tuning, indexing strategies, schema design, and migrations. Use when optimizing slow queries, designing database schemas, creating indexes, or troubleshooting database performance issues.
allowed-tools: Read, Write, Edit, Grep, Bash
---

# Database Optimization Expert

## Purpose

Comprehensive database performance optimization covering query tuning, indexing strategies, schema design, connection pooling, caching, and migration management for SQL and NoSQL databases.

## When to Use

- Slow query optimization
- Index design and creation
- Schema normalization/denormalization
- Database migration planning
- Connection pool configuration
- Query plan analysis
- N+1 query problems
- Database scaling strategies

## Capabilities

### SQL Optimization

- Query performance analysis with EXPLAIN
- Index design (B-tree, Hash, GiST, GIN)
- Query rewriting for performance
- JOIN optimization
- Subquery vs JOIN analysis
- Window functions and CTEs
- Partitioning strategies

### NoSQL Optimization

- Document structure design (MongoDB)
- Key-value optimization (Redis)
- Column-family design (Cassandra)
- Graph traversal optimization (Neo4j)
- Sharding strategies
- Replication configuration

### Schema Design

- Normalization (1NF, 2NF, 3NF, BCNF)
- Strategic denormalization
- Foreign key relationships
- Composite keys
- UUID vs auto-increment IDs
- Soft deletes vs hard deletes

## SQL Query Optimization Examples

```sql
-- BEFORE: N+1 problem
SELECT * FROM users;
-- Then in application: for each user, SELECT * FROM orders WHERE user_id = ?

-- AFTER: Single query with JOIN
SELECT u.*, o.id as order_id, o.total as order_total, o.created_at as order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
ORDER BY u.id, o.created_at DESC;

-- BETTER: LATERAL join to fetch only each user's latest order
SELECT u.*, o.id as latest_order_id, o.total as latest_order_total
FROM users u
LEFT JOIN LATERAL (
  SELECT id, total, created_at
  FROM orders
  WHERE user_id = u.id
  ORDER BY created_at DESC
  LIMIT 1
) o ON true;
```

### Index Strategy

```sql
-- Analyze query plan
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123
  AND status = 'pending'
  AND created_at > '2024-01-01';

-- Create composite index (column order matters: equality columns first, range column last)
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at);

-- Covering index (INCLUDE the remaining columns so the query can be index-only)
CREATE INDEX idx_orders_covering
ON orders(user_id, status)
INCLUDE (total, created_at);

-- Partial index (for specific conditions)
CREATE INDEX idx_orders_pending
ON orders(user_id, created_at)
WHERE status = 'pending';

-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
```
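Composite indexes follow the leftmost-prefix rule: a query must filter on a leading prefix of the index columns to use the index. A minimal sketch, assuming the `orders` table and `idx_orders_user_status_created` index from above (actual plans depend on table statistics):

```sql
-- Uses the index: filters on the leftmost prefix (user_id, status)
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Uses the index fully: equality on the first two columns, range on the last
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01';

-- Usually cannot use the index efficiently: skips the leading column (user_id)
SELECT * FROM orders WHERE status = 'pending';

-- Verify: the plan should show "Index Scan using idx_orders_user_status_created"
EXPLAIN SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
```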
### Query Rewriting

```sql
-- OFTEN SLOW: OR across different columns can prevent a single index scan
SELECT * FROM users
WHERE name = 'John' OR email = 'john@example.com';

-- OFTEN FASTER: UNION ALL (branches made mutually exclusive to avoid duplicates)
SELECT * FROM users WHERE name = 'John'
UNION ALL
SELECT * FROM users WHERE email = 'john@example.com' AND name != 'John';

-- SLOW: Correlated subquery in SELECT runs once per row
SELECT u.*,
  (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;

-- FAST: JOIN with aggregation
SELECT u.*, COALESCE(o.order_count, 0) as order_count
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) as order_count
  FROM orders
  GROUP BY user_id
) o ON u.id = o.user_id;

-- SLOW (and NULL-unsafe: returns no rows if any user_id is NULL): NOT IN with subquery
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);

-- FAST: NOT EXISTS or LEFT JOIN
SELECT u.* FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- Or
SELECT u.* FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;
```

## MongoDB Optimization

```javascript
// Schema design with embedded documents
{
  _id: ObjectId("..."),
  user_id: 123,
  email: "user@example.com",
  profile: { // Embedded for 1:1 relationships
    firstName: "John",
    lastName: "Doe",
    avatar: "url"
  },
  address: [ // Embedded array for 1:few
    { street: "123 Main", city: "NYC", type: "home" },
    { street: "456 Work Ave", city: "NYC", type: "work" }
  ],
  // Reference for 1:many (use separate collection)
  order_ids: [ObjectId("..."), ObjectId("...")]
}

// Indexing strategies
db.users.createIndex({ email: 1 }, { unique: true });
db.users.createIndex({ "profile.lastName": 1, "profile.firstName": 1 });
db.orders.createIndex({ user_id: 1, created_at: -1 });

// Compound index for common queries
db.orders.createIndex({ status: 1, user_id: 1, created_at: -1 });

// Text index for search
db.products.createIndex({ name: "text", description: "text" });

// Aggregation pipeline optimization
db.orders.aggregate([
  // Match first (filter early)
  { $match: { status: "pending", created_at: { $gte: ISODate("2024-01-01") } } },
  // Lookup (join) only needed data
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user" } },
  // $lookup produces an array; unwind it before grouping on user fields
  { $unwind: "$user" },
  // Project (select only needed fields)
  { $project: { order_id: "$_id", total: 1, "user.email": 1 } },
  // Group and aggregate
  { $group: { _id: "$user.email", total_spent: { $sum: "$total" }, order_count: { $sum: 1 } } },
  // Sort
  { $sort: { total_spent: -1 } },
  // Limit
  { $limit: 10 }
]);
```

## Connection Pooling

```javascript
// PostgreSQL with pg
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  database: 'mydb',
  max: 20,                        // Max clients in pool
  idleTimeoutMillis: 30000,       // Close idle clients after 30s
  connectionTimeoutMillis: 2000,  // Timeout acquiring a connection
});

// Proper connection usage
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // Always release!
  }
}
```
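For single statements there is no need to manage clients by hand: `pg`'s pool also exposes a `query()` helper that checks out a client and releases it automatically. A minimal sketch (the `getUserSimple` name is illustrative, not from the original):

```javascript
// pool.query() acquires a client, runs the statement, and releases it internally,
// so there is no release() call to forget
async function getUserSimple(id) {
  const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return rows[0];
}
```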
```javascript
// Transaction with automatic rollback
async function transferMoney(fromId, toId, amount) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId]);
    await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId]);
    await client.query('COMMIT');
  } catch (error) {
    await client.query('ROLLBACK');
    throw error;
  } finally {
    client.release();
  }
}
```

## Caching Strategies

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Cache-aside pattern
async function getUser(id: string) {
  // Try cache first
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);

  // Cache miss - query database
  const user = await db.query('SELECT * FROM users WHERE id = $1', [id]);

  // Store in cache with TTL
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
  return user;
}

// Invalidate cache on update
async function updateUser(id: string, data: any) {
  await db.query('UPDATE users SET ... WHERE id = $1', [id]);
  await redis.del(`user:${id}`); // Invalidate cache
}

// Write-through cache
async function createUser(data: any) {
  const user = await db.query('INSERT INTO users ... RETURNING *', [data]);
  await redis.setex(`user:${user.id}`, 3600, JSON.stringify(user));
  return user;
}
```

## Migration Best Practices

```sql
-- migrations/001_create_users.up.sql
BEGIN;
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
COMMIT;

-- migrations/001_create_users.down.sql
BEGIN;
DROP TABLE IF EXISTS users CASCADE;
COMMIT;

-- Safe column addition (non-blocking)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Safe column removal (two-step)
-- Step 1: Make the column nullable and deploy code that no longer uses it
ALTER TABLE users ALTER COLUMN old_column DROP NOT NULL;
-- Step 2: Drop the column once no code path reads or writes it
ALTER TABLE users DROP COLUMN old_column;

-- Safe index creation (concurrent; cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_phone ON users(phone);

-- Safe data migration (batched)
DO $$
DECLARE
  batch_size INT := 1000;
  affected INT;
BEGIN
  LOOP
    UPDATE users SET normalized_email = LOWER(email)
    WHERE id IN (
      SELECT id FROM users
      WHERE normalized_email IS NULL
      LIMIT batch_size
    );
    GET DIAGNOSTICS affected = ROW_COUNT;
    EXIT WHEN affected = 0;
    -- Pause between batches
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;
```

## Performance Monitoring

```sql
-- PostgreSQL slow query log (reload the config for the setting to take effect)
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- Find slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and earlier the columns are total_time/mean_time/max_time)
SELECT query, calls, total_exec_time, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find tables that may be missing indexes (heavy sequential scans)
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan,
       seq_tup_read / seq_scan as avg_seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Table sizes, with index + TOAST overhead
SELECT tablename,
       pg_size_pretty(pg_total_relation_size(tablename::regclass)) as total_size,
       pg_size_pretty(pg_total_relation_size(tablename::regclass)
                      - pg_relation_size(tablename::regclass)) as index_and_toast_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(tablename::regclass) DESC;
```
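To track the cache-hit criterion below at the database level, a minimal sketch against PostgreSQL's built-in statistics views (this measures the buffer cache, not an application-side Redis cache; the 0.99 threshold is a common rule of thumb, not a hard rule):

```sql
-- Buffer cache hit ratio per database; healthy OLTP workloads typically stay above 0.99
SELECT datname,
       blks_hit::float / NULLIF(blks_hit + blks_read, 0) as cache_hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0
ORDER BY cache_hit_ratio;
```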
## Success Criteria

- ✓ Query execution time < 100ms for common queries
- ✓ Proper indexes on frequently queried columns
- ✓ No N+1 query problems
- ✓ Connection pooling configured
- ✓ Cache hit rate > 80% for cacheable data
- ✓ Database CPU < 70%
- ✓ Zero-downtime migrations
- ✓ Monitoring and alerting in place
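As a final check on the indexing criteria, a minimal sketch (assuming the same `pg_stat` views used above) for finding indexes that are never scanned and therefore add write overhead without helping reads:

```sql
-- Indexes with zero scans since the last statistics reset are removal candidates
SELECT schemaname, relname as tablename, indexrelname as index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```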