# Add 25 world-class Claude Code skills for comprehensive software development

Created a comprehensive skill collection covering modern software development end to end, with production-ready patterns, best practices, and detailed documentation for each skill.

## Skills Organized by Domain

### Code Quality & Architecture (2 skills)
- advanced-code-refactoring: SOLID principles, design patterns, refactoring patterns
- code-review: Automated/manual review, security, performance, maintainability

### API & Integration (2 skills)
- api-integration-expert: REST/GraphQL/WebSocket with auth, retry, caching
- graphql-schema-design: Schema design, resolvers, optimization, subscriptions

### Database & Data (3 skills)
- database-optimization: SQL/NoSQL tuning, indexing, query optimization
- data-pipeline: ETL/ELT with Airflow, Spark, dbt
- caching-strategies: Redis, Memcached, CDN, invalidation patterns

### Security & Authentication (2 skills)
- security-audit: OWASP Top 10, vulnerability scanning, security hardening
- auth-implementation: OAuth2, JWT, session management, SSO

### Testing & Quality (2 skills)
- test-automation: Unit/integration/E2E tests, TDD/BDD, coverage
- performance-profiling: CPU/memory profiling, Core Web Vitals optimization

### DevOps & Infrastructure (3 skills)
- docker-kubernetes: Containerization, orchestration, production deployments
- ci-cd-pipeline: GitHub Actions, automated testing, deployment strategies
- logging-monitoring: Observability with Datadog, Prometheus, Grafana, ELK

### Frontend Development (3 skills)
- frontend-accessibility: WCAG 2.1 compliance, ARIA, keyboard navigation
- ui-component-library: Design systems with React/Vue, Storybook
- mobile-responsive: Responsive design, mobile-first, PWAs

### Backend & Scaling (2 skills)
- backend-scaling: Load balancing, sharding, microservices, horizontal scaling
- real-time-systems: WebSockets, SSE, WebRTC for real-time features

### ML & AI (1 skill)
- ml-model-integration: Model serving, inference optimization, monitoring

### Development Tools (2 skills)
- git-workflow-optimizer: Git workflows, branching strategies, conflict resolution
- dependency-management: Package updates, security patches, version conflicts

### Code Maintenance (3 skills)
- error-handling: Robust error patterns, logging, graceful degradation
- documentation-generator: API docs, README, technical specifications
- migration-tools: Database/framework migrations with zero downtime

## Key Features

Each skill includes:
- YAML frontmatter with name, description, and allowed-tools
- Clear purpose and when to use
- Comprehensive capabilities overview
- Production-ready code examples
- Best practices and patterns
- Success criteria
- Tool-specific configurations

## Highlights

- 25 comprehensive skills covering full development lifecycle
- Production-ready patterns and examples
- Security-first approach throughout
- Performance optimization built-in
- Comprehensive testing strategies
- DevOps automation and infrastructure as code
- Modern frontend with accessibility focus
- Scalable backend architectures
- Data engineering and ML integration
- Advanced Git workflows

## File Structure

```
claude_skills/
├── README.md (comprehensive documentation)
├── advanced-code-refactoring/
│   ├── SKILL.md (main skill definition)
│   ├── reference.md (design patterns, SOLID principles)
│   └── examples.md (refactoring examples)
├── api-integration-expert/
│   └── SKILL.md (REST/GraphQL/WebSocket integration)
├── [23 more skills...]
```

Total: 25 skills + comprehensive README + supporting documentation

## Usage

```bash
# Personal skills
cp -r claude_skills/* ~/.claude/skills/

# Project skills
cp -r claude_skills/* .claude/skills/
```

Skills automatically activate based on context and description triggers.


---
name: data-pipeline
description: Expert in building ETL/ELT pipelines, data processing, transformation, and orchestration using tools like Airflow, Spark, and dbt. Use for data engineering tasks, building data workflows, or implementing data processing systems.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
---

# Data Pipeline Expert

## Purpose

Build robust ETL/ELT pipelines for data processing, transformation, and orchestration.

## Tools & Technologies

- Orchestration: Apache Airflow, Prefect, Dagster
- Processing: Apache Spark, dbt, Pandas
- Storage: S3, GCS, Data Lakes
- Warehouses: Snowflake, BigQuery, Redshift
- Streaming: Apache Kafka, AWS Kinesis
- Quality: Great Expectations, dbt tests

## Airflow DAG Example

```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'data-team',
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
}

with DAG(
    'user_analytics_pipeline',
    default_args=default_args,
    schedule_interval='@daily',
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=['analytics', 'users'],
) as dag:

    extract_users = PythonOperator(
        task_id='extract_users',
        python_callable=extract_from_api,
        op_kwargs={'endpoint': 'users'},
    )

    transform_data = PythonOperator(
        task_id='transform_data',
        python_callable=transform_user_data,
    )

    load_to_warehouse = PostgresOperator(
        task_id='load_to_warehouse',
        postgres_conn_id='warehouse',
        sql='sql/load_users.sql',
    )

    data_quality_check = PythonOperator(
        task_id='data_quality_check',
        python_callable=run_quality_checks,
    )

    extract_users >> transform_data >> load_to_warehouse >> data_quality_check
```
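The DAG above references three Python callables that are not defined in the skill itself. A minimal sketch of what they might look like, assuming a hypothetical source API, XCom for passing records between tasks, and illustrative checks against a `users` table:

```python
import requests
from airflow.providers.postgres.hooks.postgres import PostgresHook


def extract_from_api(endpoint, **context):
    # Hypothetical source API; swap in the real base URL and auth.
    response = requests.get(f"https://api.example.com/{endpoint}", timeout=30)
    response.raise_for_status()
    # Hand the raw records to the next task via XCom.
    context['ti'].xcom_push(key='raw_users', value=response.json())


def transform_user_data(**context):
    raw = context['ti'].xcom_pull(task_ids='extract_users', key='raw_users')
    # Normalize emails and drop records without a creation timestamp,
    # mirroring the stg_users dbt model below.
    cleaned = [
        {**row, 'email': row.get('email', '').lower()}
        for row in raw
        if row.get('created_at')
    ]
    context['ti'].xcom_push(key='clean_users', value=cleaned)


def run_quality_checks(**context):
    hook = PostgresHook(postgres_conn_id='warehouse')
    # Fail the task (which triggers the failure email) if the load
    # produced no rows or duplicate user_ids.
    row_count = hook.get_first('select count(*) from users')[0]
    dupes = hook.get_first(
        'select count(*) from (select user_id from users '
        'group by user_id having count(*) > 1) d'
    )[0]
    if row_count == 0 or dupes > 0:
        raise ValueError(f'Quality check failed: rows={row_count}, dupes={dupes}')
```

For large extracts, XCom is the wrong transport; stage the data in object storage (S3/GCS, listed above) and pass only the object path between tasks.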

## dbt Transformation

```sql
-- models/staging/stg_users.sql
with source as (
    select * from {{ source('raw', 'users') }}
),

transformed as (
    select
        id as user_id,
        lower(email) as email,
        created_at,
        updated_at,
        case
            when status = 'active' then true
            else false
        end as is_active
    from source
    where created_at is not null
)

select * from transformed
```

```sql
-- models/marts/fct_user_activity.sql
with user_events as (
    select * from {{ ref('stg_events') }}
),

aggregated as (
    select
        user_id,
        count(*) as total_events,
        count(distinct date(created_at)) as active_days,
        min(created_at) as first_event_at,
        max(created_at) as last_event_at
    from user_events
    group by 1
)

select * from aggregated
```
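dbt can also carry the data quality checks listed under Tools & Technologies. A minimal sketch of a singular test, assuming a hypothetical `tests/assert_user_id_unique.sql` file; dbt fails the test if the query returns any rows:

```sql
-- tests/assert_user_id_unique.sql (hypothetical file)
-- A singular dbt test: any returned rows count as failures.
select
    user_id,
    count(*) as occurrences
from {{ ref('stg_users') }}
group by user_id
having count(*) > 1
```

Generic tests such as `unique` and `not_null` can express the same constraint declaratively in the model's `.yml` file.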

## Success Criteria

- ✓ Data freshness < 1 hour
- ✓ Pipeline success rate > 99%
- ✓ Data quality checks passing
- ✓ Idempotent operations (see the sketch below)
- ✓ Monitoring and alerting
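Idempotency mostly comes down to how the load step writes. A minimal sketch of an upsert-style load, assuming a Postgres warehouse with a `users` table keyed on `user_id` and a placeholder DSN:

```python
import psycopg2

# Re-running the load must not duplicate rows: with the primary key on
# user_id, ON CONFLICT turns the insert into an upsert, so retries are safe.
UPSERT_SQL = """
insert into users (user_id, email, is_active, created_at, updated_at)
values (%(user_id)s, %(email)s, %(is_active)s, %(created_at)s, %(updated_at)s)
on conflict (user_id) do update
set email = excluded.email,
    is_active = excluded.is_active,
    updated_at = excluded.updated_at
"""


def load_users(rows, dsn='postgresql://warehouse'):  # placeholder DSN
    conn = psycopg2.connect(dsn)
    try:
        # The connection context manager commits on success and
        # rolls back on error, so partial loads never persist.
        with conn, conn.cursor() as cur:
            cur.executemany(UPSERT_SQL, rows)
    finally:
        conn.close()
```

The same idea applies to the PostgresOperator load above: write `sql/load_users.sql` as an upsert, or as a delete-and-insert scoped to the partition being processed.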