Add 25 world-class Claude Code skills for comprehensive software development
Created a comprehensive skill collection covering all aspects of modern software
development, with production-ready patterns, best practices, and detailed documentation.

## Skills Organized by Domain

### Code Quality & Architecture (2 skills)
- advanced-code-refactoring: SOLID principles, design patterns, refactoring patterns
- code-review: Automated/manual review, security, performance, maintainability

### API & Integration (2 skills)
- api-integration-expert: REST/GraphQL/WebSocket with auth, retry, caching
- graphql-schema-design: Schema design, resolvers, optimization, subscriptions

### Database & Data (3 skills)
- database-optimization: SQL/NoSQL tuning, indexing, query optimization
- data-pipeline: ETL/ELT with Airflow, Spark, dbt
- caching-strategies: Redis, Memcached, CDN, invalidation patterns

### Security & Authentication (2 skills)
- security-audit: OWASP Top 10, vulnerability scanning, security hardening
- auth-implementation: OAuth2, JWT, session management, SSO

### Testing & Quality (2 skills)
- test-automation: Unit/integration/E2E tests, TDD/BDD, coverage
- performance-profiling: CPU/memory profiling, Core Web Vitals optimization

### DevOps & Infrastructure (3 skills)
- docker-kubernetes: Containerization, orchestration, production deployments
- ci-cd-pipeline: GitHub Actions, automated testing, deployment strategies
- logging-monitoring: Observability with Datadog, Prometheus, Grafana, ELK

### Frontend Development (3 skills)
- frontend-accessibility: WCAG 2.1 compliance, ARIA, keyboard navigation
- ui-component-library: Design systems with React/Vue, Storybook
- mobile-responsive: Responsive design, mobile-first, PWAs

### Backend & Scaling (2 skills)
- backend-scaling: Load balancing, sharding, microservices, horizontal scaling
- real-time-systems: WebSockets, SSE, WebRTC for real-time features

### ML & AI (1 skill)
- ml-model-integration: Model serving, inference optimization, monitoring

### Development Tools (2 skills)
- git-workflow-optimizer: Git workflows, branching strategies, conflict resolution
- dependency-management: Package updates, security patches, version conflicts

### Code Maintenance (3 skills)
- error-handling: Robust error patterns, logging, graceful degradation
- documentation-generator: API docs, README, technical specifications
- migration-tools: Database/framework migrations with zero downtime

## Key Features

Each skill includes:
- YAML frontmatter with name, description, allowed tools
- Clear purpose and when to use
- Comprehensive capabilities overview
- Production-ready code examples
- Best practices and patterns
- Success criteria
- Tool-specific configurations

## Highlights

- 25 comprehensive skills covering full development lifecycle
- Production-ready patterns and examples
- Security-first approach throughout
- Performance optimization built-in
- Comprehensive testing strategies
- DevOps automation and infrastructure as code
- Modern frontend with accessibility focus
- Scalable backend architectures
- Data engineering and ML integration
- Advanced Git workflows

## File Structure

```
claude_skills/
├── README.md (comprehensive documentation)
├── advanced-code-refactoring/
│   ├── SKILL.md (main skill definition)
│   ├── reference.md (design patterns, SOLID principles)
│   └── examples.md (refactoring examples)
├── api-integration-expert/
│   └── SKILL.md (REST/GraphQL/WebSocket integration)
├── [23 more skills...]
```

Total: 25 skills + comprehensive README + supporting documentation

## Usage

Personal skills: `cp -r claude_skills/* ~/.claude/skills/`
Project skills: `cp -r claude_skills/* .claude/skills/`

Skills automatically activate based on context and description triggers.

---
name: data-pipeline
description: Expert in building ETL/ELT pipelines, data processing, transformation, and orchestration using tools like Airflow, Spark, and dbt. Use for data engineering tasks, building data workflows, or implementing data processing systems.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
---
# Data Pipeline Expert
## Purpose
Build robust ETL/ELT pipelines for data processing, transformation, and orchestration.
## Tools & Technologies
- **Orchestration**: Apache Airflow, Prefect, Dagster
- **Processing**: Apache Spark, dbt, Pandas
- **Storage**: S3, GCS, Data Lakes
- **Warehouses**: Snowflake, BigQuery, Redshift
- **Streaming**: Apache Kafka, AWS Kinesis
- **Quality**: Great Expectations, dbt tests
## Airflow DAG Example
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator


# Placeholder callables -- replace with real extract/transform/validation logic.
def extract_from_api(endpoint, **context):
    """Pull raw records from the source API endpoint."""

def transform_user_data(**context):
    """Clean and reshape the extracted user records."""

def run_quality_checks(**context):
    """Validate row counts, nulls, and freshness after the load."""


default_args = {
    'owner': 'data-team',
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
}

with DAG(
    'user_analytics_pipeline',
    default_args=default_args,
    schedule_interval='@daily',
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=['analytics', 'users'],
) as dag:
    extract_users = PythonOperator(
        task_id='extract_users',
        python_callable=extract_from_api,
        op_kwargs={'endpoint': 'users'},
    )

    transform_data = PythonOperator(
        task_id='transform_data',
        python_callable=transform_user_data,
    )

    load_to_warehouse = PostgresOperator(
        task_id='load_to_warehouse',
        postgres_conn_id='warehouse',
        sql='sql/load_users.sql',
    )

    data_quality_check = PythonOperator(
        task_id='data_quality_check',
        python_callable=run_quality_checks,
    )

    extract_users >> transform_data >> load_to_warehouse >> data_quality_check
```
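
Assuming a local Airflow environment with the `warehouse` connection configured, the DAG can be exercised end to end with `airflow dags test user_analytics_pipeline 2024-01-01` before it is deployed.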
## dbt Transformation
```sql
-- models/staging/stg_users.sql
with source as (
    select * from {{ source('raw', 'users') }}
),

transformed as (
    select
        id as user_id,
        lower(email) as email,
        created_at,
        updated_at,
        case
            when status = 'active' then true
            else false
        end as is_active
    from source
    where created_at is not null
)

select * from transformed


-- models/marts/fct_user_activity.sql
with user_events as (
    select * from {{ ref('stg_events') }}
),

aggregated as (
    select
        user_id,
        count(*) as total_events,
        count(distinct date(created_at)) as active_days,
        min(created_at) as first_event_at,
        max(created_at) as last_event_at
    from user_events
    group by 1
)

select * from aggregated
```
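
dbt tests are listed above under data quality but not shown; as a minimal sketch, a singular test (file path and column assumptions are illustrative) can assert uniqueness on the staging model by returning any duplicate rows as failures:

```sql
-- tests/assert_stg_users_unique_user_id.sql (illustrative singular test)
-- dbt treats every row returned by this query as a test failure.
select
    user_id,
    count(*) as occurrences
from {{ ref('stg_users') }}
group by user_id
having count(*) > 1
```

Running `dbt test` executes this assertion alongside any generic tests and fails the run if rows are returned.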
## Success Criteria
- ✓ Data freshness < 1 hour
- ✓ Pipeline success rate > 99%
- ✓ Data quality checks passing
- ✓ Idempotent operations
- ✓ Monitoring and alerting
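
On the idempotency criterion: a common way to meet it is to have each run replace exactly the slice of data it owns, so retries and backfills never duplicate rows. A rough sketch of a load script in that style (the `analytics`/`staging` table names are illustrative; `{{ ds }}` is Airflow's templated execution date, available to the `PostgresOperator` above):

```sql
-- Hypothetical delete-then-insert load for one day's partition; rerunning it
-- for the same {{ ds }} produces the same final state instead of extra rows.
begin;

delete from analytics.daily_users
where load_date = '{{ ds }}';

insert into analytics.daily_users (user_id, email, is_active, load_date)
select user_id, email, is_active, '{{ ds }}' as load_date
from staging.stg_users;

commit;
```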