AI / Machine Learning · 2026-02-18 · 3 min read

AI Model Deployment: A Guide to Building an MLOps Pipeline

Learn how to build an MLOps pipeline for AI model deployment, with working code examples. The guide covers a deployment strategy built on Docker, Kubernetes, and CI/CD.


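No matter how well an AI model performs, it delivers little value if it cannot be served reliably in production. Deployment and operations are growing more complex for large models such as LLMs in particular, which makes a systematic MLOps pipeline essential. This article walks through building an end-to-end MLOps pipeline for AI model deployment, with code examples.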

Designing the MLOps Pipeline Architecture

A modern MLOps pipeline should automate the full lifecycle, from data collection through model deployment and monitoring. An effective architecture consists of the following core components.

# mlops-architecture.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mlops-config
data:
  pipeline_stages: |
    - data_ingestion
    - data_validation
    - model_training
    - model_evaluation
    - model_deployment
    - monitoring
  deployment_strategy: "blue-green"
  scaling_policy: "horizontal"

The key to the architecture is keeping each stage independent and reusable. Separate the data pipeline from the model pipeline, and design each component so that it can scale independently.
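
The stage independence described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind this guide: the `Stage` type, `run_pipeline`, and the toy stage bodies are hypothetical, and only the stage names come from the ConfigMap above.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage is a named function that takes the shared context and returns an updated copy.
Stage = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def data_ingestion(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real stage would read from a data source.
    return {**ctx, "rows": [1.0, 2.0, 3.0, 4.0]}


def data_validation(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Fail fast if the ingested data is empty.
    if not ctx["rows"]:
        raise ValueError("no rows ingested")
    return {**ctx, "validated": True}


def model_training(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Toy "model": the mean of the training rows.
    rows = ctx["rows"]
    return {**ctx, "model": sum(rows) / len(rows)}


def run_pipeline(stages: List[Stage], ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Run the stages in order; each stage communicates only through the context dict."""
    completed = []
    for name, stage in stages:
        ctx = stage(ctx)
        completed.append(name)
    return {**ctx, "completed": completed}


PIPELINE: List[Stage] = [
    ("data_ingestion", data_ingestion),
    ("data_validation", data_validation),
    ("model_training", model_training),
]

result = run_pipeline(PIPELINE, {})
print(result["completed"], result["model"])
```

Because every stage depends only on the context it receives, a stage can be tested, replaced, or moved to its own worker without touching the others.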

Containerizing the Model with Docker

Containerization is essential for deploying AI models consistently. The following is an example Dockerfile for a PyTorch-based model.

# Dockerfile
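FROM python:3.9-slim AS builder

# Install build tools that are needed only to compile Python dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies into an isolated virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Final stage: start from a clean base image so the compilers are left behind
FROM python:3.9-slim AS production

# Copy only the installed dependencies from the builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"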

WORKDIR /app
COPY src/ ./src/
COPY models/ ./models/
COPY config/ ./config/

# Create a non-root user
RUN useradd -m -u 1000 mlops && chown -R mlops:mlops /app
USER mlops

EXPOSE 8000
CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000"]

Use a multi-stage build to keep the final image small, and run the container as a non-root user for security.

Implementing a Model Serving API with FastAPI

Next, we implement an API for serving the model. Its structure is built around asynchronous request handlers and a batch inference method.

# src/api/main.py
import logging
import time
from typing import List, Optional

import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AI Model API", version="1.0.0")

class PredictionRequest(BaseModel):
    text: str
    max_length: Optional[int] = 512
    temperature: Optional[float] = 0.7

class PredictionResponse(BaseModel):
    prediction: str
    confidence: float
    processing_time: float

class ModelManager:
    def __init__(self):
        self.model = None
        self.tokenizer = None
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    async def load_model(self):
        """Load the model once. torch.jit.load blocks, so this is called at startup."""
        if self.model is None:
            # Load the TorchScript model directly onto the selected device
            self.model = torch.jit.load("models/model.pt", map_location=self.device)
            self.model.eval()
            logging.info(f"Model loaded on {self.device}")

    async def predict_batch(self, requests: List[str]) -> List[dict]:
        """Run inference over a batch of inputs."""
        await self.load_model()

        with torch.no_grad():
            results = []
            for text in requests:
                # Placeholder: replace with tokenization and a real forward pass
                prediction = f"Generated text for: {text}"
                confidence = 0.95
                results.append({
                    "prediction": prediction,
                    "confidence": confidence
                })

        return results

model_manager = ModelManager()

# Note: newer FastAPI releases prefer a lifespan handler over on_event
@app.on_event("startup")
async def startup_event():
    await model_manager.load_model()

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    # perf_counter is a monotonic clock intended for measuring durations
    start_time = time.perf_counter()

    results = await model_manager.predict_batch([request.text])
    result = results[0]

    processing_time = time.perf_counter() - start_time
    
    return PredictionResponse(
        prediction=result["prediction"],
        confidence=result["confidence"],
        processing_time=processing_time
    )

@app.get("/health")
async def health_check():
    return {"status": "healthy", "model_loaded": model_manager.model is not None}
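
The `/predict` handler above passes one request at a time to `predict_batch`. One common way to get real batches out of concurrent traffic is micro-batching: requests wait briefly in a queue and are processed together. The sketch below illustrates the idea with the standard library only. `MicroBatcher`, `max_batch_size`, and `max_wait_s` are hypothetical names, and the "model" is the same placeholder string used above.

```python
import asyncio
from typing import List, Tuple


class MicroBatcher:
    """Collects concurrent requests and processes them together as one batch."""

    def __init__(self, max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: "asyncio.Queue[Tuple[str, asyncio.Future]]" = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only so the example can be inspected

    async def submit(self, text: str) -> dict:
        """Enqueue one request and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((text, future))
        return await future

    async def run(self) -> None:
        """Worker loop: wait for a first item, then fill the batch until full or timed out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            # Stand-in for a single batched forward pass over all texts in the batch.
            for text, future in batch:
                future.set_result({"prediction": f"Generated text for: {text}"})


async def demo() -> Tuple[List[dict], List[int]]:
    batcher = MicroBatcher(max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    # Four concurrent callers, as if four HTTP requests arrived at once.
    results = await asyncio.gather(*(batcher.submit(f"req-{i}") for i in range(4)))
    worker.cancel()
    return list(results), batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(len(results), batch_sizes)
```

The trade-off is latency for throughput: `max_wait_s` bounds how long a lone request waits for company, while `max_batch_size` caps memory use per forward pass.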

Configuring Autoscaling with Kubernetes

For stable service in production, we set up Kubernetes-based autoscaling.

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-model-api
  labels:
    app: ai-model-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-model-api
  template:
    metadata:
      labels:
        app: ai-model-api
    spec:
      containers:
      - name: api
        image: ai-model-api:latest
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-model-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-model-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
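
To see what the targets above mean in practice, it helps to work through the scaling rule by hand. The Kubernetes documentation gives the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default), and uses the largest result when several metrics are configured. The function below is a simplified sketch of that rule for a single metric. It ignores pod readiness and stabilization windows, and `desired_replicas` is a hypothetical helper name.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified single-metric version of the HPA scaling rule."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas / maxReplicas bounds of the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


# Values mirror the manifest above: CPU target 70%, 2 to 10 replicas.
print(desired_replicas(3, 95, 70, 2, 10))  # 3 * 95/70 = 4.07, rounded up to 5
print(desired_replicas(3, 72, 70, 2, 10))  # ratio 1.03 is within tolerance, stays 3
print(desired_replicas(3, 20, 70, 2, 10))  # ceil(0.86) = 1, clamped up to 2
```

Utilization is measured relative to each container's resource requests, so the `requests` values in the Deployment directly affect when scaling triggers.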

Setting Up the CI/CD Pipeline

We set up a fully automated CI/CD pipeline with GitHub Actions.

# .github/workflows/mlops-pipeline.yml
name: MLOps Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install pytest pytest-cov
    
    - name: Run tests
      run: |
        pytest tests/ --cov=src --cov-report=xml
    
    - name: Model validation
      run: |
        python scripts/validate_model.py

  build-and-deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Build Docker image
      run: |
        docker build -t ${{ secrets.REGISTRY_URL }}/ai-model-api:${{ github.sha }} .
    
    - name: Push to registry
      run: |
        echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login -u "${{ secrets.REGISTRY_USERNAME }}" --password-stdin ${{ secrets.REGISTRY_URL }}
        docker push ${{ secrets.REGISTRY_URL }}/ai-model-api:${{ github.sha }}
    
    # Assumes the runner already has cluster credentials configured (not shown here)
    - name: Deploy to Kubernetes
      run: |
        kubectl set image deployment/ai-model-api api=${{ secrets.REGISTRY_URL }}/ai-model-api:${{ github.sha }}
        kubectl rollout status deployment/ai-model-api
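
The workflow calls `scripts/validate_model.py`, which is not shown in this guide. As a rough idea of what such a gate might contain, the sketch below compares evaluation metrics against minimum thresholds. The metric names, threshold values, and the `failed_checks` helper are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical quality gates; real thresholds depend on the model and the task.
THRESHOLDS: Dict[str, float] = {"accuracy": 0.90, "f1": 0.85}


def failed_checks(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return a message for every metric that is missing or below its threshold."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return failures


def exit_code(failures: List[str]) -> int:
    """A non-zero exit code fails the CI step and blocks the deploy job."""
    return 1 if failures else 0


# Example: accuracy passes, f1 falls short of its threshold.
failures = failed_checks({"accuracy": 0.93, "f1": 0.81}, THRESHOLDS)
print(failures, exit_code(failures))
```

In a real script the metrics would come from an evaluation run, and the process would end with `sys.exit(exit_code(failures))` so that `build-and-deploy`, which declares `needs: test`, never starts after a failed check.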

Model Performance Monitoring and Logging

Continuously monitoring model performance in production is critical.

# src/monitoring/metrics.py
from prometheus_client import Counter, Histogram, Gauge
import logging
import time
from functools import wraps

# Prometheus metric definitions
REQUEST_COUNT = Counter('model_requests_total', 'Total model requests', ['method', 'endpoint'])
REQUEST_LATENCY = Histogram('model_request_duration_seconds', 'Request latency')
MODEL_ACCURACY = Gauge('model_accuracy', 'Current model accuracy')
GPU_MEMORY_USAGE = Gauge('gpu_memory_usage_bytes', 'GPU memory usage')

def monitor_performance(func):
    """Decorator that records request count and latency for an async handler."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        start_time = time.time()
        
        try:
            result = await func(*args, **kwargs)
            REQUEST_COUNT.labels(method='POST', endpoint='/predict').inc()
            return result
        except Exception as e:
            logging.error(f"Prediction error: {str(e)}")
            raise
        finally:
            REQUEST_LATENCY.observe(time.time() - start_time)
    
    return wrapper

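class ModelDriftDetector:
    def __init__(self, threshold=0.1, baseline_metrics=None):
        self.threshold = threshold
        # Reference values recorded when the model was validated, e.g. {"accuracy": 0.92}
        self.baseline_metrics = dict(baseline_metrics or {})

    def detect_drift(self, current_metrics):
        """Return True if any metric deviates from its baseline by more than the threshold."""
        for metric, value in current_metrics.items():
            baseline = self.baseline_metrics.get(metric)
            if not baseline:
                # No baseline (or a zero baseline): relative drift is undefined, so skip
                continue
            drift = abs(value - baseline) / abs(baseline)
            if drift > self.threshold:
                logging.warning(f"Model drift detected for {metric}: {drift:.3f}")
                return True
        return False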
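
The drift check above is a relative-change test: |current − baseline| / baseline compared against a threshold. A small worked example makes the arithmetic concrete. The metric names and values below are made up for illustration, and `relative_drift` is a hypothetical standalone helper that uses the absolute baseline so negative baselines also work.

```python
from typing import Dict


def relative_drift(baseline: float, current: float) -> float:
    """Relative change of a metric with respect to its baseline value."""
    if baseline == 0:
        raise ValueError("relative drift is undefined for a zero baseline")
    return abs(current - baseline) / abs(baseline)


THRESHOLD = 0.1  # same default as ModelDriftDetector

# Illustrative numbers only.
baseline: Dict[str, float] = {"accuracy": 0.92, "latency_p95_s": 0.40}
current: Dict[str, float] = {"accuracy": 0.80, "latency_p95_s": 0.42}

# Keep only the metrics whose relative change exceeds the threshold.
drifted = {
    name: round(relative_drift(baseline[name], current[name]), 3)
    for name in baseline
    if relative_drift(baseline[name], current[name]) > THRESHOLD
}
print(drifted)  # accuracy moved by 0.12 / 0.92 = 0.13; latency moved by only 0.05
```

Here accuracy changed by about 13% and is flagged, while the 5% latency change stays under the 10% threshold.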

A/B Testing and Canary Deployment

To roll out a new model version safely, we implement A/B testing and a canary deployment strategy.

# src/deployment/canary.py
import random
from typing import Dict, Any
import logging

class CanaryDeployment:
    def __init__(self, traffic_split: float = 0.1):
        self.traffic_split = traffic_split
        self.model_v1 = None  # current (stable) model
        self.model_v2 = None  # new (canary) model

    async def route_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Route a request to v1 or v2 according to the traffic split."""
        use_v2 = random.random() < self.traffic_split
        
        if use_v2 and self.model_v2:
            logging.info("Routing to model v2")
            result = await self.model_v2.predict(request)
            result['model_version'] = 'v2'
        else:
            logging.info("Routing to model v1")
            result = await self.model_v1.predict(request)
            result['model_version'] = 'v1'
            
        return result
    
    def update_traffic_split(self, new_split: float):
        """Adjust the traffic split ratio at runtime."""
        self.traffic_split = new_split
        logging.info(f"Traffic split updated to {new_split:.2%}")

# Canary deployment configuration using Istio (subsets v1 and v2 must be defined in a DestinationRule)
canary_config = """
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ai-model-canary
spec:
  hosts:
  - ai-model-api
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: ai-model-api
        subset: v2
  - route:
    - destination:
        host: ai-model-api
        subset: v1
      weight: 90
    - destination:
        host: ai-model-api
        subset: v2
      weight: 10
"""
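
`CanaryDeployment.route_request` draws a fresh random number for every request, so the same user can bounce between v1 and v2. For A/B testing it is often preferable that a given user always sees the same version. One way to get that, sketched below as an alternative to `random.random()`, is to hash a stable identifier into the range [0, 1). The `assign_version` function, the `user_id` argument, and the `salt` value are hypothetical.

```python
import hashlib


def assign_version(user_id: str, traffic_split: float, salt: str = "canary-v2") -> str:
    """Deterministically map a user to 'v1' or 'v2' based on a hash of their id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "v2" if bucket < traffic_split else "v1"


# The same user always gets the same answer for a fixed split and salt.
print(assign_version("user-42", 0.1) == assign_version("user-42", 0.1))

# Across many users, the share routed to v2 approximates the configured split.
users = [f"user-{i}" for i in range(10_000)]
share_v2 = sum(assign_version(u, 0.1) == "v2" for u in users) / len(users)
print(f"{share_v2:.3f}")
```

Raising `traffic_split` only moves additional users into v2; users already assigned to v2 stay there, which keeps experiment groups stable while the rollout widens.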

Security and Governance

Security and governance are essential parts of an MLOps pipeline. You need to implement model access control, data encryption, and audit logging, among other measures.

# src/security/auth.py
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt
from datetime import datetime, timedelta, timezone
from typing import Optional

security = HTTPBearer()

class SecurityManager:
    def __init__(self, secret_key: str):
        self.secret_key = secret_key
        self.algorithm = "HS256"

    def create_access_token(self, data: dict, expires_delta: Optional[timedelta] = None):
        to_encode = data.copy()
        # Use timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
        if expires_delta:
            expire = datetime.now(timezone.utc) + expires_delta
        else:
            expire = datetime.now(timezone.utc) + timedelta(minutes=15)
        
        to_encode.update({"exp": expire})
        encoded_jwt = jwt.encode(to_encode, self.secret_key, algorithm=self.algorithm)
        return encoded_jwt
    
    def verify_token(self, credentials: HTTPAuthorizationCredentials = Depends(security)):
        try:
            payload = jwt.decode(credentials.credentials, self.secret_key, algorithms=[self.algorithm])
            username: str = payload.get("sub")
            if username is None:
                raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
            return username
        except jwt.PyJWTError:
            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)

# Model access control (role-based permissions)
class ModelAccessControl:
    def __init__(self):
        self.permissions = {
            "admin": ["read", "write", "deploy"],
            "developer": ["read", "write"],
            "viewer": ["read"]
        }
    
    def check_permission(self, user_role: str, action: str) -> bool:
        return action in self.permissions.get(user_role, [])
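
The text above lists audit logging next to access control, but the listing only covers the permission lookup. The sketch below shows one way the two might be combined: every access decision is written to a dedicated logger before it is enforced. The `authorize` function, the `audit` logger name, and the log format are assumptions. The role table is the same as in `ModelAccessControl`.

```python
import logging
from typing import Dict, List

# Dedicated logger so audit records can be routed separately from application logs.
audit_logger = logging.getLogger("audit")

PERMISSIONS: Dict[str, List[str]] = {
    "admin": ["read", "write", "deploy"],
    "developer": ["read", "write"],
    "viewer": ["read"],
}


def authorize(username: str, role: str, action: str) -> None:
    """Record the access decision, then raise PermissionError if it was denied."""
    allowed = action in PERMISSIONS.get(role, [])
    audit_logger.info(
        "user=%s role=%s action=%s allowed=%s", username, role, action, allowed
    )
    if not allowed:
        raise PermissionError(f"role '{role}' may not '{action}'")


authorize("alice", "admin", "deploy")  # allowed, returns None

try:
    authorize("bob", "viewer", "deploy")
except PermissionError as exc:
    print(exc)  # role 'viewer' may not 'deploy'
```

Logging before raising means denied attempts are recorded too, which is usually the part an audit trail is most interested in.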

Conclusion

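Building an MLOps pipeline for AI model deployment goes beyond simply serving a model; it requires a comprehensive approach that manages the full lifecycle. A pipeline that covers containerization with Docker and Kubernetes, CI/CD automation, performance monitoring, and security governance lets you deliver a stable, scalable AI service. Resource management and scaling matter even more for large models such as LLMs, so apply these MLOps best practices to build an efficient AI service.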

#MLOps #AI Model Deployment #Machine Learning #Docker #Kubernetes

