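33.1.1 What Is an LLM Gateway

An LLM gateway is an intermediate layer that sits between Claude Code and the model providers and centralizes the management of model access. It acts as a proxy that handles every interaction with the LLM APIs, which gives an enterprise additional control and management capabilities.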

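Core Functions of an LLM Gateway

Centralized authentication: manage API keys and credentials for all users in one place
Usage tracking: monitor usage across teams and projects
Cost control: enforce budget limits and rate limits
Audit logging: record all model interactions to meet compliance requirements
Model routing: switch dynamically between providers and models
Load balancing: distribute requests across multiple API endpoints
Cache optimization: cache common queries to reduce cost and latency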
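
The cost-control item above mentions rate limits. As an illustration only, one common way a gateway could enforce them is a token bucket per team. The class name `TokenBucket` and its parameters are hypothetical and do not belong to any specific gateway product; the clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-team rate limiter (token bucket)."""
    capacity: float          # maximum burst size, in requests
    refill_per_sec: float    # sustained rate, in requests per second
    tokens: float = 0.0
    last_ts: float = 0.0

    def __post_init__(self):
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        elapsed = max(0.0, now - self.last_ts)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_ts = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `capacity=2` and `refill_per_sec=1`, two requests at the same instant pass, a third is rejected, and one more is admitted a second later.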

LLM Gateway Architecture

┌─────────────────────────────────────────┐
│           Claude Code clients           │
│   (multiple users, multiple sessions)   │
└─────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│               LLM gateway               │
│  (auth, routing, caching, monitoring)   │
│  ┌───────────────────────────────────┐  │
│  │ Authentication layer              │  │
│  │ (API keys, SSO, MFA)              │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │ Routing layer                     │  │
│  │ (model selection, load balancing) │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │ Cache layer                       │  │
│  │ (query cache, response cache)     │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │ Monitoring layer                  │  │
│  │ (logs, metrics, alerts)           │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│             Model providers             │
│     (Anthropic, Bedrock, Vertex AI)     │
└─────────────────────────────────────────┘
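
The layering in the diagram can be sketched as a single request path: authenticate, consult the cache, route to a provider, then record a metric. This is a toy in-memory model, not a real gateway. `MiniGateway`, the `providers` mapping from model name to callable, and the counters are all assumptions made for illustration, and no real provider API is called. The cache is checked before the upstream call so that a hit never reaches a provider.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict


class MiniGateway:
    """Toy gateway: auth -> cache -> routing -> monitoring (all in memory)."""

    def __init__(self, api_keys: Dict[str, str],
                 providers: Dict[str, Callable[[str], str]]):
        self.api_keys = api_keys        # API key -> user name
        self.providers = providers      # model name -> upstream callable
        self.cache: Dict[str, str] = {}
        self.metrics = Counter()        # simple per-event counters

    def handle(self, api_key: str, model: str, prompt: str) -> str:
        # Authentication layer: reject unknown keys.
        user = self.api_keys.get(api_key)
        if user is None:
            self.metrics["auth_failed"] += 1
            raise PermissionError("unknown API key")

        # Cache layer: identical (model, prompt) pairs share one response.
        cache_key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
        if cache_key in self.cache:
            self.metrics["cache_hit"] += 1
            return self.cache[cache_key]

        # Routing layer: pick the upstream registered for this model.
        if model not in self.providers:
            raise KeyError(f"no provider configured for model {model!r}")
        response = self.providers[model](prompt)

        # Monitoring layer: count upstream calls per user.
        self.metrics[f"upstream_calls:{user}"] += 1
        self.cache[cache_key] = response
        return response
```

Passing a stub function as the provider is enough to exercise the flow: the second identical request is served from the cache and the stub is called only once.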

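33.1.2 Advantages of an LLM Gateway

Comparison with Direct Access

Feature	Direct access	LLM gateway
Authentication	Per API key	Centrally managed
Cost tracking	Fragmented	Unified
Rate limiting	Per user	Per team/project
Audit logs	Limited	Complete
Model switching	Manual	Automatic
Caching	No	Yes
Load balancing	No	Yes
Failover	No	Yes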
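
The failover row can be illustrated with a short sketch: try each configured endpoint in order and fall back to the next one when a call raises. The function name `call_with_failover` and the endpoint callables are hypothetical stand-ins. A real gateway would typically also distinguish retryable errors from permanent ones, which this sketch does not.

```python
from typing import Callable, List, Tuple


def call_with_failover(endpoints: List[Tuple[str, Callable[[str], str]]],
                       prompt: str) -> Tuple[str, str]:
    """Try endpoints in order; return (endpoint_name, response) from the first success."""
    errors = []
    for name, send in endpoints:
        try:
            return name, send(prompt)
        except Exception as exc:  # sketch only: every error triggers a fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```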

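Enterprise Advantages

Cost optimization

Centralized billing and budget control
Smart caching that reduces repeated queries
Model selection tuned to usage patterns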
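
Centralized budget control can be sketched as a per-team ledger that refuses a request once the monthly limit would be exceeded. The class `BudgetTracker`, the team names, and the dollar amounts are invented for illustration.

```python
from collections import defaultdict
from typing import Dict


class BudgetTracker:
    """Hypothetical per-team spend ledger with a hard monthly limit (USD)."""

    def __init__(self, limits: Dict[str, float]):
        self.limits = limits
        self.spent = defaultdict(float)

    def charge(self, team: str, cost: float) -> bool:
        """Record `cost` for `team`; return False if it would exceed the limit."""
        limit = self.limits.get(team, 0.0)  # teams without a limit get no budget
        if self.spent[team] + cost > limit:
            return False
        self.spent[team] += cost
        return True

    def remaining(self, team: str) -> float:
        """Return the budget still available to `team`."""
        return self.limits.get(team, 0.0) - self.spent[team]
```

Rejecting the request before it is forwarded, rather than reconciling spend afterwards, is what turns a reporting feature into an actual limit.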

Security enhancements

Unified authentication and authorization
Complete audit trail
Data masking and filtering
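
Data masking at the gateway could look like the following: scan each outgoing prompt and replace sensitive patterns before the request leaves the network. The two regular expressions here (email addresses, and strings shaped like an `sk-` prefixed API key) are deliberately simple assumptions, not a complete detector for sensitive data.

```python
import re

# Illustrative patterns only; real deployments need a broader, tested rule set.
_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "[API_KEY]"),
]


def redact(prompt: str) -> str:
    """Replace every match of each sensitive pattern with a placeholder."""
    for pattern, placeholder in _PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt
```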

Operational efficiency

Centralized configuration management
Unified monitoring and alerting
Simplified user management

Compliance support

Detailed audit logs
Data residency controls
Access control policies
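
A detailed audit log is often written as one JSON object per line. The sketch below shows one possible record shape. The field names are assumptions, and the prompt is stored as a SHA-256 hash rather than in clear text, which is one option when the log itself must not hold sensitive content.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user: str, team: str, model: str, prompt: str,
                 status: str, now: datetime) -> str:
    """Build one JSON-lines audit entry (hypothetical field names)."""
    entry = {
        "timestamp": now.astimezone(timezone.utc).isoformat(),
        "user": user,
        "team": team,
        "model": model,
        # Hash instead of raw text, so the log does not duplicate prompt content.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
        "status": status,
    }
    return json.dumps(entry, sort_keys=True)
```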

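33.1.3 Types of LLM Gateways

1. Open-Source Gateways

LiteLLM

LiteLLM is a popular open-source LLM gateway that supports multiple providers.

Features:

Supports 100+ LLM providers
Unified API interface
Built-in caching and rate limiting
Cost tracking and budget control
Easy to deploy and configure

Suitable for:

Teams that need multi-provider support
Teams that want to deploy quickly
Teams with a limited budget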

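LangServe

LangServe is the server component of LangChain.

Features:

Deep integration with LangChain
Support for custom chains and agents
Real-time streaming responses
Flexible deployment options

Suitable for:

Teams already using the LangChain ecosystem
Workloads that need custom processing logic
Complex applications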

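2. Commercial Gateways

Azure AI Gateway

A managed gateway service from Microsoft.

Features:

Fully managed
Enterprise-grade SLA
Integration with the Azure ecosystem
Advanced security features

Suitable for:

Teams running on Azure infrastructure
Teams that want a managed service
Workloads that require high availability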

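AWS Bedrock Gateway

A gateway service from AWS.

Features:

Integration with AWS services
Native IAM support
CloudWatch monitoring
Automatic scaling

Suitable for:

Teams running on AWS infrastructure
Workloads that need AWS integration
Teams that require enterprise-grade features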

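3. Self-Built Gateways

Enterprises can also build their own LLM gateway.

Advantages:

Full control
Custom features
No vendor lock-in

Challenges:

Requires development and maintenance effort
Requires specialized expertise
Carries ongoing update costs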

33.1.4 Choosing a Gateway

Decision Factors


```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GatewayRecommendation:
    """Result of a gateway selection (minimal shape assumed by this example)."""
    gateway: str
    score: float
    reasoning: str
    alternatives: List[str]


class GatewaySelector:
    """Gateway selector."""

    # Ordinal scale shared by the cost and complexity attributes.
    _LEVELS = {'low': 1, 'medium': 2, 'high': 3}

    def __init__(self):
        self.gateways = {
            'litellm': {
                'type': 'open_source',
                'cost': 'low',
                'complexity': 'low',
                'features': ['caching', 'rate_limiting', 'cost_tracking'],
                'providers': ['anthropic', 'bedrock', 'vertex', 'openai', 'cohere']
            },
            'langserve': {
                'type': 'open_source',
                'cost': 'low',
                'complexity': 'medium',
                'features': ['streaming', 'custom_chains', 'langchain_integration'],
                'providers': ['anthropic', 'openai', 'cohere']
            },
            'azure_gateway': {
                'type': 'commercial',
                'cost': 'high',
                'complexity': 'low',
                'features': ['managed', 'sla', 'azure_integration'],
                'providers': ['anthropic', 'openai', 'azure_openai']
            },
            'aws_gateway': {
                'type': 'commercial',
                'cost': 'high',
                'complexity': 'low',
                'features': ['managed', 'iam_integration', 'cloudwatch'],
                'providers': ['anthropic', 'bedrock', 'ai21']
            },
            'custom': {
                'type': 'custom',
                'cost': 'medium',
                'complexity': 'high',
                'features': ['full_control', 'custom_features'],
                'providers': ['all']
            }
        }

    def select(self, requirements: Dict) -> GatewayRecommendation:
        """Select the best-scoring gateway for the given requirements."""
        scores = {}

        # Score every gateway
        for gateway, metadata in self.gateways.items():
            score = self._evaluate_gateway(gateway, metadata, requirements)
            scores[gateway] = score

        # Pick the highest score (ties go to the first gateway defined)
        best_gateway = max(scores, key=scores.get)

        return GatewayRecommendation(
            gateway=best_gateway,
            score=scores[best_gateway],
            reasoning=self._generate_reasoning(best_gateway, requirements),
            alternatives=self._get_alternatives(scores, best_gateway)
        )

    def _preference_score(self, actual: str, preferred: str) -> int:
        """Return 3 for an exact match, down to 1 for the opposite end of the scale."""
        actual_level = self._LEVELS.get(actual, 2)
        preferred_level = self._LEVELS.get(preferred, 2)
        return 3 - abs(actual_level - preferred_level)

    def _evaluate_gateway(self,
                          gateway: str,
                          metadata: Dict,
                          requirements: Dict) -> float:
        """Score one gateway against the requirements."""
        score = 0.0

        # Cost: the closer to the stated preference, the higher the score
        cost_preference = requirements.get('cost_preference', 'medium')
        score += self._preference_score(metadata['cost'], cost_preference)

        # Complexity: scored the same way
        complexity_preference = requirements.get('complexity_preference', 'medium')
        score += self._preference_score(metadata['complexity'], complexity_preference)

        # Feature match: fraction of required features the gateway offers
        required_features = requirements.get('required_features', [])
        if required_features:
            feature_match = len(
                set(required_features) & set(metadata['features'])
            ) / len(required_features)
        else:
            feature_match = 1.0
        score += feature_match * 2

        # Provider support ('all' means any provider can be integrated)
        required_providers = requirements.get('required_providers', [])
        if required_providers:
            if 'all' in metadata['providers']:
                provider_match = 1.0
            else:
                provider_match = len(
                    set(required_providers) & set(metadata['providers'])
                ) / len(required_providers)
            score += provider_match * 2

        return score

    def _generate_reasoning(self,
                            gateway: str,
                            requirements: Dict) -> str:
        """Explain why the gateway was selected."""
        metadata = self.gateways[gateway]

        reasons = []

        if metadata['cost'] == requirements.get('cost_preference'):
            reasons.append(f"Matches cost preference ({metadata['cost']})")

        if metadata['complexity'] == requirements.get('complexity_preference'):
            reasons.append(f"Matches complexity preference ({metadata['complexity']})")

        if 'full_control' in metadata['features']:
            reasons.append("Provides full control over functionality")

        if 'managed' in metadata['features']:
            reasons.append("Fully managed service with SLA")

        return '; '.join(reasons) if reasons else "Best overall match"

    def _get_alternatives(self,
                          scores: Dict[str, float],
                          best_gateway: str) -> List[str]:
        """Return the remaining gateways, highest score first."""
        others = [g for g in scores if g != best_gateway]
        return sorted(others, key=scores.get, reverse=True)
```

### Selection Matrix

| Requirement | LiteLLM | LangServe | Azure Gateway | AWS Gateway | Self-built |
|-------------|---------|-----------|---------------|-------------|------------|
| Low cost | ✓ | ✓ | ✗ | ✗ | ✓ |
| Fast deployment | ✓ | ✓ | ✓ | ✓ | ✗ |
| Multi-provider | ✓ | ✓ | ✗ | ✗ | ✓ |
| Fully managed | ✗ | ✗ | ✓ | ✓ | ✗ |
| Custom features | ✗ | ✓ | ✗ | ✗ | ✓ |
| Low maintenance | ✓ | ✓ | ✓ | ✓ | ✗ |
| Enterprise SLA | ✗ | ✗ | ✓ | ✓ | ✗ |
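
The matrix can also be read programmatically: encode each row as the set of gateways marked ✓ and intersect the rows for the requirements that matter. The dictionary below transcribes the table above; the function name `candidates` and the row keys are illustrative choices.

```python
from typing import Iterable, List

ALL_GATEWAYS = ["LiteLLM", "LangServe", "Azure Gateway", "AWS Gateway", "Self-built"]

# Each requirement maps to the gateways marked ✓ in the selection matrix.
MATRIX = {
    "low_cost":        {"LiteLLM", "LangServe", "Self-built"},
    "fast_deployment": {"LiteLLM", "LangServe", "Azure Gateway", "AWS Gateway"},
    "multi_provider":  {"LiteLLM", "LangServe", "Self-built"},
    "fully_managed":   {"Azure Gateway", "AWS Gateway"},
    "custom_features": {"LangServe", "Self-built"},
    "low_maintenance": {"LiteLLM", "LangServe", "Azure Gateway", "AWS Gateway"},
    "enterprise_sla":  {"Azure Gateway", "AWS Gateway"},
}


def candidates(requirements: Iterable[str]) -> List[str]:
    """Return the gateways that satisfy every listed requirement, in table order."""
    remaining = set(ALL_GATEWAYS)
    for req in requirements:
        remaining &= MATRIX[req]
    return [g for g in ALL_GATEWAYS if g in remaining]
```

For example, `candidates(["low_cost", "custom_features"])` returns `["LangServe", "Self-built"]`, while `candidates(["fully_managed", "multi_provider"])` returns an empty list, which signals that one of the requirements has to be relaxed.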

## 33.1.5 Pre-Deployment Preparation

### Requirements Assessment

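```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Minimal report types assumed by this example; adjust the fields as needed.
@dataclass
class InfrastructureNeeds:
    cpu: int = 0        # CPU cores
    memory: int = 0     # GB of RAM
    storage: int = 0    # GB
    bandwidth: int = 0  # Mbps


@dataclass
class CostEstimate:
    infrastructure_cost: float = 0.0
    api_cost: float = 0.0
    license_cost: float = 0.0
    total_cost: float = 0.0


@dataclass
class RequirementsReport:
    users: int = 0
    requests_per_day: int = 0
    providers: List[str] = field(default_factory=list)
    features: List[str] = field(default_factory=list)
    budget: float = 0.0
    sla_requirement: Optional[str] = None
    infrastructure: Optional[InfrastructureNeeds] = None
    estimated_cost: Optional[CostEstimate] = None


class GatewayRequirements:
    """Gateway requirements assessment."""

    def __init__(self):
        self.requirements = {
            'users': 0,
            'requests_per_day': 0,
            'providers': [],
            'features': [],
            'budget': 0.0,
            'sla_requirement': None
        }

    def assess(self, deployment_data: Dict) -> RequirementsReport:
        """Assess the requirements of a deployment."""
        report = RequirementsReport()

        # Number of users
        report.users = deployment_data.get('users', 10)

        # Request volume
        report.requests_per_day = deployment_data.get('requests_per_day', 1000)

        # Required providers
        report.providers = deployment_data.get('providers', ['anthropic'])

        # Required features
        report.features = deployment_data.get('features', [])

        # Budget
        report.budget = deployment_data.get('budget', 1000.0)

        # SLA requirement
        report.sla_requirement = deployment_data.get('sla_requirement', '99.9%')

        # Derive infrastructure needs
        report.infrastructure = self._calculate_infrastructure_needs(report)

        # Derive the cost estimate
        report.estimated_cost = self._estimate_cost(report)

        return report

    def _calculate_infrastructure_needs(self, report: RequirementsReport) -> InfrastructureNeeds:
        """Calculate infrastructure needs."""
        needs = InfrastructureNeeds()

        # CPU
        needs.cpu = max(2, report.users // 50)

        # Memory
        needs.memory = max(4, report.users // 25)

        # Storage
        needs.storage = max(20, report.requests_per_day // 100)

        # Network bandwidth
        needs.bandwidth = max(10, report.requests_per_day // 100)

        return needs

    def _estimate_cost(self, report: RequirementsReport) -> CostEstimate:
        """Estimate the monthly cost."""
        estimate = CostEstimate()
        # Sized earlier by _calculate_infrastructure_needs
        needs = report.infrastructure

        # Infrastructure cost
        estimate.infrastructure_cost = (
            needs.cpu * 20 +         # $20 per CPU core per month
            needs.memory * 5 +       # $5 per GB RAM per month
            needs.storage * 0.1 +    # $0.1 per GB per month
            needs.bandwidth * 10     # $10 per Mbps per month
        )

        # API cost
        estimate.api_cost = report.requests_per_day * 30 * 0.001  # $0.001 per request

        # Gateway license cost (if applicable)
        estimate.license_cost = 0.0

        # Total cost
        estimate.total_cost = (
            estimate.infrastructure_cost +
            estimate.api_cost +
            estimate.license_cost
        )

        return estimate
```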
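
To make the sizing and cost rules above concrete, here is the same arithmetic as a standalone function, applied to a made-up deployment of 200 users and 50,000 requests per day. The function name `estimate_monthly_cost` is hypothetical, and the unit prices are the ones quoted in the comments above; they should be read as placeholders, not real pricing.

```python
def estimate_monthly_cost(users: int, requests_per_day: int) -> dict:
    """Apply the sizing and pricing rules from the assessment code above."""
    cpu = max(2, users // 50)                      # cores
    memory = max(4, users // 25)                   # GB RAM
    storage = max(20, requests_per_day // 100)     # GB
    bandwidth = max(10, requests_per_day // 100)   # Mbps

    infrastructure_cost = cpu * 20 + memory * 5 + storage * 0.1 + bandwidth * 10
    api_cost = requests_per_day * 30 * 0.001       # 30 days at $0.001 per request

    return {
        "cpu": cpu,
        "memory": memory,
        "storage": storage,
        "bandwidth": bandwidth,
        "infrastructure_cost": infrastructure_cost,
        "api_cost": api_cost,
        "total_cost": infrastructure_cost + api_cost,
    }


result = estimate_monthly_cost(users=200, requests_per_day=50_000)
print(result)
```

For these inputs the rules give 4 cores, 8 GB of RAM, 500 GB of storage and 500 Mbps of bandwidth. Infrastructure comes to 80 + 40 + 50 + 5,000 = $5,170 per month, API usage to $1,500, and the total to $6,670. The bandwidth term dominates the result, so the per-Mbps placeholder is the first figure worth checking against actual pricing.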
