Building a Function-as-a-Service Internal Developer Platform with BDD and OpenFaaS

Platform Engineering

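The team adopted OpenFaaS with good intentions: reduce the complexity of backend development and let developers focus on business logic. In practice we soon ran into a different problem. Developers no longer wrote Dockerfiles, but they started writing, debugging, and maintaining increasingly complex stack.yml files. The CI/CD pipelines also grew fragile, full of shell scripts wrapping faas-cli, where any small environment difference could break a deployment. New team members faced a steep learning curve, and this "toolchain friction" cancelled out most of the agility that serverless was supposed to bring.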

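The pain point was clear: we needed an internal developer platform (IDP) that hides the underlying complexity. The goal was not to replace faas-cli but to give the 80% of common cases a "one-click" graphical release path. A developer should be able to submit function code through a web interface and let the platform handle the testing, packaging, release, and verification that follow. This idea became a quarter-long technical project for our team.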

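Initial Concept and Technology Choices

The core of this internal platform is an automated workflow. We started from a behavior-driven development (BDD) approach and defined the core functionality as a user story: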

Feature: Self-Service Function Deployment

Scenario: A developer successfully deploys a new Node.js function
  Given the developer is on the "New Function" page
  And they have selected the "node18-express" template
  When they provide a valid function name "user-profile-service"
  And they paste their function's source code into the editor
  And they click the "Deploy" button
  Then the system should initiate the deployment process
  And after a short period, they should see a "Deployment Successful" notification
  And the function "user-profile-service" should be available and responsive at its gateway endpoint

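This Gherkin scenario became the north star of our architecture. It describes the frontend interaction, the backend processing, and the verification of the final state, and it directly shaped our technology choices: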

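Frontend (UI): Shadcn UI
We did not need a heavyweight UI library that ships its own design system. We needed base components for building a clean, professional interface quickly, with full control over those components. Shadcn UI's model fits that requirement well: it is not a dependency but a set of components that you copy directly into your own codebase. This avoids the pain of component-library upgrades and makes it easy to customize components to our internal design guidelines. In a real project, this maintainability and control matter far more than an out-of-the-box theme.
Backend workflow orchestration: OpenFaaS Function
Using one OpenFaaS function to deploy another sounds somewhat recursive, but it is a pragmatic choice. The IDP backend does not need to run 24/7; it is essentially an event-triggered process. Implementing it as a serverless function (which we call deployer-function) reuses the existing OpenFaaS infrastructure and avoids the cost of maintaining another long-running service. The deployer-function serves as the platform-facing API: it receives requests from the frontend and, in the background, runs faas-cli commands or calls the OpenFaaS API directly to complete the deployment.
Test framework: Vitest
The project spans frontend and backend, so we needed a tool that can test both React components and Node.js backend logic. Vitest stood out for its speed and its tight integration with the Vite ecosystem. More importantly, we can use it to write end-to-end style tests that directly verify the BDD scenario defined above. A single test framework that covers unit, integration, and end-to-end tests greatly simplifies our development workflow.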

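Step-by-Step Implementation: The Full Path from Frontend to Backend

The project uses a pnpm monorepo structure with two main parts: portal (the React frontend application) and functions (the OpenFaaS functions).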

/faas-idp
├── apps/
│   └── portal/         # React + Shadcn UI frontend
│       ├── src/
│       ├── package.json
│       └── ...
├── functions/
│   ├── deployer/       # deployer function
│   │   ├── handler.js
│   │   ├── package.json
│   │   └── Dockerfile
│   └── stack.yml       # OpenFaaS function definitions
├── package.json
└── pnpm-workspace.yaml

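1. Frontend: Building the Deployment Form

We first initialize the portal application with create-vite, then use the shadcn-ui CLI to add the components we need: Card, Input, Textarea, Button, Label, Select, and Toast.

The core component is DeploymentForm.tsx, which collects the user's input.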

// apps/portal/src/components/DeploymentForm.tsx

import { useState } from "react";
import { Button } from "@/components/ui/button";
import { Card, CardContent, CardDescription, CardFooter, CardHeader, CardTitle } from "@/components/ui/card";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Textarea } from "@/components/ui/textarea";
import { useToast } from "@/components/ui/use-toast";

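// In a real project this address should be configurable. VITE_DEPLOYER_FUNCTION_URL is a
// hypothetical variable name; the hard-coded value is only a placeholder fallback.
const DEPLOYER_FUNCTION_URL =
  import.meta.env.VITE_DEPLOYER_FUNCTION_URL ?? "http://<your-openfaas-gateway>/function/deployer";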

interface DeploymentPayload {
  functionName: string;
  template: string;
  handlerCode: string;
}

export function DeploymentForm() {
  const [functionName, setFunctionName] = useState("");
  const [handlerCode, setHandlerCode] = useState(`module.exports = async (event, context) => {\n  return context\n    .status(200)\n    .succeed("Hello from your new function!");\n}`);
  const [isLoading, setIsLoading] = useState(false);
  const { toast } = useToast();

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!functionName || !handlerCode) {
      toast({
        title: "Validation Error",
        description: "Function name and code cannot be empty.",
        variant: "destructive",
      });
      return;
    }
    setIsLoading(true);

    const payload: DeploymentPayload = {
      functionName,
      template: "node18-express", // simplified; in practice this comes from a Select component
      handlerCode,
    };

    try {
      const response = await fetch(DEPLOYER_FUNCTION_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });

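      if (!response.ok) {
        // Try to parse the error returned by the backend; the body may not be valid JSON
        const errorData = await response.json().catch(() => ({}));
        // Include `details` so the underlying faas-cli error reaches the user
        const detail = [errorData.message, errorData.details].filter(Boolean).join(" ");
        throw new Error(detail || `Deployment failed with status: ${response.status}`);
      }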

      const result = await response.json();
      
      toast({
        title: "Deployment Successful",
        description: `Function ${result.functionName} is now available at ${result.url}`,
      });
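      setFunctionName(""); // clear the name field, ready for the next deployment

    } catch (error) {
      // Error handling here is critical: the user must see why the deployment failed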
      const errorMessage = error instanceof Error ? error.message : "An unknown error occurred.";
      console.error("Deployment failed:", errorMessage);
      toast({
        title: "Deployment Failed",
        description: errorMessage,
        variant: "destructive",
      });
    } finally {
      setIsLoading(false);
    }
  };

  return (
    <Card className="w-[650px]">
      <CardHeader>
        <CardTitle>Deploy a New Function</CardTitle>
        <CardDescription>Provide details for your new serverless function.</CardDescription>
      </CardHeader>
      <form onSubmit={handleSubmit}>
        <CardContent>
          <div className="grid w-full items-center gap-4">
            <div className="flex flex-col space-y-1.5">
              <Label htmlFor="name">Function Name</Label>
              <Input 
                id="name" 
                placeholder="e.g., user-profile-service" 
                value={functionName}
                onChange={(e) => setFunctionName(e.target.value.toLowerCase().replace(/\s/g, '-'))}
                disabled={isLoading}
              />
            </div>
            <div className="flex flex-col space-y-1.5">
              <Label htmlFor="code">Handler Code (handler.js)</Label>
              <Textarea 
                id="code" 
                placeholder="Paste your Node.js function code here"
                className="font-mono h-[300px]"
                value={handlerCode}
                onChange={(e) => setHandlerCode(e.target.value)}
                disabled={isLoading}
              />
            </div>
          </div>
        </CardContent>
        <CardFooter className="flex justify-end">
          <Button type="submit" disabled={isLoading}>
            {isLoading ? "Deploying..." : "Deploy"}
          </Button>
        </CardFooter>
      </form>
    </Card>
  );
}

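The error handling and loading-state management in this code are essential for a production-grade UI. Users must always be able to see the current system state and why an operation failed.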
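The form above only checks that the fields are non-empty, so an invalid name costs a full round trip before the user finds out. A minimal sketch of a client-side check follows. It reuses the same regular expression that the deployer handler applies to function names. The helper name `validateFunctionName` and the 63-character limit (the DNS label length limit, assumed to apply because function names end up as Kubernetes resource names) are assumptions for illustration and are not part of the original code.

```typescript
// Hypothetical helper: mirrors the name pattern checked by the deployer handler.
const FUNCTION_NAME_PATTERN = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;
const MAX_NAME_LENGTH = 63; // assumption: DNS label length limit

// Returns an error message for an invalid name, or null when the name is acceptable.
function validateFunctionName(name: string): string | null {
  if (name.length === 0) {
    return "Function name cannot be empty.";
  }
  if (name.length > MAX_NAME_LENGTH) {
    return `Function name must be at most ${MAX_NAME_LENGTH} characters.`;
  }
  if (!FUNCTION_NAME_PATTERN.test(name)) {
    return "Use lowercase letters, digits, and hyphens; start and end with a letter or digit.";
  }
  return null;
}
```

In `handleSubmit`, the result could feed the existing destructive toast, so the user gets the same kind of feedback without waiting for the backend.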

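2. `deployer-function`: Core Deployment Logic

This function is the heart of the platform. It must be granted enough privileges to run faas-cli commands. We built a custom Dockerfile for it that bundles faas-cli.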

functions/deployer/handler.js:

// functions/deployer/handler.js
'use strict'

const { exec } = require('child_process');
const fs = require('fs').promises;
const path = require('path');
const os = require('os');

// Security note: in production these values should be supplied through environment variables or OpenFaaS secrets
const OPENFAAS_GATEWAY = process.env.OPENFAAS_GATEWAY || 'http://gateway.openfaas:8080';
const TEMP_DIR_BASE = os.tmpdir();

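/**
 * Helper that runs a shell command asynchronously
 * @param {string} command - The command to execute
 * @param {import('child_process').ExecOptions} [options] - Options passed to exec, e.g. { cwd }
 * @returns {Promise<{stdout: string, stderr: string}>}
 */
const execAsync = (command, options = {}) => {
  return new Promise((resolve, reject) => {
    // Forward the options; without this, a { cwd } passed by the caller is silently ignored
    exec(command, options, (error, stdout, stderr) => {
      if (error) {
        console.error(`Exec error for command "${command}": ${stderr}`);
        // Include stderr in the error message; it is essential for debugging
        reject(new Error(`Command failed: ${stderr || error.message}`));
        return;
      }
      resolve({ stdout, stderr });
    });
  });
};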

module.exports = async (event, context) => {
  let tempDir = '';
  try {
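    const { functionName, template, handlerCode } = event.body || {};

    // 1. Validate input
    if (!functionName || !/^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/.test(functionName)) {
      return context.status(400).fail({ message: 'Invalid function name. Use lowercase letters, digits, and hyphens; it must start and end with a letter or digit.' });
    }
    if (!template || !handlerCode) {
      return context.status(400).fail({ message: 'Template and handler code are required.' });
    }
    // The template name is interpolated into shell commands below, so restrict its characters as well
    if (!/^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/.test(template)) {
      return context.status(400).fail({ message: 'Invalid template name.' });
    }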

    // 2. Create a unique temporary working directory
    tempDir = await fs.mkdtemp(path.join(TEMP_DIR_BASE, `faas-deploy-${functionName}-`));
    console.log(`Created temporary directory: ${tempDir}`);

    // 3. Use faas-cli to scaffold the function
    // Both commands run with cwd set to the temp directory so that the pulled template and the generated files stay there.
    // --gateway makes sure the CLI knows which OpenFaaS instance to talk to.
    console.log(`Pulling template: ${template}`);
    await execAsync(`faas-cli template store pull ${template}`, { cwd: tempDir });

    console.log(`Creating new function skeleton for: ${functionName}`);
    await execAsync(`faas-cli new ${functionName} --lang ${template} --gateway ${OPENFAAS_GATEWAY}`, { cwd: tempDir });
    
    const functionDirPath = path.join(tempDir, functionName);
    
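    // 4. Write the user's code to the handler file
    const handlerFilePath = path.join(functionDirPath, 'handler.js');
    await fs.writeFile(handlerFilePath, handlerCode);
    console.log(`Wrote user code to ${handlerFilePath}`);

    // 5. Deploy the function
    // The stack file was generated by `faas-cli new` inside the temp directory.
    // Note: `faas-cli deploy` expects the image to already exist in a registry; the `faas-cli build` and
    // `faas-cli push` steps shown in the sequence diagram are not included in this listing.
    const stackFilePath = path.join(tempDir, `${functionName}.yml`);
    await execAsync(`faas-cli deploy -f ${stackFilePath} --gateway ${OPENFAAS_GATEWAY}`, { cwd: tempDir });
    console.log(`Deployment command issued for ${functionName}`);

    // 6. Return a success response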
    const functionUrl = `${OPENFAAS_GATEWAY}/function/${functionName}`;
    return context
      .status(200)
      .succeed({ 
        message: 'Deployment initiated successfully.',
        functionName: functionName,
        url: functionUrl
      });

  } catch (err) {
    // Catch the failure and log it
    console.error('Deployment process failed:', err.message);
    // Return a meaningful error message to the frontend
    return context
      .status(500)
      .fail({
        message: 'Internal deployment error.',
        details: err.message
      });
  } finally {
    // 7. Clean up temporary files whether or not the deployment succeeded
    if (tempDir) {
      try {
        await fs.rm(tempDir, { recursive: true, force: true });
        console.log(`Cleaned up temporary directory: ${tempDir}`);
      } catch (cleanupErr) {
        console.error(`Failed to cleanup temp directory ${tempDir}:`, cleanupErr);
      }
    }
  }
}

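The robustness of this handler.js shows in several ways:

Detailed logging: every step writes a log line, which makes problems easier to trace.
Strict input validation: rejects invalid function names and guards against injected data.
Temporary file management: work happens in a per-request temporary directory, which avoids conflicts between concurrent requests, and the finally block guarantees cleanup so the disk does not fill up.
Clear error propagation: the stderr from exec is captured and returned to the frontend instead of a vague "server error".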
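Input validation is the only thing standing between request data and a shell in the handler above, because exec runs an interpolated command string. One way to reduce that exposure, sketched here under assumptions, is to pass each value as a separate argument through Node's `execFile`, which does not spawn a shell. The helper names `newFunctionArgs`, `deployArgs`, and `runFaasCli` are hypothetical; the faas-cli flags are the ones already used in handler.js.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Build the argument vector for `faas-cli new`.
// Each value is its own array element, so shell metacharacters are never interpreted.
function newFunctionArgs(functionName: string, template: string, gateway: string): string[] {
  return ["new", functionName, "--lang", template, "--gateway", gateway];
}

// Build the argument vector for `faas-cli deploy`.
function deployArgs(stackFilePath: string, gateway: string): string[] {
  return ["deploy", "-f", stackFilePath, "--gateway", gateway];
}

// Run faas-cli inside `cwd` and resolve with its stdout.
// execFile rejects on a non-zero exit code; the error object carries stderr.
async function runFaasCli(args: string[], cwd: string): Promise<string> {
  const { stdout } = await execFileAsync("faas-cli", args, { cwd });
  return stdout;
}
```

With this shape, a call such as `runFaasCli(newFunctionArgs(name, template, gateway), tempDir)` would replace the interpolated string. Validation is still worth keeping, since a value that starts with a hyphen could otherwise be read as a flag.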

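3. Implementing the BDD Tests: Verifying the Full Flow with Vitest

This is what ties all the pieces together. We create an end-to-end test file in the portal application that simulates user behavior and verifies how the whole system responds.

We use msw (Mock Service Worker) to intercept network requests to deployer-function, so the tests do not need a running OpenFaaS environment, which makes them faster and more reliable.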

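apps/portal/src/e2e/deployment.spec.tsx:

// apps/portal/src/e2e/deployment.spec.tsx (the .tsx extension is needed because the file contains JSX)
// Assumes jest-dom matchers such as toBeInTheDocument are registered in the Vitest setup file
import { describe, it, expect, beforeAll, afterAll, afterEach } from 'vitest';
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { App } from '@/App'; // assumes the App component renders DeploymentForm
import { rest } from 'msw'; // msw v1 API; v2 replaces `rest` with `http` and `HttpResponse`
import { setupServer } from 'msw/node';

// Mock address; it must match the URL that DeploymentForm calls in the test environment
const DEPLOYER_FUNCTION_URL = "http://localhost/function/deployer";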

// Mock a successful response
const successResponse = rest.post(DEPLOYER_FUNCTION_URL, (req, res, ctx) => {
  return res(
    ctx.status(200),
    ctx.json({
      message: 'Deployment initiated successfully.',
      functionName: 'test-function',
      url: 'http://localhost/function/test-function'
    })
  );
});

// Mock a failed response
const errorResponse = rest.post(DEPLOYER_FUNCTION_URL, (req, res, ctx) => {
  return res(
    ctx.status(500),
    ctx.json({
      message: 'Internal deployment error.',
      details: 'Command failed: faas-cli exited with code 1.'
    })
  );
});


const server = setupServer();

// Vitest lifecycle hooks
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));
afterAll(() => server.close());
afterEach(() => server.resetHandlers());


describe('Feature: Self-Service Function Deployment', () => {

  it('Scenario: A developer successfully deploys a new Node.js function', async () => {
    server.use(successResponse);
    const user = userEvent.setup();

    // Given the developer is on the "New Function" page
    render(<App />);
    expect(screen.getByRole('heading', { name: /Deploy a New Function/i })).toBeInTheDocument();
    
    // And they have selected the "node18-express" template (simplified)
    
    // When they provide a valid function name "user-profile-service"
    const nameInput = screen.getByLabelText(/Function Name/i);
    await user.type(nameInput, 'user-profile-service');

    // And they paste their function's source code into the editor
    const codeEditor = screen.getByLabelText(/Handler Code/i);
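    await user.clear(codeEditor); // remove the default code (this also focuses the textarea)
    // paste() is used instead of type() because type() treats "{" as the start of a key descriptor
    await user.paste('module.exports = async () => ({ status: "ok" });');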

    // And they click the "Deploy" button
    const deployButton = screen.getByRole('button', { name: /Deploy/i });
    await user.click(deployButton);

    // Then the system should initiate the deployment process
    expect(screen.getByRole('button', { name: /Deploying.../i })).toBeDisabled();

    // And after a short period, they should see a "Deployment Successful" notification
    await waitFor(() => {
      expect(screen.getByText(/Deployment Successful/i)).toBeInTheDocument();
    }, { timeout: 3000 });

    // And the form should be reset
    expect(nameInput).toHaveValue('');
  });

  it('Scenario: Deployment fails due to a server-side error', async () => {
    server.use(errorResponse);
    const user = userEvent.setup();

    // Given I am on the deployment page
    render(<App />);
    
    // When I fill the form and submit
    await user.type(screen.getByLabelText(/Function Name/i), 'failing-function');
    await user.click(screen.getByRole('button', { name: /Deploy/i }));

    // Then I should see a detailed error notification
    await waitFor(() => {
      expect(screen.getByText(/Deployment Failed/i)).toBeInTheDocument();
      // Check that the error details are shown; this matters a great deal for the user experience
      expect(screen.getByText(/Command failed: faas-cli exited with code 1./i)).toBeInTheDocument();
    });

    // And the deploy button should be enabled again
    expect(screen.getByRole('button', { name: /Deploy/i })).not.toBeDisabled();
  });
});

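This test file turns the BDD scenario into executable code. It verifies the UI state transitions, the API call, the success and failure flows, and whether the feedback given to the user is clear. Tests like this are the last line of defense for the project's quality.

Architecture Flow Diagram

To show the whole workflow more clearly, we can represent it as a Mermaid diagram: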

sequenceDiagram
    participant User as Developer
    participant Portal as Shadcn UI Portal
    participant Deployer as Deployer Function (OpenFaaS)
    participant OpenFaaS as OpenFaaS Gateway/Core
    participant Registry as Container Registry
    participant Cluster as Kubernetes/Container Runtime

    User->>Portal: Fills form and clicks "Deploy"
    Portal->>Deployer: POST / with JSON payload (name, code)
    activate Deployer
    Deployer->>Deployer: Creates temporary directory
    Deployer->>Deployer: Executes `faas-cli new`
    Deployer->>Deployer: Writes user code to handler.js
    Deployer->>Deployer: Executes `faas-cli build`
    Note over Deployer,Cluster: Builds container image locally
    Deployer->>Registry: Executes `faas-cli push`
    Registry-->>Deployer: Image pushed
    Deployer->>OpenFaaS: Executes `faas-cli deploy`
    activate OpenFaaS
    OpenFaaS->>Cluster: Creates/Updates Deployment & Service
    Cluster-->>OpenFaaS: Pods are running
    OpenFaaS-->>Deployer: Deployment acknowledged
    deactivate OpenFaaS
    Deployer-->>Portal: 200 OK with function URL
    deactivate Deployer
    Portal->>User: Shows "Success" Toast notification

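Open Issues and Future Iterations

This v1 of the IDP solves the core pain point, but as a pragmatic engineering solution it still has limitations and room for improvement.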

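First, having deployer-function call exec('faas-cli ...') directly is simple, but it carries security and scalability risks. A more robust approach is for deployer-function to talk to the OpenFaaS Gateway's REST API over HTTP instead of relying on the CLI. This removes the dependency on the faas-cli binary and gives finer-grained error handling.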
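As a rough illustration of that direction, the sketch below builds the HTTP request for the gateway's function-management endpoint: `POST /system/functions` creates a function and `PUT` updates an existing one, authenticated with HTTP basic auth. It assumes the image has already been built and pushed. The names `FunctionSpec`, `buildDeployRequest`, and `deployViaGateway` are hypothetical, and only a few of the request fields are shown.

```typescript
// Subset of the gateway's function deployment request body.
interface FunctionSpec {
  service: string;                  // function name
  image: string;                    // image reference, assumed to be pushed already
  envVars?: Record<string, string>;
  labels?: Record<string, string>;
}

interface GatewayRequest {
  url: string;
  init: { method: "POST" | "PUT"; headers: Record<string, string>; body: string };
}

// Pure builder: returns the URL and fetch options without performing any I/O.
function buildDeployRequest(
  gateway: string,
  spec: FunctionSpec,
  auth: { user: string; password: string },
  update = false,
): GatewayRequest {
  const base = gateway.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/system/functions`,
    init: {
      method: update ? "PUT" : "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Basic ${btoa(`${auth.user}:${auth.password}`)}`,
      },
      body: JSON.stringify(spec),
    },
  };
}

// Sends the request and surfaces the gateway's own error text on failure.
async function deployViaGateway(
  gateway: string,
  spec: FunctionSpec,
  auth: { user: string; password: string },
): Promise<void> {
  const { url, init } = buildDeployRequest(gateway, spec, auth);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Gateway responded ${res.status}: ${await res.text()}`);
  }
}
```

Keeping the builder separate from the network call makes the request shape easy to unit-test in Vitest without a running gateway.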

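Second, the current deployment process is synchronous. After sending the request, the frontend waits until deployer-function finishes. For complex functions (for example, ones that install many dependencies), this can exceed HTTP timeouts. The next iteration should move to an asynchronous workflow: after the frontend submits a request, deployer-function immediately returns a task ID and publishes a NATS JetStream or Kafka message. One or more worker functions subscribe to that message and perform the actual deployment, and the deployment status (such as BUILDING, PUSHING, DEPLOYING, READY) is reported back to the frontend in real time over WebSocket or polling.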
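The status lifecycle described above can be sketched as a small state machine. This is only an illustration under assumptions: the `QUEUED` and `FAILED` states, the `JobStore` class, and the sequential job IDs are hypothetical, and the in-memory map stands in for whatever shared storage a real deployment would need, since function replicas do not share memory.

```typescript
type DeployStatus = "QUEUED" | "BUILDING" | "PUSHING" | "DEPLOYING" | "READY" | "FAILED";

interface DeployJob {
  id: string;
  functionName: string;
  status: DeployStatus;
  error?: string;
}

// Allowed forward transitions; READY and FAILED are terminal.
const NEXT_STATUS: Partial<Record<DeployStatus, DeployStatus>> = {
  QUEUED: "BUILDING",
  BUILDING: "PUSHING",
  PUSHING: "DEPLOYING",
  DEPLOYING: "READY",
};

class JobStore {
  private jobs = new Map<string, DeployJob>();
  private counter = 0;

  // Called by the deployer when a request arrives; the ID is returned to the frontend at once.
  create(functionName: string): DeployJob {
    const job: DeployJob = { id: `job-${++this.counter}`, functionName, status: "QUEUED" };
    this.jobs.set(job.id, job);
    return job;
  }

  // Called by the frontend when polling for status.
  get(id: string): DeployJob {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`Unknown job: ${id}`);
    return job;
  }

  // Called by a worker after it completes a stage.
  advance(id: string): DeployJob {
    const job = this.get(id);
    const next = NEXT_STATUS[job.status];
    if (!next) throw new Error(`Job ${id} is already in terminal state ${job.status}`);
    job.status = next;
    return job;
  }

  // Called by a worker when a stage fails.
  fail(id: string, error: string): DeployJob {
    const job = this.get(id);
    job.status = "FAILED";
    job.error = error;
    return job;
  }
}
```

The frontend would then poll the job by ID (or receive pushes) and map each status to a progress indicator instead of holding one long HTTP request open.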

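Finally, test coverage is still incomplete. The current end-to-end tests rely on msw to mock the backend, so they cannot catch logic errors in deployer-function itself. A more complete CI pipeline should include an integration test stage that deploys deployer-function for real into a temporary, isolated OpenFaaS environment (for example, a local cluster started with kind or k3d) and runs the tests there, verifying the whole path from the UI click to the new function being deployed successfully.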
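One building block such an integration stage would likely need is a readiness check for the last step of the Gherkin scenario ("available and responsive at its gateway endpoint"). The sketch below polls a function URL until it answers with a 2xx status. The names `waitForFunctionReady` and `fetchProbe`, the default retry values, and the choice of an empty POST as the probe are assumptions; the probe and the sleep function are injectable so the logic can be tested without a cluster.

```typescript
// A probe resolves to the HTTP status code returned by the function endpoint.
type Probe = (url: string) => Promise<number>;

// Default probe: invoke the function with an empty POST via the global fetch (Node 18+).
const fetchProbe: Probe = async (url) => (await fetch(url, { method: "POST" })).status;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls `url` until a 2xx status is seen or the attempts run out.
// Resolves to true when the function became ready, false otherwise.
async function waitForFunctionReady(
  url: string,
  probe: Probe = fetchProbe,
  options: { attempts: number; delayMs: number } = { attempts: 30, delayMs: 1000 },
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<boolean> {
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      const status = await probe(url);
      if (status >= 200 && status < 300) return true;
    } catch {
      // Connection errors are expected while the function is still starting.
    }
    if (attempt < options.attempts) await sleep(options.delayMs);
  }
  return false;
}
```

An integration test could call this after submitting the form against the real deployer and assert that it resolves to true before the test timeout.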