Installing FireCrawl on Ubuntu 24

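This procedure ends with the following setup:

  1. FireCrawl (API server and workers) installed on Ubuntu 24
  2. Node.js installed system-wide, with pnpm available globally
  3. The Rust toolchain (rustup and related tools) deployed, and FireCrawl's Rust HTML transformation library (html-transformer) built
  4. Playwright's dependency packages installed and a custom Playwright microservice script created
  5. Three systemd services that start FireCrawl (server, workers, and the Playwright microservice), providing:
    • automatic startup when the operating system boots
    • manual control with commands such as sudo systemctl restart firecrowl-server / sudo systemctl restart firecrowl-workers (the unit names in this guide are spelled firecrowl-*)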
                                      (optional)
                                 +-----------------+
                                 |      Dify       |
                                 +-----------------+
                                          |
                                          | HTTP / REST API
                                          v
+------------------------------------------------------------------+
| FireCrawl (Node.js, TypeScript, pnpm)                            |
|  Directory: /home/firecrawl/apps/api                             |
|                                                                  |
|  +----------------------+      (pnpm run start)                  |
|  |   API server         |                                        |
|  | - Express            |                                        |
|  | - BullMQ Dashboard   |                                        |
|  +----------------------+                                        |
|             ^                                                    |
|             | (queued jobs)                                      |
|  +----------------------+      (pnpm run workers)                |
|  |   Workers            |                                        |
|  | - scraping, parsing, |                                        |
|  |   indexing           |                                        |
|  +----------------------+                                        |
|                                                                  |
|  - Options set in .env (API keys, PORT, HOST,                    |
|    PLAYWRIGHT_MICROSERVICE_URL, etc.)                            |
|                                                                  |
|  - Rust HTML Transformer:                                        |
|      -> built with Cargo in:                                     |
|         /home/firecrawl/apps/api/sharedLibs/html-transformer     |
|      -> produces: libhtml_transformer.so                         |
|      -> used by FireCrawl for fast HTML parsing                  |
|         (or falls back to Cheerio)                               |
+------------------------------------------------------------------+
                 |
                 | job queue / rate limiting
                 v
+----------------------------------+
|        Redis (localhost)         |
|  - used for BullMQ job queues    |
+----------------------------------+


+---------------------------------------------+
| systemd (Ubuntu 24)                         |
|                                             |
|  +--------------------------+               |
|  | firecrowl-server         |               |
|  | - ExecStart=pnpm run     |               |
|  |   start                  |               |
|  +--------------------------+               |
|  +--------------------------+               |
|  | firecrowl-workers        |               |
|  | - ExecStart=pnpm run     |               |
|  |   workers                |               |
|  +--------------------------+               |
|  +--------------------------+               |
|  | firecrowl-playwright     |               |
|  | - ExecStart=pnpm run     |               |
|  |   playwright-service     |               |
|  +--------------------------+               |
|  (starts at boot, process supervision,      |
|   log management via systemd)               |
+---------------------------------------------+

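Prerequisites:

  • FireCrawl runs directly on Ubuntu 24 (codename "noble"), without Docker
  • The OS user has administrator (sudo) privileges
  • The FireCrawl repository is assumed to be cloned into /home/firecrawl
  • Redis is installed (sudo apt install -y redis-server) and running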
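
The Redis prerequisite can be checked before going further. The following is a minimal sketch, assuming redis-cli was installed together with redis-server and that Redis listens on its default port 6379; the function name redis_is_up is hypothetical.

```shell
# Return 0 when a Redis server answers PING on the given host and port.
# Defaults (localhost:6379) are assumptions matching REDIS_URL in STEP2.3.
redis_is_up() {
  [ "$(redis-cli -h "${1:-localhost}" -p "${2:-6379}" ping 2>/dev/null)" = "PONG" ]
}
```

For example, redis_is_up && echo "Redis is reachable" prints the message only when the server replies.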

STEP1. System preparation and Node.js / pnpm installation

STEP1.1 System update and development tools

sudo apt update
sudo apt install -y build-essential pkg-config curl git libssl-dev

STEP1.2 Install Node.js from NodeSource

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

STEP1.3 Install pnpm globally

sudo npm install -g pnpm

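Note: run which pnpm to confirm the executable path (for example, /usr/local/bin/pnpm). The systemd units in STEP5 rely on this directory being listed in their PATH.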
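
It can also help to confirm that the installed Node.js is the expected major version. The sketch below assumes a minimum of 20, chosen only because STEP1.2 uses the setup_20.x script; the function names are hypothetical.

```shell
# Extract the major version from a string such as "v20.18.3".
node_major() {
  local v="${1#v}"   # drop the leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

# Return 0 when the version string ($1) meets the minimum major version ($2, default 20).
node_is_new_enough() {
  local required="${2:-20}"
  [ "$(node_major "$1")" -ge "$required" ]
}
```

A typical call is node_is_new_enough "$(node --version)" || echo "Node.js is older than expected".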


STEP2. Cloning the FireCrawl repository and installing dependencies

STEP2.1 Clone FireCrawl

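cd /home/firecrawl
# Clone into the current directory (note the trailing "."), so that the
# repository root is /home/firecrawl and apps/api sits directly below it.
# git only clones into an empty directory.
git clone https://github.com/mendableai/firecrawl.git .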

STEP2.2 Install dependencies

cd /home/firecrawl/apps/api
pnpm install

STEP2.3 Configure the .env file

Create or edit /home/firecrawl/apps/api/.env and set the required environment variables. Example:

# ===== Required ENVS ======
NUM_WORKERS_PER_QUEUE=8
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_URL=redis://localhost:6379
PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/html
USE_DB_AUTHENTICATION=false

# ===== Optional ENVS ======
TEST_API_KEY=fc-bestnet
BULL_AUTH_KEY=fc-bestnet
...(set other optional variables as needed)
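
A missing or empty required variable is easy to overlook. The following sketch lists required keys that have no value in a .env file; the function name missing_env_keys is hypothetical, and the check is a plain text match rather than a full .env parser.

```shell
# Print every key (arguments after the file name) that has no non-empty
# value in the given .env file.
missing_env_keys() {
  local env_file="$1"; shift
  local key
  for key in "$@"; do
    # A key counts as set when a line starts with KEY= followed by at least one character.
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}
```

Running missing_env_keys /home/firecrawl/apps/api/.env PORT HOST REDIS_URL REDIS_RATE_LIMIT_URL PLAYWRIGHT_MICROSERVICE_URL USE_DB_AUTHENTICATION prints nothing when all six required variables from the example are set.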

STEP3. Deploying the Rust toolchain and building the HTML Transformer

STEP3.1 Install the Rust toolchain

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
rustc --version
cargo --version

STEP3.2 Locate the Rust library directory inside FireCrawl

From the repository root, run the following to find the directory that contains Cargo.toml.

cd /home/firecrawl
find . -type f -name Cargo.toml | grep -i html-transformer

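Example: if the command prints ./apps/api/sharedLibs/html-transformer/Cargo.toml (the location shown in the architecture diagram), change into that directory. If your checkout prints a different path, use that path instead.

STEP3.3 Build the Rust library

cd /home/firecrawl/apps/api/sharedLibs/html-transformer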
cargo build --release

A successful build produces target/release/libhtml_transformer.so. Verify it:

ls target/release/libhtml_transformer.so

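STEP3.4 Deploy the library or set an environment variable

  • Option A: if FireCrawl's Node.js code loads the library through a relative path, no extra step is needed
  • Option B: otherwise, add the build directory to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/firecrawl/apps/api/sharedLibs/html-transformer/target/release:$LD_LIBRARY_PATH

To apply this automatically under systemd, add the following to the service file:
Environment=LD_LIBRARY_PATH=...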
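
Instead of editing the unit file itself, the variable can be set through a systemd drop-in file. The sketch below writes such a drop-in; the function name write_ld_dropin and the file name override.conf are illustrative choices, and the library directory is passed in rather than assumed.

```shell
# Write a drop-in that sets LD_LIBRARY_PATH for one unit.
# $1: directory holding unit files (normally /etc/systemd/system)
# $2: unit name without the .service suffix
# $3: directory that contains libhtml_transformer.so
write_ld_dropin() {
  local unit_dir="$1" unit="$2" lib_dir="$3"
  local dropin_dir="${unit_dir}/${unit}.service.d"
  mkdir -p "$dropin_dir"
  printf '[Service]\nEnvironment=LD_LIBRARY_PATH=%s\n' "$lib_dir" > "${dropin_dir}/override.conf"
}
```

When run as root against /etc/systemd/system for each FireCrawl unit, follow it with sudo systemctl daemon-reload and a restart of the unit so the new environment takes effect.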


STEP4. Playwright setup and microservice script

STEP4.1 Install Playwright's dependency packages

Example dependency libraries for Ubuntu 24.04:

sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
    libdrm2 libxkbcommon0 libgtk-3-0 libpango-1.0-0 libcairo2 libgdk-pixbuf2.0-0 \
    libgbm1 libatspi2.0-0 libx11-xcb1 libxcomposite1 libxdamage1 libxfixes3 \
    libxrandr2 libxrender1 libxtst6 libxcb1 libxi6 libxcursor1 ca-certificates \
    fonts-liberation xdg-utils

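STEP4.2 Install the Playwright browsers

Run this from the API directory as the user that will run the service (firecrawl in this guide), because Playwright stores the downloaded browsers in that user's cache directory. If the playwright package is not yet among the project's dependencies, add it first with pnpm add playwright.

cd /home/firecrawl/apps/api
pnpm exec playwright install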

STEP4.3 Create the Playwright microservice script

Create playwright-service.js in /home/firecrawl/apps/api and paste in the following sample code.

// playwright-service.js
const http = require('http');
const { chromium } = require('playwright');

const PORT = 3000;

const server = http.createServer(async (req, res) => {
  if (req.method === 'GET' && req.url.startsWith('/html')) {
    let browser;
    try {
      // Read the target page from the ?url= query parameter.
      const urlParam = new URL(req.url, `http://localhost:${PORT}`).searchParams.get('url');
      if (!urlParam) {
        res.writeHead(400, { 'Content-Type': 'text/plain' });
        return res.end('Missing ?url parameter');
      }
      // A fresh headless Chromium is launched per request: simple, but not the fastest option.
      browser = await chromium.launch({ headless: true });
      const page = await browser.newPage();
      // Wait until network activity settles so JavaScript-rendered content is included.
      await page.goto(urlParam, { waitUntil: 'networkidle' });
      const content = await page.content();
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(content);
    } catch (err) {
      console.error('Playwright microservice error:', err);
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('Playwright error occurred');
    } finally {
      // Close the browser on success and on failure so Chromium processes do not leak.
      if (browser) await browser.close().catch(() => {});
    }
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not found');
  }
});

server.listen(PORT, '0.0.0.0', () => {
  console.log(`Playwright microservice listening on http://0.0.0.0:${PORT}/html`);
});
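
Once the script is running, the endpoint can be exercised from the command line. The sketch below wraps curl so that the target URL is percent-encoded before it is placed in the query string; the function name fetch_rendered_html is hypothetical, and the default endpoint is the one used in this guide.

```shell
# Fetch the rendered HTML of a page through the microservice.
# $1: target page URL, $2: service endpoint (defaults to the URL used in this guide)
fetch_rendered_html() {
  local target="$1" endpoint="${2:-http://localhost:3000/html}"
  # -G sends the data as a query string; --data-urlencode escapes the target URL.
  curl -fsS -G --data-urlencode "url=${target}" "$endpoint"
}
```

For instance, fetch_rendered_html https://example.com | head -n 5 shows the first lines of the returned HTML; the -f flag makes curl exit non-zero on an HTTP error such as the 500 response from the error branch of the script.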

STEP4.4 Add the script to package.json

Add the following to the "scripts" section of /home/firecrawl/apps/api/package.json (merge it with the existing entries).

"playwright-service": "node playwright-service.js"

STEP5. Systemd service configuration

To start the FireCrawl server, the workers, and the Playwright microservice automatically, create a systemd unit file for each.

STEP5.1 FireCrawl server service (/etc/systemd/system/firecrowl-server.service)

[Unit]
Description=FireCrawl Server
After=network.target

[Service]
User=firecrawl
Group=firecrawl
WorkingDirectory=/home/firecrawl/apps/api
Environment=PATH=/usr/local/bin:/usr/bin:/bin
ExecStart=/bin/bash -c 'pnpm run start'
Restart=always
RestartSec=5
Type=simple

[Install]
WantedBy=multi-user.target

STEP5.2 FireCrawl workers service (/etc/systemd/system/firecrowl-workers.service)

[Unit]
Description=FireCrawl Workers
After=network.target

[Service]
User=firecrawl
Group=firecrawl
WorkingDirectory=/home/firecrawl/apps/api
Environment=PATH=/usr/local/bin:/usr/bin:/bin
ExecStart=/bin/bash -c 'pnpm run workers'
Restart=always
RestartSec=5
Type=simple

[Install]
WantedBy=multi-user.target

STEP5.3 Playwright microservice service (/etc/systemd/system/firecrowl-playwright.service)

[Unit]
Description=FireCrawl Playwright Microservice
After=network.target

[Service]
User=firecrawl
Group=firecrawl
WorkingDirectory=/home/firecrawl/apps/api
Environment=PATH=/home/firecrawl/.nvm/versions/node/v20.18.3/bin:/usr/local/bin:/usr/bin:/bin
ExecStart=/bin/bash -c 'pnpm run playwright-service'
Restart=always
RestartSec=5
Type=simple

[Install]
WantedBy=multi-user.target

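Notes:

  • User=firecrawl requires the firecrawl user to exist beforehand and to own the files, for example via sudo chown -R firecrawl:firecrawl /home/firecrawl.
  • The Environment=PATH=... line of the Playwright unit includes the path of a Node.js installed through nvm. Adjust it to your environment; with the system-wide NodeSource install from STEP1, the PATH used by the other two units is sufficient.
  • If the Rust library is not loaded through a relative path, LD_LIBRARY_PATH must also be set.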

STEP5.4 Enable and start the systemd services

sudo systemctl daemon-reload
sudo systemctl enable firecrowl-server
sudo systemctl enable firecrowl-workers
sudo systemctl enable firecrowl-playwright
sudo systemctl start firecrowl-server
sudo systemctl start firecrowl-workers
sudo systemctl start firecrowl-playwright

Use sudo systemctl status <service> to confirm that each service is in the active (running) state.
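
Checking three units one by one gets tedious, so the state of all of them can be printed in one pass. This is a small sketch built on systemctl is-active; the function name check_units is hypothetical, and the default unit names are the ones defined in STEP5.

```shell
# Print the state of each unit and return non-zero if any of them is not active.
# With no arguments, the three unit names from this guide are checked.
check_units() {
  [ "$#" -gt 0 ] || set -- firecrowl-server firecrowl-workers firecrowl-playwright
  local unit state failed=0
  for unit in "$@"; do
    # "systemctl is-active" prints the state and exits non-zero unless it is "active".
    state="$(systemctl is-active "$unit" 2>/dev/null)" || failed=1
    printf '%-22s %s\n' "$unit" "${state:-unknown}"
  done
  return "$failed"
}
```

For a unit that is not active, sudo journalctl -u followed by the unit name shows its recent log output.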


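STEP6. Restarting and verifying FireCrawl

  1. Check the .env settings:
    In particular, confirm that PLAYWRIGHT_MICROSERVICE_URL is set to http://localhost:3000/html
  2. Confirm that the FireCrawl server and workers are running:
    Messages such as "Scrape via fetch…" in the logs indicate that HTTP requests are being processed normally
  3. Confirm that the Playwright service works:
    Open http://<server IP>:3000/html?url=https://example.com in a browser; if HTML comes back, the service is working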
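
The API server itself can be smoke-tested in a similar way. The sketch below is an assumption-heavy example: the /v1/scrape path and the {"url": ...} JSON body reflect FireCrawl's REST API as commonly documented and are not taken from this guide, the port 3002 comes from the .env example in STEP2.3, and the function name scrape_via_api is hypothetical. Add an Authorization header if your configuration requires one.

```shell
# Send one scrape request to the FireCrawl API server.
# $1: page to scrape, $2: API base URL (default matches PORT=3002 from .env)
# Note: the URL is inserted into the JSON body as-is, so it must not contain double quotes.
scrape_via_api() {
  local target="$1" base="${2:-http://localhost:3002}"
  curl -fsS -X POST "${base}/v1/scrape" \
    -H 'Content-Type: application/json' \
    -d "{\"url\": \"${target}\"}"
}
```

Calling scrape_via_api https://example.com should return a JSON document if the server, workers, and Redis are all working together; a non-zero exit status from curl points to a connection problem or an HTTP error.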

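STEP7. Final checks

  • Rust library:
    Confirm that target/release/libhtml_transformer.so exists and that the logs show no errors
  • Playwright microservice:
    Test that it starts correctly and performs JavaScript rendering
  • End-to-end integration:
    Finally, check that the FireCrawl server, the workers, and Playwright work together and respond correctly to requests from Dify and other clients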
Updated on June 9, 2026

©2020 BESTNET.LLC . All Rights Reserved.