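This procedure targets the following final configuration:
- FireCrawl (API server & worker processes) installed on Ubuntu 24
- Node.js installed system-wide, with pnpm available globally
- The Rust toolchain (rustup, etc.) installed to build FireCrawl's Rust HTML transformation library (html-transformer)
- Playwright's dependencies installed and a custom Playwright microservice script created
- Three systemd services for running FireCrawl (server, workers, and the Playwright microservice) that:
  - start automatically when the OS boots
  - can be controlled manually with commands such as `sudo systemctl restart firecrowl-server` / `sudo systemctl restart firecrowl-workers`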
```text
(optional)
+-----------------+
|      Dify       |
+-----------------+
         |
         | HTTP / REST API
         v
+------------------------------------------------------------+
| FireCrawl (Node.js, TypeScript, pnpm)                      |
| Directory: /home/firecrawl/apps/api                        |
|                                                            |
| +----------------------+  (pnpm run start)                 |
| | API server           |                                   |
| | - Express            |                                   |
| | - BullMQ Dashboard   |                                   |
| +----------------------+                                   |
|            ^                                               |
|            | (queued jobs)                                 |
| +----------------------+  (pnpm run workers)               |
| | Workers              |                                   |
| | - crawling, parsing, |                                   |
| |   indexing           |                                   |
| +----------------------+                                   |
|                                                            |
| - Options set in .env (API keys, PORT, HOST,               |
|   PLAYWRIGHT_MICROSERVICE_URL, etc.)                       |
|                                                            |
| - Rust HTML Transformer:                                   |
|   -> built with Cargo in:                                  |
|      /home/firecrawl/apps/api/sharedLibs/html-transformer  |
|   -> produces: libhtml_transformer.so                      |
|   -> used by FireCrawl for fast HTML parsing               |
|      (falls back to Cheerio)                               |
+------------------------------------------------------------+
         |
         | job queue / rate limiting
         v
+----------------------------------+
| Redis (localhost)                |
| - used by BullMQ for job queues  |
+----------------------------------+

+---------------------------------------------+
| systemd (Ubuntu 24)                         |
|                                             |
|  +----------------------------+             |
|  | firecrowl-server           |             |
|  | - ExecStart=pnpm run start |             |
|  +----------------------------+             |
|  +----------------------------+             |
|  | firecrowl-workers          |             |
|  | - ExecStart=pnpm run       |             |
|  |   workers                  |             |
|  +----------------------------+             |
|  +----------------------------+             |
|  | firecrowl-playwright       |             |
|  | - ExecStart=pnpm run       |             |
|  |   playwright-service       |             |
|  +----------------------------+             |
| (auto-start at boot, process supervision,   |
|  logs managed by systemd)                   |
+---------------------------------------------+
```

Prerequisites:
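- Runs on Ubuntu 24 (codename "noble") without Docker
- The OS user can work with administrator (sudo) privileges
- The FireCrawl repository is assumed to be cloned to `/home/firecrawl`
- Redis is installed (`sudo apt install -y redis-server`) and running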
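STEP1. System preparation and installing Node.js / pnpm
STEP1.1 Update the system and install development tools
```bash
sudo apt update
sudo apt install -y build-essential pkg-config curl git libssl-dev
```
STEP1.2 Install Node.js from NodeSource
```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
```
STEP1.3 Install pnpm globally
```bash
sudo npm install -g pnpm
```
Note: check the executable path with `which pnpm` (e.g. `/usr/bin/pnpm` or `/usr/local/bin/pnpm`); the systemd units in STEP5 need this directory in their `PATH`.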
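STEP2. Cloning the FireCrawl repository and installing dependencies
STEP2.1 Clone FireCrawl
```bash
git clone https://github.com/mendableai/firecrawl.git /home/firecrawl
```
This clones the repository directly into `/home/firecrawl` (git requires the target directory to be empty or not yet exist), so that paths such as `/home/firecrawl/apps/api` used below resolve. If you instead run `git clone` from inside `/home/firecrawl`, the repository ends up in `/home/firecrawl/firecrawl` and every path in this guide must be adjusted.
STEP2.2 Install dependencies
```bash
cd /home/firecrawl/apps/api
pnpm install
```
STEP2.3 Configure the .env file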
Create or edit `/home/firecrawl/apps/api/.env` and set the required environment variables. Example:
```bash
# ===== Required ENVS ======
NUM_WORKERS_PER_QUEUE=8
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_URL=redis://localhost:6379
PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/html
USE_DB_AUTHENTICATION=false

# ===== Optional ENVS ======
TEST_API_KEY=fc-bestnet
BULL_AUTH_KEY=fc-bestnet
# ... set other options as needed
```
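STEP3. Installing the Rust toolchain and building the HTML Transformer
STEP3.1 Install the Rust toolchain
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
rustc --version
cargo --version
```
rustup installs into the current user's `~/.cargo`, so run these commands as the user who will build the library.
STEP3.2 Locate the Rust library directory inside FireCrawl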
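Run the following in the repository root to find the directory that contains `Cargo.toml`:
```bash
cd /home/firecrawl
find . -type f -name Cargo.toml | grep -i html-transformer
```
For example, if `./apps/api/sharedLibs/html-transformer/Cargo.toml` is found, change into that directory.
STEP3.3 Build the Rust library
```bash
cd /home/firecrawl/apps/api/sharedLibs/html-transformer
cargo build --release
```
A successful build produces `target/release/libhtml_transformer.so`. Verify with:
```bash
ls target/release/libhtml_transformer.so
```
STEP3.4 Placing the library or setting environment variables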
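- Option A: if FireCrawl's Node.js code loads the library through a relative path, no further work is needed.
- Option B: if needed, add the build directory to `LD_LIBRARY_PATH` (adjust the path to wherever you built the library in STEP3.3):
```bash
export LD_LIBRARY_PATH=/home/firecrawl/apps/api/sharedLibs/html-transformer/target/release:$LD_LIBRARY_PATH
```
To have systemd set it automatically, add `Environment=LD_LIBRARY_PATH=...` to the service files. systemd does not expand shell variables such as `$LD_LIBRARY_PATH` in that line, so write the full directory path.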
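STEP4. Setting up Playwright and creating the microservice script
STEP4.1 Install Playwright's dependencies
Example dependency libraries for Ubuntu 24.04:
```bash
sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
  libdrm2 libxkbcommon0 libgtk-3-0 libpango-1.0-0 libcairo2 libgdk-pixbuf2.0-0 \
  libgbm1 libatspi2.0-0 libx11-xcb1 libxcomposite1 libxdamage1 libxfixes3 \
  libxrandr2 libxrender1 libxtst6 libxcb1 libxi6 libxcursor1 \
  ca-certificates fonts-liberation xdg-utils
```
On 24.04 several of these libraries were renamed with a `t64` suffix (for example `libcups2t64`); if apt reports that a package has no installation candidate, install the renamed package instead. Playwright's own `playwright install-deps` command can also install the required system packages (it needs root privileges).
STEP4.2 Install the Playwright browsers
```bash
cd /home/firecrawl/apps/api
pnpm exec playwright install
```
Run this as the `firecrawl` user: browsers are downloaded to the invoking user's `~/.cache/ms-playwright`, which is where the service running as `firecrawl` will look for them. If `playwright` is not yet a dependency of the project, add it first with `pnpm add playwright`.
STEP4.3 Create the Playwright microservice script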
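Create `playwright-service.js` in `/home/firecrawl/apps/api` and paste in the following sample code.
```javascript
// playwright-service.js
// Minimal HTTP wrapper around Playwright:
// GET /html?url=<target> returns the fully rendered HTML of <target>.
const http = require('http');
const { chromium } = require('playwright');

const PORT = 3000;

const server = http.createServer(async (req, res) => {
  if (req.method !== 'GET' || !req.url.startsWith('/html')) {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    return res.end('Not found');
  }

  let browser;
  try {
    const urlParam = new URL(req.url, `http://localhost:${PORT}`).searchParams.get('url');
    if (!urlParam) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      return res.end('Missing ?url parameter');
    }

    // A fresh headless browser per request: simple, but slow under load.
    browser = await chromium.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto(urlParam, { waitUntil: 'networkidle' });
    const content = await page.content();

    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    res.end(content);
  } catch (err) {
    console.error('Playwright microservice error:', err);
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Playwright error occurred');
  } finally {
    // Close the browser on success and on error so Chromium processes do not leak.
    if (browser) await browser.close().catch(() => {});
  }
});

server.listen(PORT, '0.0.0.0', () => {
  console.log(`Playwright microservice listening on http://0.0.0.0:${PORT}/html`);
});
```
Note: this service only answers `GET /html?url=...`. Depending on the FireCrawl version, the API may call `PLAYWRIGHT_MICROSERVICE_URL` with a `POST` request and a JSON body instead (the repository's bundled Playwright service exposes `POST /scrape`), so check how your checkout calls this URL before relying on this script.
STEP4.4 Add a script to package.json
In `/home/firecrawl/apps/api/package.json`, add the following to `"scripts"` (merging it with the existing entries):
```json
"playwright-service": "node playwright-service.js"
```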
STEP5. Systemd service configuration
Create systemd unit files so that the FireCrawl server, the workers, and the Playwright microservice start automatically.
STEP5.1 FireCrawl server service (/etc/systemd/system/firecrowl-server.service)
```ini
[Unit]
Description=FireCrawl Server
After=network.target redis-server.service
Wants=redis-server.service

[Service]
Type=simple
User=firecrawl
Group=firecrawl
WorkingDirectory=/home/firecrawl/apps/api
Environment=PATH=/usr/local/bin:/usr/bin:/bin
ExecStart=/bin/bash -c 'pnpm run start'
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
STEP5.2 FireCrawl worker service (/etc/systemd/system/firecrowl-workers.service)
```ini
[Unit]
Description=FireCrawl Workers
After=network.target redis-server.service
Wants=redis-server.service

[Service]
Type=simple
User=firecrawl
Group=firecrawl
WorkingDirectory=/home/firecrawl/apps/api
Environment=PATH=/usr/local/bin:/usr/bin:/bin
ExecStart=/bin/bash -c 'pnpm run workers'
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
STEP5.3 Playwright microservice service (/etc/systemd/system/firecrowl-playwright.service)
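```ini
[Unit]
Description=FireCrawl Playwright Microservice
After=network.target

[Service]
Type=simple
User=firecrawl
Group=firecrawl
WorkingDirectory=/home/firecrawl/apps/api
Environment=PATH=/usr/local/bin:/usr/bin:/bin
ExecStart=/bin/bash -c 'pnpm run playwright-service'
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Notes:
- If you use `User=firecrawl`, create the `firecrawl` user beforehand and set ownership with a command such as `sudo chown -R firecrawl:firecrawl /home/firecrawl`.
- `Environment=PATH=...` must include the directories that contain `node` and `pnpm` (check with `which node` and `which pnpm`). The units above assume the NodeSource installation from STEP1; if Node.js was installed with nvm instead, point `PATH` at its bin directory, e.g. `/home/firecrawl/.nvm/versions/node/v20.18.3/bin:/usr/local/bin:/usr/bin:/bin`.
- If the Rust library is not loaded via a relative path, `LD_LIBRARY_PATH` must also be set.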
STEP5.4 Enable and start the systemd services
```bash
sudo systemctl daemon-reload
sudo systemctl enable firecrowl-server
sudo systemctl enable firecrowl-workers
sudo systemctl enable firecrowl-playwright
sudo systemctl start firecrowl-server
sudo systemctl start firecrowl-workers
sudo systemctl start firecrowl-playwright
```
Use `sudo systemctl status <service>` to confirm that each service is `active (running)`. Service logs can be followed with `journalctl -u <service> -f`.
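STEP6. Restarting and verifying FireCrawl
- Check the .env settings: in particular, confirm that `PLAYWRIGHT_MICROSERVICE_URL` is set to `http://localhost:3000/html`. The .env file is read at startup, so restart the services after editing it (`sudo systemctl restart firecrowl-server firecrowl-workers`).
- Confirm that the FireCrawl server and workers are running: log messages such as "Scrape via fetch…" indicate that HTTP requests are working.
- Check the Playwright service: open `http://<server IP>:3000/html?url=https://example.com` in a browser; if HTML comes back, it is working.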
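STEP7. Final checks
- Rust library: confirm that `target/release/libhtml_transformer.so` exists and that the logs contain no related errors.
- Playwright microservice: test that it starts correctly and performs JavaScript rendering.
- Overall integration: finally, check that the FireCrawl server, workers, and Playwright work together and respond correctly to requests from clients such as Dify.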