This procedure targets the following final configuration:
- FireCrawl (API server & workers) installed on Ubuntu 24
- Node.js installed system-wide, with pnpm used globally
- Rust toolchain (rustup, etc.) deployed, and FireCrawl's Rust-based HTML conversion library (html-transformer) built
- Playwright dependency packages installed and a custom Playwright microservice script created
- Two systemd services created for FireCrawl startup (server & workers), plus a third for the Playwright microservice (STEP 5.3):
  - Automatic startup on OS boot
  - Manual restart possible with sudo systemctl restart firecrawl-server / sudo systemctl restart firecrawl-workers, etc.
(Optional)
+-----------------+
|      Dify       |
+-----------------+
         |
         | HTTP / REST API
         v
+------------------------------------------------------------------+
| FireCrawl (Node.js, TypeScript, pnpm)                            |
| Directory: /home/firecrawl/apps/api                              |
|                                                                  |
| +----------------------+  (pnpm run start)                       |
| | API Server           |                                         |
| |  - Express           |                                         |
| |  - BullMQ Dashboard  |                                         |
| +----------------------+                                         |
|            ^                                                     |
|            | (Queue tasks)                                       |
| +----------------------+  (pnpm run workers)                     |
| | Workers              |                                         |
| |  - Scraping, parsing,|                                         |
| |    indexing          |                                         |
| +----------------------+                                         |
|                                                                  |
| - Various options configured in .env (API keys, PORT, HOST,      |
|   PLAYWRIGHT_MICROSERVICE_URL, etc.)                             |
|                                                                  |
| - Rust HTML Transformer:                                         |
|   -> Built via Cargo in:                                         |
|      /home/firecrawl/apps/api/sharedLibs/html-transformer        |
|   -> Generates: libhtml_transformer.so                           |
|   -> Used by FireCrawl for fast HTML parsing (or falls back to   |
|      Cheerio)                                                    |
+------------------------------------------------------------------+
         |
         | Task Queue / Rate Limit
         v
+----------------------------------+
| Redis (localhost)                |
|  - Used by BullMQ for job queuing|
+----------------------------------+

+---------------------------------------------+
| systemd (Ubuntu 24)                         |
|                                             |
| +----------------------------+              |
| | firecrawl-server           |              |
| | - ExecStart=pnpm run start |              |
| +----------------------------+              |
| +----------------------------+              |
| | firecrawl-workers          |              |
| | - ExecStart=pnpm run       |              |
| |   workers                  |              |
| +----------------------------+              |
| +----------------------------+              |
| | firecrawl-playwright       |              |
| | - ExecStart=pnpm run       |              |
| |   playwright-service       |              |
| +----------------------------+              |
|(Automatic start at boot, process monitoring,|
| logs managed via systemd)                   |
+---------------------------------------------+

Prerequisites:
- Ubuntu 24 (codename “noble”) running without Docker
- OS user can operate with administrator privileges
- FireCrawl repository cloned to /home/firecrawl
- Redis installed (sudo apt install -y redis-server) and running
STEP 1. System preparation and Node.js / pnpm installation #
STEP 1.1 System update and development tools installation #
    sudo apt update
    sudo apt install -y build-essential pkg-config curl git libssl-dev
STEP 1.2 Node.js installation from NodeSource #
    curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
    sudo apt-get install -y nodejs
STEP 1.3 Global pnpm installation #
    sudo npm install -g pnpm
Verify the execution path with which pnpm (e.g., /usr/local/bin/pnpm). The directory it reports must be included in the PATH set in the systemd unit files in STEP 5.
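The checks in STEP 1 can be bundled into one small helper. The following is a minimal sketch, not part of FireCrawl; the function names check_tool and check_tools are made up for illustration.

```shell
# Print where a tool resolves on PATH, or report it as missing.
check_tool() {
  local tool="$1" path
  if path="$(command -v "$tool")"; then
    echo "$tool: $path"
  else
    echo "$tool: missing"
    return 1
  fi
}

# Check several tools; return non-zero if any of them is missing.
check_tools() {
  local rc=0 tool
  for tool in "$@"; do
    check_tool "$tool" || rc=1
  done
  return "$rc"
}
```

Running check_tools node npm pnpm after STEP 1.3 should list a path for each of the three tools.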
STEP 2. FireCrawl repository cloning and dependency package installation #
STEP 2.1 FireCrawl clone #
All later steps assume the repository root is /home/firecrawl (so the API lives in /home/firecrawl/apps/api). Skip this step if the repository is already cloned there as described in the prerequisites; otherwise clone directly into that directory, which must be empty or not yet exist:
    git clone https://github.com/mendableai/firecrawl.git /home/firecrawl
STEP 2.2 Dependency package installation #
    cd /home/firecrawl/apps/api
    pnpm install
STEP 2.3 .env file configuration #
Create or edit /home/firecrawl/apps/api/.env and set the required environment variables. Example:
    # ===== Required ENVS ======
    NUM_WORKERS_PER_QUEUE=8
    PORT=3002
    HOST=0.0.0.0
    REDIS_URL=redis://localhost:6379
    REDIS_RATE_LIMIT_URL=redis://localhost:6379
    PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/html
    USE_DB_AUTHENTICATION=false
    # ===== Optional ENVS ======
    TEST_API_KEY=fc-bestnet
    BULL_AUTH_KEY=fc-bestnet
    ...(other options as needed)
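A missing or empty required variable is easy to overlook in a hand-edited .env file. The following is a minimal sketch of a check, assuming plain KEY=value lines as in the example above; the function name missing_env_keys is made up for illustration.

```shell
# Print every key that has no non-empty assignment in the given .env file.
missing_env_keys() {
  local env_file="$1" key
  shift
  for key in "$@"; do
    # Commented-out lines and empty assignments (KEY=) do not count as set.
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}
```

For example, missing_env_keys /home/firecrawl/apps/api/.env NUM_WORKERS_PER_QUEUE PORT HOST REDIS_URL REDIS_RATE_LIMIT_URL PLAYWRIGHT_MICROSERVICE_URL USE_DB_AUTHENTICATION prints nothing when all required variables are set.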
STEP 3. Rust toolchain deployment and HTML Transformer build #
STEP 3.1 Rust toolchain installation #
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source "$HOME/.cargo/env"
    rustc --version
    cargo --version
STEP 3.2 Verify Rust library directory within FireCrawl #
Run the following command in the repository root to find the directory containing Cargo.toml.
    cd /home/firecrawl
    find . -type f -name Cargo.toml | grep -i html-transformer
Example: If ./apps/api/sharedLibs/html-transformer/Cargo.toml is found, navigate to that directory.
STEP 3.3 Build Rust library #
Use the directory reported by find in STEP 3.2; the path below matches the layout shown in the diagram above.
    cd /home/firecrawl/apps/api/sharedLibs/html-transformer
    cargo build --release
On successful build, target/release/libhtml_transformer.so is generated. Verify:
    ls target/release/libhtml_transformer.so
STEP 3.4 Library placement or environment variable configuration #
- Method A: If FireCrawl's Node.js code loads the library via a relative path, no additional work is needed.
- Method B: Otherwise, add the release directory of the build in STEP 3.3 to LD_LIBRARY_PATH:
    export LD_LIBRARY_PATH=/home/firecrawl/apps/api/sharedLibs/html-transformer/target/release:$LD_LIBRARY_PATH
To configure this automatically with systemd, add Environment=LD_LIBRARY_PATH=... to the service file.
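When LD_LIBRARY_PATH is unset, the export shown in Method B leaves a trailing colon. The dynamic loader treats that empty entry as the current working directory. The following is a minimal sketch of a helper that avoids this and also avoids duplicate entries; the function name prepend_path and the variable LIB_DIR are made up for illustration.

```shell
# Prepend a directory to a colon-separated path list unless it is already present.
prepend_path() {
  local dir="$1" current="$2"
  case ":$current:" in
    *":$dir:"*) printf '%s\n' "$current" ;;       # already listed: leave unchanged
    "::")       printf '%s\n' "$dir" ;;           # list was empty: no stray colon
    *)          printf '%s\n' "$dir:$current" ;;  # otherwise put the directory first
  esac
}
```

Usage, with LIB_DIR set to the target/release directory from STEP 3.3: export LD_LIBRARY_PATH="$(prepend_path "$LIB_DIR" "${LD_LIBRARY_PATH:-}")".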
STEP 4. Playwright setup and microservice script creation #
STEP 4.1 Playwright dependency package installation #
Example dependency libraries for Ubuntu 24.04:
    sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 \
        libcups2 libdrm2 libxkbcommon0 libgtk-3-0 libpango-1.0-0 libcairo2 \
        libgdk-pixbuf2.0-0 libgbm1 libatspi2.0-0 libx11-xcb1 libxcomposite1 \
        libxdamage1 libxfixes3 libxrandr2 libxrender1 libxtst6 libxcb1 libxi6 \
        libxcursor1 ca-certificates fonts-liberation xdg-utils
STEP 4.2 Playwright browser installation #
Run this in /home/firecrawl/apps/api. It assumes the playwright package is resolvable from that directory; if pnpm exec cannot find playwright, add it first with pnpm add playwright.
    pnpm exec playwright install
STEP 4.3 Create Playwright microservice script #
Create playwright-service.js in /home/firecrawl/apps/api and paste the following example code.
    // playwright-service.js
    const http = require('http');
    const { chromium } = require('playwright');

    const PORT = 3000;

    const server = http.createServer(async (req, res) => {
      if (req.method === 'GET' && req.url.startsWith('/html')) {
        let browser;
        try {
          const urlParam = new URL(req.url, `http://localhost:${PORT}`).searchParams.get('url');
          if (!urlParam) {
            res.writeHead(400, { 'Content-Type': 'text/plain' });
            return res.end('Missing ?url parameter');
          }
          // Launch a fresh headless browser per request and render the page.
          browser = await chromium.launch({ headless: true });
          const page = await browser.newPage();
          await page.goto(urlParam, { waitUntil: 'networkidle' });
          const content = await page.content();
          res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
          res.end(content);
        } catch (err) {
          console.error('Playwright microservice error:', err);
          res.writeHead(500, { 'Content-Type': 'text/plain' });
          res.end('Playwright error occurred');
        } finally {
          // Close the browser even when navigation fails, so Chromium processes do not leak.
          if (browser) await browser.close().catch(() => {});
        }
      } else {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not found');
      }
    });

    server.listen(PORT, '0.0.0.0', () => {
      console.log(`Playwright microservice listening on http://0.0.0.0:${PORT}/html`);
    });
STEP 4.4 Add script to package.json #
Add the following to the "scripts" section in /home/firecrawl/apps/api/package.json (merge with existing items).
    "playwright-service": "node playwright-service.js"
STEP 5. Systemd service configuration #
Create systemd unit files for FireCrawl server, workers, and Playwright microservice to enable automatic startup.
STEP 5.1 FireCrawl server service (/etc/systemd/system/firecrawl-server.service) #
    [Unit]
    Description=FireCrawl Server
    After=network.target

    [Service]
    User=firecrawl
    Group=firecrawl
    WorkingDirectory=/home/firecrawl/apps/api
    Environment=PATH=/usr/local/bin:/usr/bin:/bin
    ExecStart=/bin/bash -c 'pnpm run start'
    Restart=always
    RestartSec=5
    Type=simple

    [Install]
    WantedBy=multi-user.target
STEP 5.2 FireCrawl workers service (/etc/systemd/system/firecrawl-workers.service) #
    [Unit]
    Description=FireCrawl Workers
    After=network.target

    [Service]
    User=firecrawl
    Group=firecrawl
    WorkingDirectory=/home/firecrawl/apps/api
    Environment=PATH=/usr/local/bin:/usr/bin:/bin
    ExecStart=/bin/bash -c 'pnpm run workers'
    Restart=always
    RestartSec=5
    Type=simple

    [Install]
    WantedBy=multi-user.target
STEP 5.3 Playwright microservice service (/etc/systemd/system/firecrawl-playwright.service) #
    [Unit]
    Description=FireCrawl Playwright Microservice
    After=network.target

    [Service]
    User=firecrawl
    Group=firecrawl
    WorkingDirectory=/home/firecrawl/apps/api
    Environment=PATH=/home/firecrawl/.nvm/versions/node/v20.18.3/bin:/usr/local/bin:/usr/bin:/bin
    ExecStart=/bin/bash -c 'pnpm run playwright-service'
    Restart=always
    RestartSec=5
    Type=simple

    [Install]
    WantedBy=multi-user.target
Notes:
- The unit file names must match the service names used with systemctl in STEP 5.4 (firecrawl-server, firecrawl-workers, firecrawl-playwright).
- If using User=firecrawl, create the firecrawl user beforehand and set ownership with sudo chown -R firecrawl:firecrawl /home/firecrawl, etc.
- The Environment=PATH=... line in STEP 5.3 includes an nvm-managed Node.js path as an example. With the system-wide Node.js install from STEP 1.2, the PATH used in STEP 5.1 and 5.2 should be sufficient. Modify according to your environment.
- If the Rust library is not read via relative path, LD_LIBRARY_PATH configuration is also required.
STEP 5.4 Enable and start systemd services #
    sudo systemctl daemon-reload
    sudo systemctl enable firecrawl-server
    sudo systemctl enable firecrawl-workers
    sudo systemctl enable firecrawl-playwright
    sudo systemctl start firecrawl-server
    sudo systemctl start firecrawl-workers
    sudo systemctl start firecrawl-playwright
Verify that each service shows active (running) with sudo systemctl status <service>.
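Checking three services one by one with systemctl status is repetitive. The following is a minimal sketch of a loop built on systemctl is-active; the function name report_services is made up for illustration.

```shell
# Print one line per unit with its state; return non-zero if any unit is not active.
report_services() {
  local rc=0 unit
  for unit in "$@"; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: not active"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: report_services firecrawl-server firecrawl-workers firecrawl-playwright. For any unit reported as not active, sudo journalctl -u <service> shows its log output.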
STEP 6. FireCrawl restart and verification #
- .env configuration check:
  Especially verify that PLAYWRIGHT_MICROSERVICE_URL is set to http://localhost:3000/html
- FireCrawl server and workers startup:
  If the logs (for example, sudo journalctl -u firecrawl-workers -f) show “Scrape via fetch…”, HTTP requests are working correctly
- Playwright service operation check:
  Access http://<server IP>:3000/html?url=https://example.com in a browser; if HTML is returned, it's OK
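The Playwright check can also be run from the server's own command line, without a browser. The following is a minimal sketch that assumes the microservice from STEP 4.3 is listening on port 3000; the function names looks_like_html and check_playwright are made up for illustration.

```shell
# Succeed if the text on stdin contains an opening <html tag (case-insensitive).
looks_like_html() {
  grep -i '<html' >/dev/null
}

# Fetch a page through the microservice and check that the response is HTML.
# --get with --data-urlencode appends the target as a URL-encoded ?url= parameter.
check_playwright() {
  local base_url="$1" target="$2"
  curl -fsS --max-time 60 --get --data-urlencode "url=$target" "$base_url" | looks_like_html
}
```

Usage: check_playwright http://localhost:3000/html https://example.com && echo "Playwright OK".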
STEP 7. Final verification #
- Rust library:
  Confirm that target/release/libhtml_transformer.so exists and no errors appear in logs
- Playwright microservice:
  Test that it starts correctly and JavaScript rendering works
- Overall integration:
  Final check that FireCrawl server, workers, and Playwright work together and respond correctly to requests from Dify, etc.
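The overall integration check can be approximated with one scrape request against the API server. The following is a sketch under several assumptions that the steps above do not confirm: the /v1/scrape endpoint path, the Bearer authorization header, and the "success": true field in the response. Adjust them to the FireCrawl version you installed. The function names are made up for illustration.

```shell
# Succeed if the JSON on stdin reports "success": true (assumed response shape).
scrape_succeeded() {
  grep -E '"success"[[:space:]]*:[[:space:]]*true' >/dev/null
}

# POST one scrape request to the API server (assumed endpoint: /v1/scrape).
scrape_once() {
  local api_base="$1" api_key="$2" target="$3"
  curl -sS --max-time 120 -X POST "$api_base/v1/scrape" \
    -H "Authorization: Bearer $api_key" \
    -H 'Content-Type: application/json' \
    -d "{\"url\": \"$target\"}"
}
```

Usage, with the PORT and TEST_API_KEY values from the .env example: scrape_once http://localhost:3002 fc-bestnet https://example.com | scrape_succeeded && echo "integration OK".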