Install FireCrawl on Ubuntu 24

By the end of this procedure, the following configuration is in place.

  1. FireCrawl (API server & workers) installation on Ubuntu 24
  2. Node.js installed system-wide, with pnpm used globally
  3. Rust toolchain (rustup, etc.) deployment, and building FireCrawl’s Rust-based HTML conversion library (html-transformer)
  4. Installation of Playwright dependency packages and creation of a custom Playwright microservice script
  5. Creation of three systemd services for FireCrawl startup (server, workers, and the Playwright microservice):
    • Automatic startup on OS boot
    • Manual restart with sudo systemctl restart firecrawl-server, sudo systemctl restart firecrawl-workers, and so on

                                          (Optional)
                                     +-----------------+
                                     |      Dify       |
                                     +-----------------+
                                              |
                                              | HTTP / REST API
                                              v
    +------------------------------------------------------------------+
    | FireCrawl (Node.js, TypeScript, pnpm)                            |
    |  Directory: /home/firecrawl/apps/api                              |
    |                                                                  |
    |  +----------------------+      (pnpm run start)                 |
    |  |   API Server         |-----------------------------------------+
    |  | - Express            |                                         |
    |  | - BullMQ Dashboard   |                                         |
    |  +----------------------+                                         |
    |             ^                                                    |
    |             | (Queue tasks)                                      |
    |  +----------------------+      (pnpm run workers)                |
    |  |   Workers            |-----------------------------------------+
    |  | - Scraping, parsing, |                                         |
    |  |   indexing           |                                         |
    |  +----------------------+                                         |
    |                                                                  |
    |  - Various options configured in .env (API keys, PORT, HOST,     |
    |    PLAYWRIGHT_MICROSERVICE_URL, etc.)                             |
    |                                                                  |
    |  - Rust HTML Transformer:                                        |
    |      -> Built via Cargo in:                                      |
    |         /home/firecrawl/apps/api/sharedLibs/html-transformer       |
    |      -> Generates: libhtml_transformer.so                         |
    |      -> Used by FireCrawl for fast HTML parsing (or falls back to   |
    |         Cheerio)                                                 |
    +------------------------------------------------------------------+
                                              |
                                              | Task Queue / Rate Limit
                                              v
    +----------------------------------+
    |        Redis (localhost)         |
    |  - Used by BullMQ for job queuing|
    +----------------------------------+
    
    
    +---------------------------------------------+
    | systemd (Ubuntu 24)                        |
    |                                             |
    |  +--------------------------+               |
    |  | firecrawl-server         |               |
    |  | - ExecStart=pnpm run start|               |
    |  +--------------------------+               |
    |  +--------------------------+               |
    |  | firecrawl-workers        |               |
    |  | - ExecStart=pnpm run     |               |
    |  |   workers                |               |
    |  +--------------------------+               |
    |  +--------------------------+               |
    |  | firecrawl-playwright     |               |
    |  | - ExecStart=pnpm run     |               |
    |  |   playwright-service     |               |
    |  +--------------------------+               |
    |  (Automatic start at boot, process monitoring,|
    |   logs managed via systemd)                  |
    +---------------------------------------------+
    

    Prerequisites:

    • Ubuntu 24 (codename “noble”), with FireCrawl running directly on the host rather than in Docker
    • An OS user with administrator (sudo) privileges
    • FireCrawl repository cloned to /home/firecrawl (STEP 2.1 covers the clone if this has not been done yet)
    • Redis installed (sudo apt install -y redis-server) and running
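
To confirm the Redis prerequisite, one option is to ping the server named in the REDIS_URL value that is set later in .env. The sketch below uses a hypothetical helper, redis_host_port, and assumes a plain redis://host:port URL without credentials.

```shell
# Hypothetical helper: split "redis://host[:port][/db]" into "host port".
# Assumes no user:password@ part; the port defaults to 6379.
redis_host_port() {
  local rest host port
  rest="${1#redis://}"   # drop the scheme
  rest="${rest%%/*}"     # drop an optional /db suffix
  host="${rest%%:*}"
  port=6379
  case "$rest" in
    *:*) port="${rest##*:}" ;;
  esac
  printf '%s %s\n' "$host" "$port"
}

# Ping the server when redis-cli is available; a healthy server replies PONG.
if command -v redis-cli >/dev/null 2>&1; then
  target="$(redis_host_port "redis://localhost:6379")"
  redis-cli -h "${target% *}" -p "${target#* }" ping || echo "Redis did not respond" >&2
fi
```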

    STEP 1. System preparation and Node.js / pnpm installation

    STEP 1.1 System update and development tools installation

    sudo apt update
    sudo apt install -y build-essential pkg-config curl git libssl-dev
    

    STEP 1.2 Node.js installation from NodeSource

    curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
    sudo apt-get install -y nodejs
    

    STEP 1.3 Global pnpm installation

    sudo npm install -g pnpm
    

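    Run which pnpm to confirm where pnpm was installed (for example, /usr/bin/pnpm or /usr/local/bin/pnpm). That directory must be on the PATH used by the systemd units in STEP 5.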
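
A quick sanity check for this step might look like the sketch below. The helper names version_major and meets_major are hypothetical, and the minimum major version of 20 simply mirrors the setup_20.x script used above.

```shell
# Hypothetical helpers (not part of FireCrawl): compare a tool's major version.

# Print the major component of a version string such as "v20.18.3" or "9.1.0".
version_major() {
  local v="${1#v}"          # node prints a leading "v"; strip it if present
  printf '%s\n' "${v%%.*}"  # keep everything before the first dot
}

# Succeed when the version in $1 has a major component of at least $2.
meets_major() {
  [ "$(version_major "$1")" -ge "$2" ]
}

# Report on the real tools, but only when they are actually installed.
if command -v node >/dev/null 2>&1; then
  if meets_major "$(node --version)" 20; then
    echo "node OK: $(node --version)"
  else
    echo "node is older than 20.x: $(node --version)" >&2
  fi
fi
if command -v pnpm >/dev/null 2>&1; then
  echo "pnpm found at: $(command -v pnpm)"
fi
```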


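    STEP 2. FireCrawl repository cloning and dependency package installation

    STEP 2.1 FireCrawl clone

    Skip this step if the repository is already in place as described in the prerequisites. The trailing "." clones into /home/firecrawl itself rather than into /home/firecrawl/firecrawl, and it requires the directory to be empty.

    cd /home/firecrawl
    git clone https://github.com/mendableai/firecrawl.git .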
    

    STEP 2.2 Dependency package installation

    cd /home/firecrawl/apps/api
    pnpm install
    

    STEP 2.3 .env file configuration

    Create or edit /home/firecrawl/apps/api/.env and set the required environment variables. Example:

    # ===== Required ENVS ======
    NUM_WORKERS_PER_QUEUE=8
    PORT=3002
    HOST=0.0.0.0
    REDIS_URL=redis://localhost:6379
    REDIS_RATE_LIMIT_URL=redis://localhost:6379
    PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/html
    USE_DB_AUTHENTICATION=false
    
    # ===== Optional ENVS ======
    TEST_API_KEY=fc-bestnet
    BULL_AUTH_KEY=fc-bestnet
    ...(other options as needed)
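
Before starting the services, it can help to confirm that the required keys are actually present. The following is a minimal sketch; missing_env_keys is a hypothetical helper, and it only checks for non-empty KEY=value lines without validating the values.

```shell
# Hypothetical helper: print each key from the argument list that has no
# non-empty "KEY=value" line in the given .env file.
missing_env_keys() {
  local file="$1" key
  shift
  for key in "$@"; do
    grep -Eq "^${key}=.+" "$file" || printf '%s\n' "$key"
  done
}

# Check the file from this step, if it exists on this machine.
ENV_FILE=/home/firecrawl/apps/api/.env
if [ -f "$ENV_FILE" ]; then
  missing_env_keys "$ENV_FILE" \
    NUM_WORKERS_PER_QUEUE PORT HOST REDIS_URL REDIS_RATE_LIMIT_URL \
    PLAYWRIGHT_MICROSERVICE_URL USE_DB_AUTHENTICATION
fi
```

Empty output means every listed key has a value; any key name that is printed still needs to be set.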
    

    STEP 3. Rust toolchain deployment and HTML Transformer build

    STEP 3.1 Rust toolchain installation

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source "$HOME/.cargo/env"
    rustc --version
    cargo --version
    

    STEP 3.2 Verify Rust library directory within FireCrawl

    Run the following command in the repository root to find the directory containing Cargo.toml.

    cd /home/firecrawl
    find . -type f -name Cargo.toml | grep -i html-transformer
    

    Example: if the output is ./apps/api/sharedLibs/html-transformer/Cargo.toml, the library directory is /home/firecrawl/apps/api/sharedLibs/html-transformer (the location shown in the diagram above). If the command prints a different path, use that path in the steps that follow.

    STEP 3.3 Build Rust library

    cd /home/firecrawl/apps/api/sharedLibs/html-transformer
    cargo build --release
    

    On a successful build, target/release/libhtml_transformer.so is generated. Verify:

    ls target/release/libhtml_transformer.so
    

    STEP 3.4 Library placement or environment variable configuration

    • Method A: If FireCrawl’s Node.js code loads the library via a relative path, no additional work is needed.
    • Method B: Otherwise, add the build directory to LD_LIBRARY_PATH:

    export LD_LIBRARY_PATH=/home/firecrawl/apps/api/sharedLibs/html-transformer/target/release:$LD_LIBRARY_PATH
    

    To apply this automatically under systemd, add Environment=LD_LIBRARY_PATH=... to the [Service] section of each unit file.
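
One way to add the variable without editing the unit files directly is a systemd drop-in. The sketch below only renders the drop-in text; render_ld_dropin and the file name ld-library-path.conf are hypothetical, and LIB_DIR stands for the target/release directory produced in STEP 3.3.

```shell
# Hypothetical helper: print a systemd drop-in that sets LD_LIBRARY_PATH
# to the directory given as the first argument.
render_ld_dropin() {
  printf '[Service]\nEnvironment=LD_LIBRARY_PATH=%s\n' "$1"
}

# Example installation for one unit (commented out because it needs sudo):
#   LIB_DIR=<target/release directory from STEP 3.3>
#   sudo mkdir -p /etc/systemd/system/firecrawl-server.service.d
#   render_ld_dropin "$LIB_DIR" |
#     sudo tee /etc/systemd/system/firecrawl-server.service.d/ld-library-path.conf
#   sudo systemctl daemon-reload
```

The same drop-in would be repeated for firecrawl-workers if the workers also load the library.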


    STEP 4. Playwright setup and microservice script creation

    STEP 4.1 Playwright dependency package installation

    Example dependency libraries for Ubuntu 24.04:

    sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
        libdrm2 libxkbcommon0 libgtk-3-0 libpango-1.0-0 libcairo2 libgdk-pixbuf2.0-0 \
        libgbm1 libatspi2.0-0 libx11-xcb1 libxcomposite1 libxdamage1 libxfixes3 \
        libxrandr2 libxrender1 libxtst6 libxcb1 libxi6 libxcursor1 ca-certificates \
        fonts-liberation xdg-utils
    

    STEP 4.2 Playwright browser installation

    pnpm exec playwright install
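
Playwright downloads browsers into the cache of the user who runs the install command (by default ~/.cache/ms-playwright on Linux, or the directory named by PLAYWRIGHT_BROWSERS_PATH), so the browsers need to exist for whichever account later runs the microservice. The sketch below checks for a Chromium download; has_chromium is a hypothetical helper, and the chromium-* directory pattern reflects Playwright's usual layout rather than a guaranteed contract.

```shell
# Hypothetical helper: succeed when the given Playwright cache directory
# contains at least one chromium-<revision> directory.
has_chromium() {
  local dir
  for dir in "$1"/chromium-*; do
    [ -d "$dir" ] && return 0
  done
  return 1
}

# Check the current user's cache location.
cache_dir="${PLAYWRIGHT_BROWSERS_PATH:-$HOME/.cache/ms-playwright}"
if has_chromium "$cache_dir"; then
  echo "Chromium is installed for $(id -un)"
else
  echo "No Chromium found in $cache_dir; run: pnpm exec playwright install chromium" >&2
fi
```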
    

    STEP 4.3 Create Playwright microservice script

    Create playwright-service.js in /home/firecrawl/apps/api and paste the following example code.

    // playwright-service.js
    // Minimal HTTP wrapper: GET /html?url=<target> returns the rendered HTML of <target>.
    const http = require('http');
    const { chromium } = require('playwright');
    
    const PORT = 3000;
    
    const server = http.createServer(async (req, res) => {
      if (req.method === 'GET' && req.url.startsWith('/html')) {
        let browser;
        try {
          const urlParam = new URL(req.url, `http://localhost:${PORT}`).searchParams.get('url');
          if (!urlParam) {
            res.writeHead(400, { 'Content-Type': 'text/plain' });
            return res.end('Missing ?url parameter');
          }
          // A fresh headless browser per request keeps the script simple.
          browser = await chromium.launch({ headless: true });
          const page = await browser.newPage();
          // Wait until network activity settles so client-side rendering can finish.
          await page.goto(urlParam, { waitUntil: 'networkidle' });
          const content = await page.content();
          res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
          res.end(content);
        } catch (err) {
          console.error('Playwright microservice error:', err);
          res.writeHead(500, { 'Content-Type': 'text/plain' });
          res.end('Playwright error occurred');
        } finally {
          // Close the browser even after a failed navigation so Chromium processes do not leak.
          if (browser) await browser.close().catch(() => {});
        }
      } else {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not found');
      }
    });
    
    server.listen(PORT, '0.0.0.0', () => {
      console.log(`Playwright microservice listening on http://0.0.0.0:${PORT}/html`);
    });
    

    STEP 4.4 Add script to package.json

    Add the following to the "scripts" section in /home/firecrawl/apps/api/package.json (merge with existing items).

    "playwright-service": "node playwright-service.js"
    

    STEP 5. Systemd service configuration

    Create systemd unit files for the FireCrawl server, workers, and Playwright microservice to enable automatic startup.

    STEP 5.1 FireCrawl server service (/etc/systemd/system/firecrawl-server.service)

    [Unit]
    Description=FireCrawl Server
    After=network.target
    
    [Service]
    User=firecrawl
    Group=firecrawl
    WorkingDirectory=/home/firecrawl/apps/api
    Environment=PATH=/usr/local/bin:/usr/bin:/bin
    ExecStart=/bin/bash -c 'pnpm run start'
    Restart=always
    RestartSec=5
    Type=simple
    
    [Install]
    WantedBy=multi-user.target
    

    STEP 5.2 FireCrawl workers service (/etc/systemd/system/firecrawl-workers.service)

    [Unit]
    Description=FireCrawl Workers
    After=network.target
    
    [Service]
    User=firecrawl
    Group=firecrawl
    WorkingDirectory=/home/firecrawl/apps/api
    Environment=PATH=/usr/local/bin:/usr/bin:/bin
    ExecStart=/bin/bash -c 'pnpm run workers'
    Restart=always
    RestartSec=5
    Type=simple
    
    [Install]
    WantedBy=multi-user.target
    

    STEP 5.3 Playwright microservice service (/etc/systemd/system/firecrawl-playwright.service)

    [Unit]
    Description=FireCrawl Playwright Microservice
    After=network.target
    
    [Service]
    User=firecrawl
    Group=firecrawl
    WorkingDirectory=/home/firecrawl/apps/api
    Environment=PATH=/home/firecrawl/.nvm/versions/node/v20.18.3/bin:/usr/local/bin:/usr/bin:/bin
    ExecStart=/bin/bash -c 'pnpm run playwright-service'
    Restart=always
    RestartSec=5
    Type=simple
    
    [Install]
    WantedBy=multi-user.target
    

    Notes:

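    • If using User=firecrawl, create the firecrawl user beforehand and give it ownership of the repository, for example with sudo chown -R firecrawl:firecrawl /home/firecrawl.
    • Environment=PATH=... must include the directory that contains node and pnpm. With the system-wide NodeSource install from STEP 1, /usr/local/bin:/usr/bin:/bin is sufficient; the nvm entry in the Playwright unit (/home/firecrawl/.nvm/versions/node/v20.18.3/bin) matters only if Node.js was installed via nvm, so adjust or remove it to match your environment.
    • If the Rust library is not loaded via a relative path, also set LD_LIBRARY_PATH as described in STEP 3.4.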

    STEP 5.4 Enable and start systemd services

    sudo systemctl daemon-reload
    sudo systemctl enable firecrawl-server
    sudo systemctl enable firecrawl-workers
    sudo systemctl enable firecrawl-playwright
    sudo systemctl start firecrawl-server
    sudo systemctl start firecrawl-workers
    sudo systemctl start firecrawl-playwright
    

    Verify that each service shows active (running) with sudo systemctl status <service>.
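
The status check can also be scripted. The sketch below lists any unit that is not active; inactive_units is a hypothetical helper, and the SYSTEMCTL variable exists only so the command can be swapped for a stub during a dry run.

```shell
# Hypothetical helper: print every unit from the argument list that is not
# currently active. SYSTEMCTL defaults to the real systemctl binary.
inactive_units() {
  local unit
  for unit in "$@"; do
    "${SYSTEMCTL:-systemctl}" is-active --quiet "$unit" || printf '%s\n' "$unit"
  done
}

# Example call on the server:
#   inactive_units firecrawl-server firecrawl-workers firecrawl-playwright
```

Empty output means all three services are running; for any unit that is printed, sudo journalctl -u <service> shows its recent log output.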


    STEP 6. FireCrawl restart and verification

    1. .env configuration check:
      Especially verify that PLAYWRIGHT_MICROSERVICE_URL is set to http://localhost:3000/html
    2. FireCrawl server and workers startup:
      If logs show “Scrape via fetch…”, HTTP requests are working correctly
    3. Playwright service operation check:
      Access http://<server IP>:3000/html?url=https://example.com in a browser; if HTML is returned, it’s OK
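
The browser check in item 3 can also be run from the command line. This is a minimal sketch, assuming curl is installed and the microservice from STEP 4.3 is listening on port 3000; looks_like_html is a hypothetical helper that only looks for an opening <html tag.

```shell
# Hypothetical helper: succeed when stdin contains an opening <html tag
# (case-insensitive).
looks_like_html() {
  grep -qi '<html'
}

# Example call (run on the server itself):
#   curl -fsS -G --data-urlencode 'url=https://example.com' \
#     http://localhost:3000/html | looks_like_html && echo "Playwright service OK"
```

With -G, curl appends the --data-urlencode value to the request URL as a query string, so the target address is percent-encoded before it reaches the script.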

    STEP 7. Final verification

    • Rust library:
      Confirm that target/release/libhtml_transformer.so exists and no errors appear in logs
    • Playwright microservice:
      Test that it starts correctly and JavaScript rendering works
    • Overall integration:
      Final check that FireCrawl server, workers, and Playwright work together and respond correctly to requests from Dify, etc.
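
For the overall integration check, a single scrape request against the API server is one option. The sketch below only builds the request body; scrape_payload is a hypothetical helper, and the /v1/scrape path and the Bearer token in the example call are assumptions that depend on the FireCrawl version in use (the port and key are taken from the .env example in STEP 2.3).

```shell
# Hypothetical helper: build the JSON body for a scrape request.
# Assumes the URL contains no double quotes or backslashes, because no
# JSON escaping is performed.
scrape_payload() {
  printf '{"url":"%s"}\n' "$1"
}

# Example call (endpoint path and token are assumptions, see above):
#   curl -sS -X POST http://localhost:3002/v1/scrape \
#     -H 'Content-Type: application/json' \
#     -H 'Authorization: Bearer fc-bestnet' \
#     -d "$(scrape_payload https://example.com)"
```
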
Updated on 2026/6/9

©2020 BESTNET.LLC . All Rights Reserved.