Workaround for Dify Plugin Daemon Thread Leak

Overview of the Issue

Symptoms:

  • When a large number of knowledge base documents are indexed, the plugin daemon (plugin_daemon) keeps spawning threads until, at around 30,000, no new threads can be created.
  • The container logs show a "can't start new thread" error, and the service freezes.

Estimated Cause:

  • The cgroup v2 pids.max for the container (the combined limit on processes and threads) is set to 29958, and thread creation fails once this limit is reached.
  • A thread leak in the plugin daemon, or massively concurrent processing, causes the thread count to balloon in a short time.
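
To tell a leak from a short burst of concurrency, it can help to sample the daemon's thread count over time. The following is a minimal sketch, not part of the original procedure: thread_count is a hypothetical helper name, and the second argument exists only so the function can be pointed at a fake directory tree for testing.

```shell
# Count the threads of one process. Each thread appears as one entry
# under /proc/<pid>/task. proc_root defaults to /proc.
thread_count() {
  local pid="$1" proc_root="${2:-/proc}"
  local task_dir="$proc_root/$pid/task"
  [ -d "$task_dir" ] || { echo "no such process: $pid" >&2; return 1; }
  ls -1 "$task_dir" | wc -l | tr -d ' '
}

# Example, using the host PID of the daemon process:
#   thread_count 1012512
```

A count that keeps climbing while indexing is idle points toward a leak; a count that falls back after a batch finishes points toward excessive concurrency.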

Troubleshooting Attempts:

  • Setting pids_limit: 0 in Docker Compose, or adjusting ulimit inside the container, had no effect because the cgroup limit on the host stayed fixed at 29958.
  • Reading /sys/fs/cgroup/system.slice/docker-<container ID>.scope/pids.max directly confirmed the value was 29958.
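
A quick way to see how close a container is to the limit is to read pids.current next to pids.max in the same cgroup directory. This is a sketch under the assumption of a cgroup v2 host; pids_usage is a hypothetical helper name.

```shell
# Print "<current> / <max>" for a cgroup v2 directory.
# pids.current and pids.max are standard files of the pids controller.
pids_usage() {
  local dir="$1" current max
  current="$(cat "$dir/pids.current")" || return 1
  max="$(cat "$dir/pids.max")" || return 1
  if [ "$max" = "max" ]; then
    echo "$current / unlimited"
  else
    # Integer percentage of the limit that is already in use.
    echo "$current / $max ($(( current * 100 / max ))%)"
  fi
}

# Example: pids_usage /sys/fs/cgroup/system.slice/<scope directory of the container>
```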

Root Solution

  • Fix the thread leak in the plugin daemon (apply updates or patches).
  • Split the documents to be processed into smaller batches so threads do not spiral out of control.
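
One way to keep batches small is to feed document paths to the upload or indexing step in fixed-size groups. The sketch below only shows the grouping; run_in_batches is a hypothetical helper, the command it runs is whatever you already use to submit documents, and it assumes paths contain no whitespace.

```shell
# Read document paths from stdin and pass them to a command in groups.
# xargs -n caps the number of arguments per invocation, and invocations
# run one after another, so at most batch_size documents are handled at a time.
run_in_batches() {
  local batch_size="$1"
  shift
  xargs -n "$batch_size" "$@"
}

# Example (upload_docs.sh is a placeholder for your own submit script):
#   ls docs/*.pdf | run_in_batches 20 ./upload_docs.sh
```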

Temporary Workaround

  • Modify the host's cgroup settings to raise pids.max to max (unlimited) or a larger value.
  • Additionally, set TasksMax=infinity in the systemd slice unit (such as system.slice) so the setting does not revert after restart.

Workaround Procedure (Step-by-Step)

STEP 1. Identify the cgroup path from the container's host-side PID

Verify the container ID:

docker ps

Find the plugin daemon container in the output; its ID looks like 998eb0c50703.

Use docker top <container ID> to obtain the PID on the host:

docker top 998eb0c50703

Example: the /app/main process has host PID 1012512.

Read /proc/<host PID>/cgroup:

cat /proc/1012512/cgroup

Example output: 0::/system.slice/docker-998eb0c5070321...scope
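
The lookup above can be scripted. The sketch below turns the contents of a /proc/<host PID>/cgroup file into the matching directory under /sys/fs/cgroup; cgroup_dir_for is a hypothetical helper, and it assumes a cgroup v2 host where the file contains a line of the form 0::/<path>.

```shell
# Map a /proc/<pid>/cgroup file to its cgroup v2 directory.
cgroup_dir_for() {
  local cgroup_file="$1" rel
  # Keep only the path after the "0::" prefix used by cgroup v2.
  rel="$(sed -n 's/^0:://p' "$cgroup_file" | head -n 1)"
  [ -n "$rel" ] || { echo "no cgroup v2 entry in $cgroup_file" >&2; return 1; }
  echo "/sys/fs/cgroup$rel"
}

# Example: cgroup_dir_for /proc/1012512/cgroup
```

As an alternative to docker top, docker inspect --format '{{.State.Pid}}' <container ID> prints the host PID of the container's main process.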

STEP 2. Change the actual limit value (pids.max)

Navigate to the relevant directory and check pids.max:

cd /sys/fs/cgroup/system.slice/docker-998eb0c5070321...scope
cat pids.max

A number such as 29958 means a limit is in effect; max means unlimited.

Change to unlimited:

echo max | sudo tee pids.max

This removes the limit immediately, but the value is likely to revert when the container is restarted or recreated.
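
If you would rather set a large finite value than max, or want the old and new values logged, a small wrapper can do both. This is a sketch only: set_pids_max is a hypothetical helper, and it must run as root when pointed at a real cgroup directory.

```shell
# Write a new pids.max and report the change.
# value defaults to "max"; anything else must be a plain integer.
set_pids_max() {
  local dir="$1" value="${2:-max}" before
  case "$value" in
    max) ;;
    ''|*[!0-9]*) echo "invalid value: $value" >&2; return 2 ;;
  esac
  before="$(cat "$dir/pids.max")" || return 1
  echo "$value" > "$dir/pids.max" || return 1
  echo "pids.max: $before -> $(cat "$dir/pids.max")"
}

# Example (as root, from inside the container's cgroup directory):
#   set_pids_max . 100000
```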

STEP 3. Permanently set TasksMax to unlimited in systemd configuration

To ensure the cgroup settings do not reset, override TasksMax in the systemd slice unit (system.slice):

3-1. Create an override file:

sudo systemctl edit --force --full system.slice

When the editor opens, add the [Slice] section as follows and save:

[Slice]
TasksMax=infinity

3-2. Reload the daemon and restart:

sudo systemctl daemon-reload
sudo systemctl restart docker

If you use Docker Compose, recreate the containers, for example with docker-compose down && docker-compose up -d.

Containers created after this point will have TasksMax=infinity applied.

3-3. Verify the setting:

systemctl show system.slice -p TasksMax
# => OK if TasksMax=infinity

cat /sys/fs/cgroup/system.slice/pids.max
# => OK if max

Once the container is up, repeat STEP 1 and STEP 2 to locate /sys/fs/cgroup/system.slice/docker-<container ID>.scope/pids.max and confirm it is set to max.
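
When several containers are running, checking each scope by hand gets tedious. The sketch below lists pids.max for every Docker scope under a slice directory. list_docker_pids_max is a hypothetical helper, and it assumes the layout shown in this article, where container scopes sit directly under system.slice.

```shell
# Print "<scope name> <pids.max>" for every docker-*.scope under root.
list_docker_pids_max() {
  local root="${1:-/sys/fs/cgroup/system.slice}" f
  for f in "$root"/docker-*.scope/pids.max; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
  done
}

# Example: list_docker_pids_max
```

Any line that still shows a number instead of max identifies a container that did not pick up the new setting.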

STEP 4. (If needed) Check kernel parameters

Kernel thread/PID limits:

sysctl kernel.threads-max
sysctl kernel.pid_max

If either value is small (around 30,000–60,000), it may also need to be increased:

sudo sysctl -w kernel.threads-max=200000

To make the change persistent, add kernel.threads-max=200000 to /etc/sysctl.conf.
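
The same values are exposed as files under /proc/sys/kernel, which makes a scripted check straightforward. The sketch below compares one such file against a minimum; check_kernel_limit is a hypothetical helper, and 200000 is simply the example value used above, not a recommendation.

```shell
# Report whether a kernel limit file holds at least the given minimum.
check_kernel_limit() {
  local file="$1" minimum="$2" value
  value="$(cat "$file")" || return 1
  if [ "$value" -lt "$minimum" ]; then
    echo "LOW $file=$value (< $minimum)"
  else
    echo "OK $file=$value"
  fi
}

# Examples:
#   check_kernel_limit /proc/sys/kernel/threads-max 200000
#   check_kernel_limit /proc/sys/kernel/pid_max 200000
```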

systemd-wide default TasksMax:

cat /etc/systemd/system.conf

If /etc/systemd/system.conf contains an entry such as DefaultTasksMax=65535, that value may be applied as the default for all services.

Changing it to infinity and then restarting the system is the more reliable option.
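
Because system.conf usually ships with its defaults commented out, it is easy to misread a commented line as an active setting. The sketch below prints only an uncommented DefaultTasksMax value; default_tasks_max is a hypothetical helper, and it does not look at drop-in files.

```shell
# Print the active (uncommented) DefaultTasksMax value from a systemd
# manager configuration file, or nothing if it is not set there.
default_tasks_max() {
  local conf="${1:-/etc/systemd/system.conf}"
  # Lines starting with '#' do not match; the last assignment wins.
  sed -n 's/^[[:space:]]*DefaultTasksMax=//p' "$conf" | tail -n 1
}

# Example: default_tasks_max
```

To see the value systemd is actually using, systemctl show -p DefaultTasksMax reports the effective setting.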

Summary

  • Root Cause: The cgroup v2 pids.max is fixed at around 30,000, and the plugin daemon spawns enough threads to hit that limit.
  • Workaround: Change /sys/fs/cgroup/system.slice/docker-<container ID>.scope/pids.max to max to relax the limit. Additionally, set TasksMax=infinity in the systemd slice so the setting does not revert after a restart.
  • Note: Unless the root cause (the thread leak in the application) is fixed, other problems such as memory exhaustion may eventually occur. If possible, pursue a permanent fix through software updates or by adjusting batch sizes.

The steps above let you avoid the freeze that occurs when the thread count hits the limit of 29958.

Updated on June 9, 2026