Design
Prerequisite
- Linux Kernel Version >= 3.10
- Cgroup file system mounted at /sys/fs/cgroup. Usually done by systemd
Architecture
Execution Workflow
Return Status
- Accepted: Program exited with status code 0 within time & memory limits
- Memory Limit Exceeded: Program uses more memory than memory limits
- Time Limit Exceeded:
- Program uses more CPU time than cpuLimit
- Or, program uses more clock time than clockLimit
- Output Limit Exceeded:
- Program output more than pipeCollector limits
- Or, program output more than output-limit
- File Error:
- CopyIn file is not existed
- Or, CopyIn file too large for container file system
- Or, CopyOut file is not existed after program exited
- Non Zero Exit Status: Program exited with non 0 status code within time & memory limits
- Signalled: Program exited with signal (e.g. SIGSEGV)
- Dangerous Syscall: Program killed by seccomp filter
- Internal Error:
- Program is not exist
- Or, container create not successful (e.g. not privileged docker)
- Or, other errors
Packages
- envexec: run single / group of programs in parallel within restricted environment and resource constraints
- env: reference implementation environments to inject into envexec
Windows Support
- Build
go-judge
by:go build ./cmd/go-judge/
- Build
go_judge.dll
: (need to installgcc
as well)go build -buildmode=c-shared -o go_judge.so ./cmd/go-judge-ffi/
- Run:
./go-judge
Windows Security
- Resources are limited by JobObject
- Privilege are limited by Restricted Low Mandatory Level Token
- Low Mandatory Level directory is created for read / write
MacOS Support
- Build
go-judge
by:go build ./cmd/go-judge/
- Build
go_judge.dylib
: (need to installXCode
)go build -buildmode=c-shared -o go_judge.dylib ./cmd/go-judge-ffi/
- Run:
./go-judge
MacOS Security
sandbox-init
profile deny network access and file read / write and read / write to/Users
directory
Notice
cgroup usage
For cgroup v1, the go-judge
need root privilege to create cgroup
. Either creates sub-directory /sys/fs/cgroup/cpuacct/gojudge
, /sys/fs/cgroup/memory/gojudge
, /sys/fs/cgroup/pids/gojudge
and make execution user readable or use sudo
to run it.
For cgroup v2, systemd dbus will be used to create a transient scope for cgroup integration.
If no permission to create cgroup, the cgroup related limit will not be effective.
cgroup v2 support
The cgroup v2 is supported by go-judge
now when running as root since more Linux distribution are enabling cgroup v2 by default (e.g. Ubuntu 21.10+, Fedora 31+). However, for kernel < 5.19, due to missing memory.max_usage_in_bytes
in memory
controller, the memory usage is now accounted by maxrss
returned by wait4
syscall. Thus, the memory usage appears higher than those who uses cgroup v1. For kernel >= 5.19, memory.peak
is being used.
When running in containers, the go-judge
will migrate all processed into /api
hierarchy to enable nesting support.
When running in Linux distributions powered by systemd
, the go-judge
will contact systemd
via dbus
to create a transient scope as cgroup root.
When running with kernel >= 5.7, the go-judge
will try faster clone3(CLONE_INTO_CGROUP)
method.
Memory Usage
The controller will consume 20M
memory and each container will consume 20M
+ size of tmpfs 2 * 128M
. For each request, it consumes as much as user program limit + extra limit (16k
) + total copy out max. Notice that the cached file stores in the shared memory (/dev/shm
) of the host, so please ensure enough size allocated.
For example, when concurrency = 4, the container itself can consume as much as 60 + (20+32) * 4M = 268M
+ 4 * total copy out + total max memory of requests.
Due to limitation of GO runtime, the memory will not return to OS automatically, which could lead to OOM killer. The background worker was introduced to checks heap usage and invokes GC when necessary.
-force-gc-target
default20m
, the minimal size to trigger GC-force-gc-interval
default5s
, the interval to check memory usage
Benchmark
By wrk
with t.lua
: wrk -s t.lua -c 1 -t 1 -d 30s --latency http://localhost:5050/run
.
However, these results are not the real use cases since the running time depends on the actual program specifies in the request. Normally, the go judge consumes ~1ms more compare to running without sandbox.
wrk.method = "POST"
wrk.body = '{"cmd":[{"args":["/bin/cat","a.hs"],"env":["PATH=/usr/bin:/bin"],"files":[{"content":""},{"name":"stdout","max":10240},{"name":"stderr","max":10240}],"cpuLimit":10000000000,"memoryLimit":104857600,"procLimit":50,"copyIn":{"a.hs":{"content":"main = putStrLn \\"Hello, World!\\""},"b":{"content":"TEST"}}}]}'
wrk.headers["Content-Type"] = "application/json;charset=UTF-8"
- Single thread ~800-860 op/s Windows 10 WSL2 @ 5800X
- Multi thread ~4500-6000 op/s Windows 10 WSL2 @ 5800X
Single thread:
Running 30s test @ http://localhost:5050/run
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.16ms 132.89us 6.20ms 90.15%
Req/Sec 0.87k 19.33 0.91k 85.33%
Latency Distribution
50% 1.13ms
75% 1.18ms
90% 1.27ms
99% 1.61ms
25956 requests in 30.01s, 6.88MB read
Requests/sec: 864.88
Transfer/sec: 234.68KB