Unix & Linux Asked by Ben Dilts on December 1, 2021
I’m running pdftoppm to convert a user-provided PDF into a 300 DPI image. This works great, except when the user provides a PDF with a very large page size. pdftoppm will allocate enough memory to hold a 300 DPI image of that size, which for a 100-inch-square page is 100*300 * 100*300 * 4 bytes per pixel = about 3.6 GB. A malicious user could just give me a silly-large PDF and cause all kinds of problems.
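To make the arithmetic above concrete, here is a minimal sketch that computes the raster size for a given page. The function name raster_bytes and its argument order are made up for illustration, and it assumes whole-inch page dimensions and 64-bit shell arithmetic.

```shell
#!/bin/sh
# raster_bytes WIDTH_IN HEIGHT_IN DPI BYTES_PER_PIXEL
# Hypothetical helper: bytes needed to hold an uncompressed raster of the page.
raster_bytes() {
    echo $(( $1 * $3 * $2 * $3 * $4 ))
}

# The 100 inch square page from the question at 300 DPI, 4 bytes per pixel:
raster_bytes 100 100 300 4   # prints 3600000000
```

That is 3,600,000,000 bytes, i.e. about 3.6 GB, far above the 500 MB budget mentioned below.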
So what I’d like to do is put some kind of hard limit on memory usage for a child process I’m about to run: just have the process die if it tries to allocate more than, say, 500MB of memory. Is that possible?
I don’t think ulimit can be used for this, but is there a one-process equivalent?
Not really an answer to the question as posed, but:
Could you check the file-size, to prevent issues BEFORE trying to process a pdf? That would remove the "ridiculously large" issue.
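A file-size check like the one suggested above could look roughly like this. It is only a sketch: size_ok is a hypothetical helper name and the byte threshold is whatever you consider reasonable. Note that file size says nothing about the declared page dimensions, so a small PDF can still describe a very large page; this check complements a memory limit rather than replacing it.

```shell
#!/bin/sh
# size_ok FILE MAX_BYTES
# Hypothetical helper: succeeds only if FILE is a regular file of at most MAX_BYTES bytes.
size_ok() {
    [ -f "$1" ] || return 1
    # The arithmetic expansion strips any padding some wc implementations add.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$2" ]
}

# Usage (file names are placeholders):
#   size_ok upload.pdf 20000000 && pdftoppm -r 300 upload.pdf out
```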
There are also programs that will process a pdf (there are python programs, for instance: http://theautomatic.net/2020/01/21/how-to-read-pdf-files-with-python/) whereby one could split the pdf into more manageable-sized chunks. Or do both: if the file-size is reasonable, process it; otherwise (else) split it into as many pieces as required, and process those. One could then re-combine the outputs. One might need to have some overlap between sections to prevent "border" issues.
Limiting the available memory might well force a failure to process larger files, or lead to massive memory swap issues.
Answered by John Beck on December 1, 2021
On any systemd-based distro you can also use cgroups indirectly through systemd-run. For example, to limit pdftoppm to 500M of RAM, use:
systemd-run --scope -p MemoryMax=500M pdftoppm
Note: this will ask you for a password, but the app is launched as your user. Do not let this mislead you into thinking that the command needs sudo, because that would run the command as root, which was hardly your intention.
If you don't want to enter the password (indeed, why would you need a password to limit memory you already own?), you could use the --user option. For this to work you need cgroups v2 support enabled, which right now requires booting with the systemd.unified_cgroup_hierarchy kernel parameter.
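As a sketch of how the --user variant might be wrapped for repeated use: run_capped and the DRY_RUN switch below are hypothetical names, and the extra MemorySwapMax=0 property is my own assumption (it is a cgroups v2 setting meant to stop the process from spilling into swap instead of being killed). Drop it if your systemd version rejects it.

```shell
#!/bin/sh
# run_capped LIMIT COMMAND...
# Hypothetical wrapper: run COMMAND in a transient user scope capped at LIMIT.
# With DRY_RUN=1 it only prints the systemd-run command line it would execute.
run_capped() {
    limit=$1; shift
    set -- systemd-run --user --scope -p "MemoryMax=$limit" -p MemorySwapMax=0 "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (file names are placeholders):
#   run_capped 500M pdftoppm -r 300 in.pdf out
```

Setting DRY_RUN=1 is a cheap way to check the generated command before running it on a machine where the user scope may not be available.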
Answered by Hi-Angel on December 1, 2021
I'm running Ubuntu 18.04.2 LTS and JanKanis's script doesn't work for me quite as he suggests. Running limitmem 100M script limits RAM to 100 MB but leaves swap unlimited.
Running limitmem 100M -s 100M script fails silently, as cgget -g "memory:$cgname" has no parameter named memory.memsw.limit_in_bytes.
So I disabled swap:
# create cgroup
sudo cgcreate -g "memory:$cgname"
sudo cgset -r memory.limit_in_bytes="$limit" "$cgname"
sudo cgset -r memory.swappiness=0 "$cgname"
bytes_limit=`cgget -g "memory:$cgname" | grep memory.limit_in_bytes | cut -d' ' -f2`
Answered by d9ngle on December 1, 2021
I'm using the script below, which works great. It originally used cgroups through cgmanager. Update: it now uses the commands from cgroup-tools. Name this script limitmem and put it in your $PATH, and you can use it like limitmem 100M bash. This will limit both memory and swap usage. To limit just memory, remove the line with memory.memsw.limit_in_bytes.
edit: On default Linux installations this only limits memory usage, not swap usage. To enable swap usage limiting, you need to enable swap accounting on your Linux system. Do that by setting/adding swapaccount=1 in /etc/default/grub so it looks something like
GRUB_CMDLINE_LINUX="swapaccount=1"
Then run sudo update-grub and reboot.
Disclaimer: I wouldn't be surprised if cgroup-tools also breaks in the future. The correct solution would be to use the systemd APIs for cgroup management, but there are no command-line tools for that at the moment.
edit (2021): So far this script still works, but it goes against Linux's recommendation to have a single program manage your cgroups. Nowadays that program is usually systemd. Unfortunately systemd has a number of limitations that make it difficult to replace this script with systemd invocations. The systemd-run --user command should allow a user to run a program with resource limitations, but that isn't supported on cgroups v1. (Everyone uses cgroups v1 because Docker doesn't work on cgroups v2 yet, except for the very latest versions.) With root access (which this script also requires) it should be possible to use systemd-run to create the correct systemd-supported cgroups, and then manually set the memory and swap properties in the right cgroup, but that is still to be implemented. See also this bug comment for context, and here and here for relevant documentation.
According to @Mikko's comment, using a script like this with systemd runs the risk of systemd losing track of processes in a session. I haven't noticed such problems, but I use this script mostly on a single-user machine.
#!/bin/sh
# This script uses commands from the cgroup-tools package. The cgroup-tools commands access the cgroup filesystem directly which is against the (new-ish) kernel's requirement that cgroups are managed by a single entity (which usually will be systemd). Additionally there is a v2 cgroup api in development which will probably replace the existing api at some point. So expect this script to break in the future. The correct way forward would be to use systemd's apis to create the cgroups, but afaik systemd currently (feb 2018) only exposes dbus apis for which there are no command line tools yet, and I didn't feel like writing those.
# strict mode: error if commands fail or if unset variables are used
set -eu
if [ "$#" -lt 2 ]
then
echo Usage: `basename $0` "<limit> <command>..."
echo or: `basename $0` "<memlimit> -s <swaplimit> <command>..."
exit 1
fi
cgname="limitmem_$$"
# parse command line args and find limits
limit="$1"
swaplimit="$limit"
shift
if [ "$1" = "-s" ]
then
shift
swaplimit="$1"
shift
fi
if [ "$1" = -- ]
then
shift
fi
if [ "$limit" = "$swaplimit" ]
then
memsw=0
echo "limiting memory to $limit (cgroup $cgname) for command $@" >&2
else
memsw=1
echo "limiting memory to $limit and total virtual memory to $swaplimit (cgroup $cgname) for command $@" >&2
fi
# create cgroup
sudo cgcreate -g "memory:$cgname"
sudo cgset -r memory.limit_in_bytes="$limit" "$cgname"
bytes_limit=`cgget -g "memory:$cgname" | grep memory.limit_in_bytes | cut -d' ' -f2`
# try also limiting swap usage, but this fails if the system has no swap
if sudo cgset -r memory.memsw.limit_in_bytes="$swaplimit" "$cgname"
then
bytes_swap_limit=`cgget -g "memory:$cgname" | grep memory.memsw.limit_in_bytes | cut -d' ' -f2`
else
echo "failed to limit swap"
memsw=0
fi
# create a waiting sudo'd process that will delete the cgroup once we're done. This prevents the user needing to enter their password to sudo again after the main command exits, which may take longer than sudo's timeout.
tmpdir=${XDG_RUNTIME_DIR:-$TMPDIR}
tmpdir=${tmpdir:-/tmp}
fifo="$tmpdir/limitmem_$$_cgroup_closer"
mkfifo --mode=u=rw,go= "$fifo"
sudo -b sh -c "head -c1 '$fifo' >/dev/null ; cgdelete -g 'memory:$cgname'"
# spawn subshell to run in the cgroup. If the command fails we still want to remove the cgroup so unset '-e'.
set +e
(
set -e
# move subshell into cgroup
sudo cgclassify -g "memory:$cgname" --sticky `sh -c 'echo $PPID'` # $$ returns the main shell's pid, not this subshell's.
exec "$@"
)
# grab exit code
exitcode=$?
set -e
# show memory usage summary
peak_mem=`cgget -g "memory:$cgname" | grep memory.max_usage_in_bytes | cut -d' ' -f2`
failcount=`cgget -g "memory:$cgname" | grep memory.failcnt | cut -d' ' -f2`
percent=`expr "$peak_mem" / \( "$bytes_limit" / 100 \)`
echo "peak memory used: $peak_mem ($percent%); exceeded limit $failcount times" >&2
if [ "$memsw" = 1 ]
then
peak_swap=`cgget -g "memory:$cgname" | grep memory.memsw.max_usage_in_bytes | cut -d' ' -f2`
swap_failcount=`cgget -g "memory:$cgname" | grep memory.memsw.failcnt | cut -d' ' -f2`
swap_percent=`expr "$peak_swap" / \( "$bytes_swap_limit" / 100 \)`
echo "peak virtual memory used: $peak_swap ($swap_percent%); exceeded limit $swap_failcount times" >&2
fi
# remove cgroup by sending a byte through the pipe
echo 1 > "$fifo"
rm "$fifo"
exit $exitcode
Answered by JanKanis on December 1, 2021
In addition to the tools from daemontools, suggested by Mark Johnson, you can also consider chpst, which is found in runit. Runit itself is bundled in busybox, so you might already have it installed.
The man page of chpst shows the option:
-m bytes limit memory. Limit the data segment, stack segment, locked physical pages, and total of all segment per process to bytes bytes each.
Answered by oz123 on December 1, 2021
Another way to limit this is to use Linux's control groups. This is especially useful if you want to limit a process's (or group of processes') allocation of physical memory distinctly from virtual memory. For example:
cgcreate -g memory:myGroup
echo 500M > /sys/fs/cgroup/memory/myGroup/memory.limit_in_bytes
echo 5G > /sys/fs/cgroup/memory/myGroup/memory.memsw.limit_in_bytes
will create a control group named myGroup, cap the set of processes run under myGroup at 500 MB of physical memory with memory.limit_in_bytes, and at 5 GB of physical and swap memory together with memory.memsw.limit_in_bytes.
More info about these options can be found here: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-memory
To run a process under the control group:
cgexec -g memory:myGroup pdftoppm
Note that on a modern Ubuntu distribution this example requires installing the cgroup-bin package and editing /etc/default/grub to change GRUB_CMDLINE_LINUX_DEFAULT to:
GRUB_CMDLINE_LINUX_DEFAULT="cgroup_enable=memory swapaccount=1"
and then running sudo update-grub and rebooting to boot with the new kernel boot parameters.
Answered by user65369 on December 1, 2021
There are some problems with ulimit. Here's a useful read on the topic: Limiting time and memory consumption of a program in Linux, which led to the timeout tool, which lets you cage a process (and its forks) by time or memory consumption.
The timeout tool requires Perl 5+ and the /proc filesystem mounted. After that you copy the tool to e.g. /usr/local/bin like so:
curl https://raw.githubusercontent.com/pshved/timeout/master/timeout |
sudo tee /usr/local/bin/timeout && sudo chmod 755 /usr/local/bin/timeout
After that, you can 'cage' your process by memory consumption as in your question like so:
timeout -m 500 pdftoppm Sample.pdf
Alternatively you could use -t <seconds> and -x <hertz> to respectively limit the process by time or CPU constraints.
The way this tool works is by checking multiple times per second if the spawned process has not oversubscribed its set boundaries. This means there actually is a small window where a process could potentially be oversubscribing before timeout notices and kills the process.
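To illustrate the polling idea (this is not the timeout tool's actual code, just a minimal sketch of the same technique), the function below starts a command, samples its VmSize from /proc about ten times per second, and kills it once the value exceeds a limit in kB. The name poll_limit is hypothetical, it only watches the top-level process rather than its forks, and it assumes a sleep that accepts fractional seconds (GNU coreutils and busybox do).

```shell
#!/bin/sh
# poll_limit LIMIT_KB COMMAND...
# Hypothetical helper: run COMMAND in the background and kill it if its
# virtual memory size (VmSize in /proc/PID/status, in kB) exceeds LIMIT_KB.
poll_limit() {
    limit_kb=$1; shift
    "$@" &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        # The process may vanish between the check and the read; treat that as size 0.
        size_kb=$(awk '/^VmSize:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || size_kb=0
        if [ "${size_kb:-0}" -gt "$limit_kb" ]; then
            kill -KILL "$pid" 2>/dev/null || true
            break
        fi
        sleep 0.1   # the process can overshoot the limit during this window
    done
    # Propagate the command's exit status (137 if it was killed by SIGKILL).
    wait "$pid"
}

# Usage: poll_limit 500000 pdftoppm Sample.pdf out
```

The sleep between samples is exactly the small window described above: anything the process allocates between two polls goes unnoticed until the next one.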
A more correct approach would hence likely involve cgroups, but that is much more involved to set up, even if you use Docker or runC, which, among other things, offer a more user-friendly abstraction around cgroups.
Answered by kvz on December 1, 2021
If your process doesn't spawn children that consume most of the memory, you can use the setrlimit function. The more common user interface for it is the shell's ulimit command:
$ ulimit -Sv 500000 # Set ~500 MB limit
$ pdftoppm ...
This will only limit the "virtual" memory of your process, which also counts (and therefore limits) memory that the invoked process shares with other processes, and memory that is mapped but not reserved (for instance, Java's large heap). Still, virtual memory is the closest approximation for processes that grow really large, making these errors insignificant.
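Because ulimit changes the limit for the current shell and everything it starts afterwards, it can be convenient to set it inside a subshell so that only the one command is affected. A minimal sketch, with run_with_vmem_limit being a hypothetical name:

```shell
#!/bin/sh
# run_with_vmem_limit LIMIT_KB COMMAND...
# Hypothetical helper: apply a soft virtual-memory limit (in kB) to COMMAND only.
run_with_vmem_limit() {
    limit_kb=$1; shift
    (
        # The limit is set in the subshell, so the calling shell keeps its own limit.
        ulimit -Sv "$limit_kb" || exit 125
        exec "$@"
    )
}

# Usage: run_with_vmem_limit 500000 pdftoppm ...
```

When the command hits the limit its allocations start failing, so how it dies (error message, abort, crash) depends on how the program handles a failed allocation.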
If your program spawns children, and they are the ones that allocate memory, it becomes more complex, and you should write auxiliary scripts to run processes under your control. I wrote about why and how in my blog.
Answered by P Shved on December 1, 2021