bash - How to count the number of forked (sub-?)processes
Somebody else has written (tm) a bash script that forks very many sub-processes. It needs optimization, and I'm looking for a way to measure how bad the problem actually is.
Can I get (and how would I get) a count of how many sub-processes the script forks all-in-all / recursively?
This is a simplified version of what the existing, forking code looks like - a poor man's grep:
#!/bin/bash
file=/tmp/1000lines.txt
match=$1
let cnt=0
while read line
do
  cnt=`expr $cnt + 1`
  linearray[$cnt]="${line}"
done < $file

totallines=$cnt
cnt=0
while [ $cnt -lt $totallines ]
do
  cnt=`expr $cnt + 1`
  matches=`echo ${linearray[$cnt]}|grep $match`
  if [ "$matches" ] ; then
    echo ${linearray[$cnt]}
  fi
done
It takes the script 20 seconds to look for $1 in 1000 lines of input. This code forks way too many sub-processes. In the real code there are longer pipes (e.g. proga | progb | progc) operating on each line, using grep, cut, awk, sed and so on.
This is a busy system with lots of other stuff going on, so a count of how many processes were forked on the entire system during the run-time of the script would be of some use to me, but I'd prefer a count of just the processes started by the script and its descendants. I guess I could analyze the script and count them myself, but the script is long and rather complicated, so I'd rather instrument it with a counter for debugging, if possible.
To clarify:
- I'm not looking for the number of processes under $$ at any given time (e.g. via ps), but for the number of processes run during the entire life of the script.
- I'm not looking for a faster version of this particular example script (I can do that myself). I'm looking for a way to determine which of the 30+ scripts to optimize first to use bash built-ins.
You can count the forked processes by trapping the SIGCHLD signal. If you can edit the script file, you can do this:

set -o monitor   # or: set -m
trap "((++fork))" CHLD
After this, the fork variable will contain the number of forks. At the end you can print its value:
echo $fork forks
For a 1000-line input file it will print:
3000 forks
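For reference, a minimal sketch of the counter wired into the example script; the set/trap lines at the top are the only addition, and the processing loop is elided here:

#!/bin/bash
set -o monitor            # enable job control so the CHLD trap fires
trap "((++fork))" CHLD    # runs once for every child the shell reaps

file=/tmp/1000lines.txt
match=$1
let cnt=0
while read line
do
  cnt=`expr $cnt + 1`     # each expr is one fork
  linearray[$cnt]="${line}"
done < $file

# ... processing while-loop exactly as in the question ...

echo $fork forks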
This code forks for two reasons: once for each expr ... and once for each `echo ...|grep ...`. In the reading while-loop it forks every time a line is read; in the processing while-loop it forks twice per line (once because of the expr ... and once for the `echo ...|grep ...`). So for a 1000-line file it forks 3000 times.
But this is not exact! Those are only the forks done by the calling shell. There are more forks, because `echo ...|grep...` first forks a subshell to run the pipeline, and that subshell then forks twice more: once for echo and once for grep. So that is 3 forks there, not one, and the real total is closer to 5000 forks, not 3000.
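Spelled out for the 1000-line file, the tally is roughly:

reading loop:     1000 x expr                = 1000 forks
matching loop:    1000 x expr                = 1000 forks
matching loop:    1000 x echo|grep subshell  = 1000 forks
in the subshells: 1000 x (echo + grep)       = 2000 forks
total                                          5000 forks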
If you need to count the forks of the forks (of the forks...) as well, or if you cannot modify the bash script, or you want to do it from another script, a more exact solution is to use strace:

strace -fo s.log ./x.sh

It will print lines like this:
30934 execve("./x.sh", ["./x.sh"], [/* 61 vars */]) = 0
Then you need to count the unique PIDs, for example with the following awk one-liner (the first number on each line is the PID):
awk '{n[$1]}end{print length(n)}' s.log
In the case of the example script I got 5001 (the +1 is the PID of the original bash script itself).
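As a possible variant (not from the original answer): strace can be limited to process-management syscalls, which keeps s.log much smaller on long runs, and the unique-PID count can also be done with standard tools:

# log only process-management syscalls (fork, clone, execve, wait, ...)
strace -f -e trace=process -o s.log ./x.sh
# count unique PIDs with sort/wc instead of the awk one-liner
awk '{print $1}' s.log | sort -u | wc -l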
Comments
Actually, in this case all the forks can be avoided:

Instead of
cnt=`expr $cnt + 1`
use
((++cnt))
Instead of

matches=`echo ${linearray[$cnt]}|grep $match`
if [ "$matches" ] ; then
  echo ${linearray[$cnt]}
fi
you can use bash's internal pattern matching:
[[ ${linearray[cnt]} =~ $match ]] && echo ${linearray[cnt]}
Mind that bash's =~ uses ERE, not BRE like grep. So it will behave like egrep (or grep -E), not plain grep.
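Putting the two replacements together, a fork-free sketch of the whole example script (reusing the names from the question) could look like this:

#!/bin/bash
file=/tmp/1000lines.txt
match=$1
cnt=0
while read line
do
  ((++cnt))                 # arithmetic expansion, no fork
  linearray[$cnt]="${line}"
done < $file
totallines=$cnt
cnt=0
while [ $cnt -lt $totallines ]
do
  ((++cnt))
  # built-in ERE matching replaces the echo|grep pipeline
  [[ ${linearray[cnt]} =~ $match ]] && echo ${linearray[cnt]}
done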
I assume that linearray is not pointless (otherwise the matching could be tested right in the reading loop and linearray would not be needed at all) and that it is used for some other purpose as well. In that case I may suggest a slightly shorter version:
readarray -t linearray <infile
for line in "${linearray[@]}";{ [[ $line =~ $match ]] && echo $line; }
The first line reads the complete infile into linearray without any loop. The second line processes the array element-by-element.
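For illustration, a hypothetical run (the infile contents here are made up, and it assumes the two lines above are saved as test.sh with match=$1 set at the top):

$ cat infile
first line
this line mentions solaris
last line
$ ./test.sh solaris
this line mentions solaris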
Measures
Original script with 1000 lines (on Cygwin):
$ time ./test.sh
3000 forks

real    0m48.725s
user    0m14.107s
sys     0m30.659s
Modified version (the fork counter stays empty):
forks

real    0m0.075s
user    0m0.031s
sys     0m0.031s
The same on Linux:
3000 forks

real    0m4.745s
user    0m1.015s
sys     0m4.396s
and
forks

real    0m0.028s
user    0m0.022s
sys     0m0.005s
So this version uses no fork (or clone) at all. I would suggest using this version only for small (<100 KiB) files; in other cases grep, egrep, awk and so on outperform the pure bash solution. But that should be checked with a performance test.
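A minimal sketch of such a test, assuming the pure-bash version is saved as test.sh (both file names are hypothetical):

# time the pure-bash version
time ./test.sh solaris >/dev/null
# time plain grep on the same input for comparison
time grep solaris infile >/dev/null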
For a thousand lines on Linux I got the following:

$ time grep solaris infile   # solaris is not in the infile

real    0m0.001s
user    0m0.000s
sys     0m0.001s