bash - How to count the number of forked (sub-?)processes
Somebody else has written (tm) a bash script that forks many sub-processes. It needs optimization, and I'm looking for a way to measure "how bad" the problem is.
Can I / how can I get a count that says how many sub-processes the script forked all-in-all / recursively?
This is how a simplified version of the existing, forking code looks - a poor man's grep:
    #!/bin/bash
    file=/tmp/1000lines.txt
    match=$1
    let cnt=0
    while read line
    do
        cnt=`expr $cnt + 1`
        linearray[$cnt]="${line}"
    done < $file
    totallines=$cnt

    cnt=0
    while [ $cnt -lt $totallines ]
    do
        cnt=`expr $cnt + 1`
        matches=`echo ${linearray[$cnt]}|grep $match`
        if [ "$matches" ] ; then
            echo ${linearray[$cnt]}
        fi
    done

It takes the script about 20 seconds to look for $1 in the 1000 lines of input. This code forks way too many sub-processes. In the real code, there are even longer pipes (e.g. proga | progb | progc) operating on each line, using grep, cut, awk, sed and so on.
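For illustration, a single line in the real code might be pushed through something like this (the programs, delimiter and field number here are invented, just to show the shape of it):

    # hypothetical per-line pipeline similar to the real scripts
    result=`echo "$line" | grep "$match" | cut -d: -f2 | sed 's/^ *//'`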
This is a busy system with lots of other stuff going on, so a count of how many processes were forked on the entire system during the run-time of the script would be of some use to me, but I'd prefer a count of the processes started by the script and its descendants. And while I guess I could analyze the script and count them myself, the script is long and rather complicated, so I'd rather instrument it with a counter for debugging, if possible.
To clarify:
- I'm not looking for the number of processes under $$ at any given time (e.g. via ps), but the number of processes run during the entire life of the script.
- I'm not looking for a faster version of this particular example script (I can do that myself). I'm looking for a way to determine which of the 30+ scripts to optimize first to use bash built-ins.
You can count the forked processes by trapping the SIGCHLD signal. If you can edit the script file then you can do this:
    set -o monitor        # or set -m
    trap "((++fork))" CHLD

So the fork variable will contain the number of forks. At the end you can print this value:
    echo $fork forks

For a 1000-line input file it will print:
    3000 forks

This code forks for two reasons: once for each expr ... and once for each `echo ...|grep ...`. So in the reading while-loop it forks every time a line is read; in the processing while-loop it forks two times per line (once because of expr ... and once for `echo ...|grep ...`). For a 1000-line file that makes 3000 forks.
But this is not exact! It only counts the forks done by the calling shell. There are more forks than that, because `echo ...|grep ...` first forks to start a bash to run the code, and that bash then forks twice more: once for echo and once for grep. So that is 3 forks, not one, and the real number is rather 5000 forks, not 3000.
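For reference, a minimal self-contained sketch of this kind of instrumentation could look like the following (the counter name and the sample commands are only illustrative, not taken from the original script):

    #!/bin/bash
    # minimal sketch: count child processes via the CHLD trap
    set -o monitor            # enable job control so the CHLD trap fires for each child
    fork=0
    trap '((++fork))' CHLD    # bump the counter whenever a child process terminates

    # ... the code to be measured goes here, e.g.:
    dummy=`expr 1 + 1`
    dummy=`echo hello | grep hell`

    echo "$fork forks"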
If you need to count the forks of the forks (of the forks ...), or you cannot modify the bash script, or you want to measure it from another script, a more exact solution is to use strace:
    strace -fo s.log ./x.sh

It will print lines like this:
    30934 execve("./x.sh", ["./x.sh"], [/* 61 vars */]) = 0

Then you need to count the unique PIDs, using something like this (the first number is the PID):
    awk '{n[$1]}END{print length(n)}' s.log

In the case of this script I got 5001 (the +1 is the PID of the original bash script).
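As a variation on the same idea, instead of counting distinct PIDs you could count the process-creating syscalls in the log directly. This is only a sketch and assumes the usual strace output format, where the syscall name follows the PID:

    # count fork-like calls in the strace log (on Linux, glibc's fork() shows up as clone)
    grep -cE '^[0-9]+ +(clone|fork|vfork)\(' s.log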
Actually, in this case all the forks can be avoided:
Instead of

    cnt=`expr $cnt + 1`

use

    ((++cnt))

Instead of

    matches=`echo ${linearray[$cnt]}|grep $match`
    if [ "$matches" ] ; then
        echo ${linearray[$cnt]}
    fi

you can use bash's internal pattern matching:

    [[ ${linearray[cnt]} =~ $match ]] && echo ${linearray[cnt]}

Mind that bash's =~ uses ERE, not RE (like grep). So it will behave like egrep (or grep -E), not grep.
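Just to illustrate that ERE/BRE difference with a made-up pattern (not from the original script):

    echo aaa | grep 'a+'              # no output: in BRE the + is a literal character
    [[ aaa =~ a+ ]] && echo matched   # prints "matched": in ERE the + means repetition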
I assume that your linearray is not pointless (otherwise the matching could be tested right in the reading loop and linearray would not be needed at all) and that it is used for some other purpose as well. In that case I may suggest a slightly shorter version:
    readarray -t linearray < infile
    for line in "${linearray[@]}"; { [[ $line =~ $match ]] && echo $line; }

The first line reads the complete infile into linearray without a loop. The second line processes the array element by element.
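Put together as a complete script, a sketch of the fork-free version could look like this (the file path and the use of $1 are carried over from the example in the question; this is only one way to write it):

    #!/bin/bash
    # fork-free poor man's grep, assembled from the pieces above
    file=/tmp/1000lines.txt
    match=$1
    readarray -t linearray < "$file"              # read the whole file into the array, no read loop
    for line in "${linearray[@]}"; do
        [[ $line =~ $match ]] && echo "$line"     # ERE match, no external grep
    done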
Measures
The original script with 1000 lines of input (on Cygwin):
    $ time ./test.sh
    3000 forks

    real    0m48.725s
    user    0m14.107s
    sys     0m30.659s

The modified version:
    forks

    real    0m0.075s
    user    0m0.031s
    sys     0m0.031s

The same on Linux:
    3000 forks

    real    0m4.745s
    user    0m1.015s
    sys     0m4.396s

and
    forks

    real    0m0.028s
    user    0m0.022s
    sys     0m0.005s

So this version uses no fork (or clone) at all. I would suggest using this version only for small (<100 KiB) files. In other cases grep, egrep, awk and so on outperform the pure bash solution, but this should be checked by a performance test.
For a thousand lines on Linux I got the following:
    $ time grep solaris infile    # solaris is not in the infile

    real    0m0.001s
    user    0m0.000s
    sys     0m0.001s
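If you want to repeat this kind of check on your own input, a rough comparison could be done like this (the file contents, the pattern and the script name pure_bash_grep.sh are all placeholders):

    # build a 1000-line test file and time both approaches
    seq 1 1000 | sed 's/^/line number /' > /tmp/1000lines.txt
    time ./pure_bash_grep.sh solaris          # the readarray/=~ version sketched above
    time grep solaris /tmp/1000lines.txt      # plain external grep for comparison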