Launch external process with pipes from processx

I want to start and stop an external process from R. I found the processx package for this; however, it does not seem to work if my command line contains pipes. In principle I want to run the following command:

top "cbd1 | egrep -v \"Tasks|Cpu|Mem|Swap|PID|top icbd|(.*0\\.0\\s*0.0.*)|^$\" | tee -a /tmp/Rtmpy2D1CZ/file26b51cf0b724"

This starts top (the system resource monitor) and logs the lines that get past a regex filter to a temporary file.
Has anyone dealt with a similar issue before?

This is not a single external process, but three processes (top, egrep and tee), connected by a pipe.

Since this is already Unix specific, I guess you could just start a shell and pass your command line. The caveat is that it is not so easy to kill all the grandchildren processes, i.e. the children of the shell, top, egrep and tee.

Anyway, a naive solution is something like this (I cannot run your exact command, as it does not work on my system):

p <- processx::process$new("bash", c("-c", "seq 1 10 | sort"), stdout = "|")
p$read_all_output_lines()
#>  [1] "1"  "10" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"

You could use the pkill tool to kill the shell together with its children. (Note that pkill behaves wildly differently on different systems.) These particular processes should finish and quit quite quickly anyway, though.
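As an alternative to pkill, processx can clean up the whole process tree itself. This is only a minimal sketch: it assumes the ps package is installed (kill_tree() depends on it) and uses a placeholder pipeline, sleep 60 | cat, instead of the real command.

```r
library(processx)

# Placeholder pipeline: bash starts sleep and cat as its children.
p <- process$new("bash", c("-c", "sleep 60 | cat"), cleanup_tree = TRUE)
p$is_alive()

# Kill bash and every process it started (needs the ps package).
p$kill_tree()

# Give the shell up to one second to be reaped, then check again.
p$wait(timeout = 1000)
p$is_alive()
```

With cleanup_tree = TRUE the same tree cleanup should also happen when the process object is garbage collected, which helps if the R code errors before reaching kill_tree().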

Hmm, thanks, I'll see where this takes me.
I don't need the output of the command as I just need the physical file that tee writes to disk.

Basically I just need the pid of top, since when I kill top, egrep and tee will also terminate. So theoretically just using system2 would also suffice, if I had a reliable way to figure out the pid of top. I am running this on a multiuser system, so I am not sure how to identify the correct top process among (theoretically) several top processes.

Anyway, your solution seems to work for me (killing bash also kills the other processes), thanks!

So if you don't want to run this in the background, you can also use processx::run(), which is way simpler. Or just system2(), if you don't need advanced features like callbacks for output or time limits.

To figure out the pid of top, you could write a small shell script that does it, and call that from R. Otherwise there will be no reliable way to get the pid of top, I am afraid, especially not with system2, because that even loses the parent pid information, as it uses an intermediate shell.

To be honest, I doubt that killing bash would kill the child processes; in general I don't think that is true.

What I want to do is basically this:

start_logging_cpu_usage_to_disk()
do_stuff()
stop_logging_cpu_usage()
read_log()

I want the CPU usage to be logged to disk, just in case do_stuff() crashes.
This seems doable with the current approach, so I'll fiddle around with what I've got.

Right now just killing bash seems sufficient on my test system (where it definitely ends the processes, I tested it).
If I find a cleaner way I might put it in a small package.
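For what it's worth, the four steps above could be sketched with processx roughly like this. The helper names start_logging() and stop_logging(), the placeholder do_stuff(), and the plain top -b -d 1 command (procps batch mode, without the egrep filter) are assumptions for illustration only, and kill_tree() needs the ps package.

```r
library(processx)

# Hypothetical helper: run `cmd` in bash and append its output to `log_file`.
start_logging <- function(cmd, log_file) {
  pipeline <- paste(cmd, "| tee -a", shQuote(log_file))
  process$new("bash", c("-c", pipeline), cleanup_tree = TRUE)
}

# Hypothetical helper: stop bash and everything it started.
stop_logging <- function(p) {
  p$kill_tree()
  invisible(p)
}

# Placeholder for the real work.
do_stuff <- function() Sys.sleep(3)

log_file <- tempfile()
p <- start_logging("top -b -d 1", log_file)

# `finally` makes sure the logger is stopped even if do_stuff() fails.
tryCatch(do_stuff(), finally = stop_logging(p))

log_lines <- readLines(log_file)
```

Because tee appends to the file as lines arrive, whatever was logged before a crash stays on disk.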

Oh, I see, so this would be running in the background, continuously, I missed that, sorry. I thought that this top invocation is a one shot command that just returns.

Btw. this is a call that works for me on Linux, I had to fix the quotes, the options and the filename:

top cb -d1 | egrep -v "Tasks|Cpu|Mem|Swap|PID|top icbd|(.*0\\.0\\s*0.0.*)|^$" | tee -a foobar

(It does log a lot of things, though....)

Anyway, so in this case, yes, start up bash, and if top is smart enough to quit when the parent bash is killed that's good news.

Yes, what I pasted above was how it looks as an escaped R character string (that's also why you have all those double backslashes). Just for the sake of completeness:

# for R
"top cbd1 -w512 | egrep -v \"Tasks|Cpu|Mem|Swap|PID|top icbd|(.*0\\.0\\s*0.0.*)|^$\" | tee -a /tmp/RtmpxjZE3n/file5fce2ac17135"

# for bash
top cbd1 -w512 | egrep -v "Tasks|Cpu|Mem|Swap|PID|top icbd|(.*0\.0\s*0.0.*)|^$" | tee -a /tmp/RtmpxjZE3n/file5fce2ac17135
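A quick way to check that the two forms agree is to print the R string, or to sidestep the escaping altogether with a raw string. This sketch assumes R 4.0.0 or later (raw strings are not available before that) and, for brevity, uses only the egrep part of the command.

```r
# Escaped form: backslashes and double quotes must be escaped for R.
escaped <- "egrep -v \"Tasks|Cpu|Mem|Swap|PID|top icbd|(.*0\\.0\\s*0.0.*)|^$\""

# Raw string form (R >= 4.0.0): written exactly as bash should see it.
raw <- r"(egrep -v "Tasks|Cpu|Mem|Swap|PID|top icbd|(.*0\.0\s*0.0.*)|^$")"

# Both spellings produce the same character string.
same <- identical(escaped, raw)

# writeLines() shows the string as the shell will receive it.
writeLines(escaped)
```

print() would show the escaped form again, so writeLines() or cat() is the one to use when checking what actually gets passed to bash.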