My question is: how does the algorithm decide when to stop? I see there are 5 observations in each of the bottommost nodes. Is there a threshold on the number of observations in a node at which the algorithm stops splitting?

The help for the tree() function shows the following arguments, which include control:

    tree(formula, data, weights, subset,
         na.action = na.pass, control = tree.control(nobs, ...),
         method = "recursive.partition",
         split = c("deviance", "gini"),
         model = FALSE, x = FALSE, y = TRUE, wts = TRUE, ...)

The description of control is:

control A list as returned by tree.control

The help for tree.control shows:

Usage
    tree.control(nobs, mincut = 5, minsize = 10, mindev = 0.01)

Arguments
    nobs     The number of observations in the training set.
    mincut   The minimum number of observations to include in either child node. This is a weighted quantity; the observational weights are used to compute the ‘number’. The default is 5.
    minsize  The smallest allowed node size: a weighted quantity. The default is 10.
    mindev   The within-node deviance must be at least this times that of the root node for the node to be split.

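So yes, there is a threshold: by default each child node must contain at least 5 observations (mincut), which matches the 5 observations you see in the bottommost nodes. It is not the only stopping rule, though. Node size is also limited by minsize (default 10), and a node is split only if its deviance is at least mindev (default 0.01) times the deviance of the root node. You can adjust all three through tree.control.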
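As a minimal sketch of how these settings can be changed, the snippet below fits two trees and compares the sizes of their terminal nodes. It assumes the tree package is installed. The built-in cars data set, the particular control values, and the helper leaf_sizes() are my own choices for illustration; none of them come from the help page.

```r
# Assumes the 'tree' package is installed: install.packages("tree")
library(tree)

# Built-in 'cars' data, used only for illustration.
# Fit once with the default control settings (mincut = 5, minsize = 10, mindev = 0.01).
fit_default <- tree(dist ~ speed, data = cars)

# Fit again with looser stopping rules (values picked arbitrarily for this sketch).
ctrl <- tree.control(nobs = nrow(cars), mincut = 2, minsize = 4, mindev = 0.001)
fit_small <- tree(dist ~ speed, data = cars, control = ctrl)

# Hypothetical helper: number of observations in each terminal node.
# Terminal nodes are the rows of fit$frame whose 'var' entry is "<leaf>".
leaf_sizes <- function(fit) fit$frame$n[fit$frame$var == "<leaf>"]

leaf_sizes(fit_default)  # none of these should fall below the default mincut of 5
leaf_sizes(fit_small)    # may contain smaller nodes, but none below 2
```

Lowering mincut and minsize only permits smaller nodes. Whether the tree actually grows further also depends on mindev, so it usually has to be lowered as well.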