Data clustering based on the Optimum-Path Forest (OPF) relies on a suitable estimation of the probability density function (pdf). The method essentially segments the domes of the pdf so that each dome becomes a cluster (an optimum-path tree) rooted at a representative sample. The method has been used successfully for image segmentation and active learning. However, large datasets compromise its efficiency, and the most immediate solution degrades its effectiveness. We have investigated divide-and-conquer approaches to circumvent this problem.
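The dome-segmentation idea can be illustrated with a minimal Python sketch. It assumes NumPy, Euclidean distances, a Gaussian-kernel pdf estimated over each sample's k nearest neighbours, and a symmetric k-NN graph. The names `opf_cluster`, `k` and `delta` are hypothetical, both parameters are fixed by hand, and this is a simplified illustration rather than the authors' implementation. Each sample starts with the trivial value ρ(s) − δ, and samples are processed in decreasing order of value. A sample that no neighbour has conquered by the time it is popped becomes the root of a new tree (a pdf maximum). Each tree then grows through paths that maximize the minimum density along the path, so it stops at the valleys between domes.

```python
import heapq
import numpy as np


def opf_cluster(X, k=5, delta=1e-9):
    """Simplified OPF clustering sketch (hypothetical helper): estimate a k-NN pdf,
    then let each pdf maximum conquer its dome via paths maximizing the min density."""
    X = np.asarray(X, dtype=float)
    n = len(X)  # assumes n > k
    rows = np.arange(n)[:, None]

    # Pairwise squared Euclidean distances; a sample is not its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    knn = np.argsort(d2, axis=1)[:, :k]
    knn_d2 = d2[rows, knn]

    # Gaussian-kernel pdf over the k neighbours (up to a constant factor),
    # with sigma = (largest k-NN arc length) / 3.
    sigma = max(np.sqrt(knn_d2.max()) / 3.0, 1e-12)
    rho = np.exp(-knn_d2 / (2.0 * sigma ** 2)).sum(axis=1) / k

    # Assumption: symmetric adjacency, s ~ t if either is in the other's k-NN.
    adj = [set() for _ in range(n)]
    for s in range(n):
        for t in knn[s]:
            adj[s].add(int(t))
            adj[int(t)].add(s)

    # Trivial path value rho - delta: only samples that no neighbour can offer
    # a better path to (the pdf maxima) end up as roots.
    value = rho - delta
    label = np.full(n, -1)
    done = np.zeros(n, dtype=bool)
    heap = [(-value[s], s) for s in range(n)]  # max-heap via negated values
    heapq.heapify(heap)
    n_clusters = 0

    while heap:
        neg_v, s = heapq.heappop(heap)
        if done[s] or -neg_v != value[s]:
            continue  # stale heap entry
        done[s] = True
        if label[s] < 0:  # never conquered: s is a root (pdf maximum)
            label[s] = n_clusters
            n_clusters += 1
            value[s] = rho[s]
        for t in adj[s]:
            if done[t]:
                continue
            # Value of the extended path = minimum density along it.
            cand = min(value[s], rho[t])
            if cand > value[t]:
                value[t] = cand
                label[t] = label[s]
                heapq.heappush(heap, (-cand, t))
    return label, rho
```

Because the k-NN graph has no arcs between two well-separated groups of samples, no tree can cross from one group to the other. A single group may still split into several clusters if its estimated pdf has more than one maximum.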
In active learning, having only a few labeled samples may compromise the design of a supervised pattern classifier based on OPF. We have therefore developed semi-supervised learning approaches that effectively propagate labels from the supervised samples to the unlabeled ones, in order to improve OPF-based pattern classification. We have also developed a Python toolbox for teaching the design of image processing and machine learning operators based on OPF.
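Label propagation from the supervised samples can be sketched as an optimum-path competition on a complete graph. Every labeled sample is a root with cost 0, and each unlabeled sample takes the label of the root that offers the path with the smallest maximum arc weight (the f_max path cost). This is a hedged illustration under those assumptions: the function name `opf_propagate` is hypothetical, treating every labeled sample as a root is a simplification, and it is not presented as the authors' semi-supervised method.

```python
import heapq
import numpy as np


def opf_propagate(X, y):
    """Hypothetical sketch: propagate labels from labeled samples (y >= 0) to
    unlabeled ones (y == -1) on a complete graph using the f_max path cost."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    cost = np.full(n, np.inf)
    label = y.copy()
    cost[y >= 0] = 0.0  # labeled samples act as roots
    done = np.zeros(n, dtype=bool)
    heap = [(0.0, s) for s in np.flatnonzero(y >= 0)]
    heapq.heapify(heap)

    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c != cost[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            if done[t]:
                continue
            # f_max: cost of the extended path is its largest arc weight.
            cand = max(cost[s], dist[s, t])
            if cand < cost[t]:
                cost[t] = cand
                label[t] = label[s]
                heapq.heappush(heap, (cand, t))
    return label, cost
```

For example, with 1-D samples at 0, 1, 2, 10, 11 and 12, where only the sample at 0 (class 0) and the sample at 12 (class 1) are labeled, the first three samples receive class 0 and the last three receive class 1.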