Large-scale data containing multiple important rare clusters, even at moderately high dimensions, poses challenges for existing clustering methods. To address this issue, we propose a new mixture model, namely the Hidden Markov Model on Variable Blocks (HMM-VB), and a new mode search algorithm called Modal Baum-Welch (MBW) for mode-association clustering. HMM-VB incorporates prior knowledge on the chain-like dependence structure among groups of variables, achieving the effect of dimension reduction as well as incisive modeling of the rare clusters. When such prior knowledge is unavailable or assumed for the sake of parsimonious modeling, we develop a recursive search algorithm to optimize the dependence structure among variables based on BIC. The new MBW algorithm ensures the feasibility of clustering via mode association, achieving linear complexity in terms of the number of variable blocks despite the exponentially growing number of state combinations in HMM-VB. In a series of experiments on simulated and real data, our proposed method outperforms other widely used methods. In addition, we prove theories about the identifiability of HMM-VB and the consistency of our approach to search for the variable dependence structure in a special case. We also explore a more general mixture model with latent Bayesian networks (MLBN). By developing a new Baum-Welch type algorithm on directed acyclic graphs, we obtain an exact EM algorithm and a mode search algorithm for MLBN.