Language is complex, but our labeled data sets generally aren't. For example, treebanks specify coarse categories like noun phrases and verb phrases, but they say nothing about richer phenomena such as agreement, case, and definiteness. In this talk, I will present a latent variable model for natural language parsing that learns these underlying complexities automatically. I will describe a state-splitting approach that begins with a trivial X-bar grammar and iteratively refines it: in each step, latent variables refine the previous model, until a final, full-complexity model is reached. Because each refinement introduces only limited additional complexity, learning can be done efficiently and effectively in both a generative and a discriminative framework. In the generative variant, the latent variables split grammar symbols; for example, noun phrases are first split into subjects and objects, then into singular and plural, and so on. A split-and-merge technique allocates the refinements only where necessary, allowing different grammar symbols to specialize to different degrees. I will also present a discriminative multiscale variant that splits grammar rules rather than grammar symbols. In this approach, complexity need not be uniform across the grammar, yielding space savings of several orders of magnitude. Both variants achieve state-of-the-art parsing accuracies across an array of languages, in a fully language-general fashion.
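
The split-and-merge refinement loop can be pictured with a small sketch. The Python below is a hypothetical, heavily simplified illustration, not the talk's actual implementation: it only tracks how many latent subsymbols each grammar symbol carries, and the EM-based likelihood criterion is replaced by a toy estimate_gain stub (in the real model, merge decisions are made per individual split and the gain is estimated from treebank likelihood).

```python
# Hypothetical sketch of iterative split-and-merge symbol refinement.
# Each grammar symbol starts with one latent subsymbol; in every round its
# subsymbols are split in two, and splits whose estimated likelihood gain is
# too small are merged back, so symbols specialize to different degrees.

def split_merge_round(subsymbol_counts, estimate_gain, merge_threshold=0.01):
    """Perform one split-merge round.

    subsymbol_counts: dict mapping a grammar symbol (e.g. "NP") to its
        current number of latent subsymbols.
    estimate_gain: callable(symbol) -> float, a stand-in for the EM-based
        estimate of how much splitting that symbol helps data likelihood.
    """
    refined = {}
    for symbol, count in subsymbol_counts.items():
        if estimate_gain(symbol) < merge_threshold:
            refined[symbol] = count        # merge back: the split is not worth it
        else:
            refined[symbol] = count * 2    # keep the split: double the subsymbols
    return refined

if __name__ == "__main__":
    # Trivial X-bar starting point: one subsymbol per symbol.
    grammar = {"NP": 1, "VP": 1, "PP": 1, ",": 1}
    # Toy gain function: pretend phrasal symbols benefit from splitting
    # while punctuation does not (a placeholder for the real EM estimate).
    gain = lambda sym: 0.5 if sym.isalpha() else 0.001
    for i in range(3):
        grammar = split_merge_round(grammar, gain)
        print(f"round {i + 1}: {grammar}")
```

Run on this toy setup, NP, VP, and PP grow to 8 subsymbols after three rounds while the punctuation symbol stays at 1, which is the behavior the abstract describes: refinement is allocated only where it pays off.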