The most massive galaxies in the universe are rare, yet precisely because of this, their formation history imposes some of the strongest constraints on our models of galaxy formation. In the local universe, massive galaxies like M87 appear relatively dull, with elliptical morphologies, old stars, and little ongoing star formation. For decades, archeological studies have predicted that most of the action during these galaxies’ formation must have occurred at much higher redshift (z > 2). Deep, wide-field surveys of the near-infrared sky now allow us to directly observe the progenitors of local massive galaxies as they form. I will show state-of-the-art observations of this process up to z ~ 6, where we are finding that the early stages of massive galaxy formation are in fact extremely dynamic, with huge bursts of dust-obscured star formation, ubiquitous AGN activity, and significant morphological evolution. I will also discuss what we are just starting to learn in the JWST era, in which we are observing massive galaxy formation back to its initial stages at z ~ 10-15.