You can definitely run small experiments, but there are going to be important distributional shifts between toy systems and extremely capable, situationally aware systems that know they can defeat you. You want a robust theory that you have reason to expect will scale to the real world, rather than relying solely on experiments, although experiments can still give you useful data points.