Yeah, I just don't think at this point it's a particularly scientifically tractable question. I agree it would probably be hard, but it's not an answerable question, because to even begin answering it you would need to know the shape of this AI system, what it would look like, and we just don't have a sense of that. Imagine people 300 years ago wondering about cars and thinking about what would happen if they got a flat tire. You would need to know what a tire is, how it works, what material it is made of. So yeah, I agree it's a hard problem, it's just not tractable at this point, or it's not even easy to formulate it as a tractable research question…

(from: Interviews with AI Researchers)

Counter-arguments:

Summary of current alignment research:

Much present-day work on AI safety depends only weakly on the technical details of AGI architecture, so our ignorance about the details of future systems is not a major obstacle. Moreover, the field of AI safety research is broad and covers many different approaches, and it is likely that at least some of them will remain useful even for future systems built with new techniques. Here are some examples of work currently being done:

Further Reading