It's hard when it's a cobbled-together set of systems sending signals to each other, some of which are automated by human robots (e.g. main system sees exit, files a dozen tickets, some of these get picked up by different automation, some of them cause humans to trigger other processes, each of which can again file half a dozen tickets in three different systems...)
No, it's not. Make every step check for a kill-switch entry in a shared resource (say, zookeeper, mysql, etc.) before it acts. If the entry is there, halt. A human can then do the manual cleanup.
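To make that concrete, here is a rough sketch of the check-before-act pattern. The store is an in-memory stand-in for whatever shared resource you'd actually use (zookeeper, mysql, ...), and all names here (`KillSwitchStore`, `run_step`, `Halted`) are made up for illustration, not taken from any real library.

```python
class Halted(Exception):
    """Raised when a step refuses to run because the kill switch is engaged."""


class KillSwitchStore:
    """In-memory stand-in for a shared store (assumption: a real system
    would back this with something like ZooKeeper or a MySQL row)."""

    def __init__(self):
        self._entries = set()

    def engage(self, key="global"):
        self._entries.add(key)

    def release(self, key="global"):
        self._entries.discard(key)

    def is_engaged(self, key="global"):
        return key in self._entries


def run_step(store, name, action):
    """Check the kill switch first; only run the action if it is absent."""
    if store.is_engaged():
        raise Halted(f"kill switch engaged; step {name!r} skipped")
    return action()


store = KillSwitchStore()
filed = []

# Normal operation: the step runs and files a ticket.
run_step(store, "file-ticket", lambda: filed.append("ticket-1"))

# An operator engages the switch; the next step halts instead of acting.
store.engage()
try:
    run_step(store, "file-ticket", lambda: filed.append("ticket-2"))
except Halted:
    pass  # left for a human to clean up manually
```

Note this only covers steps that go through `run_step`; anything that bypasses the check (or a human following a runbook) is not stopped by it.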
Yes and no. You’re assuming all systems have a sane way of integrating. You’re also assuming that whoever built this has knowledge of all moving parts and has thought through all edge cases.
When you put in the killswitch, is this for a step, for a workflow, for a subsystem, for the whole thing? It’s alluring to think that you have the option to stop everything, but do you really want to stop everything if there are thousands of things happening in the system?
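One way to picture the scoping question is a sketch like the one below, where a switch can be engaged at any level of a (subsystem, workflow, step) path and a step is blocked by the broadest engaged scope that covers it. The path layout and the names (`blocking_scope`, `billing`, `exit-workflow`, ...) are hypothetical, purely for illustration.

```python
def blocking_scope(engaged, step_path):
    """Return the broadest engaged scope that covers step_path, or None.

    step_path is a tuple like (subsystem, workflow, step). The empty
    tuple () stands for "the whole thing".
    """
    for depth in range(len(step_path) + 1):
        scope = step_path[:depth]
        if scope in engaged:
            return scope
    return None


# Only one workflow inside one subsystem is switched off.
engaged = {("billing", "exit-workflow")}

blocked = blocking_scope(engaged, ("billing", "exit-workflow", "file-ticket"))
unaffected = blocking_scope(engaged, ("billing", "refund-workflow", "file-ticket"))

# Engaging the empty scope stops everything, including unrelated work.
everything = blocking_scope({()}, ("shipping", "label-workflow", "print"))
```

Even with this in place, someone still has to decide which scope to engage, and that decision needs the same knowledge of the moving parts.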
Can a human think through what happens to what in this situation? If the builders of the system missed it, what are the odds that an operator will catch it?
For example: gas stations. They have a big red button that stops everything. If a pump is on fire, it makes sense. If the trash can near the pump is full and you cannot throw your garbage away, would anyone stop everything for the can to be emptied?