I think what offpolicy was trying to clumsily say is that policy evaluation (I come from the economic policy econometrics world originally) can be used for RL.
Maybe it can, but isn't Bayesian stuff really costly most of the time?
How? AFAIK policy analysis is based on asking causal and counterfactual questions, primarily by finding a quasi-control population that can act as a proxy for the counterfactual (what would have happened without the intervention) and then running a regression that compares it against the observational data. I forget the specific name for this. Causal models would represent this explicitly, and you could reason about the model to see whether your economics question is well posed. Nobody does this now, but it is important because it lays out the assumptions so they can't be hidden behind politics.
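To make the control-population idea concrete, here is a minimal sketch using a difference-in-differences comparison. That is an assumption on my part: it is one common technique of this kind, not necessarily the one whose name is forgotten above. The function name `did_estimate` and all the numbers are made up for illustration. In the simple two-group, two-period case, this difference of means equals the interaction coefficient you would get from regressing the outcome on group, period, and their product.

```python
from statistics import mean

def did_estimate(control_pre, control_post, treated_pre, treated_post):
    """Difference-in-differences: change in the treated group minus
    change in the control group over the same period."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical toy outcomes (not real data).
control_pre = [10.0, 10.0]    # control group, before the policy
control_post = [11.5, 11.5]   # control group, after (background trend only)
treated_pre = [12.0, 12.0]    # treated group, before the policy
treated_post = [16.5, 16.5]   # treated group, after (trend + policy effect)

effect = did_estimate(control_pre, control_post, treated_pre, treated_post)
print(effect)
```

The estimate is only as good as the assumption that the control group's trend is what the treated group would have followed without the intervention, which is exactly the kind of assumption an explicit causal model forces you to write down.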
RL is for training system parameters based on positive or negative reinforcement from a critic. It is built on a Markov decision process. RL does have the idea of policy search, but that is separate from economic policy evaluation.
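For contrast, here is a toy sketch of what "policy" means on the RL side: a mapping from states to actions in a Markov decision process, scored by discounted reward. The two-state MDP, the reward rule, and the names `step`, `evaluate`, and `policy_search` are all hypothetical, and brute-force enumeration stands in for a real policy search algorithm, which would not scale this way.

```python
from itertools import product

STATES = (0, 1)
ACTIONS = ("stay", "move")
GAMMA = 0.9  # discount factor (assumed for this toy example)

def step(state, action):
    """Deterministic toy dynamics: 'move' flips the state, 'stay' keeps it.
    Landing in state 1 earns reward 1, landing in state 0 earns 0."""
    next_state = state if action == "stay" else 1 - state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def evaluate(policy, sweeps=500):
    """Iteratively apply the Bellman expectation update for a fixed policy."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        updated = {}
        for s in STATES:
            next_state, reward = step(s, policy[s])
            updated[s] = reward + GAMMA * values[next_state]
        values = updated
    return values

def policy_search(start=0):
    """Try every deterministic policy and keep the one with the highest
    value at the start state."""
    best_policy, best_value = None, float("-inf")
    for choice in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, choice))
        value = evaluate(policy)[start]
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value

best_policy, best_value = policy_search()
print(best_policy, round(best_value, 3))
```

Nothing here asks a counterfactual question about an observed population; the "policy" is just a decision rule being optimized against a reward signal.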