The Monte Carlo policy gradient