Speculative decoding reduces forward passes through a target model by using a cheaper draft model to guess future tokens, then verifying them in parallel.