The text-to-speech feature in Captivate 4 comes from a company called Neospeech. The feature is compatible with VTML (VoiceText Markup Language), which you can use to make the voices sound more natural.
Information about VTML can be found in this PDF: http://www.neospeech.com/manual/vt_kor-Engine-API-References-v3.7.0%20(english_translation).pdf
If you go to Appendix C: VTML Tagset, you will see some commands that you can use directly in the text in Captivate. When you convert your text to speech, these tags are interpreted by the Neospeech engine.
Update: English manual for version 3.9 can be found here: https://ondemand.neospeech.com/vt_eng-Engine-VTML-v3.9.0-3.pdf
A couple of examples:
<vtml_break level="0" | "1" | "2"/>
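The notation above lists the allowed values; in actual slide text you pick just one of them. As a minimal sketch, here is how the tag might look inside text typed into Captivate. The sentences are made up for illustration, and which level produces which kind of break is something to confirm in Appendix C of the manual:

```
Welcome to this course. <vtml_break level="2"/> In the first lesson, <vtml_break level="1"/> we will look at the main toolbar.
```

Type the tag with straight quotation marks. Curly quotes pasted in from a word processor or a web page may not be recognized as part of the tag.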