Reading the Economist - Hpricot, Ruby-RSS, Festival

December 14th, 2007 posted by codders

Well, having the Economist read to me, at any rate.

First, set up Festival (configuring it to use ALSA and an ‘English’ voice):

# Festival, plus the British English 'rab' voice
sudo apt-get install festival festvox-rablpc16k
cat > ~/.festivalrc <<END
(Parameter.set 'Audio_Command "aplay -D plug:dmix -q -c 1 -t raw -f s16 -r \$SR \$FILE")
(Parameter.set 'Audio_Method 'Audio_Command)
(voice_rab_diphone)
END
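If you'd rather keep everything in Ruby, the same file can be written from a script. This is a minimal sketch, not part of the original setup, and `FESTIVALRC` and `write_festivalrc` are made-up names. Ruby's `%q{}` literal doesn't interpolate, so `$SR` and `$FILE` are written literally and only expanded when Festival runs the command. The backslashes in the heredoc do the same job.

```ruby
# Hypothetical Ruby equivalent of the heredoc: the same three Festival settings.
# %q{} behaves like a single-quoted string, so $SR and $FILE stay literal.
FESTIVALRC = %q{(Parameter.set 'Audio_Command "aplay -D plug:dmix -q -c 1 -t raw -f s16 -r $SR $FILE")
(Parameter.set 'Audio_Method 'Audio_Command)
(voice_rab_diphone)
}

# Writes the config (by default to ~/.festivalrc) and returns the path used.
# Like `cat >`, this overwrites any existing file.
def write_festivalrc(path = File.join(Dir.home, ".festivalrc"))
  File.write(path, FESTIVALRC)
  path
end
```

Call `write_festivalrc` once before running the reader.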

Then liberally sprinkle some Ruby:

#!/usr/bin/ruby

require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require 'hpricot'

# Festival reads each article from "#{TEMPFILE}.body"
TEMPFILE = "/tmp/economistreader"

puts "Fetching feed"
source = "http://www.economist.com/rss/full_print_edition_rss.xml"
# open-uri makes open() fetch URLs (newer Rubies want URI.open)
content = open(source) { |s| s.read }
rss = RSS::Parser.parse(content, false)  # false: don't validate the feed

puts "Title: #{rss.channel.title}"
puts "Found #{rss.items.size} items"
rss.items.each do |item|
  puts item.title
  puts "Read? [Y/n]"
  # Anything starting with 'n' skips the article; Enter (or anything else) reads it
  next if readline.strip.downcase =~ /^n/

  # Fetch the full article page and pull out the body paragraphs
  doc = Hpricot(open(item.link))
  paras = doc.search("//div[@class='col-left']/p[@class='']")
  File.open("#{TEMPFILE}.body", "w") do |f|
    paras.each do |para|
      f.write(para.inner_text + "\n")
      puts para.inner_text
    end
  end
  # Speak the whole article using Festival's text-to-speech mode
  system("festival", "--tts", "#{TEMPFILE}.body")
end
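The script writes every article to the same fixed path under /tmp, so two copies running at once would trample each other's files. A minimal alternative, assuming Ruby's standard `tempfile` library (`with_article_file` is a made-up helper name), gives each article its own temporary file and removes it once Festival has finished:

```ruby
require "tempfile"

# Hypothetical helper: writes one paragraph per line to a fresh temporary
# file, yields its path, and deletes the file when the block returns.
# Returns whatever the block returns.
def with_article_file(paragraphs)
  Tempfile.create(["economistreader", ".txt"]) do |f|
    paragraphs.each { |para| f.puts(para.strip) }
    f.flush  # make sure everything is on disk before another process reads it
    yield f.path
  end
end

# In the reading loop this would stand in for the File.open ... system(...) lines:
#   with_article_file(paras.map { |para| para.inner_text }) do |path|
#     system("festival", "--tts", path)
#   end
```

Because `system` blocks until Festival exits, the file is only deleted after the article has been read out.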

I give it about three articles before the voice drives me completely insane. There’s a character-set issue that puts stray ‘?’s in odd places and confuses Festival. Even without confusing characters, free text-to-speech software still isn’t ‘all that’.
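The cause of the ‘?’s isn't identified here, but one plausible culprit is typographic punctuation (curly quotes, dashes, ellipses) and accented letters that Festival's English front end doesn't expect. Assuming that's it, and assuming the paragraph text arrives as a UTF-8 string, a minimal sketch is to flatten each paragraph to plain ASCII before writing it out. `festival_safe` and `ASCII_PUNCTUATION` are made-up names:

```ruby
# Assumed culprits: typographic punctuation, each with a plain-ASCII stand-in.
ASCII_PUNCTUATION = {
  "\u2018" => "'", "\u2019" => "'",   # curly single quotes
  "\u201C" => '"', "\u201D" => '"',   # curly double quotes
  "\u2013" => "-", "\u2014" => "-",   # en and em dashes
  "\u2026" => "...",                  # ellipsis
  "\u00A0" => " "                     # non-breaking space
}.freeze

# Hypothetical helper: returns a plain-ASCII version of a UTF-8 string.
def festival_safe(text)
  text = text.scrub("")  # drop byte sequences that aren't valid UTF-8
  text = text.gsub(Regexp.union(ASCII_PUNCTUATION.keys), ASCII_PUNCTUATION)
  # Split accented letters into base letter plus combining accent...
  text = text.unicode_normalize(:nfkd)
  # ...then discard everything outside printable ASCII, tabs and newlines
  text.gsub(/[^\x20-\x7E\t\n]/, "")
end
```

In the reading loop, that means passing each paragraph's text through `festival_safe` before it goes into the file. Accents are dropped rather than pronounced, which seems a fair trade for a voice that was never going to say ‘café’ properly anyway.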
You could also, it’s worth pointing out, visit PimpMyNews. You’ll find the Economist’s feed under ‘Business/World Business News’. Unfortunately, they are lazy and their software only reads out the text from the RSS ‘Description’ field rather than parsing the whole article. That said, if what you want is to hear the first 200 words of every article in the Economist, that’s your badger.
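For comparison, a description-only reader like that needs no HTML scraping at all, because the RSS library already hands you each item's Description. A minimal sketch with Ruby's `rss` library, run against a small invented feed rather than the Economist's (the feed text and the `summaries` name are made up for illustration):

```ruby
require "rss"

# Invented sample feed standing in for the fetched XML.
SAMPLE_FEED = <<~XML
  <?xml version="1.0"?>
  <rss version="2.0">
    <channel>
      <title>Sample print edition</title>
      <link>http://example.com/</link>
      <description>Example feed</description>
      <item>
        <title>First article</title>
        <link>http://example.com/1</link>
        <description>Opening paragraph of the first article.</description>
      </item>
    </channel>
  </rss>
XML

# Returns [title, description] pairs: the text a description-only reader would speak.
def summaries(xml)
  feed = RSS::Parser.parse(xml, false)  # false: skip validation, as the script does
  feed.items.map { |item| [item.title, item.description] }
end
```

Feeding these descriptions to Festival instead of the scraped paragraphs gives you roughly what PimpMyNews reads out.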