Byte-packing / reading binary data in Ruby

November 29th, 2007 posted by codders

A lot of high-level languages don’t have wonderful native support for bit-level manipulation of data, so when you have to implement a proprietary wire protocol or parse a custom file header, you often feel a little lost. Fortunately you’re not the first person to feel lost, and some kind person has, by and large, already solved the problem for you. In Java, for example, there’s the fine Apache Commons IO library (in fact, all the Commons libraries are pretty fine), and in Ruby you have BitStruct:

require 'bit-struct'

# Each field declaration takes, in order:
#    the parsed datatype (char, unsigned, ...)
#    a :symbol_name for the parsed field
#    the field size in bits
#    a descriptive comment
class BinaryHeader < BitStruct
  default_options :endian => :native
  char     :id, 4*8, "File ID"
  char     :format_string, 4*8, "Format String"
  unsigned :remainder, 32, "Remaining bytes"
  unsigned :trackId, 32, "Track ID"
  unsigned :formatId, 32, "Format ID"
  unsigned :codecId, 32, "Codec ID"
  unsigned :major, 16, "Major Version"
  unsigned :minor, 16, "Minor Version"
  unsigned :validation, 32, "Validation Data"
  unsigned :size, 32, "File Size"
end
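
For comparison, here is a rough sketch of the same layout parsed with Ruby's built-in String#unpack instead of BitStruct. This is a swapped-in technique, not how BitStruct works internally. The names HEADER_FORMAT, HEADER_FIELDS and unpack_header are made up for illustration, and the sample values are invented. It assumes the fields are laid out back to back with no padding, in native byte order, as the declaration above suggests.

```ruby
# Hypothetical stdlib-only equivalent of the BinaryHeader layout.
# a4 = 4 raw bytes, L = 32-bit unsigned, S = 16-bit unsigned (both native-endian).
HEADER_FORMAT = 'a4a4L4S2L2'
HEADER_FIELDS = %i[
  id format_string remainder trackId formatId codecId
  major minor validation size
].freeze

# Returns a Hash mapping each field name to its unpacked value.
def unpack_header(bytes)
  HEADER_FIELDS.zip(bytes.unpack(HEADER_FORMAT)).to_h
end

# Invented sample data, packed with the same format string.
sample = ['FILE', 'MP3X', 100, 7, 1, 2, 1, 0, 0xDEADBEEF, 4096].pack(HEADER_FORMAT)
parsed = unpack_header(sample)

puts sample.bytesize
puts parsed[:id]
puts parsed[:trackId]
```

The format string is compact but says nothing about what each field means, which is where BitStruct's named, commented declarations earn their keep.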

There are a number of ways you can end up with binary data on your hands. You could read it directly from a file or socket, you could generate it yourself, or you could receive it Base64-encoded from a web service call:
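
To make those three sources concrete, here is a minimal sketch using only the standard library. The field values are invented, SAMPLE_HEADER_SIZE is a name introduced here (36 bytes is the sum of the field widths declared earlier), and a StringIO stands in for a real file or socket.

```ruby
require 'base64'
require 'stringio'

SAMPLE_HEADER_SIZE = 36 # bytes: 4 + 4 + (4 * 4) + (2 * 2) + (2 * 4)

# 1. Generate some: pack invented field values in native byte order.
generated = ['FILE', 'MP3X', 28, 42, 1, 3, 2, 1, 0, 1024].pack('a4a4L4S2L2')

# 2. Read from a file or socket: StringIO is a stand-in for any IO object.
io = StringIO.new(generated + 'rest of the stream')
from_io = io.read(SAMPLE_HEADER_SIZE)

# 3. Receive it Base64-encoded: simulate the encoding a service might apply.
encoded = Base64.strict_encode64(generated)
from_base64 = Base64.decode64(encoded)

puts from_io == generated
puts from_base64 == generated
puts from_base64.bytesize
```

Whichever route the bytes take, you end up with a plain String that can be handed to a parser.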

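require 'base64'

# Total header length in bytes: the BinaryHeader field widths sum to 288 bits.
HEADER_SIZE = 36

def get_header(file)
  # The web service returns the header bytes Base64-encoded.
  header_bytes_base64 = @webservice.getFileHeader(file.trackId)
  header_bytes = Base64.decode64(header_bytes_base64)
  if header_bytes.size != HEADER_SIZE
    puts "Invalid header size: #{header_bytes.size}"
    return nil
  end
  header = BinaryHeader.new(header_bytes)
  puts "Got Header:"
  puts header.inspect_detailed
  # Reject anything that does not carry the expected magic values.
  if header.id != "FILE"
    puts "Unknown file ID: #{header.id}"
    return nil
  end
  if header.format_string != "MP3X"
    puts "Unknown file format: #{header.format_string}"
    return nil
  end
  header
end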

Problem solved.