Friday, October 5th, 2007...9:31 am
Reading An Excel File With Ruby
This tutorial will cover how to read (or parse) an excel file with ruby. I had to write a script to unpivot some data for a co-worker that saved him hours of time, and I got to write a ruby script, so it was a win-win. Here's how you can do the same thing.
Installing Parseexcel
Parseexcel is a ruby port of the perl parseexcel module.
It's installable via a nice gem like so:
That's that, now remember since it's a gem library we have to tell our script to use the gem libs when we run the script from the console using the -rubygems switch.
Using Parseexcel
Parseexcel is a very straight forward library, you can't do too much with it, but it gets the job done.
Getting a Workbook
This returns the actual excel file's workbook, from there we need to determine what worksheet we're on.
Getting a Worksheet
Will return the first worksheet, you could also use the each method on the workbook to iterate over all the worksheets.
Iterating over rows and columns
The worksheet object has a very nice each method that will allow us to iterate over the rows like so
j=0
i=0
if row != nil
row.each { |cell|
if cell != nil
contents = cell.to_s('latin1')
puts "Row: #{j} Cell: #{i} #{contents}"
end
i = i+1
}
j = j +1
end
}
Getting Cell Data
- Getting a String: cell.to_s('latin1')
- Getting a Float: cell.to_s('latin1')
- Getting a Int: cell.to_i
- Getting a Date: cell.date
Getting A Specific Cell
A basic script for dumping an excel file
#Open the excel file passed in from the commandline
workbook = Spreadsheet::ParseExcel.parse(ARGV[0])
#Get the first worksheet
worksheet = workbook.worksheet(0)
#cycle over every row
worksheet.each { |row|
j=0
i=0
if row != nil
#cycle over each cell in this row if it's not an empty row
row.each { |cell|
if cell != nil
#Get the contents of the cell as a string
contents = cell.to_s('latin1')
puts "Row: #{j} Cell: #{i}> #{contents}"
end
i = i+1
}
end
}
To run the script, remember to use the -rubygems switch so that you can find the parsexcel library.







16 Comments
October 5th, 2007 at 1:09 pm
Where does j get incremented?
October 6th, 2007 at 2:05 am
Nice to know this exists.
@Matt: it looks like the incrementer for 'j' is missing.
However, assuming the collections use the Enumerable mixin it could be rewritten as:
worksheet.each_with_index { |row, j|
if row != nil
#cycle over each cell in this row if it's not an empty row
row.each_with_index { |cell, i|
if cell != nil
#Get the contents of the cell as a string
contents = cell.to_s('latin1')
puts "Row: #{j} Cell: #{i}> #{contents}"
end
}
end
}
October 17th, 2007 at 2:17 pm
The Parseexcel gem has horrible horrible memory leaks and should not be used in a production environment.
October 20th, 2007 at 11:32 am
@Ned: can you suggest an alternative?
October 22nd, 2007 at 4:16 am
What are the *horrible* memory leaks?
October 23rd, 2007 at 12:03 pm
To run the script, remember to use the -rubygems switch so that you can find the parsexcel library.
.... you can add it to your source like require 'rubygems'. But it has to be the first require. And it's only needed under *nix enviroments
November 5th, 2007 at 5:04 am
A couple problems:
If a cell contains an integer, the to_s conversion returns it as a float, e.g. 12 gives "12.0" (I have a column that contains house numbers or names).
I cannot extract cells containing dates: nil, empty string or zero is returned for all conversions, date, to_s, to_i.
These issues could be caused by excel, or me, or both.
BTW: There is a rape and paste error on the 'Getting a Float', should be to_f.
An explanation of the 'Latin1' parameter to to_s would be nice.
November 5th, 2007 at 5:09 am
Update: the date conversion probably works fine. I didn't realise, but the date cells I was trying to extract were calculated (formulas). I don't suppose this library is supposed to be able to handle that (but it would be nice)!
November 6th, 2007 at 5:38 pm
When I try the sample script I get the following message "default formats are encoded in ISO-8859-1". I'm not sure what it wants me to do.
November 6th, 2007 at 6:04 pm
Ok, ignore the previous comment. I figured out that the message is just a standard message when you start using parseexcel. The main problem is that it's not reading my spreadsheet properly :(
November 12th, 2007 at 11:11 pm
[...] Reading an Excel file with Ruby [...]
December 3rd, 2007 at 9:34 am
[...] Reading An Excel File With Ruby Рабы корпораций, это специально для вас Чтение данных из Excel-файлов [...]
January 27th, 2008 at 1:15 pm
[...] Reading Excel Files From RubyWe Heart Code posts a tutorial showing how to use the Parseexcel library, a port of a Perl library that lets you parse Excel spreadsheets from your code.How To Debug Your Rails App With ruby-debugPatrick Lenz presents a great article that shows how to debug a Rails application from start to finish.Economical Use of Amazon S3 with Ruby on RailsRobert Dempsey has written a comprehensive article on how to use S3 in an economic, efficient way from Ruby on Rails applications.Ruby on Rails vs ColdFusion CommercialThe ever active RailsEnvy guys have produced their eighth Ruby on Rails vs X commercial.. presenting Ruby on Rails vs ColdFusion. [...]
May 22nd, 2008 at 2:35 pm
I was having problems getting the initial example working. The following did, was sucessful ...
require 'parseexcel'
workbook = Spreadsheet::ParseExcel.parse(ARGV[0])
worksheet = workbook.worksheet(1)
col = ('A'..'Z').to_a # Only good for up to 26 columns
j = 1
worksheet.each { |row|
i = 0
if row != nil
row.each { |cell|
if cell != nil
contents = cell.to_s('latin1')
puts "#{col[i]}#{j}: #{contents}"
end
i += 1
}
j += 1
# if j == 6 then break end # Uncomment to limit to a certain number of rows
end
}
The key for me was not using worksheet(0) [didn't work for me], but worksheet(1) [worked!].
May 22nd, 2008 at 2:39 pm
I was having problems getting the initial example working. The following however, was successful ...
require 'parseexcel'
workbook = Spreadsheet::ParseExcel.parse(ARGV[0])
worksheet = workbook.worksheet(1)
col = ('A'..'Z').to_a # Only good for up to 26 columns
j = 1
worksheet.each { |row|
i = 0
if row != nil
row.each { |cell|
if cell != nil
contents = cell.to_s('latin1')
puts "#{col[i]}#{j}: #{contents}"
end
i += 1
}
j += 1
# if j == 6 then break end # Uncomment to limit to a certain number of rows
end
}
The key for me was not using worksheet(0) [didn't work for me], but worksheet(1) [worked!].
June 4th, 2008 at 3:44 pm
I'm having a problem with parseexcel
When I try to use cell.to_s('latin1') I get an error for wrong number of arguments (1 for 0)
For some reason Ruby is not seeing parseexcel's to_s method. Any ideas what is going on here? (I tried with -rubygems to no avail)
Leave a Reply