By Corey Adler
This post is the sixth in a series discussing EPPlus, an open-sourced tool for interacting code with Excel. In my most recent post I put all the previous posts to good use in creating an Export function, given a set of data, which created a spreadsheet with style, number formatting, formulas, and more. For this post I would like to discuss the flow of data going in the opposite direction—an Import function from Excel, where the data contained in the Excel file goes straight to the database. Unlike an Export function, there are many possible tricks (and pitfalls) in dealing with an Import function—many of which are typically delegated by application requirements. I will be showing 2 different basic ways of importing the data, while leaving up to you any ways to improve upon my methods to create a more advanced import process. I am using the same solution, database, and tools as found in the last post.
Here’s the data that I’ll be importing into the system:
I’ll start off with what I’m having returned from the first Import service call. I’ve created an enum, called ImportResult, to showcase the 4 different possible outcomes of this method: That it imported successfully, that the file was not formatted correctly (more on that later), that the file was not an Excel file suitable for use with EPPlus, or that the file is open in another process. You’ll see for the second process that I return something far different instead.
Given what’s being returned, let’s take a look at the start of the Import process.
The first thing that I’m doing with my import is to check to make sure that the file being sent into the import is one that can be used by the Import process. If the file doesn’t end with “xlsx”—like a newer format Excel file—then it returns the NotExcel result. It’s important to remember that EPPlus does not deal with older Excel files. Newer Excel files (.xlsx extension) utilize XML for the underlying data and styling of everything. EPPlus, at heart, manipulates this XML to create/edit the file. It would have no idea how to actually deal with the older format, which does not use XML. Afterwards I’m checking to see that there’s something on the sheet, and that the sheet’s columns matches the ones created by the Export—otherwise it will return the WrongFormat result.
It’s important to note that this is certainly not ideal. Checking to see that it’s the exact same sheet is probably not good practice. It’s also certainly not a good idea to hard code what the column headers should be, and in which cells they should specifically be located. You’ll see in the 2nd version an improvement on this method for the checking the format of the file.
Next, let’s figure out the last row that contains records in the worksheet, so that we know when to stop looking for more records in the spreadsheet.
One thing to keep in mind here is that simply calling worksheet.Dimension.End.Row does not automatically return the row with the last record. It certainly would be easier all around if it did, but it unfortunately does not. Hence the need for the subsequent while loop. The while loop goes backwards from what it considers the end row, looking for any value in each row (for the first 3 columns), and breaks when it finds something. The reason I’m not looking in the 4th and 5th columns is because, if it is the sheet that was exported, there might be information about the grand total and top performer totals. I wouldn’t want the subsequent loop to think that that row contains information that should be imported into the system (even if it wouldn’t actually work when trying to save that row, as you’ll see next).
Now onto the rest of the Import.
Starting from the second row (which is the first row with data—the first row contains the headers), each row is converted into PersonData records, which are then immediately saved to the database with a try-catch block surrounding the save call. So why save them individually instead of all at once? Because I don’t want one bad apple poisoning preventing saving the other records. This first import method contains nothing located in the catch block—that will be changed in the next method. Once the save is completed, a Success result is returned. What happens after the result comes back is up to you—I simply display a MessageBox that says “Success”.
There are, of course, multiple problems with this method. First that the method does not return how many records were imported and how many failed. Secondly, everything about this method is static and rigid. The spreadsheet must be in order and perfect and the data must not contain any errors. The final problem comes when you try to import a file that’s already currently opened up in Excel. When trying to open the ExcelPackage an error will be generated. That’s because running Excel puts a lock on the file in use. Your import method trying to open it runs into that lock and fails.
So let’s try that one again, shall we? Let’s start off by changing the beginning of the method.
You can immediately see some major changes in this version of the Import. Instead of returning the enum, I’m instead returning a list of string—any messages that you want passed on the user after the Import process completes, whether successful or not. You can see examples of utilizing these messages in the changes to the error checkers at the bottom of the above picture. Connected to this is the tracking of the total number of records successfully imported. That number will also be passed into a message for resultMessages (which will be returned). The next thing is that I’ve surrounded the ExcelPackage using statement in a try-catch block. This is meant to catch the final problem mentioned for the first Import process above. It will catch the IOException being thrown, as will be shown later. Before we move on, take note of the five string variables right after the using statement—they will be quite important in solving the static problem next.
Here is how I’m taking care of the static column problem. You can see that, instead of checking to see that the sheet is exactly as I want it to be, I’m looking dynamically through the entire header row for each of those column names. The order they’re in doesn’t matter; as long as they’re there the records will be properly imported (and if not a message will be returned to that effect). Afterwards I’m populating those string variables that I mentioned in the last paragraph. For each of those variables it looks to see where in the header it can be found, and then grabs the first char in the Address string. Remember, the Address property returns the location for that specific range—in this case the column letter and the row number. Since I don’t need the row number I only grab the first char of the string, which contains the column letter. Now I know exactly where each of the relevant columns for a PersonData record can be found in the spreadsheet. After finding out the final row (same as with the first method above), I can now create the records more dynamically.
Notice that I’m doing most of the same things for this method as with the first one. I’m saving the records individually, and I’m populating the PersonData record into the database in the exact same way. For this method, though, I’m utilizing the dynamic setting of the column string variables to assign the value from specific cells in each row to their corresponding property on the object. The order of the columns no longer matters, as the assignment is no longer tied to hard-coded values. The last thing that I want you to notice is what’s populated in the catch block there. If, for any reason at all, the record cannot be saved (like if, for example, someone put a non-decimal value in the Total Sales column, and it fails to be parsed), then a message is constructed, including the exception thrown, to be passed on later to the user (however way you want it to be done). Now there will be no more confusion as to what worked or what didn’t, and why they didn’t. Now let’s take a look at the rest of this Import process.
After the entire process has completed, the resultMessages collection gets populated with the total number of successes at the first position in the list—so that the user can immediately see how many were imported. Finally, there is the catch block mentioned earlier that catches the IOException thrown if the file is still open. In either case, the messages are then returned to the user to do with as they please. That’s all there is to it.
That’s how I’ve implemented a basic Import process. It is not, by any means, a finished product; you will, undoubtedly, find a number of areas that can be improved upon. Perhaps it would be nice to have a screen to allow the users to map Excel columns to the database columns? Or maybe use reflection to make this more of a base class for use with multiple data types? All of these are good ideas, and more. I just wanted to show you how to get started on your way.
I started this blog post series with the intent on showing you how to easily implement Import and Export functionality using this tool. I hope that, over the course of these posts that that point has been driven home. But I’m not quite done yet. In the middle of writing this blog post series I noticed that EPPlus recently (in April 2014) released a Beta of version 4.0 with additional tweaks and features. In my next blog post, I will delve a little bit more into these features, and show you which ones will affect your Import/Export functions. Until then, I wish you good coding.