[If you're looking for a step-by-step explanation of how to extract data from PDF to Excel using VBA, please refer to this post. In there, I explain how you can use VBA to implement the 3 PDF to Excel conversion methods I discuss below, and provide macro code examples.]
[Some of the links in this Excel Tutorial are affiliate links, which means that if you choose to make a purchase, I will earn a commission. This commission comes at no additional cost to you.]
As explained by John Walkenbach in the Excel 2016 Bible:
Before you can do anything with data, you must get it into a worksheet.
The most recent versions of Microsoft Office have several features that allow you to, among others, import data into Excel from the following sources:
- The most common text file formats, such as comma-separated values (.csv) and text (.txt) files.
- A Microsoft Access database.
- A webpage.
In some of these cases, the tools you require to do the job are readily available in Excel's Get External Data group of commands within the Data tab.
However, the Get External Data group of commands isn't the topic of this blog post.
When the particular file format you're using isn't supported by Excel, importing data may be slightly more complicated.
As a general matter, I have no problems with these limitations. You probably won't encounter such problems very frequently as long as you don't work with obscure file formats that aren't supported by Excel. There is, however, one big exception to this rule.
This is a particular file format whose data is not that easy to bring into Excel despite being very popular and widely used:
Portable Document Format (.pdf) files are one of the most widely used file formats for electronic documents. makeuseof.com states that PDF is “the single most popular document formats outside of Office”.
If you're anything like me, you probably encounter PDF files at work (almost) every day. In some of those cases, you may need to analyze the data within a particular PDF file with Excel.
If you're in such a situation, you may ask:
How can you convert a PDF file into an Excel worksheet?
You may have also noticed that successfully converting a PDF file into an Excel worksheet is hard. Doing this requires knowledge of both Excel and PDF.
At the same time, the ability to accurately and quickly convert PDF files to Excel is very valuable. According to the Udemy blog post I link to above:
Once you understand the process of converting PDF to Excel and have learned more about what type of data analysis you can do with Excel, you will likely start to see all kinds of possibilities, both personally and professionally.
My purpose with this blog post is to help you easily convert PDF files to Excel worksheets. Among other things, I explain 3 different methods you can use to convert a PDF file to Excel and some criteria you can use to determine which method to use. The methods are organized from the simplest (which also returns the least precise results) to the most advanced (usually providing the most accurate conversions).
If you're interested in the opposite process (converting Excel files to PDF), I provide a thorough explanation of the topic plus 10 examples of VBA code in this blog post.
Before I explain each of the methods that you can use to convert a PDF file to Excel, let's start by taking a look at:
What Are PDF Files
The acronym PDF stands for Portable Document Format.
In very broad terms the PDF format is a digital format that you can use to represent electronic documents. One of the main appeals of the PDF file format is that the document representation is independent of any of the following:
- Software.
- Hardware.
- Operating System.
The reason for this is that the PDF file itself carries the complete description of the document layout and all the information that is necessary to correctly display the electronic document. Therefore, PDF documents allow you (and other users) to:
- Exchange/Share/View documents;
- In an easy/reliable/accurate way;
- Independent of the environment/software/hardware used to create/view/print the document;
- While preserving the content/formatting/appearance of the document.
In other words: When you use the PDF format to represent a document, the formatting is preserved regardless of the software, hardware or operating system used when the file is opened later.
A further advantage of the PDF file format is that PDF files are compact. Prepressure explains how PDF keeps file sizes to “an absolute minimum” by using:
- Sophisticated compression algorithms; and
- A “clever” file structure.
Considering the above, it isn't that difficult to see why the PDF file format is so widely used. As explained by Microsoft, the PDF format:
- Retains the intended document formatting; and
- Enables sharing.
This explains why my cheat sheet with keyboard shortcuts for Excel (which you can get here) is saved (and shared) as a PDF file. The PDF format enables me to (i) set a particular formatting for the document, and (ii) share it with you. Later, once you open the document, you'll see the list of keyboard shortcuts in the format that I originally intended.
Microsoft (in the webpage I link to above) does mention an additional important characteristic of PDF files that, in the end, is what gives rise to the topic of this blog post:
Data within a PDF file can't be easily changed.
Depending on your perspective, you may consider this to be an advantage or a disadvantage. More precisely:
- If your main purpose is to prevent (or at least make difficult) the modification of a particular document, you may be happy that the data within a PDF file can't be easily changed.
- If you need to work with, and manipulate, the data within a PDF file, you're probably annoyed by how difficult it is to edit a PDF document.
Most Excel users, you and me included, find ourselves in the second camp most of the time. We need to work with the data within the PDF file. Therefore, we usually want to have the ability to convert a PDF file to Excel.
I assume that you also want to have the ability to convert PDF files to Excel, so let's take a look at some of the most popular methods to bring data from PDF files into Excel:
Method #1 To Convert PDF Files To Excel: Copy And Paste
The most basic method of bringing data from a PDF file into Excel is to simply copy and paste.
As explained by Excel authority John Walkenbach in the Excel 2016 Bible, you have a good chance of being able to paste data into an Excel workbook if you're able to copy the data from another application. Since some (but not all) PDF files allow you to copy data, there are cases in which you may be able to bring all the data you require into Excel by using the basic commands of copy and paste.
Let's take a look at a practical example of how you can copy and paste data from certain PDF files into Excel:
The following screenshot shows a table within a PDF document. More precisely, you can find this table on page 22 of the Working Paper from the European Central Bank titled Inflation forecasts: Are market-based and survey-based measures informative? by Magdalena Grothe and Aidan Meyler.
Throughout this blog post, I show the results obtained when applying each of the different methods to convert this table from PDF to Excel. This Convert PDF to Excel Tutorial is accompanied by an Excel workbook containing these results. You can get immediate free access to this example workbook by subscribing to the Power Spreadsheets Newsletter.
Step #1: Select And Copy The Data
The first step to copy data from a PDF file is to, simply, select the relevant data and copy it. You can generally use the “Ctrl + C” keyboard shortcut for purposes of copying data.
Step #2: Paste The Data Into Excel
Once you've copied the relevant data from the PDF file, and this is available on the Clipboard, you need to go to Excel and paste it.
Even though this sounds easy, in practice it doesn't work that smoothly. In fact, this step highlights some of the main limitations of this method of converting PDF files to Excel.
In the Excel 2016 Bible, John Walkenbach suggests using the Paste Special command and trying some of the different options that appear. You can access the Paste Special dialog box by:
- #1: Clicking on the drop-down section of the Paste split button in the Home tab of the Ribbon; and
- #2: Selecting “Paste Special…”.
You can also open the Paste Special dialog box by using the keyboard shortcut “Ctrl + Alt + V”.
In the Paste Special dialog box, you can choose from several options. The following image shows what the Paste Special dialog box looks like when I paste the data from the PDF table that appears above:
For this example, I select the option to paste as Text and click on the OK button on the lower-right corner of the dialog box.
The following screenshot shows the pasted data in the Excel worksheet:
In most cases this isn't precisely the result you want. However, you'll rarely be able to get better results when using this method. As explained by Excel authorities Bill Jelen (Mr. Excel) and Szilvia Juhasz in XL: The 40 Greatest Excel Tips of All Time:
If you open the PDF in Acrobat Reader, copy the data, and paste to Excel, it will unwind into a single column.
This is precisely what happens in the example above. And you can generally expect this to happen whenever following this method.
As a consequence of the above, you'll usually need to complete the process with the following…
Step #3: Clean Up The Data
Even though pasted data in Excel generally requires some cleanup, you have a variety of tools you can use to make the cleanup easier, faster and more precise.
Since this blog post isn't about data cleanup, I don't go into any specific methods. However, some of the tools and features that you may find helpful (depending on the particular situation) are the following:
- Text functions and formulas.
- The Remove Duplicates command.
- Get & Transform / Power Query.
- The Text to Columns command.
- Flash Fill.
I may cover some of these topics in future tutorials within Power Spreadsheets.
Overall, this first method of converting a PDF file to Excel leaves some things to be desired. The following are, in my opinion, its 2 biggest drawbacks:
- Limitation #1: The method only works when you're able to copy the data from the PDF file. As you may have experienced, there are times when you are not able to copy data from a PDF file.
- Limitation #2: Any data that you paste into Excel using this method generally unwinds into a single column and requires cleanup.
In other words, you may still have to do a substantial amount of work in order to get the data from the PDF file into a form that is ready for analysis.
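To give an idea of what that cleanup involves, here is a minimal Python sketch of one common task: regrouping a single pasted column back into rows. It assumes that every cell of the original table landed on its own line, in reading order, and that you know how many columns the table had. The function name `reshape_column` and the sample values are made up for illustration; they aren't taken from the European Central Bank table.

```python
from typing import List


def reshape_column(values: List[str], columns: int) -> List[List[str]]:
    """Group a flat list of pasted cells into rows of `columns` cells each."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Drop blank lines and surrounding whitespace left over from the paste.
    cells = [v.strip() for v in values if v.strip()]
    if len(cells) % columns != 0:
        raise ValueError(
            f"{len(cells)} cells cannot be split evenly into rows of {columns}"
        )
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


# Hypothetical pasted column: 3 header cells followed by 2 data rows.
pasted = [
    "Horizon", "Survey", "Market",
    "1 year", "0.1", "0.3",
    "2 years", "0.2", "0.4",
]

for row in reshape_column(pasted, 3):
    print("\t".join(row))
```

If the cell count isn't a multiple of the column count, the sketch raises an error instead of guessing. In practice this usually means that a cell was empty in the PDF or that a multi-word label was split across lines, and you need to fix the pasted data by hand first.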
There are some situations in which you may not have any other option for purposes of converting a PDF file to Excel. This is the case if, for example, you don't have access to any of the tools that are required to apply the other methods that I explain below.
In any case, as long as you have access to a recent version of Microsoft Word or to Word Online, the following method may help you achieve better results when converting a PDF file to Excel.
Method #2 To Convert PDF Files To Excel: Use Microsoft Word
In order to make use of this method, you need access to one of the following:
- One of the most recent versions of Microsoft Word (2013 or later).
- Microsoft Word Online and OneDrive.
In broad terms, the logic behind converting a PDF file to Excel is the same regardless of which of the above versions of Word you use. You, basically, follow these 2 simple steps:
- Step #1: Open the relevant PDF file using Microsoft Word.
- Step #2: Copy the relevant content from the Microsoft Word file and paste it into Excel.
However, let's take a more detailed look at each of these methods to convert a PDF File to Excel using Word. In both cases, I use the same sample table as above, which you can find on page 22 of the Working Paper from the European Central Bank titled Inflation forecasts: Are market-based and survey-based measures informative?.
Convert PDF File To Excel Using A Recent Version Of Microsoft Word
Let's start by taking a look at how you can use a recent version of Microsoft Word to convert a PDF file to Excel:
Step #1: Open The PDF File
You can open the PDF file you want to convert using any of several methods, including the following 2:
- Method #1: In Windows File Explorer: (i) Right-click on the PDF file to expand the right-click menu, (ii) select “Open with”, and (iii) click on Word. In the case of the screenshot below, I open the file using Word 2016 on Windows 10.
- Method #2: Follow these 3 easy steps:
Step #1: Within Word, click on the File tab of the Ribbon to get to the Backstage View.
Step #2: Select Open from the pane on the left side of the screen and click on Browse.
Step #3: Once Word displays the Open dialog box, (i) navigate to the folder where the PDF file is stored, (ii) select it, and (iii) click on the Open button on the lower right corner of the dialog.
If you prefer using keyboard shortcuts to get to the Open dialog box, you can replace steps #1 and #2 above with a keyboard shortcut such as “Ctrl + F12” or “Alt + F + O + O”.
After you've asked Word to open the file, a dialog box (such as the one below) is displayed. This dialog box informs you about the following:
- The PDF file will be converted to an editable Word document.
This is perhaps the main key for the whole process of converting a PDF file to Excel using this method.
- The conversion may take a while.
This depends on different factors, such as the size of the file you're converting and the amount of graphics within the file.
- The resulting Word document is optimized to allow text editing. As a consequence of this, the converted Word file will likely look different from the source PDF. This is very likely, in particular, if the file you're converting has many graphics.
When Word displays this dialog box, click on the OK button.
As mentioned above, the conversion may take a while, so you may have to wait a little before proceeding to the next step.
Step #2: If Necessary, Enable Editing Of The File
Depending on the source of the PDF file, Word may open it in Protected View.
To exit Protected View, follow these 2 easy steps:
- Step #1: Click on the Enable Editing button that appears on the Message Bar.
- Step #2: Word usually displays (as in step #1 above) a message box informing you that Word will convert the PDF file to an editable Word document.
When this dialog box appears, simply click the OK button again and wait until Word completes the conversion process.
Step #3: Copy The Relevant Sections Of The Editable Word Document
Once you've completed step #1 and (if necessary) step #2 above, Word displays the original PDF file as an editable Word document.
When you have the editable Word document, select the section that you want to take into Excel. In the example below, I select the same table as in the previous method:
Once you've selected the relevant information in the editable Word document, copy it. For these purposes, you can use any of the following methods:
- Method #1: Press the right button of the mouse and select “Copy” from the contextual menu.
- Method #2: Click on the Copy button in the Home tab of the Ribbon.
- Method #3: Use a keyboard shortcut such as “Ctrl + C”.
Step #4: Paste In Excel
By now, you have the information you need in an editable format. Therefore, you can go to Excel and paste it by using, among others, any of the following methods:
- Method #1: Press the Paste button in the Home tab of the Ribbon.
- Method #2: Use the “Ctrl + V” keyboard shortcut.
In the example above, the resulting table (once pasted in Excel) looks as follows:
In some cases, including this example, the conversion from PDF to Excel isn't perfect.
Notice, for example, how Word hasn't been able to convert the values in the first section of the table (Mean error) to an editable form. In this case, those values are pasted as an image. Therefore, you must use another method (including those described in this Excel tutorial) to bring them into Excel before being able to work with them.
You won't encounter these shortcomings always. In some cases, this method works just fine.
However, as explained by Microsoft, the feature Word uses to convert PDF files to Word documents (called PDF Reflow) “works best with files that are mostly text” and doesn't handle elements such as tables with cell spacing very well.
If you encounter problems while using Word to convert a PDF file to Excel, you can always try one of the other methods I describe in this blog post.
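To picture why going through Word tends to preserve the table structure, it helps to look at what travels through the Clipboard. The following Python sketch assumes that the plain-text form of a table copied from Word separates cells with tabs and rows with line breaks. It parses such text into rows and writes them out as comma-separated values that Excel can open. The function names and sample text are hypothetical, and the sketch ignores the richer formats (such as HTML) that Word also places on the Clipboard.

```python
import csv
import io
from typing import List


def parse_clipboard_table(text: str) -> List[List[str]]:
    """Split tab-separated clipboard text into a list of rows of cells."""
    # QUOTE_NONE keeps any quotation marks in the cells as literal characters.
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    # Skip rows that contain nothing but whitespace.
    return [
        [cell.strip() for cell in row]
        for row in reader
        if any(cell.strip() for cell in row)
    ]


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize rows as CSV text (the csv module ends each row with CRLF)."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical clipboard content for a 2-column table.
clipboard_text = "Horizon\tError\r\n1 year\t0.1\r\n2 years\t0.2\r\n"
rows = parse_clipboard_table(clipboard_text)
print(rows_to_csv(rows))
```

Values that Word converted to an image (as with the Mean error section in the example above) never reach the text on the Clipboard, so a script like this can't recover them either.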
Convert PDF File To Excel Using Microsoft Word Online
Even though the process for converting a PDF file to Excel using Word Online is substantially similar to the one I followed when using the desktop version of Word, there are a few small differences. Let's take a look at the 5 easy steps you can use to convert a PDF file to Excel using Word Online.
Step #1: Upload The PDF File To OneDrive
Go to OneDrive and upload the PDF file you want to convert to Excel using either of the following methods:
- Method #1: Dragging the relevant file to the OneDrive window in your browser.
- Method #2: Saving the PDF file in the appropriate folder using the OneDrive app for your computer.
Step #2: Open The PDF File Using Word Online
To open the PDF file using Word Online, go to the OneDrive window in your browser, right-click on the file and select “Open in Word Online” from the contextual menu.
Step #3: Make The PDF File Editable
Once Word Online has opened the PDF file, convert it into an editable document by clicking on the Edit in Word button in the upper part of the screen.
Word Online displays a dialog box informing you that it will make a copy of the PDF file and convert it into an editable Word document. Confirm by clicking on the Convert button on the lower part of the dialog box.
Once Word Online has finished the conversion, it displays another dialog box informing you that changes in the layout of the PDF file may have occurred. Click on the Edit button on the lower right corner of the dialog box.
Step #4: Copy The Section Of The Document You Want To Take To Excel
Once Word Online has converted the PDF file to an editable document, the screen looks roughly as follows. The actual document will (most likely) be different in your case.
Go to the section of the editable document that you want to bring into Excel, and select it. In the screenshot shown below, I select the same table I use for the previous examples within this Excel tutorial:
Once you've selected what you want, copy it by using the keyboard shortcut “Ctrl + C”.
If you have problems when trying to copy from Word Online, you can use the desktop version of Microsoft Word on your computer to continue with the process. To do this, click on the Open In Word button that appears to the right of the Ribbon tabs.
If you choose to open the editable file in Microsoft Word, the desktop version of Word is launched. You can then follow the steps I describe in the previous section to copy the relevant data.
Step #5: Paste The Data In Excel
Once you've copied the data, go back to Excel and paste it by using (among others) one of the following methods:
- Method #1: Click on the Paste button.
- Method #2: Use the “Ctrl + V” keyboard shortcut.
The following screenshot shows the resulting table in Excel (after I've adjusted the column width):
Not surprisingly, the results are almost identical to those obtained by opening the PDF file with a recent version of Microsoft Word (explained above).
Just as when opening the PDF file with Microsoft Word, you'll notice that the results aren't always perfect (although in some cases they will be). In the example displayed above, for instance, all the Mean errors (first section of the table) are pasted as an image.
In order to be able to manipulate those values in Excel, you'll need to bring them in using another method (such as the other ones explained in this blog post) or type them directly in Excel.
Method #3 To Convert PDF Files To Excel: Use A PDF Converter
If you have to constantly convert PDF files to Excel or want to avoid the shortcomings of the other 2 methods described above, it may be a good idea to use a PDF converter.
There are several PDF converters on the market. I use Able2Extract.
Even if you don't end up using Able2Extract, you may want to steer clear of online PDF conversion services when converting sensitive or confidential PDF files. As explained at makeuseof:
(…) if you have to convert anything of a sensitive nature, be that personal or business related, you really shouldn’t upload anything to a third-party site.
Additionally, the results obtained with online converters aren't always satisfactory.
As a consequence of the above, I show you how to convert a PDF file to Excel using Able2Extract. As when explaining the other methods above, I use the table within the European Central Bank Working Paper titled Inflation forecasts: Are market-based and survey-based measures informative? as an example.
Let's take a look at how you can convert a PDF to Excel in 6 easy steps when using Able2Extract:
Step #1: Display The Open Dialog Box
In order to get Able2Extract to display the Open dialog box, click on the Open button on the top left corner of the screen or use the “Ctrl + O” keyboard shortcut.
Step #2: Open The File You Want To Convert
Once Able2Extract displays the Open dialog box, use it to browse to the folder where the file you want to convert is located.
Once you've located the PDF file to be converted, select it and click the Open button on the lower right corner of the Open dialog box.
Step #3: Select The Data You Want To Convert
Able2Extract opens the PDF file you want to convert. It also explains how you can select the data you want to convert.
As explained by Able2Extract, you can select data using any of the following methods:
Method #1 To Select Data With Able2Extract
Click on the Select All icon on the toolbar.
Method #2 To Select Data With Able2Extract
Go to the Edit menu and select any of the following options or use the appropriate keyboard shortcut:
Let's take a look at each of these options separately:
Option #1: Select Page Range…
Select Page Range (keyboard shortcut “Ctrl + R”) allows you to select a particular range of pages, without actually selecting all of the content of the PDF file.
Able2Extract displays the Select Page Range dialog box after you've clicked on “Select Page Range…” in the Edit menu or used the “Ctrl + R” keyboard shortcut.
You determine the pages to be converted by typing the relevant range and clicking on the OK button on the lower section of the Select Page Range dialog box. For example, to convert pages 21 and 22, you'd enter “21-22” and click “OK”.
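If it helps to see how a range such as “21-22” maps to individual pages, the following Python sketch expands a range string into a list of page numbers. This is only an illustration of the notation and is not Able2Extract's code. The function name `parse_page_range` is made up, and the support for comma-separated entries (such as “1, 3-5”) is my own assumption, since the example above only uses a single “start-end” range.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a page range string such as "21-22" into [21, 22]."""
    pages: List[int] = []
    # Assumption: several entries may be separated by commas.
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))  # the end page is included
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.append(page)
    return pages


print(parse_page_range("21-22"))
```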
Option #2: Select All Pages.
The Select All Pages option (keyboard shortcut “Ctrl + A”) allows you to select all of the pages of the PDF document.
Option #3: Select All on Page.
Select All on Page (keyboard shortcut “Ctrl + B”) selects all the data in the current page of the PDF file.
Option #4: Select Area.
The option to Select Area (keyboard shortcut “Ctrl + *”) allows you to use the mouse to select a particular section of the PDF file to convert.
For example, I can use this option to select the table on page 22 of the European Central Bank Working Paper that I use as an example throughout this blog post.
Method #3 To Select Data With Able2Extract
The third way of selecting data with Able2Extract is very similar to using the Select Area option in the Edit menu. Simply use the mouse to select the portion of the PDF document that you want to convert.
Step #4: Select Excel As Output File Type
Once you have selected the data you want to convert from PDF to Excel using any of the methods explained above, click on the Excel button on the toolbar or use the keyboard shortcut “Ctrl + E” to select Excel as the output file type for the conversion.
Step #5: Click On Convert
Once you've clicked on the Excel button of the toolbar, Able2Extract gives you 2 options regarding the way in which you want the conversion to occur:
- Option #1: Automatic.
This is the default option, and is also the recommended choice for most PDF to Excel conversions. If you choose this option, Able2Extract determines the positioning of the columns automatically.
To choose Automatic conversion, click on the Convert button that appears on the lower left section of the Convert to Excel dialog box.
- Option #2: Custom.
In the special cases where the Automatic conversion doesn't work properly (for example, the resulting Excel table isn't properly aligned), you can use the Custom conversion option to specify the column structure. This allows you to designate the column structure before Able2Extract carries out the actual conversion into Excel.
To use the Custom conversion option, click on the Define button on the lower middle section of the Convert to Excel dialog.
For this particular example, I choose Automatic conversion. I may explain how to use the Custom conversion option in a future blog post.
Step #6: Save The Excel Spreadsheet
After you click on the Convert button to use Automatic conversion, Able2Extract displays the Save As dialog box.
Use this dialog to select the location and filename of the converted Excel file, and click on the Save button on the lower right corner to confirm your choice. Notice how the Save As dialog box is saving the resulting file as an Excel Spreadsheet.
Once you click on “Save”, Able2Extract converts the selected section(s) of the PDF file into Excel, and launches Excel.
The results I obtain when converting the sample table are shown in the screenshot below. Notice how, among others, Able2Extract was able (i) to replicate the table structure and (ii) extract all of the significant values from the source PDF document.
There's still some cleaning up work to be done. Notice, for example, how negative numbers have been extracted as text (I highlight one such value below).
These are, however, small issues that are relatively easy to fix. As mentioned by Mr. Excel at CFO:
(…) once I had the data in a table in Excel, it is easy enough to fix those issues.
For example, the negative numbers that are stored as text can easily be converted into actual numbers by using the VALUE function and, if necessary, Excel's text functions.
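The same kind of fix can be sketched outside Excel. The following Python function converts number-like text into an actual number. It rests on assumptions the example above doesn't confirm: that the negative values fail to convert because they use a typographic minus sign or an en dash instead of the plain hyphen, carry a trailing minus, or are wrapped in parentheses. The name `to_number` is hypothetical.

```python
import re

# Characters that look like a minus sign but are not the ASCII hyphen-minus.
MINUS_LOOKALIKES = "\u2212\u2013"  # Unicode minus sign, en dash


def to_number(text: str) -> float:
    """Convert number-like text (for example "\u22120.25") to a float."""
    cleaned = text.strip().replace("\u00a0", "")  # drop non-breaking spaces
    for char in MINUS_LOOKALIKES:
        cleaned = cleaned.replace(char, "-")
    parenthesized = re.fullmatch(r"\((.+)\)", cleaned)
    if parenthesized:
        # Accounting style: (0.25) means -0.25.
        cleaned = "-" + parenthesized.group(1)
    elif cleaned.endswith("-"):
        # Trailing minus: 0.25- means -0.25.
        cleaned = "-" + cleaned[:-1]
    # Remove any spaces left between the sign and the digits.
    return float(cleaned.replace(" ", ""))


print(to_number("\u22120.25"))
```

Text that still isn't a number after these substitutions makes `float` raise a `ValueError`, which is a useful signal that the cell needs a manual look.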
If you're interested in using Able2Extract to convert PDF documents to Excel files, you can download it, and get a 7-day free trial, here.
How To Convert PDF Files To Excel: Which Method To Use
In this blog post, you have seen 3 different methods to convert a PDF file to Excel:
- Method #1: Copy and paste.
- Method #2: Use Microsoft Word.
- Method #3: Use a PDF converter.
You may be wondering which of the 3 methods that I explain in this blog post you should use when converting PDF files to Excel.
Each of the 3 different methods has different advantages and disadvantages.
As a general matter, the results obtained when copying and pasting data from a PDF file to Excel (method #1) are (in my opinion) not particularly good. Therefore, in most situations, you're likely to be better off using Microsoft Word (method #2) or a PDF converter (method #3).
To choose between Microsoft Word (method #2) and a PDF converter (method #3) when converting PDF files to Excel, I suggest you consider the specific situation you're in and, particularly:
- The length and complexity of the PDF data you want to convert to Excel.
If you constantly convert lengthy or complex documents from PDF to Excel, you may want to consider using a PDF converter such as Able2Extract.
- How often (or how many times) you need to convert PDF files to Excel.
If you find yourself constantly carrying out the process of converting a PDF file to Excel, a PDF converter (such as Able2Extract) may come in handy.
An additional factor to consider is that, as shown in the examples above, a good PDF converter (like Able2Extract) is less prone to introducing errors in your data. Some of the manual conversion methods may introduce errors in your data, and you will need to carry out a more thorough re-check to confirm the accuracy of the conversion.
The following statement from Microsoft MVP Bill Jelen provides a good summary of these criteria for choosing between Microsoft Word or a PDF converter for your file conversion needs:
If you have a one-page table, the PDF-to-Word-to-Excel solution will work suitably well. If you have a several-page document with many different tables or repeating headers, then going to a third-party solution such as Able2Extract makes sense.
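Whichever method you choose, part of the re-check mentioned above can be automated. The following Python sketch scans a converted table and lists the cells that don't parse as numbers, so you know where to look first. It assumes the table is available as a list of rows (for example, read from a CSV export), with one header row and one column of labels. The function name, the parameters and the sample data are all hypothetical.

```python
from typing import List, Tuple


def find_suspect_cells(
    rows: List[List[str]], header_rows: int = 1, label_columns: int = 1
) -> List[Tuple[int, int, str]]:
    """Return (row, column, value) for data cells that are not valid numbers."""
    suspects = []
    for r, row in enumerate(rows):
        if r < header_rows:
            continue  # skip the header row(s)
        for c, value in enumerate(row):
            if c < label_columns:
                continue  # skip the label column(s)
            try:
                float(value)
            except (TypeError, ValueError):
                suspects.append((r, c, value))
    return suspects


# Made-up converted table with an empty cell and a non-numeric cell.
converted = [
    ["Horizon", "Survey", "Market"],
    ["1 year", "0.1", "n/a"],
    ["2 years", "", "0.4"],
]

for r, c, value in find_suspect_cells(converted):
    print(f"row {r}, column {c}: {value!r}")
```

A check like this only flags cells that aren't numbers at all. It can't tell you whether a number that did convert matches the source PDF, so you still need to compare a sample of the values against the original document.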
This Convert PDF to Excel Tutorial is accompanied by an Excel workbook containing the results I obtain when using each of the methods to convert PDF files into Excel I cover above. You can get immediate free access to this example workbook by subscribing to the Power Spreadsheets Newsletter. This workbook contains 4 different worksheets, each of which shows the results of one of the conversion methods I explain above.
After reading this blog post you have a good knowledge about 3 of the most popular and common methods to convert PDF files to Excel:
- Copy and paste.
- Use Microsoft Word.
- Use a PDF converter, such as Able2Extract.
You've also seen some criteria that can help you decide which method is the right one for you and learned more about the relationship between PDF and Excel files.
This knowledge will help you to convert PDF files to Excel worksheets quickly and easily. You're also likely to start seeing new possibilities and opportunities for analyzing data or carrying out other analyses that you didn't do before because, for example, the source data was stored in PDF format.
Books Referenced In This Excel Tutorial
- Jelen, Bill and Juhasz, Szilvia (2015). MrExcel XL: 40 Greatest Excel Tips of all Time. Uniontown, OH: Holy Macro! Books.
- Walkenbach, John (2015). Microsoft Excel 2016 Bible. Indianapolis, IN: John Wiley & Sons Inc.