Properly Convert PDF to Excel

Jun 03, 2021
Would you like a free tool to extract data from PDF to Excel? There are PDF-to-Excel converters that can perform the task. You can also use copy and paste if you don't have a lot of data. Now, Power Query has a new feature that allows you to import data from a PDF document, but does it really work? Does your data update when the content of the PDF document changes and you refresh? Let's find out. (upbeat music) Let's look at three different examples to see if we can successfully import our data from PDF to Excel.
In the first example, I have this safety report and on the second page, I have the safety performance data. So let's say I get this from the safety department. I need to take this table and put it in my Excel file so I can do more analysis on it. Well, my first attempt would be to just copy and paste this. I'm just going to highlight it, press Control + C to copy, go to Excel and press Control + V. Now, this brings in the information, but it loses all the formatting. So if I had time I could organize this into a table again, but unfortunately I don't have time.
So this is what I'm going to do instead. Go to Data, Get Data, From File, From PDF. I want my safety data, click Import. In the Navigator view, I can select which part of the PDF document I want to import. If it recognizes any tables, it shows them to me as tables. At the bottom, under the tables, you see the content of each page. That may include a table, but along with everything else that is on that page. Notice that for page two, where I have my table, the page view here also shows the page numbers.
So in this case, since I just want the table information, I'll go to the table view and click Transform Data to make sure everything looks good. I have headers here. This one is missing. Let's also give this query a name. Everything else looks good. Let's send this to our Excel sheet. I want it as a table in the existing worksheet, and I get my table in the proper structure. I can update the formatting of this. I can use this to create pivot tables or use formulas to do additional analysis. Now we also want to check if this is dynamic.
So let's say I receive new information. That information is in another PDF document. It has the same name, it's called Safety Data. But this time it also includes information from 2020. So I'm just going to drag and drop it to my drive. And I'm going to replace the file in the destination. Now back to our report. Right click and refresh this to see if we pick up the new information, and our 2020 information appears automatically. So we have created a connection to our PDF document. And as long as the structure of our PDF document does not change, everything will remain dynamic.
In the second example, I have this report from Tesla's website, it's their Q2 2020 financial update. Now let's say I'm interested in taking this table, which is on page four, and I want to bring it into Excel. So here I can see their quarterly financial information. As a first try, let's try to copy this. So Control + C, go to Excel, let's do Control + V and see if it works this time. Unfortunately, it doesn't work. It is going to be very difficult to work with this data. So let's go to Data, Get Data, From File, From PDF. That's my PDF.
Let's import. This time, I get a lot of tables and a lot of pages. That's how many pages I have in my PDF file. Now, whenever you can, work with the table view. This way you avoid all the other information that is on the page. So the table I wanted was on page four. Now it doesn't look so good. The quarterly information is here, but it looks like my first column, the text column, was split into multiple columns. So let's take a look at whether we can easily transform this to get what we want.
It recognized that these are numeric columns, so they are formatted as decimal numbers. These last two are in text format because they have a combination of numbers and text. So if I want, I can clean them up, or get rid of them completely if I don't want them in my report, or I can just keep them as they are. For now I'll leave the rest as is. But what I want to clean up are these columns here. I want them to become a single column. So I'm going to highlight the first column, hold down the Shift key, and highlight column five here.
Now let's just merge these: right click, Merge Columns. My delimiter should be a space. Let's call this "$ in millions except % values". That's similar to what they had in their report, "in millions, except percentages and per share data". Well, I'll leave it at that. Click OK. And now we have one column. When you use Transform to merge columns, you get these extra spaces in your merged column. Now, you won't get that if you use Add Column and merge them there, but that's okay. What I can do is just trim this. Right click, Transform, Trim, and that gets rid of these extra spaces.
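To make the merge-then-trim step concrete outside of Power Query, here is a minimal Python sketch. It is not how Power Query implements the step; the function name `merge_and_trim`, the column names, and the sample rows are all made up for illustration. It shows why joining empty cells with a space leaves stray spaces, and how trimming removes them.

```python
def merge_and_trim(row, columns, delimiter=" "):
    """Join the given columns of one row with a delimiter, then trim both ends."""
    # Empty cells (None) become empty strings, which is what produces
    # the extra spaces in the merged text before trimming.
    parts = ["" if row.get(col) is None else str(row[col]) for col in columns]
    merged = delimiter.join(parts)
    return merged.strip()


# Hypothetical rows where one label was split across five columns.
columns = ["Column1", "Column2", "Column3", "Column4", "Column5"]
rows = [
    {"Column1": "Total", "Column2": "revenues", "Column3": None, "Column4": None, "Column5": None},
    {"Column1": None, "Column2": "Gross", "Column3": "profit", "Column4": None, "Column5": None},
]

merged = [merge_and_trim(row, columns) for row in rows]
print(merged)
```

Like the Trim command, `strip()` only removes spaces at the start and end of the text, so spaces between words stay in place.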
What I'm also going to do is get rid of these blanks. So let's go ahead and filter these out. Now, here's something interesting. Notice the negative numbers here. Let's take a look at our PDF report. Negative numbers are in parentheses. So Power Query recognized them as negative numbers instead of bringing them in as text, which is great because it saves us a lot of extra steps. My report is done. Let's give this query a name and send it to the workbook. Close and Load. And we have successfully brought the table from the PDF document into our Excel file.
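If you ever have to handle parenthesized negatives yourself, for example after a plain copy and paste, the conversion can be sketched in a few lines of Python. This is only an illustration under assumed inputs, not a description of what Power Query does internally; the function name `parse_accounting_number` and the sample strings are hypothetical.

```python
def parse_accounting_number(text):
    """Convert text like '1,234' or '(1,234)' to a float; return None if not numeric."""
    cleaned = text.strip().replace(",", "")
    # Accounting style: a value wrapped in parentheses is negative.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    try:
        value = float(cleaned)
    except ValueError:
        return None  # leave non-numeric cells for manual cleanup
    return -value if negative else value


samples = ["6,036", "(408)", "n/a"]
print([parse_accounting_number(s) for s in samples])
```

Returning `None` for text that cannot be parsed keeps mixed columns, like the two text columns mentioned earlier, from raising errors.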
In the third example, I want to do something different. I want to take the table of contents from a manual. Now, this manual is over 200 pages long and the table of contents starts on page three and goes to page five. This is the part of the manual that I want to take and put into Excel. So let's see how we can do that. Let's go to Excel, go back to the Data tab, Get Data, From File, From PDF. That's my manual, let's import it. Now, the larger the PDF file, the longer it will take to parse it, but check this out.
Even though it's over 200 pages and has a lot of tables, it doesn't take that long. It has already worked out where the tables are, and below we can see the content of all the pages. My table of contents started on page three, and that is recognized as a table. Then page four and page five have the rest. These are the three I want to import. Now I have the option to select multiple items, but when I do it this way, a separate query will be created for each of them. And then I can append them. Now, instead of doing it this way, I'm going to select the folder.
Right click and Transform Data. This way I can append with one click. What I want are tables. So this was what I wanted, then this and this. The kind for these is Table. So let's add a filter and take a look at our tables. Now I have my table of contents here. It's in the first three rows. So let's use Keep Top Rows, put a three here and click OK. Now, when I click here next to each one, I can see its content. What I no longer need are these three.
I'm going to highlight and delete them, and now expand my data. Let's click OK. And I get my table of contents appended. Now I can clean this the way I want. Let's say I don't need these null values here where I have the section number and the section name. I can filter them out. What I can also do is create a separate column for my numbering here. The delimiter to use for this can be this one here. But notice that it appears several times for some of the topics. This means I have to be careful not to split at every occurrence of this delimiter.
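The steps so far in this example (keep the first three tables, append them, filter out the nulls, and split the numbering off only once) can be sketched in Python. This is a rough illustration, not the Power Query steps themselves. The video does not say which delimiter the manual uses, so the `" - "` delimiter, the function names, and the sample rows are all assumptions.

```python
def append_tables(tables, keep_first=3):
    """Keep the first `keep_first` tables and stack their rows into one list."""
    combined = []
    for table in tables[:keep_first]:
        combined.extend(table)
    return combined


def drop_null_rows(rows):
    """Remove rows whose first field is empty (None)."""
    return [row for row in rows if row[0] is not None]


def split_leftmost(text, delimiter=" - "):
    """Split only at the first occurrence of the delimiter (hypothetical ' - ')."""
    left, found, right = text.partition(delimiter)
    return (left, right) if found else (text, None)


# Made-up stand-ins for the tables found on pages three to five, plus one extra.
page3 = [("1 - Introduction", 7), (None, None)]
page4 = [("2 - Setup - Windows", 12)]
page5 = [("3 - Troubleshooting", 40)]
other = [("not part of the contents", 0)]

contents = drop_null_rows(append_tables([page3, page4, page5, other]))
result = [split_leftmost(entry) + (page,) for entry, page in contents]
print(result)
```

Because `partition` stops at the first match, a topic such as "Setup - Windows" stays in one piece even though it contains the delimiter again.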
So, let's select the column, then Split Column, By Delimiter. I want a custom delimiter, and I copied it before, so I'm going to paste it. I'm also going to add a space before and after, because that's what I have in each field. So here is the change I need to make. I don't want to split at every occurrence of the delimiter. I want the left-most delimiter, and click OK. Now I have my numbers separately. Let's give this a proper name. This is the topic and these are my page numbers. Let's also update the data types. So this should be a whole number.
This is text and that is a decimal number. Everything else looks good. Let's send this to Excel. I want it as a table, let's put it in A1 and click OK. It's loading everything right here. And I have my table of contents along with the page numbers. These are the different ways you can import data from your PDF documents into Excel. I hope you enjoyed this video. If you did, don't forget to give it a thumbs up. And if you're not subscribed to this channel, consider subscribing before you leave. And I'll see you in the next video. (upbeat music)