Watch the video
Full Transcript of video below - lightly edited for clarity
Hello everyone. My name is Ben. Today's talk is about tackling the mighty PDF. So we'll start with a little bit of context about PDF and then we'll get into some code stuff. We’ll talk about the challenges of working with PDFs, some of the challenges that we faced and then I'll demo some things that hopefully make working with PDFs a lot easier.
So I'm co-founder and CTO of Anvil. A little bit of background: I have been a developer for a long time. Previous to Anvil, I worked at GitHub on the Atom editor, and before that I founded a design tool startup.
The PDF kind of needs no introduction, but I want to give like a little bit of context before we dive into the code things. So you've done this before, you've got an email from somebody with a PDF that's asking for all of your most personal information. So you fill it all out and you mail it back, and you aren’t sure exactly where your data is going to end up. I know, not a lot of people like doing this, I sure don't but it's super common.
Most email attachments are PDFs, like most non cat picture email attachments are PDFs, and they're everywhere! If you had to guess how many PDFs were out there, what would you guess? Would you say like a billion? Ha ha. Would that be exaggerating? Turns out that there's two and a half trillion PDFs. That's not necessarily unique PDFs, but it's a lot of PDFs.
There's 20 billion PDFs just in Dropbox.
So there are some really good uses of PDFs aside from forms.
Resumes, ebooks, these talk slides (we submitted them by PDF), academic papers, and one of my favorite uses that I just learned about: fashion specs. So all of these have a couple of things in common. What do they have in common? They are all for consumption and sharing. PDFs are really good for things that you don't need to edit. You're not editing the PDF. You're not filling something out in the PDF.
All you're doing is reading them. So I think we can say that PDFs are really good for screens and for printers, and this makes a lot of sense because this is what PDFs were originally built for: to display information exactly the same across all devices, and also to be shareable.
So that way you can send your underwear PDF to the underwear factory and get them made exactly like you want them to be, because you don't have to worry about whether they have the fonts or not. It'll look exactly the way that you intended.
But the most common way that people have interacted with PDFs, certainly for me, is by filling out PDF forms. PDF forms are really annoying for a couple of reasons. Yes, one is that there's a lot of repetition so you have to fill out your name six times or fill out several PDFs with the exact same information. There's also a lot of PDFs that have logic, so you have to read and understand part of the PDF, and then make decisions.
So this is a 401k tax form. If you want to take money out of your 401k, there are complications, because it depends on your situation. So this is obviously complicated right? It turns out that humans are really bad at this.
They're really bad at filling out PDFs. A professional data entry person's error rate is 4% but I know that my error rate is a lot higher, unless I'm really really paying attention, and we've heard from our customers that their customers make a lot of mistakes.
Yes, so it is really good for screens and for printers, but maybe not so great for humans because filling out these forms is interesting.
Okay. So what would be better than having humans do it? Computers are obviously a lot better than having a human fill it out. Computers are really good at logic and repetition, so it makes sense that they could sort of make these decisions. We can ask the user, "What's your name, your birth date, and your address?" and then, with a couple of pieces of information, fill out all of those check boxes and all the right fields. This is what we want.
We don't want people to be interacting with these forms. How do we abstract away the PDF? We can't get rid of the PDF because there's two and a half trillion of them, and behavior change is really hard. But I think that we can make it so that no human has to deal with PDFs.
All right, so I guess the question is how do we interact with PDFs as developers? Do we go and read the spec? Probably not. I don't think anyone's gonna do that. I certainly haven't read all of the PDF spec or the HTML spec, and I write a fair amount of HTML. It's huge. It's complicated. What do we do? We are going to find a library on the internet. When we were starting Anvil, this is exactly what we did. We went to the internet and we were like, “PDF filling libraries”. What we learned is that developer attention to a technology is important. Some technologies have a lot of attention from developers. Take web technologies.
The web was created sort of for the same reasons as PDF: for display and consumption and that kind of thing, for sharing academic papers. But since then it's had, like, insane developer adoption. There's literally millions of developers building things for the web, on the web. They’re building web libraries, there's people complaining on issues, there's people fixing those bugs. And what all of this collaboration results in is a bunch of really well-made, maintained, high-level libraries. They have good docs, you know, they're updated all the time. So in this world, you're never interacting with the DOM APIs, you're actually interacting with React, Sass, and TypeScript and all of these cool technologies that people have built on top of web tech. So as a developer, I’ve become super accustomed to this kind of thing. I’ve got a problem, I can go to GitHub, I can find a library that does mostly what I want.
But PDF tech isn't really there yet compared to web development. There's very few developers looking at it. So every human has to deal with PDFs, but there aren't so many developers looking at this.
All right, there are libraries though, and they generally fall into two categories: they're complete or incomplete. So the complete libraries can do everything that you would maybe want to do with a PDF. You can read them, you can write them, you can update them. On the incomplete side, they usually just do, like, one tiny thing, and often these incomplete libraries aren't well-maintained and haven’t been touched in five years. You can also run headless Chrome to generate PDFs, but you can't update a PDF with Chrome.
So the complete side seems really good, right? This is the route that we went down. But like the DOM APIs, they're pretty low level. So you have to know a lot about the PDF and how it's put together to read, edit, and create them. So say you want to do something seemingly simple, like creating an invoice. That can lead to a lot of overhead to set the whole thing up to create this invoice.
Okay. So now I'm kind of into the code part of things. If we are working with low-level APIs and we have to know about these PDF details, what are some of the challenges? These are some of the challenges that we faced when we were building out our PDF service. So number one is layout. If you want to create a new PDF or you want to add something to a PDF, you have to deal with layout. Everything in a PDF is absolutely positioned with fixed sizing, and there's no text wrapping. So there's not a lot of niceties for a content creator. The logic that would wrap things and whatever is pushed up to the application layer. So coming from HTML this might be a little weird. HTML is declarative. So it's like “here's what we want to display” and the browser will make decisions based on, well, how wide is this screen? Let me shrink it down and wrap stuff. You can also give the browser guidelines with CSS by saying “OK, I want to use flexbox and I want this column to be this wide”, but the browser might ignore you. The browser is making these decisions on how to display things.
So the PDF is the opposite of that, where you are specifying exactly how to draw a document. There's a bunch of little drawing commands in here saying, okay, I want to start text, I want to pick this font, I want to move the cursor here, and then I'm going to paint the text from that cursor position. So HTML will automatically wrap stuff, but if we want to do that, we have to split up the lines manually. So here's an example where we did that: we measure the bounding box, and then we measure each character, and then we figure out how many words will fit into that, and we break it into lines, and then we paint all the lines ourselves. So this is a challenge for us. There are some libraries you can put on top of the low-level libraries, but it was a challenge.
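The manual wrapping described above can be sketched roughly like this. It is not the code from the slide, just a minimal illustration: `wrap_text`, `escape_pdf_string`, and `text_block_operators` are hypothetical helper names, and the fixed 6-points-per-character `measure` function is a stand-in assumption for reading real glyph widths out of the font. The emitted operators (`BT`, `Tf`, `TL`, `Td`, `T*`, `Tj`, `ET`) are the standard PDF text operators for "start text, pick a font, move the cursor, paint the text".

```python
from typing import Callable, List


def wrap_text(text: str, max_width: float,
              measure: Callable[[str], float]) -> List[str]:
    """Greedily break text into lines that measure no wider than max_width."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if measure(candidate) <= max_width or not current:
            # The word fits, or it is alone on the line and cannot be split.
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def escape_pdf_string(s: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_block_operators(lines: List[str], x: float, y: float,
                         font_name: str, font_size: float,
                         leading: float) -> List[str]:
    """Emit content-stream operators that paint each line below the last."""
    ops = ["BT", f"/{font_name} {font_size} Tf", f"{leading} TL", f"{x} {y} Td"]
    for i, line in enumerate(lines):
        if i > 0:
            ops.append("T*")  # move down one line by the leading set with TL
        ops.append(f"({escape_pdf_string(line)}) Tj")
    ops.append("ET")
    return ops


# Assumption: every character is 6 points wide. Real code would sum the
# widths of the actual glyphs in the chosen font at the chosen size.
def fixed_width(s: str) -> float:
    return len(s) * 6.0


wrapped = wrap_text("the quick brown fox jumps", 60, fixed_width)
for op in text_block_operators(wrapped, 72, 720, "F1", 12, 14):
    print(op)
```

A word that is wider than the box on its own is left on a line by itself here; a real layout layer would have to decide whether to hyphenate, shrink the font, or overflow.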
The second thing I want to talk about is fonts. It seems like it should be really simple, but with PDFs it's not, and I think one of the reasons is because PDFs are supposed to be self-contained and they weren't really built to be edited. So there's no fallback fonts. The font glyphs, like the specific characters to render in that font, are embedded inside of the PDF. You can't mix fonts in a text block. You can't say, "Here's a sentence, bold this word." You have to draw those things independently. Going back to HTML, this is a lot simpler, right? You just specify a whole bunch of fonts and the browser will figure it out. And if we use a character that our font doesn't support, the browser will find a font that does have that character and display it. So it's magic!
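To make the embedding constraint concrete, here is a small sketch of the bookkeeping an application ends up doing. It is an illustration only: `glyphs_to_embed` and `missing_glyphs` are hypothetical names, and the `coverage` table stands in for whatever a real font file reports as its supported characters. Each run of text is tied to exactly one font, and any character the font lacks has to be caught up front because nothing will fall back for it.

```python
from typing import Dict, List, Set, Tuple

Run = Tuple[str, str]  # (font name, text drawn in that font)


def glyphs_to_embed(runs: List[Run]) -> Dict[str, Set[str]]:
    """Collect, per font, the characters whose glyphs must be embedded."""
    needed: Dict[str, Set[str]] = {}
    for font, text in runs:
        needed.setdefault(font, set()).update(text)
    return needed


def missing_glyphs(runs: List[Run],
                   coverage: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the characters each font cannot render (no fallback exists)."""
    missing: Dict[str, Set[str]] = {}
    for font, chars in glyphs_to_embed(runs).items():
        absent = chars - coverage.get(font, set())
        if absent:
            missing[font] = absent
    return missing


# One regular word and one bold word have to be two separate runs.
runs = [("Regular", "Hello"), ("Bold", "World")]
embedded = glyphs_to_embed(runs)

# Assumption: the PDF only contains the glyphs collected above, so later
# text drawn in the same fonts is limited to those characters.
problems = missing_glyphs([("Regular", "hello")], embedded)
print(problems)
```

With only the glyphs for "Hello" embedded, a lowercase "h" in the regular font is reported as missing, which is the situation where the character would simply not display.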
Fonts in a PDF are a little bit more complicated. So there's a lot going on in this slide. First of all, I picked this font from Google called Noto, and it doesn't have the “shi” character. So I can't use it. It won't render in the PDF. The second thing is that I'm drawing the regular and the bold word independently, and on the right there, we're embedding all of the glyphs for those specific words. So from the regular we're getting the H, e, l, and o, and then from the bold we're getting the “World” characters. So if we were to, say, use a lowercase h, then it wouldn't display because the PDF doesn't have that glyph. And so if you're generating a new PDF, you have to be really careful about what characters come in, because they have to be in the font. So that was a challenge.
The next thing is rotations. You look at this and you're like, that seems fine, right? But then you go to print it and it's rotated. Okay, cool. So this happens a lot with scanned documents. They'll scan them upside down and then something in the camera software will set a rotation on the pages to be 180. So it looks to the user like it's not rotated, but under the hood it is.
And so this is hard because if you want to read the coordinates of something, or you want to add something back into the PDF, you need to sort of back out all of these rotations and figure out what exactly is happening so that you can orient your thing that way, so it looks right to the user. So you have to know if there are rotations on the page. There's this thing called a content stream, and it can be rotated. There could be rotations on form fields. So it's a challenge to sort of work out, okay, we're going to add this thing into a content stream. How's the page rotated? How’s the content stream rotated? And we have to orient our coordinate system to manage that. Okay, so that's just a couple of the challenges that we ran into. There's a whole bunch of other things that are hard as well. Dealing with forms can be complicated depending on how you want to interact with them. Signatures are like their own talk. They're crazytown! And then extracting content can also be hard because of the absolute positioning: the last line on a page could actually be first in the file.
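Backing out a page rotation, as discussed above, comes down to a coordinate transform. The sketch below covers only the page-level rotation (the `/Rotate` entry, which the PDF specification defines as a clockwise rotation in multiples of 90 degrees). It assumes the page box starts at the origin and does not handle the additional rotations that can live in a content stream or on a form field. `visual_to_page` is a hypothetical helper name.

```python
from typing import Tuple


def visual_to_page(vx: float, vy: float,
                   page_width: float, page_height: float,
                   rotate: int) -> Tuple[float, float]:
    """Map a point as the viewer shows it to unrotated page coordinates.

    page_width and page_height are the unrotated page size. rotate is the
    page's clockwise display rotation in degrees. Both coordinate systems
    have their origin at the bottom-left corner with y pointing up.
    """
    rotate %= 360
    if rotate == 0:
        return (vx, vy)
    if rotate == 90:
        return (page_width - vy, vx)
    if rotate == 180:
        return (page_width - vx, page_height - vy)
    if rotate == 270:
        return (vy, page_height - vx)
    raise ValueError("rotation must be a multiple of 90 degrees")


# A US Letter page scanned upside down and flagged with a 180 degree rotation:
# something the user sees near the bottom-left actually lives near the
# top-right of the underlying page.
print(visual_to_page(100, 50, 612, 792, 180))
```

To draw something so it looks upright to the user, the drawn content also has to be rotated by the same amount, not just repositioned.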
Alright. So we've established that PDFs are good for screens and printers, and not so great for humans, because it's cumbersome and error prone to fill out PDF forms. It's also pretty hard for developers to set up and maintain something, even for simple use cases. So, like, could we fix this? If a computer fills out a PDF, then the filled PDF is just a consumable. It's no different than, say, an e-book or a resume or something like that, just a piece of information that you can reference later. So then it becomes good for humans. The last piece of this, then, is trying to make it so that developers can fill out all these PDFs so that humans don't have to deal with it. And that's what we've been working on. We recently released this PDF API that does e-signatures, PDF filling, and PDF generation with Markdown right now.
So our goal is to make these things super easy to use.
Okay, now I'll get into the demo.
Okay. I hope everyone can see that. So I'm going to take you through the three different things that we do. First we have PDF generation: what we're going to do is generate a brand new PDF, this invoice. We are able to send a JSON payload to a single endpoint and it will respond with the PDF bytes. We're going to be using our Node client, which is just authentication helpers and then also some helpers for actually calling the endpoints. I'm going to build up our data here. And then all we're going to do is create the client and generate the PDF with our data. So each of these objects is a section on the PDF, and they’re separated by a little bit of padding here. They support Markdown, and then I have some helpers for creating tables, because it's a common use case. So this thing's already generated. I'll show you a little bit. We'll add some more stuff here. We'll add some bold. We'll bold this thing. Let's make this left aligned. Alright, sweet. So then we generate it and it magically works, and we just automatically get Markdown support here, and table support and all that. So that's it for the generation API.
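The shape of a generation payload like the one in the demo might look something like the sketch below. The demo itself used the Node client; this is a language-neutral illustration in Python that only builds the JSON and makes no network call. Every field name here (`title`, `type`, `data`, `label`, `content`, `table`, `firstRowHeaders`, `rows`) is an assumption for illustration and should be checked against the API documentation.

```python
import json
from typing import Any, Dict, List


def text_section(label: str, markdown: str) -> Dict[str, Any]:
    """One section of the document: a label plus Markdown content."""
    return {"label": label, "content": markdown}


def table_section(header: List[str], rows: List[List[str]]) -> Dict[str, Any]:
    """A table section, with the header passed as the first row."""
    return {"table": {"firstRowHeaders": True, "rows": [header] + rows}}


def build_generate_payload(title: str,
                           sections: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble the full payload; each entry in sections becomes a block."""
    return {"title": title, "type": "markdown", "data": sections}


payload = build_generate_payload(
    "Invoice",
    [
        text_section("Bill to", "**Sally Jones**"),
        table_section(["Item", "Amount"], [["Widget", "$10.00"]]),
    ],
)

# This JSON body is what would be sent to the generation endpoint.
print(json.dumps(payload, indent=2))
```

Keeping each section as a plain object means adding bold text or another table is a data change, not a layout change, which is the point of the demo.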
Okay, so moving on to the filling APIs. It's similar to the generate API except that we have a template. So we have an existing template that we have already created, uploaded into our system, and the boxes were found when we uploaded it. And so each one of these boxes has an ID on it, so we are able to send data to each one of these IDs and it will fill out the template. So we are going to generate this W4. Okay, so this thing works really similar to the... first I’ll generate it, how about that.
So we got it. We filled it out. Awesome. It works really similar to the generation endpoint, where we just have a payload of data, and then we create the client and we send the payload with the template ID to the system. It responds with the PDF bytes, we save it to disk, and that's it. So it's pretty simple to add new things. We'll go over here and we'll add a box, and we’ll call this “things”. Now that we've done that, the template supports this new “things” box, so we can go in here, send data to it, and generate it again. Okay, so then it worked.
Cool. So that's the filling piece.
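The fill flow described above (a template ID plus data keyed by each box's ID) can be sketched as follows. This only builds and validates the request data and sends nothing. `build_fill_request`, the `template_id`/`body` keys, and the example field IDs are all hypothetical names chosen for illustration, not the actual API.

```python
from typing import Any, Dict, Set


def build_fill_request(template_id: str, known_field_ids: Set[str],
                       values: Dict[str, Any]) -> Dict[str, Any]:
    """Pair a template ID with field data, rejecting IDs the template lacks."""
    unknown = sorted(set(values) - known_field_ids)
    if unknown:
        raise ValueError(f"template has no fields named: {', '.join(unknown)}")
    return {"template_id": template_id, "body": {"data": dict(values)}}


# Assumption: these are the IDs assigned to the boxes found on upload,
# including the new "things" box added in the demo.
w4_fields = {"name", "address", "things"}

request = build_fill_request(
    "w4-template",
    w4_fields,
    {"name": "Sally Jones", "things": "hello"},
)
print(request)
```

Checking field IDs before sending is a design choice for the sketch; it surfaces a typo in a field name locally instead of as a silently empty box.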
And so the last thing that I want to talk about is the signature packet. This is a little bit more complicated than the last two because there's just a lot more going on. In this example we're going to create a packet for onboarding a new employee. So an employee needs to fill out this W4, and then we're going to have an NDA generated for them. The employee is going to sign both documents and the employer is going to sign one document. So two PDFs, two signers. First thing we do is we set up the email addresses. Now for the purpose of this example, and simplicity, Anvil is going to handle the signing process. It will send out emails to all the signers when it's their turn to sign, but that's totally configurable. You can set up a packet where everything is handled in your application, where it's embedded in your application, and we don’t send your users any emails. Okay. So we set up the emails and the names. This works semi-similar to the other ones, in that it's just one endpoint. This is a GraphQL endpoint. We're going to call a mutation with a bunch of variables to set up a packet, and it's going to respond with some packet information. So in the variables, we set up the files: we want to use our W4 PDF, we want to upload a new NDA, and we're going to draw the boxes. Then we specify the data that we want to go onto each one of these files.
So that's the W4 one. Here's the NDA. And then we set up the signers as well. So it's just, like, who signs which files. Okay, cool. So we are going to go to this other terminal, and then we create the Etch packet.
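The "who signs which files" part of the mutation variables can be pictured with a sketch like this one. It is not the real GraphQL schema: `build_packet_variables`, `files_signed_by`, and every key (`files`, `signers`, `id`, `name`, `email`, `fields`, `fileId`, `fieldId`) are hypothetical, and the email addresses are placeholders. It only shows the structure of two files and two signers and checks that each signature field points at a file that is actually in the packet.

```python
from typing import Any, Dict, List


def build_packet_variables(files: List[Dict[str, Any]],
                           signers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Bundle files and signers, checking every signature targets a real file."""
    file_ids = {f["id"] for f in files}
    for signer in signers:
        for field in signer["fields"]:
            if field["fileId"] not in file_ids:
                raise ValueError(
                    f"{signer['name']} signs unknown file {field['fileId']}")
    return {"files": files, "signers": signers}


def files_signed_by(variables: Dict[str, Any]) -> Dict[str, List[str]]:
    """Summarize which files each signer signs, in signing order."""
    return {
        s["name"]: sorted({f["fileId"] for f in s["fields"]})
        for s in variables["signers"]
    }


variables = build_packet_variables(
    files=[{"id": "w4"}, {"id": "nda"}],
    signers=[
        {"name": "Sally Employee", "email": "sally@example.com",
         "fields": [{"fileId": "w4", "fieldId": "employeeSignature"},
                    {"fileId": "nda", "fieldId": "recipientSignature"}]},
        {"name": "Employer", "email": "employer@example.com",
         "fields": [{"fileId": "nda", "fieldId": "disclosingSignature"}]},
    ],
)
print(files_signed_by(variables))
```

The list order of `signers` stands in for signing order here, matching the idea that each signer is emailed when it is their turn.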
Alright, so we get this little URL here; it's the details URL for our new packet. So you can go on your dashboard, see the status of the packet, and see who's signed and who hasn't and all those things. Also, I should have an email to sign these documents. So Sally is the employee. We're going through, and I'm going to sign.
Accept my signature, start signing, so you can see that all of this data has been filled. So the W4 has been filled with the data that we specified, and the NDA has been filled with the data that we specified. So all we need to do is then sign this.
And then it's waiting for the employer. So I get a new email. This is the employer side of things. And then we can see all of the signatures: we can see Sally's signature here, and then we can see Sally’s signature there. We sign it and it's all done. With that, everyone gets a “completed” email, and we can also see it on the dashboard. Okay, great. That’s all.
Resources
To sign up for our free developer sandbox or learn more about our API, head over to our developer center at www.useanvil.com/developers. There, you will find comprehensive documentation, simple tutorials, and client libraries to help you get started quickly and easily.
If you have questions, please do not hesitate to contact us at: developers@useanvil.com