033
03.10.2014, 19:03
holm
PDF File Format: Basic Structure
1. Introduction
It is well known that attackers embed shellcode in PDF documents. The shellcode exploits a vulnerability in how the reader parses and renders the document in order to execute malicious code on the targeted system.
The next picture presents the number of vulnerabilities discovered in the popular PDF reader Adobe Acrobat Reader. The number has been increasing over the years, although slightly fewer vulnerabilities have been discovered this year (but the year isn’t over yet). The most important ones are the code execution vulnerabilities, which an attacker can use to execute arbitrary code on the target system (if Acrobat Reader hasn’t been patched yet).
This is a strong indicator that we should update our PDF reader regularly, because the number of recently discovered vulnerabilities is quite daunting.
2. PDF File Structure
Whenever we want to discover new vulnerabilities in software, we should first understand the protocol or file format that the software handles. In our case, that means understanding the PDF file format in detail. In this article we’ll take a look at the PDF file format and its internals.
PDF is a portable document format that can be used to present documents that include text, images, multimedia elements, web page links, etc. The first thing to understand is that the PDF file format specification is publicly available and can be used by anyone interested in the format. The documentation for the file format alone runs to almost 800 pages, so reading through it is not a one-day job.
PDF offers far more than plain text: a document can include images and other multimedia elements, it can be password protected, it can execute JavaScript, etc. The basic structure of a PDF file is presented in the picture below:
Every PDF document has the following elements:
- Header: This is the first line of a PDF file and specifies the version of the PDF specification that the document uses. To find it out, we can use a hex editor or simply use the xxd command as below:
# xxd temp.pdf | head -n 1
0000000: 2550 4446 2d31 2e33 0a25 c4e5 f2e5 eba7  %PDF-1.3.%......
The temp.pdf document uses version 1.3 of the PDF specification. The ‘%’ character starts a comment in PDF, so the first and second lines shown above are both comments. The header comment is present in every PDF document, and the second comment line is expected whenever the file contains binary data, which is the case for most PDFs. The bytes 2550 4446 2d31 2e33 0a25 from the output above correspond to the ASCII text “%PDF-1.3.%”, where the ‘.’ after the version number stands for the newline byte 0x0a. The bytes that follow (c4e5 f2e5 eba7) are non-printable, which xxd shows as ‘.’ dots; they are usually there to tell software that the file contains binary data and shouldn’t be treated as 7-bit ASCII text. At the time of writing, version numbers have the form 1.N, where N is in the range 0-7.
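The header check described above can also be sketched in a few lines of Python. This is only a minimal illustration, not a full parser: the function name parse_pdf_header is made up for this example, the sample bytes are the 16 bytes from the xxd output, and the "at least four bytes of 0x80 or above" test for the second comment line is the usual convention rather than something every reader enforces.

```python
import re

# Matches the version comment at the very start of the file, e.g. b"%PDF-1.3".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def parse_pdf_header(data: bytes):
    """Return (major, minor, has_binary_marker) for the start of a PDF file.

    Hypothetical helper for illustration; it only looks at the first 64 bytes.
    """
    match = HEADER_RE.match(data)
    if match is None:
        raise ValueError("data does not start with a %PDF-x.y header")
    major, minor = int(match.group(1)), int(match.group(2))

    # The second line, when it is a comment, conventionally carries at least
    # four bytes >= 0x80 to signal that the file contains binary data.
    lines = data[:64].splitlines()
    second = lines[1] if len(lines) > 1 else b""
    has_binary_marker = (
        second.startswith(b"%") and sum(b >= 0x80 for b in second[1:]) >= 4
    )
    return major, minor, has_binary_marker


# The 16 bytes shown by xxd for temp.pdf.
sample = bytes.fromhex("255044462d312e330a25c4e5f2e5eba7")
print(parse_pdf_header(sample))
```

For a real file, the same function could be fed the first bytes read in binary mode, for example open("temp.pdf", "rb").read(64).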
Try searching for the string "Creator" in the PDF, something like this:
Source code:
$ strings uha78.brd.pdf| fgrep -i creator
/Creator(FreePDF XP 3.07 - http://shbox.de)
$
or
$ strings P8000_WDC_Emulator.pdf|fgrep -i Creator
<xmp:CreatorTool>FreePDF 4.08 - http://shbox.de</xmp:CreatorTool></rdf:Description>
<rdf:Description rdf:about='uuid:4dc99e0d-f08f-11e2-0000-780f2674801b' xmlns:dc='http://purl.org/dc/elements/1.1/' dc:format='application/pdf'><dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Printing Drucke Schaltplan</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>olivleh1</rdf:li></rdf:Seq></dc:creator></rdf:Description>
/Creator(FreePDF 4.08 - http://shbox.de)
$
or
$ strings Serie-712.pdf|fgrep -i Creator
<xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
<dc:creator>
</dc:creator>
<Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
</Iptc4xmpCore:CreatorContactInfo>
<xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
<dc:creator>
</dc:creator>
<Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
</Iptc4xmpCore:CreatorContactInfo>
<xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
<dc:creator>
</dc:creator>
<Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
</Iptc4xmpCore:CreatorContactInfo>
<xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
<dc:creator>
</dc:creator>
<Iptc4xmpCore:CreatorContactInfo rdf:parseType="Resource">
</Iptc4xmpCore:CreatorContactInfo>
<xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
<dc:creator>
</dc:creator>
$
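If strings and fgrep are not at hand, roughly the same search can be sketched in Python. This is only an approximation of the pipeline above: the function name creator_strings and the sample bytes are made up for the example, runs of at least four printable ASCII characters stand in for what strings prints by default, and, like strings, it only sees metadata that is stored uncompressed in the file.

```python
import re

# Runs of 4 or more printable ASCII characters (plus tab), similar in spirit
# to the default behaviour of the strings utility.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def creator_strings(data: bytes):
    """Return printable runs that contain "creator", ignoring case.

    Hypothetical helper for illustration only.
    """
    return [
        m.group().decode("ascii")
        for m in PRINTABLE_RUN.finditer(data)
        if b"creator" in m.group().lower()
    ]


# Synthetic sample data, not taken from a real file.
sample = (
    b"\x00\x01/Creator(FreePDF XP 3.07 - http://shbox.de)\n"
    b"\x02/Title(example)\n"
    b"<xmp:CreatorTool>FreePDF 4.08 - http://shbox.de</xmp:CreatorTool>\n"
)
for line in creator_strings(sample):
    print(line)
```

For a real document, the bytes would come from something like open("uha78.brd.pdf", "rb").read().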
@Thomas: Evince can page through TIFFs.
DOS/Windows has always suffered from having programs that could do many things but only one of them well. A hodgepodge of program packages piled up on the disks, and you only ever used individual features of each. An old example: PC Tools and Norton. Unix favored the approach that a program should have exactly one function, but be good at it. At the same time, the programs should be chainable. That is why there are so many of them.
Regards,
Holm
--
float R,y=1.5,x,r,A,P,B;int u,h=80,n=80,s;main(c,v)int c;char **v;
{s=(c>1?(h=atoi(v[1])):h)*h/2;for(R=6./h;s%h||(y-=R,x=-2),s;4<(P=B*B)+
(r=A*A)|++u==n&&putchar(*(((--s%h)?(u<n?--u%6:6):7)+"World! \n"))&&
(A=B=P=u=r=0,x+=R/2))A=B*2*A+y,B=P+x-r;}