python - How is a PDF page stored in bytes?


I'm trying to write a script for work, but I'm having difficulty researching one particular question. I assumed each PDF page was stored as an image, such as a JPG, but from reading the file that doesn't seem to be the case. My question is: how are the individual PDF pages stored, if not as images?
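For context, a PDF page is generally not an image. In the file it is a page dictionary (`/Type /Page`) whose `/Contents` entry points to a content stream: a sequence of drawing operators such as `BT /F1 24 Tf 72 720 Td (Hello) Tj ET`, usually compressed with `/FlateDecode` (zlib). Pixel data only appears in separate image XObject streams that the content stream paints with the `Do` operator. The sketch below uses only the standard library and inflates Flate streams so those operators become readable. `iter_streams` is a hypothetical helper name, and the regex is a heuristic: it assumes plain `N G obj` headers (objects packed into PDF 1.5 object streams are missed) and that no `endstream` appears inside binary data.

```python
import re
import zlib

# Heuristic pattern for "N G obj << dictionary >> stream<EOL>". The tempered
# group cannot cross an "endobj", so the captured dictionary belongs to the
# same object as the "stream" keyword. Compressed object streams are not seen.
STREAM_RE = re.compile(rb"\bobj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n", re.DOTALL)


def iter_streams(pdf: bytes):
    """Yield (dictionary, data) for each stream, inflating /FlateDecode data."""
    pos = 0
    while True:
        match = STREAM_RE.search(pdf, pos)
        if match is None:
            return
        end = pdf.find(b"endstream", match.end())
        if end < 0:
            raise ValueError("found 'stream' without a matching 'endstream'")
        dictionary, raw = match.group(1), pdf[match.end():end]
        if b"/FlateDecode" in dictionary:
            # decompressobj puts the trailing end-of-line into .unused_data
            data = zlib.decompressobj().decompress(raw)
        else:
            # Drop the single end-of-line marker that precedes "endstream".
            if raw.endswith(b"\r\n"):
                raw = raw[:-2]
            elif raw.endswith((b"\n", b"\r")):
                raw = raw[:-1]
            data = raw
        yield dictionary, data
        # Resume after "endstream" so binary stream data is never rescanned.
        pos = end + len(b"endstream")
```

Printing the decoded data for a page's `/Contents` stream typically shows text and path operators like the ones above rather than image bytes.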

Here is the code I'm working with:

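    pdf = user_file.file.read()  # raw bytes of the uploaded PDF

    startmark = b"\xff\xd8"  # JPEG start-of-image marker
    startfix = 0
    endmark = b"\xff\xd9"    # JPEG end-of-image marker
    endfix = 2               # include the 2-byte end marker in the slice
    i = 0
    njpg = 0

    while True:
        # Find the next stream and check for a JPEG header right after it
        istream = pdf.find(b"stream", i)
        if istream < 0:
            break
        istart = pdf.find(startmark, istream, istream + 20)
        if istart < 0:
            i = istream + 20
            continue
        iend = pdf.find(b"endstream", istart)
        if iend < 0:
            raise Exception("Didn't find end of stream!")
        iend = pdf.find(endmark, iend - 20)
        if iend < 0:
            raise Exception("Didn't find end of JPG!")

        istart += startfix
        iend += endfix
        print("JPG %d %d %d" % (njpg, istart, iend))
        njpg += 1
        i = iend  # continue scanning after this image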
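The loop above guesses at JPEGs by looking for the `FF D8` marker within 20 bytes of `stream`. The file itself declares which streams are JPEGs: an image stream whose `/Filter` is `/DCTDecode` holds JPEG data that can usually be written straight to a `.jpg` file. A sketch of checking the stream dictionary instead, assuming a single `/Filter /DCTDecode` entry (filter arrays such as `[/FlateDecode /DCTDecode]` are skipped) and no compressed object streams; `jpeg_streams` is a hypothetical helper name.

```python
import re

# Heuristic stream matcher: "N G obj << dictionary >> stream<EOL>".
STREAM_RE = re.compile(rb"\bobj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n", re.DOTALL)
# A single /DCTDecode filter means the stream data is JPEG-encoded as-is.
DCT_RE = re.compile(rb"/Filter\s*/DCTDecode\b")


def jpeg_streams(pdf: bytes) -> list:
    """Return the raw bytes of every stream whose filter is /DCTDecode."""
    jpegs = []
    pos = 0
    while True:
        match = STREAM_RE.search(pdf, pos)
        if match is None:
            return jpegs
        end = pdf.find(b"endstream", match.end())
        if end < 0:
            raise ValueError("found 'stream' without a matching 'endstream'")
        if DCT_RE.search(match.group(1)):
            data = pdf[match.end():end]
            # Drop the single end-of-line marker that precedes "endstream".
            if data.endswith(b"\r\n"):
                data = data[:-2]
            elif data.endswith((b"\n", b"\r")):
                data = data[:-1]
            jpegs.append(data)
        # Resume after "endstream" so binary image data is never rescanned.
        pos = end + len(b"endstream")
```

Relying on the declared filter avoids both false positives (non-image streams that happen to contain `FF D8`) and the fixed 20-byte window.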

I believe PDFs should be stored as bytes. I have used a library called pypdf when parsing PDFs.
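A PDF is indeed just bytes, but structured ones: after the `%PDF-1.x` header the file is a series of numbered objects (`3 0 obj ... endobj`) tied together by a cross-reference table, and each page is one such object, a dictionary with `/Type /Page` and a `/Contents` reference. A library like pypdf resolves that structure properly through the page tree; the stdlib-only sketch below merely illustrates the layout by listing page objects and the object number of their content stream. `list_pages` is a hypothetical helper, and like any regex scan it misses pages stored inside compressed object streams and `/Contents` given as an array.

```python
import re

# "N G obj << ... >> endobj" for objects whose body is a single dictionary.
OBJ_RE = re.compile(
    rb"\b(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*endobj", re.DOTALL
)
# "/Type /Page" but not "/Type /Pages" (the page-tree node).
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")
# A single indirect reference such as "/Contents 4 0 R".
CONTENTS_RE = re.compile(rb"/Contents\s+(\d+)\s+(\d+)\s+R")


def list_pages(pdf: bytes) -> list:
    """Return (page_object_number, contents_object_number or None) pairs."""
    pages = []
    for match in OBJ_RE.finditer(pdf):
        body = match.group(3)
        if not PAGE_RE.search(body):
            continue
        contents = CONTENTS_RE.search(body)
        pages.append((int(match.group(1)), int(contents.group(1)) if contents else None))
    return pages
```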

