python - PDF page stored as what in bytes
I'm trying to write a script for work, and I'm having difficulty researching this particular question. I assumed each PDF page was an image, such as a JPG, though from reading the file that doesn't appear to be the case. So my question is: how are the respective PDF pages stored, if not as images?
Here is the code I'm working with:
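    pdf = user_file.file.read()  # raw bytes of the uploaded PDF

    startmark = b"\xff\xd8"  # JPEG start-of-image marker
    startfix = 0
    endmark = b"\xff\xd9"    # JPEG end-of-image marker
    endfix = 2
    i = 0

    njpg = 0
    while True:
        # Look for the next stream keyword at or after offset i.
        istream = pdf.find(b"stream", i)
        if istream < 0:
            break
        # A JPEG stream should begin with the start marker within a few bytes.
        istart = pdf.find(startmark, istream, istream + 20)
        if istart < 0:
            i = istream + 20
            continue
        iend = pdf.find(b"endstream", istart)
        if iend < 0:
            raise Exception("didn't find end of stream!")
        iend = pdf.find(endmark, iend - 20)
        if iend < 0:
            raise Exception("didn't find end of jpg!")

        istart += startfix
        iend += endfix
        print("jpg %d %d %d" % (njpg, istart, iend))

        njpg += 1
        i = iend  # continue after this JPEG so the loop advances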
PDFs are stored as bytes, I believe. I used a library called pypdf when parsing PDFs.
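For what it's worth, a PDF page is generally not an image. It is an object that points to one or more content streams of drawing operators (text, paths, references to embedded images), and those streams are often compressed with Flate (zlib). Below is a minimal sketch of how you could peek at such streams using only the standard library. The `sample` bytes are hand-built for illustration and are not a complete PDF, and `inflate_streams` is a hypothetical helper name. It assumes plain Flate compression with no predictor or encryption, so it will not cope with every real file.

```python
import re
import zlib

# Hypothetical sample data: one stream object whose payload is a
# Flate-compressed run of page drawing operators (not a complete PDF).
operators = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
payload = zlib.compress(operators)
sample = (
    b"4 0 obj\n<< /Length " + str(len(payload)).encode() +
    b" /Filter /FlateDecode >>\nstream\n" + payload +
    b"\nendstream\nendobj\n"
)

# Capture everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed body of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj ignores the end-of-line bytes left before "endstream".
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (for example a raw JPEG stream); skip it.
            continue
    return results


decoded = inflate_streams(sample)
print(decoded[0])  # b'BT /F1 12 Tf 72 720 Td (Hello) Tj ET'
```

Streams that start with `\xff\xd8`, like the ones the script above looks for, are embedded JPEGs and get skipped here. The text and layout of a page live in the other streams.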