I would like to read in a JPEG-Header and analyze it.
According to Wikipedia, the header consists of a sequence of markers. Each marker starts with FF xx, where xx is a specific marker ID.
So my idea was to simply read the image in binary mode and search the byte stream for the corresponding byte combinations. This should let me split the header into its marker fields.
For instance, this is what I get when I read the first 20 bytes of an image:
with open('picture.jpg', 'rb') as f:  # the context manager closes the file
    binary_data = f.read(20)
print(binary_data)
b'\xff\xd8\xff\xe1-\xfcExif\x00\x00MM\x00*\x00\x00\x00\x08'
My questions are now:
1) Why does Python not return nice chunks of 2 bytes (in hex format)? I would expect something like this: b'\xff \xd8 \xff \xe1 \x-' ... and so on. Some blocks delimited by '\x' are much longer than 2 bytes.
2) Why are there symbols like -, M and * in the returned string? Those are not characters I expect in the hex representation of a byte string (only 0-9 and a-f, I think).
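For reference, here is a minimal sketch of how the same 20 bytes look when every byte is printed as two hex digits. It relies on the fact that the repr of a Python bytes object shows printable ASCII bytes as characters and only the remaining bytes as \xNN escapes. The variable names are made up for illustration, and the literal is copied from the output above.

```python
# The 20 bytes from the output above, written as a literal.
data = b'\xff\xd8\xff\xe1-\xfcExif\x00\x00MM\x00*\x00\x00\x00\x08'

# Each element of a bytes object is an int in range(256).
first_byte = data[0]

# Format every byte as two hex digits, separated by spaces.
hex_view = ' '.join(f'{b:02x}' for b in data)

print(len(data))
print(hex_view)
```

In this view '-' appears as 2d, 'M' as 4d and '*' as 2a, so the data itself is unchanged and only the display differs.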
Both observations keep me from writing a simple parser. So ultimately my question boils down to: How do I properly read in and parse a JPEG header in Python?
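One possible approach, given as a rough sketch rather than a complete parser: instead of searching for FF xx pairs, step from marker to marker using the 2-byte big-endian length field that follows each marker. The function name iter_jpeg_segments and the sample bytes are hypothetical. The sketch assumes a well-formed stream in which every marker between the start-of-image marker (FF D8) and the start-of-scan marker (FF DA) carries a length field, and it stops at start-of-scan because compressed image data follows.

```python
import struct


def iter_jpeg_segments(data):
    """Yield (marker_id, payload) for each header segment up to start-of-scan."""
    if data[:2] != b'\xff\xd8':
        raise ValueError('missing SOI marker (FF D8); not a JPEG stream')
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f'expected a marker at offset {pos}')
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker: skip it and look again.
            pos += 1
            continue
        # The length is big-endian and counts its own two bytes,
        # but not the two marker bytes.
        (length,) = struct.unpack('>H', data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        if marker == 0xDA:
            # Start of scan: entropy-coded image data follows.
            break
        pos += 2 + length


# Synthetic sample (not a real image): SOI, an APP1 segment,
# a 2-byte dummy DQT payload, and an empty SOS header.
sample = (b'\xff\xd8'
          b'\xff\xe1\x00\x08Exif\x00\x00'
          b'\xff\xdb\x00\x04\x01\x02'
          b'\xff\xda\x00\x02'
          b'\x12\x34')

segments = list(iter_jpeg_segments(sample))
for marker, payload in segments:
    print(f'FF {marker:02X}: {len(payload)} payload bytes')
```

Applied to the 20 bytes shown earlier, the two bytes after FF E1 are 2D FC, so that APP1 segment would have a length of 0x2DFC = 11772 bytes. Reading only 20 bytes is therefore not enough to reach the next marker.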