Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm trying to decompress a stream from a PDF Object in this file:

 4 0 obj
<< 
/Filter /FlateDecode
/Length 64
>>
stream
x?s
QDw34V02UIS0′0P030PIQDpé?KIUH-.ITH.-*ê··×TéRp
á T‰
ê
endstream
endobj

I copy-pasted this stream, keeping the same format as in the original file, into a file called Stream.file:

x?s
QDw34V02UIS0′0P030PIQDpé?KIUH-.ITH.-*ê··×TéRp
á T‰
ê

This stream should decode to: Donde esta curro??. I added Stream.file to a C# console application:

using System.IO;
using System.IO.Compression;

namespace Filters
{
    public static class FiltersLoader
    {
        public static void Parse()
        {
            var bytes = File.ReadAllBytes("Stream.file");
            var originalFileStream = new MemoryStream(bytes);

            using (var decompressedFileStream = new MemoryStream())
            using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }    
        }
    }
}

However, it throws an exception while trying to copy it:

The archive entry was compressed using an unsupported compression method.

I'd like to know how to decode this stream with .NET code, if that's possible.

Thanks.

1 Answer

The main problem is that the DeflateStream class decodes a naked FLATE compressed stream (as per RFC 1951), whereas the content of PDF streams with the FlateDecode filter is actually presented in the ZLIB Compressed Data Format (as per RFC 1950), which wraps the FLATE compressed data.

To fix this it suffices to drop the two-byte ZLIB header.

A second problem became clear from your first example document: that document is encrypted, so the stream contents therein have to be decrypted before FLATE decoding.

### Drop ZLIB header to get to the FLATE encoded data

As summarized above, DeflateStream expects naked FLATE data (RFC 1951), while a FlateDecode stream in a PDF holds ZLIB data (RFC 1950): a two-byte header, the FLATE compressed data, and a four-byte Adler-32 checksum.

Fortunately, it is pretty easy to jump to the FLATE encoded data therein: one simply has to drop the first two bytes. (Strictly speaking, there might be a four-byte dictionary identifier between them and the FLATE encoded data, signalled by the FDICT bit in the second header byte, but this appears to be seldom used.)

In case of your code:

var bytes = File.ReadAllBytes("Stream.file");
var originalFileStream = new MemoryStream(bytes);

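// Skip the two-byte ZLIB header (CMF and FLG, RFC 1950) so that
// DeflateStream starts reading at the naked FLATE data.
originalFileStream.ReadByte();
originalFileStream.ReadByte();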

using (var decompressedFileStream = new MemoryStream())
using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
{
    decompressionStream.CopyTo(decompressedFileStream);
}   
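If you would rather not skip the two bytes blindly, the header can be checked first. The following is only a sketch under stated assumptions: `ZlibHelper`, `FlateDataOffset` and `Inflate` are hypothetical names, not part of .NET, and the checks follow the header layout in RFC 1950 (compression method 8, header bytes forming a multiple of 31, FDICT in bit 5 of the second byte). Streams with a preset dictionary are rejected, because as far as I know DeflateStream offers no way to supply one.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibHelper
{
    // Returns the offset of the naked FLATE data inside a ZLIB-wrapped buffer.
    public static int FlateDataOffset(byte[] data)
    {
        if (data == null || data.Length < 2)
            throw new InvalidDataException("Too short for a ZLIB header.");

        int cmf = data[0]; // compression method (low nibble) and window info
        int flg = data[1]; // check bits, FDICT (bit 5), compression level

        // CM must be 8 (deflate), and CMF/FLG read as one big-endian
        // 16-bit number must be a multiple of 31.
        if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("Not a ZLIB header.");

        // FDICT set: a four-byte dictionary identifier follows, and the
        // data cannot be decoded without that preset dictionary.
        if ((flg & 0x20) != 0)
            throw new NotSupportedException("Preset dictionaries are not handled.");

        return 2;
    }

    // Checks the header, then decodes the remaining bytes with DeflateStream.
    public static byte[] Inflate(byte[] data)
    {
        int offset = FlateDataOffset(data);
        using (var input = new MemoryStream(data, offset, data.Length - offset))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

With this in place, `ZlibHelper.Inflate(File.ReadAllBytes("Stream.file"))` would return the decoded bytes, provided the file holds the exact bytes of the stream. On .NET 6 and later, `System.IO.Compression.ZLibStream` reads the ZLIB wrapper itself, so `new ZLibStream(originalFileStream, CompressionMode.Decompress)` may be the simpler choice there.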

### In case of encrypted PDFs, decrypt first

Your first example file, pdf-test.pdf, is encrypted, as indicated by the presence of an Encrypt entry in the trailer:

trailer
<</Size 37/Encrypt 38 0 R>>
startxref
116
%%EOF

Before decompressing stream contents, therefore, you have to decrypt them.
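
To notice this situation early, a program could look for that entry before attempting to inflate anything. The sketch below is a deliberately naive heuristic, not a PDF parser: `PdfEncryptionCheck` and `LooksEncrypted` are hypothetical names, and the method merely searches the raw file bytes for the name `/Encrypt`, so it can report a false positive if those bytes happen to occur inside binary stream data. It only detects encryption; the decryption itself is not shown.

```csharp
using System;
using System.Text;

public static class PdfEncryptionCheck
{
    // Naive heuristic: true if the ASCII bytes of "/Encrypt" occur
    // anywhere in the raw PDF file.
    public static bool LooksEncrypted(byte[] pdfBytes)
    {
        byte[] needle = Encoding.ASCII.GetBytes("/Encrypt");

        for (int i = 0; i + needle.Length <= pdfBytes.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && pdfBytes[i + j] == needle[j])
                j++;

            if (j == needle.Length)
                return true; // found the name at position i
        }

        return false;
    }
}
```

A call such as `PdfEncryptionCheck.LooksEncrypted(File.ReadAllBytes("pdf-test.pdf"))` could then be used to warn that the streams need decrypting before FLATE decoding.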

