.net - PDF Convert to Black And White PNGs

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm trying to compress PDFs using iTextSharp. There are a lot of pages with color images stored as JPEGs (DCTDECODE), so I'm converting them to black-and-white PNGs and replacing them in the document (a black-and-white PNG is much smaller than the equivalent JPEG).

I have the following methods:

    private static bool TryCompressPdfImages(PdfReader reader)
    {
        try
        {
            int n = reader.XrefSize;
            for (int i = 0; i < n; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream())
                {
                    continue;
                }

                var dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
                var subType = (PdfName)PdfReader.GetPdfObject(dict.Get(PdfName.SUBTYPE));
                if (!PdfName.IMAGE.Equals(subType))
                {
                    continue;
                }

                var stream = (PRStream)obj;
                try
                {
                    var image = new PdfImageObject(stream);

                    Image img = image.GetDrawingImage();
                    if (img == null) continue;

                    using (img)
                    {
                        int width = img.Width;
                        int height = img.Height;

                        using (var msImg = new MemoryStream())
                        using (var bw = img.ToBlackAndWhite())
                        {
                            bw.Save(msImg, ImageFormat.Png);
                            msImg.Position = 0;
                            stream.SetData(msImg.ToArray(), false, PdfStream.NO_COMPRESSION);
                            stream.Put(PdfName.TYPE, PdfName.XOBJECT);
                            stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
                            stream.Put(PdfName.FILTER, PdfName.FLATEDECODE);
                            stream.Put(PdfName.WIDTH, new PdfNumber(width));
                            stream.Put(PdfName.HEIGHT, new PdfNumber(height));
                            stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                            stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
                            stream.Put(PdfName.LENGTH, new PdfNumber(msImg.Length));
                        }
                    }
                }
                catch (Exception ex)
                {
                    Trace.TraceError(ex.ToString());
                }
                finally
                {
                    // may or may not help      
                    reader.RemoveUnusedObjects();
                }
            }
            return true;
        }
        catch (Exception ex)
        {
            Trace.TraceError(ex.ToString());
            return false;
        }
    }

    public static Image ToBlackAndWhite(this Image image)
    {
        image = new Bitmap(image);
        using (Graphics gr = Graphics.FromImage(image))
        {
            var grayMatrix = new[]
            {
                new[] {0.299f, 0.299f, 0.299f, 0, 0},
                new[] {0.587f, 0.587f, 0.587f, 0, 0},
                new[] {0.114f, 0.114f, 0.114f, 0, 0},
                new [] {0f, 0, 0, 1, 0},
                new [] {0f, 0, 0, 0, 1}
            };

            var ia = new ImageAttributes();
            ia.SetColorMatrix(new ColorMatrix(grayMatrix));
            ia.SetThreshold((float)0.8); // Change this threshold as needed
            var rc = new Rectangle(0, 0, image.Width, image.Height);
            gr.DrawImage(image, rc, 0, 0, image.Width, image.Height, GraphicsUnit.Pixel, ia);
        }
        return image;
    }

I've tried various COLORSPACE and BITSPERCOMPONENT values, but I always get "Insufficient data for an image", "Out of memory", or "An error exists on this page" when I try to open the resulting PDF, so I must be doing something wrong. I'm pretty sure FLATEDECODE is the right filter to use.

Any assistance would be much appreciated.

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:57:43+0000

The Question:

You have a PDF with a colored JPG. For instance: image.pdf

If you look inside this PDF, you'll see that the filter of the image stream is /DCTDecode and the color space is /DeviceRGB.

Now you want to replace the image in the PDF, so that the result looks like this: image_replaced.pdf

In this PDF, the filter is /FlateDecode and the color space is changed to /DeviceGray.

In the conversion process, you want to use the PNG format.
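A quick way to see why these dictionary entries matter: after the filters are applied, a viewer expects exactly enough bytes for /Width × /Height samples of the declared /ColorSpace and /BitsPerComponent, and the PDF specification requires every row of samples to start on a byte boundary. When the decoded data is shorter than that, you get errors like "Insufficient data for an image". The following Java sketch computes the expected size; ImageDataSize and its methods are hypothetical names, and only the three device color spaces are handled.

```java
// Sketch: how many decoded bytes a viewer expects for an image XObject.
// Each row starts on a byte boundary, so a row needs
// ceil(width * components * bitsPerComponent / 8) bytes.
final class ImageDataSize {

    /** Number of color components implied by a device color space name. */
    static int components(String colorSpace) {
        switch (colorSpace) {
            case "/DeviceGray": return 1;
            case "/DeviceRGB":  return 3;
            case "/DeviceCMYK": return 4;
            default: throw new IllegalArgumentException("unsupported: " + colorSpace);
        }
    }

    /** Bytes the image stream must contain after all filters are decoded. */
    static long expectedBytes(int width, int height, String colorSpace, int bitsPerComponent) {
        long bitsPerRow = (long) width * components(colorSpace) * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round up to a whole byte
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        // The same 100 x 50 image needs very different amounts of data:
        System.out.println(expectedBytes(100, 50, "/DeviceRGB", 8));  // 15000
        System.out.println(expectedBytes(100, 50, "/DeviceGray", 8)); // 5000
        System.out.println(expectedBytes(100, 50, "/DeviceGray", 1)); // 650
    }
}
```

So declaring 8-bit /DeviceRGB for data that is really 8-bit gray leaves the viewer two thirds short, even when the stream itself decodes correctly.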

The Example:

I have made an example that performs this conversion: ReplaceImage

I will explain this example step by step:

Step 1: finding the image

In my example, I know that there's only one image, so I'm retrieving the PRStream with the image dictionary and the image bytes in a quick and dirty way.

    PdfReader reader = new PdfReader(src);
    PdfDictionary page = reader.getPageN(1);
    PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
    PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
    PdfName imgRef = xobjects.getKeys().iterator().next();
    PRStream stream = (PRStream) xobjects.getAsStream(imgRef);

I go to the /XObject dictionary in the /Resources of the page dictionary of page 1. I take the first XObject I encounter, assuming that it is an image, and I get that image as a PRStream object.

Your code is better than mine, but this part of the code isn't relevant to your question and it works in the context of my example, so let's ignore the fact that this won't work for other PDFs. What you really care about are steps 2 and 3.

Step 2: converting the colored JPG into a black and white PNG

Let's write a method that takes a PdfImageObject, converts the image to gray colors, and returns it as an iText Image object created from PNG bytes:

    public static Image makeBlackAndWhitePng(PdfImageObject image) throws IOException, DocumentException {
        BufferedImage bi = image.getBufferedImage();
        BufferedImage newBi = new BufferedImage(bi.getWidth(), bi.getHeight(), BufferedImage.TYPE_USHORT_GRAY);
        // Drawing the color image onto a gray image converts its colors to gray levels
        Graphics2D g = newBi.createGraphics();
        g.drawImage(bi, 0, 0, null);
        g.dispose(); // release the graphics context
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(newBi, "png", baos);
        return Image.getInstance(baos.toByteArray());
    }

We convert the original image into a grayscale image using standard BufferedImage manipulations: we draw the original image bi onto a new image newBi of type TYPE_USHORT_GRAY.
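Note that TYPE_USHORT_GRAY gives 16-bit grayscale, not pure black and white. If you really want 1 bit per pixel (which is presumably what the question's threshold of 0.8 aims for), you can threshold the luminance yourself. This is a minimal JDK-only sketch; BlackAndWhite and toBlackAndWhite are hypothetical names, and the weights are the 0.299/0.587/0.114 values from the question's color matrix.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Sketch: a true 1-bit black-and-white conversion with an explicit threshold.
final class BlackAndWhite {

    static BufferedImage toBlackAndWhite(BufferedImage src, double threshold) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        WritableRaster raster = dst.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                double luminance = (0.299 * r + 0.587 * g + 0.114 * b) / 255.0;
                // TYPE_BYTE_BINARY's default palette is {black, white}: index 0 is black, 1 is white.
                raster.setSample(x, y, 0, luminance >= threshold ? 1 : 0);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage color = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        color.setRGB(0, 0, 0xFFFF00); // yellow, luminance about 0.886
        color.setRGB(1, 0, 0xFF0000); // red, luminance about 0.299
        BufferedImage bw = toBlackAndWhite(color, 0.8);
        System.out.println(bw.getRaster().getSample(0, 0, 0)); // 1 (white)
        System.out.println(bw.getRaster().getSample(1, 0, 0)); // 0 (black)
    }
}
```

If you embed such an image, its dictionary needs /BitsPerComponent 1 instead of 8.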

Once this is done, you want the image bytes in the PNG format. This is also done using standard ImageIO functionality: we just write the BufferedImage to a byte array, telling ImageIO that we want "png".

We can use the resulting bytes to create an Image object.

    Image img = makeBlackAndWhitePng(new PdfImageObject(stream));

Now we have an iText Image object, but please note that the image bytes as stored in this Image object are no longer in the PNG format. As already mentioned in the comments, PNG is not supported in PDF. iText will change the image bytes into a format that is supported in PDF (for more details see section 4.2.6.2 of The ABC of PDF).
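To make this point concrete: the question puts the bytes of a complete PNG file into the stream and labels them /FlateDecode. A PNG file starts with an 8-byte signature and is split into chunks, so a Flate decoder rejects it. Only the concatenated IDAT chunks form a zlib stream, and even those decode to rows that each start with a PNG filter-type byte. This JDK-only Java sketch demonstrates both facts; PngVsFlate and its methods are hypothetical helpers.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import javax.imageio.ImageIO;

// Sketch: a PNG *file* is not a zlib stream, but its IDAT chunks are.
final class PngVsFlate {

    /** Encodes an image as a complete PNG file: 8-byte signature followed by chunks. */
    static byte[] toPngFile(BufferedImage img) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Inflates zlib data (what /FlateDecode does); rejects anything else. */
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated zlib stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib stream", e);
        } finally {
            inflater.end();
        }
    }

    static boolean isZlibStream(byte[] data) {
        try {
            inflate(data);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Concatenates the payloads of all IDAT chunks (the compressed pixel rows). */
    static byte[] idatData(byte[] png) {
        ByteBuffer bb = ByteBuffer.wrap(png); // big-endian, as PNG requires
        bb.position(8);                       // skip the PNG signature
        ByteArrayOutputStream idat = new ByteArrayOutputStream();
        while (bb.remaining() >= 12) {        // length + type + CRC = 12 bytes minimum
            int length = bb.getInt();
            byte[] type = new byte[4];
            bb.get(type);
            byte[] payload = new byte[length];
            bb.get(payload);
            bb.getInt();                      // CRC, not checked here
            if ("IDAT".equals(new String(type, StandardCharsets.US_ASCII))) {
                idat.write(payload, 0, length);
            }
        }
        return idat.toByteArray();
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(4, 3, BufferedImage.TYPE_BYTE_GRAY);
        byte[] png = toPngFile(gray);
        System.out.println(isZlibStream(png)); // false: the signature is not a zlib header
        byte[] rows = inflate(idatData(png));
        // 3 rows x (1 filter-type byte + 4 gray samples)
        System.out.println(rows.length);
    }
}
```

The PDF specification does allow a /FlateDecode stream to carry such rows, provided /DecodeParms specifies a PNG predictor (/Predictor 10 to 15) with matching /Colors, /BitsPerComponent and /Columns; otherwise the filter-type bytes must be removed so that only raw samples remain.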

Step 3: replacing the original image stream with the new image stream

We now have an Image object, but what we really need is to replace the original image stream with a new one. We also need to adapt the image dictionary: /DCTDecode will change into /FlateDecode, /DeviceRGB will change into /DeviceGray, and the value of /Length will also be different.
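If you do want to build the stream by hand, as the question does, the stream must contain the raw samples (not a PNG file) and the dictionary must describe exactly those samples. This hedged Java sketch shows the pieces for an 8-bit /DeviceGray image; GrayImageXObject and fromGray are hypothetical names, and rendering the dictionary as a string is only for illustration. With iText you would put the same entries on the stream as PdfName/PdfNumber values.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Sketch: the pieces of a hand-built 8-bit grayscale image XObject (not iText API).
final class GrayImageXObject {
    final String dictionary; // the image dictionary, in PDF syntax
    final byte[] streamData; // the Flate-compressed raw samples

    private GrayImageXObject(String dictionary, byte[] streamData) {
        this.dictionary = dictionary;
        this.streamData = streamData;
    }

    static GrayImageXObject fromGray(BufferedImage gray) {
        if (gray.getType() != BufferedImage.TYPE_BYTE_GRAY) {
            throw new IllegalArgumentException("expected TYPE_BYTE_GRAY");
        }
        int w = gray.getWidth();
        int h = gray.getHeight();
        Raster raster = gray.getRaster();
        // Raw samples, one byte per pixel, top row first: what the viewer expects
        // after decoding (no PNG signature, chunks or row filter bytes).
        byte[] samples = new byte[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                samples[y * w + x] = (byte) raster.getSample(x, y, 0);
            }
        }
        // java.util.zip.Deflater produces the zlib format that /FlateDecode expects.
        Deflater deflater = new Deflater();
        deflater.setInput(samples);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        String dict = "<< /Type /XObject /Subtype /Image"
                + " /Width " + w + " /Height " + h
                + " /ColorSpace /DeviceGray /BitsPerComponent 8"
                + " /Filter /FlateDecode /Length " + compressed.length + " >>";
        return new GrayImageXObject(dict, compressed);
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(10, 5, BufferedImage.TYPE_BYTE_GRAY);
        System.out.println(fromGray(gray).dictionary);
    }
}
```

Compared with the question's code, the differences are the stream content (raw gray samples instead of PNG file bytes), /DeviceGray instead of /DeviceRGB, and a /Length equal to the compressed byte count.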

You are creating the image stream and its dictionary manually. That's brave. I leave this job to iText's PdfImage object:

    PdfImage image = new PdfImage(makeBlackAndWhitePng(new PdfImageObject(stream)), "", null);

PdfImage extends PdfStream, and I can now replace the original stream with this new stream:

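    public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
        orig.clear(); // remove all entries of the original image dictionary
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        stream.writeContent(baos); // bytes of the new image stream
        orig.setData(baos.toByteArray(), false); // store them without compressing again
        // copy the entries of the new dictionary (/Filter, /ColorSpace, /Length, ...)
        for (PdfName name : stream.getKeys()) {
            orig.put(name, stream.get(name));
        }
    }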

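The order in which you do things here is important: setData() changes the /Length and /Filter entries of the stream dictionary, so the entries of the new stream are copied after calling setData(). Otherwise setData() would tamper with the length and the filter.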
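To illustrate why the order matters, here is a toy model in Java. It is not iText code: ToyStream and its methods are hypothetical, and its setData() simply mimics the behavior described above by dropping /Filter and rewriting /Length.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (NOT iText): a stream whose setData() drops /Filter and rewrites /Length.
final class ToyStream {
    final Map<String, Object> dict = new LinkedHashMap<>();
    byte[] data = new byte[0];

    void setData(byte[] newData) {
        data = newData;
        dict.remove("/Filter");              // data is treated as unfiltered
        dict.put("/Length", newData.length); // length of the stored bytes
    }

    /** setData() first, then copy the entries: the copied /Filter survives. */
    static void replaceInRightOrder(ToyStream orig, ToyStream replacement) {
        orig.dict.clear();
        orig.setData(replacement.data);
        orig.dict.putAll(replacement.dict);
    }

    /** Entries first, then setData(): the copied /Filter is lost. */
    static void replaceInWrongOrder(ToyStream orig, ToyStream replacement) {
        orig.dict.clear();
        orig.dict.putAll(replacement.dict);
        orig.setData(replacement.data);
    }

    public static void main(String[] args) {
        ToyStream replacement = new ToyStream();
        replacement.data = new byte[] {1, 2, 3};
        replacement.dict.put("/Filter", "/FlateDecode");
        replacement.dict.put("/Length", 3);

        ToyStream right = new ToyStream();
        replaceInRightOrder(right, replacement);
        ToyStream wrong = new ToyStream();
        replaceInWrongOrder(wrong, replacement);
        System.out.println(right.dict); // /Filter still present
        System.out.println(wrong.dict); // /Filter missing
    }
}
```

In the wrong order, a viewer would read Flate-compressed bytes as if they were raw samples, which produces exactly the kind of broken-image errors from the question.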

Step 4: persisting the document after replacing the stream

I guess it's not hard to figure this part out:

    replaceStream(stream, image);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();

Problem:

I am not a C# developer. I know PDF inside-out and I know Java.

If your problem is caused in step 2, then you'll have to post another question asking how to convert a colored JPEG image into a black and white PNG image.
If your problem is caused in step 3 (for instance because you are using /DeviceRGB instead of /DeviceGray), then this answer will solve your problem.