php - Weird characters when filling PDF with PDFTk

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm using PHP with PDFtk on Ubuntu. When filling a PDF with data, I get weird characters for these accented letters: á ó í. The data is UTF-8 encoded: I checked with echo mb_check_encoding($var, 'UTF-8'), which outputs 1 (TRUE). Any idea what I can do?

I also tried converting to ISO with utf8_decode, but still, no luck.

Thanks

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:04:48+0000

You're right, utf8_decode() will work for characters which can be encoded as ISO-8859-1 (Latin-1), i.e. U+0000–U+00FF.

However, it won't work for characters outside that range.
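One way to decide whether a plain single-byte conversion is enough is to test whether every code point in the value is at most U+00FF. The following is a minimal sketch, assuming UTF-8 input and the mbstring extension; the function names fits_latin1 and to_latin1_or_null are hypothetical. It uses mb_convert_encoding() rather than utf8_decode(), which is deprecated as of PHP 8.2.

```php
<?php
// Returns true when every code point of a UTF-8 string is at most U+00FF,
// i.e. the string can be converted to ISO-8859-1 without loss.
// Invalid UTF-8 makes preg_match() return false, so it is rejected too.
function fits_latin1(string $utf8): bool
{
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

// Converts to ISO-8859-1 when possible, otherwise returns null so the
// caller can fall back to a UTF-16BE encoded value.
function to_latin1_or_null(string $utf8): ?string
{
    if (!fits_latin1($utf8)) {
        return null;
    }
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}
```

A null result signals that the value needs one of the UTF-16 forms described below.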

You can always encode characters using UTF-16BE, though. You can do this for a single field only, e.g. to encode the word "özil":

<<
/V (þÿ^@ö^@z^@i^@l)
/T (name)
>>

(Here the "^@" indicates a NUL character (U+0000). This is how it looks in my editor (vim), if the file is encoded in Windows-1252 (latin1).)

Note that you need to use a byte order mark (which will appear as "þÿ" if your file is encoded in Windows-1252) and you'll need to encode the entire string (between the two parentheses) in UTF-16.

If you're generating the FDF in a PHP script you can do something like this:

<<
/V (<?php echo chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\\(', '\\)'), mb_convert_encoding("özil", 'UTF-16BE')); ?>)
/T (name)
>>
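The escaping has to be applied to the encoded bytes, not to the original text, because a UTF-16 code unit can contain the byte value of a parenthesis or backslash even when the character itself is neither. A minimal sketch, assuming UTF-8 input and the mbstring extension; the function name fdf_utf16_literal is hypothetical. It also escapes carriage returns, since the PDF specification normalizes unescaped end-of-line bytes inside literal strings; that extra step is an addition, not part of the snippet above.

```php
<?php
// Builds a parenthesized FDF string: BOM + UTF-16BE bytes, with the bytes
// that are special inside a PDF literal string escaped afterwards.
function fdf_utf16_literal(string $utf8): string
{
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

    // strtr() with an array works in a single pass, so the backslashes it
    // inserts are never escaped a second time.
    $escaped = strtr($bytes, [
        '\\' => '\\\\',
        '('  => '\\(',
        ')'  => '\\)',
        "\r" => '\\r', // assumption: guard against end-of-line normalization
    ]);

    return '(' . $escaped . ')';
}

// U+0128 encodes to the bytes 0x01 0x28, and 0x28 is "(".
$value = fdf_utf16_literal("\u{0128}");
```

Here $value holds the opening parenthesis, the two BOM bytes, 0x01, an escaped "(" and the closing parenthesis.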

You can also write out the hex codes like this (i.e. enclosed in angle brackets rather than parentheses):

<<
/V <FEFF00F6007A0069006C>
/T (name)
>>

This has exactly the same result (the string "özil"). It's less efficient in terms of characters, but it actually seems to be more reliable in pdftk, which has some bugs I've found (in version 2.02).
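The hex form is easy to generate from PHP because it needs no escaping at all. A minimal sketch, assuming UTF-8 input and the mbstring extension; the function name fdf_hex_string is hypothetical.

```php
<?php
// Builds an FDF hex string: "<", then the BOM and the UTF-16BE bytes
// written as uppercase hex digits, then ">".
function fdf_hex_string(string $utf8): string
{
    $utf16 = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    return '<' . strtoupper(bin2hex($utf16)) . '>';
}

// "\u{00F6}" is the letter o with diaeresis, written as an escape so the
// example does not depend on the encoding of the source file.
$field = "<<\n/V " . fdf_hex_string("\u{00F6}zil") . "\n/T (name)\n>>\n";
echo $field;
```

For the word used above, this produces the same <FEFF00F6007A0069006C> value.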

Finally, you can also write out the Unicode code point for any character in octal notation (\ddd). For example, ö has codepoint U+00F6, which in octal is 366, so you can write:

<<
/V (\366zil)
/T (name)
>>

However, this only works up to U+00FF (octal 377). Beyond that, you'd have to use UTF-16.
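The octal form can be generated mechanically for values that stay within that range. A minimal sketch, assuming UTF-8 input and the mbstring extension; the function name fdf_octal_literal is hypothetical. It treats the bytes as Latin-1, which matches the accented letters discussed here, but a PDF reader may interpret some other byte values differently, so treat it as an illustration rather than a complete encoder.

```php
<?php
// Returns a parenthesized FDF string using \ddd octal escapes, or null when
// the value contains a code point above U+00FF and needs UTF-16 instead.
function fdf_octal_literal(string $utf8): ?string
{
    if (preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) !== 1) {
        return null;
    }

    $bytes = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $char = $bytes[$i];
        $code = ord($char);
        // Escape everything that is not plain printable ASCII, plus the
        // three characters that are special inside a literal string.
        if ($code < 0x20 || $code > 0x7E || $char === '\\' || $char === '(' || $char === ')') {
            $out .= sprintf('\\%03o', $code);
        } else {
            $out .= $char;
        }
    }

    return '(' . $out . ')';
}
```

Called with the word used above, it yields (\366zil); for a value such as "ł" (U+0142) it returns null.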

The PDF standard allows you to set the encoding to UTF-8 for the whole FDF document. I tried this and it didn't work with pdftk, however in theory it would be done like this:

%FDF-1.2
1 0 obj
<<
/Version /1.3
/Encoding /utf_8
/FDF

(You would presumably have to set the FDF version to 1.3 (or more) in the header too, according to the standard.)

You can also do this at the field level:

<<
/V (özil)
/T (name)
/Encoding /utf_8
>>

But as I said, I didn't manage to get any of this to work. pdftk just seems to ignore it.
