How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?
I'm trying to write code that will work unchanged:
- Running on Python 2.
- Running on Python 3.
- With binary streams generated from the standard library (i.e. I can't control what type they are).
- With binary streams made to be test doubles (i.e. no file handle, can't re-open).
- Producing an io.TextIOWrapper that wraps the specified stream.
The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.
Example
Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:
import subprocess
import io
gnupg_subprocess = subprocess.Popen(
        ["gpg", "--version"], stdout=subprocess.PIPE)
gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.
gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
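As a minimal sketch of that test-double setup, the wrapper can be built and read as follows. The name fake_stdout is hypothetical and only stands in for gnupg_subprocess.stdout:

```python
import io

# Hypothetical test double standing in for gnupg_subprocess.stdout.
fake_stdout = io.BytesIO(u"Lorem ipsum".encode("utf-8"))

# io.BytesIO implements the io protocol (readable, writable, seekable),
# so io.TextIOWrapper accepts it directly.
wrapped = io.TextIOWrapper(fake_stdout, encoding="utf-8")
text = wrapped.read()
```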
That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:
[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'file' object has no attribute 'readable'
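The same kind of failure can be reproduced without a Python 2 interpreter, using an object that offers read but none of the io protocol methods, much like the Python 2 file type. This is only a sketch: LegacyFileDouble is an invented name, and the exact error message may differ between interpreter versions.

```python
import io


class LegacyFileDouble(object):
    """Hypothetical stand-in: has read(), but no readable()/writable()/seekable()."""

    def __init__(self, content):
        self._stream = io.BytesIO(content)

    def read(self, size=-1):
        return self._stream.read(size)


error = None
try:
    io.TextIOWrapper(LegacyFileDouble(b"Lorem ipsum"), encoding="utf-8")
except AttributeError as exc:
    # The wrapper probes io protocol methods on the stream it is given,
    # and this double does not provide them.
    error = exc
```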
Not a solution: Special treatment for file
An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.
That's not an option for well-tested code, because it makes a branch that unit tests – which, in order to run as fast as possible, must not create any real filesystem objects – can't exercise.
The unit tests will be providing test doubles, not real file objects. So creating a branch which won't be exercised by those test doubles is defeating the test suite.
Not a solution: io.open
Some respondents suggest re-opening the underlying file descriptor (e.g. with io.open):
gnupg_stdout = io.open(
        gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
That works on both Python 3 and Python 2:
[Python 3]
>>> type(gnupg_subprocess.stdout)
<class '_io.BufferedReader'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<class '_io.TextIOWrapper'>
[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<type '_io.TextIOWrapper'>
But of course it relies on re-opening a real file from its file descriptor. So it fails in unit tests when the test double is an io.BytesIO instance, which has no file descriptor:
>>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
>>> type(gnupg_subprocess.stdout)
<type '_io.BytesIO'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: fileno
Not a solution: codecs.getreader
The standard library also has the codecs module, which provides wrapper features:
import codecs
gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit io.IOBase and doesn't have the encoding attribute:
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
>>> type(gnupg_stdout)
<type 'instance'>
>>> isinstance(gnupg_stdout, io.IOBase)
False
>>> gnupg_stdout.encoding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
    return getattr(self.stream, name)
AttributeError: '_io.BytesIO' object has no attribute 'encoding'
So codecs doesn't provide objects which substitute for io.TextIOWrapper.
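The two properties checked above can be folded into one helper, which makes the gap easy to assert in a test. This is only a sketch: has_text_wrapper_api is a hypothetical name, and it checks just the two properties mentioned here, not the whole io.TextIOWrapper interface.

```python
import codecs
import io


def has_text_wrapper_api(stream):
    """Hypothetical check: an io.IOBase instance with an encoding attribute."""
    return isinstance(stream, io.IOBase) and hasattr(stream, "encoding")


# A codecs reader over a byte stream: it decodes, but is not an io.IOBase.
codecs_reader = codecs.getreader("utf-8")(io.BytesIO(b"Lorem ipsum"))

# An io.TextIOWrapper over the same kind of byte stream.
text_wrapper = io.TextIOWrapper(io.BytesIO(b"Lorem ipsum"), encoding="utf-8")

codecs_ok = has_text_wrapper_api(codecs_reader)
wrapper_ok = has_text_wrapper_api(text_wrapper)
```

Here codecs_ok is False and wrapper_ok is True, which is the difference the rest of the standard library trips over.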
What to do?
So how can I write code that works for both Python 2 and Python 3, with both the test doubles and the real objects, which wraps an io.TextIOWrapper around the already-open byte stream?