Your solution is based on a brute force approach. You are generating all possible alternative strings and then filtering out the ones that do not meet the criteria of only 3 changes. A better approach would be to look only at those combinations that will meet the criteria. I will ignore the part of saving to a file, since it will be the same for both solutions. A faster solution would be:
def change_string(input_string, mapping, replace=3):
input_string = list(input_string)
to_replace = dict()
for idx, replacement in enumerate(mapping):
if not replacement: continue
to_replace[idx] = replacement
if input_string[idx] in replacement:
to_replace[idx] = [char for char in replacement if char != mapping[idx]]
for indices in combinations(to_replace, r=replace):
for chars in product(*[to_replace[index] for index in indices]):
temp = input_string[:]
for index, char in zip(indices, chars):
temp[index] = char
yield ''.join(temp)
Explanation
I change the input string to a list, so I can do the replacement faster, since lists are mutable and strings are not.
Then I filter the mapping (p
) to represent only indices that are going to be changed. This removes all empty strings and provides me with the indices that I have to look at.
to_replace = dict()
for idx, replacement in enumerate(mapping):
if not replacement: continue
to_replace[idx] = replacement
if input_string[idx] in replacement:
to_replace[idx] = [char for char in replacement if char != mapping[idx]]
Note: I also make sure that the values in mapping are unequal to the original string values, which might not be what you want.
Then I create all possible combinations of indices with the required length (replace=3).
for indices in combinations(to_replace, r=replace):
Using your example this will contain the following group of indices:
(0, 1, 4)
(0, 1, 5)
(0, 4, 5)
(1, 4, 5)
Then I create all possible character
combinations from those indices:
for chars in product(*[to_replace[index] for index in indices]):
For example with indices (0, 1, 4)
or the values ('d1', 'c3', '0')
:
('d', 'c', '0')
('d', '3', '0')
('1', 'c', '0')
('1', '3', '0')
Are all the character combinations produced.
Then I create a copy of the input string (note it is a list, so we can perform fast replacements) and replace the characters at the correct indices.
Comparison
def OP(input_string, replace=3):
p = ["d1", "c3", "", "", "0", "56"]
d = {idx: (v if input_string[idx] in v else input_string[idx] + v) for idx, v in enumerate(p)}
all_of_em = (''.join(whatever) for whatever in product(*d.values()))
fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, input_string)) == replace]
return fewer
Replace is 3
print(timeit.timeit("OP('abc123')", setup="from __main__ import OP", number=100_000))
# 5.6281933 seconds
print(timeit.timeit("list(change_string('abc123', ['d1', 'c3', '', '', '0', '56']))",
setup="from __main__ import change_string", number=100_000))
# 1.3682368 seconds
Which is about 3 times as fast, now the interesting part is to see what happens if we increase the replace value to 4
Replace is 4
print(timeit.timeit("OP('abc123', replace=4)", setup="from __main__ import OP", number=100_000))
# 5.5450302 seconds
print(timeit.timeit("list(change_string('abc123', ['d1', 'c3', '', '', '0', '56'], replace=4))",
setup="from __main__ import change_string", number=100_000))
# 0.6179974 seconds
A whooping 9 times faster, since my solution only has to check a few combinations.
Similar increase can be seen with using replace is 2
or 1
.