scala - When applying `map` to a `Set` you sometimes want the result not to be a set but overlook this

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Or how to avoid accidental removal of duplicates when mapping a Set?

This is a mistake I make very often. Look at the following code:

def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

The function should count the accumulated size of all the contained lists. The problem is that after mapping the lists to their lengths, the result is still a Set, so all lists of the same size are reduced to a single representative (for example, every list of size 1 contributes just one 1 to the sum).
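A minimal sketch of the pitfall and two ways around it, using only the standard library. The names `SetMapDemo`, `countBuggy`, `countViaSeq`, and `countViaFold` are made up for illustration.

```scala
object SetMapDemo {
  // Buggy: map on a Set yields a Set[Int], so equal sizes collapse into one element.
  def countBuggy[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

  // Fix 1: convert to a Seq first, so map builds a Seq and duplicates survive.
  def countViaSeq[A](sl: Set[List[A]]): Int = sl.toSeq.map(_.size).sum

  // Fix 2: fold directly over the Set, which builds no intermediate collection.
  def countViaFold[A](sl: Set[List[A]]): Int = sl.foldLeft(0)(_ + _.size)
}
```

For `Set(List(1), List(2), List(3, 4))` the buggy version returns 3 (the sizes collapse to `Set(1, 2)`), while both fixed versions return 4.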

Is it just me having this problem? Is there something I can do to prevent this happening? I think I'd love to have two methods mapToSet and mapToSeq for Set. But there is no way to enforce this, and sometimes you don't locally notice that you are working with a Set.

Maybe it's even possible that you were writing code for a Seq and something changes in another class and the underlying object becomes a Set?

Is there a best practice that keeps this situation from arising at all?

Remote edits break my code

Imagine the following situation:

val totalEdges = graph.nodes.map(_.getEdges).map(_.size).sum / 2

You fetch a collection of Node objects from a graph, use them to get their adjacent edges and sum over them. This works if graph.nodes returns a Seq.

And it breaks if someone decides to make Graph return its nodes as a Set. Nobody has touched this code, and it does not look suspicious (at least not to me; do you expect that every collection could end up being a Set?).
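To make the failure concrete, here is a small sketch with a triangle graph. `Edge`, `Node`, and `GraphDemo` are hypothetical stand-ins for the types in the question, which are not shown there.

```scala
object GraphDemo {
  // Hypothetical stand-ins for the question's edge and node types.
  final case class Edge(a: Int, b: Int)
  final case class Node(id: Int, getEdges: Seq[Edge])

  // Triangle 1-2-3: every node touches two of the three edges.
  val e12 = Edge(1, 2)
  val e23 = Edge(2, 3)
  val e13 = Edge(1, 3)

  val nodesAsSeq: Seq[Node] =
    Seq(Node(1, Seq(e12, e13)), Node(2, Seq(e12, e23)), Node(3, Seq(e23, e13)))
  val nodesAsSet: Set[Node] = nodesAsSeq.toSet

  // Same expression as in the question, applied to each representation.
  val fromSeq: Int = nodesAsSeq.map(_.getEdges).map(_.size).sum / 2 // sizes 2, 2, 2
  val fromSet: Int = nodesAsSet.map(_.getEdges).map(_.size).sum / 2 // sizes collapse to Set(2)
}
```

With the Seq, the expression gives the correct 3 edges; with the Set, the three equal sizes collapse to one and the result is 1.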

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:18:14+0000

It seems there are many possible gotchas if one expects a Seq and gets a Set. It's not a surprise that method implementations can depend on the type of the object and (with overloading) the arguments. With Scala implicits, the method can even depend on the expected return type.

A way to defend against surprises is to label types explicitly, for example by annotating methods with return types even where that is not required. At least this way, if the type of graph.nodes is changed from Seq to Set, the programmer is made aware that there's potential breakage.
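A sketch of what such an annotation could look like at the call site. `AnnotatedDemo`, `Graph`, and `nodeNameLengths` are hypothetical names, and nodes are plain strings here to keep the example short.

```scala
object AnnotatedDemo {
  // Hypothetical graph type; `nodes` is a Seq today.
  final class Graph(val nodes: Seq[String])

  def nodeNameLengths(graph: Graph): Int = {
    // The type ascription records the assumption. If `nodes` later becomes a
    // Set[String], this line stops compiling instead of silently deduplicating.
    val nodes: Seq[String] = graph.nodes
    nodes.map(_.length).sum
  }
}
```

The cost is one extra line; the benefit is that a change from Seq to Set in another class shows up as a type mismatch at compile time rather than as a wrong sum at run time.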

For your specific issue, why not define your own mapToSeq method?

scala> def mapToSeq[A, B](t: Traversable[A])(f: A => B): Seq[B] =
           t.map(f)(collection.breakOut)
mapToSeq: [A, B](t: Traversable[A])(f: A => B)Seq[B]

scala> mapToSeq(Set(Seq(1), Seq(1,2)))(_.sum)
res1: Seq[Int] = Vector(1, 3)

scala> mapToSeq(Seq(Seq(1), Seq(1,2)))(_.sum)
res2: Seq[Int] = Vector(1, 3)

The advantage of using breakOut (which supplies the CanBuildFrom) is that the result is built directly as a Seq, so the conversion from a Set to a Seq has no additional overhead.
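Note that `collection.breakOut` and `CanBuildFrom` were removed in Scala 2.13, where `Traversable` is also only a deprecated alias for `Iterable`, so the definition above compiles on 2.12 and earlier only. A possible equivalent for newer versions, assuming `Iterable` as the parameter type and a `Vector` as the concrete result (the wrapper object `MapToSeq` is a made-up name):

```scala
object MapToSeq {
  // Maps lazily through an iterator and materializes once as a Vector,
  // so no intermediate Set is built and duplicate results are kept.
  def mapToSeq[A, B](t: Iterable[A])(f: A => B): Seq[B] =
    t.iterator.map(f).toVector
}
```

Going through `iterator` plays the same role as breakOut did: the mapped values go straight into the target collection.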

You can use the "pimp my library" pattern to make mapToSeq appear to be part of the Traversable trait, which both Seq and Set inherit.
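One way this enrichment might look, sketched with an implicit class (available since Scala 2.10). `CollectionSyntax` and `MapToSeqOps` are hypothetical names, and `Iterable` is used in place of `Traversable` so that the sketch also compiles on newer versions.

```scala
object CollectionSyntax {
  // Hypothetical extension: adds mapToSeq to every Iterable, including Set and Seq.
  implicit class MapToSeqOps[A](underlying: Iterable[A]) {
    // Always returns a Seq, whatever the source collection was.
    def mapToSeq[B](f: A => B): Seq[B] = underlying.iterator.map(f).toVector
  }
}
```

With `import CollectionSyntax._` in scope, `Set(List(1), List(2)).mapToSeq(_.size).sum` evaluates to 2, whereas the plain `map` version would give 1.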
