Apache Spark - Avoid "Task not serialisable" with a nested method in a class
I understand that the usual "Task not serializable" issue arises when a closure accesses a field or a method that is out of its scope.
To fix it, I define a local copy of these fields/methods, which avoids the need to serialize the whole class:
    class MyClass(val myField: Any) {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        val myField = this.myField
        println(f.map( _ + myField ).count)
      }
    }
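For contrast, here is a sketch of the variant that triggers the exception (my own illustration, assuming MyClass is not Serializable and sc is a SparkContext in scope, as in the snippet above): without the local copy, reading myField inside the closure goes through this, so Spark tries to serialize the whole instance.

    class MyClass(val myField: Any) {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        // No local copy: the closure reads this.myField, capturing `this`,
        // so Spark attempts to serialise the entire MyClass instance.
        println(f.map( _ + myField ).count)
      }
    }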
Now, if I define a nested function in the run method, it cannot be serialized:
    class MyClass() {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        def mapFn(line: String) = line.split(";")
        println(f.map( mapFn( _ ) ).count)
      }
    }
I don't understand why, since I thought mapFn would be in scope... Even stranger, if I define mapFn as a val instead of a def, then it works:
    class MyClass() {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        val mapFn = (line: String) => line.split(";")
        println(f.map( mapFn( _ ) ).count)
      }
    }
Is this related to the way Scala represents nested functions?
What's the recommended way to deal with this issue? Should I avoid nested functions?
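One workaround that seems to avoid the problem entirely (a sketch only; the LineParsers name is made up, and sc is again assumed to be a SparkContext) is to move the helper into a top-level object, since a closure that only calls an object method holds no reference to the enclosing class:

    object LineParsers {
      // A top-level object is not tied to any enclosing instance,
      // so a closure calling this method captures no $outer reference.
      def mapFn(line: String): Array[String] = line.split(";")
    }

    class MyClass() {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        println(f.map(LineParsers.mapFn _).count)
      }
    }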
Isn't it working this way: in the first case, f.map(mapFn(_)) is equivalent to f.map(new Function() { override def apply(...) = mapFn(...) }), while in the second one it is simply f.map(mapFn)? When you declare a method with def, it is probably a method of an anonymous class with an implicit $outer reference to the enclosing class. But map requires a Function, so the compiler needs to wrap it. In that wrapper you refer to a method of the anonymous class, but not to the instance itself. If you use val instead, you have a direct reference to the function, which you pass to map. I'm not sure about this, just thinking out loud...
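One way to probe this reasoning outside Spark (a sketch of my own; Outer and the helper names are invented, and the exact outcome may vary between Scala versions, since newer compilers emit lambdas differently) is to serialise the two kinds of closure directly, which is roughly the check Spark performs when it ships a task:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    class Outer {  // deliberately NOT Serializable
      def run(): Unit = {
        def defFn(line: String) = line.split(";")      // nested method
        val valFn = (line: String) => line.split(";")  // function value

        // Eta-expanding the method builds a wrapper Function1 whose apply
        // calls a method lifted onto Outer, so it may capture `this`.
        check("def-based", defFn(_))
        // The val is already a Function1 that references nothing from Outer.
        check("val-based", valFn)
      }

      private def check(name: String, f: String => Array[String]): Unit =
        try {
          new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
          println(s"$name closure serialised fine")
        } catch {
          case e: java.io.NotSerializableException =>
            println(s"$name closure failed: $e")
        }
    }

With a non-serialisable Outer, the def-based wrapper is the one that should fail, matching the behaviour observed in the question.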