=ADD= =reftype= 14 =number= 01-03 =url= ftp://ftp.risc.uni-linz.ac.at/pub/techreports/2001/01-03.ps.gz =year= 2001 =month= 01 =author= Schreiner; Wolfgang + Kusper; Gabor + Bosa; Karoly =title= Introducing Fault Tolerance to Distributed Maple =abstract= We have extended the parallel computer algebra environment Distributed Maple by fault tolerance mechanisms such that the time spent in a long-running computation is no longer wasted when a session failure occurs. The first mechanism is the logging of task return values and of shared object values such that, after a failure, the newly started session can (transparently to the application program) reuse already computed results. This is complicated by the fact that task arguments and results may embed task handles and that the scheduling layer of Distributed Maple has only a limited amount of information about the activities of the computing layer. The second mechanism is the migration of tasks such that a session may tolerate the failure of individual nodes without overall failure. Both fault tolerance mechanisms are considerably facilitated by the mostly functional nature of the parallel programming model; they make it possible to run computations that take much longer than the mean time between session failures. =sponsor= Supported by grant SFB F013/F1304 of the Austrian Science Foundation (FWF). =keywords= parallel computing, cluster computing, computer algebra, Java, Maple.