This doesn't answer your fundamental questions, but there are purely
functional arrays and other data structures that allow copying (with a
changed element) with much better than O(N) performance, usually
something like O(log N). They are built with tree structures to allow
structural sharing. Clojure's fundamental data structures work like
this. They exist natively and as libraries for many programming
languages.
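As a minimal Haskell sketch of the idea (assuming the containers package; the name sharedUpdate is just illustrative): Data.Sequence is one such tree-backed structure, where an "update" returns a new value in O(log N) while the old one stays valid, with most of the tree shared between the two.

    import qualified Data.Sequence as Seq

    -- Persistent update: v1 differs from v0 at index 3, but the two
    -- sequences share most of their underlying tree nodes.
    sharedUpdate :: (Seq.Seq Int, Seq.Seq Int)
    sharedUpdate =
      let v0 = Seq.fromList [0 .. 9]
          v1 = Seq.update 3 42 v0   -- O(log n), not a full copy
      in (v0, v1)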
To answer your question, I think programming languages generally
should encapsulate efficient (mutating) algorithms behind functional
interfaces, so you pass in your immutable array to a function (e.g. sort)
and it does a single copy to a mutable array, does a really fast sort
and then creates an immutable array to pass back to you. Since those
copying actions only have to happen once, they don't generally affect
the asymptotic complexity of the overall operation. As a user of those
functions, you continue to have the benefit of immutable data
structures, but get to have some efficient algorithms at the same
time.
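In Haskell, that wrapper pattern might look roughly like this sketch (assuming the vector package; the insertion sort and the name sortVector are just illustrative stand-ins for whatever fast in-place algorithm a real library would use):

    import Control.Monad (forM_, when)
    import qualified Data.Vector as V
    import qualified Data.Vector.Mutable as MV

    -- Pure interface, mutating implementation: V.modify hands us a
    -- private mutable copy of the input (reusing it in place only when
    -- that's safe), lets us sort it inside ST, and freezes the result
    -- back into an immutable vector.
    sortVector :: Ord a => V.Vector a -> V.Vector a
    sortVector = V.modify $ \mv ->
      forM_ [1 .. MV.length mv - 1] $ \i ->
        let sift j = when (j > 0) $ do
              prev <- MV.read mv (j - 1)
              cur  <- MV.read mv j
              when (cur < prev) $ MV.swap mv (j - 1) j >> sift (j - 1)
        in sift i

The caller never sees the mutable vector, so from the outside this is just another pure function.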
@Stephen – implementing an array as a tree has other costs though,
even if you disregard the better cache utilization of a flat array.
What's the cost of a read to a random location of an array implemented
as a tree?
As a side note, I've found that in many cases (on my hardware, where
N < ~50) implementing a frequently read map as a wrapper around
std::vector in C++ is faster than implementing it directly as a
std::map. That is to say, even when you have increased algorithmic
complexity (O(N) vs O(log N)) you can still get a win if your accesses
are fast enough. With an array implemented as a tree, I suspect you are
trading off both algorithmic complexity (in other places such as reads,
as well as in the copy, since O(log N) is slower than O(1)) and memory
access speed.
I do like your idea of keeping interfaces functional, but
implementations mutable. Clean interfaces are always a good idea, and in
this case you can get the benefits of both worlds.
Yes, purely functional algorithms can't be faster than the imperative
ones [1]. Arrays and graphs are good examples, and in these cases an
imperative implementation is necessary for achieving the best
performance (though we have purely functional array and graph
libraries). But in many other cases there do exist functional data
structures that are asymptotically as efficient as their imperative
analogues (for more information, see Okasaki's "Purely Functional Data
Structures" and [2]).
Purity is not a silver bullet that solves all parallelism problems,
but it can make your life easier (it's obviously safe to run two
different pure functions in parallel). The point is not to get rid of
mutable state altogether (Haskell has great support for mutable state),
but to separate pure code from the impure and make the compiler enforce
that separation via the type system. Haskell's type system is even
advanced enough to allow pure code that uses mutable state internally
(and statically guarantee that the mutable state doesn't "escape")
[3].
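A tiny sketch of that last point, for concreteness (standard base libraries; sumST is just an illustrative name): runST's type guarantees the mutable reference below cannot leak, so the function is pure from the outside.

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, modifySTRef', readSTRef)

    -- Pure from the outside, mutable on the inside: the STRef lives
    -- and dies entirely within runST.
    sumST :: Num a => [a] -> a
    sumST xs = runST $ do
      acc <- newSTRef 0
      mapM_ (\x -> modifySTRef' acc (+ x)) xs
      readSTRef acc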
Writing pure code is not the only approach to parallel programming
that Haskell offers. Software Transactional Memory[4], which is
fundamentally an abstraction built on top of mutable state, also
deserves a mention.
[1] http://stackoverflow.com/questions/1990464/efficiency-of-purely-functional-programming
[2] http://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki
[3] http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Monad-ST.html
[4] http://www.haskell.org/haskellwiki/Software_transactional_memory
Please note that the unsafeFoo functions are *not* the expected way to
use the array when you want mutable updates. They're there as escape
hatches to let people shoot themselves in the foot if they really insist
on it, but, as you say, that gives up safety.
Instead, you should use typed interfaces that allow the type checker
to verify that the use of mutation is only local and cannot be observed
from the outside (disallowing aliasing in particular); this is done with
STArray, for example, which uses the ST monad for precisely this
purpose.
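A small sketch of that pattern (using the standard array package; here the unboxed STUArray variant, and prefixSums is just an illustrative name): all the in-place writes happen inside ST, and runSTUArray freezes the result into an ordinary immutable array, with the type system preventing the mutable version from escaping.

    import Control.Monad (forM_)
    import Data.Array.ST (newListArray, readArray, writeArray, runSTUArray)
    import Data.Array.Unboxed (UArray)

    -- Running sums computed with in-place writes inside ST; callers
    -- only ever see the final immutable UArray.
    prefixSums :: [Int] -> UArray Int Int
    prefixSums xs = runSTUArray $ do
      let n = length xs
      arr <- newListArray (0, n - 1) xs
      forM_ [1 .. n - 1] $ \i -> do
        prev <- readArray arr (i - 1)
        cur  <- readArray arr i
        writeArray arr i (prev + cur)
      return arr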
Of course, not all cases are covered, and if you look hard enough
you'll find situations that are not (rejected as ill-typed by ST). In
those cases you're on your own, like in any other language.
(Your remark that it is possible to write code that is too hard to
analyze due to extreme chaining of dependencies but still qualifies as
"pure" is spot on. This is how the use of Haskell monads for effects
works. Similarly, if you use ST to apply side effects locally, you are
likely to use an imperative style locally, with the well-known
downsides.)
For the theoretical discussion on asymptotic bounds of functional
programs, see this very interesting StackOverflow discussion: http://stackoverflow.com/questions/1990464/efficiency-of-purely-functional-programming
I should note that I was speaking from the Haskell perspective. Most
other functional languages (e.g. OCaml, Scheme, Clojure) don't have this
strict compiler-enforced separation between side-effecting and pure
code.
I was a heavy user of OCaml for a while, and I teach Haskell in my
programming languages course. I have come to believe that "purity" in
programming languages is a great learning tool, but not a very good tool
for building software. I am amazed at how far one can go with purely
functional programming, and I think there is value in exploring those
frontiers, but one should recognize that pursuit as largely academic. I
believe the same applies to languages that rhapsodize about their OOP
purity (I happen to think OOP has been taken too far even in mainstream
programming). Some of your writings on Forth reflect a similar sentiment
about that corner of the design space.
One of the things I love about the CTM book by Van-Roy and Haridi is
that it has a nicely balanced view on this. They explore declarative
programming (a cousin to FP) in detail, then talk about instances where
the model becomes cumbersome enough that it is worth adding a bit of
state just to simplify the code. They also define their concept of
purity in a way that allows state to lurk behind an abstraction; if the
interface meets all the purity requirements, they are willing to cheat
behind the scenes and sneak some state in when no one is looking. They
don't consider the result tainted (think of the Haskell IO monad that
acts almost as a badge of shame for impure code).
In short, the extremes are a fertile ground for learning and
discovering, but being able to pick and choose the best of each model
for each job is better for practical use. I'm one of those who believes
my imperative programming is better because of my FP experience, but not
everything needs to be coerced into a side-effect-free model.
Yossi, you're neither utterly wrong nor trivially right.
Indeed there's no magical solution for efficiently mutating arrays
within purely functional programming. But there's an interesting way of
working around it, by limiting (localizing?) the usage of side-effects.
Take a look at the ST monad.
http://www.haskell.org/haskellwiki/Monad/ST
For the more general question you might want to look into the
excellent "Purely Functional Data Structures" by Chris Okasaki.
(oops, when I wrote my comment, Mikhail's and gasche's weren't there,
it's mostly redundant now)
But direct memory update is already an O(log(N)) operation and no more
efficient than a functional array! It's only O(1) because you can only
use a finite amount of memory on the von Neumann machine under your
desk.
Inefficiency – this is precisely why functional languages have not
gained much adoption in commercial code so far. But this may change in
the future as computational power increases. You may do it
inefficiently, but if the user can hardly notice any difference in
time – then who cares?
There will of course always be realms where efficiency is very
important: games, graphics, scientific computation, etc. However, the
mainstream desktop and web application front ends where performance
rarely matters can switch to functional programming even today, gaining
speed of development and safety at the cost of performance.
First, the big O notation is made to count things.
You can count memory cells (which gives you a space complexity), or you
can count operations (which gives you a time complexity), or something
else.
When you count micro-instructions, in a programming language with
side effects, indeed, indexing an array is O(1), in the number of
micro-instructions.
But if your computer doesn't have Random Access Memory (RAM), but for
example, Delay Line Memory (DLM http://en.wikipedia.org/wiki/Delay_line_memory ),
then while the number of instructions needed to index an array will
still be O(1), the clock time needed will depend on the position of the
data in the line. And this doesn't only concern old technology; the
more recent magnetic bubble memories had linear access times too.
So the question is what the complexity of setting a slot in a
functional array is. If you implement a sorting algorithm in a
functional programming language, you can count the number of slots read
from functional arrays and the number of slots written to functional
arrays. Each kind of access, counting for 1, has a complexity of O(1).
Of course, if you try to implement your functional programming
language on a von Neumann computer, and if you do so by copying the old
elements, suddenly writing a slot will be O(n) in micro-instructions of
the von Neumann machine.
But nobody forces you to use a von Neumann machine. You could use a
quantum computer, and fork a new universe each time you write an array
slot, so you don't have to do the copying. Then setting a functional
array slot will be O(1) in universe forks.
Who knows if forking a universe takes aeons in paradise "time"?
Perhaps an irrelevant tangent, but I am currently building my first
iPhone app and I notice that Objective-C, although not a functional
language, has mutable and immutable versions of all of its data
structures, and the culture – documentation, example code etc. –
encourages use of the immutable versions unless mutability is really
needed.
A very common idiom, for example, is for the data provider to work on
a mutable data structure internally and then return an immutable
version, either by implicit casting (the mutable version is normally a
subclass of the immutable one) or by copy-once-when-finished.
This strikes me as a good example of what Russ is talking about:
you're not forced into pure functional contortions, but you are
encouraged to localise and separate your state-fiddling.
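For what it's worth, the same copy-once-when-finished idea exists in Haskell's vector package; here's a hedged sketch (the function squares is just illustrative): Data.Vector.create runs a builder that fills a mutable vector and then hands back an immutable one.

    import Control.Monad (forM_)
    import qualified Data.Vector as V
    import qualified Data.Vector.Mutable as MV

    -- Build mutably, publish immutably: roughly the NSMutableArray ->
    -- NSArray idiom, with the type system enforcing the hand-off.
    squares :: Int -> V.Vector Int
    squares n = V.create $ do
      mv <- MV.new n
      forM_ [0 .. n - 1] $ \i -> MV.write mv i (i * i)
      return mv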
Thanks for all the comments. The bit about "hidden side effects"
(pure interface, impure implementation) seems really interesting; I
wonder if this works for internal state variables maintained across
calls (as opposed to only supporting clobbering temporaries).
Regarding von Neumann machines being just one possible model –
firstly, well, all of the standard complexity analysis assumes this
model and it's awfully widespread, but secondly, I think it scales
rather nicely. In particular, why would access through a pointer require
O(log(N)) operations – because you need log(N) pointer bits? 128 pointer
bits, or even a smaller constant, AFAIK allow you to address every atom
in the universe or something.
Reading this post felt a bit odd. Then I skipped to the comments
section. I think Pascal Bourguignon's comment (#11) best captures my
reaction to this post.
Upon re-reading your post, I found this beginning paragraph to be
the culprit. Quote: «... very basic complexity analysis that we all use
assumes a von Neumann machine with ...»
The heart of complexity analysis is counting the steps of an
algorithm. The unit of a step is usually a mathematical
abstraction.
For example, Euclid's algorithm for GCD (the earliest, or one of the
earliest, algorithms), the prime sieve, or various sorting algorithms.
In the literature, we don't see mention of CPU/memory architecture,
because it doesn't matter.
So I'd consider a functional-programming operation that creates a new
array to be O(1), just like an imperative language updating one element.
The actual unit considered depends on the context.
My 2 cents. Thank you for your blog; I enjoy reading it.
Well, it's true that we count "steps" and we could count anything as
"a single step"; the question is when such counting is useful. If a new
array with N elements and one element modified is said to be created in
a single step, how is the resulting counting useful? The naive
implementation – copying – will take a varying amount of (physical) time
proportionate to the number of elements, N, so calling it O(1) is not
useful for predicting observed performance. Which less naive
implementation would actually be O(1), observably – take an amount of
physical time limited by a reasonably small constant (not something
larger than the amount of time the universe could possibly exist) that
is not proportionate to N?
Hi Yossi (#15), I think what unit to use needs to be case-specific and
goal-specific.
For example, let's say we implement factorial by recursion, in Scheme
and in Python. In Python, the storage will blow up quickly. How can the
same algorithm now have two different complexities? Of course it depends
on the level of detail, and on our purpose. It is in this sense that I
think analyzing FP's complexity in terms of CPU-level detail is
incongruous in general.
Suppose we write a basic prime sieve to get the first one thousand
primes, in OCaml and also in Python. Depending on the goal of our
study, we could say they both have the same complexity, because after
all it's the same algorithm. Or we could compare the CPU instructions
generated for particular hardware. Then it also drags in the specific
compiler used, and how the code is written in each language (e.g. does
it use an "array" or a "list", does it use a list comprehension or
lambda/map). So the same algorithm can actually result in a different
order of O() for the two implementations, and I think that for SOME,
the FP version's resulting O() could be an order less than Python's. So
yeah, I think it all depends.
Another example: what's the algorithmic complexity of i = i+1 vs ++i
in C? If we compare CPU instructions, don't they differ by some amount
with a decade-old compiler?
In the i++ example, both versions are O(1). I was talking about
asymptotic complexity.
Of course you could say that a functional specification of sorting is
always O(n*log(n)) given a sufficiently smart compiler; in that sense,
bubble sort is O(n*log(n)) given a sufficiently smart compiler. But that
sort of misses the point of complexity analysis of algorithms.
I think the question of the impact of not having random access for
writing on asymptotic complexity is a valid one – I'm not saying it is
trivial or that I fully answered it but I don't think "it depends"
answers it any more than "O(n*log(n))" is a valid answer to the question
of "what's the complexity of bubble sort". I think that to show that
losing random access doesn't affect complexity you'd need to show a
compilation strategy working around the problem, not merely point out
that it could exist in theory. I think there's no such strategy to date,
which is why functional languages offer cons but no "new array with one
modified element" operation.
Hi Yossi, I guess what you said makes sense, but I don't see the
utility of this conclusion, given its assumptions.
As an analogy, say we have a list of 1 to 10, and we want to add one
to each. We can do a for loop, or we can map a lambda over it. With
certain assumptions, we can decidedly say that the map approach will
ALWAYS have double the storage requirement of the imperative approach.
But what's the utility of this overly general conclusion?
On a different note, I don't think functional languages in general use
cons; I think it's mostly just Lisp (i.e. its built-in "list" is made of
cons cells, which form a "proper list" when structured in a certain way,
with the last linked pair ending in "nil" – that is, a linked list).
Mathematica's "List" is C's array; Mathematica has nothing like a linked
list (nor do Perl, PHP, or Python, as far as I know). Of course, in any
high-level language, a linked list à la Lisp can be built by nesting
pairs. This is an oft-used technique in Mathematica, e.g. nesting as a
way to append, then calling Flatten on the result to get rid of the
nesting.
I know almost no Mathematica; is it purely functional, and how do you
build new collections out of old ones?
In Haskell and ML as well as Lisps, there is a cons – I don't think
they name it so and perhaps they rightly don't since it's a bit of a
weird mnemonic, but they have it. That is, you take a head and a tail
and make a new list; this can be made to work predictably efficiently, I
think.
How people usually build new arrays out of old ones in purely
functional languages or language subsets I don't know; what does
Mathematica do?
As to mapping a lambda vs a for – I think a relatively simple
compiler can convert the map to a for; also it's not an asymptotic
performance difference. What I wonder about is whether purity costs
asymptotic performance, because that kind of thing is not something
stronger hardware can really buy back – not for a sufficiently large
N.
(I imagine one possible solution is to add a bunch of pure functional
primitives such as histogram() that use side effects in their
implementations but have a pure interface; if so, I wonder if you pay a
price in generality – or is there actually an approach letting you
encapsulate side effects in functional primitives so that every
algorithm can be made as efficient as with a writable RAM? If such an
approach exists, it'd answer my question spectacularly and prove my gut
feeling wrong; if a less general approach exists that can be shown to
handle an important sub-space of problems, it's rather interesting as
well.)
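(For what it's worth, standard Haskell's array package already ships a primitive in roughly this spirit: accumArray has a pure interface, and GHC implements it internally with a mutable array. A minimal sketch of a histogram built on it – the name and the bucket convention are just illustrative:)

    import Data.Array (Array, accumArray)

    -- Count how many samples fall into each bucket, assuming every
    -- x is within [0, buckets - 1].
    histogram :: Int -> [Int] -> Array Int Int
    histogram buckets xs = accumArray (+) 0 (0, buckets - 1) [(x, 1) | x <- xs]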
Hi Yossi!
I haven't researched this enough, but it seems to me that pure
functions with the added feature of destroying their input might be a
solution. Imagine a (pure) function that creates a new array from an old
one by replacing one element, with the peculiar feature that using the
original array again is an error. It's obvious how using it in code
translates to O(1) machine code, but the code is still functional.
Here http://www.cse.unsw.edu.au/~benl/papers/thesis/lippmeier-impure-world.pdf
is a (surprisingly pleasant to read) PhD thesis, dealing with exactly
this problem. I've read only the first two chapters, so I'm not sure
what's exactly the solution he suggests, but the author dedicates the
work to his beloved data structure, an array with O(1) updates.
Here http://code.google.com/p/decac/ is a programming
language based on the ideas of the above thesis, which aims to give
"bare-metal performance" while using modern programming languages
ideas.
Neither of these languages got a lot of success or development. However,
I do hope that something similar might work one day. I think that
functional programming that can be translated to efficient machine code
in an obvious way can be a really nice thing.
> The bit about "hidden side effects" (pure interface, impure implementation) seems really interesting; I wonder if this works for internal state variables maintained across calls (as opposed to only supporting clobbering temporaries).
You can do that in Haskell (a good example is the DiffArray data
structure [1][2]), but the type system won't be able to guarantee that
your program does not go wrong in that case, so you'll have to use an
escape hatch (such as unsafePerformIO).
[1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps
[2] http://hackage.haskell.org/package/diffarray
Mathematica (M) isn't purely functional. It's somewhat like Common
Lisp in that regard. It's big, with a few thousand functions built in.
All paradigms can be used, but functional programming and
pattern-matching term rewriting are naturally primary. It's my favorite
language. ( http://en.wikipedia.org/wiki/Rewriting )
In M, there are Append and Prepend http://reference.wolfram.com/mathematica/ref/Prepend.html
but they're costly: they basically make a copy of the array. For
seasoned M programmers, if we want to grow a list, a common idiom is to
start with a huge number of elements (e.g. of 0), then chop off the
inert tail when done. Or, nest in each step, e.g. mylist = {mylist,
newItem}, then call Flatten[mylist] at the end.
I'm aware that OCaml and Haskell have linked lists, but I've only
dabbled in OCaml, and my attempt to study Haskell went nowhere. On the
other hand, I've been coding Emacs Lisp for a few years. I hated Lisp
cons! I think the linked lists in Haskell and OCaml are qualitatively
different from Lisp's lists, because they don't expose the "cons"
primitive as a list constructor; the fact that it's a linked list is
only at the implementation level.
I don't have experience with low-level languages such as C, C++, or
assembly, nor with systems programming, compilers, or CPU/memory
architecture... So I can only wave my hands at your deeper technical
inquiries in this discussion. ☺
@Noam: that's really interesting – thanks!
A function or expression updating its input instead of copying it on
the grounds of there being no references to the input could take you
some distance I guess; a simple recursive histogram function could work
that way, for instance. It's not intuitive for me what exactly this
distance is.
One important problem FP solves is aliasing. Aliasing makes
optimising memory/time trade-offs and reasoning about code hard. While a
convenience, mutable state is what pulls the rug out from under you
while you are not looking. Dependent types try to address both the
"preventing copying bits around" and "constant-time access for a
constant amount of memory" problems by giving a little too much power to
the type system. While the optimisation opportunities sound great in
theory, they lead to rather baroque syntax which helps them stay obscure
to the rest of us.
Access to arbitrary amounts of memory is not actually a constant-time
operation; average/worst-case time is assumed for analysis.
Arbitrary-length operations don't run as fast as machine instructions
either.
Another thing worth looking at to understand where functional
programming stands relative to imperative is reasoning about
continuations. To get a logical view of the world (the world stays the
same upon continuation re-entry), you either need to copy whole call
frames (and related state) and restore them upon re-entry, or pretend
that nothing you may care about will change. Pretending nothing will
change is way safer for a functional language, although it is heavy on
memory consumption. An imperative language needs to take aliasing into
account while copying the modified call frames. Just ditching the
logical-view guarantee and letting the programmer make sure nothing bad
will happen is easier (e.g. Prolog with cuts). Once you try to
incorporate call/cc and friends, complexity analysis becomes impossible
for mere mortals with mutable state.
> A function or expression updating its input instead of copying it on the grounds of there being no references to the input could take you some distance I guess;
This is basically what uniqueness types give you – the type system
guarantees that values with unique types are used in a single threaded
manner. Uniqueness types were developed as a solution to the problem of
combining lazy evaluation with side effects, just like Haskell's ST and
IO monads. Ben Lippmeier discusses these matters in much more detail in
his wonderful dissertation (linked above) and also presents another
solution to the same problem.
Regarding uniqueness types and Lippmeier's thesis – I haven't read
it, but it sounds a lot like linear types (http://en.wikipedia.org/wiki/Linear_type_system),
which are the flagship feature of the Clean language (a cousin of
Haskell), and have since been adopted elsewhere (see the Wikipedia
article).
AFAICT, there is a slight difference between uniqueness types (at
least as implemented by Clean) and Linear types (as presented in Baker /
Wadler):
in Clean, unique <= shared, so a unique type can be coerced to its
shared type.
In contrast, linear type systems require every binding to be used
exactly once.
On a different note, there is actually a wealth of type systems
designed to control aliasing with varying degrees of precision and
sophistication.
See:
Francois Pottier: Wandering through linear types, capabilities, and
regions.
The perceived O(1) update for random access arrays is, unfortunately,
misleading – a consequence of many simplifications for an abstract model
that cannot physically be implemented. In practice, memory is allocated
in physical space, and there are (minimally) speed of light delays.
Further, unique addresses for memory necessarily have average size
lg(N). If memory is allocated in 2D space, the best we can feasibly do
is O(N^(1/2) * lg(N)). If memory is allocated in 3D space, we can
improve this to O(N^(1/3) * lg(N)).
In practice, the discrepancy between O(1) and actual access times
tends to show up in terms of cache invalidation, virtual memory pages,
even disk reads or network lookups. This is interesting, in its own way,
because our machines are essentially modeling "motion" of memory – the
stuff we access is 'near' (in cache instead of memory, or in memory instead of on disk)
and thus more cheaply accessible.
In practice, it is not difficult to achieve identical asymptotic
performance in a pure language. For example, if we have a Map of
Integers to Values, we can typically update an integer in that map in
O(lg(N)) time (ignoring physical layout of the map). If you used a map
for your histogram, you'd ultimately have the same asymptotic
performance as you do for an array... you'd simply be a lot more
explicit and obvious about it.
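A hedged sketch of that map-based histogram in Haskell (assuming the containers package; histogramMap is just an illustrative name):

    import Data.List (foldl')
    import qualified Data.IntMap.Strict as IM

    -- Purely functional histogram: each insertWith costs O(min(n, W))
    -- for IntMap – roughly the O(lg N) bound discussed above – and
    -- every intermediate map remains a valid, shareable value.
    histogramMap :: [Int] -> IM.IntMap Int
    histogramMap = foldl' (\m x -> IM.insertWith (+) x 1 m) IM.empty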
If you're somewhat more clever, you could actually model full
cache-like motions, and thus enable O(1) amortized manipulations to
recently accessed values and their neighbors (cf. Gérard Huet's
Zippers). So, if you're sequentially updating every value in a
histogram, you can get equivalent physical-world asymptotic performance
as modifying an array.
Of course, coefficients cannot reasonably be ignored. A map update is
likely to involve, on average, many more memory operations and much more
indirection than a stateful array manipulation. Zippers, if used,
similarly require extra computations.
In practice your histogram is typically small enough to fit in an L1
cache. So being "much more explicit about asymptotic performance" means
that you're bringing slowdowns which are unavoidable on huge inputs to
problems where they're perfectly avoidable.
On the other hand I agree that realizing that random access is in
practice much slower than sequential access in many cases is valuable.
I'm just saying that fast RAM – small fast RAM as necessitated by
physics but fast nonetheless – is darn useful and castrating it seems
weird. The existence of the memory hierarchy on most machines, which you
use as evidence that O(1) random access is impossible, is actually also
evidence that getting O(1) with as small constants as possible for as
large a data set as possible is useful.
"you're bringing slowdowns which are unavoidable on huge inputs to
problems where they're perfectly avoidable" – True enough. I don't deny
that coefficients matter. They only don't matter to asymptotic
complexity, which was a significant focus of your article. :)
If we're aiming for coefficient performance in purely functional
programs, there are at least three paths to pursue.
First, it is quite feasible to develop cache-friendly, purely
functional structures – e.g. by using wider tree nodes (not two branches
per level, instead fifty or so – e.g. B-trees, or finger-trees with many
nodes, or ropes built on these). By going wider, we better fit paging
models and cache-lines. Of course, binary trees tend to be simpler to
implement, so there is a tradeoff on complexity.
Second, if we can recognize that there is no aliasing for a data
structure – or prevent it using the type system (i.e. linear or affine
types) – then we can update it in place without losing our ability to
reason about it in a purely functional manner. There are entire
languages and concepts built around this – e.g. Clean, Mercury, DDC
Haskell, ParaSail, the 'Wormholes' work for FRP. It's worth paying
attention to. I'm also developing a language that leverages these
features.
Third, the Zipper data structure – initially developed by Gérard
Huet in 1997, but still unknown and underutilized by most functional
programmers. It provides an excellent foundation for O(1)
navigation steps and local updates for any tree-like structure. That
includes kd-trees (which can model 2D grids and 3D volumes). It also
includes text cursors, navigation and user-focus in a user interface.
(Cleverly, ray-tracing can be understood as a form of navigation unto
collision.)
The zipper is another of the bridges between functional and
imperative. Like linearity, zippers do have a weakness with regards to
aliasing: you can't have multiple zippers in the same data structure.
The closest you can get is higher-order zippers, which model volumes
instead of focus/location as a single point.
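A minimal list-zipper sketch (plain Haskell, no libraries; the names are illustrative) of the O(1) navigation and focus update being described:

    -- A zipper over a list: elements to the left of the focus (stored
    -- reversed), the focus itself, and the elements to the right.
    data Zipper a = Zipper [a] a [a]

    left, right :: Zipper a -> Maybe (Zipper a)
    left  (Zipper (l:ls) x rs) = Just (Zipper ls l (x:rs))
    left  _                    = Nothing
    right (Zipper ls x (r:rs)) = Just (Zipper (x:ls) r rs)
    right _                    = Nothing

    -- Replacing the focused element is O(1) and purely functional.
    setFocus :: a -> Zipper a -> Zipper a
    setFocus x (Zipper ls _ rs) = Zipper ls x rs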
Related to the zipper and the tree, functional programming languages
have also developed a "finger-tree" concept, which is basically a tree
turned inside out so we can access and update the endpoints in O(1)
time. These make excellent purely functional deques. :)
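In Haskell, Data.Sequence (from the containers package) is the usual finger-tree deque; a quick sketch of those O(1) end operations (function names are illustrative):

    import Data.Sequence (Seq, ViewL(..), viewl, (<|), (|>))

    -- Push onto both ends in O(1) (amortized).
    pushBoth :: Int -> Int -> Seq Int -> Seq Int
    pushBoth front back s = front <| (s |> back)

    -- Pop from the front in O(1) (amortized).
    popFront :: Seq Int -> Maybe (Int, Seq Int)
    popFront s = case viewl s of
      EmptyL  -> Nothing
      x :< xs -> Just (x, xs)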
Anyhow, I think there is a lot of confusion about what is feasible in
purely functional programming. An expert imperative programmer doesn't
immediately know the idioms for functional performance, and might prefer
to remain an expert at imperative programming rather than become a
beginner at functional.
It's easy to learn "use more trees", but it can take years to learn a
full suite of idioms.
Anyhow, I'm not inclined to give up my "O(1) with as small constants
as possible". As I mentioned above, and before, I use zippers in that
role. Even better with linear in-place updates, and if we can keep it
cache friendly with wider structures. :)