As the first more detailed post in this series on performance, I want to look at collections and one of the most common slow performing pattern we see with customers time and time again.
Which collection should I use?
First, let's do a quick recap. There are several factors which can affect which data structure best fits - and picking the correct one will generally have an impact on the performance we can expect. Here are some good questions to think about:
- Does the collection need to maintain a sequential (but not necessarily sorted) order?
- Do you need any uniqueness or de-duplication in the collection?
- Are you always looking up by a single, specific field e.g. finding an order by ID, or are you going to be searching for data in many different ways?
- How often are you modifying the collection?
There are a number of different collections in .NET and F#; I'm going to broadly split them up into two kinds:
- Sequential collections, in which every item is placed sequentially, one after another. You can usually look up items very quickly by the position (or index) of the item in the collection, but if you want to look up based on some other criteria e.g. based on a field on a record, it'll take longer - and generally the time will scale as the collection grows in size.
- Lookups, in which every value is accessed by some kind of key. There is no way to get the "first" or "last" item in the collection, because there's no notion of that - think of it more like a bag which contains everything in a hidden order. These collections are very quick at looking up items by the key.
Let's look at some collections that are popular in F# today.
- Array - The built-in .NET (mutable) array collection.
- ResizeArray - The built-in .NET (mutable)
- Sequence - A type alias for the
- List - The F#-specific immutable linked-list collection. It offers immutable "copy-and-update" style syntax.
- Dictionary - An implementation of the
System.Collections.Generic.IDictionaryinterface, created by calling the
dictfunction on any sequence of tuples.
- Read Only Dictionary - As above, except it returns an
IReadOnlyDictionarycreated by calling the
- Map - The F#-specific immutable Map. Unlike the Dictionary,
Mapoffers the ability to perform copy-and-update style modifications in an immutable fashion.
There are many variants of the above collections, either in the .NET BCL or as downloadable NuGet packages.
There is one final collection, the Set, which is a kind of hybrid of the two - it doesn't allow lookups, nor does it have any ordered sequence. However, it behaves like a sequence, allowing you to filter and find items. It also automatically guarantees that every item in the collection is unique, albeit based on their "comparison" rather than hash code, and allows "set-based" operations - so you can "subtract" one Set from another to get the elements missing from one.
High performing collections, slow performing code
I see lots of people talk about the different between arrays, sequences and lists in terms of performance (which is good to know) - but just as many fall foul of issues like the following:
// Get all customerslet customers = loadCustomers ()// Get all orderslet orders = loadOrders ()// For each order, find the parent customer and do something with the pair of themfor order in orders do let customer = customers |> Array.find(fun c -> c.CustomerId = order.CustomerId) printfn "Working with customer %A and order %A..." customer.CustomerId order.OrderId
Can you see the potential performance problem in this (admittedly arbitrary) example? Hidden away inside the for loop is a call to
For those of you saying "use
tryFind" - let's assume for the purpose of this sample we're not fussed about a customer not existing.
find, behind the scenes, simply iterates over every item in the collection of Customers until it finds the first matching one. Imagine we had 10,000 orders and 10,000 customers. Assuming on average that the customer we're looking for is mid-way through the collection, that's 10,000 * 5,000 = 50,000,000 iterations and comparisons. Fifty Million!
Lookups to the rescue
This is where a lookup comes in handy:
// Get all customerslet customers = loadCustomers ()// Convert into an in-memory lookup keyed on CustomerIdlet customerLookup = customers |> Array.map(fun c -> c.CustomerId, c) |> readOnlyDict// Get all orderslet orders = loadOrders ()// For each order, find the parent customer and do stuff with the pair of themfor order in orders do let customer = customerLookup.[order.CustomerId] printfn "Doing that stuff with customer %A and order %A..." customer.CustomerId order.OrderId
The key parts are that we create an in-memory
IReadOnlyDictionary of all customers with
CustomerId as the key, by creating a sequence of (key, value) tuples. Then, we can access the lookup using either indexer notation or built-in
Map methods such as
Benchmarking performance differences
The scenario I wanted to compare was this:
Given an already-constructed in-memory array of "lookup data", how much quicker is it to construct a specific hash-style lookup for repeated lookups - and at what point does it not become worthwhile?
To see how much the difference was between repeatedly doing finds on an array, and creating a dictionary and then using that for the lookups, I used Benchmark .NET - the code is available here. Benchmark .NET allows us to parameterise our benchmarks, so I could experiment with different combinations of Customers and Orders. In my case, I tested having 100, 1k, 10k or 100k customers and 1, 10, 100, 1k or 10k orders.
Firstly, to simulate the above scenario, I benchmarked the Array type, using pre-created collections of Customers and Orders. In other words, the benchmark only tests the ability to rapidly find customers based on a
Next, I compared this with three F# lookup-type collections:
- An F#
IReadOnlyDictionarycreated by using F#'s
- An F#
All of these were benchmarked not only for their lookup performance but also the cost of creating the lookup using the pre-created array. This was to reflect the read-world extra cost that would be required in such a scenario (as per the second code example shown earlier).
Most performant Lookup
I first wanted to identify out of the three lookup collections above, which was the most performant for the scenario identified above. The results were clear:
We can infer from this chart several interesting pieces of information:
- The read-only dictionary is much faster than either Map or Set - at times, over 10x quicker.
- For this scenario, construction of the lookup from the source collection costs much more than the time to perform the lookups themselves.
- The number of lookups appears to have minimal effect on overall time taken - even for larger numbers of customers.
- For this scenario, Set and Map behave very similarly.
Dictionary vs Array
Now that we know this, let's see how the performance behaviour differs for these two scenarios:
- Given existing arrays of customers and orders, how quickly can an Array look up all the customers linked to those orders?
- Given existing arrays of customers and orders, how quickly can we construct a lookup of all customers, and then use that to look up all the customers linked to those orders?
Note: When you look at this graph, observe that it is using a logarithmic scale. I was forced to do this, because in the worst case, the Array was so much slower than the Dictionary that it made the rest of the graph completely useless.
What can we infer from this chart?
We see an interesting pattern repeated for each "band" of customer size:(Video) What’s New in F# 5.0 & Beyond • Don Syme • YOW! 2021
- The first two sizes of Orders (i.e. number of lookups) are quicker with the Array.
- For lookups greater than 10, the Dictionary becomes quicker.
In other words, as long as you have relatively few lookups to perform - somewhere less than 100 - an array is quicker, because the cost of constructing the dictionary outweighs the benefits of the faster lookups.
- The performance of the array lookup scales proportionally to the size of the array. Given what we know about the way that
findworks, this is unsurprising but it's worth remembering: a collection 10x as large as another one will take 10x as long to find an element.
- As the number of customers grows the performance of the array becomes very slow in comparison to the Dictionary. In the worst case (10k orders and 100k customers), it takes the array nearly 200 times longer than the dictionary to perform the same number of lookups - 16.53ms vs 3.2 seconds.
Choosing the right collection data structure is important not only for correctness and ergonomics, but also has performance implications. We've seen here how, if you're performing more than just a few lookups, a Dictionary will outperform an Array - so the next time you're performing a
find within a for loop, think again!
As with all artificial benchmarks, take this article with a pinch of salt. The benchmark uses simplistic data (strings) to test out several data structures. However, many factors can impact the performance of these data structures - integers behave differently to strings which behave differently to Records etc. The best benchmarks you should do are ones that work for you and your specific scenarios - there is no one size fits all.
Hope you found this useful - until next time, have (fun _ -> ()).
Task Runtime Performance
Asynchronous code (Tasks) in C# runs faster than in F# because the compiler supports them natively and generates optimized code. The difference may be reduced, once F# supports tasks natively, too.
F# is an open-source, cross-platform programming language that makes it easy to write succinct, performant, robust, and practical code.What is the difference between sequence and list F#? ›
Sequences, commonly called sequence expressions, are similar to lists: both data structures are used to represent an ordered collection of values. However, unlike lists, elements in a sequence are computed as they are needed (or "lazily"), rather than computed all at once.Can F# use .NET classes? ›
For more complex requirements, F# natively supports . NET classes, interfaces, and structures, so the interop is still very straightforward. For example, you can write an ISomething interface in C# and have the implementation be done in F#.Why is F# not popular? ›
This attitude towards the language by its very progenitor is one reason why F# has not yet become highly recognised for its suitability, beyond just data science, for both front end and back end development, while the barrier to its adoption continues to be cyclical: there are few jobs advertising for F# developers ...Why is F# so slow? ›
From my experience generally speaking a program in idiomatic F# allocates more garbage and is harder to optimize by the JIT compiler than a similar program in idiomatic C#. The runtime and standard library are in practise mostly designed and optimized for C#, so fewer engineering hours are spent on making F# fast.Is F# better than Haskell? ›
F# is much easier to learn and apply to programming if the user moves from imperative languages. Haskell encourages a less functional style; users work in a localized state. Developers choose F# because of the Pattern matching style.Is F# still used? ›
Plug-ins supporting F# exist for many widely used editors including Visual Studio Code, Vim, and Emacs. F# is a member of the ML language family and originated as a . NET Framework implementation of a core of the programming language OCaml. It has also been influenced by C#, Python, Haskell, Scala and Erlang.Is F# good for machine learning? ›
F# excels at data science and machine learning. This article gives links to some significant resources related to this mode of use of F#. For information about other options that are available for machine learning and data science, see the F# Software Foundation's Guide to Data Science with F#.Is list better than array? ›
An array is faster than a list in python since all the elements stored in an array are homogeneous i.e., they have the same data type whereas a list contains heterogeneous elements. Moreover, Python arrays are implemented in C which makes it a lot faster than lists that are built-in in Python itself.
truncate creates a sequence from another sequence, but limits the sequence to a specified number of elements. Seq. take creates a new sequence that contains only a specified number of elements from the start of a sequence.Is F# good for data science? ›
F# is an excellent solution for programmatic data science as it combines efficient execution, REPL-scripting, powerful libraries and scalable data integration.Is F# good for web development? ›
F# excels at building efficient, scalable and robust web solutions. Web programming is based around receiving a single HTTP request and replying with a result, which maps very well to a stateless, functional approach.What is F# equivalent to? ›
Haskell, OCaml, Scala, Python, and Clojure are the most popular alternatives and competitors to F#.Why is F# used in finance? ›
F# is a functional programming language that allows you to write simple code for complex problems. Currently, it is most commonly used in the financial sector. Quantitative finance makes heavy use of mathematics to model various parts of finance in the real world.Is GB or F# more common? ›
F# is much more common than Gb, so we'll approach most of the chords below from the F# perspective. Each of these notes (degrees of the scale) can be assigned a number as it ascends so you can use a helpful formula to work out chords from it.Are the unions of F# discriminated? ›
In this case the new type is the “sum” of the integer type plus the boolean type. In F#, a sum type is called a “discriminated union” type. Each component type (called a union case) must be tagged with a label (called a case identifier or tag) so that they can be told apart (“discriminated”).Does F# have lazy evaluation? ›
Lazy computation is a feature of F#. Lazy computation does not evaluate immediately. It is executed when result is needed. It can help to improve performance of your code.Is F# A good functional language? ›
F# makes it easy to write concise code to solve complex problems on all the major desktop and mobile platforms, primarily using functional programming. F# is a strongly typed, functional-first programming language that lets you solve complex problems by writing simple code.Is F# faster than C++? ›
C++ is always claimed to be faster than any other language - especially when compared to managed languages such as C#, F#, or Java. When compiling a simple int32 accumulator in C++ (native!) and F# (managed!)
Haskell does not support building Haskell shared libraries on all platforms, and it does not have a backward compatibility story for upgrading Haskell libraries without breaking code dependent on those shared libraries. Curiously, this is also the reason C++ is not used for systems APIs.Why F# is the best enterprise language? ›
F# makes it very, very easy – and therefore cheap – to create and use types, meaning that you generally can afford to be a little more specific in how you model your systems. Doing this means that your code can be used to encode more business rules and is clearer and easier to read than it might otherwise be.Why is C# better than F#? ›
The C# code has lots of “noise”, things like curly braces, semicolons, etc. And in C# the functions cannot stand alone, but need to be added to some class (“SumOfSquaresHelper”). F# uses whitespace instead of parentheses, needs no line terminator, and the functions can stand alone.Is F# similar to Python? ›
F# is a functional programming language for . NET that is succinct (concise, readable, and type-safe) and kind of Pythonic. F# is in many ways very similar to Python, but F# can also do a lot of things better than Python: Strongly typed, if it compiles it usually works making refactoring much safer.Is F# A major key? ›
F-sharp major (or the key of F♯) is a major scale based on F♯, consisting of the pitches F♯, G♯, A♯, B, C♯, D♯, and E♯.Why use F# over Python? ›
"Pattern-matching" is the top reason why over 40 developers like F#, while over 1022 developers mention "Great libraries" as the leading cause for choosing Python.Is C sharp harder than Python? ›
While Python is easier to learn and write than C# and has vast standard libraries. Both C# and Python are excellent programming languages. Thus, picking one over the other is more a matter of preference than the risk of choosing the wrong language for the project.Is F# static or dynamic? ›
F# is a statically typed language, which means that the compiler deduces an exact type for each construct during compilation.What is faster than an ArrayList? ›
An Array is a collection of similar items. Whereas ArrayList can hold item of different types. An array is faster and that is because ArrayList uses a fixed amount of array. However when you add an element to the ArrayList and it overflows.Why use Ienumerable instead of array? ›
IEnumerables also help ensure immutability, as you are always querying the source there are no unintended side effects. Lists and Arrays create objects in memory and allow access to a whole lot of methods associated with those types ( Lists | Arrays ).
Stack is a LIFO (Last-In, First-Out) list, a list-like structure in which elements may be inserted or removed from only one end (last-in, first-out). Stacks are less flexible than lists, but are easier to implement, and more efficient (for those operations they can do).Why TRUNCATE is more faster than DELETE? ›
TRUNCATE is faster than DELETE , as it doesn't scan every record before removing it. TRUNCATE TABLE locks the whole table to remove data from a table; thus, this command also uses less transaction space than DELETE . Unlike DELETE , TRUNCATE does not return the number of rows deleted from the table.Is it better to TRUNCATE or DROP TABLE? ›
The TRUNCATE command works faster than the DROP command and DELETE command because it deletes all the records from the table without any condition. The Integrity Constraints remain the same in the DELETE command. The Integrity Constraints get removed for the DROP command.Is TRUNCATE faster than DELETE? ›
TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log. DELETE command is slower than TRUNCATE command. While the TRUNCATE command is faster than the DELETE command.Are lists faster than maps? ›
Map function is faster than list comprehension when the formula is already defined as a function earlier. So, that map function is used without lambda expression.Which is better map or list? ›
Use a map when you want your data structure to represent a mapping for keys to values. Use a list when you want your data to be stored in an arbitrary, ordered format.How to concatenate two lists in F#? ›
You can concatenate lists that have compatible types by using the @ operator, as in the following code. If list1 is [2; 3; 4] and list2 is [100; 2; 3; 4] , this code creates list3 as [2; 3; 4; 100; 2; 3; 4] .Is 8 cores enough for data science? ›
4 cores- 8 threads is the minimum requirement that is recommended. If you don't have a tight budget then go for 6 cores or 8 cores or higher. It's the best.How much RAM is enough for data science? ›
Enough RAM: I would argue that most important feature of a laptop for a data scientist is RAM. You absolutely want at least 16GB of RAM. And honestly, your life will be a lot easier if you can get 32GB.Which programming language is the #1 choice for data scientists? ›
Python. Python is the most widely used data science programming language in the world today. It is an open-source, easy-to-use language that has been around since the year 1991.
F# also supports asynchronous programming, message queuing system and support for event handling. Data in F# is immutable by default so sharing of data is safe. It avoids lock during code communication.Why is F# so popular? ›
F# has a number of built-in libraries to help when more than one thing at a time is happening. Asynchronous programming is very easy, as is parallelism. F# also has a built-in actor model, and excellent support for event handling and functional reactive programming.What is F# best for? ›
F# is a universal programming language for writing succinct, robust and performant code. F# allows you to write uncluttered, self-documenting code, where your focus remains on your problem domain, rather than the details of programming.Is C# the fastest? ›
Performance: C++ is widely used when higher level languages are not efficient. C++ code is much faster than C# code, which makes it a better solution for applications where performance is important.What is the advantage of F#? ›
F# has a very powerful type system which prevents many common errors such as null reference exceptions. And in addition, you can often encode business logic using the type system itself, so that it is actually impossible to write incorrect code, because it is caught at compile time as a type error.What is the fastest programming language ever? ›
C++ is the fastest language according to a number of measures, including compilation time and execution speed. In this section, we will look at some ways in which C++ beats out other programming languages in terms of performance.What is the fastest coding language? ›
Conclusion. Generally, C is preferred for tasks that require to be executed quickly, and hence the programmer has to deal with minimum runtime. The cost paid while using C is the absence of functionalities provided by other languages. Hence C is the fastest language.Why is F# better than C#? ›
The C# code has lots of “noise”, things like curly braces, semicolons, etc. And in C# the functions cannot stand alone, but need to be added to some class (“SumOfSquaresHelper”). F# uses whitespace instead of parentheses, needs no line terminator, and the functions can stand alone.
In practice, C# programs actually run faster than Python ones, and they use up less memory to do it.Why is Python slower than C#? ›
Unlike other popular programming languages including C# or JAVA, Python is dynamically typed and an interpreted language. It is slow primarily due to its dynamic nature and versatility.