How to implement String operations in Elixir?
Elixir is a dynamic, functional programming language designed for building scalable and maintainable applications. Some of Elixir’s features, like first-class functions, pattern matching, the pipe operator (|>), and the cons operator (|), as well as techniques like recursion, allow us to write programs that are as readable as, if not more readable than, programs written in non-functional languages.
One of the most basic types of just about any language is the String type. However, Elixir does not actually have a dedicated String type.
Instead, strings are represented as binaries or as character lists (charlists).
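A quick way to see the two representations side by side is to ask the runtime. This is a minimal sketch using only Kernel guards; it writes the charlist with the ~c sigil, which builds the same list as the single-quoted form.

```elixir
# A double-quoted string is a binary
string = "abc"

# ~c"..." (or 'abc' in older code) builds a charlist: a list of integers
charlist = ~c"abc"

IO.inspect(is_binary(string))    # true
IO.inspect(is_list(charlist))    # true
IO.inspect(is_binary(charlist))  # false
IO.inspect(string == charlist)   # false: different types, even for the same text
```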
In this article, we look at how to work with strings in Elixir.
Before talking about strings in Elixir, we need to know how strings work in Erlang, since Elixir runs on the Erlang VM. Erlang has two common ways to represent text: bit strings and binaries (contiguous sequences of bits or bytes), and strings, which in Erlang are simply lists of character codes. Elixir gives us access to both, but most Elixir code uses binaries. Many developers are not aware of how these representations are implemented, and many tutorials assume the reader already knows the underlying data structures. If your application depends heavily on the performance of string operations, however, you need to understand what those operations cost. The two representations are explained below.
Bit Strings and Binaries (Erlang) / String (Elixir)
A binary is a sequence of bytes (8 bits each) stored contiguously in memory. More generally, a bit string is any sequence of bits, and a binary is a bit string whose length is a multiple of 8. Consider the following statement.
a = "abc" (Elixir)
Elixir stores the value "abc" as a binary. In Erlang syntax, the same value is written as
<<"abc">> (Erlang)
which is equivalent to
<<97, 98, 99>> (Erlang)
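The equivalences above can be checked directly in Elixir, where the <<>> syntax is also available. A short sketch; the option binaries: :as_binaries only changes how IO.inspect/2 prints the value. The last lines show the same syntax used as a pattern to take a binary apart.

```elixir
a = "abc"

# The same binary written as a literal segment and as raw byte values
IO.inspect(a == <<"abc">>)        # true
IO.inspect(a == <<97, 98, 99>>)   # true

# Size in bytes and in bits
IO.inspect(byte_size(a))          # 3
IO.inspect(bit_size(a))           # 24

# Print the raw bytes instead of the printable string
IO.inspect(a, binaries: :as_binaries)   # <<97, 98, 99>>

# The <<>> syntax also works as a pattern: match the first byte, bind the rest
<<first, rest::binary>> = a
IO.inspect(first)                 # 97
IO.inspect(rest)                  # "bc"
```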
In memory, the bytes of the binary are laid out next to each other, starting at the address that a refers to:
01100001 01100010 01100011
("a")    ("b")    ("c")
Essentially all data read from or written to files, sockets, and other I/O devices uses this binary form, as it does in other languages.
Note: Elixir strings are UTF-8 encoded, so a single character takes between 1 and 4 bytes. For example, "é" takes 2 bytes. Elixir's String module works in terms of graphemes (user-perceived characters), so String.length/1 counts graphemes while byte_size/1 counts bytes.
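The difference between bytes, code points, and graphemes is easy to observe. This sketch writes "é" with the escape \u00E9, and also as a plain "e" followed by the combining accent U+0301; both display the same, but they are stored differently.

```elixir
# "é" as a single code point (U+00E9)
precomposed = "\u00E9"
# "é" as "e" followed by a combining acute accent (U+0301)
combining = "e\u0301"

IO.inspect(byte_size(precomposed))                   # 2 (UTF-8 uses two bytes for U+00E9)
IO.inspect(precomposed, binaries: :as_binaries)      # <<195, 169>>
IO.inspect(String.length(precomposed))               # 1 grapheme

IO.inspect(byte_size(combining))                     # 3 (1 byte for "e" + 2 for U+0301)
IO.inspect(length(String.codepoints(combining)))     # 2 code points
IO.inspect(String.length(combining))                 # 1 grapheme
```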
Advantages
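Binaries can be written to and read from files, sockets, and other I/O devices without conversion, and they are easy to embed in HTML and other text output. They are also compact: an ASCII character takes a single byte.
Binary pattern matching and substring search operate directly on the contiguous bytes, so they are efficient in this data type.
Drawbacks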
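Binaries are immutable. Every append or other modification produces a new binary, which generally means copying existing bytes to a new location in memory. The runtime optimizes some common cases, such as repeatedly appending to the end of a binary, but many small modifications can still be slow.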
Example: if
a = "abc" and b = "def"
then
c = a <> b
copies the bytes of a and b into a new binary and binds c to the result. Neither a nor b is changed, because binaries are immutable. This makes binaries a poor fit when a string has to be modified one character at a time, over and over. For that kind of work there is an alternative data type: the Erlang string, described next.
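The concatenation example, written as real Elixir. Elixir variable names must start with a lowercase letter, and strings are joined with <>, not +. This small sketch shows that the inputs are left untouched.

```elixir
a = "abc"
b = "def"

# <> builds a new binary from the bytes of a and b
c = a <> b

IO.inspect(c)   # "abcdef"

# Binaries are immutable: the original values are unchanged
IO.inspect(a)   # "abc"
IO.inspect(b)   # "def"

# "Modifying" a string always returns a new binary
upper = String.upcase(a)
IO.inspect(upper)  # "ABC"
IO.inspect(a)      # still "abc"
```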
String (Erlang) / Charlist, an integer list (Elixir)
Charlists get far less attention than binaries in Elixir, but you can still create one by enclosing text in single quotation marks ('abc') or with the ~c sigil (~c"abc"). The Erlang VM stores this kind of string as a list of integers, one per character, where each integer is the character's Unicode code point (the same as its ASCII code for ASCII characters). Lists are a built-in data type in Erlang, implemented as singly linked lists, so all the functions for working with lists, such as those in Elixir's List and Enum modules, also work on these strings.
A = "abc". (Erlang)
a = 'abc' (Elixir)
which is the same as
A = [97, 98, 99]. (Erlang)
a = [97, 98, 99] (Elixir)
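The same checks for charlists. IO.inspect/1 prints a charlist as ~c"..." or '...' depending on the Elixir version, so this sketch compares with == instead of relying on printed output. The last line shows that the integers are Unicode code points rather than bytes.

```elixir
a = ~c"abc"

IO.inspect(a == [97, 98, 99])    # true
IO.inspect(is_list(a))           # true
IO.inspect(length(a))            # 3

# ?a gives the integer code of a character
IO.inspect(?a)                   # 97

# A non-ASCII character becomes one code point, not several bytes
IO.inspect(String.to_charlist("\u00E9") == [233])  # true
```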
In memory, the characters of a list are not necessarily stored next to each other. Each element lives in its own cell (a "cons cell") that holds the value and a pointer to the next cell, so the cells can be scattered across memory. The variable a refers to the first cell, which holds 97 and is called the head of the list.
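a = [97 (head), 98, 99]
In memory:
a --> [ 97 | next ] --> [ 98 | next ] --> [ 99 | [] ]
       ("a")             ("b")             ("c")
Each cell stores one character code and a pointer to the next cell; the last cell points to the empty list [], which marks the end of the list. The cells may sit anywhere in memory, between data belonging to other variables.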
Note: the size of each pointer depends on the CPU architecture; on a 64-bit system it is 8 bytes.
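The cell structure in the diagram can be written out directly with the cons operator |: [head | tail] builds or matches a single cell, and [] ends the list. A short sketch:

```elixir
a = ~c"abc"

# The list written explicitly as nested cons cells, ending in []
IO.inspect(a == [97 | [98 | [99 | []]]])   # true

# Pattern matching splits off the first cell
[head | tail] = a
IO.inspect(head)               # 97
IO.inspect(tail == ~c"bc")     # true

# Prepending creates one new cell whose tail is the existing list (shared, not copied)
b = [?x | a]
IO.inspect(b == ~c"xabc")      # true
```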
Advantages
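Walking through the string with recursion and pattern matching on the head of the list ([head | tail]) is natural and efficient in this data type.
Adding a character to the front of a string is fast (constant time), because the new cell simply points at the existing list, which is shared rather than copied.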
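Recursion over a charlist follows the [head | tail] structure directly. The module and function names below (CharlistDemo.count/2) are hypothetical, chosen for illustration; the sketch counts how often a character occurs by walking the list one cell at a time.

```elixir
defmodule CharlistDemo do
  # Hypothetical helper: count occurrences of `char` in a charlist.
  # Empty list: nothing left to count.
  def count([], _char), do: 0

  # Head equals char (the repeated variable forces equality): count it and recurse.
  def count([char | rest], char), do: 1 + count(rest, char)

  # Head is some other character: skip it and recurse.
  def count([_other | rest], char), do: count(rest, char)
end

IO.inspect(CharlistDemo.count(~c"banana", ?a))  # 3
IO.inspect(CharlistDemo.count(~c"banana", ?z))  # 0
```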
Example:
If
a = 'abc' and b = 'def'
then
c = a ++ b
means:
The cells of a are copied, and the last copied cell points to the head of b, which is reused without copying (this is how a and b are concatenated).
c = start address of the new list (store the new result in c)
( or )
a = a ++ b (rebind a to the new result)
Note: the cost of a ++ b grows with the length of a, because every cell of a has to be copied, while b is shared. If you have to append a large number of strings, build the result in reverse order by adding to the head of the list, and reverse it once at the end. Adding an element at the head of a list is always cheaper than walking the entire list to add an element at its end.
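Both points can be sketched in Elixir. a ++ b concatenates two charlists; the helper letters_only below is hypothetical and shows the prepend-then-reverse pattern: each kept character is added to the head of an accumulator in constant time, and the list is reversed once at the end.

```elixir
a = ~c"abc"
b = ~c"def"

# ++ copies the cells of a; the cells of b become the shared tail of the result
c = a ++ b
IO.inspect(c == ~c"abcdef")   # true

# Hypothetical helper: keep only lowercase letters.
# Prepending [ch | acc] is constant time; the result is built backwards...
letters_only = fn input ->
  input
  |> Enum.reduce([], fn ch, acc ->
    if ch in ?a..?z, do: [ch | acc], else: acc
  end)
  # ...so it is reversed once, at the end
  |> Enum.reverse()
end

IO.inspect(letters_only.(~c"a1b2c3") == ~c"abc")   # true
```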
Drawbacks
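Most Elixir libraries, including the String module, expect binaries, so charlists usually have to be converted before they can be used with them. Charlists also use much more memory than binaries: on the Erlang VM, each character occupies a list cell of two machine words (16 bytes on a 64-bit system), compared with one byte per ASCII character in a binary. The conversion itself is not a big problem, because List.to_string/1 turns a charlist into a binary and String.to_charlist/1 does the reverse.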
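Converting between the two representations is a single call in each direction. A minimal sketch using List.to_string/1 and String.to_charlist/1; the Kernel functions to_string/1 and to_charlist/1 would work as well.

```elixir
charlist = ~c"abc"

# Charlist -> binary
binary = List.to_string(charlist)
IO.inspect(binary)              # "abc"
IO.inspect(is_binary(binary))   # true

# Binary -> charlist of code points
back = String.to_charlist(binary)
IO.inspect(back == charlist)    # true

# Non-ASCII text round-trips too: code points in the list, UTF-8 bytes in the binary
cafe = String.to_charlist("caf\u00E9")
IO.inspect(cafe == [99, 97, 102, 233])           # true
IO.inspect(List.to_string(cafe) == "caf\u00E9")  # true
```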
Conclusion
The rest is up to Elixir developers: which data type to use, and which operations to perform on it, in each situation. With both representations understood, we can choose the one whose performance fits the amount and kind of string data we are working with.