Compression, Encryption and Hashing:
Compression:
Compression is the process used to reduce the storage space that is required by a file, meaning that
more files can be stored in the same amount of storage space. Compression is particularly important
for sharing files over networks or the internet. The larger the file, the longer it takes to transfer, so
compressing files increases the number of files that can be transferred in a given time.
Apps such as Google Photos compress files so that they can be quickly searched for and downloaded.
Downloading a compressed file over the internet is faster than downloading a full version of the file.
There are 2 categories of compression:
Lossy compression
Lossless compression
Lossy Compression:
Lossy compression reduces the size of a file by removing some of the information. This can
result in a more pixelated image or less clear audio recording. When using lossy compression, you
can’t retrieve the original version of the file.
Lossless Compression:
Lossless compression reduces the size of the file without losing any information. When using lossless
compression, the original file can be recovered from the compressed version.
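The difference between the two categories can be sketched in a few lines of Python. This is only an illustration: zlib is chosen here as one readily available lossless compressor, and the "lossy" step is a deliberately crude, made-up example (keeping every second sample of a signal) rather than how real image or audio formats work.

```python
import zlib

original = b"AAAAAABBBBBCCC" * 100

# Lossless: decompressing gives back exactly the original bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(restored == original)            # True
print(len(original), len(compressed))  # the compressed size is much smaller

# Lossy (toy example): keep only every second sample of a signal.
samples = [10, 12, 11, 13, 40, 42, 41, 43]
reduced = samples[::2]  # half the size, but the dropped values cannot be recovered
print(reduced)          # [10, 11, 40, 41]
```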
Run Length Encoding:
Run Length Encoding is a method of lossless compression in which repeated values are removed and
replaced with one occurrence of the data followed by the number of times that it should be
repeated.
For example, the string AAAAAABBBBBCCC would be represented as A6B5C3.
In order to work well, Run Length Encoding relies on consecutive pieces of data being the same. If
there is little repetition, Run Length Encoding doesn’t offer a great reduction in the file size.
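The scheme described above can be sketched as a pair of Python functions. This is a minimal sketch, assuming the data contains no digits (otherwise the counts could not be told apart from the data); the function names are chosen for illustration only.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with the character and its count."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from character/count pairs."""
    result = []
    i = 0
    while i < len(encoded):
        char = encoded[i]
        i += 1
        digits = ""
        # Collect every digit that follows the character (counts may exceed 9).
        while i < len(encoded) and encoded[i].isdigit():
            digits += encoded[i]
            i += 1
        result.append(char * int(digits))
    return "".join(result)


print(rle_encode("AAAAAABBBBBCCC"))  # A6B5C3
print(rle_decode("A6B5C3"))          # AAAAAABBBBBCCC
print(rle_encode("ABCD"))            # A1B1C1D1
```

The last call shows the weakness mentioned above: with no repetition, the "compressed" output is twice as long as the input.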
Dictionary Encoding:
Dictionary Encoding is an example of lossless compression. Frequently occurring pieces of data are
replaced with an index and compressed data is stored alongside a dictionary which matches the
frequently occurring data to an index. The original data can then be restored using the dictionary.
For example:
We shall go on to the end.
We shall fight in France.
We shall fight on the seas and oceans.
We shall fight with growing confidence and growing strength in the air.
We shall defend our island, whatever the cost may be.
Frequently occurring phrases, such as “We shall”, “fight”, “the”, “on” and “in”, are placed in a
dictionary. By replacing every occurrence of a phrase with its index, the size of the passage is
substantially reduced.
Index   Phrase
1       We shall
2       fight
3       the
4       on
5       in
6       and

The passage is then stored as:
1 go 4 to 3 end.
1 2 5 France.
1 2 4 3 seas 6 oceans.
1 2 with growing confidence 6 growing strength 5 3 air.
1 defend our island, whatever 3 cost may be.
Data compressed using dictionary compression must be transferred alongside its dictionary. Without
the dictionary, the indexes can’t be turned back into the original phrases, so the data can’t be used.
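A rough Python sketch of this idea is shown below, using the six-entry dictionary from the example. It assumes the original text contains no standalone numbers, so that every number in the compressed text can be read as an index; the names encode and decode are illustrative.

```python
import re

# Dictionary from the example above: index -> phrase.
DICTIONARY = {1: "We shall", 2: "fight", 3: "the", 4: "on", 5: "in", 6: "and"}


def encode(text: str, dictionary: dict) -> str:
    """Replace each whole-word occurrence of a dictionary phrase with its index."""
    # Longer phrases are replaced first so "We shall" is matched as one unit.
    for index, phrase in sorted(dictionary.items(), key=lambda item: -len(item[1])):
        text = re.sub(rf"\b{re.escape(phrase)}\b", str(index), text)
    return text


def decode(encoded: str, dictionary: dict) -> str:
    """Swap every standalone number back for the phrase it indexes."""
    return re.sub(r"\b\d+\b", lambda match: dictionary[int(match.group())], encoded)


line = "We shall fight on the seas and oceans."
compressed = encode(line, DICTIONARY)
print(compressed)                      # 1 2 4 3 seas 6 oceans.
print(decode(compressed, DICTIONARY))  # We shall fight on the seas and oceans.
```

Note that decode needs the same dictionary that encode used, which is why the dictionary has to travel with the data.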
Encryption:
Encryption is used to keep data secure when it’s being transmitted. There are a variety of different
methods which can be used to scramble data before it’s transmitted and then decipher it once it
arrives at its destination. There are 2 main types of encryption:
Symmetric encryption
Asymmetric encryption
Symmetric Encryption:
With symmetric encryption, both the sender and receiver share the same private key, which they
distribute to each other in a process called a key exchange. This key is used for both encrypting and
decrypting data.
It’s important that the private key is kept secret. If the key is intercepted during key exchange, then
any communications sent can be intercepted and decrypted using the key.
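To make the "one key for both directions" idea concrete, here is a toy sketch in Python. It uses a repeating-key XOR cipher, which is an assumption made purely for illustration: it is easy to read but not secure, and real systems use vetted algorithms instead. The key value and function name are hypothetical.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; the same call encrypts and decrypts."""
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


shared_key = b"secret"  # hypothetical key both parties agreed on during key exchange

ciphertext = xor_cipher(b"Meet at noon", shared_key)   # sender encrypts
plaintext = xor_cipher(ciphertext, shared_key)         # receiver decrypts with the same key
print(plaintext)  # b'Meet at noon'
```

Anyone who intercepts shared_key can run the same function on the ciphertext, which is why the key exchange is the weak point.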
Asymmetric Encryption:
When sending information using asymmetric encryption, two keys are used: a public key and a
private key. The public key can be published anywhere, free for the world to see, while the
private key must be kept secret. Together, these keys are known as a key pair and are
mathematically related to each other.
In contrast to symmetric encryption, a single key can’t be used to both encrypt and decrypt
communication. Instead, messages encrypted with the recipient’s public key can only be decrypted
with the recipient’s private key, which should only be in the possession of the recipient.
If someone wants to send a message, they will first need to find the recipient’s public key. Public
keys are commonly published online, for example on key servers, so that anyone can look them up.
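As a rough illustration of how a key pair can be mathematically related, the sketch below builds a toy RSA key pair. RSA is not named in these notes and is used here only as one well-known asymmetric scheme; the tiny primes and the helper name apply_key are assumptions chosen for readability, and keys this small offer no security.

```python
# Toy RSA key pair built from two small primes (real keys use very large primes).
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, shares no factors with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)        # can be published
private_key = (d, n)       # must be kept secret


def apply_key(value: int, key: tuple) -> int:
    """Raise value to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                                    # a message encoded as a number below n
ciphertext = apply_key(message, public_key)     # anyone can encrypt with the public key
recovered = apply_key(ciphertext, private_key)  # only the private key reverses it
print(recovered)  # 65
```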
Hashing:
Hashing is the name given to a process in which an input (called the key) is turned into a fixed-size
value known as a hash. This is done by an algorithm called a hash function, of which there are many.
Unlike encryption, the output of a hash function can’t be reversed to recover the key. This quality
makes hashing useful for storing passwords. A password entered by a user can be hashed and
checked against the stored hash to see if it is correct, but a successful hacker would only gain access
to the hashes, which can’t be reversed to obtain the passwords.
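The password check described above might look something like this in Python. It is a simplified sketch: SHA-256 from the standard hashlib module is used for brevity, whereas real systems normally add a random salt and use a deliberately slow password-hashing function. The function names and the example password are made up.

```python
import hashlib


def hash_password(password: str) -> str:
    """Return a fixed-size hexadecimal hash of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_password(attempt: str, stored_hash: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return hash_password(attempt) == stored_hash


# At sign-up only the hash is saved, never the password itself.
stored_hash = hash_password("correct horse")

print(len(stored_hash))                              # 64 hex characters, whatever the input length
print(check_password("correct horse", stored_hash))  # True
print(check_password("wrong guess", stored_hash))    # False
```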
Another use of hashing is hash tables. A hash table is a data structure which holds key-value pairs.
Formed from a bucket array and a hash function, hash tables can be used to look up data in constant
time on average. When a key-value pair needs to be inserted, the key is passed through the hash
function and the pair is stored in the bucket corresponding to the hash.
Hash tables are used extensively in situations where a lot of data needs to be stored with constant
access times, for example, in caches and databases.
If two pieces of data produce the same hash, a collision has occurred. There are a variety of methods
to overcome collisions, including storing items together in a list under the hash value or using a
second hash function to generate a new hash. A good hash function should have a low chance of
collisions and should be quick to calculate. It should also provide an output smaller than the input.
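The first collision strategy mentioned above (storing colliding items together in a list) can be sketched as a small Python class. This is an illustrative sketch, not a production design: the class name, the default of 8 buckets and the use of Python's built-in hash() as the hash function are all assumptions.

```python
class HashTable:
    """A bucket array where each bucket is a list of (key, value) pairs."""

    def __init__(self, bucket_count: int = 8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key) -> int:
        # The hash function: map the key to one of the buckets.
        return hash(key) % len(self.buckets)

    def insert(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite its value
                return
        bucket.append((key, value))       # new key (or a collision): add to the list

    def lookup(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.insert(1, "first")
table.insert(9, "second")  # 1 and 9 both land in bucket 1, so this is a collision
print(table.lookup(1))     # first
print(table.lookup(9))     # second
```

Lookups stay fast as long as the lists in each bucket remain short, which is why a low collision chance matters.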