WEB-ORIENTED CACHING SYSTEMS

Authors:
City:
Moscow
University:
Date:
17 February 2019

The internet and web services are developing rapidly. Websites handle ever more functionality, and the popularity of web resources keeps growing. Many websites are now considered part of highly loaded systems.

Highly loaded systems must be accessible around the clock: they provide access without long delays while remaining fully operational, and even under high load the speed of data retrieval should not degrade. Caching subsystems exist to solve this problem.

Cache

A cache is quick-access memory intended to accelerate access to data stored on a hard drive. Processors, hard drives, browsers, and web services all use caches.

A cache consists of a set of entries. Each entry is associated with a specific piece or block of data (a modest amount of data) that is a copy of data on the hard drive. Each entry has an identifier, also known as a "tag", that links the element of data in the cache to its copy on the hard drive.

When a cache client sends a request, the cache is searched first. If an entry with the matching identifier is found there, the client uses the data from the cache; this is a "cache hit". If the requested entry is not found, it is read from the hard drive and becomes available in the cache; this is a "cache miss".
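The hit/miss flow above can be sketched in a few lines of Python; the `read_from_disk()` function here is a hypothetical stand-in for the slow backing store:

```python
def read_from_disk(tag):
    """Stand-in for a slow read from the hard drive."""
    return f"data-for-{tag}"

cache = {}  # tag -> cached copy of the data

def lookup(tag):
    if tag in cache:                 # cache hit: serve the copy in memory
        return cache[tag]
    value = read_from_disk(tag)      # cache miss: go to the backing store
    cache[tag] = value               # the entry becomes available in the cache
    return value
```

The first `lookup("a")` is a miss and reads from disk; every later `lookup("a")` is a hit served from the cache.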

Caching algorithms

A cache has a limited amount of memory for storing entries, so cached data must be refreshed: old entries are evicted and new ones are added. Several popular eviction strategies exist:

· LRU (Least Recently Used) evicts the entries that have gone unused the longest. This requires recording the time of each entry's last access. Once the cache holds N entries, LRU evicts the least recently used one to make room.

· MRU (Most Recently Used) is the opposite of the previous algorithm: it evicts the most recently used entry first.

· FIFO (First In, First Out) appends data fetched on a cache miss to the rear of a queue. When the cache overflows, the entry at the front of the queue is deleted.

· LFU (Least Frequently Used) keeps a request counter for each cached entry. New entries start with a counter of 1, and every cache hit increments the counter. The algorithm evicts the entries with the lowest counts.
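The LRU strategy from the list above can be sketched with Python's `collections.OrderedDict`, which remembers access order; the capacity N is a constructor argument here:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # tag -> data, least recently used first

    def get(self, tag):
        if tag not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(tag)         # mark as most recently used
        return self.entries[tag]

    def put(self, tag, data):
        if tag in self.entries:
            self.entries.move_to_end(tag)
        self.entries[tag] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

With capacity 2, putting "a" and "b", touching "a", and then putting "c" evicts "b", since "b" is the entry that has gone unused the longest.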

Caching system

The purpose of a caching subsystem is to make data retrieval more efficient and to reduce database load. Once requested, data is placed in the cache so that subsequent requests can be answered faster, without a round trip to the database and back.

A caching subsystem is a component of a web system that saves recent requests and their data for a set time. When multiple users request the same data, the subsystem serves the request from its own saved copy instead of loading it from the database. Off-the-shelf solutions can serve as the core of a caching subsystem; one example is systems that store cached data in RAM.

Systems without a caching subsystem are limited by their processing power. Caching is commonly used when serving the following kinds of files:

· Images

· CSS

· Static HTML files

· JavaScript files

A server must send a response to every request it receives, and loading a single page can trigger separate requests for each of the file types listed above. When large images are downloaded by numerous users around the globe, the server can become overloaded, and users then experience slower page loading.

RAM Cache

The results of an application's slower operations, such as SQL queries or requests to external APIs, can be stored for some time. Caching in RAM allows faster handling of requested data and faster access to the stored values: reading data from RAM is faster than reading it from a hard drive.

Several in-memory computing platforms that manage data handling in memory are worth mentioning:

· Ehcache

· Hazelcast

· Memcache

The main purpose of these platforms is to increase system efficiency and reduce database load. Their distinguishing feature is that data is handled in the "key-value" format: the "key" is a unique object identifier, and the "value" holds the data associated with that key. The key-value format reduces the time needed to find a specific object among large amounts of data.

Another important feature of a caching subsystem is persistence. Persistence keeps data safe in case of a server reboot. It can be achieved by creating a backup of the data on the hard drive ahead of time and updating it at a set interval; after the server restarts, the data is loaded back into RAM.
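The backup-based persistence described above might be sketched like this; the snapshot file name is an arbitrary choice, and a real system would write the snapshot on a timer:

```python
import json
import os

SNAPSHOT = "cache_snapshot.json"

def save_snapshot(cache):
    """Write the current in-RAM cache contents to the hard drive."""
    with open(SNAPSHOT, "w") as f:
        json.dump(cache, f)

def load_snapshot():
    """After a restart, reload the cache into RAM from the last snapshot."""
    if os.path.exists(SNAPSHOT):
        with open(SNAPSHOT) as f:
            return json.load(f)
    return {}  # no snapshot yet: start with an empty cache
```

Any cache contents written before a reboot are recovered by `load_snapshot()` when the server comes back up.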

Persistent caching using NoSQL DBMS

A cache can be persistent: restarting the application does not delete its data, because the data is also stored on the hard drive. This avoids repopulating the cache from scratch after each system restart. One way to build a persistent cache is to use a NoSQL database. There are many classes of NoSQL databases (key-value stores, document-oriented databases, column stores, etc.).

NoSQL data organization

NoSQL is an approach to organizing data that lifts many of the constraints of traditional data management. NoSQL databases take a schema-less approach and offer efficient ways of processing data.

In the document-oriented approach, each cached object is stored as a separate document together with all of its metadata. Such a DBMS can search the data with parameter filters, which noticeably reduces the response time when requesting specific objects from the database. With any other approach, every object, along with its metadata, would have to be loaded out of the cache in order to perform such a search.

As mentioned earlier, a NoSQL database stores data as documents, and JSON is the most common format for them. This format makes parameter-filtered object search possible. For example, a filter might select all images in JPEG format whose size is under 200 KB: the search conditions are expressed without knowing any object IDs.
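Such a parameter-filtered search can be sketched in plain Python. The documents, field names, and the `$lt` operator syntax below are illustrative (modeled on MongoDB-style filters), not taken from a real cache:

```python
docs = [
    {"id": 1, "format": "jpeg", "size": 150},
    {"id": 2, "format": "jpeg", "size": 250},
    {"id": 3, "format": "png",  "size": 100},
]

# JSON-style filter: format == "jpeg" and size < 200 (KB)
query = {"format": "jpeg", "size": {"$lt": 200}}

def matches(doc, query):
    for field, cond in query.items():
        if isinstance(cond, dict):        # operator form, e.g. {"$lt": 200}
            if "$lt" in cond and not doc.get(field, float("inf")) < cond["$lt"]:
                return False
        elif doc.get(field) != cond:      # direct equality
            return False
    return True

results = [d for d in docs if matches(d, query)]  # objects found without IDs
```

Only the first document satisfies both conditions, so the filter returns it without ever referring to an object ID.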

It is worth mentioning that the document-oriented approach also allows a NoSQL DBMS to change the data structure, including adding and removing specific parameters of an object.

Extending the functionality of a caching subsystem

The advantages of integrating a NoSQL DBMS into a web system are parameter-filtered data search, flexible alteration of the data structure, and a persistent cache. A disadvantage appears when only one specific parameter of a cached object is needed: the whole document is loaded anyway, which can increase load time and the amount of data transmitted.

 
