Antivirus Research And Development Techniques Essay Example

  • Pages: 17 (4605 words)
  • Published: August 4, 2018
  • Type: Case Study

The commercial market offers a variety of antivirus software products that are highly competitive and constantly evolving to provide the most advanced and efficient defensive detection solutions.

This thesis examines the various methods used by antivirus products. It provides a general overview of viruses and antivirus software and explores research on the performance impact associated with using antivirus programs. Additionally, it delves into the commonly used signature-based detection technique employed by antivirus software and explains how new virus signatures are incorporated into these programs. The thesis also investigates specific algorithms used by these techniques, offering a detailed explanation of how each algorithm identifies infected or uninfected files. During the experimentation stage, three popular antivirus software products are utilized to detect a virus, and the resulting reports generated by these products are compared and analyzed.

Chapter 1: Introduction

In today's modern lifestyle, computers play a crucial role in almost every field. However, they are susceptible to malicious attacks such as viruses.

Various types of malicious software, including viruses, worms, and Trojan horses, are capable of infecting computers. These threats primarily originate from the World Wide Web, a platform where malicious individuals freely distribute malware. To combat virus attacks, numerous researchers have devised techniques and protocols while developing specialized software called "Anti-Virus" to eradicate these threats. Viruses can spread through multiple channels, such as emails, floppy disks, and the internet. Typically, they propagate by moving from one computer to another, causing damage or data deletion. The most prevalent transmission methods for viruses involve unknowingly downloading illegal software through the internet or emails.

Viruses have the capacity to harm different parts of a computer system, such as the boot sector, system files, data files, software, and even the system BIOS. Additionally, modern viruses can target other components of the computer. Viruses can spread through different methods like booting the computer with an infected file, executing or installing an infected file, or opening infected data or files. Floppy disks, compact disks, USB or external hard drives, and unsafe connections with other computers are common sources of virus contamination.

Antivirus software faces challenges due to the rapid growth of viruses in various areas, including prevention, preparation, detection, recovery, and control. There are numerous tools currently available that can remove viruses from computers and offer protection against future attacks. However, concerns about privacy and computer security arise when using such software. Despite implementing multiple safety measures, the rate at which viruses spread continues to increase alarmingly, posing significant risks. This thesis will investigate the history and evolution of antivirus software by examining the origins and types of viruses that have emerged and discussing the discovery of antivirus software.

This thesis discusses three techniques related to antivirus scanning methods. It provides an example of how antivirus products update their virus database and explains how computers gather information for protection against zero-day viruses. The paper also compares different virus types, defines them, and highlights the threats they pose. Additionally, it explores the impact of each virus type on operations. Furthermore, the study evaluates the performance of antivirus software on various virus types and analyzes the strengths and weaknesses of current antivirus techniques.

Chapter 2 – Overview

This chapter provides a general introduction to viruses and antivirus software. It discusses the history of viruses, the evolution of antivirus software, different types of viruses based on their attacking features, techniques used by antivirus products, and basic knowledge about various antivirus products.

2.1 History of Viruses

A computer virus is a program that replicates itself and infects the system without user permission (Vinod et al. 2009). Viruses encompass various types of malware such as worms, trojan horses, rootkits, spyware, and adware.

In 1949, John von Neumann carried out early theoretical work on self-replicating programs (wiki 2010) and introduced the idea of a program that could duplicate itself, although it was not referred to as a "virus" at that time. The first virus, called Creeper, emerged in the early 1970s. Creeper would propagate across a network by replicating itself onto other computers and display the message "I'M THE CREEPER: CATCH ME IF YOU CAN" on the infected machine. Although Creeper was harmless, a program called Reaper was later created to capture and terminate it.

In 1974, a program called "Rabbit" was created to rapidly multiply itself and crash the infected system upon reaching a certain number of copies. In the 1980s, the virus known as "Elk Cloner" infected numerous machines. It targeted the Apple II computer, which was launched in 1977 and used floppy disks to load its operating system. Elk Cloner implanted itself into the boot sector of these floppy disks, which allowed it to load before the operating system. Another noteworthy virus of this period was "©Brain," the first stealth virus for IBM-compatible systems. It concealed its existence: when an attempt was made to read the infected boot sector, it displayed the original, uninfected data instead. Finally, in 1987, the Vienna virus made headlines as one of the most dangerous viruses. This virus specialized in infecting .COM files.

Whenever the infected file was executed, it would infect other .COM files in the same directory. Vienna was the first virus to be successfully neutralized, a feat achieved by Bernd Fix that led to the development of antivirus software. Several other viruses followed, including the Cascade virus, which was the first self-encrypting virus, and the Suriv family, memory-resident DOS file viruses. One particularly dangerous virus was "Datacrime," known for destroying FAT tables and causing data loss. In the 1990s, additional viruses such as the Chameleon virus, Concept virus, and CIH virus appeared. In the 2000s, there were notable viruses like ILOVEYOU, Mydoom, and Sasser (Loebenberegr 2007).

Vinod et al. (2009) define a computer virus as "a program that infects other programs by modifying them and their location such that a call to an infected program is a call to a possibly evolved, functionally similar, copy of the virus." To protect against such attacks, antivirus software companies employ many different methodologies.

2.2 Virus Detectors

The virus detector scans a file or program to check whether it is malicious or benign. This research uses some technical terms and detection methods, which are defined below. The primary objective of testing a file or program is to determine the false positives, the false negatives, and the hit ratio (Vinod et al. 2009).

False Positive: This happens when the scanner mistakenly identifies a non-infected file as a 'virus'. It can be a waste of time and resources.

False Negative: This occurs when the scanner fails to detect the 'virus' in an infected file.

Polymorphic viruses are viruses that mutate by hiding the original code. The virus includes encrypted malware code along with a decryption unit. They generate new mutants every time they are executed.

Figure 2.2.2 depicts the encryption process where the infected file encrypts the main or original code to generate a decrypted virus code. Metamorphic viruses employ obfuscation techniques to dynamically reprogram themselves, resulting in new variants that are different from the original. As shown in figure 2.2.3, the signatures of the subsets differ from that of the main set, demonstrating variations in the original virus and its forms represented by s1, s2, and s3.
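The testing terms introduced in this section can be made concrete with a small tally. The following is a minimal sketch, not taken from the cited work: the `ScanResult` type and the `detection_metrics` function are hypothetical names, and the hit ratio is assumed here to mean the fraction of infected files that the scanner actually flags, since the text does not define it.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    infected: bool  # ground truth: the file really contains a virus
    flagged: bool   # scanner verdict: the file was reported as a virus


def detection_metrics(results):
    """Return (false_positives, false_negatives, hit_ratio) for a test run."""
    false_positives = sum(1 for r in results if r.flagged and not r.infected)
    false_negatives = sum(1 for r in results if r.infected and not r.flagged)
    hits = sum(1 for r in results if r.infected and r.flagged)
    infected_total = hits + false_negatives
    # Assumed definition: hit ratio = detected infected files / all infected files.
    hit_ratio = hits / infected_total if infected_total else 0.0
    return false_positives, false_negatives, hit_ratio


results = [
    ScanResult(infected=True, flagged=True),    # correct detection (hit)
    ScanResult(infected=True, flagged=False),   # false negative
    ScanResult(infected=False, flagged=True),   # false positive
    ScanResult(infected=False, flagged=False),  # correctly reported clean
]
print(detection_metrics(results))  # (1, 1, 0.5)
```

Keeping the ground truth and the verdict side by side makes it easy to compare several scanners on the same set of files.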

2.3 Detection Methods


2.3.1 Signature based detection

In this method, scanners search for a sequence of bytes, known as a signature, within the code of a file; a match identifies the program as malicious. Signature development becomes easier when network behavior is recognized.

Signature based detection, also known as pattern matching, is a technique that originated from the days of DOS, when viruses were parasitic and targeted host files and common executable files (Daniel, Sanok 2005).
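As an illustration of pattern matching against a signature dictionary, the sketch below checks a block of bytes against a handful of signatures. It assumes the simplest possible model (a signature is a fixed byte string, and a file is flagged when the string occurs anywhere in it). The signature names and byte values are made up for the example and correspond to no real malware, and `scan_bytes` is a hypothetical helper rather than any product's API.

```python
# Hypothetical signature dictionary: name -> byte sequence.
# The byte strings are invented for illustration only.
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": b"HARMLESS-TEST-PATTERN",
}


def scan_bytes(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures that occur in data."""
    return sorted(name for name, sig in signatures.items() if sig in data)


clean = b"just an ordinary document"
suspicious = b"\x00\x01" + bytes.fromhex("deadbeef0102") + b"rest of file"
print(scan_bytes(clean))       # []
print(scan_bytes(suspicious))  # ['Example.A']
```

The `in` operator performs a plain substring search here; a production scanner would use a faster multi-pattern algorithm, but the decision rule is the same.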

2.3.2 Heuristic based detection

Heuristic based detection involves scanning a virus by evaluating its behavioral patterns. This method determines the likelihood of a file or program being a virus by testing its uniqueness and matching its behavior to indicators stored in the antivirus heuristic database.

This method is useful for identifying viruses that lack signatures or hide their signatures, and for detecting metamorphic viruses. (Daniel, Sanok 2005)
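One way to picture matching behavior against a heuristic database is a weighted score. The sketch below is only an assumption about how such a score might be combined: the indicator names, their weights, and the threshold are all hypothetical and are not taken from any antivirus product or from the cited source.

```python
# Hypothetical heuristic database: behavior indicator -> weight.
HEURISTIC_WEIGHTS = {
    "writes_to_boot_sector": 0.6,
    "modifies_other_executables": 0.5,
    "decrypts_own_code_at_runtime": 0.3,
    "hides_from_directory_listing": 0.2,
}
THRESHOLD = 0.7  # assumed cut-off; a real product would tune this value


def heuristic_score(observed):
    """Sum the weights of the known indicators seen in a program, capped at 1.0."""
    score = sum(HEURISTIC_WEIGHTS.get(b, 0.0) for b in set(observed))
    return min(score, 1.0)


def is_suspicious(observed):
    """Flag a program whose combined indicator score reaches the threshold."""
    return heuristic_score(observed) >= THRESHOLD


print(is_suspicious(["modifies_other_executables", "decrypts_own_code_at_runtime"]))  # True
print(is_suspicious(["hides_from_directory_listing"]))                                # False
```

Lowering the threshold catches more unknown viruses but also flags more clean files, which is the false-positive trade-off of heuristic methods.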

2.3.3 Obfuscation Technique

This technique allows viruses to transform an original program into a virus program using transformation functions. These functions make the virus program irreversible, perform comparably to the original program, and include the functionality of the original program. Metamorphic and polymorphic viruses primarily utilize this technique. (Daniel, Sanok 2005)

Chapter 3: Literature Review

3.1 Antivirus workload characterization

A study conducted by (Derek, Mischa, David 2005) demonstrates that antivirus software packages employ various techniques to determine if a file is infected. However, according to the research of (Derek, Mischa, David 2005), the best way to compare the overheads introduced by different antivirus software packages is by examining their respective impacts during on-access execution. When running antivirus software, two main models are commonly used:

  • on-demand.
  • on-access.

On-demand scanning involves checking specific files specified by the user. On-access scanning involves monitoring system-level and user-level operations and scanning when an event occurs.

The article explores the behavior of four antivirus software packages on an Intel Pentium IV computer running Windows XP Professional. The study includes three test scenarios: copying a small executable file from a CDROM to the hard disk, executing calc.exe, and executing wordpad.exe. The antivirus packages used in the experiment are PC-cillin, F-Prot, McAfee, and Norton, and each scenario is run under each of these packages. Figure 3.1.1 demonstrates that using these packages adds some overhead during execution, thereby increasing the execution time.

A test was conducted to determine the additional instructions executed during file system operations and when a binary is loaded and executed. Both scenarios involved a very small binary. It was discovered that the execution is mainly influenced by specific hot basic blocks in each antivirus package. A basic block is categorized as "hot" if it is visited over fifty thousand times. To study the behavior of antivirus software packages, Derek, Mischa, and David (2005) chose a platform that is a primary target of virus attacks and on which commercial antivirus software can be installed. The framework used for simulation is called Virtutech Simics, which has the architectural structure outlined in table 3.1.1.

Virtutech Simics is a simulator used here to simulate the execution of antivirus software on a system and obtain cycle-accurate performance numbers. This is done by incorporating a cycle-accurate micro-architectural model, with the simulator configured to simulate the microprocessor. The host, which is the simulator, executes the operating system that is loaded via a simulated hard drive. On top of the operating system, the researchers install and run the antivirus software and conduct the test scenarios (refer to figure 3.1.2).

The comparison is made between the execution of the baseline configuration without antivirus software and systems equipped with four different antivirus packages. The summary of the five configurations is presented. A CDROM image file is created and loaded into the machine for each experiment. The utility is executed with special instructions at the beginning and end of each collection to ensure accurate profile collection.

The CDROM file is copied to the hard drive, and then the calculator and wordpad accessories are run using a shortcut. Throughout the profile runs, it is determined that there is less than a one percent difference in the workload parameters. The antivirus characterization reveals a gradual increase in cache activity across the packages, with F-Prot having the smallest overheads and Norton the highest. When running the antivirus software, Norton and McAfee have larger impacts on memory compared to the base case, F-Prot, and PC-cillin.

The development techniques used in antivirus software involve a framework that combines various methods to detect malware. Over time, advancements have been made in these techniques with the goal of identifying previously undetectable viruses. Along with detecting viruses, antivirus software also identifies other types of malicious code like worms, Trojan horses, and spyware, all collectively referred to as malware. Malware encompasses any harmful code or program created to harm a computer. To effectively filter out malware, it is necessary to install specific antivirus software that includes detection techniques and algorithms. Many commercial antivirus programs use a signature-based matching technique, which requires regular updates to keep an up-to-date virus dictionary containing new malware signatures.

As technology advances, malware writers are constantly seeking better hiding techniques. Specifically, rootkits have become a security concern due to their superior hiding capabilities. To combat this, new detection methods have been developed, including machine learning and data mining techniques. In a 2010 study by Zolkipli and Jantan, a new framework for malware detection was proposed. This framework combines signature-based detection with machine learning techniques. It consists of three main sections: signature-based detection, genetic algorithm-based detection, and signature generator. The researchers define malware as software that executes actions intended by an attacker without the owner's consent. Each malware has its own distinct characteristics, attack goals, and transmission methods.

According to Zolkipli, M.F.; Jantan, A., 2010, a virus is a type of malware that attempts to replicate itself into other executable code within a host. As technology has advanced, malware has become far more sophisticated than in its early days. The most common approach to detecting malware is signature-based matching, which compares file content with a signature using a method called string scan that searches for pre-defined bit patterns. Despite its popularity and reliability for host-based security tools, this technique has limitations that need to be addressed. One issue with signature-based matching is its failure to detect zero-day viruses and other zero-day malware attacks.

Zero-day malware attacks, also known as new launch malware, involve infecting a number of computers in order to collect and store a new virus pattern for future use. The framework utilized in this process monitors and logs un-trusted programs, providing defense against both known and unknown malware without requiring any previous information about the un-trusted programs. Additionally, from the user's perspective, no modifications to existing programs are necessary, and the user does not need to observe the program running within the framework since it remains invisible to both known and unknown malware.

The study shows that the framework was run in a Windows environment and successfully detected all malware changes, unlike commercial tools that rely on signature-based techniques. The malware detection technique employed a machine learning algorithm and addressed the limitations of signature-based techniques by implementing an adaptive data compression method. According to Zolkipli, M.F.; Jantan, A., 2010, the two main limitations of signature-based techniques are: not all malicious programs have characteristic bit patterns that prove their malicious intent, and these patterns may not be recorded in virus dictionaries.

The usage of various bit patterns in obfuscated malware hinders its detection through signature-based methods. To overcome this limitation, the Genetic Algorithm (GA) exploits system constraints and enables the detection of zero-day malware.

In this approach, an analysis technique known as IMAD has been developed using the GA algorithm to detect newly emerging malware. This technique aims to counteract the limitations of conventional signature-based detection methods.

Data mining has previously been applied to malware detection. The standard data mining approach classifies each block of file content as either normal or potentially malicious. An Intelligent Malware Detection System (IMDS) was developed to overcome the limitations of signature-based antivirus programs. This system used Object Oriented Association (OOA) mining, specifically the OOA_Fast_FPGrowth algorithm. The experimentation focused on the sequences of Windows API calls made by PE files.

The King Soft Corporation antivirus laboratory provided a large collection of PE files for the purpose of comparing different malware detection methods. The findings indicate that the IMDS system outperforms Norton and McAfee. The suggested framework integrates two techniques - signature-based and GA - to address two challenges in malware detection. The framework comprises s-based detection, s-based generator, and GA detection (refer to figure 3.2.2).

The first layer of defense against malware is s-based detection, followed by GA detection as a second layer. GA detection is used to identify newly launched malware. Signature-based detection, which involves creating signatures from zero-day malware, is then employed. This technique, also known as static analysis or scan strings, involves examining the code and determining its maliciousness based on its malware characterization. Signature-based detection is a standard method used in all antivirus products.

In general, all malware has unique character patterns that can serve as signatures. When a program is executed, antivirus software scans the data stream bytes and compares them with thousands of signatures stored in its database. This comparison is done using a searching algorithm to match the program code with the signatures. The Zolkipli, M.F.; Jantan, A., 2010 framework selected this technique at the outset due to its effectiveness in detecting well-known viruses. The goal of incorporating this technique into the framework was to enhance computer operation performance.

The GA detection technique is widely used to identify recently released malware. It uses a genetic algorithm to learn and solve algebraic or statistical research problems by evolving a population of candidate solutions. This machine learning technique uses chromosomes as a means of representing data. These chromosomes are represented as bit string values, and new chromosomes are generated by combining bits from existing ones.
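The bit-string representation can be illustrated with the two standard genetic algorithm operators, crossover and mutation. This is a generic sketch of those operators on chromosomes written as strings of "0" and "1"; it does not reproduce the framework of Zolkipli and Jantan, and the function names, crossover point, and mutation rate are illustrative assumptions.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: the head of one parent joined to the tail of the other."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[bit] if rng.random() < rate else bit for bit in chromosome)


rng = random.Random(0)  # fixed seed so that runs are repeatable
child_1, child_2 = crossover("11110000", "00001111", point=3)
print(child_1, child_2)  # 11101111 00010000
print(mutate(child_1, 0.1, rng))
```

A full genetic algorithm would add a fitness function and a selection step around these operators; both are omitted here because the text does not specify them.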

A solution is produced based on the nature of the problem. GA encompasses two basic operations, crossover and mutation, which are used to address issues related to polymorphic viruses and new types of malware. This technique was introduced in this framework to detect malware code that employs hiding techniques. Learning and filtering virus behavior enables the detection of such malware (Zolkipli, M.F.; Jantan, A., 2010).

Signatures, generated by an s-based generator, are used to characterize and identify viruses. Forensic experts create signatures once they discover a new virus sample. These signatures are based on the virus behavior. Each antivirus product creates its own signatures, which are encrypted in case multiple antivirus products are installed on the same computer.

The signature database is updated with a new signature as soon as it is created. To defend against new viruses, every computer user must update their antivirus product with the latest database. A signature pattern consists of 16 bytes, which is sufficient to detect a 16-bit virus (Zolkipli, M.F.; Jantan, A., 2010). The signature generator mimics the behavior of a virus identified by GA detection. The generated virus signature pattern is then added to the virus database for signature-based detection. This framework aims to replace the tasks of forensic experts.

The framework proved to be highly beneficial in detecting new virus signatures and in enhancing computer efficiency and performance.

3.3 Enhancing speed of signature scanners using BMH algorithm.

This article addresses the issue of virus detection using a signature scanning method that relies on a fast pattern matching algorithm. Essentially, this technique involves searching for a virus signature pattern throughout a file. However, pattern matching can be a resource-intensive task that negatively impacts performance. If the pattern matching algorithm is slow and time-consuming, users may become impatient. To overcome this issue, a faster pattern matching algorithm, namely the Boyer-Moore Horspool algorithm, is used for the scanner. Compared to the Boyer-Moore algorithm and the Turbo Boyer-Moore algorithm, this technique proved to be the fastest pattern matching algorithm.

In technical terms, a virus consists of three components: trigger, infection mechanism, and payload. The infection mechanism is the primary component; it is responsible for searching for victims and often avoids multiple infections. Once it finds a victim, it can either overwrite it or attach itself to the beginning or end of the file. The trigger is an event that specifies when the payload should be executed. The payload is the core of the malicious behavior, which can include corruption of the boot sector or manipulation of files. Detection and disinfection of infected files are the two most crucial tasks of antivirus software algorithms. Therefore, the defense system code of the algorithm must include a component capable of identifying any type of virus code.

Integrity checking technique: This technique computes checker codes, which can be checksums, CRCs, or hashes of files, and uses them to check for viruses. At regular intervals, the checksum is recomputed and compared against the previous checksum. If the two checksums do not match, the file has been modified, which indicates that it may be infected. Because it works by detecting changes in files, this technique can also detect new or unknown viruses. However, it has several drawbacks. Firstly, the initial checksum calculation must be performed on a virus-free system, so the technique cannot detect viruses if the system is already infected. Secondly, if the system is modified during execution, there can be a high number of false positives (Sunitha Kanaujiya, et al., 2010).
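The recompute-and-compare cycle described above can be sketched in a few lines. The example below assumes SHA-256 hashes as the checker code (the text allows checksums, CRCs, or hashes) and keeps the baseline in an in-memory dictionary; `file_digest`, `build_baseline`, and `changed_files` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(paths):
    """Record a digest per file; must be run while the system is known to be clean."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(baseline):
    """Return files whose current digest differs from the baseline, or that are gone."""
    changed = []
    for path, old_digest in baseline.items():
        if not Path(path).exists() or file_digest(path) != old_digest:
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "program.bin"
    target.write_bytes(b"original contents")
    baseline = build_baseline([target])  # taken on a clean system
    print(changed_files(baseline))       # nothing has changed yet
    target.write_bytes(b"original contents + appended code")
    print(changed_files(baseline))       # the modified file is reported
```

Note that a changed digest only shows that the file was modified. A legitimate update produces the same alarm, which is the source of the false positives mentioned above.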

Signature scanning technique: This technique is widely used to detect viruses on a large scale. It involves reading data from a system and applying a pattern matching algorithm against a list of existing virus patterns. If a match is found with any of the existing patterns, the data is identified as a virus. This scanning technique is effective but requires frequent updating of the pattern database, which is easily manageable. There are several advantages to this scanner, such as increased scanning speed and the ability to detect other types of malicious programs like Trojan horses, worms, logic bombs, etc. For virus detection, only the signature of the virus is needed, and it should be kept up to date in the database. For this reason, the technique is used against many kinds of viruses.

Activity monitoring technique: This approach uses programs that monitor the behavior of other programs during execution. These monitoring programs, known as behavior monitors, remain in main memory. The behavior monitors generate alarms or take action to stop a program when it attempts unusual activities like interfering with tables, partition tables, or boot sectors. The database records every virus behavior that is anticipated. The main drawback arises when a new virus employs an infection method that is not in the database, rendering virus detection futile. Additionally, viruses can elude this defense by activating earlier in the boot sequence, before the behavior monitors are activated. In the absence of hardware memory protection, viruses can also alter the monitors.

Heuristic Scanner: This technique examines the attributes of a file and is able to detect unfamiliar viruses. The dynamic and statistical checking component of this technique forecasts the likelihood of infection. It can identify numerous new viruses prior to execution. However, a flaw is that occasionally an unaffected file may be included in the list of infected files. The pattern matching algorithm plays a crucial role in the signature scanning technique.

The system performance needed to be improved by using a faster pattern matching algorithm. To achieve this, the detection tools utilize the Boyer-Moore Horspool algorithm (BMH), which is faster than other sequential pattern searching algorithms and is used by certain popular software. The pattern matching problem involves locating a pattern "P" of length "m" within a text "T" of length "n". The implementation of the Boyer-Moore Horspool algorithm requires two position indicators. (Sunitha Kanaujiya, et., al 2010)

The current position in the pattern is indicated by "j" and the current position in the target text by "k". Initially, the first letter of pattern "P" is aligned with the first letter of the target text "T". This alignment acts like a window onto the text that displays only "m" characters, which is the length of the pattern. As the window shifts to the right, other positions become visible. A further position indicator "i" records the location of the rightmost text position that can be seen through the window. It is initialized to "m-1". Each attempt starts by comparing the last pattern letter, P[m-1], with the text letter T[k], where k = i; all comparisons occur between T[k] and P[j].

Both j and k are decreased after each successful comparison, as long as the characters match and un-compared characters remain in the pattern P. If j = -1, all pattern characters have been matched and an occurrence of the pattern in the text has been found. Regardless of whether a match is found, the window is then shifted to the right by a certain distance d (i is increased by d), k is reset to i, and j is reset to m-1. This process continues until the end of the text is reached.
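The search loop described above can be written out as follows. This is a sketch in Python rather than the authors' C implementation, and it assumes the standard Horspool rule for the shift distance d: a skip table built from the pattern, indexed by the text character under the right edge of the window. The index names i, j, k, m, and n follow the text; `build_skip_table` and `bmh_search` are illustrative names.

```python
def build_skip_table(pattern):
    """Horspool skip table: for each character of the pattern except the last
    position, the distance from its last occurrence to the end of the pattern."""
    m = len(pattern)
    return {pattern[j]: m - 1 - j for j in range(m - 1)}


def bmh_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    skip = build_skip_table(pattern)
    matches = []
    i = m - 1                  # rightmost text position seen through the window
    while i < n:
        k, j = i, m - 1        # compare from right to left
        while j >= 0 and text[k] == pattern[j]:
            k -= 1
            j -= 1
        if j == -1:            # every pattern character matched
            matches.append(k + 1)
        # Shift by d, looked up from the text character at the window's right
        # edge; characters that do not occur in the pattern shift by m.
        i += skip.get(text[i], m)
    return matches


print(bmh_search(b"abracadabra", b"abra"))  # [0, 7]
```

The function works on both `bytes` and `str` inputs. The speed advantage over a sequential scan comes from the shift step, which can skip up to m positions at once.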

The signature scanner has two main parts: a signature database and a scanning engine that scans for virus signatures from the database.

Neither part can work without the other, as they complement each other. The first step in running a signature scanner is to update the virus signature database to the most recent version. The second step is to search for viruses using the stored signature database. As stated by Sunitha Kanaujiya, et al. in 2010, a signature database is a collection of distinct signatures that identify specific viruses. A signature is a sequence of machine code that exists within an executable virus.

The virus code contains the following fields. To constantly update the database with new viruses, a data entry program is used. To update the database, the user is prompted to enter the virus signature in a HEX format (a hexadecimal code) without blank spaces and commas. Then, the user needs to enter the virus type and provide a virus description. To save the verified data in the database, the description must include the virus name, properties, comments, and other relevant information. The virus detection engine scans the boot sector, partition table, and all types of files.
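The data entry step can be sketched as a small validation routine. The record layout below (signature, virus type, description) follows the inputs listed in the paragraph above, but the `SignatureRecord` type, the `make_record` function, and the exact validation rules are assumptions made for illustration. The sample signature is an arbitrary byte string, not a real virus signature.

```python
import re
from dataclasses import dataclass

HEX_SIGNATURE = re.compile(r"[0-9A-Fa-f]+")


@dataclass(frozen=True)
class SignatureRecord:
    signature: bytes   # decoded from the HEX text the user typed
    virus_type: str    # e.g. "boot sector", "partition table", "file"
    description: str   # virus name, properties, comments, ...


def make_record(hex_text, virus_type, description):
    """Validate user input and build one database record."""
    # Reject blanks, commas and odd-length input before decoding.
    if not HEX_SIGNATURE.fullmatch(hex_text) or len(hex_text) % 2:
        raise ValueError("signature must be an even number of hex digits, "
                         "without spaces or commas")
    if not virus_type.strip() or not description.strip():
        raise ValueError("virus type and description are required")
    return SignatureRecord(bytes.fromhex(hex_text), virus_type.strip(),
                           description.strip())


record = make_record("DEADBEEF0102", "file", "Example.A; made-up test pattern")
print(record.signature.hex(), record.virus_type)  # deadbeef0102 file
```

Decoding the text to `bytes` at entry time means the scanning engine can compare signatures against file contents directly.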

The scanner begins scanning after reading virus details. Then, the virus code is matched to find an exact match and identify it as a virus. To enhance scanning speed, the Boyer Moore-Horspool algorithm is utilized, which is a rapid pattern matching algorithm. During scanning, the file is scanned from the first byte to the last byte against a signature database. The user is notified when irregular patterns are detected. Performance measures are analyzed, including searching for boot sectors, partition tables, and all types of viruses. The approach taken by (Sunitha Kanaujiya, et al., 2010) involves implementing the algorithm in C. The target text is divided into slices of 1024 characters, with the last slice containing only a few characters.

The measurements were all conducted incrementally, increasing in steps of one slice up to the full target size. The algorithms were tested on various patterns, with the first test focusing on the boot sector using virus signatures specific to the boot sector. The target text for this test consisted of only 512 bytes.

The Boyer Moore algorithm and its alternatives have a very slight performance difference. The fastest algorithm is the Boyer Moore-Horspool algorithm, which is faster than the sequential algorithm. Another test was conducted on partition table viruses, specifically on the hard disk to search for partition table virus signatures. The target text for this test was also 512 bytes, resulting in the same results. The third test involved all types of file viruses, with a total of 1127 files occupying 1.5GB. Table 3.3.1 presents the performance of all algorithms regarding the signature database. This performance evaluation considers the number of patterns used, the algorithms utilized, and compares the performance of each algorithm based on the time factor.

The algorithm's performance varies based on the size of the pattern used. The sequential algorithm does not use a skip table to improve its efficiency. Additionally, the shift function takes the same amount of time for all patterns used.
