Digital Steganography as an Advanced Malware Detection Evasion Technique
A Masters Thesis | © Copyright 2018 | All rights reserved.
The twenty-first century has systematically evolved into the Age of the Internet. Programming code, including the firmware of electronic devices, computer operating systems, and software applications, virtually runs the developed world. Over time, economic pressures and demands for rapid software development have resulted in a situation where security is an afterthought. Traditionally, these ‘afterthoughts’ have been remedied by software patches released by vendors, which users can download automatically or selectively onto their systems. In reality, however, these patches are often never applied, leaving countless computers vulnerable to exploits for which fixes already exist. This perpetual cycle of band-aid software security has presented a limitless well of opportunities for cybercriminals, including Nation-state Advanced Persistent Threat (APT) groups whose cyber attacks are expressly designed to steal data rather than damage networks, exploiting software code vulnerabilities with custom-crafted malicious software, better known as malware. Malware used by Nation-state APT groups can take many different forms. However, never before has there been a single piece of malware as devastatingly effective as the Stuxnet virus. First discovered in 2010, Stuxnet forever changed the cyber warfare threat landscape by causing a cascade failure at the Natanz uranium enrichment facility, where subtle, remotely executed manipulation of software code resulted in actual physical damage to Industrial Control System (ICS) equipment (Zetter, 2014). There are evolutionary similarities between the Stuxnet virus and the Duqu malware, which uses digital steganography to mask its exploit payload (Wendzel et al., 2014, pp. 123–133).
The Internet has reached into the darkest recesses of the civilized world, and for many people it has become a one-stop shop for communication, news, entertainment, online banking, educational research, collaborative work, personal organization, and Cloud-based data storage. Lurking beneath the glossy Graphical User Interfaces (GUIs) of Internet websites, however, are googols of 1’s and 0’s communicated in binary format at near-light speed between computer systems across different layers of Internet protocols (Barwise, 2015, p. 4). Many Internet users place a naïvely blind trust in the assumption that these 1’s and 0’s have not been tampered with in any malicious way, a loose trust at best that is in peril of eroding as digital steganography is increasingly discovered paired with malware code. Cyber threat actors are proliferating malware that incorporates digital steganography to evade anti-malware detection software, which, if successful, could have significant threat implications for United States (U.S.) Critical Infrastructure and Key Resources (CIKR) Information Systems (IS) connected to the Internet.
In terms of basic self-defense principles, it is virtually impossible to defend against that which cannot be seen. The criminal use of digital steganography, often combined with strong encryption, presents significant challenges for United States (U.S.) law enforcement, the Intelligence Community (IC), and U.S. allies in both detecting and decrypting potentially illegal and dangerous Internet traffic that targets U.S. CIKR. As with many modern technologies, the technological underpinnings of digital steganography reflect the duality of human nature: they can be used for either beneficial or nefarious purposes. Nevertheless, the popularity of digital steganography appears to center mostly on its malicious applications, both in reality and in Hollywood movies and television shows. Documented cases in which cybercriminals such as cyber threat actors, pedophiles, and terrorists have used digital steganography to conceal communications or evade malware detection do exist; however, they are rare. Alarmingly, research has shown that very few organizations actively scan for the digital steganography network threat despite the potential risk of malware infection.
Due to demand from both public and private industry, anti-virus software development companies have increased the effectiveness of their malware file signature-based scanning software forcing cybercriminals to become more sophisticated and creative to evade detection. The goal of this study is to identify specific examples in which cyber threat actors have employed digital steganography in combination with malware with the direct intention of evading malware detection and also to provide recommendations for improving detection through the use of steganalysis software applications. This paper will evaluate current uses of digital steganography technology on the Internet to both understand and determine potential methods by which cybercriminals and terrorist organizations employ it to mask their nefarious communications and Internet activity. This paper will also discuss the characteristics associated with different digital steganography modalities and their respective noise error rates. Combined with the data gathered from this study, recommended methods of steganalysis detection will emerge that may potentially help Internet Service Providers (ISP), law enforcement and Intelligence Community (IC) agencies to detect and mitigate the potential future effects of digital steganography-infused malware against U.S. CIKR.
This study intends to prove or disprove three hypotheses. These hypotheses will be evaluated via results of the literature review and also by conducting low-risk experimental tests of stego-file (i.e., steganographic type of file) uploads to selected ISP website platforms.
H1. There is relevant evidence to suggest that malware creators are incorporating digital steganography as an advanced technique that is specifically designed to evade anti-virus/malware software tools.
H2. If ISPs are not able to or choose not to scan for digital steganography, then cyber threat actors will use these ISP platforms to post illegal content or propagate malware using digital steganography.
H3. If cyber threat actors are successfully employing digital steganography to hide malware within seemingly normal data traffic that evades anti-virus/malware detection, then the U.S. government, CIKR, private industry, and ISPs are all at risk of hidden malware infection that could potentially cripple the entire nation.
Significance of the Study
In 2016 alone, malicious cyber activity cost the U.S. between $57 billion and $109 billion, and because of America’s reliance on CIKR, cyber attacks aimed at CIKR systems could have devastating economic effects (CEA, 2018, p. 1). This study intends to advance the understanding of digital steganography as it relates to its incorporation with malware, and of the steganalysis application technology that currently exists. Should the aforementioned hypotheses be proven true, this study will have served to further inform U.S. cybersecurity policy by highlighting the emerging need for lawmakers to pass regulation requiring government agencies, the critical infrastructure sectors in particular, and ISPs to develop and employ anti-virus/malware tools that include steganalysis scanning software on all CIKR networks. Additionally, this study will attempt to link previously published findings on digital steganography used in conjunction with malware and demonstrate that digital steganography is, in fact, actively being used by cyber threat actors as an advanced anti-virus/malware detection evasion technique.
The fundamental question in cybersecurity today is how best to design a secure information system that is defensible against all manner of cyber attack threats. While technology may never be able to completely protect against every cyber threat, it is a worthy endeavor nonetheless for the sake of protecting proprietary and national security information. Cyber defense and resiliency are predicated on being able to successfully withstand and recover from a cyber attack. Invisible cyber threats, such as malware that uses digital steganography to cloak its presence and evade malware detection software tools, pose significant risks to any IS. If an adversary is able to penetrate a network successfully and install, without raising suspicion, malware that uses digital steganography to hide its presence, then the network and all data contained therein should be considered entirely compromised. If there were a way to detect malware infused with digital steganography more accurately using real-time network anti-malware scanning software, and if its use were then required for all CIKR IS with noncompliance punishable by law, then arguably the overall cybersecurity and resiliency of the nation’s CIKR would be significantly enhanced.
Definition of Unclear Terms
CIKR: Critical Infrastructure and Key Resources (e.g., energy, water, emergency services).
Digital steganography: the art and science of digitally hiding information in computer file formats using mathematical algorithms to compress and embed data within a cover medium.
Encryption: the conversion of plaintext data to ciphertext data using mathematical algorithms to obfuscate data contents as well as to maintain the confidentiality and integrity of the data.
IS: Information System (i.e., a computer system, a network, a standalone computer system).
ISP: Internet Service Provider (e.g., Google, Yahoo, Facebook, Twitter, Time Warner Cable).
Malware: malicious software specifically designed to exploit vulnerabilities in software code and damage or exfiltrate data from computer information systems.
Steganalysis: the study of detecting the presence of steganography.
Stego-file: a file containing hidden files that are embedded within a cover medium file using digital steganography techniques.
Vulnerability: a flaw or weakness in software code that can be exploited by an attacker.
Limitations of the Study
• The current study is based in part on information that has been previously published in academic and peer-reviewed journals. Therefore, the study is not representative of all the reported and unsubstantiated information that is available on the Internet.
• The current study is not inclusive of any classified information regarding malware evasion techniques used by cyber threat actors that U.S. Intelligence Community (IC) activities may or may not possess that could potentially change the outcome of this study.
• The current study does not legally allow for the ability to attempt to upload actual malware known to incorporate digital steganography to a targeted system to test anti-virus/malware scanning defenses against this type of cyber threat.
Assumptions
• Digital steganography is increasingly being used by creators of malware specifically to evade anti-virus/malware detection tools.
• U.S. CIKR IS and ISPs are not currently scanning for digital steganography.
• It is still possible to upload stego-files to popular social media platforms such as Facebook, YouTube, and Twitter.
History of Digital Steganography
It is essential to approach steganography from its origins to appreciate how it functions and how it has been cleverly applied throughout history before one can fully comprehend how digital steganography is being used in the twenty-first century as an advanced malware detection evasion technique. The word steganography is derived from ancient Greek, in which “steganos” translates as covered or hidden and “graphy” translates as writing or drawing; together the two words mean “covered” or “hidden” writing (Warkentin, Schmidt, & Bekkering, 2008, p. 17). In ancient Greece, primitive applications of steganography included tattooing a message on a messenger’s scalp and allowing the hair to grow back and completely cover the message before dispatching the messenger to the intended recipient (Yugala & Rao, 2013, p. 1629). The ancient Romans devised a different type of primitive steganography that involved writing secret messages between the lines of scrolls using common substances such as fruit juice, urine, and milk as invisible inks that would darken and become legible when heated (Yugala & Rao, 2013, p. 1629). The preponderance of documented historical cases appears to demonstrate that the vast majority of steganography users were antagonists rather than protagonists. While this may, in fact, illustrate a connection between secretive communications and nefarious intent, it could also reflect society’s unquenchable fascination with the spy-thriller scenario that many people suspect but that is seldom confirmed. These suspicions are deeply rooted in real-life events such as the World War I case in which a German spy was discovered to have sent a seemingly unintelligible cryptic message that used a form of steganography whereby only the second letter of each word was used to form a secret message about General Pershing sailing from New York on June 1 (Yugala & Rao, 2013, p. 1629). Additionally, a World War II (WWII) spy movie included a steganography technique that involved embedding a secret message within a particular musical note such as B-flat (Adamy, 2012, p. 48).
In the twenty-first century, digital steganography can then be logically described as the scientific method of using computers and mathematical compression algorithms to digitally embed covert messages within a cover file such as an image, audio, or video file. Due to its similarities with cryptography, steganography has on occasion been called the ‘dark cousin’ of cryptography for its use of algorithmic codes to compress and hide data while providing secrecy as opposed to cryptography’s primary objective of privacy (Wingate et al., 2007, p. 177). When combined with cryptography, digital steganography can be a dangerous digital weapon because it is both unseen and not able to be read if detected by steganalysis (EC-Council, 2010, pp. 1–13).
A plethora of scholarly work has been published on digital steganography and the seemingly unending variations of different applications of the technology. More recently, however, reports have begun to emerge that link digital steganography with malware. Preliminary work on the topic of digital steganography as an advanced malware detection evasion technique has been limited to reviewing recently published works and organizing the data in such a manner that will show it is a potentially dangerous technique employed in the wild that developed nations should be aware of. There have been several notable instances of digital steganography that have served to bring it into the spotlight even if only momentarily. Terrorist organizations have used digital steganography to communicate in secret. Both the Abu Nidal and Al Qaeda terrorist organizations reportedly used digital steganography to hide terrorist material embedded within pornographic video files, ordinary images, and emails (Yugala & Rao, 2013, p. 1633). An internationally-led law enforcement operation dubbed “Operation Twins” resulted in the takedown of a UK-based child pornography ring known as the “Shadowz Brotherhood” whose members used encryption and digital steganography to transmit and upload their pedophilia imagery online (Zielińska, Mazurczyk, & Szczypi, 2014, p. 88). In 2010, Russian spies were found to have used digital steganography to steal classified information and transmit it back to Russia via the Internet (Wendzel et al., 2014, p. 2).
A Brief Explanation of How Steganography Works
Steganography is broken down into three classification types: technical, linguistic, and digital (EC-Council, 2010, p. 1–3). All of these methods can be extremely effective for communicating secretly, as they are nearly impossible to spot and decipher with the naked eye in the wild. It is outside the scope and intent of this paper to cover every type of digital steganography technique. Instead, this study focuses only on those techniques that have been discovered to have been employed in conjunction with some form of malware proliferation.
Technical steganography involves hiding information using specialized techniques such as invisible ink and microdots, tiny dots the size of a period (approx. 1 mm in diameter) that contain an entire page worth of information (EC-Council, 2010, p. 1–3). The Nazis used microdot technical steganography during World War II to secretly communicate information across the global battlefronts, and the French resistance also used technical steganography to send secret messages written in invisible ink on the backs of couriers (Hay, 2015, p. 10).
Linguistic steganography uses language codes to hide messages in the open. Semagrams, which may be visual or textual, convey a particular meaning to those who know how to read them through signs, symbols, or the rearrangement of letters, font types, and font sizes (EC-Council, 2010, p. 1–4). Open codes make use of previously published text, such as that found in a newspaper, or of previously agreed-upon jargon codes that only the participants involved can decipher. Additionally, linguistic steganography makes use of covered ciphers, in which data is hidden within a graph or a diagonally ordered crossword puzzle, or is revealed by a custom-made grille cipher that, when placed over a particular written publication, blocks out every letter except those that are part of the hidden message (EC-Council, 2010, p. 1–4). These types of steganography schemes are known as stegosystems. Most stegosystems share four common elements: an embedded or hidden message; a cover medium that obscures the secret message from unintended recipients who inadvertently view it; a stego-key that is used to recover the hidden message; and, lastly, the stego-medium, which is the end result of combining the cover medium with the embedded covert message (EC-Council, 2010, p. 1–2).
Digital steganography takes steganography to a much more sophisticated level by leveraging computers and mathematical compression algorithms to digitally embed hidden messages within virtually any type of computer file format. The importance of digital steganography cannot be overstated, as the personal computer combined with freely available digital steganography applications has brought the ability to communicate secretly to billions of computer users worldwide. In countries that censor the Internet and closely monitor the online communications of their citizens, digital steganography is a profoundly important technology that facilitates private communication. There are several different forms of digital steganography. A few of the more notable methods are Least Significant Bit (LSB), spread-spectrum encoding, and Graphstega (EC-Council, 2010, p. 1–5).
Generally speaking, a single byte of data consists of eight bits, of which the last or far-right bit (a byte is read from left to right) is considered the least significant bit (e.g., the final 1 in 00101101) (“CS 101,” n.d.). A bit can be either “on” or “off,” which translates to a “1” for “on” or a “0” for “off.” Data in binary form may seem unnecessarily complicated for human reading, but modern computers with high-speed processors read binary at incredible speeds, and a “1” or a “0” in a different position of the byte sequence can have an entirely different effect, which is what makes it possible to embed a file or image. Altering the least significant bits hundreds or thousands of times throughout an entire file such as an image (e.g., 00101101 is changed to 00101100), replacing those bits with the bits of the hidden data, will not produce a “noise” error rate that is visually or audibly detectable by human senses, because each altered byte changes in value by at most one. This method of digital steganography is commonly referred to as LSB (EC-Council, 2010, pp. 1–5 to 1–6).
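The LSB substitution described above can be illustrated with a minimal Python sketch. It is not taken from any cited tool; the function names embed_lsb and extract_lsb are hypothetical, and the cover is assumed to be a plain sequence of raw bytes (for example, uncompressed pixel values) rather than a real image file format.

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Hide payload bits in the least significant bit of each cover byte."""
    # Unpack the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload needs one cover byte per hidden bit")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the hidden bit.
        stego[i] = (stego[i] & 0b11111110) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, payload_len: int) -> bytes:
    """Rebuild payload_len bytes from the least significant bits of stego."""
    out = bytearray()
    for i in range(payload_len):
        value = 0
        for cover_byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (cover_byte & 1)
        out.append(value)
    return bytes(out)


# Illustrative cover: sixteen copies of the example byte 00101101.
cover = bytes([0b00101101] * 16)
stego = embed_lsb(cover, b"Hi")      # 2 bytes = 16 hidden bits
recovered = extract_lsb(stego, 2)
```

Each cover byte differs from its stego counterpart by at most one, which is why the change is so hard to perceive. The sketch deliberately omits encryption and any record of the payload length, both of which a practical tool would need.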
Spread-spectrum digital steganography makes use of randomly inserted “noise” in the audio data packets to hide data that is spread across the entire available frequency spectrum (EC-Council, 2010, p. 1–11). The composition of digital Radio Frequency (RF) signals allows for many opportunities to hide information (Adamy, 2012, p. 48). The art of digital steganography involves manipulating digital file formats in such a way as to secretly embed information without creating noticeable noise that can be detected. Video files contain both image and audio data which make video file formats ideal candidates for digital steganography due to the fact that they are typically larger file sizes that allow for a greater amount of “noise” injection that will cause some degree of distortion but that will ultimately be unnoticeable to the human eyes and ears (EC-Council, 2010, p. 1–11).
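One way to picture spreading hidden data across a signal is the direct-sequence style sketch below. It is an illustration under stated assumptions rather than the method of the cited source: the cover is a synthetic sine wave instead of real audio, and the names pn_sequence, embed, and extract, the spreading factor CHIPS_PER_BIT, and the strength ALPHA are all hypothetical choices.

```python
import math
import random

CHIPS_PER_BIT = 4096  # assumed spreading factor: samples that carry one bit
ALPHA = 0.1           # assumed embedding strength (cover amplitude is 1.0)


def pn_sequence(key: int, length: int) -> list:
    """Pseudo-random +1/-1 sequence that both parties derive from a shared key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(cover: list, bits: list, key: int) -> list:
    """Add each bit to the cover as low-amplitude, noise-like chips."""
    if len(bits) * CHIPS_PER_BIT > len(cover):
        raise ValueError("cover signal is too short for this payload")
    chips = pn_sequence(key, len(bits) * CHIPS_PER_BIT)
    stego = list(cover)
    for i, chip in enumerate(chips):
        symbol = 1 if bits[i // CHIPS_PER_BIT] else -1
        stego[i] += ALPHA * symbol * chip
    return stego


def extract(stego: list, n_bits: int, key: int) -> list:
    """Recover bits by correlating each segment with the keyed sequence."""
    chips = pn_sequence(key, n_bits * CHIPS_PER_BIT)
    bits = []
    for b in range(n_bits):
        start = b * CHIPS_PER_BIT
        corr = sum(stego[start + j] * chips[start + j] for j in range(CHIPS_PER_BIT))
        bits.append(1 if corr > 0 else 0)
    return bits


# Synthetic cover: a 440 Hz tone sampled at 44.1 kHz.
cover = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(8 * CHIPS_PER_BIT)]
secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(cover, secret, key=1234)
recovered = extract(stego, len(secret), key=1234)
```

Because the chips look like random noise and the receiver needs the key to regenerate them, the hidden bits are spread thinly over many samples instead of sitting in one obvious place. Recovery here relies on the long chip sequence averaging out the cover signal.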
Graphstega is short for graphical steganography, a textual form of digital steganography that uses textual fabrication techniques within graph cover media to conceal secret messages (Desoky & Younis, 2008, p. 27). Whereas digital image steganography produces a certain detectable level of “noise,” textual digital steganography is noiseless. With Graphstega, the secret message is digitally embedded as data points (e.g., numerical percentages, quantities) within the graph (Desoky & Younis, 2008, p. 28). Graphstega possesses several characteristics that make it more advantageous than other forms of digital steganography. For example, it is completely noiseless, and it does not need a stego-key to password- or passphrase-protect the stego-file (Desoky & Younis, 2008, p. 28). Unlike digital image steganography, Graphstega is also not vulnerable to comparison attacks, and it is less likely to raise suspicion since graphs are commonly displayed in books, newspapers, magazines, and Internet websites (Desoky & Younis, 2008, p. 28). Because so many different types of graphs can carry the hidden message, including pie charts, bar graphs, scatter plots, histograms, bubble charts, and many more, the accessibility of Graphstega is off the “charts” (pun intended). The ubiquity of Microsoft (MS) Office applications such as MS Word, MS Excel, and MS PowerPoint makes Graphstega highly desirable as a covert channel because of its extremely low cost to produce and its resilience to steganalysis attacks when data subsets are used (Desoky & Younis, 2008, p. 28). There are limitations with Graphstega that should be noted, however. The encoded data points must be consistent with plausible data levels for whatever type of graph is selected as a cover medium. Additionally, the text that accompanies the chosen graph should support what the graph depicts to avoid suspicion.
Another limitation of Graphstega is the amount of data that can be concealed within the graph cover which is not well-suited for concealing large messages (Desoky & Younis, 2008, p. 31).
Methods of Employing Digital Steganography
The recorded incidence of malware proliferated on the Internet has increased dramatically in recent years, a sign that points to the use of automated tools by malware creators. In 2012, there were approximately 100 million known malicious program file signatures that anti-virus/malware (AV) software scanned for on a subscriber’s computer (AV Test, 2018). By 2018, that number had grown roughly sevenfold to well over 700 million unique malware signatures, which makes AV scanning software clunky and cumbersome (AV Test, 2018). Recent studies have shown a trend toward the incorporation of digital steganography into malware to hide its existence on a network (Wendzel et al., 2014, p. 1). Malware developers have begun exploiting the commonly used Transmission Control Protocol/Internet Protocol (TCP/IP) suite by burying stego-malware in obscure Internet protocols where it is unlikely to be noticed (Hay, 2015, p. 10). Digital steganography uses compression algorithms to hide secret data in slack space and other areas of commonly used file formats in such a manner as to not affect the overall content, which makes it nearly impossible to detect (Hay, 2015, p. 11). The question remains whether this combination of malware and digital steganography could be used in a wide-scale cyber attack to cripple a nation’s critical infrastructure and, if so, which APT groups would have the necessary resources to launch this sort of attack against U.S. CIKR.
One of the most severe threats to any organization is the insider threat, because this type of attacker already has physical access to the facility where the IS resides and most likely has also been granted network access to perform work duties. Digital steganography in the hands of a malicious insider who already has access to an organizational information system can be a devastatingly effective tool unless security controls are implemented that prevent general users from installing software programs (.exe files). In 2008, the U.S. Justice Department was the victim of a theft of sensitive financial information that a malicious insider exfiltrated from the network using digital steganography (Wendzel et al., 2008, p. 2). The malicious insider most likely knows the file and folder structure and therefore knows exactly where the valuable information is stored on a network. Security controls such as least privilege enforced through role-based access control (RBAC), encryption of data at rest, disabling of USB ports and optical drives, and discretionary access control (DAC) on shared network folders can drastically limit the damage that a malicious insider can do. However, these security controls by themselves would not necessarily prevent a malicious insider from downloading and installing a steganography application on the network unless that specific action was disabled for general users. Another protective measure that can be applied to information systems that process sensitive or classified government information is to “air-gap” the network entirely by logically and physically segregating it from the Internet. A legitimately air-gapped network is not connected to the Internet in any way and therefore is not remotely accessible by an attacker.
The only way in which a malicious insider or attacker could exfiltrate data from a genuinely air-gapped network would be to physically steal a Hard Disk Drive (HDD) from a computer or server and smuggle it out of the facility.
The Internet was initially created as an instrument of academic research collaboration, not necessarily with security in mind. Similarly, in the early days of software development, security was not a significant consideration because hacking was not yet a recognized threat. Had security been a primary concern at the time the Internet was created, Internet protocols would have been designed much more securely, and sensitive networks would have been air-gapped entirely. Instead, the Internet continues to grow exponentially and, defying all plausible logic, some U.S. CIKR Industrial Control Systems (ICS) remain visibly connected to the open Internet for unexplained reasons, as can be seen using free tools such as the website <https://www.shodan.io>, whose results can be paired against known malware exploits. The reality is that most networks are connected to the Internet in some fashion for business or collaborative purposes. It is extremely common for companies to have an online presence to sell products via a Web server that may be strategically placed inside a Demilitarized Zone (DMZ), with a firewall preventing intruders who lack appropriate login credentials for the company Web server from getting past the outer network perimeter. The internal network, or Intranet, may also be protected by a separate firewall that is configured to deny most inbound traffic to the Intranet and prevent critical data from being exfiltrated outside of it. These types of network configurations are relatively basic and commonplace within a significant number of organizations. However, the security controls mentioned do not necessarily prevent an Intranet user from inadvertently or maliciously downloading stego-malware from the Internet that passes through the firewalls and infects the company Intranet.
Network digital steganography is a newer phenomenon that has been receiving greater attention from the digital steganography developer and researcher communities. Network traffic is a very desirable carrier for covert channels such as digital steganography because the high volume of traffic on any given network at any given time makes it very unlikely that hidden data will be noticed. Hidden data can be embedded in available fields of network protocol headers (e.g., HTTP, POP3, ICMP, SMTP) or accommodated by slightly increasing the size of network packets (Wendzel et al., 2014, p. 2). The possibilities involving digital steganography are limited only by the imagination, as virtually any technology or protocol is capable of being exploited. In fact, digital steganography has evolved to the point where developers have created applications that can exploit Cloud services (using side-channel attacks to exfiltrate data), Wireless Local Area Networks (WLANs), voice-over IP (VoIP) telephony network traffic (using transcoding steganography), Quick-Response (QR) codes, smartphones, and even popular peer-to-peer (P2P) services such as Skype (Wendzel et al., 2014, pp. 3–5). Digital steganography developers are also actively improving the Peak Signal-to-Noise Ratio (PSNR) of their stego-files. PSNR measures how closely a stego-file matches its original carrier; the “noise,” of course, is the result of embedding hidden data within an otherwise innocuous carrier file. The higher the PSNR, that is, the less noise the embedding introduces, the less likely the stego-file is to be discovered (Wendzel et al., 2014, p. 7).
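PSNR is conventionally computed from the mean squared error (MSE) between the original carrier and the stego-file as PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible sample value, so a higher PSNR means less distortion. The Python sketch below applies that standard formula to two hypothetical embeddings; the function name psnr and the synthetic 8-bit cover are assumptions made for illustration.

```python
import math


def psnr(cover: bytes, stego: bytes, max_value: int = 255) -> float:
    """Peak Signal-to-Noise Ratio in decibels between two equal-length byte strings."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical files: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Synthetic 8-bit cover containing every possible sample value once.
cover = bytes(range(256))

# Light embedding: only the least significant bit of each byte is disturbed.
light = bytes(b ^ 0b00000001 for b in cover)

# Heavy embedding: the four lowest bits of each byte are disturbed.
heavy = bytes(b ^ 0b00001111 for b in cover)

light_db = psnr(cover, light)
heavy_db = psnr(cover, heavy)
```

The light embedding scores roughly 48 dB, while the heavy one drops to under 30 dB, which shows how packing more hidden data into the same carrier lowers the PSNR and makes the distortion easier to notice.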
Another possible avenue of network protocol exploitation is the commonly used Domain Name System (DNS) protocol. DNS is part of the TCP/IP protocol suite and translates the domain names used in website Uniform Resource Locator (URL) addresses into their respective IP addresses, and vice versa, so that users can easily remember and identify website addresses. The critical nature of DNS makes it difficult for system administrators to restrict the protocol, lest doing so cause adverse network performance issues (Drzymala, Szczypiorski, & Urbanski, 2016, p. 343). This inherently loose configuration allows for malicious manipulation, and it was exploited by the W32.Morto malware, which used text records within the DNS protocol to communicate with its command-and-control (C&C) server (Drzymala et al., 2016, p. 343). W32.Morto would send the DNS server a query for a text record instead of the typical IP address query the server usually receives. The DNS server would then reply with text that was automatically decrypted and that contained further malware code for subsequent downloading by the infected computer (Drzymala et al., 2016, p. 343). It is also possible to manipulate packet headers by inserting two bytes (i.e., 16 bits) of hidden data in the Answer.IP DNS header flag field that carry an instruction to launch a Distributed Denial of Service (DDoS) attack, in which botnet-infected zombie computers continuously flood a victim IP address with traffic (Drzymala et al., 2016, p. 345). The data embedded in the DNS packet header fields could carry any manner of hidden malware instruction; a DDoS attack is only one example.
The larger the chosen carrier file, the more suitable it is for embedding a large amount of secret data using digital steganographic techniques. This concept is perhaps better explained by comparing the file size of an image file with the file size of a video file. Depending on its resolution, a high-resolution image file (.jpg) may be 5–10 Megabytes (MB) in maximum file size. A video, on the other hand, can be hundreds of MB or even up to 4.7 Gigabytes (GB) on a single-sided Digital Versatile Disc (DVD). Secretly hiding data within an image file without causing noticeable image quality distortion requires the use of a digital steganography application that will compress and optionally also encrypt the hidden data with a password or passphrase so that if the stego-file is discovered, the hidden data cannot be accessed without the accompanying password or passphrase. If the embedded data is too large, then it will drastically affect the PSNR, which could distort the carrier image file and raise suspicion. Steganalysis techniques are expensive and time-consuming but can be used to detect and analyze stego-files. If an adversary detects the presence of steganography, encrypting the hidden data within the carrier file provides another layer of protection against the adversary discovering the secret information (Adamy, 2012, p. 47).
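The link between carrier size and hiding capacity can be approximated with simple arithmetic. The sketch below assumes least-significant-bit (LSB) embedding in uncompressed RGB pixel data, which is only one of many possible embedding methods and is not specified by the sources cited above; the resolution and frame count are hypothetical.

```python
def lsb_capacity_bytes(width, height, channels=3, bits_per_channel=1):
    """Upper bound on hidden bytes when the lowest bits of every color channel are replaced."""
    return width * height * channels * bits_per_channel // 8

# A single 1920x1080 RGB image, one hidden bit per color channel.
image_capacity = lsb_capacity_bytes(1920, 1080)

# One minute of uncompressed 30 frames-per-second video at the same resolution.
frames = 30 * 60
video_capacity = image_capacity * frames

print(f"image: {image_capacity:,} bytes")
print(f"video: {video_capacity:,} bytes")
```

Under these assumptions the image can hide at most one eighth of its raw pixel data, and the video's capacity grows in proportion to its frame count. Real carriers are usually compressed, so practical capacity is lower, and using more bits per channel raises capacity at the cost of a larger PSNR reduction.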
The literature review of relevant academic journal articles indicates that security researchers are focusing on various network protocols commonly used in the seven layers of the Open Systems Interconnection (OSI) stack that are capable in some way of being exploited by digital steganography techniques. In the same way that malware developers have been known to monitor updates to the MITRE Common Vulnerabilities and Exposures (CVE) list, it should also be assumed that malware developers are reading these same academic journals and publications for new ideas on how to take advantage of newfound network technology vulnerabilities. This is yet a further example of the duality of security research, which can be used for both positive and negative applications. Network steganography is a sub-field of study within the broader digital steganography field. It is cutting-edge research and is even more highly technical than other areas of digital steganography. Network steganography requires a basic level of understanding of how networks operate not just on the Network Layer 3 of the OSI stack, but on all of the various layers and protocols that can potentially be used. Nearly every protocol currently in existence can be subverted for use by digital steganography applications; the developer of the tools need only customize the digital steganography techniques and code to the specific protocol being targeted. If these software tools are never publicly released on the Internet, then each one essentially becomes a zero-day exploit for which anti-virus/malware software has no application signature on which to base its scans. Newer technologies that have been exploited by digital steganography include popular Internet services such as Skype (SkyDe), BitTorrent (StegTorrent), and Google Suggest (StegSuggest) (Sekhar et al., 2015, p. 767).
Even Quick Response (QR) codes commonly found on commercial packaging are basically just digital images that can be exploited by digital steganography along with the Wireless Local Area Network (WLAN) Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard protocols by employing the “WiPad” steganographic technique (Sekar et al., 2015, p. 767).
Smartphone technology is rapidly becoming the next platform that digital steganography has begun exploiting. In the year 2018, it is common for people to take their smartphones everywhere and use them for just about everything, including social media, map directions, telephone calls, personal financial management applications, accessing Internet Web sites, entertainment, personal photos and videos, gaming, organization, and much more. Smartphones are the way of the future, and technology experts have predicted that they will replace desktop and laptop computers entirely. People are increasingly less inclined to sit behind a laptop or desktop computer and more inclined to use the small form factor computer at their fingertips, the smartphone, while they go about their day. Considering that there are an estimated 1.08 billion smartphone users globally, smartphones are an attractive target that offers a broad gamut of wireless protocols and services that digital steganography applications and malware developers can exploit, such as Bluetooth, 3G, 4G, 5G, Global Positioning System (GPS) signals, VoIP, and more (Mazurczyk & Caviglione, 2015, p. 334). VoIP telephony network traffic is not commonly monitored, which makes it a prime candidate for digital steganography. The “MoshiMoshi” botnet was discovered to have used VoIP combined with digital steganography as a covert communication channel to the C&C server, where the botmaster issued commands to the infected bots by encoding hidden messages within data sent to specific VoIP phone numbers (Soltani, et al., 2014, p. 124). Virtually any protocol or technology used by computer networks and smartphones can be exploited by some type of digital steganography technique.
Recent trends reported by the anti-virus/malware company McAfee have shown an astonishing 1,800 percent increase in smartphone malware, which is startling considering that many users lack even basic AV software on their mobile devices (Mazurczyk & Caviglione, 2015, p. 90).
Another criminal application of digital steganography has been its use in hiding the distribution of child pornography material on the Internet. Peer-to-peer (P2P) network protocols such as Gnutella and BitTorrent have been abused to share child pornography content for decades. P2P networks have been abused on such a large scale that law enforcement (LE) lacks the proper resources to track down and prosecute the millions of IP addresses involved in sharing the illegal imagery and videos (Liberatore, Levine, & Shields, 2010, p. 1). The problem lies not so much in identifying the specific IP addresses involved as in the sheer multitude of those addresses and the fact that many users mask their IP address by using anonymizing tools such as Tor, I2P, or Virtual Private Networks (VPNs) (Liberatore, Levine, & Shields, 2010, p. 2). Given the time and resources it takes law enforcement to build a criminal case against, track down, and prosecute the individuals involved in the criminal trafficking of child pornography online, each case is similar to casting a small pebble into an ocean of suspects. The suspects could potentially be insulated further by the fact that they operate outside of U.S. legal jurisdiction, from a country that does not recognize international LE extradition or cooperate legally with the U.S. (e.g., North Korea, Iran, China, Russia). Although P2P protocol exploitation is not a specific example of malware that incorporates digital steganography, the model serves to illustrate further how digital steganography can be used in various modalities.
Another alarming application of digital steganography that has perilous risk implications involves hiding an entire TrueCrypt encrypted volume container inside of a .mp4 video file using a Python language application called “tcsteg.py” that was developed by a German software engineer named Martin Fiedler (Drager, 2011). For this technique to work optimally, the carrier .mp4 video file should be roughly equal in size to the TrueCrypt container so that the “tcsteg.py” application can successfully embed the entire container-within-a-container inside of the .mp4 video file (Drager, 2011). The potential implications of an attacker being able to steganographically hide an entire volume of encrypted data are both serious and eye-opening to LE and IC agencies in the wake of the malicious insider leaks of classified material carried out by Edward Snowden and Chelsea Manning.
Legitimate Applications of Digital Steganography
Digital steganography is not well known by the layperson. It is usually remembered only for its illicit applications, and steganography, generally speaking, is often a misunderstood concept due to its inherent complexity. The majority of the population, including many IT professionals, consider digital steganography to be an obscure encryption technique when, in fact, it is a digital compression and data embedding technique that is rooted in higher-order mathematical algorithms and that may also allow for adding strong encryption of the hidden data. There are several legitimate applications of digital steganography, such as digital watermarking of copyrighted media and communicating in secret to evade government monitoring or Web censorship. Perhaps no industry has incorporated digital steganography more than the entertainment industry, which uses it to facilitate Digital Rights Management (DRM) and digital watermarking of audio and video media in order to combat piracy of digital media (Yugala & Rao, 2013, p. 1629). Media such as images created by an artist, music, video games, and movies are digitally watermarked through the use of advanced computer software applications (Yugala & Rao, 2013, p. 1629). If a person is suspected of pirating digital media, investigators will quickly be able to determine whether the media is authentic by checking whether it contains the appropriate digital watermark. If the media in the person's possession lacks the corresponding digital watermark, then it can confidently be determined to be a counterfeit reproduction and therefore subject to legal recourse.
Digital steganography can also be used legitimately to hide the existence of encrypted communications (Wendzel et al., 2014, p. 1). If encrypted data is transmitted across the Internet and an adversary is monitoring it, the adversary will be able to determine who is receiving the encrypted data packets since the TCP/IP packet header contains the source and destination IP addresses. However, if digital steganography is used to hide the encrypted data contents within a plaintext message, image, audio, or video file, then the adversary will not automatically know that there is secret information being transmitted to the destination IP address(es). As far as the adversary can tell, the traffic could just as well be fake cover documents designed to mislead. In this manner, digital steganography can be used legitimately as a covert channel to conceal the existence of encrypted communications (Wendzel et al., 2014, pp. 1–2). It is predicted that network-based digital steganography will proliferate to other domains, such as the command and control (C&C) of botnets involving smart buildings and cities, which has raised concerns about U.S. CIKR vulnerabilities (Wendzel et al., 2014, p. 8).
Evading Malware Detection Using Digital Steganography
Which specific APT groups are using digital steganography to evade malware detection is much less important than the fact that this practice has been discovered on multiple occasions, thus confirming that it is a serious cyber threat that warrants monitoring. The lack of clear attribution may be partly due to a lack of centrally located malware reporting, with each AV company, LE agency, and IC agency maintaining its own repository. However, it is more likely due to the difficulty of cyber attribution for each unique piece of malware discovered on the Internet. APT groups with skilled malware authors are not typically in the habit of leaving a breadcrumb trail that makes it easy to attribute malware back to its source. The alarming trend and the basis of this study is that Black Hat hackers have recently begun incorporating digital steganography into their arsenal of malware weapons (Zielińska, Mazurczyk, & Szczypi, 2014, p. 88).
The “Regin” Trojan is a highly sophisticated piece of malware that has been used for cyber espionage against individuals, corporate and government organizations, and researchers since at least 2008, and it remained undetected for six years until 2014 (Mazurczyk & Caviglione, 2015, p. 89). Such highly sophisticated examples of malware are designed for custom modification even after deployment by APT groups that may be spying on behalf of Nation-states. Malware creators have perfected their techniques to the point of being able to remotely control the malware from C&C servers and then, later on, remotely customize its code to succeed against emerging defenses in its operational environment (Mazurczyk & Caviglione, 2015, p. 89). The days of simple Morris worms and Melissa viruses (Potter, 2009) are over. A new era has dawned: an era of polymorphic malware that hides itself using digital steganography and spreads itself invisibly through insecurely designed network protocols. The time and resources invested in developing this highly sophisticated Nation-state APT malware are matched by the effort invested in hiding its existence. No APT group wants to invest substantial work hours, money, and effort into developing a sophisticated malware toolkit only to have it be discovered by adversaries or an AV software company within a few weeks of initial deployment. There can be no mistaking the fact that APT-level malware such as Regin is purposely designed to avoid detection for as long as possible through information hiding techniques that rely on digital steganography. “Linux.Fokirtor,” “Feederbot,” and “Smuggler” are additional examples of digital steganography-infused malware that exfiltrated data invisibly under the guise of normal network traffic (Mazurczyk & Caviglione, 2015, p. 90).
The “Soundcomber” and “AirHopper” malware are specifically engineered to exploit mobile devices by leaking data such as banking application Personal Identification Numbers (PINs), either by using ultrasonic sound waves to reach a receiving device up to 30 meters (~98 feet) away at a rate of 9 bits/second, or by emitting electromagnetic signals from the device's graphics processing unit (GPU) at a rate of 100 to 500 bits/second up to 7 meters (~23 feet) away (Mazurczyk & Caviglione, 2015, p. 91). Symantec discovered the “Linux.Fokirtor” malware in 2013. It was a Trojan that secretly embedded its communications to and from its C&C servers in innocuous network protocols like secure shell (SSH) via TCP port 22, but not before first encrypting the transmitted data using the Blowfish encryption algorithm (Mazurczyk & Caviglione, 2015, p. 92).
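The covert-channel rates quoted above are slow, and a short calculation shows what they imply in practice. The sketch below is illustrative arithmetic only: the payload sizes are hypothetical, a four-digit PIN is assumed to be sent as four 8-bit characters, and protocol overhead and transmission errors are ignored.

```python
def transfer_seconds(payload_bytes, bits_per_second):
    """Time needed to send a payload over a channel with the given raw bit rate."""
    if bits_per_second <= 0:
        raise ValueError("bit rate must be positive")
    return payload_bytes * 8 / bits_per_second

# A four-digit PIN (4 bytes) over the 9 bits/second ultrasonic channel.
pin_seconds = transfer_seconds(4, 9)

# A 1 MB document over the fastest quoted electromagnetic rate of 500 bits/second.
file_seconds = transfer_seconds(1_000_000, 500)

print(f"PIN: {pin_seconds:.1f} s")
print(f"1 MB file: {file_seconds / 3600:.1f} h")
```

Under these assumptions a PIN leaks in a few seconds, whereas a 1 MB file would take several hours, which suggests that such channels are suited to small, high-value secrets such as credentials rather than bulk data exfiltration.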
In 2011, “Operation Shady RAT” was discovered to be a highly successful piece of malware that was distributed by cybercriminals via phishing emails designed to trick recipients into opening the message. Once opened, the malware implanted a Remote Access Trojan (RAT) on the infected computer and further downloaded Hyper Text Markup Language (HTML) or .jpeg image files that were able to bypass firewalls. The HTML or .jpeg files contained special control commands that contacted C&C servers, which subsequently downloaded executable malware code allowing cybercriminals to steal files from the infected computer (Zielińska, Mazurczyk, & Szczypi, 2014, p. 88). The primary program within the “Operation Shady RAT” malware that was later found to be responsible for infections was an email phishing virus labeled “Trojan.Downbot” that created backdoors on an infected host machine through which HTML pages and images were downloaded that contained hidden commands allowing it to access key operating system files (Mazurczyk & Caviglione, 2015, p. 90). Not coincidentally, 2011 was a year that saw an incredible number of high-level hacks, including the discovery of the “Alureon” malware and the “Duqu” worm by security researchers who, after reverse engineering their respective codes, found Duqu to be very similar to the Stuxnet worm. However, unlike the Stuxnet worm, which contained a destructive malware payload, the Duqu worm was specially engineered to be an information collection agent and is primarily considered by security experts to have been a precursor to Stuxnet (Mazurczyk & Caviglione, 2015, p. 92). Both Duqu and the Alureon malware bundled the stolen data into common image files using digital steganography so as not to raise suspicion as the files containing the stolen data were quietly exfiltrated under the guise of ordinary network traffic (Zielińska, Mazurczyk, & Szczypi, 2014, pp. 86–88).
A French security researcher discovered in 2014 that the “Zeus” malware had a variant known as “ZeusVM” that used digital steganography to hide commands being sent to infected hosts via C&C servers (Hay, 2015, p. 11). Shortly after this discovery, a different security researcher discovered that the “Lurk campaign” had incorporated the ZeusVM malware and was using its digital steganographic code to drop other malware onto infected hosts (Hay, 2015, p. 11). Much like the “Trojan.Downbot” malware program that used digital steganography to hide encrypted URLs within image files, the “Lurk” malware employed similar techniques to download its malware payload (Mazurczyk & Caviglione, 2015, p. 92). “Stegoloader” is a particularly dangerous digital steganography application that is classified as malware itself because it allows malware authors to incorporate digital steganographic and counter-forensics techniques into the malware code to avoid AV software detection (Hay, 2015, p. 12). Malware authors have already begun to see the full potential of incorporating digital steganography into their malware code, a trend that will surely continue to expand over time as its popularity increases. Digital steganography is already known to have been used for data exfiltration (e.g., from the Department of Justice). It also enables botmasters to remotely and covertly communicate with infected hosts, and it has been used to sneak other types of malware across network defenses such as firewalls and anti-virus software (Hay, 2015, p. 12). Also in 2014, the “Trojan.Zbot” discovery showcased how the malware employed digital steganography to create a .jpeg image file on an infected computer that contained all of the victim's banking and credit card information for transmission back to its C&C servers (Mazurczyk & Caviglione, 2015, p. 335).
Social media networks such as Facebook, Twitter, Instagram, and YouTube all operate with enormous user bases, numbering over 2 billion users for Facebook alone. These user bases consist of personally created user accounts that may require as little as a first and last name, a chosen username, an email address, and a password to access the account. It is not surprising that many users do not use their real names on these types of social media platforms, whether for security reasons or because the user has something to hide. It is possible to write custom scripts that will automatically generate bot accounts with different usernames and user characteristics that are controlled by a botmaster. The apparent concern within this context is the human psychological tendency to be susceptible to peer pressure. The implications are rather substantial if enough of these bot accounts were controlled by a botmaster and used to sway popular opinion on political matters, as the Russian GRU Intelligence organization is alleged to have done during the 2016 U.S. Presidential election (Shane, 2017).
Traditionally, network forensic investigations have yielded evidence that botnets have primarily been controlled by botmasters via Internet Relay Chat (IRC) or Peer-to-Peer (P2P) network protocols (Venkatachalam, 2017, p. 6081). However, network security analysts struggle to detect and monitor malicious network traffic when digital steganography is used in combination with malware. The stego-malware can be used to scrape sensitive account information from databases and communicate it covertly between the “Stegobot” social media accounts and the botmaster by embedding the stolen data within images shared on social media platforms such as Facebook and Twitter or within the videos of YouTube (Venkatachalam, 2017, p. 6081). The potential for social media Stegobots to be used by adversary Nation-state Intelligence organizations, such as the Russian GRU, as a political mechanism to sway popular opinion on important issues is not a threat that should be taken lightly. The Russians have proven to be very adept at cyber espionage techniques in attempts to hijack several international democratic elections in the recent past, the 2016 U.S. Presidential election among them. In his testimony before Congress on May 8, 2017, former Director of National Intelligence (DNI) James Clapper, referring to the 2016 U.S. Presidential election tampering by Russian Intelligence, stated that, “If there has ever been a clarion call for vigilance and action against a threat to the very foundation of our democratic political system, this episode is it” (Calabresi, 2017).
The problem is exacerbated by the potential for normal user accounts to become infected with malware, perhaps when a user unsuspectingly clicks on a news article link within a Facebook post or even merely views an image, which downloads the image to the computer and thereby infects it with the hidden malware (Soltani, et al., 2014, p. 124). This phenomenon is reported to have happened to a verified Twitter user who stopped using her account in 2014 and later learned that, unbeknownst to her, the account had been hacked in November 2016 and used for political propaganda purposes to spam targeted Twitter users, such as Presidential candidate @realDonaldTrump, who could sway popular opinion during the 2016 U.S. Presidential election (Shane, 2017). Whether digital steganography played a part in this is not known, but it is certainly plausible that it could have been used in combination with malware to compromise social media accounts.
With a relatively small team of semi-skilled Intelligence analysts or hackers, perhaps belonging to an APT group, it would not be unimaginable for a Nation-state to construct an enormous Stegobot army of social media bot accounts over time that could be used as an instrument to “like” and “re-post” every political news story written with a certain spin or bias that is politically advantageous to the meddling adversary. If not detected and stopped by the social media platform system administrators or reported by suspicious users, the Stegobot army could grow exponentially and unabated, eventually consisting of tens or hundreds of thousands of Stegobot social media accounts. With this many social media accounts all being remotely controlled by a central authority possibly located far away from America, it is not difficult to imagine an “enemy of the [deep] state” situation in which the bot accounts could be used to “sway” popular opinion on political issues. Facebook and Twitter have reportedly identified and shut down thousands of suspected Russian bot accounts in the wake of the 2016 U.S. Presidential election (Shane, 2017). This type of cyber attack is known as an information warfare attack, and such attacks have been happening for several years between the U.S. and Russia, with both nations’ Intelligence Communities (IC) heavily involved in covert campaigns (Shane, 2017).
Protections against fake bot accounts on websites, such as the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), are not as foolproof as designers had intended. CAPTCHAs are easily circumvented by creative botmasters who can still script code that generates a significant number of fake social media accounts in short order (Venkatachalam, 2017, p. 6082). It is possible to analyze the feature attributes of a social media account profile to determine whether the user account is real or possibly a bot account. However, it is not feasible to conduct such a level of analytic scrutiny on a widescale basis without some automated scanning tool, such as one built on a platform's application programming interface (API) (Venkatachalam, 2017, p. 6083). Network forensic analysts can detect a Stegobot by utilizing a form of steganalysis that involves Discrete Cosine Transform (DCT) feature analysis, which compares the binary bit composition of image files against known digital steganography compression algorithms while also analyzing noise frequency to detect the presence of stego-file content (Venkatachalam, 2017, p. 6083).
Countering Digital Steganography with Steganalysis
Whereas the goal of steganography is to hide information in plain sight within cover files, steganalysis is the science of detecting steganography. More than twelve hundred steganography applications are available today, many of them free for download on the Internet, while significantly fewer steganalysis applications are available. This imbalance demonstrates the difficulty of successful digital steganography detection using the available steganalysis application tools. To detect the presence of steganography, or the “payload” as it is also called, steganalysis seeks to attack suspicious files using a technique known as the “blind detection technique,” which can be applied visually, structurally, or statistically (Wingate et al., 2007, p. 177). These three blind steganalysis detection techniques each offer a measure of accuracy in detecting a stego-payload. Sometimes merely detecting the presence of digital steganography in a carrier file is sufficient, because the file can then be altered, rendering the hidden information unreadable by the intended recipient (Wingate et al., 2007, pp. 177–178). For the steganographer, detection alone is equivalent to a compromise of secrecy.
Another steganalysis detection technique is signature-based detection, which is similar to the virus and malware signatures that AV software uses in scans to detect known viruses and malware on a protected computer. Fingerprints are checksums or hash values of a particular file computed using well-known algorithms such as MD5, CRC-64, or SHA-2 (Wingate et al., 2007, p. 178). File hashing or “fingerprinting” is useful for steganalysis purposes because it can be used to generate an irreversible one-way hash of a known original file that can be compared against the hash of a carrier file suspected of containing hidden files within it. If the file hashes are different, then a user knows that the file has been altered in some way. An altered file could represent the presence of digital steganography. Hashing has long been used by software vendors who post software on websites so that users can verify that a downloaded file is authentic by hashing it and comparing the two values prior to installation. It is possible, however, that an attacker could hack the website, replace the downloadable file with malware, and post a hash of the malware executable file, making it appear legitimate to potential users wanting to download what would otherwise appear to be a normal program (.exe) file. Unless the vendor is actively auditing the Web server logs, it would not necessarily become aware of the fact that it was hacked, and the malware downloads would continue until it was determined later that the file had been swapped out. Although this is not a digital steganography example, it does provide another perspective on how devastating plain view attacks can potentially be without proper safeguards in place.
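The fingerprint comparison described above can be sketched in a few lines using Python's standard hashlib module. This is a minimal illustration, assuming SHA-256 (a member of the SHA-2 family) and in-memory byte strings in place of real files; the helper names and sample data are hypothetical.

```python
import hashlib

def fingerprint(data):
    """Return the SHA-256 digest of the given bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(suspect, known_fingerprint):
    """True when the suspect data hashes to the fingerprint of the known original."""
    return fingerprint(suspect) == known_fingerprint

# Stand-in for the bytes of a known-good original file.
original = b"example carrier file contents"
known = fingerprint(original)

# Changing even a single byte produces a different fingerprint.
altered = original[:-1] + b"S"

print(is_unaltered(original, known))
print(is_unaltered(altered, known))
```

A mismatch shows only that the file differs from the reference copy, not why it differs, so a changed fingerprint is a prompt for further steganalysis and not proof of hidden data. The check is also only as trustworthy as the source of the reference fingerprint, as the compromised-website scenario illustrates.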
The more advanced versions of preventive network devices such as firewalls and reactive AV software perform both signature-based (or string-based) scanning and heuristic (or behavior-based) malware detection analysis of key components of the computer operating system such as the registry (Do, et al., 2017, p. 2). Technological advancements in anti-malware detection are important because malware costs the world economy hundreds of billions of dollars per year and wastes billions of hours in response and repair efforts (Do, et al., 2017, p. 1). Different threat analysis models and formulas can be used to mitigate risk to the greatest extent possible. Game theory, for instance, can be used to model the contest between two opposing forces such as cyber attackers and cybersecurity defenders (Do, et al., 2017, p. 2). Think of it as a game of football in which each team tries to gain an advantage by selecting the players most likely to help it achieve victory over the other team. Game theory then pits each action and predicted counter-action against each other until an expected outcome is derived that demonstrates the balance between cyber attacker and defender (Do, et al., 2017, p. 2). Game theoretical approaches to digital steganography can be applied by assigning a set of numerical values to the steganographer and the steganalyst that indicate their respective abilities to modify file content and to detect file modification, which helps to evaluate design options for steganalysis detection tools (Do, et al., 2017, p. 23).
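A toy example may help to show how assigning numerical values to the two sides leads to a predicted outcome. The sketch below is a simplified two-player zero-sum game with entirely hypothetical payoffs; it is not the model used by Do et al. (2017). The steganographer chooses an embedding rate, the steganalyst chooses a scan depth, and each payoff is the steganographer's gain, which is the steganalyst's loss.

```python
# Hypothetical payoffs to the steganographer for each (embedding, scan) pair.
payoff = {
    ("low", "basic"): 2,
    ("low", "deep"): 1,
    ("high", "basic"): 5,
    ("high", "deep"): -3,
}
embeddings = ["low", "high"]
scans = ["basic", "deep"]

def maximin(payoff, rows, cols):
    """Row player's safest choice: maximize the worst-case payoff."""
    best = max(rows, key=lambda r: min(payoff[r, c] for c in cols))
    return best, min(payoff[best, c] for c in cols)

def minimax(payoff, rows, cols):
    """Column player's safest choice: minimize the row player's best-case payoff."""
    best = min(cols, key=lambda c: max(payoff[r, c] for r in rows))
    return best, max(payoff[r, best] for r in rows)

attacker_choice, attacker_value = maximin(payoff, embeddings, scans)
defender_choice, defender_value = minimax(payoff, embeddings, scans)

print(attacker_choice, attacker_value)
print(defender_choice, defender_value)
```

With these made-up numbers both worst-case values coincide, so the game has a stable outcome in which the steganographer embeds lightly and the steganalyst scans deeply. Changing the payoffs changes the balance, which is the sense in which such a model can be used to compare design options for detection tools.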
There are seven different classifications of steganography “attacks” that can be performed and that steganalysis may help to detect: stego-only attacks, known-cover attacks, known-message attacks, known-stego attacks, chosen-stego attacks, chosen-message attacks, and disabling or active attacks (EC-Council, 2010, pp. 1–18 to 1–19). Each of these classifications of steganography attacks involves different elements of the cover medium file, the hidden message, or the stego-file. Disabling or active attacks include some amount of blurring, noise reduction, sharpening, rotation, resampling, and softening of the images used to embed hidden messages to further reduce the chances of steganalysis detection (EC-Council, 2010, p. 1–19).
The threat of digital steganography is so serious that the National Institute of Standards and Technology (NIST) specifically addressed it in its 2013 Special Publication (SP) 800–53, Revision 4, Security and Privacy Controls for Federal Information Systems and Organizations, which contains an exhaustive catalog of security controls designed to best protect Federal IS (NIST, 2013). NIST addressed the threat of digital steganography in three different security controls: protection against the covert exfiltration of information across network boundaries [SC-7(7)], protection against malicious code hidden through the use of digital steganography (SI-3), and information system monitoring to detect covert data exfiltration [SI-4(18)] (Wingate, 2013). The NIST SP 800–53 (rev. 4) security control that pertains to malicious code requires networks to implement real-time scanning of external files at the network boundary firewall (Wingate, 2013). To date, however, the only commercial enterprise to develop a product capable of performing real-time network scanning for digital steganography application signatures is Backbone Security (Wingate, 2013). Backbone Security has developed its “Steganography Analyzer Real-Time Scanner” (a.k.a., “StegAlyzerRTS”), which scans files for over 1,150 unique steganography application signatures that are accessed from a master repository database maintained by Backbone Security’s Steganography Analysis and Research Center (SARC) (Wingate, 2013). These services are provided for a subscription fee, and the fact that only one company offers this service puts it in a very lucrative position considering that NIST publishes cybersecurity guidance and requirements for the entire Federal government, including the Department of Defense (DoD).
Certainly there are other steganalysis application vendors that organizations can choose from, but none of them offer the real-time network scanning service tied to its repository database that Backbone Security offers.
The purpose of this section is to explain the research methodology and design approach that will be used to create the detailed research plan for this study. This section is strategically compartmentalized to cover research questions, hypothesis, identification and operationalization of variables, data collection, summary, and limitations of the research involved in this study which, albeit some of which were previously mentioned in the paper, were not sufficiently explained heretofore.
Digital steganography by itself is a challenging subject to study due to its extremely technical nature and the rarity of detection in the wild which does not lend itself towards an abundance of relevant case studies. Further narrowing of the research aperture when combining instances wherein digital steganography was used in conjunction with malware specifically as a mechanism for escaping detection makes research exponentially more challenging. However, there are enough documented cases of digital steganography-infused malware that may be used for this study. It is for this reason that research of cases in which digital steganography was used to evade malware detection must then be considered predominantly academic and theoretical due to the rarity of available case studies. Research for this study focused on a somewhat limited pool of published scholarly and peer-reviewed literature that details instances where and when digital steganography was found to have been used as an advanced malware detection evasion technique. Additionally, controlled experiments involving the injection of a cover medium file containing embedded stego-file data to the Internet via various ISP sites such as YouTube, Facebook, and Twitter will provide invaluable research results on whether digital steganography application signatures are being scanned for by some of the most popular ISPs (Barwise, 2015).
The primary thesis research question for this study is:
1. Is digital steganography being used by malware developers as an advanced evasion technique?
The primary thesis research question can logically be further divided into the following supplemental research questions:
2. Is it probable that popular Internet content host sites are being used to propagate malware using digital steganography?
3. Does adversarial use of malware that incorporates digital steganography to evade detection pose significant threat implications for U.S. CIKR IS connected to the Internet?
Research on these topics may be best performed by employing a mixed method approach consisting of qualitative information analysis of historical data gathered from the study’s literature review, and quantitative analysis of low-risk stego-file injection experiments to popular ISP sites.
This study intends to prove or disprove three hypotheses that are derived from the study research questions. These hypotheses will be evaluated qualitatively through analysis of the historical data gathered during the literature review and also quantitatively by conducting controlled experimental tests that involve stego-file injection uploads to various selected ISP website platforms.
H1. If there is enough recent and relevant evidence to suggest that malware creators are incorporating digital steganography as an advanced malware detection evasion technique, then it can be assumed that its use has become a new widespread malware creator evasion tactic.
H2. If ISPs do not have the means to or choose not to scan for digital steganography, then cyber threat actors will use these ISP platforms to post illegal content or propagate malware using digital steganography.
H3. If cyber threat actors are successfully employing digital steganography to hide malware within seemingly normal data traffic that evades malware detection, then the United States CIKR, private industry, and ISPs are all at risk of malware infection that could potentially cripple the entire nation.
Qualitative analysis of the literature review results, together with quantitative analysis of the controlled comparative experiments, will determine whether it is currently possible to inject hidden data into cover medium files and upload the modified files to popular social media sites. Qualitative analysis will be used to study the prevalence of malware that used digital steganography to evade malware detection on the Internet. A comparative experimental research design is the best design option for finding the prevalence of a phenomenon such as malware that incorporates digital steganography on the Internet and for comparing data files that may or may not contain steganography. This study will make use of analysis of information, historical data, and controlled experiments to either prove or disprove the given hypotheses. For this study, the research will focus primarily on malware that has been discovered to incorporate digital steganography for the specific purpose of evading anti-malware software detection.
Identification and Operationalization of Variables
Research variables were identified during the literature review. The major variable at play in this study is digital steganography-infused malware. Other variables involved are the multitude of Internet content host provider sites (a.k.a., ISPs) that may inadvertently be hosting embedded malware hidden within carrier files; the various file format types that are commonly uploaded to ISP sites; and the plethora of digital steganography software applications that enable a user to embed hidden data within cover medium files (Barwise, 2015). This research study also attempts to establish a correlation between digital steganography-infused malware and cyber threat implications to U.S. CIKR IS. Cryptography is an extraneous variable that when combined with digital steganography further exacerbates the difficulty of accessing embedded encrypted data following steganalytic detection. Comparative and exploratory methodologies are used to identify instances where digital steganography was used as an advanced malware detection evasion technique on the open Internet (Kumar, 2014, p. 125).
Each of the three research questions of the study is addressed in a different section of this thesis. The primary research question of whether digital steganography is being used by malware developers as an advanced evasion technique is addressed by citing specific examples discovered in the literature review. Finding and citing new and as yet undiscovered examples of digital steganography-infused malware is outside the scope of this study and would not be ethically feasible, both for legal reasons and because of the limits on research content authorized by the university. The second research question of whether it is probable that popular Internet content host sites are being used to propagate malware using digital steganography will be addressed in the results of the low-risk comparative design stego-file injection experiments that are to be conducted. Lastly, the third research question of whether the adversarial use of malware that incorporates digital steganography to evade anti-malware detection poses significant threat implications for U.S. CIKR IS connected to the Internet will also be addressed in the results section of the paper.
The research and data collection methodology for this study centers primarily on the review of peer-reviewed journal and academic articles that pertain to digital steganography combined with some type of malware. In addition to the literature review, three separate controlled experiments will be conducted, each involving the upload of a stego-file, with a text file embedded as a simulated piece of malware, onto the popular ISP platforms of YouTube, Facebook, and Twitter. Using freely available steganography applications, the study author will embed the simulated malware into a video file (.mp4) and an image file (.jpeg), attempt to upload the resulting files to the ISP social media platforms, and document the resulting successes or failures. If the upload of the modified files is successful for each file type, it is then plausible to assume that cybercriminals and terrorists are freely using these ISP social media platforms for nefarious purposes based on both historical references and the current unrestricted availability of this option. The “Pictricity version 1.10” digital steganography application will be used to inject hidden data into the image (.jpeg) digital file format for upload to Twitter. The “OpenPuff version 4.0” digital steganography application will be used to embed hidden data in the video (.mp4) file format for upload to Facebook and YouTube. The “Stegdetect version 0.6” steganalysis application will be used to detect files that have been altered using the various digital steganography application embedding techniques. Comparative analysis of file sizes and file hashes will be used to identify differences between the original and modified files, which would indicate that digital steganography may have been used to alter the original file content. As a matter of legal consequence, absolutely no Personally Identifiable Information (PII) or ISP proprietary information shall be published in the study results.
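The file size and hash comparison described above can be illustrated with a short sketch. This is a minimal illustration rather than the procedure used in the study: the function names are hypothetical, and SHA-256 from the Python standard library is substituted for the CRC-64 hashes recorded in the experiments, because CRC-64 is not available in the standard library.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path):
    """Return (size_in_bytes, sha256_hex_digest) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def compare_files(original, candidate):
    """Compare a candidate file against the original cover file.

    SHA-256 is an assumption made for this sketch; the study itself
    records CRC-64 hashes.
    """
    orig_size, orig_hash = file_fingerprint(original)
    cand_size, cand_hash = file_fingerprint(candidate)
    return {
        "size_delta": cand_size - orig_size,   # bytes added or removed
        "hash_changed": orig_hash != cand_hash,
    }
```

A changed hash shows only that the file content was altered; it does not by itself establish that steganography was the cause, which is why the comparison is treated as an indicator that still requires follow-up steganalysis.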
Data is to be sampled from the identified variables taken from the literature review and the results of the comparative analysis of the controlled stego-file injection experiments. Examples of illegal Internet content would be child pornography embedded within innocuous Easter bunny photos, terrorist propaganda documents (e.g., bomb-making instructions, terrorist operation plans, etc.) embedded within pornographic videos or images, and malware hidden within seemingly benign executable application files. Only historical citations of illegal Internet content are used for the purposes of this study. Additionally, data from various Web content host sites, file type options available for users to upload Internet content, available steganography software applications, and encryption options that are commonly used in conjunction with steganography applications will be sampled.
Applicable data relating to how digital steganography has been used in malware code to evade malware detection will be collected from the literature review readings. The data resulting from controlled stego-file upload experiments will contain cover medium file types, hidden stego-file types, original file sizes and CRC-64 file hashes, compressed file sizes after stego-file compression and CRC-64 file hashes, ISP platform attacked, and success or failure results.
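One way to keep the experiment data listed above in a consistent shape is a small record type. The sketch below is illustrative only: the class name, field names, and CSV export are assumptions introduced here, not part of the study's tooling, and the hash fields simply hold whatever hash string the researcher records.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class UploadTrial:
    """One controlled stego-file upload attempt (hypothetical record layout)."""
    cover_file_type: str     # e.g. ".mp4" or ".jpeg"
    hidden_file_type: str    # e.g. ".txt"
    original_size: int       # bytes, before embedding
    original_hash: str       # hash of the unmodified cover file
    stego_size: int          # bytes, after embedding
    stego_hash: str          # hash of the modified file
    platform: str            # ISP platform the file was uploaded to
    upload_succeeded: bool


def trials_to_csv(trials):
    """Serialize a list of UploadTrial records to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(UploadTrial)])
    writer.writeheader()
    for trial in trials:
        writer.writerow(asdict(trial))
    return buffer.getvalue()
```

Recording every attempt, including failed ones, in the same layout makes the later success and failure tallies straightforward to reproduce.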
Summary of Analysis Procedures
The research in this study will be collected using a mixed method approach of both qualitative and quantitative research design methods to analyze and study the stated research variables. Data collected during this study will primarily be ascertained from the literature review of peer-reviewed scholarly articles, academic books, and verifiable information found on the Internet. However, of equal importance in proving or disproving the stated hypotheses will be the results of the comparative, controlled and limited-scope experiments that will use various freely-available steganography applications to embed hidden data in different cover medium file types that will be uploaded to various Internet content host sites to test the viability of using steganography to upload illegal Web content. If successful upload of digital steganography-modified (i.e., stego) files is achieved for YouTube, Facebook, and Twitter, then further analysis will be performed using the “Stegdetect” tool to determine if the use of steganography is detected.
Selected scholarly works relating to digital steganography and malware will be reviewed, and critical portions of those articles that are consistent with the overall theme of the thesis will be summarized in the literature review section of the study. Successful determination of the controlled stego-file uploads will be readily apparent if the files successfully upload to the selected ISP platforms. Upload success/fail rate percentages will be documented in the results portion of the study. The controlled stego-file upload experiments will demonstrate whether popular ISP platforms are scanning for digital steganography application signatures and further show if it is possible to upload stego-malware onto these sites.
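The upload success/fail rate percentages mentioned above amount to a simple per-platform tally. The following sketch shows one way to compute them, assuming the outcomes are available as (platform, succeeded) pairs; the function name and input format are hypothetical.

```python
from collections import defaultdict


def success_rates(outcomes):
    """Return {platform: percent of successful uploads}.

    `outcomes` is an iterable of (platform, succeeded) pairs, one per attempt.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for platform, succeeded in outcomes:
        attempts[platform] += 1
        if succeeded:
            successes[platform] += 1
    return {p: 100.0 * successes[p] / attempts[p] for p in attempts}
```

Counting every attempt matters here: a platform that rejects a first upload but accepts a retry would show a rate below 100 percent even though the stego-file ultimately reached the site.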
Any study of digital steganography will have inherent limitations in the research design due to various factors such as limited scholarly literature published on the subject, complexity of the topic as it relates to explaining it to the lay reader, legality of data content when trying to demonstrate illegal instances that have been used, and the repeatability of the comparative experiment results using various newer versions of freely available steganography software applications. It is rare for digital forensic analysts to discover that digital steganography was used on the open Internet, and rarer still that, once found, the hidden content can be accessed and read. Several historical accounts of digital steganography discovered on the Internet involving illegal applications and malware were not published in peer-reviewed or academic journals, which makes their inclusion in an academic study inappropriate. Instead, these types of documented instances are typically published by less credible sources such as technology magazines, news sites, and security research authors. The study may be in some small measure biased towards research that has been conducted by developed nations due to the limited availability, or complete lack, of data from developing countries.
This study served to demonstrate that digital steganography is a potentially dangerous technology application that is actively being used by cyber threat actors as an advanced technique to evade current anti-malware detection software and also that digital steganography is not widely scanned for by public or private sectors. Given the results of the literature review and controlled experiments, the three hypotheses of this study are confirmed correct.
H1 posited that if there is enough recent and relevant evidence to suggest that malware creators are incorporating digital steganography as an advanced malware detection evasion technique, then it can be assumed that its use has become a new widespread malware creator evasion tactic. The literature review demonstrates that incorporating digital steganography into malware to evade malware detection is a new tactic of Black Hat malware developers.
H2 posited that if ISPs do not have the means to or choose not to scan for digital steganography, then cyber threat actors will use these ISP platforms to post illegal content or propagate malware using digital steganography. While there is concrete proof that none of the three selected ISP social media platforms is scanning for digital steganography, it is not known whether the ISPs lack the means to scan files adequately or simply choose not to scan for it.
H3 posited that if cyber threat actors are successfully employing digital steganography to hide malware within seemingly normal data traffic that evades malware detection, then the U.S. CIKR, private industry, and ISPs are all at risk of malware infection that could potentially cripple the entire nation and its digitally-based economy. It is logical to assume that critical infrastructure or virtually any Internet-connected network is possibly at risk of being infected by stego-malware given the fact that there does not appear to be any proof that ISPs are performing steganography application signature scanning on any of the selected ISP social media platforms.
Controlled Stego-File Injection Experiments
A simple text file containing the words “Secret stuff goes here” was created and embedded into each carrier file type extension as simulated malware that had the following characteristics:
It should also be noted that, to avoid violating the Terms of Service agreement that each ISP platform requires all users to adhere to, the controlled experiment stego-files were deleted immediately following each successful experiment upload. Twitter’s Help Center states that photos can be up to 5 MB in size, that animated GIFs can be up to 5 MB on the Twitter mobile application or up to 15 MB on the Web, and that Twitter accepts only .gif, .jpeg, and .png image file formats. The Facebook Help index lists the following supported file extensions: .3g2, .3gp, .3gpp, .asf, .avi, .dat, .divx, .dv, .f4v, .flv, .gif, .m2ts, .m4v, .mkv, .mod, .mov, .mp4, .mpe, .mpeg, .mpg, .mts, .nsv, .ogm, .qt, .tod, .ts, .vob, .wmv. YouTube supports the following upload file types: .MOV, .MPEG4, .MP4, .AVI, .WMV, .MPEGPS, .FLV, 3GPP, WebM, DNxHR, ProRes, CineForm, HEVC (h265). The initial attempt to upload the controlled experiment stego-file to YouTube failed with the error message contained in Figure 1. A second attempt, however, resulted in a successful stego-file upload.
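Because each platform restricts upload size and file type, a stego-file can be checked against the published limits before an upload is attempted. The sketch below encodes only the Twitter photo limits quoted above; the function name is hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption, since the Help Center text does not say which definition applies.

```python
from pathlib import Path

# Limits as quoted from Twitter's Help Center in the text above.
TWITTER_IMAGE_EXTENSIONS = {".gif", ".jpeg", ".png"}
TWITTER_PHOTO_MAX_BYTES = 5 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def precheck_upload(filename, size_bytes,
                    allowed_extensions=TWITTER_IMAGE_EXTENSIONS,
                    max_bytes=TWITTER_PHOTO_MAX_BYTES):
    """Return a list of reasons the file would violate the stated limits.

    An empty list means the file passes this local check; it says nothing
    about whether the platform will actually accept the upload.
    """
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in allowed_extensions:
        problems.append(f"extension {extension or '(none)'} is not in the accepted list")
    if size_bytes > max_bytes:
        problems.append(f"size {size_bytes} bytes exceeds the {max_bytes}-byte limit")
    return problems
```

The check is deliberately literal: a file named with the .jpg extension fails it because only .jpeg appears in the quoted list, so the accepted set would need to be extended if a platform is known to treat the two as equivalent.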
Impact of Free Digital Steganography Software on the Internet
The number of steganographic applications on the Internet is relatively small when compared to other types of software freely shared across the Internet. There are, however, considerably more digital steganography applications than steganalysis applications, which may make steganography detection more difficult. It is arguable whether digital steganography software enables bad people to do bad things, because the same software is also used for the important purpose of circumventing Internet communication monitoring by tyrannical regimes. The duality of human nature dictates that some users will use these tools for good purposes, and others for bad. Rather than restricting the software itself, it is more appropriate to require digital steganography scanning on ISP platforms and for U.S. CIKR IS.
Impact of Criminal Malware Using Digital Steganography on the Internet
Malware does not come without a cost: for 2016 alone, the United States Council of Economic Advisers estimated that cybercrime resulted in economic losses of between $57 billion and $109 billion (CEA, 2018, p. 1). Going back even further to 2014, economic damages were estimated at $491 billion, with counter-spending to combat malware measured at a measly $25 billion (Do et al., 2017, p. 1). It is not difficult to predict who the winner will be in that battle simply by following the numbers. However, the financial cost of a coordinated cyber attack by a capable adversary using stego-malware against the U.S. CIKR IS could be catastrophic and send the U.S. economy into an unrecoverable downward spiral. An adversary could plant the malware on U.S. CIKR IS and then wait for a time of their choosing to trigger the malware attack. This type of “cybergeddon” attack is not difficult to imagine. A cyber attack of this magnitude that, even if only temporarily, shuts down the power grids, halts the Wall Street stock exchange, stops emergency services systems, and shuts down transportation, water, and wastewater treatment facilities would undoubtedly be considered an act of war that would lead to kinetic warfare.
Through the literature review and controlled stego-file injection experiments, the study aimed to detail how digital steganography is being used as an advanced malware detection evasion technique. Additionally, the study was conducted to determine whether ISPs are scanning for the presence of steganography file signatures and whether this dangerous combination of digital steganography and malware could be used to cripple U.S. CIKR IS. The study determined through the literature review that there are several documented cases in which malware creators specifically used digital steganography in an attempt to hide the presence of the malware. Additionally, the controlled experiments validated the hypotheses that ISPs are not currently scanning for the presence of stego-files and that malware developers could use digital steganography combined with malware to attack U.S. CIKR IS. The knowledge gained during this study can be used to improve the cybersecurity posture not only of ISPs but also of America’s most important critical infrastructure systems and of those allies with which it chooses to share this information.
Based on the results of this study, it is highly recommended that Congress require all U.S. critical infrastructure information systems to include those in the DoD to immediately comply with NIST SP 800–53 Rev. 4 security controls that stipulate the need for real-time network scanning for steganography use. Although NIST recommends real-time network scanning as part of the Risk Management Framework, it is at this time at least, not widely implemented across the Federal government most likely due to prohibitive cost reasons. It is recommended that the government encourage the creation of new network scanning technologies with digital steganography application signature repositories to prevent monopolization of the market by a single company. This could be done using an existing government contract vehicle that would be used to solicit bidders that would compete to win the coveted government contract and design this technology that will be widely implemented. To reduce the proliferation of digital steganography malware in general, Congress could also provide tax incentives for commercial enterprises such as Cable Internet providers and social media platforms that voluntarily implement this type of network scanning to prevent malware from being invisibly distributed on their Internet platforms. These are just a couple of small steps that could be taken to improve national cybersecurity.
Adamy, D. (2012). Steganography. Journal of Electronic Defense, 35(10), 47–49.
AV-TEST. (2018). Total malware. Retrieved from https://www.av-test.org/en/statistics/malware/
Barwise, I. (2015, February 25). Illicit applications of steganography on the Internet (Unpublished Master’s thesis). American Military University, Charles Town, WV.
Calabresi, M. (2017, May 18). Inside Russia’s social media war on America. Retrieved from http://time.com/4783932/inside-russia-social-media-war-america/
Council of Economic Advisers. (2018, February 16). The cost of malicious cyber activity to the U.S. economy. Retrieved from https://www.whitehouse.gov/articles/cea-report-cost-malicious-cyber-activity-u-s-economy/
“CS 101.” (n.d.) Bits and bytes. Retrieved from https://web.stanford.edu/class/cs101/bits-bytes.html
Desoky, A., & Younis, M. (2008). Graphstega: Graph steganography methodology. Journal of Digital Forensic Practice, 2(1), 27–36, doi: 10.1080/15567280701797087
Do, C., Tran, N., Hong, C., Kamhoua, C., Kwiat, K., Blasch, E., . . . Iyengar, S. (2017). Game theory for cyber security and privacy. ACM Computing Surveys (CSUR), 50(2), 1–37, doi: http://dx.doi.org/10.1145/3057268
Drager, D. (2011, February 26). Embed a TrueCrypt volume in a playable video file. Retrieved from https://lifehacker.com/5771142/embed-a-truecrypt-volume-in-a-playable-video-file
Drzymala, M., Szczypiorski, K., & Urbanski, M.L. (2016). Network steganography in the DNS protocol. International Journal of Electronics and Telecommunications, 62(4), 343–346, doi: 10.1515/eletel-2016-0047
EC-Council. (2010). Computer forensics: Investigating data and image files. Clifton Park, NY: Course Technology Cengage Learning.
Hay, A. (2015). Steganography: A new technique of hiding malware. Database & Network Journal, 45(5), 10–12.
“Huge malvertising campaign uses steganography to hide malware in plain sight.” (2016, July 30). ICT Monitor Worldwide. Retrieved from https://search-proquest-com.ezproxy2.apus.edu/docview/1807715521?accountid=8289
Kumar, R. (2014). Research methodology: A step-by-step guide for beginners (4th ed.). Thousand Oaks, CA: Sage Publications.
Liberatore, M., Levine, B.N., & Shields, C. (2010). Strengthening forensic investigations of child pornography on P2P networks. Proceedings of the 6th International Conference, ACM CoNEXT, November 30-December 3, 2010, doi: 10.1145/1921168.1921193
Mazurczyk, W., & Caviglione, L. (2015). Information hiding as a challenge for malware detection. IEEE Security & Privacy, 13(2), 89–93. doi:10.1109/MSP.2015.33
Mazurczyk, W., & Caviglione, L. (2015). Steganography in modern smartphones and mitigation techniques. IEEE Communications Surveys & Tutorials, 17(1), 334–357. doi:10.1109/COMST.2014.2350994
National Institute of Standards and Technology. (2013, April 30). Security and privacy controls for Federal information systems and organizations. Retrieved from http://dx.doi.org/10.6028/NIST.SP.800-53r4
Potter, N. (2009, September 3). Top 10 computer viruses and worms. Retrieved from https://abcnews.go.com/Technology/top-computer-viruses-worms-internet-history/story?id=8480794
Sekhar, A., G., M.K., & Rahiman, M.A. (2015). A novel approach for hiding data in videos using network steganography methods. Procedia Computer Science, 70, 764–768. doi:10.1016/j.procs.2015.10.115
Shane, S. (2017, September 7). The fake Americans Russia created to influence the election. Retrieved from https://www.nytimes.com/2017/09/07/us/politics/russia-facebook-twitter-election.html
Soltani, S., Seno, S. A. H., Nezhadkamali, M., & Budiarto, R. (2014). A survey on real world botnets and detection mechanisms. International Journal of Information and Network Security, 3(2), 116–127. http://dx.doi.org.ezproxy2.apus.edu/10.11591/ijins.v3i2.6231
Venkatachalam, N., & Anitha, R. (2017). A multi-feature approach to detect Stegobot: A covert multimedia social network botnet. Multimedia Tools and Applications, 76(4), 6079–6096.
Warkentin, M., Schmidt, M., & Bekkering, E. (2008). Steganography: Forensic, security, and legal issues. Journal of Digital Forensics, Security and Law, 3(2), 17–34.
Wendzel, S., Mazurczyk, W., Caviglione, L., & Meier, M. (2014, July 8). Hidden and Uncontrolled — On the Emergence of Network Steganographic Threats. ISSE 2014 Securing Electronic Business Processes, 123–133. doi:10.1007/978–3–658–06708–3_
Wingate, J.E. (2013, April 30). Revision to NIST security controls catalog addresses steganography threat. Retrieved from https://www.backbonesecurity.com/NISTAddressesSteganography.aspx
Wingate, J.E., Watt, G.D., Kurtz, M., Davis, C.W., & Lipscomb, R. (2007). Defending against insider use of digital steganography. Proceedings of the Conference on Digital Forensics, Security and Law, 175–184. Retrieved from https://search-proquest-com.ezproxy2.apus.edu/docview/211492820?accountid=8289
Yugala, K., & Rao, K.V. (2013, May). Steganography. International Journal of Engineering Trends and Technology (IJETT), 4(5):1629–1635. ISSN: 2231–5381.
Zetter, K. (2014). Countdown to Zero Day: Stuxnet and the launch of the world’s first digital weapon. New York: Crown.
Zielińska, E., Mazurczyk, W., & Szczypiorski, K. (2014, March). Trends in steganography. Communications of the ACM, 57(3), 86–95. doi:10.1145/2566590.2566610