Publications | Science of Security Virtual Organization

Static Malware Analysis using PE Header files API

In today’s fast pacing world, cybercrimes have time and again proved to be one of the biggest hindrances in national development. According to recent trends, most of the times the victim’s data is breached by trapping it in a phishing attack. Security and privacy of user’s data has become a matter of tremendous concern. In order to address this problem and to protect the naive user’s data, a tool which may help to identify whether a window executable is malicious or not by doing static analysis on it has been proposed. As well as a comparative study has been performed by implementing different classification models like Logistic Regression, Neural Network, SVM. The static analysis approach used takes into parameters of the executables, analysis of properties obtained from PE Section Headers i.e. API calls. Comparing different model will provide the best model to be used for static malware analysis

Authored by Naman Aggarwal, Pradyuman Aggarwal, Rahul Gupta

Dynamic malicious code detection technology based on deep learning

In this paper, the malicious code is run in the sandbox in a safe and controllable environment, the API sequence is deduplicated by the idea of the longest common subsequence, and the CNN and Bi-LSTM are integrated to process and analyze the API sequence. Compared with the method, the method using deep learning can have higher accuracy and work efficiency.

Authored by Lizhuo Wei, Fengkai Xu, Ni Zhang, Wei Yan, Chuchu Chai

MicroBlind: Flexible and Secure File System Middleware for Application Sandboxes

Virtual machine (VM) based application sandboxes leverage strong isolation guarantees of virtualization techniques to address several security issues through effective containment of malware. Specifically, in end-user physical hosts, potentially vulnerable applications can be isolated from each other (and the host) using VM based sandboxes. However, sharing data across applications executing within different sandboxes is a non-trivial requirement for end-user systems because at the end of the day, all applications are used by the end-user owning the device. Existing file sharing techniques compromise the security or efficiency, especially considering lack of technical expertise of many end-users in the contemporary times. In this paper, we propose MicroBlind, a security hardened file sharing framework for virtualized sandboxes to support efficient data sharing across different application sandboxes. MicroBlind enables a simple file sharing management API for end users where the end user can orchestrate file sharing across different VM sandboxes in a secure manner. To demonstrate the efficacy of MicroBlind, we perform comprehensive empirical analysis against existing data sharing techniques (augmented for the sandboxing setup) and show that MicroBlind provides improved security and efficiency.

Authored by Saketh Maddamsetty, Ayush Tharwani, Debadatta Mishra

Sandbox Integrated Gateway for the Discovery of Cybersecurity Vulnerabilities

Emails are widely used as a form of communication and sharing files in an organization. However, email is widely used by cybercriminals to spread malware and carrying out cyber-attacks. We implemented an open-source email gateway in conjunction with a security sandbox for securing emails against malicious attachments. The email gateway scans all incoming and outgoing emails and stops emails containing suspicious files. An automated python script would then send the suspected email to the sandboxing element through sandbox API for further analysis, while the script is used also for the prevention of duplicate results. Moreover, the mail server administrator receives notifications from the email gateway about suspicious attachments. If detected attachment is a true positive based on the sandbox analysis result, email is deleted, otherwise, the email is delivered to the recipient. The paper describes in an empirical way the steps followed during the implementation, results, and conclusions of our research.

Authored by Alexandre Rekeraho, Titus Balan, Daniel Cotfas, Petru Cotfas, Rebecca Acheampong, Cristian Musuroi

Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study

Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance. While hybrid approaches aim for the “best of both worlds,” the challenges in applying them in the real world are largely unknown. We conduct a data-driven analysis of challenges-and resultant bugs-involved in writing reliable yet performant imperative DL code by studying 250 open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually examined code patches and bug reports, respectively. The results indicate that hybridization: (i) is prone to API misuse, (ii) can result in performance degradation-the opposite of its intention, and (iii) has limited application due to execution mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.

Authored by Tatiana Vélez, Raffi Khatchadourian, Mehdi Bagherzadeh, Anita Raja

A Framework for Automated API Fuzzing at Enterprise Scale

Web-based Application Programming Interfaces (APIs) are often described using SOAP, OpenAPI, and GraphQL specifications. These specifications provide a consistent way to define web services and enable automated fuzz testing. As such, many fuzzers take advantage of these specifications. However, in an enterprise setting, the tools are usually installed and scaled by individual teams, leading to duplication of efforts. There is a need for an enterprise-wide fuzz testing solution to provide shared, cost efficient, off-nominal testing at scale where fuzzers can be plugged-in as needed. Internet cloud-based fuzz testing-as-a-service solutions mitigate scalability concerns but are not always feasible as they require artifacts to be uploaded to external infrastructure. Typically, corporate policies prevent sharing artifacts with third parties due to cost, intellectual property, and security concerns. We utilize API specifications and combine them with cluster computing elasticity to build an automated, scalable framework that can fuzz multiple apps at once and retain the trust boundary of the enterprise.

Authored by Riyadh Mahmood, Jay Pennington, Danny Tsang, Tan Tran, Andrea Bogle

A Demo of a Software Platform for Ubiquitous Big Data Engineering, Visualization, and Analytics, via Reconfigurable Micro-Services, in Smart Factories

Intelligent, smart, Cloud, reconfigurable manufac-turing, and remote monitoring, all intersect in modern industry and mark the path toward more efficient, effective, and sustain-able factories. Many obstacles are found along the path, including legacy machineries and technologies, security issues, and software that is often hard, slow, and expensive to adapt to face unforeseen challenges and needs in this fast-changing ecosystem. Light-weight, portable, loosely coupled, easily monitored, variegated software components, supporting Edge, Fog and Cloud computing, that can be (re)created, (re)configured and operated from remote through Web requests in a matter of milliseconds, and that rely on libraries of ready-to-use tasks also extendable from remote through sub-second Web requests, constitute a fertile technological ground on top of which fourth-generation industries can be built. In this demo it will be shown how starting from a completely virgin Docker Engine, it is possible to build, configure, destroy, rebuild, operate, exclusively from remote, exclusively via API calls, computation networks that are capable to (i) raise alerts based on configured thresholds or trained ML models, (ii) transform Big Data streams, (iii) produce and persist Big Datasets on the Cloud, (iv) train and persist ML models on the Cloud, (v) use trained models for one-shot or stream predictions, (vi) produce tabular visualizations, line plots, pie charts, histograms, at real-time, from Big Data streams. Also, it will be shown how easily such computation networks can be upgraded with new functionalities at real-time, from remote, via API calls.

Authored by Mirco Soderi, Vignesh Kamath, John Breslin

Android Malware Detection Based on Heterogeneous Information Network with Cross-Layer Features

As a mature and open mobile operating system, Android runs on many IoT devices, which has led to Android-based IoT devices have become a hotbed of malware. Existing static detection methods for malware using artificial intelligence algorithms focus only on the java code layer when extracting API features, however there is a lot of malicious behavior involving native layer code. Thus, to make up for the neglect of the native code layer, we propose a heterogeneous information network-based Android malware detection method with cross-layer features. We first translate the semantic information of apps and API calls into the form of meta-paths, and construct the adjacency of apps based on API calls, then combine information from different meta-paths using multi-core learning. We implemented our method on the dataset from VirusShare and AndroZoo, and the experimental results show that the accuracy of our method is 93.4%, which is at least 2% higher than other related methods using heterogeneous information networks for malware detection.

Authored by Ren Xixuan, Zhao Lirui, Wang Kai, Xue Zhixing, Hou Anran, Shao Qiao

Exploiting Input Sanitization for Regex Denial of Service

Web services use server-side input sanitization to guard against harmful input. Some web services publish their sanitization logic to make their client interface more usable, e.g., allowing clients to debug invalid requests locally. However, this usability practice poses a security risk. Specifically, services may share the regexes they use to sanitize input strings - and regex-based denial of service (ReDoS) is an emerging threat. Although prominent service outages caused by ReDoS have spurred interest in this topic, we know little about the degree to which live web services are vulnerable to ReDoS. In this paper, we conduct the first black-box study measuring the extent of ReDoS vulnerabilities in live web services. We apply the Consistent Sanitization Assumption: that client-side sanitization logic, including regexes, is consistent with the sanitization logic on the server-side. We identify a service's regex-based input sanitization in its HTML forms or its API, find vulnerable regexes among these regexes, craft ReDoS probes, and pinpoint vulnerabilities. We analyzed the HTML forms of 1,000 services and the APIs of 475 services. Of these, 355 services publish regexes; 17 services publish unsafe regexes; and 6 services are vulnerable to ReDoS through their APIs (6 domains; 15 subdomains). Both Microsoft and Amazon Web Services patched their web services as a result of our disclosure. Since these vulnerabilities were from API specifications, not HTML forms, we proposed a ReDoS defense for a popular API validation library, and our patch has been merged. To summarize: in client-visible sanitization logic, some web services advertise Re-DoS vulnerabilities in plain sight. Our results motivate short-term patches and long-term fundamental solutions. “Make measurable what cannot be measured.” -Galileo Galilei

Authored by Efe Barlas, Xin Du, James Davis