Multicore Computing Security (Update)

As high performance computing has evolved into larger and faster computing solutions, new approaches to security have been identified. The articles cited here address security issues in multicore environments, including multicore packet capture for data center forensics, adaptive allocation of anomaly detectors across cores, processor-based physical unclonable functions, parallel hash computation, protection against compromised networks-on-chip, and security isolation for multi-process library OSes. These materials were published in 2014.

Sushil Jajodia, Krishna Kant, Pierangela Samarati, Anoop Singhal, Vipin Swarup, Cliff Wang, Secure Cloud Computing, Springer Publishing Company, ©2014. ISBN: 1461492777, 9781461492771. (ID#:14-1698) URL: http://dl.acm.org/citation.cfm?id=2584533&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 This book presents a range of cloud computing security challenges and promising solution paths.

Marat Zhanikeev, “A Software Design and Algorithms for Multicore Capture in Data Center Forensics,” SFCS '14 Proceedings of the 2nd International Workshop on Security and Forensics in Communication Systems, June 2014, Pages 11-18. (ID#:14-1699) URL: http://dl.acm.org/citation.cfm?id=2598918.2598923&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 or http://dx.doi.org/10.1145/2598918.2598923 With the rapid spread of cloud computing, data centers are quickly turning into platforms that host highly heterogeneous collections of services. The traditional approach to security and performance management struggles to cope with such environments. In particular, it is becoming increasingly difficult to capture and process all the necessary information at data centers in real time; packet capture at data center gateways is a practical example. This paper proposes a generic design for capturing and processing information on multicore architectures. The two main parts of the proposal are (1) an optimization formulation for distributing tasks across cores and (2) the practical design and implementation of a shared memory that processes can use to communicate in a non-traditional way, without memory locking or message passing. Keywords: data center forensics, information capture, lock-free design, multicore architecture, multicore capture, packet capture, parallel processing, shared memory
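
The idea of processes communicating through shared memory without locks can be illustrated with a single-producer/single-consumer ring buffer, a common lock-free pattern in which each index has exactly one writer. The Python sketch below is an assumption for illustration, not the paper's design; the class `SpscRing` and its methods are hypothetical, and a C implementation over real shared memory would additionally need atomic index loads and stores with acquire/release ordering.

```python
from typing import Any, Optional


class SpscRing:
    """Hypothetical single-producer/single-consumer ring buffer.

    Only the producer writes `_tail` and only the consumer writes `_head`,
    so neither side takes a lock. (In C over real shared memory, the index
    stores would need release ordering and the index loads acquire ordering.)
    """

    def __init__(self, capacity: int) -> None:
        # One extra slot always stays empty so that "full" and "empty" differ.
        self._buf: list = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def push(self, item: Any) -> bool:
        """Producer side: return False instead of blocking when the ring is full."""
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False
        self._buf[self._tail] = item
        self._tail = nxt  # publish the slot only after it has been filled
        return True

    def pop(self) -> Optional[Any]:
        """Consumer side: return None when the ring is empty."""
        if self._head == self._tail:
            return None
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        return item
```

In a capture pipeline, a capture process would call `push` for each packet and an analysis process would call `pop`; a `False` return from `push` signals that the analyzer is falling behind and the capture side must decide whether to drop or retry.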

Ruby Lee, Weidong Shi, Proceedings of the Third Workshop on Hardware and Architectural Support for Security and Privacy, June 2014. (ID#:14-1700) URL: http://dl.acm.org/citation.cfm?id=2611765&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 It is our great pleasure to introduce the technical program for the 3rd International Workshop on Hardware and Architectural Support for Security and Privacy (HASP 2014), which will be held in conjunction with the 41st International Symposium on Computer Architecture (ISCA 2014) in Minneapolis, MN, USA on June 15, 2014. Although much attention has been directed to the study of security at the system and application levels, security and privacy research focusing on hardware and architecture aspects is at a new frontier. In the era of cloud computing, pervasive intelligent systems, and nano-scale devices, practitioners and researchers have to address new challenges and requirements in order to meet the ever-changing landscape of security research and new demands from consumers, enterprises, governments, defense and other industries. The goal of HASP is to bring together researchers, developers, and practitioners from academia and industry, to share practical insights, experiences and implementations related to all aspects of hardware and architectural support for security and privacy, and to discuss future trends in research and applications. We encourage contributions describing innovative work on hardware and architectural support for trust management, security of cloud computing, smartphones and Internet of Things, FPGA, SOC and multicore security, etc.

Bryan Jeffery Parno, Trust Extension as a Mechanism for Secure Code Execution on Commodity Computers, Association for Computing Machinery and Morgan & Claypool, New York, NY, USA ©2014. ISBN: 978-1-62705-477-5. (ID#:14-1701) As more sensitive information is digitized, it is imperative that we adopt adequate security protections. This mandate conflicts with the performance and features consumers expect from commodity computers. The author discusses how trust, performance, and features can be balanced in commodity devices and services. Keywords: multicore computing, security

Shih-Hao Hung, Po-Hsun Chiu, Chia-Heng Tu, Wei-Ting Chou, Wen-Long Yang, “Message-Passing Programming for Embedded Multicore Signal-Processing Platforms,” Journal of Signal Processing Systems, Volume 75 Issue 2, May 2014, Pages 123-139. (ID#:14-1702) Recently, embedded multicore platforms have become popular for signal processing, but software development for such platforms is still very slow. The authors suggest using a standard message-passing programming interface, such as a lightweight MPI, to support message passing on popular embedded multicore signal-processing platforms. Keywords: Embedded systems, Message-passing, Multicore, Performance optimization, Signal processing, Software portability
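
To illustrate the message-passing style the authors advocate, the sketch below emulates MPI-like point-to-point `send`/`recv` between ranks using Python threads and one inbox queue per rank, then performs a sum reduction at rank 0. The `Comm` class and `parallel_sum` function are hypothetical; this is neither the paper's lightweight MPI nor the mpi4py API, and on an embedded platform each rank would more likely be a process pinned to its own core.

```python
import queue
import threading
from typing import Any, List, Tuple


class Comm:
    """Hypothetical MPI-like communicator: one FIFO inbox per rank."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, obj: Any, dest: int, source: int) -> None:
        """Deliver `obj` to rank `dest`, tagged with the sender's rank."""
        self._inbox[dest].put((source, obj))

    def recv(self, rank: int) -> Tuple[int, Any]:
        """Block until a message arrives for `rank`; return (source, payload)."""
        return self._inbox[rank].get()


def parallel_sum(data: List[int], nranks: int) -> int:
    """Each rank sums a strided slice; ranks 1..n-1 send partial sums to rank 0."""
    comm = Comm(nranks)
    result: List[int] = []

    def rank_main(rank: int) -> None:
        partial = sum(data[rank::nranks])
        if rank == 0:
            total = partial
            for _ in range(nranks - 1):
                _source, value = comm.recv(0)
                total += value
            result.append(total)
        else:
            comm.send(partial, dest=0, source=rank)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Because ranks share nothing but their inboxes, the same program structure carries over to platforms where cores do not share memory, which is the portability argument behind a standard message-passing interface.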

Berger, M.; Erlacher, F.; Sommer, C.; Dressler, F., "Adaptive load allocation for combining Anomaly Detectors using controlled skips," 2014 International Conference on Computing, Networking and Communications (ICNC), pp. 792-796, 3-6 Feb. 2014. (ID#:14-1703) URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6785438&isnumber=6785290 Traditional Intrusion Detection Systems (IDS) can be complemented by an Anomaly Detection Algorithm (ADA) to also identify unknown attacks. We argue that, as each ADA has its own strengths and weaknesses, it might be beneficial to rely on multiple ADAs to obtain deeper insights. ADAs are very resource intensive; thus, real-time detection with multiple algorithms is even more challenging in high-speed networks. To handle such high data rates, we developed a controlled load allocation scheme that adaptively allocates multiple ADAs on a multi-core system. The key idea is to utilize as many algorithms as possible without causing random packet drops, which is the typical system behavior in overload situations. We developed a proof-of-concept anomaly detection framework with a sample set of ADAs. Our experiments confirm that the detection performance can substantially benefit from using multiple algorithms and that the developed framework is also able to cope with high packet rates. Keywords: multiprocessing systems; real-time systems; resource allocation; security of data; ADA; IDS; adaptive load allocation; anomaly detection algorithm; controlled load allocation; controlled skips; high-speed networks; intrusion detection systems; multicore system; multiple algorithms; real-time detection; resource intensive; unknown attacks; High-speed networks; Intrusion detection; Probabilistic logic; Reliability; Uplink; World Wide Web
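
The controlled-skip idea can be sketched as follows: when the combined per-packet cost of all detectors exceeds the CPU budget, each detector deterministically skips a fixed share of packets, so load is shed in a controlled way rather than through random packet drops. This Python sketch rests on simplifying assumptions (known, constant per-packet costs and one uniform share for all detectors); the names `Detector`, `run_share`, and `schedule` are hypothetical, and the paper's adaptive allocation scheme may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detector:
    name: str
    cost: float          # assumed CPU cost per inspected packet (arbitrary units)
    credit: float = 0.0  # accumulated fractional right to inspect a packet


def run_share(detectors: List[Detector], budget: float) -> float:
    """Fraction of packets each detector may inspect so the total cost fits the per-packet budget."""
    total = sum(d.cost for d in detectors)
    return 1.0 if total <= budget else budget / total


def schedule(detectors: List[Detector], budget: float, packets: int) -> Dict[str, int]:
    """Count how many packets each detector inspects when its skips are spread evenly."""
    share = run_share(detectors, budget)
    inspected = {d.name: 0 for d in detectors}
    for _ in range(packets):
        for d in detectors:
            d.credit += share
            if d.credit >= 1.0:  # enough credit: this detector inspects the packet
                d.credit -= 1.0
                inspected[d.name] += 1
            # otherwise only this detector skips the packet; no packet is dropped
    return inspected
```

For example, three detectors with costs 2, 3, and 5 under a per-packet budget of 5 each inspect every second packet, which uses exactly the available budget while every algorithm still contributes.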

Kong, J.; Koushanfar, F., "Processor-Based Strong Physical Unclonable Functions With Aging-Based Response Tuning," IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 1, pp. 16-29, March 2014. (ID#:14-1704) URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6656920&isnumber=6824880 A strong physically unclonable function (PUF) is a circuit structure that extracts an exponential number of unique chip signatures from a bounded number of circuit components. The strong PUF unique signatures can enable a variety of low-overhead security and intellectual property protection protocols applicable to several computing platforms. This paper proposes a novel lightweight (low overhead) strong PUF based on the timings of a classic processor architecture. A small amount of circuitry is added to the processor for on-the-fly extraction of the unique timing signatures. To achieve desirable strong PUF properties, we develop an algorithm that leverages intentional post-silicon aging to tune the inter- and intra-chip signature variation. Our evaluation results show that the new PUF meets the desirable inter- and intra-chip strong PUF characteristics, whereas its overhead is much lower than the existing strong PUFs. For the processors implemented in 45 nm technology, the average inter-chip Hamming distance for 32-bit responses is increased by 16.1% after applying our post-silicon tuning method; the aging algorithm also decreases the average intra-chip Hamming distance by 98.1% (for 32-bit responses). Keywords: Aging; Circuit optimization; Delays; Logic gates; Microprocessors; Multicore processing; Network security; Silicon; Temperature measurement; Circuit aging; Multi-core processor; Negative bias temperature instability; Physically unclonable function; Post-silicon tuning; Secure computing platform
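
The inter- and intra-chip Hamming distance metrics quoted above can be computed as sketched below. Inter-chip distance compares different chips' responses to the same challenge (ideally about half the bits differ, i.e. 16 of 32), while intra-chip distance compares repeated evaluations on the same chip (ideally close to 0). The function names are hypothetical, and any response values used with them here are made up for illustration, not data from the paper.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two responses differ."""
    return bin(a ^ b).count("1")


def avg_inter_chip_hd(responses: Sequence[int]) -> float:
    """Mean pairwise distance between responses of *different* chips to one challenge."""
    pairs = list(combinations(responses, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def avg_intra_chip_hd(reference: int, rereads: Sequence[int]) -> float:
    """Mean distance between one chip's reference response and its re-evaluations."""
    return sum(hamming(reference, r) for r in rereads) / len(rereads)


def as_percent(hd: float, bits: int = 32) -> float:
    """Express a Hamming distance as a percentage of the response width."""
    return 100.0 * hd / bits
```

A tuning method like the one described aims to push `avg_inter_chip_hd` toward `bits / 2` and `avg_intra_chip_hd` toward 0.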

Kishore, N.; Kapoor, B., "An efficient parallel algorithm for hash computation in security and forensics applications," 2014 IEEE International Advance Computing Conference (IACC), pp. 873-877, 21-22 Feb. 2014. (ID#:14-1705) URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6779437&isnumber=6779283 Hashing algorithms are used extensively in information security and digital forensics applications. This paper presents an efficient parallel algorithm for hash computation. It is a modification of the SHA-1 algorithm for faster parallel implementation in applications such as digital signatures and data preservation in digital forensics. The algorithm uses a recursive hash to break the chain dependencies of the standard hash function. We discuss the theoretical foundation for the work, including the collision probability and the performance implications. The algorithm is implemented using the OpenMP API, and experiments were performed on machines with multicore processors. The results show a performance gain of more than a factor of 3 when running on the 8-core configuration of the machine. Keywords: application program interfaces; cryptography; digital forensics; digital signatures; file organization; parallel algorithms; probability; OpenMP API; SHA-1 algorithm; collision probability; data preservation; digital forensics; digital signature; hash computation; hashing algorithms; information security; parallel algorithm; standard hash function; Algorithm design and analysis; Conferences; Cryptography; Multicore processing; Program processors; Standards; Cryptographic Hash Function; Digital Forensics; Digital Signature; MD5; Multicore Processors; OpenMP; SHA-1
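
A tree-style variant of breaking the hash chain can be sketched in Python: fixed-size chunks are hashed independently on a thread pool, and the ordered chunk digests are then hashed once more. CPython's `hashlib` releases the GIL while hashing large buffers, so the threads can use separate cores. This is an illustrative assumption, not the authors' OpenMP algorithm; `chunked_sha1`, the chunk size, and the worker count are hypothetical choices, and the resulting digest intentionally differs from a plain SHA-1 of the whole input.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunked_sha1(data: bytes, chunk_size: int = 1 << 20, workers: int = 8) -> str:
    """Hash chunks in parallel, then hash the concatenation of the chunk digests."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Splitting into independent chunks removes the sequential chain dependency.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the combined digest is deterministic.
        leaf_digests = list(pool.map(lambda c: hashlib.sha1(c).digest(), chunks))
    # Combine step: one final hash over the ordered leaf digests.
    return hashlib.sha1(b"".join(leaf_digests)).hexdigest()
```

Because the output depends on the chunk size, whoever verifies a digest (for example, during forensic data preservation) must use the same construction and parameters as whoever produced it.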

Dean Michael Ancajas, Koushik Chakraborty, Sanghamitra Roy, “Fort-NoCs: Mitigating the Threat of a Compromised NoC,” DAC '14 Proceedings of the 51st Annual Design Automation Conference on Design Automation Conference, June 2014, Pages 1-6. (ID#:14-1706) URL: http://dl.acm.org/citation.cfm?id=2593069.2593144&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 or http://dx.doi.org/10.1145/2593069.2593144 In this paper, we uncover a novel and imminent threat to an emerging computing paradigm: MPSoCs built with 3rd party IP NoCs. We demonstrate that a compromised NoC (C-NoC) can enable a range of security attacks with an accomplice software component. To counteract these threats, we propose Fort-NoCs, a series of techniques that work together to provide protection from a C-NoC in an MPSoC. Fort-NoCs's foolproof protection disables covert backdoor activation, and reduces the chance of a successful side-channel attack by "clouding" the information obtained by an attacker. Compared to recently proposed techniques, Fort-NoCs offers substantially better protection with lower overheads. Keywords: (not provided)

Kekai Hu, Tilman Wolf, Thiago Teixeira, Russell Tessier, “System-Level Security for Network Processors with Hardware Monitors,” DAC '14 Proceedings of the 51st Annual Design Automation Conference on Design Automation Conference, June 2014, Pages 1-6. (ID#:14-1707) URL: http://dl.acm.org/citation.cfm?id=2593069.2593226&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 or http://dx.doi.org/10.1145/2593069.2593226 New attacks are emerging that target the Internet infrastructure. Modern routers use programmable network processors that may be exploited by merely sending suitably crafted data packets into a network. Hardware monitors that are co-located with processor cores can detect attacks that change processor behavior with high probability. In this paper, we present a solution to the problem of secure, dynamic installation of hardware monitoring graphs on these devices. We also address the problem of how to overcome the homogeneity of a network with many identical devices, where a successful attack, albeit possible only with small probability, may have devastating effects. Keywords: (not provided)
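
One way to picture a hardware monitor checking processor behavior: each instruction on the expected execution paths is reduced to a few bits by a hash, the monitor follows a graph of allowed transitions, and it raises an alarm as soon as the observed hash stream leaves the graph. Keying the hash per device is one assumed way to address the homogeneity problem, since a packet crafted to match one device's graph would not automatically match another's. This simplified Python sketch is not the paper's hardware design; the graph layout, the 4-bit hash width, and the names `insn_hash`, `build_graph`, and `monitor` are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Sequence, Tuple

Program = Dict[int, Tuple[bytes, List[int]]]  # pc -> (instruction bytes, successor pcs)
Graph = Dict[int, Tuple[int, List[int]]]      # pc -> (instruction hash, successor pcs)


def insn_hash(device_key: bytes, insn: bytes, bits: int = 4) -> int:
    """Compress an instruction to `bits` bits with a per-device keyed hash."""
    digest = hmac.new(device_key, insn, hashlib.sha256).digest()
    return digest[0] & ((1 << bits) - 1)


def build_graph(device_key: bytes, program: Program) -> Graph:
    """Precompute the monitoring graph offline, before installing it on one device."""
    return {pc: (insn_hash(device_key, insn), succs) for pc, (insn, succs) in program.items()}


def monitor(graph: Graph, observed: Sequence[int], entry: int) -> bool:
    """Return True if the observed hash stream matches some path starting at `entry`."""
    if not observed or graph[entry][0] != observed[0]:
        return False
    current = {entry}  # graph nodes that could correspond to the latest instruction
    for h in observed[1:]:
        current = {s for pc in current for s in graph[pc][1] if graph[s][0] == h}
        if not current:
            return False  # behavior left the graph: raise an alarm
    return True
```

Tracking a set of candidate nodes, rather than a single one, lets the monitor tolerate hash collisions between sibling instructions at the cost of slightly weaker detection.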

Richard L. Moore, Chaitan Baru, Diane Baxter, Geoffrey C. Fox, Amit Majumdar, Phillip Papadopoulos, Wayne Pfeiffer, Robert S. Sinkovits, Shawn Strande, Mahidhar Tatineni, Richard P. Wagner, Nancy Wilkins-Diehr, Michael L. Norman, “Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science,” XSEDE '14 Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, July 2014, Article No. 39. (ID#:14-1708) URL: http://dl.acm.org/citation.cfm?id=2616498.2616540&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 or http://dx.doi.org/10.1145/2616498.2616540 NSF-funded computing centers have primarily focused on delivering high-performance computing resources to academic researchers with the most computationally demanding applications. But now that computational science is so pervasive, there is a need for infrastructure that can serve more researchers and disciplines than just those at the peak of the HPC pyramid. Here we describe SDSC's Comet system, which is scheduled for production in January 2015 and was designed to address the needs of a much larger and more expansive science community: the "long tail of science". Comet will have a peak performance of 2 petaflop/s, mostly delivered using Intel's next generation Xeon processor. It will include some large-memory and GPU-accelerated nodes, node-local flash memory, 7 PB of Performance Storage, and 6 PB of Durable Storage. These features, together with the availability of high performance virtualization, will enable users to run complex, heterogeneous workloads on a single integrated resource. Keywords: GPU, High performance computing, high throughput computing, parallel file system, science gateways, scientific applications, solid-state drive, user support, virtualization

Chia-Che Tsai, Kumar Saurabh Arora, Nehal Bandi, Bhushan Jain, William Jannen, Jitin John, Harry A. Kalodner, Vrushali Kulkarni, Daniela Oliveira, Donald E. Porter, “Cooperation and Security Isolation of Library OSes for Multi-Process Applications,” EuroSys '14 Proceedings of the Ninth European Conference on Computer Systems, April 2014, Article No. 9. (ID#:14-1709) URL: http://dl.acm.org/citation.cfm?id=2592798.2592812&coll=DL&dl=GUIDE&CFID=390360820&CFTOKEN=56962601 or http://dx.doi.org/10.1145/2592798.2592812 Library OSes are a promising approach for applications to efficiently obtain the benefits of virtual machines, including security isolation, host platform compatibility, and migration. Library OSes refactor a traditional OS kernel into an application library, avoiding overheads incurred by duplicate functionality. When compared to running a single application on an OS kernel in a VM, recent library OSes reduce the memory footprint by an order-of-magnitude. Previous library OS (libOS) research has focused on single-process applications, yet many Unix applications, such as network servers and shell scripts, span multiple processes. Key design challenges for a multi-process libOS include management of shared state and minimal expansion of the security isolation boundary. This paper presents Graphene, a library OS that seamlessly and efficiently executes both single and multi-process applications, generally with low memory and performance overheads. Graphene broadens the libOS paradigm to support secure, multi-process APIs, such as copy-on-write fork, signals, and System V IPC. Multiple libOS instances coordinate over pipe-like byte streams to implement a consistent, distributed POSIX abstraction. These coordination streams provide a simple vantage point to enforce security isolation. Keywords: (not provided)
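
Coordination over pipe-like byte streams can be illustrated with a small sketch in which libOS instances exchange length-prefixed messages and a designated leader hands out disjoint ranges of process IDs, so that instances can create processes without colliding. The leader-based PID allocation, the JSON message format, and the names `send_msg`, `recv_msg`, `PidLeader`, and `demo` are assumptions for illustration, not Graphene's actual protocol; for simplicity both ends run in one Python process over `os.pipe()`.

```python
import json
import os
import struct
from typing import Any, Dict, List


def send_msg(fd: int, obj: Dict[str, Any]) -> None:
    """Write one length-prefixed JSON message to a byte stream."""
    payload = json.dumps(obj).encode("utf-8")
    data = struct.pack("!I", len(payload)) + payload
    while data:  # os.write may write fewer bytes than requested
        written = os.write(fd, data)
        data = data[written:]


def _recv_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes, since a stream does not preserve message boundaries."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-message")
        buf += chunk
    return buf


def recv_msg(fd: int) -> Dict[str, Any]:
    """Read one length-prefixed JSON message from a byte stream."""
    (length,) = struct.unpack("!I", _recv_exact(fd, 4))
    return json.loads(_recv_exact(fd, length).decode("utf-8"))


class PidLeader:
    """Hypothetical namespace leader that hands out disjoint PID ranges."""

    def __init__(self, batch: int = 100) -> None:
        self.next_pid = 1
        self.batch = batch

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        if request.get("op") == "alloc_pids":
            start = self.next_pid
            self.next_pid += self.batch
            return {"start": start, "end": start + self.batch - 1}
        return {"error": "unknown op"}


def demo(rounds: int = 2) -> List[Dict[str, Any]]:
    """One 'instance' asks the leader for PID ranges over a pair of pipes."""
    req_r, req_w = os.pipe()    # instance -> leader
    resp_r, resp_w = os.pipe()  # leader -> instance
    leader = PidLeader(batch=100)
    ranges = []
    try:
        for _ in range(rounds):
            send_msg(req_w, {"op": "alloc_pids"})
            send_msg(resp_w, leader.handle(recv_msg(req_r)))
            ranges.append(recv_msg(resp_r))
    finally:
        for fd in (req_r, req_w, resp_r, resp_w):
            os.close(fd)
    return ranges
```

Handing out IDs in batches keeps most process creations local to an instance; only exhausting a batch requires a round trip to the leader, and every such request passes through a stream where a reference monitor could inspect it.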

Note:

Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to SoS.Project (at) SecureDataBank.net for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.