TVM (Tensor Virtual Machine) is a deep learning compiler that converts machine learning models into TVM IR (intermediate representation) and optimises the generation of high-performance machine code for various hardware platforms. While the traditional approach is to parallelise the loop transformations of operators, in this paper we partition the implementation of operators in TVM and schedule the partitions in parallel to obtain faster-running operators. We design an optimisation algorithm for partitioning and parallel scheduling in TVM: operators such as two-dimensional convolutions are partitioned into multiple smaller implementations, several partitioned operators are run in parallel, and the best partitioning and parallel scheduling decisions are selected by means of performance estimation. To evaluate the effectiveness of the algorithm, multiple instances of the two-dimensional convolution, average pooling, maximum pooling, and ReLU activation operators with different input sizes were tested on a CPU platform, and the experiments showed that the running time of these operators improved.
Authored by Zhiyu Li, Xiang Zhou, Wenbin Weng
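The partition-then-run-in-parallel idea in the abstract above can be sketched as follows. This is an illustrative NumPy example, not TVM code and not the authors' algorithm: the helper names `conv2d_valid` and `conv2d_partitioned`, the row-band partitioning, and the thread pool are all assumptions chosen for the illustration, and no performance estimation is performed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def conv2d_valid(x, w):
    """Single-channel 2D cross-correlation with no padding and stride 1."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * x[i:i + oh, j:j + ow]
    return out


def conv2d_partitioned(x, w, parts):
    """Split the output rows into `parts` bands and compute the bands in parallel.

    Hypothetical helper: assumes `parts` does not exceed the number of output rows.
    """
    kh = w.shape[0]
    oh = x.shape[0] - kh + 1
    bounds = np.linspace(0, oh, parts + 1).astype(int)
    # Output band [r0, r1) depends on input rows [r0, r1 + kh - 1).
    tiles = [x[r0:r1 + kh - 1] for r0, r1 in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        bands = list(pool.map(lambda tile: conv2d_valid(tile, w), tiles))
    return np.vstack(bands)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 12))
w = rng.standard_normal((3, 3))
full = conv2d_valid(x, w)
split = conv2d_partitioned(x, w, parts=4)
print(np.allclose(full, split))
```

Each band overlaps its neighbour by `kh - 1` input rows, which is why the smaller operators reproduce the unpartitioned result exactly.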
Binary analysis is pervasively utilized to assess software security and test vulnerabilities without accessing source code. The validity of the analysis is heavily influenced by the ability to infer information related to code compilation. Among the compilation information, compiler type and optimization level, the key factors determining what binaries look like, are still difficult to infer efficiently with existing tools. In this paper, we conduct a thorough empirical study of binaries' appearance under various compilation settings and propose DIComP, a lightweight binary analysis tool based on the simplest machine learning method, to infer the compiler and optimization level from the most relevant features identified in our observations. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. Also, it efficiently processes thousands of binaries at the millisecond level with our lightweight machine learning model (1MB).
Authored by Ligeng Chen, Zhongling He, Hao Wu, Fengyuan Xu, Yi Qian, Bing Mao
Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural program embeddings directly from the program source code, by learning from features such as tokens, abstract syntax trees, and control flow graphs. This paper takes a fresh look at how to improve program embeddings by leveraging compiler intermediate representation (IR). We first demonstrate simple yet highly effective methods for enhancing embedding quality by training embedding models alongside source code and LLVM IR generated by default optimization levels (e.g., -O2). We then introduce IRGEN, a framework based on genetic algorithms (GA), to identify (near-)optimal sequences of optimization flags that can significantly improve embedding quality. We use IRGEN to find optimal sequences of LLVM optimization flags by performing GA on source code datasets. We then extend a popular code embedding model, CodeCMR, by adding a new objective based on triplet loss to enable joint learning over source code and LLVM IR. We benchmark the quality of embedding using a representative downstream application, code clone detection. When CodeCMR was trained with source code and LLVM IRs optimized by findings of IRGEN, the embedding quality was significantly improved, outperforming the state-of-the-art model, CodeBERT, which was trained only with source code. Our augmented CodeCMR also outperformed CodeCMR trained over source code and IR optimized with default optimization levels. We investigate the properties of optimization flags that increase embedding quality, demonstrate IRGEN's generalization in boosting other embedding models, and establish IRGEN's use in settings with extremely limited training data.
Our research and findings demonstrate that a straightforward addition to modern neural code embedding models can provide a highly effective enhancement.
Authored by Zongjie Li, Pingchuan Ma, Huaijin Wang, Shuai Wang, Qiyi Tang, Sen Nie, Shi Wu
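The triplet-loss objective mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the distance function, the margin, or how triplets are formed, so the Euclidean distance, the margin of 1.0, and the pairing of a source-code embedding (anchor) with IR embeddings of the same function (positive) and of a different function (negative) are assumptions made here for illustration.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over a batch.

    Assumed setup: rows of `anchor` are source-code embeddings, rows of
    `positive` are IR embeddings of the same functions, and rows of
    `negative` are IR embeddings of unrelated functions.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())


anchor = np.array([[0.0, 0.0]])
close = np.array([[0.0, 1.0]])   # distance 1 from the anchor
far = np.array([[3.0, 4.0]])     # distance 5 from the anchor

well_separated = triplet_loss(anchor, close, far)   # 1 - 5 + 1 < 0, so 0
badly_separated = triplet_loss(anchor, far, close)  # 5 - 1 + 1 = 5
print(well_separated, badly_separated)
```

The loss is zero once the matching pair is closer than the mismatched pair by at least the margin, which is what pulls source and IR embeddings of the same function together.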
Derivatives are key to numerous science, engineering, and machine learning applications. While existing tools generate derivatives of programs in a single language, modern parallel applications combine a set of frameworks and languages to leverage available performance and function in an evolving hardware landscape. We propose a scheme for differentiating arbitrary DAG-based parallelism that preserves scalability and efficiency, implemented into the LLVM-based Enzyme automatic differentiation framework. By integrating with a full-fledged compiler backend, Enzyme can differentiate numerous parallel frameworks and directly control code generation. Combined with its ability to differentiate any LLVM-based language, this flexibility permits Enzyme to leverage the compiler tool chain for parallel and differentiation-specific optimizations. We differentiate nine distinct versions of the LULESH and miniBUDE applications, written in different programming languages (C++, Julia) and parallel frameworks (OpenMP, MPI, RAJA, Julia tasks, MPI.jl), demonstrating similar scalability to the original program. On benchmarks with 64 threads or nodes, we find a differentiation overhead of 3.4–6.8× on C++ and 5.4–12.5× on Julia.
Authored by William Moses, Sri Narayanan, Ludger Paehler, Valentin Churavy, Michel Schanen, Jan Hückelheim, Johannes Doerfert, Paul Hovland
With memory safety and security issues continuing to plague modern systems, security is rapidly becoming a first-class priority in new architectures and competes directly with performance and power efficiency. The capability-based architecture model provides a promising solution to many memory vulnerabilities by replacing plain addresses with capabilities, i.e., addresses and related metadata. A key advantage of the capability model is compatibility with existing code bases. Capabilities can be implemented transparently to a programmer, i.e., without source code changes. Capabilities leverage semantics in source code to describe access permissions but require customized compilers to translate the semantics to their binary equivalent. In this work, we introduce a complete capability-aware compiler toolchain for such secure architectures. We illustrate the compiler construction with a RISC-V capability-based architecture, called Zeno. As a security-focused, large-scale, global shared memory architecture, Zeno implements a Namespace-based capability model for accesses. Namespace IDs (NSID) are encoded with an extended addressing model to associate them with access permission metadata elsewhere in the system. The NSID extended addressing model requires custom compiler support to fully leverage the protections offered by Namespaces. The Zeno compiler produces code transparently to the programmer that is aware of Namespaces and maintains their integrity. The Zeno assembler enables custom Zeno instructions which support secure memory operations. Our results show that our custom toolchain moderately increases the binary size compared to non-Zeno compilation. We find the minimal overhead incurred by the additional NSID management instructions to be an acceptable trade-off for the memory safety and security offered by Zeno Namespaces.
Authored by Jacob Abraham, Alan Ehret, Michel Kinsy
Model checking is one of the most commonly used techniques in formal verification. However, the exponential scale of the state space renders exhaustive state enumeration inefficient even for a moderate System on Chip (SoC) design. In this paper, we propose a method that leverages symbolic execution to accelerate state space search and pinpoint security vulnerabilities. We automatically convert the hardware design to functionally equivalent C++ code and utilize the KLEE symbolic execution engine to perform state exploration through heuristic search. To reduce the search space, we symbolically represent essential input signals while making non-critical inputs concrete. Experimental results demonstrate that our method can precisely identify security vulnerabilities at significantly lower computation cost.
Authored by Shibo Tang, Xingxin Wang, Yifei Gao, Wei Hu
Fractional repetition (FR) codes are a special family of regenerating codes with the repair-by-transfer property. The constructions of FR codes are naturally related to combinatorial designs, graphs, and hypergraphs. Given the file size of an FR code, it is desirable to determine the minimum number of storage nodes needed. The problem is related to an extremal graph theory problem, which asks for the minimum number of vertices of an α-regular graph such that any subgraph with k vertices has at most δ edges. In this paper, we present a class of regular graphs for this problem to give the bounds for the minimum number of storage nodes for the FR codes.
Authored by Hongna Yang, Yiwei Zhang
We present a ternary source coding scheme in this paper, which is a special class of low density generator matrix (LDGM) codes. We prove that a ternary linear block LDGM code, whose generator matrix is randomly generated with each element independent and identically distributed, is universal for source coding in terms of the symbol-error rate (SER). To circumvent high-complexity maximum likelihood decoding, we introduce a special class of convolutional LDGM codes, called block Markov superposition transmission of repetition (BMST-R) codes, which are iteratively decodable by a sliding window algorithm. Then the presented BMST-R codes are applied to construct a tandem scheme for Gaussian source compression, where a dead-zone quantizer is introduced before the ternary source coding. The main advantages of this scheme are its universality and flexibility. The dead-zone quantizer can choose a proper quantization level according to the distortion requirement, while the LDGM codes can adapt the code rate to approach the entropy of the quantized sequence. Numerical results show that the proposed scheme performs well for ternary sources over a wide range of code rates and that the distortion introduced by quantization dominates provided that the code rate is slightly greater than the discrete entropy.
Authored by Tingting Zhu, Jifan Liang, Xiao Ma
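To make the tandem scheme in the abstract above more concrete, the sketch below shows one plausible form of a dead-zone quantizer that maps real samples to ternary symbols, together with the empirical entropy that the code rate would need to approach. The single symmetric threshold, the symbol alphabet {-1, 0, +1}, and both function names are assumptions for illustration; the abstract does not specify the quantizer design.

```python
import numpy as np


def dead_zone_ternary(x, threshold):
    """Map samples to {-1, 0, +1}: 0 inside the dead zone |x| <= threshold."""
    x = np.asarray(x, dtype=float)
    return (np.sign(x) * (np.abs(x) > threshold)).astype(int)


def empirical_entropy_bits(symbols):
    """Empirical entropy of a symbol sequence, in bits per symbol."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


quantized = dead_zone_ternary([-2.0, -0.1, 0.0, 0.3, 1.5], threshold=0.5)
entropy = empirical_entropy_bits([-1, 0, 0, 1])
print(quantized.tolist(), entropy)
```

Widening the dead zone sends more samples to 0, which lowers the entropy of the ternary sequence (and hence the required code rate) at the cost of higher distortion.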
Nowadays, improving the reliability and security of the transmitted data has gained more attention with the increase in emerging power-limited and lightweight communication devices. Also, the transmission needs to meet specific latency requirements. Combining data encryption and encoding in one physical layer block has been exploited to study the effect on security and latency over traditional sequential data transmission. Some of the current works target secure error-correcting codes that may be candidates for post-quantum computing. However, modifying the popularly used channel coding techniques to guarantee secrecy and maintain the same error performance and complexity at the decoder is challenging since the structure of the channel coding blocks is altered which results in less optimal decoding performance. Also, the redundancy nature of the error-correcting codes complicates the encryption method. In this paper, we briefly review the proposed security schemes on Turbo codes. Then, we propose a secure turbo code design and compare it with the relevant security schemes in the literature. We show that the proposed method is more secure without adding complexity.
Authored by Ahmed Aladi, Emad Alsusa
Vulnerability discovery is an important field of computer security research and development today. Because most current vulnerability discovery methods require large-scale manual auditing, and the code parsing process is cumbersome and time-consuming, the effectiveness of vulnerability discovery is reduced. Given the inherent uncertainty of vulnerability discovery, the most basic tool design principle is that tools assist security analysts but cannot completely replace them. The purpose of this paper is to study a source code vulnerability discovery method based on graph neural networks. This paper analyzes the three processes of the source code vulnerability mining method, namely data preparation, source code vulnerability mining, and security assurance, and also analyzes the suspiciousness and particularity of the experimental results. The empirical analysis shows that traditional source code vulnerability mining methods become more concise and convenient after adopting graph neural network technology. In a survey we conducted, more than 82% of respondents felt that design efficiency increased when the source code vulnerability mining method used graph neural networks.
Authored by Zhenghong Jiang
In this paper, we propose a new ordered statistics decoding (OSD) for linear block codes, which is referred to as local constraint-based OSD (LC-OSD). Distinguished from the conventional OSD, which chooses the most reliable basis (MRB) for re-encoding, the LC-OSD chooses an extended MRB on which local constraints are naturally imposed. A list of candidate codewords is then generated by performing a serial list Viterbi algorithm (SLVA) over the trellis specified with the local constraints. To terminate the SLVA early for complexity reduction, we present a simple criterion which monitors the ratio of the bound on the likelihood of the unexplored candidate codewords to the sum of the hard-decision vector's likelihood and the up-to-date optimal candidate's likelihood. Simulation results show that the LC-OSD requires far fewer test patterns than the conventional OSD while causing negligible performance loss. Comparisons with other complexity-reduced OSDs are also conducted, showing the advantages of the LC-OSD in terms of complexity.
Authored by Yiwen Wang, Jifan Liang, Xiao Ma
Vulnerability detection has always been an essential part of maintaining information security, and existing work can significantly improve the performance of vulnerability detection. However, due to differences in representation forms and deep learning models, the various methods still have some limitations. To overcome these limitations, we propose VDBWGDL, a vulnerability detection method based on weight graphs and deep learning. First, it accurately locates vulnerability-sensitive keywords and, through code variant methods, generates variant code that satisfies the vulnerability trigger logic and the programmer's programming style. Then, the control flow graph is sliced for vulnerable code keywords and program-critical statements. Each code block is converted by the deep learning model into a vector containing rich semantic information and input into the weight graph. According to specific rules, different weights are set for each node. Finally, the similarity is obtained through a similarity comparison algorithm, and suspected vulnerabilities are output according to different thresholds. VDBWGDL improves accuracy and F1 value by 3.98% and 4.85% compared with four state-of-the-art models. The experimental results prove the effectiveness of VDBWGDL.
Authored by Xin Zhang, Hongyu Sun, Zhipeng He, MianXue Gu, Jingyu Feng, Yuqing Zhang
This paper presents secure MatDot codes, a family of evaluation codes that support secure distributed matrix multiplication via a careful selection of evaluation points that exploit the properties of the dual code. We show that the secure MatDot codes provide security against the user by using locally recoverable codes. These new codes complement the recently studied discrete Fourier transform codes for distributed matrix multiplication schemes that also provide security against the user. There are scenarios where the associated costs are the same for both families and instances where the secure MatDot codes offer a lower cost. In addition, the secure MatDot code provides an alternative way to handle the matrix multiplication by identifying the fastest servers in advance. In this way, it can determine a product using fewer servers, specified in advance, than the MatDot codes which achieve the optimal recovery threshold for distributed matrix multiplication schemes.
Authored by Hiram López, Gretchen Matthews, Daniel Valvo
Social Internet of Vehicles (SIoV) has emerged as one of the most promising applications for vehicle communication, providing a safe and comfortable driving experience. It reduces traffic jams and accidents, thereby saving public resources. However, wrongly communicated messages can cause serious issues, including threats to life, so it is essential to ensure the reliability of a message before acting on it. Existing works use cryptographic primitives like threshold authentication and ring signatures, which incur huge computation and communication overheads, and the ring signature size grows linearly with the threshold value. Our objective is to keep the signature size constant regardless of the threshold value. This work proposes MuSigRDT, a multisignature contract-based data transmission protocol using the Schnorr digital signature. MuSigRDT provides incentives to encourage vehicles to share correct information in real time and participate honestly in SIoV. MuSigRDT is shown to be secure under the Universal Composability (UC) framework. The MuSigRDT contract is deployed on Ethereum's Rinkeby testnet.
Authored by Badavath Naik, Somanath Tripathy, Susil Mohanty
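The constant-signature-size property that the abstract above aims for can be illustrated with a toy Schnorr multisignature. This is not the MuSigRDT protocol: the group parameters are deliberately tiny, the nonces are fixed for reproducibility (real nonces must be random and secret), and the naive key aggregation shown here is known to be vulnerable to rogue-key attacks. All names and values are assumptions for the illustration.

```python
import hashlib

# Toy group (far too small for real use): G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def challenge(r, y, message):
    """Hash the aggregate nonce, aggregate key, and message into Z_Q."""
    data = f"{r}|{y}|".encode() + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def verify(y, message, signature):
    """Standard Schnorr check: G^s == r * y^e (mod P)."""
    r, s = signature
    e = challenge(r, y, message)
    return pow(G, s, P) == (r * pow(y, e, P)) % P


message = b"road blocked at junction 4"
secret_keys = [3, 7]   # one per signer (hypothetical values)
nonces = [5, 9]        # fixed only so the example is reproducible

# Aggregate public key and aggregate nonce commitment are products.
y_agg = 1
for x in secret_keys:
    y_agg = (y_agg * pow(G, x, P)) % P
r_agg = 1
for k in nonces:
    r_agg = (r_agg * pow(G, k, P)) % P

# Every signer answers the same challenge; the responses simply add up.
e = challenge(r_agg, y_agg, message)
s_agg = sum(k + e * x for k, x in zip(nonces, secret_keys)) % Q
signature = (r_agg, s_agg)

print(verify(y_agg, message, signature))
```

The aggregate signature is a single pair `(r_agg, s_agg)` however many signers contribute, in contrast to ring signatures whose size grows with the threshold.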
Nowadays, microservice architecture is known as a successful and promising architecture for smart city applications. Applying microservices in the design and implementation of systems has many advantages, such as autonomy, loose coupling, composability, scalability, and fault tolerance. However, the complexity of calls between microservices leads to problems in security, accessibility, and data management during system execution. To address these challenges, in recent years various researchers and developers have focused on the use of microservice patterns in the implementation of microservice-based systems. Microservice patterns are the result of developers' successful experiences in addressing common challenges in microservice-based systems. However, no guideline has hitherto been provided for an in-depth understanding of microservice patterns and how to apply them to real systems. The purpose of this paper is to investigate in detail the most widely used and important microservice patterns in order to analyze the function of each pattern, extract its behavioral signatures, and construct a service dependency graph for it, so that researchers and enthusiasts can use the provided guideline to create microservice-based systems equipped with design patterns. To construct the proposed guideline, five real open-source projects have been carefully investigated and analyzed, and the results obtained have been used in making the guideline.
Authored by Neda Mohammadi, Abbas Rasoolzadegan
The long-living nature and byte-addressability of persistent memory (PM) amplify the importance of strong memory protections. This paper develops temporal exposure reduction protection (TERP) as a framework for enforcing memory safety. Aiming to minimize the time when a PM region is accessible, TERP offers a complementary dimension of memory protection. The paper gives a formal definition of TERP and explores the semantics space of TERP constructs and their relations with security and composability in both sequential and parallel executions. It proposes programming system and architecture solutions for the key challenges in the adoption of TERP, drawing on novel support in both compilers and hardware to efficiently meet the exposure time target. Experiments validate the efficacy of the proposed support for TERP in both efficiency and exposure time minimization.
Authored by Yuanchao Xu, Chencheng Ye, Xipeng Shen, Yan Solihin
A weather radar is expected to provide valid information about weather conditions in real time. To obtain these results, a weather radar takes many data samples, so a large amount of data is produced. Therefore, the weather radar equipment must provide high-capacity bandwidth for transmission and large storage media. The burden of data volume can be reduced by performing compression at the time of data acquisition. Compressive Sampling (CS) is a new data acquisition method that allows the sampling and compression processes to be carried out simultaneously to speed up computing time, reduce bandwidth on transmission media, and save storage media. There are three stages in the CS method, namely: sparsity transformation using the Discrete Cosine Transform (DCT) algorithm, sampling using a measurement matrix, and reconstruction using the Orthogonal Matching Pursuit (OMP) algorithm. The sparsity transformation converts the representation of the radar signal into a sparse form. Sampling extracts important information from the radar signal, and reconstruction recovers the radar signal. The data used in this study is real IDRA beat signal data. Based on the CS simulation, the best PSNR and RMSE values are obtained with a CR value of 2, while the shortest computation time is obtained with a CR value of 32. CS simulation in a sector via DCT with a CR value of 2 produces a PSNR of 20.838 dB and an RMSE of 0.091. CS simulation in a sector via DCT with a CR value of 32 requires a computation time of 10.574 seconds.
Authored by Muhammad Ammar, Rita Purnamasari, Gelar Budiman
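The three CS stages named in the abstract above (DCT sparsity transform, sampling with a measurement matrix, OMP reconstruction) can be sketched end to end as follows. The example uses a synthetic signal that is exactly sparse in the DCT domain rather than IDRA radar data, and a Gaussian random measurement matrix; both choices, the signal sizes, and the helper names are assumptions for illustration, so the numbers it produces are unrelated to the results reported in the abstract.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that s = C @ x and x = C.T @ s."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def omp(a, y, max_atoms, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily build a support, refit by least squares."""
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        corr = np.abs(a.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    s = np.zeros(a.shape[1])
    if support:
        s[support] = coef
    return s


n, m = 64, 32                      # signal length and measurements (ratio of 2)
C = dct_matrix(n)

# Stage 1: a signal that is sparse in the DCT domain (three nonzero coefficients).
s_true = np.zeros(n)
s_true[[3, 17, 40]] = [1.5, -1.0, 0.8]
x = C.T @ s_true

# Stage 2: sampling with a random measurement matrix (assumed Gaussian here).
rng = np.random.default_rng(0)
phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = phi @ x

# Stage 3: recover the DCT coefficients with OMP, then invert the transform.
s_rec = omp(phi @ C.T, y, max_atoms=10)
x_rec = C.T @ s_rec
rmse = float(np.sqrt(np.mean((x - x_rec) ** 2)))
print(rmse)
```

Because OMP refits all selected coefficients by least squares at every step, the reconstruction becomes exact (up to rounding) as soon as the selected support contains the true nonzero coefficients.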
Compressive Sensing (CS) has a wide range of applications in various domains. The sampling of sparse signals, whether periodic or aperiodic, has still received little attention. This paper proposes novel Sparse Spasmodic Sampling (SSS) techniques for different sparse signals in the original domain. The SSS techniques are proposed to overcome the drawbacks of existing CS sampling techniques: they can sample any sparse signal efficiently and also find the locations of non-zero components in signals. First, Sparse Spasmodic Sampling model-1 (SSS-1), which samples random points and also includes non-zero components, is proposed. Another sampling technique, Sparse Spasmodic Sampling model-2 (SSS-2), has the same working principle as model-1 with some advancements in design. Unlike SSS-1, it samples equidistant points. It is demonstrated that, using either sampling technique, the signal can be reconstructed by a reconstruction algorithm from a smaller number of measurements. Simulation results are provided to demonstrate the effectiveness of the proposed sampling techniques.
Authored by Umesh Mahind, Deepak Karia
In conventional sampling, signals are sampled at the Nyquist rate, whereas in compressive sensing signals are sampled below the Nyquist rate by randomly taking projections of the signal and reconstructing it from very few measurements. However, this is not efficient when recovering an image from compressive measurements with the help of a multi-resolution grid, where the image has a certain region of interest (RoI) that is more important than the rest. Conventional Cartesian sampling cannot give good results in motion image sensing recovery and is limited to the stationary image sensing process. The proposed work gives improved results by using radial sampling (a type of compressive sensing). This paper discusses the radial sampling approach along with the application of Sparse Fourier Transform algorithms, which help in reducing acquisition cost and input/output overhead.
Authored by Tesu Nema, M. Parsai
Mid-infrared spectroscopic imaging (MIRSI) is an emerging class of label-free, biochemically quantitative technologies targeting digital histopathology. Conventional histopathology relies on chemical stains that alter tissue color. This approach is qualitative, often making histopathologic examination subjective and difficult to quantify. MIRSI addresses these challenges through quantitative and repeatable imaging that leverages native molecular contrast. Fourier transform infrared (FTIR) imaging, the best-known MIRSI technology, has two challenges that have hindered its widespread adoption: data collection speed and spatial resolution. Recent technological breakthroughs, such as photothermal MIRSI, provide an order of magnitude improvement in spatial resolution. However, this comes at the cost of acquisition speed, which is impractical for clinical tissue samples. This paper introduces an adaptive compressive sampling technique to reduce hyperspectral data acquisition time by an order of magnitude by leveraging spectral and spatial sparsity. This method identifies the most informative spatial and spectral features, integrates a fast tensor completion algorithm to reconstruct megapixel-scale images, and demonstrates speed advantages over FTIR imaging while providing spatial resolutions comparable to new photothermal approaches.
Authored by Mahsa Lotfollahi, Nguyen Tran, Chalapathi Gajjela, Sebastian Berisha, Zhu Han, David Mayerich, Rohith Reddy
A power amplifier (PA) is an inherently nonlinear device that is widely used in communication systems. The nonlinearity of the PA degrades the performance of the communication system. Digital predistortion (DPD) is a way to solve this problem. Most DPD solutions use a Volterra function to fit the PA. However, for wideband signals, the performance of the Volterra function degrades. In this paper, we replace the Volterra function with a B-spline function, which fits the PA better for wideband signals. A further benefit is that the orthogonality of the coding matrix A can be improved, enhancing the stability of computation. Additionally, we use compressive sampling to reduce the complexity of the function model.
Authored by Cen Liu, Laiwei Luo, Jun Wang, Chao Zhang, Changyong Pan
Communication systems across a variety of applications are increasingly using the angular domain to improve spectrum management. They require new sensing architectures to perform energy-efficient measurements of the electromagnetic environment that can be deployed in a variety of use cases. This paper presents the Directional Spectrum Sensor (DSS), a compressive sampling (CS) based analog-to-information converter (CS-AIC) that performs spectrum scanning in a focused beam. The DSS offers increased spectrum sensing sensitivity and interferer tolerance compared to omnidirectional sensors. The DSS implementation uses a multi-antenna beamforming architecture with local oscillators that are modulated with pseudo random waveforms to obtain CS measurements. The overall operation, limitations, and the influence of wideband angular effects on the spectrum scanning performance are discussed. Measurements on an experimental prototype are presented and highlight improvements over single antenna, omnidirectional sensing systems.
Authored by Petar Barac, Matthew Bajor, Peter Kinget
This research proposes a camera for three-dimensional (3D) image acquisition based on the compressive sensing (CS) technique, constructed from an active light source with megahertz-range intensity modulation and a fast camera with a kilo-frame-rate range.
Authored by Quang Pham, Yoshio Hayasaki
The compressed sensing (CS) method can reconstruct images from a small amount of under-sampled data, making it an effective method for fast magnetic resonance imaging (MRI). As traditional optimization-based models for MRI suffer from non-adaptive sampling and shallow representation ability, they are unable to characterize the rich patterns in MRI data. In this paper, we propose a CS MRI method based on the iterative shrinkage threshold algorithm (ISTA) and adaptive sparse sampling, called DSLS-ISTA-Net. Corresponding to the sampling and reconstruction of the CS method, the network framework includes two parts: the sampling sub-network and the improved ISTA reconstruction sub-network, which are coordinated with each other through end-to-end training in an unsupervised way. The sampling sub-network and the ISTA reconstruction sub-network are responsible for implementing adaptive sparse sampling and deep sparse representation, respectively. In the testing phase, we investigate different modules and parameters in the network structure and perform extensive experiments on MR images at different sampling rates to obtain the optimal network. Because this method combines the advantages of model-based and deep learning-based methods and takes both adaptive sampling and deep sparse representation into account, the proposed networks significantly improve the reconstruction performance compared to state-of-the-art CS-MRI approaches.
Authored by Wenwei Huang, Chunhong Cao, Sixia Hong, Xieping Gao
Scanning Transmission Electron Microscopy (STEM) offers high-resolution images that are used to quantify the nanoscale atomic structure and composition of materials and biological specimens. In many cases, however, the resolution is limited by the electron beam damage, since in traditional STEM, a focused electron beam scans every location of the sample in a raster fashion. In this paper, we propose a scanning method based on the theory of Compressive Sensing (CS) and subsampling the electron probe locations using a line hop sampling scheme that significantly reduces the electron beam damage. We experimentally validate the feasibility of the proposed method by acquiring real CS-STEM data, and recovering images using a Bayesian dictionary learning approach. We support the proposed method by applying a series of masks to fully-sampled STEM data to simulate the expectation of real CS-STEM. Finally, we perform the real data experimental series using a constrained-dose budget to limit the impact of electron dose upon the results, by ensuring that the total electron count remains constant for each image.
Authored by D. Nicholls, A. Robinson, J. Wells, A. Moshtaghpour, M. Bahri, A. Kirkland, N. Browning