Data Deletion and “Forgetting” 2015

 

 

A recent court decision has focused attention on the problem of “forgetting,” that is, removing links and references on the Internet that point to a specific person or topic. “Forgetting” is essentially a problem in data deletion, and it has many implications for security and for data structures. The work cited here was presented in 2015.


Ranjan, A.K.; Kumar, V.; Hussain, M., "Security Analysis of Cloud Storage with Access Control and File Assured Deletion (FADE)," in Advances in Computing and Communication Engineering (ICACCE), 2015 Second International Conference on, pp. 453-458, 1-2 May 2015. doi: 10.1109/ICACCE.2015.10

Abstract: Today most enterprises outsource their data backups to online cloud storage services offered by third parties. In such an environment, the security of offsite data is the most prominent requirement. Tang et al. have proposed, designed and implemented FADE, a secure overlay cloud storage system. FADE assures file deletion, making files unrecoverable after their revocation, and also associates outsourced files with fine-grained access policies to avoid unauthorised access of data. In this paper, we present a security analysis of FADE and identify some design vulnerabilities in it. We also describe a few attacks and find out their causes. We further suggest countermeasures to prevent those attacks and make a few improvements to the FADE system.

Keywords: authorisation; cloud computing; storage management; FADE; access control; design vulnerability; file assured deletion; online cloud storage; secure overlay cloud storage system; security analysis; Access control; Authentication; Cloud computing; Encryption; Silicon; FADE; access policies; assured deletion; attacks; cloud storage; design vulnerabilities (ID#: 15-8201)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7306728&isnumber=7306547

 

Sanatinia, A.; Noubir, G., "OnionBots: Subverting Privacy Infrastructure for Cyber Attacks," in Dependable Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on,  pp.69-80, 22-25 June 2015. doi: 10.1109/DSN.2015.40

Abstract: Over the last decade botnets have survived by adopting a sequence of increasingly sophisticated strategies to evade detection and takeovers, and to monetize their infrastructure. At the same time, the success of privacy infrastructures such as Tor opened the door to illegal activities, including botnets, ransomware, and a marketplace for drugs and contraband. We contend that the next waves of botnets will extensively attempt to subvert privacy infrastructure and cryptographic mechanisms. In this work we propose to preemptively investigate the design and mitigation of such botnets. We first introduce OnionBots, what we believe will be the next generation of resilient, stealthy botnets. OnionBots use privacy infrastructures for cyber attacks by completely decoupling their operation from the infected host IP address and by carrying traffic that does not leak information about its source, destination, and nature. Such bots live symbiotically within the privacy infrastructures to evade detection, measurement, scale estimation, observation, and in general all current IP-based mitigation techniques. Furthermore, we show that with an adequate self-healing network maintenance scheme, that is simple to implement, OnionBots can achieve a low diameter and a low degree and be robust to partitioning under node deletions. We develop a mitigation technique, called SOAP, that neutralizes the nodes of the basic OnionBots. In light of the potential of such botnets, we believe that the research community should proactively develop detection and mitigation methods to thwart OnionBots, potentially making adjustments to privacy infrastructure.

Keywords: IP networks; computer network management; computer network security; data privacy; fault tolerant computing; telecommunication traffic; Cyber Attacks; IP-based mitigation techniques; OnionBots; SOAP; Tor; botnets; cryptographic mechanisms; destination information; host IP address; illegal activities; information nature; node deletions; privacy infrastructure subversion; resilient-stealthy botnets; self-healing network maintenance scheme; source information; Cryptography; IP networks; Maintenance engineering; Peer-to-peer computing; Privacy; Relays; Servers; Tor; botnet; cyber security; privacy infrastructure; self-healing network (ID#: 15-8202)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7266839&isnumber=7266818

 

Askarov, A.; Moore, S.; Dimoulas, C.; Chong, S., "Cryptographic Enforcement of Language-Based Information Erasure," in Computer Security Foundations Symposium (CSF), 2015 IEEE 28th, pp.334-348, 13-17 July 2015. doi: 10.1109/CSF.2015.30

Abstract: Information erasure is a formal security requirement that stipulates when sensitive data must be removed from computer systems. In a system that correctly enforces erasure requirements, an attacker who observes the system after sensitive data is required to have been erased cannot deduce anything about the data. Practical obstacles to enforcing information erasure include: (1) correctly determining which data requires erasure, and (2) reliably deleting potentially large volumes of data, despite untrustworthy storage services. In this paper, we present a novel formalization of language-based information erasure that supports cryptographic enforcement of erasure requirements: sensitive data is encrypted before storage, and upon erasure, only a relatively small set of decryption keys needs to be deleted. This cryptographic technique has been used by a number of systems that implement data deletion to allow the use of untrustworthy storage services. However, these systems provide no support to correctly determine which data requires erasure, nor have the formal semantic properties of these systems been explained or proven to hold. We address these shortcomings. Specifically, we study a programming language extended with primitives for public-key cryptography, and demonstrate how information-flow control mechanisms can automatically track data that requires erasure and provably enforce erasure requirements even when programs employ cryptographic techniques for erasure.

Keywords: programming language semantics; public key cryptography; trusted computing; cryptographic enforcement; cryptographic technique; data deletion; decryption key; erasure requirement; formal security requirement; formal semantic property information-flow control mechanism; language-based information erasure; programming language; public-key cryptography; sensitive data; untrustworthy storage service; Cloud computing; Cryptography; Reactive power; Reliability; Semantics; Standards (ID#: 15-8203)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7243743&isnumber=7243713
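
The core mechanism in this abstract (encrypt data before handing it to untrusted storage, then "erase" it by destroying only the small decryption key) can be illustrated with a minimal sketch. The snippet below uses the third-party Python cryptography package; the class and store names are illustrative assumptions, not the authors' system or their information-flow analysis.

```python
# Minimal sketch of cryptographic erasure: ciphertext may live on untrusted
# storage indefinitely; "erasing" the data only requires deleting the local key.
# Assumes the third-party `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet


class ErasableStore:
    def __init__(self):
        self._keys = {}        # trusted, local key store (small, reliably deletable)
        self._remote = {}      # stand-in for untrustworthy remote storage

    def put(self, name: str, data: bytes) -> None:
        key = Fernet.generate_key()                       # fresh data key per object
        self._keys[name] = key
        self._remote[name] = Fernet(key).encrypt(data)    # only ciphertext leaves

    def get(self, name: str) -> bytes:
        return Fernet(self._keys[name]).decrypt(self._remote[name])

    def erase(self, name: str) -> None:
        # Deleting the small key renders the (possibly still replicated)
        # ciphertext computationally unrecoverable.
        del self._keys[name]


store = ErasableStore()
store.put("medical-record", b"sensitive data")
print(store.get("medical-record"))
store.erase("medical-record")          # remote copies remain, but are now useless
```

The appeal of the design is that the trusted side only has to reliably delete a few bytes of key material, however large or widely replicated the ciphertext is.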

 

Kavak, P.; Demirci, H., "LargeDEL: A Tool for Identifying Large Deletions in the Whole Genome Sequencing Data," in Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2015 IEEE Conference on, pp. 1-7, 12-15 Aug. 2015. doi: 10.1109/CIBCB.2015.7300280

Abstract: DNA deletions are one of the main genetic causes of disease. Currently there are many tools capable of detecting structural variations. However, these tools usually require long running times and lack ease of use. It is generally not possible to restrict the search to a region of interest. The programs also yield an excessive number of results, which obstructs further analysis. In this work, we present LargeDEL, a tool which quickly scans aligned paired-end next generation sequencing (NGS) data to find large deletions. The program is capable of extracting the candidate deletions according to desired criteria. It is a fast, easy-to-use tool for finding large deletions within critical regions of the whole genome.

Keywords: DNA; bioinformatics; diseases; genomics; DNA deletion; LargeDEL; deletion identification; disease; genome sequencing data; next generation sequencing data; structural variation detection; Arrays; Bioinformatics; Biological cells; Diseases; Genomics; Sequential analysis (ID#: 15-8204)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7300280&isnumber=7300268
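
The abstract does not detail LargeDEL's method, but a common heuristic for finding large deletions in aligned paired-end data is to flag read pairs whose mapped insert size greatly exceeds the sequencing library's expected insert size. The sketch below shows that generic discordant-pair heuristic on pre-parsed alignment records; the record format, binning, and thresholds are assumptions for illustration, not the tool's actual algorithm.

```python
# Generic discordant read-pair heuristic for large-deletion candidates.
# Each record is (chromosome, leftmost_position, observed_insert_size) for a
# properly oriented read pair, e.g. pre-parsed from a BAM file.
from statistics import mean, stdev

def candidate_deletions(pairs, min_support=3, z=4.0):
    inserts = [ins for _, _, ins in pairs]
    mu, sigma = mean(inserts), stdev(inserts)
    cutoff = mu + z * sigma                      # "far larger than expected"

    # Group discordant pairs by (chromosome, 1 kb bin) and require several
    # supporting pairs before reporting a candidate deletion.
    support = {}
    for chrom, pos, ins in pairs:
        if ins > cutoff:
            key = (chrom, pos // 1000)           # bin width is an assumption
            support.setdefault(key, []).append(ins - mu)

    return [
        {"chrom": c, "approx_pos": b * 1000,
         "approx_deletion_len": round(mean(deltas))}
        for (c, b), deltas in support.items()
        if len(deltas) >= min_support
    ]
```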

 

Peipei Wang; Dean, D.J.; Xiaohui Gu, "Understanding Real World Data Corruptions in Cloud Systems," in Cloud Engineering (IC2E), 2015 IEEE International Conference on, pp. 116-125, 9-13 March 2015. doi: 10.1109/IC2E.2015.41

Abstract: Big data processing is one of the killer applications for cloud systems. MapReduce systems such as Hadoop are the most popular big data processing platforms used in the cloud system. Data corruption is one of the most critical problems in cloud data processing, which not only has a serious impact on the integrity of individual application results but also affects the performance and availability of the whole data processing system. In this paper, we present a comprehensive study on 138 real world data corruption incidents reported in Hadoop bug repositories. We characterize those data corruption problems in four aspects: 1) what impact can data corruption have on the application and system? 2) how is data corruption detected? 3) what are the causes of the data corruption? and 4) what problems can occur while attempting to handle data corruption? Our study has made the following findings: 1) the impact of data corruption is not limited to data integrity; 2) existing data corruption detection schemes are quite insufficient: only 25% of data corruption problems are correctly reported, 42% are silent data corruptions without any error message, and 21% receive an imprecise error report, and we also found that the detection system raised 12% false alarms; 3) there are various causes of data corruption such as improper runtime checking, race conditions, inconsistent block states, improper network failure handling, and improper node crash handling; and 4) existing data corruption handling mechanisms (i.e., data replication, replica deletion, simple re-execution) make frequent mistakes including replicating corrupted data blocks, deleting uncorrupted data blocks, or causing undesirable resource hogging.

Keywords: cloud computing; data handling; Hadoop; MapReduce systems; big data processing; cloud data processing; cloud systems; data corruption; data corruption problems; data integrity; improper network failure handling; improper node crash handling; inconsistent block states; race conditions; real world data corruptions; runtime checking; Availability; Computer bugs; Data processing; Radiation detectors; Software; Yarn (ID#: 15-8205)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7092909&isnumber=7092808

 

Wenji Chen; Yong Guan, "Distinct Element Counting in Distributed Dynamic Data Streams," in Computer Communications (INFOCOM), 2015 IEEE Conference on, pp. 2371-2379, April 26 2015-May 1 2015. doi: 10.1109/INFOCOM.2015.7218625

Abstract: We consider a new type of distinct element counting problem in dynamic data streams, where (1) insertions and deletions of an element can appear not only in the same data stream but also in two or more different streams, (2) a deletion of a distinct element cancels out all the previous insertions of this element, and (3) a distinct element can be re-inserted after it has been deleted. Our goal is to count the number of distinct elements that were inserted but have not been deleted in a continuous data stream. We also solve this new type of distinct element counting problem in a distributed setting. This problem is motivated by several network monitoring and attack detection applications where network traffic can be modelled as single or distributed dynamic streams and the number of distinct elements in the data streams, such as unsuccessful TCP connection setup requests, is calculated to be used as an indicator to detect certain network events such as service outage and DDoS attacks. Although there are known tight bounds for distinct element counting in insertion-only data streams, no good bounds are known for it in dynamic data streams, neither for this new type of problem. None of the existing solutions for distinct element counting can solve our problem. In this paper, we will present the first solution to this problem, using a space-bounded data structure with a computation-efficient probabilistic data streaming algorithm to estimate the number of distinct elements in single or distributed dynamic data streams. We have done both theoretical analysis and experimental evaluations, using synthetic and real data traces, of our algorithm to show its effectiveness.

Keywords: computer network security; transport protocols; DDoS attacks; TCP connection; attack detection applications; continuous data stream; distinct element counting; distributed dynamic data streams; distributed setting; network monitoring; network traffic; probabilistic data streaming algorithm; service outage; space bounded data structure; Computers; Data structures; Distributed databases; Estimation; Heuristic algorithms; Monitoring; Servers (ID#: 15-8206)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7218625&isnumber=7218353
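
The deletion semantics used here, where a single deletion cancels all prior insertions of an element and the element may later be re-inserted, can be pinned down with a naive exact baseline over one stream, shown below. This baseline only defines the quantity being estimated; the paper's contribution, a space-bounded probabilistic sketch for single and distributed dynamic streams, is not reproduced.

```python
# Exact (memory-unbounded) baseline for the deletion semantics described above.
def count_active_distinct(stream):
    """stream: iterable of (op, element) with op in {'+', '-'}."""
    active = set()
    for op, elem in stream:
        if op == '+':
            active.add(elem)          # re-insertion after deletion is allowed
        else:
            active.discard(elem)      # one deletion cancels all prior insertions
    return len(active)

ops = [('+', 'a'), ('+', 'a'), ('+', 'b'), ('-', 'a'), ('+', 'c'), ('+', 'a')]
print(count_active_distinct(ops))     # 3: a (re-inserted), b, c
```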

 

Xiaokui Shu; Jing Zhang; Danfeng Yao; Wu-chun Feng, "Rapid and Parallel Content Screening for Detecting Transformed Data Exposure," in Computer Communications Workshops (INFOCOM WKSHPS), 2015 IEEE Conference on, pp. 191-196, April 26 2015-May 1 2015. doi: 10.1109/INFCOMW.2015.7179383

Abstract: The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion and deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks because of the formidable complexity of modeling the required regular expressions. We design two new algorithms for detecting long and inexact data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared with state-of-the-art inspection methods. We parallelize our prototype on a graphics processing unit and demonstrate the strong scalability of our data leak detection solution when analyzing big data.

Keywords: Big Data; security of data; Big Data analysis; automata-based string matching algorithms; data leak detection solution; graphics processing unit; organizational security; sensitive data; Accuracy; Algorithm design and analysis; Graphics processing units; Heuristic algorithms; Leak detection; Security; Sensitivity (ID#: 15-8207)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7179383&isnumber=7179273
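
A simple way to tolerate insertions and deletions when screening content against a known sensitive sequence is to compare n-gram (shingle) sets over a sliding window, which catches partially transformed copies that exact matching would miss. The sketch below is only that generic screen, not the authors' alignment-based algorithm or its GPU parallelization, and the example strings are made up.

```python
# n-gram (shingle) screening for transformed leaks of a known sensitive string.
def ngrams(s: str, n: int = 8) -> set:
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def leak_score(content: str, sensitive: str, n: int = 8) -> float:
    """Best Jaccard similarity between the sensitive n-grams and any
    sliding window of comparable length in the content (0.0 .. 1.0)."""
    target = ngrams(sensitive, n)
    window = len(sensitive)
    best = 0.0
    for i in range(0, max(1, len(content) - window + 1), n):
        shingles = ngrams(content[i:i + window], n)
        if shingles:
            best = max(best, len(shingles & target) / len(shingles | target))
    return best

secret = "patient 4711 diagnosis: chronic condition X"
leaked = "fwd: note that pat1ent 4711 diagnosis -- chronic condition X!!"
print(leak_score(leaked, secret))      # substantial overlap despite the edits
```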

 

De, D.; Das, S.K., "SREE-Tree: Self-Reorganizing Energy-Efficient Tree Topology Management in Sensor Networks," in Sustainable Internet and ICT for Sustainability (SustainIT), 2015, pp. 1-8, 14-15 April 2015. doi: 10.1109/SustainIT.2015.7101370

Abstract: The evolving applications of Information and Communications Technologies (ICT), such as smart cities, often need sustainable data collection networks. We envision the deployment of heterogeneous sensor networks that will allow dynamic self-reorganization of the data collection topology, thus coping with unpredictable network dynamics and node addition/deletion for changing application needs. However, the self-reorganization must also assure network energy efficiency and load balancing, without affecting ongoing data collection. Most of the existing literature either aims at minimizing the maximum load on a sensor node (hence maximizing network lifetime) or attempts to balance the overall load distribution on the nodes. In this work we propose to design a distributed protocol for self-organizing energy-efficient tree management, called SREE-Tree. Based on the dynamic choice of a design parameter, the in-network self-reorganization of the data collection topology can achieve higher network lifetime while balancing the loads. In SREE-Tree, starting with an arbitrary tree, the nodes periodically apply localized and distributed routines to collaboratively reduce the load on the multiple bottleneck nodes (that are likely to deplete energy sooner due to a large amount of carried data flow or low energy availability). The problem of constructing and maintaining an optimal data collection tree topology T_opt that maximizes the network lifetime L(T_opt) is NP-Complete. We prove that a sensor network running the proposed SREE-Tree protocol is guaranteed to converge to a tree topology T with sub-optimal network lifetime. With the help of experiments using the standard TinyOS-based sensor network simulator TOSSIM, we have validated that SREE-Tree achieves better performance than state-of-the-art solutions for varying network sizes.

Keywords: communication complexity; distributed processing; energy conservation; power aware computing; protocols; resource allocation; telecommunication network management; telecommunication network topology; trees (mathematics);wireless sensor networks; ICT; NP-complete problem; SREE-Tree protocol; TOSSIM; TinyOS based sensor network simulator; data collection topology; design parameter; distributed protocol; dynamic self-reorganization; energy-efficient tree topology management; heterogeneous sensor networks; in-network self-reorganization; information and communications technologies; load balancing; load distribution; network dynamics; network energy efficiency; network lifetime maximization; network sizes; node addition/ deletion; optimal data collection tree topology; sensor node; smart cities; suboptimal network lifetime; sustainable data collection networks; Data collection; Network topology; Power demand; Protocols; Sensors; Switches; Topology (ID#: 15-8208)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7101370&isnumber=7101353

 

Haibin Zhang; Yan Wang; Jian Yang, "Space Reduction for Contextual Transaction Trust Computation in E-Commerce and E-Service Environments," in Services Computing (SCC), 2015 IEEE International Conference on, pp. 680-687, June 27 2015-July 2 2015. doi: 10.1109/SCC.2015.97

Abstract: In the literature, Contextual Transaction Trust computation (termed CTT computation) is considered an effective approach to evaluate the trustworthiness of a seller. Specifically, it computes a seller's reputation profile to indicate his/her dynamic trustworthiness in different product categories, price ranges, time periods, and any necessary combination of them. Then, in order to promptly answer a buyer's requests on the results of CTT computation, the CMK-tree has been designed to appropriately index the precomputed aggregation results over large-scale ratings and transaction data. Nevertheless, the CMK-tree requires additional storage space. In practice, a seller usually has a large volume of transactions. Moreover, with a significant increase of historical transaction data (e.g., over one or two years), the size of the storage space consumed by the CMK-tree will become much larger. To reduce storage space consumption for CTT computation, the aggregation results that are generated based on the ratings and transaction data from remote history, e.g., "12 months ago", can be deleted, as the ratings from remote history are less important for evaluating a seller's recent behavior. However, to achieve nearly linear and robust query performance, the deletion operations in the CMK-tree become complicated. In this paper, we propose three deletion strategies for CTT computation based on the CMK-tree. With our proposed deletion strategies, the additional storage space consumption can be restricted to a limited range, which offers great benefits to trust management with millions of sellers. Finally, we have conducted experiments to illustrate both the advantages and disadvantages of the proposed deletion strategies.

Keywords: Web services; electronic commerce; query processing; trees (mathematics) CMK-tree; CTT computation; contextual transaction trust computation; deletion strategies; dynamic trustworthiness; e-commerce environments; e-service environments; historical transaction data; large-scale ratings; price ranges; product categories; query performance; space reduction; storage space; time periods; trust management; Aggregates; Computational modeling; Context; Context modeling; Data structures; Indexes; Robustness; Aggregation Index; Contextual Transaction Trust; Deletion Strategy; E-Commerce; Trust and Reputation (ID#: 15-8209)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7207415&isnumber=7207317

 

Tanwir; Hendrantoro, G.; Affandi, A., "Early Result from Adaptive Combination of LRU, LFU and FIFO to Improve Cache Server Performance in Telecommunication Network," in Intelligent Technology and Its Applications (ISITIA), 2015 International Seminar on, pp. 429-432, 20-21 May 2015. doi: 10.1109/ISITIA.2015.7220019

Abstract: A telecommunication network server acts as a multimedia storage medium. The load on the server for storing and transmitting data can be reduced with additional cache servers, which store data and make it easier for clients to access information. As more clients access information, the required cache capacity grows, so cache entries must be deleted using a combination of the LRU, LFU and FIFO algorithms: when the oldest data is due to be deleted (FIFO), the other algorithms detect whether that data has the most references (LFU) or was recently used (LRU), so that frequently accessed data remains cached. This reduces delay time and improves throughput and loss during browsing.

Keywords: Internet; cache storage; client-server systems; information retrieval; network servers; FIFO queue method; LFU queue method; LRU queue method; cache server performance improvement; caches capacity; clients access; data transmission storage; delay time reduction; load server; loss browsing reduction; multimedia storage medium; telecommunication system network server; throughput reduction; Cache memory; Delays; Multimedia communication; Object recognition; Servers; Telecommunications; Throughput; Algorithms; Cache Server; FIFO; LFU; LRU (ID#: 15-8210)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7220019&isnumber=7219932
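
The eviction rule sketched in the abstract, deleting in FIFO order but sparing the oldest entry if it is among the most referenced (LFU) or most recently used (LRU), can be read in more than one way. The sketch below implements one plausible reading of that combination; the class, parameters, and protection policy are assumptions for illustration, not the authors' implementation.

```python
# One plausible reading of the LRU/LFU/FIFO combination described above:
# evict the oldest entry (FIFO) unless it is LFU- or LRU-protected.
from collections import OrderedDict

class HybridCache:
    def __init__(self, capacity=3, protect=1):
        self.capacity, self.protect = capacity, protect
        self.data = OrderedDict()   # insertion order doubles as the FIFO queue
        self.freq = {}              # reference counts (LFU signal)
        self.last_use = {}          # logical time of last access (LRU signal)
        self.clock = 0

    def get(self, key):
        self.clock += 1
        if key not in self.data:
            return None
        self.freq[key] += 1
        self.last_use[key] = self.clock
        return self.data[key]

    def put(self, key, value):
        self.clock += 1
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict()
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
        self.last_use[key] = self.clock

    def _evict(self):
        hot = set(sorted(self.freq, key=self.freq.get)[-self.protect:])             # most referenced
        recent = set(sorted(self.last_use, key=self.last_use.get)[-self.protect:])  # most recent
        victim = next((k for k in self.data if k not in hot and k not in recent),
                      next(iter(self.data)))   # fall back to plain FIFO if all protected
        del self.data[victim], self.freq[victim], self.last_use[victim]

c = HybridCache(capacity=3)
c.put("a", 1); c.get("a")       # "a" becomes the most-referenced entry
c.put("b", 2); c.put("c", 3)
c.put("d", 4)                   # plain FIFO would drop "a"; here "b" is evicted instead
print(list(c.data))             # ['a', 'c', 'd']
```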

 

Fakcharoenphol, J.; Kumpijit, T.; Putwattana, A., "A Faster Algorithm for the Tree Containment Problem for Binary Nearly Stable Phylogenetic Networks," in Computer Science and Software Engineering (JCSSE), 2015 12th International Joint Conference on, pp. 337-342, 22-24 July 2015. doi: 10.1109/JCSSE.2015.7219820

Abstract: Phylogenetic networks and phylogenetic trees are leaf-labelled graphs used in biology to describe evolutionary histories of species, whose leaves correspond to a set of taxa in the study. Given a phylogenetic network N and a phylogenetic tree T over the same set of taxa, if one can obtain T from N by edge deletions and contractions, we say that N contains T. A fundamental problem, called the tree containment problem, is to determine if N contains T. In general networks, this problem is NP-complete, but it can be solved in polynomial time when N is a normal network, a binary tree-child network, or a level-k network. Recently, Gambette, Gunawan, Labarre, Vialette and Zhang showed that it is possible to solve the problem for a more general class of networks called binary nearly stable networks. Not only do binary nearly stable networks include normal and tree-child networks; the authors claim that important evolution histories also match this generalization. Their algorithm is also more efficient than previous algorithms as it runs in time O(n^2), where n is the number of taxa. This paper presents a faster O(n log n) algorithm. We obtain this improvement from a simple observation that the iterative algorithm of Gambette et al. only performs very local modifications of the networks. Our algorithm employs elementary data structures to dynamically maintain certain internal data structures used in their algorithm instead of recomputing them at every iteration.

Keywords: computational complexity; network theory (graphs); tree data structures; trees (mathematics); NP-complete problem; binary nearly stable phylogenetic networks; binary tree-child network; biology; edge contractions; edge deletions; elementary data structures; internal data structures; leaf-labelled graphs; level-k network; phylogenetic trees; tree containment problem; Contracts; Data structures; Heuristic algorithms; History; Phylogeny; Standards; Vegetation (ID#: 15-8211)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7219820&isnumber=7219755

 

Qisong Hu; Chen Yi; Kliewer, J.; Wei Tang, "Asynchronous Communication for Wireless Sensors Using Ultra Wideband Impulse Radio," in Circuits and Systems (MWSCAS), 2015 IEEE 58th International Midwest Symposium on, pp. 1-4, 2-5 Aug. 2015. doi: 10.1109/MWSCAS.2015.7282170

Abstract: This paper addresses the simulation and design of an asynchronous integrated ultra wideband impulse radio transmitter and receiver suitable for low-power miniaturized wireless sensors. The paper first presents software simulations for asynchronous transmission over noisy channels using FSK-OOK modulation, which demonstrate that the proposed architecture is capable of communicating reliably at moderate signal-to-noise ratios and that the main errors are due to deletions of received noisy transmit pulses. Then, we address a hardware chip implementation of the integrated UWB transmitter and receiver, which is fabricated using an IBM 0.18μm CMOS process. This implementation provides low peak power consumption: 10.8 mW for the transmitter and 5.4 mW for the receiver. The measured maximum baseband data rate of the proposed radio is 2.3 Mb/s.

Keywords: CMOS integrated circuits; amplitude shift keying; frequency shift keying; power consumption; radio receivers; radio transmitters; telecommunication power management; ultra wideband communication; wireless channels; wireless sensor networks; CMOS process; FSK-OOK modulation; UWB receiver; UWB transmitter; asynchronous communication; hardware chip implementation; signal-to-noise ratio; 0.18 μm CMOS process; ultra wideband impulse radio receiver; ultra wideband impulse radio transmitter; wireless sensor; Frequency shift keying; Radio transmitters; Receivers; Sensors; Signal to noise ratio; Wireless communication; Wireless sensor networks; Asynchronous Communication; Integrated Circuits; Low Power Wireless Sensors; Ultra Wideband Impulse Radio (ID#: 15-8212)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7282170&isnumber=7281994

 

Ritter, M.; Bahr, G.S., "An Exploratory Study to Identify Relevant Cues for the Deletion of Faces for Multimedia Retrieval," in Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on, pp. 1-6, June 29 2015-July 3 2015. doi: 10.1109/ICMEW.2015.7169806

Abstract: Within our approach to big data, we reduce the number of images in video footage by applying shot detection with keyframe extraction of single frames. This can be followed by duplicate removal and face detection processes, yielding a further data reduction. Nevertheless, additional reduction steps are necessary in order to make the data manageable (searchable) for the end user in a meaningful way. Therefore, we investigated human-inspired forgetting as a data reduction tool. We conducted an exploratory study on a subset of the remaining face data to examine patterns in the selection process of faces that are considered most memorable, showing a potential of roughly above 75% for elimination. The results of the study identified the quality and the size of the faces as important measures. In these terms, we finally show a connection to characteristics of state-of-the-art face detectors.

Keywords: Big Data; data reduction; face recognition; information retrieval; object detection; video signal processing; big data; data reduction tool; face deletion; face detection process; keyframe extraction; multimedia retrieval; selection process; shot detection; single frame; video footage; Data mining; Detectors; Face detection; Feature extraction; Indexes; Standards; Training; Big Data; Face Detection; Face Sizes; Forgetting; Most Memorable Faces; Shot Detection; Video (ID#: 15-8213)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7169806&isnumber=7169738

 

Chaudhari, A.; Phadatare, P.M.; Kudale, P.S.; Mohite, R.B.; Petare, R.P.; Jagdale, Y.P.; Mudiraj, A., "Preprocessing of High Dimensional Dataset for Developing Expert IR System," in Computing Communication Control and Automation (ICCUBEA), 2015 International Conference on, pp. 417-421, 26-27 Feb. 2015. doi: 10.1109/ICCUBEA.2015.87

Abstract: Nowadays, due to the increased availability of computing facilities, a large amount of data in electronic form is being generated. The generated data must be analyzed in order to maximize the benefit of intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. The basic phases of text categorization include preprocessing features, such as removing stop words from documents and applying TF-IDF, which increases efficiency and deletes irrelevant data from a huge dataset. This paper discusses the implications of an Information Retrieval system for text-based data using different clustering approaches. Applying the TF-IDF algorithm to a dataset gives a weight for each word, summarized in a weight matrix.

Keywords: decision making; information retrieval systems; learning (artificial intelligence); text analysis; TF-IDF algorithm; clustering approaches; electronic form; expert IR system; high dimensional dataset processing; information retrieval system; intelligent decision making; machine learning; text categorization; text-based data; weight matrix; Clustering algorithms; Databases; Flowcharts; Frequency measurement; Information retrieval; Text categorization; Information retrieval; TF IDF; stopwords; text based clustering (ID#: 15-8214)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7155880&isnumber=7155781
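
The preprocessing steps named in the abstract, stop-word removal followed by TF-IDF weighting into a term-weight matrix, are standard enough to sketch directly. The snippet below is a self-contained illustration with a tiny made-up stop-word list and toy documents, not the paper's pipeline or its clustering stage.

```python
# Stop-word removal followed by TF-IDF weighting, producing a weight matrix.
import math
from collections import Counter

STOPWORDS = {"the", "a", "is", "of", "and", "to", "in"}   # tiny illustrative list

def tokenize(doc: str):
    return [w for w in doc.lower().split() if w not in STOPWORDS]

def tfidf_matrix(docs):
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))   # document frequency
    n = len(docs)
    matrix = []
    for toks in tokenized:
        tf = Counter(toks)
        matrix.append([
            (tf[w] / len(toks)) * math.log(n / df[w]) if toks else 0.0
            for w in vocab
        ])
    return vocab, matrix   # each row is a document's term-weight vector

docs = ["the cache stores the data",
        "deletion of data in the cache",
        "expert retrieval of text data"]
vocab, weights = tfidf_matrix(docs)
print(vocab)
print(weights[0])
```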

 

Gazzah, Sami; Hechkel, Amina; Essoukri Ben Amara, Najoua, "A Hybrid Sampling Method for Imbalanced Data," in Systems, Signals & Devices (SSD), 2015 12th International Multi-Conference on, pp. 1-6, 16-19 March 2015. doi: 10.1109/SSD.2015.7348093

Abstract: With the diversification of applications and the emergence of new trends in challenging applications such as the computer vision domain, classical machine learning systems usually perform poorly when confronting two common problems: training data in which negative examples outnumber the positive ones, and large intra-class variations. These problems lead to a drop in system performance. In this work, we propose to improve classification accuracy in the case of imbalanced training data by equally balancing the training data set using a hybrid approach, which consists in over-sampling the minority class using a "SMOTE star topology" and under-sampling the majority class by removing instances that are considered less relevant. The feature vector deletion has been performed with respect to intra-class variations, based on a distribution criterion. The experimental results, achieved on biometric data, show that the proposed approach significantly improves the overall performance measured in terms of true-positive rate.

Keywords: Correlation; Databases; Feature extraction; Principal component analysis; Support vector machines; Training; Training data; Data analysis; Imbalanced data sets; Intra-class variations; One-against-all SVM; Principal component analysis (ID#: 15-8215)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7348093&isnumber=7348090
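
A minimal numpy sketch of the hybrid idea, synthesizing minority samples by interpolating between nearest neighbours (SMOTE-style) while randomly dropping majority samples, is given below. It is a generic illustration: the paper's "SMOTE star topology" variant and its distribution-criterion deletion of majority instances are not reproduced, and all data here is synthetic.

```python
# Hybrid rebalancing sketch: SMOTE-style minority over-sampling plus random
# majority under-sampling.
import numpy as np

def smote_like(minority, n_new, k=5, rng=None):
    """Append n_new synthetic samples interpolated between minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)
        nn = np.argsort(d)[1:k + 1]                 # k nearest minority neighbours of x
        neighbour = minority[rng.choice(nn)]
        synthetic.append(x + rng.random() * (neighbour - x))   # point on the segment
    return np.vstack([minority] + synthetic)

def undersample(majority, n_keep, rng=None):
    """Randomly drop majority samples down to n_keep (the paper instead removes
    instances judged less relevant by a distribution criterion)."""
    if rng is None:
        rng = np.random.default_rng(0)
    keep = rng.choice(len(majority), size=n_keep, replace=False)
    return majority[keep]

rng = np.random.default_rng(42)
majority = rng.normal(0.0, 1.0, size=(500, 2))
minority = rng.normal(3.0, 0.5, size=(40, 2))
print(smote_like(minority, n_new=160, rng=rng).shape)    # (200, 2)
print(undersample(majority, n_keep=200, rng=rng).shape)  # (200, 2)
```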

 

Klomsae, Atcharin; Auephanwiriyakul, Sansanee; Theera-Umpon, Nipon, "A Novel String Grammar Fuzzy C-Medians," in Fuzzy Systems (FUZZ-IEEE), 2015 IEEE International Conference on, pp. 1-5, 2-5 Aug. 2015. doi: 10.1109/FUZZ-IEEE.2015.7338109

Abstract: One of the popular classification problems is syntactic pattern recognition. A syntactic pattern can be described using a string grammar. The string grammar hard C-means is one of the classification algorithms in syntactic pattern recognition. However, it has been proved that fuzzy clustering is better than hard clustering. Hence, in this paper we develop a string grammar fuzzy C-medians algorithm. In particular, the string grammar fuzzy C-medians algorithm is a counterpart of fuzzy C-medians in which a fuzzy median approach is applied to find the fuzzy median string as the center of string data. However, the fuzzy median string may not provide a good clustering result. We therefore modified the method to compute the fuzzy median string with the edit operations (insertion, deletion, and substitution) over each symbol of the string. The fuzzy C-medians with the regular fuzzy median and the one with the modified fuzzy median are implemented on 3 real data sets, i.e., the Copenhagen chromosomes data set, the MNIST database of handwritten digits, and the USPS database of handwritten digits. We also compare the results with those from the string grammar hard C-means. The results show that the string grammar fuzzy C-medians is better than the string grammar hard C-means.

Keywords: Biological cells; Clustering algorithms; Grammar; Mathematical model; Prototypes; Syntactics; Training; Levenshtein distance; fuzzy median; string grammar fuzzy c-medians; syntactic pattern recognition (ID#: 15-8216)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7338109&isnumber=7337796
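
The distance underlying this clustering is the Levenshtein (edit) distance over insertion, deletion, and substitution, and the simplest stand-in for a cluster centre is the set median, the member string minimizing total distance to the others. The sketch below shows only those two building blocks; the paper's modified fuzzy median computation and its fuzzy membership updates are not reproduced.

```python
# Levenshtein distance (insertion, deletion, substitution) and a simple
# set-median: the member string minimizing total distance to the cluster.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def set_median(strings):
    return min(strings, key=lambda s: sum(levenshtein(s, t) for t in strings))

cluster = ["abcb", "abdb", "abbb", "axcb"]
print(levenshtein("abcb", "axcb"))   # 1
print(set_median(cluster))           # "abcb"
```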

 

El Rouayheb, S.; Goparaju, S.; Han Mao Kiah; Milenkovic, O., "Synchronizing Edits in Distributed Storage Networks," in Information Theory (ISIT), 2015 IEEE International Symposium on, pp. 1472-1476, 14-19 June 2015. doi: 10.1109/ISIT.2015.7282700

Abstract: We consider the problem of synchronizing data in distributed storage networks under edits that include deletions and insertions. We present modifications of codes on distributed storage systems that allow updates in the parity-check values to be performed with one round of communication at low bit rates and a small storage overhead. Our main contributions are novel protocols for synchronizing both frequently updated and semi-static data, and protocols for data deduplication applications, based on intermediary coding using permutation and Vandermonde matrices.

Keywords: matrix algebra; parity check codes; Vandermonde matrices; code modifications; data deduplication applications; distributed storage networks; intermediary coding; parity-check values; permutation; Decision support systems; Distributed databases; Encoding; Maintenance engineering; Protocols; Synchronization; Tensile stress; Distributed storage; Synchronization (ID#: 15-8217)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7282700&isnumber=7282397

 

Qiwen Wang; Cadambe, V.; Jaggi, S.; Schwartz, M.; Medard, M., "File Updates Under Random/Arbitrary Insertions and Deletions," in Information Theory Workshop (ITW), 2015 IEEE, pp. 1-5, April 26 2015-May 1 2015. doi: 10.1109/ITW.2015.7133118

Abstract: A client/encoder edits a file, as modeled by an insertion-deletion (InDel) process. An old copy of the file is stored remotely at a data-centre/decoder, and is also available to the client. We consider the problem of throughput- and computationally-efficient communication from the client to the data-centre, to enable the server to update its copy to the newly edited file. We study two models for the source files/edit patterns: the random pre-edit sequence left-to-right random InDel (RPES-LtRRID) process, and the arbitrary pre-edit sequence arbitrary InDel (APES-AID) process. In both models, we consider the regime in which the number of insertions/deletions is a small (but constant) fraction of the original file. For both models we prove information-theoretic lower bounds on the best possible compression rates that enable file updates. Conversely, our compression algorithms use dynamic programming (DP) and entropy coding, and achieve rates that are approximately optimal.

Keywords: file organisation; APES-AID process; DP; RPES-LtRRID process; client/encoder; compression algorithms; compression rates; data-centre/decoder; dynamic programming; edited file; entropy coding; file updates; information-theoretic lower bounds; insertion-deletion process; pre-edit sequence arbitrary InDel process; random/arbitrary insertions; source files/edit patterns; Computational modeling; Decoding; Entropy; Markov processes; Radio access networks; Synchronization (ID#: 15-8218)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7133118&isnumber=7133075
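
In the update setting above, the client and the data centre both hold the old file, so only the edits need to travel. Python's standard difflib is enough to illustrate this: the sketch below builds a simple edit script against the shared old copy and replays it at the receiver. It captures only the transmit-the-edits idea, not the paper's dynamic-programming and entropy-coding scheme or its rate analysis.

```python
# Transmit only an edit script for a remotely stored old file copy.
import difflib

def make_edit_script(old: str, new: str):
    """Edit script the client sends; 'copy' spans reference the shared old copy."""
    script = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if op == "equal":
            script.append(("copy", i1, i2))          # server already has this span
        else:                                        # replace / insert / delete
            script.append(("data", new[j1:j2]))      # only changed characters travel
    return script

def apply_edit_script(old: str, script) -> str:
    out = []
    for entry in script:
        if entry[0] == "copy":
            out.append(old[entry[1]:entry[2]])
        else:
            out.append(entry[1])
    return "".join(out)

old = "the quick brown fox jumps over the lazy dog"
new = "the quick red fox hops over one lazy dog"
script = make_edit_script(old, new)
assert apply_edit_script(old, script) == new
print(script)
```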

 

Mori, Shohei; Shibata, Fumihisa; Kimura, Asako; Tamura, Hideyuki, "Efficient Use of Textured 3D Model for Pre-observation-based Diminished Reality," in Mixed and Augmented Reality Workshops (ISMARW), 2015 IEEE International Symposium on, pp. 32-39, Sept. 29 2015-Oct. 3 2015. doi: 10.1109/ISMARW.2015.16

Abstract: Diminished reality (DR) deletes or diminishes undesirable objects from the perceived environments. We present a pre-observation-based DR (POB-DR) framework that uses a textured 3D model (T-3DM) of a scene for efficiently deleting undesirable objects. The proposed framework and T-3DM data structure enable geometric and photometric registration that allow the user to move in six degrees-of-freedom (6DoF) under dynamic lighting during the deletion process. To accomplish these tasks, we allow the user to pre-observe backgrounds to be occluded similar to existing POB-DR approaches and preserve hundreds of view-dependent images and triangle fans as a T-3DM. The proposed system effectively uses the T-3DM for all of processes to fill in the target region in the proposed deletion scheme. The results of our experiments demonstrate that the proposed system works in unknown 3D scenes and can handle rapid and drastic 6DoF camera motion and dynamic illumination changes.

Keywords: Cameras; Fans; Image color analysis; Lighting; Real-time systems; Rendering (computer graphics); Three-dimensional displays; Diminished reality; color correction; image-based rendering; mixed/augmented reality; tracking (ID#: 15-8219)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7344754&isnumber=7344734

 

Sangari, A.S.; Leo, J.M., "Polynomial Based Light Weight Security in Wireless Body Area Network," in Intelligent Systems and Control (ISCO), 2015 IEEE 9th International Conference on, pp. 1-5, 9-10 Jan. 2015. doi: 10.1109/ISCO.2015.7282331

Abstract: Wireless body area networks (WBANs) have gained increasing attention in healthcare applications. The development of WBANs is essential for telemedicine and mobile healthcare. They enable remote monitoring of patients during their day-to-day activities without affecting their freedom. In a WBAN, body sensors in and around the patient's body collect patient information and transfer it to a remote server through a wireless medium. The wearable sensors are able to monitor vital signs such as temperature, pulse, glucose level and ECG. However, there are many research challenges when a WBAN is deployed in the network. The sensors have limited resources in terms of size, memory and computational capacity. The operation of a WBAN is closely related to a patient's sensitive medical information, and unsecured information can lead to wrong diagnosis and treatment. Security is therefore essential over the wireless medium: unauthorized people can easily access the patient's data, and attackers can modify it. The creation, deletion and modification of medical information need a strict security mechanism.

Keywords: body area networks; body sensor networks; health care; patient diagnosis; patient monitoring; telecommunication security; telemedicine; WBAN; mobile healthcare; patient information collection; patient sensitive medical information; polynomial based light weight security; remote patient monitoring; remote server; telemedicine; wearable sensor; wireless body area network; wireless medium; Biomedical monitoring; Body area networks; Monitoring; Reliability; Wireless communication; Zigbee; Electro cardiogram signal; Wireless body area network (ID#: 15-8220)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7282331&isnumber=7282219

 

Dongyao Wang; Lei Zhang; Yinan Qi; Quddus, A.u., "Localized Mobility Management for SDN-Integrated LTE Backhaul Networks," in Vehicular Technology Conference (VTC Spring), 2015 IEEE 81st, pp. 1-6, 11-14 May 2015. doi: 10.1109/VTCSpring.2015.7145916

Abstract: Small cells (SCells) and Software Defined Networking (SDN) are two key enablers for meeting the evolving requirements of future telecommunication networks, but they are still at an initial study stage with many open challenges. In this paper, the problem of mobility management in an SDN-integrated LTE (Long Term Evolution) mobile backhaul network is investigated. An 802.1ad double tagging scheme is designed for traffic forwarding between the Serving Gateway (S-GW) and the SCell with QoS (Quality of Service) differentiation support. In addition, a dynamic localized forwarding scheme is proposed for packet delivery of the ongoing traffic session to facilitate the mobility of the UE within a dense SCell network. With this proposal, the data packets of an ongoing session can be forwarded from the source SCell to the target SCell instead of switching the whole forwarding path, which can drastically reduce the path-switch signalling cost in the SDN network. Numerical results show that, compared with the traditional path-switch policy, more than 50% of the signalling cost can be saved, even considering the impact of forwarding-path deletion when a session ceases. The performance of data delivery is also analysed, which demonstrates that the introduced extra delivery cost is acceptable and even negligible in the case of a short forwarding chain or large backhaul latency.

Keywords: Long Term Evolution; mobility management (mobile radio); quality of service; software defined networking; synchronisation; telecommunication network topology; telecommunication traffic; wireless LAN;IEEE 802.1ad double tagging scheme; LTE mobile backhaul network; Long Term Evolution; QoS; S-GW; SCell network; SDN; backhaul latency; data delivery; delivery cost; dynamic localized forwarding scheme; forwarding chain; forwarding path deletion; localized mobility management; packet delivery; path switch policy; path-switch signalling cost; quality of service; serving gateway; small cell; software defined network; telecommunication networks; traffic forwarding; traffic session; Handover; Mobile computing; Mobile radio mobility management; Switches (ID#: 15-8221)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7145916&isnumber=7145573

 

Da Zhang; Hao Wang; Kaixi Hou; Jing Zhang; Wu-chun Feng, "pDindel: Accelerating Indel Detection on a Multicore CPU Architecture with SIMD," in Computational Advances in Bio and Medical Sciences (ICCABS), 2015 IEEE 5th International Conference on, pp. 1-6, 15-17 Oct. 2015. doi: 10.1109/ICCABS.2015.7344721

Abstract: Small insertions and deletions (indels) of bases in the DNA of an organism can map to functionally important sites in human genes, for example, and in turn, influence human traits and diseases. Dindel detects such indels, particularly small indels (< 50 nucleotides), from short-read data by using a Bayesian approach. Due to its high sensitivity in detecting small indels, Dindel has been adopted by many bioinformatics projects, e.g., the 1,000 Genomes Project, despite its pedestrian performance. In this paper, we first analyze and characterize the current version of Dindel to identify performance bottlenecks. We then design, implement, and optimize a parallelized Dindel (pDindel) for a multicore CPU architecture by exploiting thread-level parallelism (TLP) and data-level parallelism (DLP). Our optimized pDindel can achieve up to a 37-fold speedup for the computational part of Dindel and a 9-fold speedup for the overall execution time over the current version of Dindel.

Keywords: Bayes methods; DNA; Genomics; Multicore processing; Parallel processing; Sensitivity; Sequential analysis; Dindel; OpenMP; indel detection; multithreading; short-read mapping; vectorization (ID#: 15-8222)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7344721&isnumber=7344698

 

Dengfeng Yao; Abulizi, A.; Renkui Hou, "An Improved Algorithm of Materialized View Selection within the Confinement of Space," in Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on, pp. 310-313, 26-28 Aug. 2015. doi: 10.1109/BDCloud.2015.49

Abstract: Data warehouses store large quantities of materialized views to accelerate OLAP server responses to queries. How to efficiently and accurately return correct results from materialized views within a limited storage space is an important question and a recognized difficulty in ROLAP server design. This paper presents an improved and effective algorithm for materialized view selection. The algorithm considers the effect on overall space and cost of adding candidate materialized views and removing views, and it optimizes the addition and deletion of candidate materialized views by selecting lower-cost views. The analysis and tests show that the algorithm achieves good results and is efficient.

Keywords: data mining; data warehouses; storage management; OLAP server response; ROLAP server design; data warehouse; materialized view selection; space confinement; storage space; Algorithm design and analysis; Electronics packaging; Greedy algorithms; Indexes; Market research; Servers; Time factors; ROLAP; materialized view; multi-dimensional analysis (ID#: 15-8223)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7310763&isnumber=7310694
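
A common baseline for materialized view selection under a space budget is a greedy pass that repeatedly materializes the candidate view with the best query-cost benefit per unit of space. The sketch below implements that generic greedy baseline with made-up view sizes and query-cost savings; it is not the paper's improved addition/deletion algorithm.

```python
# Greedy benefit-per-space baseline for materialized view selection.
def greedy_select(views, space_budget):
    """views: dict name -> (size, saved_query_cost_if_materialized)."""
    selected, used = [], 0
    remaining = dict(views)
    while remaining:
        # Pick the view with the best benefit per unit of space that still fits.
        fitting = {v: (s, b) for v, (s, b) in remaining.items()
                   if used + s <= space_budget}
        if not fitting:
            break
        best = max(fitting, key=lambda v: fitting[v][1] / fitting[v][0])
        size, _ = remaining.pop(best)
        selected.append(best)
        used += size
    return selected, used

# Illustrative sizes (MB) and query-cost savings; the values are made up.
views = {
    "sales_by_month":   (120, 900),
    "sales_by_region":  (200, 1000),
    "sales_by_product": (400, 1100),
    "full_cube":        (900, 1500),
}
print(greedy_select(views, space_budget=600))
# (['sales_by_month', 'sales_by_region'], 320)
```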


Note:

Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to news@scienceofsecurity.net for removal of the links or modifications.