Text Analytics 2015

The term “text analytics” refers to linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for intelligence, exploratory data analysis, research, or investigation. The research cited here focuses on mining large volumes of text to detect insider threats, intrusions, and malware. It is of interest to the Science of Security community relative to metrics, scalability and composability, and human factors. All works cited here were published in 2015.




Hsia-Ching Chang; Chen-Ya Wang, “Cloud Incident Data Analytics: Change-Point Analysis and Text Visualization,” in System Sciences (HICSS), 2015 48th Hawaii International Conference on, vol., no., pp. 5320–5330, 5–8 Jan. 2015. doi:10.1109/HICSS.2015.626

Abstract: When security incidents occur in a cloud computing environment, it constitutes a wake-up call to acknowledge potential threats and risks. Compared to other types of incidents (e.g., extreme climate events, terror attacks and natural disasters), incidents pertaining to cloud security issues seem to receive little attention from academia. This study aims to provide a starting point for further studies via analytics. Bayesian change-point analysis, often employed to detect abrupt regime shifts in a variety of events, was performed to identify the salient changes in the cloud incident count data retrieved from the Cloutage.org database. Additionally, to get to the root of such incidents, this study utilized text mining techniques with word clouds to visualize non-obvious patterns in the summaries of cloud incidents. Both quantitative and qualitative analyses for exploring cloud incident data offer new insights in finding commonality and differences among the causes of cloud vulnerabilities over time.

Keywords: Bayes methods; cloud computing; data analysis; data mining; data visualisation; security of data; text analysis; Bayesian change-point analysis; Cloutage.org database; abrupt regime shifts; change-point analysis; cloud computing environment; cloud incident count data; cloud incident data analytics; cloud security issues; cloud vulnerabilities; nonobvious pattern visualization; security incidents; text mining techniques; text visualization; wake-up call; word clouds; Bayes methods; Cloud computing; Computational modeling; Data analysis; Data visualization; Security; Tag clouds; cloud computing security; cloud incidents

(ID#: 15-7592)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7070455&isnumber=7069647 
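
The change-point step described in the Chang and Wang abstract can be sketched compactly. The code below is not the authors' implementation; it is a minimal single-change-point posterior for monthly incident counts, assuming Poisson counts with a Gamma prior on the rate and a flat prior over change locations. The incident counts are invented for illustration.

import numpy as np
from scipy.special import gammaln

def log_marginal(counts, a=1.0, b=1.0):
    """Log marginal likelihood of a Poisson segment with a Gamma(a, b) prior on the rate."""
    counts = np.asarray(counts, dtype=float)
    n, s = len(counts), counts.sum()
    return (-gammaln(counts + 1).sum()
            + a * np.log(b) - gammaln(a)
            + gammaln(a + s) - (a + s) * np.log(b + n))

def change_point_posterior(counts):
    """Posterior over the index at which the incident rate shifts (flat prior on locations)."""
    counts = np.asarray(counts, dtype=float)
    logp = np.array([log_marginal(counts[:t]) + log_marginal(counts[t:])
                     for t in range(1, len(counts))])
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()

# Hypothetical monthly cloud-incident counts: the rate roughly doubles after month 8.
incidents = [3, 2, 4, 3, 2, 3, 4, 3, 7, 8, 6, 9, 7, 8]
posterior = change_point_posterior(incidents)
print("most probable change after month", int(posterior.argmax()) + 1)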



Skillicorn, D.B., “Empirical Assessment of Al Qaeda, ISIS, and Taliban Propaganda,” in Intelligence and Security Informatics (ISI), 2015 IEEE International Conference on, vol., no., pp. 61–66, 27–29 May 2015. doi:10.1109/ISI.2015.7165940

Abstract: The jihadist groups AQAP, ISIS, and the Taliban have all produced glossy English magazines designed to influence Western sympathizers. We examine these magazines empirically with respect to models of the intensity of informative, imaginative, deceptive, jihadist, and gamification language. This allows their success to be estimated and their similarities and differences to be exposed. We also develop and validate an empirical model of propaganda; according to this model, Dabiq, ISIS’s magazine, ranks highest of the three.

Keywords: natural language processing; social sciences computing; AQAP; Al Qaeda; ISIS; gamification language; glossy English magazines; jihadist groups; taliban propaganda; Complexity theory; Decision support systems; Corpus linguistics; radicalization; terrorism; text analytics (ID#: 15-7593)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7165940&isnumber=7165923
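
As a rough illustration of the kind of language-intensity scoring the abstract describes (the actual models, word lists, and weights are not given there), the sketch below counts occurrences of invented indicator words per 1,000 tokens for each category and ranks documents on one dimension.

import re
from collections import Counter

# Hypothetical indicator lists; real models are built from corpus-derived lexicons and weights.
CATEGORIES = {
    "informative": {"report", "announce", "confirm", "describe"},
    "imaginative": {"dream", "glory", "epic", "destiny"},
    "deceptive":   {"never", "certainly", "everyone", "nothing"},
}

def intensity_scores(text):
    """Occurrences of each category's indicator words per 1,000 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return {cat: 1000.0 * sum(counts[w] for w in words) / total
            for cat, words in CATEGORIES.items()}

docs = {"magazine_a": "We report and confirm the glory of ...",
        "magazine_b": "Everyone knows the epic destiny that ..."}
ranked = sorted(docs, key=lambda d: intensity_scores(docs[d])["imaginative"], reverse=True)
print(ranked)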



Martinez, E.; Fallon, E.; Fallon, S.; MingXue Wang, “ADAMANT — An Anomaly Detection Algorithm for MAintenance and Network Troubleshooting,” in Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on, vol., no., pp. 1292–1297, 11–15 May 2015. doi:10.1109/INM.2015.7140484

Abstract: Network operators are increasingly using analytic applications to improve the performance of their networks. Telecommunications analytical applications typically use SQL and Complex Event Processing (CEP) for data processing, network analysis and troubleshooting. Such approaches are hindered as they require an in-depth knowledge of both the telecommunications domain and telecommunications data structures in order to create the required queries. Valuable information contained in free form text data fields such as “additional_info”, “user_text” or “problem_text” can also be ignored. This work proposes An Anomaly Detection Algorithm for MAintenance and Network Troubleshooting (ADAMANT), a text analytic based network anomaly detection approach. Once telecommunications data records have been indexed, ADAMANT uses distance based outlier detection within sliding windows to detect abnormal terms at configurable time intervals. Traditional approaches focus on a specific type of record and create specific cause and effect rules. With the ADAMANT approach all free form text fields of alarms, logs, etc. are treated as text documents similar to Twitter feeds. All documents within a window represent a snapshot of the network state that is processed by ADAMANT. The ADAMANT approach focuses on text analytics to provide automated analysis without the requirement for SQL/CEP queries. Such an approach provides distinct network insights in comparison to traditional approaches.

Keywords: performance evaluation; search engines; security of data; text analysis; ADAMANT; CEP; SQL; Twitter feeds; abnormal terms detection; additional_info; anomaly detection algorithm for maintenance and network troubleshooting; complex event processing; configurable time intervals; data processing; distance based outlier detection; network analysis; network operators; network performance; network state; problem_text; search engine; sliding windows; telecommunications analytical applications; telecommunications data records; telecommunications data structures; telecommunications domain; text analytic based network anomaly detection approach; text documents; user_text; Algorithm design and analysis; Big data; Conferences; Detection algorithms; Indexes; Search engines; Telecommunications; distance based; outlier; search Engine; sliding windows; text anomaly (ID#: 15-7594)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7140484&isnumber=7140257
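
The abstract does not detail the indexing or distance measure ADAMANT uses, so the following is only one plausible reading of distance-based outlier detection over sliding windows: build a term-count profile for each window of free-form text records and flag terms whose count in the newest window sits several standard deviations away from their mean over earlier windows. Field names, window contents, and the threshold are illustrative.

from collections import Counter

def term_profile(records):
    """Bag-of-words profile for one window of free-form text records (alarms, logs, ...)."""
    counts = Counter()
    for text in records:
        counts.update(text.lower().split())
    return counts

def anomalous_terms(history, current, threshold=3.0):
    """Terms whose count in the current window deviates from the historical mean
    by more than `threshold` standard deviations (a simple distance-based outlier test)."""
    profiles = [term_profile(w) for w in history]
    current_profile = term_profile(current)
    anomalies = {}
    for term, count in current_profile.items():
        past = [p.get(term, 0) for p in profiles]
        mean = sum(past) / len(past)
        var = sum((x - mean) ** 2 for x in past) / len(past)
        std = var ** 0.5 or 1.0
        score = (count - mean) / std
        if score > threshold:
            anomalies[term] = score
    return anomalies

# Hypothetical alarm-text windows; "linkdown" suddenly floods the newest window.
history = [["user_text: heartbeat ok", "problem_text: config sync done"]] * 6
current = ["additional_info: linkdown cell 42", "problem_text: linkdown linkdown linkdown retry"]
print(anomalous_terms(history, current))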



Stoll, J.; Bengez, R.Z., “Visual Structures for Seeing Cyber Policy Strategies,” in Cyber Conflict: Architectures in Cyberspace (CyCon), 2015 7th International Conference on, vol., no., pp. 135–152, 26–29 May 2015. doi:10.1109/CYCON.2015.7158474

Abstract: In the pursuit of cyber security for organizations, there are tens of thousands of tools, guidelines, best practices, forensics, platforms, toolkits, diagnostics, and analytics available. However according to the Verizon 2014 Data Breach Report: “after analysing 10 years of data... organizations cannot keep up with cyber crime, and the bad guys are winning.” Although billions are expended worldwide on cyber security, organizations struggle with complexity, e.g., the NISTIR 7628 guidelines for cyber-physical systems are over 600 pages of text. And there is a lack of information visibility. Organizations must bridge the gap between technical cyber operations and the business/social priorities since both sides are essential for ensuring cyber security. Identifying visual structures for information synthesis could help reduce the complexity while increasing information visibility within organizations. This paper lays the foundation for investigating such visual structures by first identifying where current visual structures are succeeding or failing. To do this, we examined publicly available analyses related to three types of security issues: 1) epidemic, 2) cyber attacks on an industrial network, and 3) threat of terrorist attack. We found that existing visual structures are largely inadequate for reducing complexity and improving information visibility. However, based on our analysis, we identified a range of different visual structures, and their possible trade-offs/limitations in framing strategies for cyber policy. These structures form the basis of evolving visualization to support information synthesis for policy actions, which has rarely been done but is promising based on the efficacy of existing visualizations for cyber incident detection, attacks, and situation awareness.

Keywords: data visualisation; security of data; terrorism; Verizon 2014 Data Breach Report; cyber attacks; cyber incident detection; cyber policy strategies; cyber security; information synthesis; information visibility; situation awareness; terrorist attack; visual structures; Complexity theory; Computer security; Data visualization; Organizations; Terrorism; Visualization; cyber security policy; human-computer interaction; organizations; visualization (ID#: 15-7595)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7158474&isnumber=7158456



Arora, D.; Malik, P., “Analytics: Key to Go from Generating Big Data to Deriving Business Value,” in Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on, vol., no., pp. 446–452, March 30 2015–April 2 2015. doi:10.1109/BigDataService.2015.62

Abstract: The potential to extract actionable insights from Big Data has gained increased attention of researchers in academia as well as several industrial sectors. The field has become interesting and problems look even more exciting to solve ever since organizations have been trying to tame large volumes of complex and fast arriving Big Data streams through newer computing paradigms. However, extracting meaningful and actionable information from Big Data is a challenging and daunting task. The ability to generate value from large volumes of data is an art which, combined with analytical skills, needs to be mastered in order to gain competitive advantage in business. The ability of organizations to leverage the emerging technologies and integrate Big Data into their enterprise architectures effectively depends on the maturity level of the technology and business teams, the capabilities they develop, as well as the strategies they adopt. In this paper, through selected use cases, we demonstrate how statistical analyses, machine learning algorithms, optimization and text mining algorithms can be applied to extract meaningful insights from the data available through social media, online commerce, the telecommunication industry, and smart utility meters, and used for a variety of business benefits, including improving security. The nature of applied analytical techniques largely depends on the underlying nature of the problem, so a one-size-fits-all solution hardly exists. Deriving information from Big Data is also subject to challenges associated with data security and privacy. These and other challenges are discussed in context of the selected problems to illustrate the potential of Big Data analytics.

Keywords: Big Data; business data processing; data integration; data mining; data privacy; learning (artificial intelligence); optimisation; statistical analysis; text analysis; Big Data integration; Big Data streams; analytical skills; analytical techniques; business benefits; business teams; business value; computing paradigms; data privacy; data security; enterprise architectures; information extraction; large-data volumes; machine learning algorithm; maturity level; one-size-fits-all solution; online commerce; optimization algorithm; security improvement; smart utility meters; social media; statistical analysis; telecommunication industry; text mining algorithm; Algorithm design and analysis; Big data; Companies; Machine learning algorithms; Security; Sentiment analysis; algorithms; big data; machine learning; review (ID#: 15-7596)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7184914&isnumber=7184847



Wingyan Chung; Saike He; Daniel Dajun Zeng; Victor Benjamin, “Emotion Extraction and Entrainment in Social Media: The Case of U.S. Immigration and Border Security,” in Intelligence and Security Informatics (ISI), 2015 IEEE International Conference on, vol., no., pp. 55–60, 27–29 May 2015. doi:10.1109/ISI.2015.7165939

Abstract: Emotion plays an important role in shaping public policy and business decisions. The growth of social media has allowed people to express their emotion publicly in an unprecedented manner. Textual content and user linkages fostered by social media networks can be used to examine emotion types, intensity, and contagion. However, research into how emotion evolves and entrains in social media that influence security issues is scarce. In this research, we developed an approach to analyzing emotion expressed in political social media. We compared two methods of emotion analysis to identify influential users and to trace their contagion effects on public emotion, and report preliminary findings from analyzing the emotion of 105,304 users who posted 189,012 tweets on the U.S. immigration and border security issues in November 2014. The results provide strong implications for understanding social actions and for collecting social intelligence for security informatics. This research should contribute to helping decision makers and security personnel use public emotion effectively to develop appropriate strategies.

Keywords: behavioural sciences computing; public administration; security; social networking (online); US immigration; border security; business decisions; decision makers; emotion entrainment; emotion extraction; political social media; public emotion; public policy; security informatics; security issues; security personnel; social intelligence; social media networks; textual content; user linkages; Communities; Couplings; Informatics; Media; Public policy; Security; emotion; entrainment; influence; social media analytics; social network analysis; text mining (ID#: 15-7597)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7165939&isnumber=7165923 
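
The two emotion-analysis methods compared in the paper are not named in the abstract; a common lexicon-based baseline is sketched below with an invented mini-lexicon. Each tweet is scored by the share of tokens falling into each emotion category, and per-user averages give a crude indication of which accounts carry the most intense emotion.

from collections import defaultdict

# Toy emotion lexicon; real studies use curated affect word lists.
LEXICON = {
    "anger": {"outrage", "furious", "angry", "disgrace"},
    "fear":  {"afraid", "threat", "danger", "crisis"},
    "joy":   {"welcome", "hope", "proud", "celebrate"},
}

def emotion_scores(tweet):
    """Fraction of tokens in each emotion category for one tweet."""
    tokens = tweet.lower().split()
    n = max(len(tokens), 1)
    return {emo: sum(t.strip(".,!?#") in words for t in tokens) / n
            for emo, words in LEXICON.items()}

def user_profiles(tweets_by_user):
    """Average emotion scores per user; high averages suggest emotionally intense accounts."""
    profiles = {}
    for user, tweets in tweets_by_user.items():
        totals = defaultdict(float)
        for tw in tweets:
            for emo, s in emotion_scores(tw).items():
                totals[emo] += s
        profiles[user] = {emo: v / len(tweets) for emo, v in totals.items()}
    return profiles

sample = {"@alice": ["This policy is a disgrace! Outrage everywhere."],
          "@bob":   ["Proud to welcome new neighbors, hope wins."]}
print(user_profiles(sample))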



Heath, F.F.; Hull, R.; Khabiri, E.; Riemer, M.; Sukaviriya, N.; Vaculin, R., “Alexandria: Extensible Framework for Rapid Exploration of Social Media,” in Big Data (BigData Congress), 2015 IEEE International Congress on, vol., no., pp. 483–490, June 27 2015–July 2 2015. doi:10.1109/BigDataCongress.2015.77

Abstract: The Alexandria system under development at IBM Research provides an extensible framework and platform for supporting a variety of big-data analytics and visualizations. The system is currently focused on enabling rapid exploration of text-based social media data. The system provides tools to help with constructing “domain models” (i.e., families of keywords and extractors to enable focus on tweets and other social media documents relevant to a project), to rapidly extract and segment the relevant social media and its authors, to apply further analytics (such as finding trends and anomalous terms), and to visualize the results. The system architecture is centered around a variety of REST-based service APIs to enable flexible orchestration of the system capabilities; these are especially useful to support knowledge-worker driven iterative exploration of social phenomena. The architecture also enables rapid integration of Alexandria capabilities with other social media analytics systems, as has been demonstrated through an integration with IBM Research’s SystemG. This paper describes a prototypical usage scenario for Alexandria, along with the architecture and key underlying analytics.

Keywords: Big Data; data analysis; data visualisation; social networking (online); text analysis; Alexandria system; IBM Research SystemG; REST-based service API; big-data analytics; big-data visualization; domain model; extensible framework; knowledge-worker driven iterative exploration; rapid text-based social media data exploration; social media documents; social media extraction; social media segmentation; tweets; Analytical models; Data visualization; Government; Indexes; Media; Twitter; analytics exploration; analytics process management; social media analytics; text analytics (ID#: 15-7598)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7207261&isnumber=7207183
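
Alexandria's “domain models” are described only as families of keywords and extractors; their actual format is not given in the abstract. The sketch below shows the basic filtering idea under that reading: a domain model as named keyword families, applied to a stream of documents to keep and label the relevant ones. The family names and keywords are made up.

# Hypothetical domain model: named keyword families for an immigration-policy project.
DOMAIN_MODEL = {
    "border":   {"border", "crossing", "checkpoint"},
    "policy":   {"visa", "reform", "legislation"},
    "security": {"smuggling", "patrol", "enforcement"},
}

def segment(documents, domain_model):
    """Keep documents that match at least one family and tag them with the families hit."""
    kept = []
    for doc in documents:
        tokens = set(doc.lower().split())
        families = [name for name, words in domain_model.items() if tokens & words]
        if families:
            kept.append({"text": doc, "families": families})
    return kept

tweets = ["New visa reform debated today",
          "Great pizza downtown",
          "Patrol stopped a crossing near the checkpoint"]
for item in segment(tweets, DOMAIN_MODEL):
    print(item["families"], "-", item["text"])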



Agrawal, P.K.; Alvi, A.S., “Textual Feedback Analysis: Review,” in Computing Communication Control and Automation (ICCUBEA), 2015 International Conference on, vol., no., pp. 457–460, 26–27 Feb. 2015. doi:10.1109/ICCUBEA.2015.95

Abstract: The Internet has become a popular medium for sharing opinions and feedback about particular topics. Feedback often takes the form of numerical ratings and text. Numerical ratings are easily processed, but the vast amount of unstructured textual data present on the Internet in the form of web blogs, emails, customer experiences, tweets, etc. is left unprocessed. This data should be processed in order to retrieve more specific opinions that will be helpful in making more appropriate decisions. In this paper we review the different approaches that are used for processing text data. The different approaches help us to identify the challenges and scope present in textual feedback analysis.

Keywords: information analysis; information retrieval; text analysis; feedback sharing; numerical ratings; opinion retrieval; opinion sharing; textual feedback analysis; Accuracy; Data mining; Electronic mail; Internet; Organizations; Pragmatics; Sentiment analysis; Natural language processing; Text analytics; feedback analysis; opinion and sentiment analysis (ID#: 15-7599)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7155888&isnumber=7155781



Polig, R.; Giefers, H.; Stechele, W., “A Soft-Core Processor Array for Relational Operators,” in Application-specific Systems, Architectures and Processors (ASAP), 2015 IEEE 26th International Conference on, vol., no., pp. 17–24, 27–29 July 2015. doi:10.1109/ASAP.2015.7245699

Abstract: Despite the performance and power efficiency gains achieved by FPGAs for text analytics queries, analysis shows a low utilization of the custom hardware operator modules. Furthermore the long synthesis times limit the accelerator’s use in enterprise systems to static queries. To overcome these limitations we propose the use of an overlay architecture to share area resources among multiple operators and reduce compilation times. In this paper we present a novel soft-core architecture tailored to efficiently perform relational operations of text analytics queries on multiple virtual streams. It combines the ability to perform efficient streaming based operations while adding the flexibility of an instruction programmable core. It is used as a processing element in an array of cores to execute large query graphs and has access to shared co-processors to perform string- and context-based operations. We evaluate the core architecture in terms of area and performance compared to the custom hardware modules, and show how a minimum number of cores can be calculated to avoid stalling the document processing.

Keywords: field programmable gate arrays; query processing; text analysis; FPGA; area resource sharing; context-based operation; document processing; field programmable gate array; hardware operator modules; instruction programmable core; overlay architecture; processing element; relational operators; soft-core processor array; static queries; streaming based operation; string-based operation; text analytics queries; virtual streams; Arrays; Field programmable gate arrays; Hardware; Radio frequency; Random access memory; Registers (ID#: 15-7600)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7245699&isnumber=7245687



Jain, S.; Gupta, L.; Bora, J.; Baghel, A.K., “Context-Preserving Concept Cloud,” in Computing, Communication & Automation (ICCCA), 2015 International Conference on, vol., no., pp. 595–599, 15–16 May 2015. doi:10.1109/CCAA.2015.7148477

Abstract: The word cloud is a popular tool on the Internet and has gained much appreciation in text analytics. Although word clouds can summarize a document at a glance without the reader actually going through it, they do not preserve the context of the source text. A concept cloud instead treats topics or phrases, rather than single words, as keywords. It is context preserving in the sense that it reflects the importance of keywords that are significant in context. Hence it helps a user gain actual insight and deduce ideas by examining the relatedness among the keywords. This paper introduces an approach in which a concept cloud is generated while preserving the context of the document.

Keywords: cloud computing; text analysis; Internet; context-preserving concept cloud; document context preservation; keywords; phrases; text analytics; topics; word cloud; Automation; Color; Context; Layout; Semantics; Tag clouds; Visualization; concept cloud; semantic preservingness; text visualization (ID#: 15-7601)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7148477&isnumber=7148334
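
The abstract does not say how phrases are selected for the concept cloud; one simple stand-in is to weight adjacent-word phrases (bigrams) by frequency and keep those above a threshold, so cloud entries carry a little of their surrounding context. The document and threshold below are illustrative.

from collections import Counter
import re

def candidate_phrases(text, min_count=2):
    """Adjacent-word phrases (bigrams) weighted by frequency; a crude stand-in for
    the topic/phrase selection a real concept cloud would use."""
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = Counter(zip(words, words[1:]))
    return {" ".join(bg): c for bg, c in bigrams.items() if c >= min_count}

doc = ("Cloud security incidents keep rising. Cloud security teams study "
       "incident reports, and cloud security budgets follow incident reports.")
print(candidate_phrases(doc))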



Seonggyu Lee; Jinho Kim; Sung-Hyon Myaeng, “An Extension of Topic Models for Text Classification: A Term Weighting Approach,” in Big Data and Smart Computing (BigComp), 2015 International Conference on, vol., no., pp. 217–224, 9–11 Feb. 2015. doi:10.1109/35021BIGCOMP.2015.7072834

Abstract: Text classification has become a critical step in big data analytics. For supervised machine learning approaches to text classification, availability of sufficient training data with classification labels attached to individual text units is essential to the performance. Since labeled data are usually scarce, however, it is always desirable to devise a semi-supervised method where unlabeled data are used in addition to labeled ones. A solution is to apply a latent factor model to generate clustered text features and use them for text classification. The main thrust of the current research is to extend Latent Dirichlet Allocation (LDA) for this purpose by considering word weights in sampling and maintaining balances of topic distributions. A series of experiments were conducted to evaluate the proposed method for classification tasks. The result shows that the topic distributions generated by the balance weighted topic modeling method add some discriminative power to feature generation for classification.

Keywords: Big Data; data analysis; learning (artificial intelligence); natural language processing; pattern classification; pattern clustering; text analysis; Big Data analytics; LDA; balance weighted topic modeling method; classification labels; clustered text feature generation; discriminative power; individual text units; labeled data; latent Dirichlet allocation; latent factor model; supervised machine learning approach; term weighting approach; text classification; topic distribution; training data; word weights; Data models; Feature extraction; Resource management; Text categorization; Training; Training data; Vocabulary; Latent Dirichlet Allocation; Topic modeling; feature generation; text clustering (ID#: 15-7602)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7072834&isnumber=7072806 
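
The term-weighted, balance-maintaining sampler is the paper's contribution and is not reproduced here. The sketch below shows only the semi-supervised pipeline it plugs into, using standard scikit-learn components as stand-ins: fit a topic model on all documents, labeled and unlabeled, then train a classifier on the topic distributions of the labeled subset. The documents and labels are invented.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

labeled = ["the striker scored a late goal", "parliament passed the budget bill"]
labels = ["sports", "politics"]
unlabeled = ["coach praises the defence", "senators debate the new tax law",
             "fans celebrate the championship win"]

# Topics are learned from every document, labeled or not.
vec = CountVectorizer()
counts = vec.fit_transform(labeled + unlabeled)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)

# Only the labeled rows train the classifier; topic proportions are the features.
clf = LogisticRegression().fit(topics[: len(labeled)], labels)
test = vec.transform(["the midfielder scored again"])
print(clf.predict(lda.transform(test)))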



O’Shea, N.; Pence, J.; Mohaghegh, Z.; Ernie Kee, “Physics of Failure, Predictive Modeling & Data Analytics for LOCA Frequency,” in Reliability and Maintainability Symposium (RAMS), 2015 Annual, vol., no.,  pp. 1–7, 26–29 Jan. 2015. doi:10.1109/RAMS.2015.7105125

Abstract: This paper presents: (a) the Data-Theoretic methodology as part of an ongoing research which integrates Physics-of-Failure (PoF) theories and data analytics to be applied in Probabilistic Risk Assessment (PRA) of complex systems and (b) the status of the application of the proposed methodology for the estimation of the frequency of the location-specific loss-of-coolant accident (LOCA), which is a critical initiating event in PRA and one of the challenges of the risk-informed resolution for Generic Safety Issue 191 (GSI-191) [1]. The proposed methodology has the following unique characteristics: (1) it uses predictive causal modeling along with sensitivity and uncertainty analysis to find the most important contributing factors in the PoF models of failure mechanisms. This model-based approach utilizes importance-ranking techniques, scientifically reduces the number of factors, and focuses on a detailed quantification strategy for critical factors rather than conducting expensive experiments and time-consuming simulations for a large number of factors. This adds validity and practicality to the proposed methodology. (2) Because of the evolving nature of computational power and information-sharing technologies, the Data-Theoretic method for PRA expands the classical approach of data extraction and implementation for risk analysis. It utilizes advanced data analytic techniques (e.g., data mining and text mining) to extract risk and reliability information from diverse data sources (academic literature, service data, regulatory and laboratory reports, expert opinion, maintenance logs, news, etc.) and executes them in theory-based PoF networks. (3) The Data-Theoretic approach uses comprehensive underlying PoF theory to avoid potentially misleading results from use of solely data-oriented approaches, as well as support the completeness of the contextual physical factors and the accuracy of their causal relationships. (4) When the important factors are identified, the Data-Theoretic approach applies all potential theory-based techniques (e.g., simulation and experimentation).

Keywords: data analysis; fission reactor accidents; academic literature; complex systems; computational power; data analytics; data extraction; data mining; data sources; data-theoretic methodology; failure mechanisms; frequency estimation; generic safety issue; importance-ranking techniques; information-sharing technologies; location-specific loss-of-coolant accident; maintenance logs; model-based approach; physics-of-failure theories; potential theory-based techniques; predictive causal modeling; probabilistic risk assessment; risk-informed resolution; sensitivity analysis; service data; text mining; time-consuming simulations; uncertainty analysis; Analytical models; Data models; Estimation; Failure analysis; Frequency estimation; Mathematical model; Stress; LOCA frequency; Probabilistic Physics of Failure; Probabilistic Risk Assessment (ID#: 15-7604)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7105125&isnumber=7105053 



Leggieri, M.; Davis, B.; Breslin, J.G., “Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Prediction,” in Wireless Communications and Mobile Computing Conference (IWCMC), 2015 International, vol., no., pp. 210–215, 24–28 Aug. 2015. doi:10.1109/IWCMC.2015.7289084

Abstract: The logging of Activities of Daily Living (ADLs) is becoming increasingly popular mainly thanks to wearable devices. Currently, most sensors used for ADLs logging are queried and filtered mainly by location and time. However, in an Internet of Things future, a query will return a large amount of sensor data. Therefore, existing approaches will not be feasible because of resource constraints and performance issues. Hence more fine-grained queries will be necessary. We propose to filter on the likelihood that a sensor is relevant for the currently sensed activity. Our aim is to improve system efficiency by reducing the amount of data to query, store and process by identifying which sensors are relevant for different activities during the ADLs logging by relying on Distributional Semantics over public text corpora and unsupervised hierarchical clustering. We have evaluated our system over a public dataset for activity recognition and compared our clusters of sensors with the sensors involved in the logging of manually-annotated activities. Our results show an average precision of 89% and an overall accuracy of 69%, thus outperforming the state of the art by 5% and 32% respectively. To support the uptake of our approach and to allow replication of our experiments, a Web service has been developed and open sourced.

Keywords: Internet of Things; computerised instrumentation; query processing; sensors; unsupervised learning; ADL logging; Web service; activities of daily living; distributional semantics; fine grained queries; public text corpora; sensor data; sensor relevancy prediction; unsupervised clustering; unsupervised hierarchical clustering; wearable devices; Accuracy; Art; Artificial intelligence; Cleaning; Hidden Markov models; Semantics; Sensors; Distributional Semantics; Human Activity Recognition; Sensor Network; Sensor Selection; Unsupervised Learning (ID#: 15-7605)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7289084&isnumber=7288920
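
The paper builds its distributional semantics over public text corpora; as a stand-in, the sketch below derives small co-occurrence vectors from a toy corpus, scores sensor-to-activity relevance by cosine similarity, and groups sensor labels with SciPy's hierarchical clustering. The corpus, labels, and clustering threshold are invented.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cosine

corpus = ["boil water in the kettle to cook pasta",
          "open the fridge to cook dinner",
          "brush teeth at the bathroom sink",
          "wash hands at the sink after cooking"]
vocab = sorted({w for doc in corpus for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}

def vector(word):
    """Distributional vector: how often `word` co-occurs with each vocabulary word."""
    v = np.zeros(len(vocab))
    for doc in corpus:
        words = doc.split()
        if word in words:
            for w in words:
                v[index[w]] += 1
    return v

def relevance(sensor_word, activity_word):
    """Cosine similarity between the two distributional vectors (1 = very related)."""
    return 1 - cosine(vector(sensor_word), vector(activity_word))

print("kettle vs cook:", round(relevance("kettle", "cook"), 2))
print("brush vs cook:", round(relevance("brush", "cook"), 2))

# Hierarchical clustering of sensor labels by their distributional vectors.
sensors = ["kettle", "fridge", "sink"]
Z = linkage(np.array([vector(s) for s in sensors]), method="average", metric="cosine")
print({s: int(c) for s, c in zip(sensors, fcluster(Z, t=0.5, criterion="distance"))})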

 

Trovati, Marcello; Hodgsons, Philip; Hargreaves, Charlotte, “A Preliminary Investigation of a Semi-Automatic Criminology Intelligence Extraction Method: A Big Data Approach,” in Intelligent Networking and Collaborative Systems (INCOS), 2015 International Conference on, vol., no., pp. 454–458, 2–4 Sept. 2015. doi:10.1109/INCoS.2015.37

Abstract: The aim of any science is to advance the state-of-the-art knowledge via a rigorous investigation and analysis of empirical observations, as well as the development of new theoretical frameworks. Data acquisition, and ultimately the extraction of novel knowledge, is therefore the foundation of any scientific advance. However, with the increasing creation of data in various forms and shapes, identifying relevant information from structured and unstructured data sets raises several challenges, as well as opportunities. In this paper, we discuss a semi-automatic method to identify, analyse and generate knowledge specifically focusing on Criminology. The main motivation is to provide a toolbox to help criminology experts, which would potentially lead to a better understanding and prediction of the properties that facilitate the decision making process. Our initial validation shows the potential of our method, providing relevant and accurate results.

Keywords: Algorithm design and analysis; Big data; Focusing; Force; Sentiment analysis; Text mining; Criminology; Data analytics; Information extraction; Knowledge discovery; Networks (ID#: 15-7606)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7312116&isnumber=7312007

 

Bari, N.; Vichr, R.; Kowsari, K.; Berkovich, S.Y., “Novel Metaknowledge-Based Processing Technique for Multimedia Big Data Clustering Challenges,” in Multimedia Big Data (BigMM), 2015 IEEE International Conference on, vol., no., pp. 204–207, 20–22 April 2015. doi:10.1109/BigMM.2015.78

Abstract: Past research has challenged us with the task of showing relational patterns between text-based data and then clustering for predictive analysis using the Golay Code technique. We focus on a novel approach to extract metaknowledge in multimedia datasets. Our collaboration has been an on-going task of studying the relational patterns between data points based on meta features extracted from metaknowledge in multimedia datasets. Those selected are significant to suit the mining technique we applied, the Golay Code algorithm. In this research paper we summarize findings in optimization of metaknowledge representation for 23-bit representation of structured and unstructured multimedia data in order to be processed in the 23-bit Golay Code for cluster recognition.

Keywords: data mining; knowledge representation; multimedia computing; pattern clustering; text analysis; 23-bit representation; Golay code technique; cluster recognition; metaknowledge representation; metaknowledge-based processing technique; mining technique; multimedia big data clustering challenges; multimedia datasets; predictive analysis; relational patterns; structured multimedia data; text-based data; unstructured multimedia data; Big data; Conferences; Multimedia communication; 23-Bit Meta-knowledge template; Big Multimedia Data Processing and Analytics; Content Identification; Golay Code; Information Retrieval Challenges; Knowledge Discovery; Meta-feature Extraction and Selection; Metalearning System (ID#: 15-7607)

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7153879&isnumber=7153824
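
The 23-bit binary Golay code is a perfect code that corrects up to three bit errors; the decoder itself is not reproduced here. The sketch below only illustrates the clustering idea that property enables: items are reduced to 23-bit feature signatures (by a made-up hashing scheme), and items whose signatures differ in at most three bits fall into the same group.

import hashlib

def signature(features):
    """Map a set of metaknowledge features to a 23-bit signature (toy hashing scheme)."""
    bits = 0
    for f in features:
        digest = int(hashlib.md5(f.encode()).hexdigest(), 16)
        bits |= 1 << (digest % 23)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def cluster(items, radius=3):
    """Greedy grouping: an item joins the first cluster whose exemplar is within
    `radius` bit flips, mirroring the 3-error correction radius of the Golay code."""
    clusters = []
    for name, feats in items.items():
        sig = signature(feats)
        for c in clusters:
            if hamming(sig, c["sig"]) <= radius:
                c["members"].append(name)
                break
        else:
            clusters.append({"sig": sig, "members": [name]})
    return [c["members"] for c in clusters]

items = {"video_1": {"soccer", "goal", "crowd"},
         "video_2": {"soccer", "goal", "stadium"},
         "doc_1":   {"tax", "policy", "senate"}}
print(cluster(items))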


Note:

Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to news@scienceofsecurity.net for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.