NDM provides an opportunity to integrate data into a single addressable source: a common view of disparately sourced data. The initial consideration is therefore which sources are accessible and most relevant to the problem. Once the sources have been arranged and decided, the next question is one of modelling: which fields of data should serve as entities/nodes (with attributes on the entities), and which should serve as links (with attributes on the links). Multiple data models can be, and often are, created to address a particular problem.
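As a minimal sketch of this modelling step, the records below are turned into a graph in which people become nodes and shared field values become linking nodes. The field names (`person`, `phone`, `address`) and sample values are illustrative assumptions, not NetMap's actual schema.

```python
# Hypothetical sketch: modelling tabular records as entities and links.
# Field names and values are assumptions for illustration only.
from collections import defaultdict

records = [
    {"person": "U1.0", "phone": "555-0101", "address": "12 High St"},
    {"person": "U2.0", "phone": "555-0101", "address": "9 Low Rd"},
]

nodes = set()
links = defaultdict(set)  # node -> set of directly linked nodes

for rec in records:
    person = rec["person"]
    nodes.add(person)
    # Chosen fields become linking nodes; other fields could instead be
    # kept as attributes on the entity, depending on the data model.
    for field in ("phone", "address"):
        value = f"{field}:{rec[field]}"
        nodes.add(value)
        links[person].add(value)
        links[value].add(person)

# U1.0 and U2.0 are now indirectly linked through the shared phone node.
```

Which fields become nodes and which stay as attributes is exactly the modelling decision described above, and different choices yield different data models over the same sources.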
The sources will contain entities and a myriad of linkages between them, and these must be presented on screen in a visually meaningful, colour-coded way so as to simplify and facilitate the discovery of underlying patterns: data items linked to other items, which are in turn linked to still others. This is especially so with large volumes of data, e.g. many millions of links and entities, which the user must be able to address easily and then readily make sense of from an interpretative and analytical point of view.
‘Train of thought’ analysis of the linkage between data items means that the discovery of patterns can proceed intuitively; explicit querying is less often used. For example: “Why are all those black links going over there? What are they attached to, and in turn what are they attached to?” Such ‘train of thought’ processes invariably lead the analyst to discover patterns or trends that could not be found via more explicit querying or exception-based approaches, because the specification of such queries is not known in advance. Hence intuition and cognition are integral and need to be harnessed in the analytical process, especially at the discovery phase in those cases where only limited domain knowledge is available to guide the analyst to an understanding.
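The hop-by-hop questioning above ("what are they attached to, and in turn what are they attached to?") can be sketched as a bounded breadth-first expansion from a node of interest. The toy graph is an assumption for illustration; it is not a NetMap data structure.

```python
# Minimal sketch of 'train of thought' expansion: from a starting node,
# follow links outward one hop at a time. Toy graph is illustrative.
from collections import deque

graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def expand(graph, start, hops):
    """Return all nodes reachable from `start` within `hops` links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # stop widening past the requested hop count
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen
```

Each additional hop widens the analyst's view by one step, mirroring the interactive follow-the-links process rather than a pre-specified query.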
Below is a clear, easy-to-view visualisation of a generic problem showing that U2.0 (top left) is linked to many nodes: perhaps people, perhaps credit cards or addresses. Notice on the left, at the nine o'clock position, there is a green link: U1.0 has only one link, yet via that one link the node is in fact connected to most of the other nodes in the visualisation. At this stage ‘intuition’ comes into play. Is U1.0 in fact the director of an offshore company in a tax haven, hiding behind many company structures? Or a claims investigator who, via one connection, is processing claims for all of his family and friends? That is the realm of the analyst.
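One way the U1.0 pattern could be surfaced programmatically is to look for nodes with a single link whose one connection nevertheless reaches a large share of the graph. This is a hedged sketch on a toy graph, not NetMap's algorithm; the 50% threshold is an assumption.

```python
# Sketch: flag single-link nodes that nonetheless connect to much of the
# graph (the U1.0 pattern). Toy data and threshold are assumptions.
def reachable(graph, start):
    """All nodes reachable from `start` by following links."""
    seen, stack = {start}, [start]
    while stack:
        for n in graph[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen

def single_link_gateways(graph, min_share=0.5):
    """Degree-1 nodes whose one link reaches >= min_share of all nodes."""
    total = len(graph)
    flagged = []
    for node, nbrs in graph.items():
        if len(nbrs) == 1 and len(reachable(graph, node)) / total >= min_share:
            flagged.append(node)
    return flagged

graph = {
    "U1.0": {"H"},                        # one green link ...
    "H": {"U1.0", "A", "B", "C"},         # ... into a well-connected hub
    "A": {"H", "B"},
    "B": {"H", "A", "C"},
    "C": {"H", "B"},
    "X": {"Y"}, "Y": {"X"},               # small isolated pair, not flagged
}
```

A flag like this is only an observation of potential interest; deciding whether U1.0 is a hidden director or a compromised investigator remains the analyst's job.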
Discovery is an emergent process, not a prescriptive one. It is not possible to prescribe ahead of time all the query rules and exception criteria that should apply to a problem; otherwise the problem would already have been solved. By taking an emergent, bottom-up approach to what the data is ‘saying’, patterns and linkages can be discovered in a way that is not too different from ‘good old fashioned’ policing, where curiosity and intuition have always been integral to the ability to discover the facts, then qualify them and solve the case.
Any linkage pattern observed on screen is simply that: an observation of potential interest. In the context of retail loss prevention, for example, any sales assistant with a high ratio of refunds to sales (statistically flagged) might attract attention. But what if the perpetrators of a scam knew about such exception criteria? In a previous NetMap case, long-term employees ‘in the know’ remained undetected until a visualization step was implemented: over an extensive period they took it in turns to report levels of refunds that were always just under the limits, no matter what the limits were varied to.
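The counter-measure implied by that case can be sketched as follows: rather than flagging refunds that exceed a limit, flag staff whose refunds cluster just under it. The figures and the 5% band are illustrative assumptions, not values from the actual investigation.

```python
# Hypothetical sketch: detect refunds clustering just under a limit.
# The 5% band and sample figures are assumptions for illustration.
def near_limit_share(refunds, limit, band=0.05):
    """Fraction of refunds falling within `band` below the limit."""
    lo = limit * (1 - band)
    near = [r for r in refunds if lo <= r <= limit]
    return len(near) / len(refunds)

honest = [12.50, 48.00, 97.00, 23.10]   # values spread below the limit
gaming = [99.10, 98.75, 99.40, 97.90]   # every refund just under 100
```

A score near 1.0 for an employee is exactly the kind of suspicious clustering that a simple over-the-limit rule, by construction, can never see.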
Network data mining is complementary to statistical summarising and exception-detection data mining. NetMap Visualisation is particularly useful in the discovery phase of finding things previously unknown. Once discovered, particular patterns, abnormal behaviours and exceptions can of course be better defined, and these scenarios interfaced with relational databases and any original sources of data.
NetMap Data Mining identifies emergent groups within myriads of individual data items and utilizes special algorithms that aid visualization of ‘emergent’ patterns and trends in the linkage. It complements conventional data mining methods, which assume independence between attributes and between the values of those attributes. Such techniques typically flag, alert or alarm on instances or events that could represent anomalous behaviour or irregularities because they match pre-defined patterns or rules. They serve as ‘exception detection’ methods, where the rules or definitions of what might constitute an exception can be known and specified ahead of time. Many problems are suited to this approach. Many others, however, especially those of a more complex nature, are not: the rules or definitions simply cannot be specified. This is NetMap Analytics territory.
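In the simplest reading, an ‘emergent group’ is a set of items that turn out to be linked to one another without any rule saying so in advance. The sketch below finds such groups as connected components of the link graph; it is a minimal stand-in on toy data, not NetMap's proprietary grouping algorithms.

```python
# Minimal sketch of 'emergent group' discovery as connected components
# of the link graph. Toy graph is an assumption for illustration; this
# is a stand-in for, not a description of, NetMap's own algorithms.
def emergent_groups(graph):
    """Partition nodes into groups of mutually linked items."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node])
        groups.append(group)
    return groups

graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},   # one group of three
    "D": {"E"}, "E": {"D"},                    # a separate pair
}
```

No attribute independence is assumed and no exception rule is specified beforehand: the groups emerge purely from the linkage.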