DataWalk

An Interview with Chris Westphal

Chris Westphal of DataWalk

Chris Westphal’s innovations have helped transform intelligence analytics. Westphal created Visual Analytics Inc., which he sold to Raytheon in 2013. That company provided an analytical solution called Data Clarity® which could be implemented quickly and tailored to the needs of specific investigative teams within an organization. The system federated information from a variety of sources. It provided a point-and-click solution which got users up and running without the lengthy and complex installation and configuration processes some systems require.

Westphal pioneered many techniques and methods for data harmonization, federated queries, entity resolution, disambiguation routines, and visualizations to address a wide range of analytical complexities. Westphal joined DataWalk in 2017 as the startup’s chief analytics officer. The US government and other organizations continue to encounter data challenges. DataWalk has generated significant interest within the intelligence community and law enforcement because its approach allows developers and users to reduce the time required to respond to the demand for immediate analysis of real-time information flows. It also addresses the appetite for systems which can be used by a professional who must juggle operational duties with data access and analysis. Not every organization has a professional dedicated to analytics of the caliber of Chris Westphal. DarkCyber learned that one of DataWalk’s most significant innovations is a platform which provides point-and-click access to the expertise of an experienced data analyst. Instead of hours in the classroom, DataWalk, in DarkCyber’s view, equips an intelligence or law enforcement professional with icons able to perform automatically many of the activities the actual Chris Westphal has performed over his professional career. DarkCyber interviewed Westphal in June 2019.

The full text of that interview appears below:

DarkCyber: Thanks for agreeing to speak with me, Chris. I interviewed you before you sold Visual Analytics to Raytheon in 2013. [Editor’s note: You can read that interview in the Search Wizards Speak series of interviews.] Would you bring me up to date on your background?

I co-founded Visual Analytics Inc (VAI) in 1998 and we pioneered many of the entity-based analytical techniques still in use today. For over 15 years, I oversaw our corporate operations and was involved in every aspect of the business including contracts, project management, partner programs, proposal generation, customer support, sales, business development, training, engineering, and systems design. During this time, the company supported many analytical operations around the world and I remained very hands-on with our customer base.

After the business was acquired by Raytheon (RTN) in 2013, my role transitioned to overseeing larger pursuits (systems-of-systems) while working with very capable and experienced teams. The work was more process-oriented, repeatable, and scalable. It was an incredible experience and I forged some great friendships and business relationships during my tenure with them.

Once I connected with DataWalk, I knew it would be a perfect fit. Still operating as a startup, there was plenty of room to directly affect the platform and the overall corporate operations. I’ve drawn heavily on my experiences at VAI and RTN to contribute and outfit the company with the necessary inputs, guidance, and support needed to become the acknowledged standard in analytical platforms used throughout the community.

DarkCyber: Why did you decide to join DataWalk?

Easy. It was the technology, the people, and the vision. DataWalk is a disruptive technology that makes analytics easier by democratizing data using intuitive visualizations to expose patterns and trends while capturing organizational knowledge for reuse, alerts, and audit. End users are in control of their data and effortlessly integrate different sources and content (at scale) to support their investigations.

The system gets “smarter” by encoding the analytical workflows used to query the data; it stores the steps, values, and filters to produce results thereby delivering more consistency and reliability while minimizing the training time for new users. These workflows (aka “easy buttons”) represent domain or mission-specific knowledge acquired directly from the client’s operations and derived from their own data; a perfect trifecta!

DarkCyber: How are DataWalk’s customers using the system?

DataWalk is actively used to identify money launderers, track TCOs (transnational criminal organizations), detect human trafficking networks, expose fraud-rings, explore communication patterns, analyze crypto currency flows, and support counter-terrorism operations. Our customers regularly come up with new applications of our technology. We’re learning that there is no limitation or restriction to the domain, data, or analyses.

DarkCyber: Some of the analytics systems available today can hit six or seven figures. What’s your sweet spot?

DataWalk is also a cost-affordable, integrator friendly platform that’s extensible to accommodate the integration of other technologies such as natural language processing (NLP), machine learning (ML), and image recognition; plus, it connects to a wide range of external data sources (for example, open-source, subscription-based, or deep-web). We look forward to partnering with system integrators and other technology providers to offer the marketplace a viable alternative to the status quo. We tailor our configurations to the needs of our partners, integrators, and customers. We are very competitive on price and the total cost of ownership, which is one of our major advantages.

DarkCyber: What’s the allure of analytics for you?

I’ve been working in Washington DC for almost 35 years. Early in my career, I was a hardcore programmer developing expert systems for a variety of DARPA projects on Symbolics (LISP) and Sun Microsystems (C/C++). I had to clearly understand the logistics, workflows, and sequences used to reason across very complex and technical data sets. I then went to work at a federally funded research and development center (FFRDC) to support a project focused on the FBI’s terrorism and organized crime systems using advanced reasoning techniques (Prolog) and cutting-edge visualizations (X-Windows/X11). It was my first time delivering outputs directed at human users (versus a machine) such as analysts and special agents. I had to start thinking like a “user” to understand the world from their perspective. I was seeing things that no one else was able to find. It was exciting.

DarkCyber: Has that been easy?

No, and, in fact, I am still learning. Developing systems for partners and customers today means that we have to understand the requirements of professionals who grew up with mobile devices and work environments in which people rotate in and out often rapidly. It’s not just numbers. DataWalk addresses ease-of-use, interface requirements, and outputs that are easily digestible. DataWalk is in a leadership position, and that means my colleagues and I are in learn mode 24x7.

DarkCyber: What other work experiences have influenced DataWalk?

Criminals try to avoid detection and I had to stay several steps ahead of them. As you know, I worked very closely with two great organizations, FinCEN (Financial Crimes Enforcement Network) and IRS-CID (Criminal Investigations Division), setting up their analytical systems for conducting large-scale money laundering investigations. Those systems were targeted for use by hundreds of analysts utilizing dozens of data sources. Out of necessity, I established a methodological framework to represent and analyze data from virtually any source, type, or format. Traditional methods did not deliver actionable outputs with the consistency I wanted.

DarkCyber: What did you do?

I had to create novel methods to address data quality issues, value inconsistencies, and pattern exceptions. I pioneered new algorithms to exploit metadata content, created heuristics to perform entity-resolution, and researched new techniques to proactively identify suspicious behaviors and expose high-value targets.

DarkCyber: What was the payoff in terms of your understanding of real world application of advanced analytics?

Through my years of experience, I clearly understood the data, the analytical methods, and the expected outcomes. But I realized that every time I sat down to perform an analysis, I was back at square-one and had to recreate the process each and every time; what to query, when to compare results, how to find new targets, etc. The systems I personally designed, created, and deployed, albeit very capable, required a lot of user interaction, domain expertise, and familiarity to operate. This was my Achilles’ heel. I needed a better way to encode the analytical expertise into the software to support the end-user community regardless of their affiliation, experience, or background.

DarkCyber: How are you addressing the problem of “one Chris Westphal” for many demanding customers?

That’s where DataWalk comes into the picture. When I first encountered DataWalk, they thought their primary differentiators were scaling to billions of records, faster speed to operational results, and more efficient methods for integrating multiple sources. Of course, offering all these features under one platform is clearly a major accomplishment compared to the limitations of other products in the marketplace, as no single system could previously deliver all of these benefits in its offering.

DarkCyber: Okay, that addresses scaling and real time, but there is still only “one Chris Westphal” and without you, there’s no way to apply your expertise and experience to a problem that has to be solved in the next 10 minutes, right?

You’re correct. However, when I saw DataWalk’s ability to capture the analytical processes directly from the interactions of the users, where each set, filter, or connection selected is stored as a breadcrumb and collectively saved into a unique workflow (for example, an “easy button”), in my opinion, they had the Holy Grail for analytics. Finally, a generalized way to consistently support large numbers of users on their own data using their own context with an auditable, repeatable, and adaptable method to generate more consistent and reliable results.
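The breadcrumb idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique (record each filter as a step, then replay the saved steps on demand); it is not DataWalk’s implementation, and the names `Step`, `Workflow`, `record`, and `run`, along with the sample transactions, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class Step:
    """One breadcrumb: a labelled filter applied to the current result set."""
    label: str
    predicate: Callable[[Record], bool]


@dataclass
class Workflow:
    """A named, replayable sequence of steps (a hypothetical 'easy button')."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def record(self, label: str, predicate: Callable[[Record], bool]) -> "Workflow":
        # Store the step so the same analysis can be repeated later.
        self.steps.append(Step(label, predicate))
        return self

    def run(self, records: List[Record]) -> Tuple[List[Record], List[Tuple[str, int]]]:
        # Replay every step in order and keep an audit trail of
        # (step label, number of records remaining after the step).
        trail = []
        current = list(records)
        for step in self.steps:
            current = [r for r in current if step.predicate(r)]
            trail.append((step.label, len(current)))
        return current, trail


# Made-up sample data for illustration only.
transactions = [
    {"account": "A1", "amount": 9500, "country": "US"},
    {"account": "B2", "amount": 12000, "country": "PA"},
    {"account": "C3", "amount": 9900, "country": "PA"},
]

easy_button = (
    Workflow("near-threshold transfers to PA")
    .record("amount between 9000 and 10000", lambda r: 9000 <= r["amount"] < 10000)
    .record("country is PA", lambda r: r["country"] == "PA")
)

matches, trail = easy_button.run(transactions)
print([m["account"] for m in matches])
print(trail)
```

Because the steps are stored rather than retyped, a new user can run the same workflow on fresh data and the trail shows how the result was derived at each stage.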

DarkCyber: DataWalk lets a customer have “Chris Westphal” and his capabilities encapsulated into an icon. Click the icon, and you get the outputs that you can deliver, just via DataWalk. Is that a fair analogy?

I never thought of myself as an icon, but, yes, the workflow idea is one of the ingredients in the DataWalk secret sauce. Not only do these workflows capture the domain expertise of the users and offer management insights and metrics into their operations such as utilization, performance, and throughput, they also form the basis for scoring any entity in the system. DataWalk allows users to create risk scores for any combination of workflows, each with a user-defined weight, to produce an overall, aggregated score for every entity. Want to find the most suspicious person? Easy, just select the person with the highest risk-score and review which workflows were activated. Simple. Adaptable. Efficient.
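As a rough illustration of the weighted scoring described above, the sketch below sums a user-defined weight for every workflow that flagged an entity. The exact formula DataWalk uses is not given in the interview, so a plain weighted sum is an assumption, and the function name `aggregate_risk`, the workflow names, the weights, and the entity identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def aggregate_risk(
    hits: Dict[str, Set[str]], weights: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, List[str]]]:
    """Sum the weight of every workflow that flagged each entity.

    hits maps a workflow name to the set of entity ids it flagged.
    weights maps a workflow name to its user-defined weight.
    Returns (score per entity, workflows activated per entity).
    """
    scores = defaultdict(float)
    activated = defaultdict(list)
    for workflow, entities in hits.items():
        weight = weights.get(workflow, 0.0)  # unweighted workflows add nothing
        for entity in entities:
            scores[entity] += weight
            activated[entity].append(workflow)
    return dict(scores), dict(activated)


# Hypothetical weights and workflow results.
weights = {"structuring": 5.0, "shared_address": 2.0, "watch_list": 10.0}
hits = {
    "structuring": {"person_1", "person_2"},
    "shared_address": {"person_2", "person_3"},
    "watch_list": {"person_2"},
}

scores, activated = aggregate_risk(hits, weights)
most_suspicious = max(scores, key=scores.get)
print(most_suspicious, scores[most_suspicious], sorted(activated[most_suspicious]))
```

Keeping the list of activated workflows next to the score is what lets a reviewer see why an entity ranked highest, not just that it did.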

DarkCyber: What areas of research are of interest to you and your colleagues at DataWalk?

A primary and fundamental requirement of enterprise-class analytic tools is scaling to large volumes of data (billions of objects). DataWalk achieves this using a horizontally scalable architecture for storing and processing data, with a dynamic and adaptive data model to handle new data types, media, and requirements. Furthermore, DataWalk technology also solves three significant problems often associated with horizontal scalability regardless of the business model or data mapping performed. First, there’s even distribution of data across multiple nodes, without predefined content. Next, no data re-balancing is needed to execute any query across any data set. Finally, we deliver maximum information-join on stored content in a single compute node.

DarkCyber: What about storage of inputs and analytic outputs?

Storage is usually not exciting, but it impacts the analytic outputs. We think about storage a lot, and we believe our approach is reasonably exciting. Our commercial-grade data solution provides flexible information management with high efficiencies required for deploying enterprise systems. We think this is unique. Plus we developed patented technology that allows users to ask any questions via simple, intuitive visualizations. No SQL, scripts, or other programming languages are needed. This technology delivers fast, sophisticated, multi-dimensional analyses that rapidly execute on multi-billion record data sets.

DarkCyber: One of the problems organizations face is disparate information. What do you do with unstructured content?

We can integrate different data types and structures, from many sources, into one cohesive picture. Our approach reflects a natural, human perception of information and makes DataWalk an easy-to-use system for performing complex analytics. DataWalk also supports large numbers of users, multiple workflows, frequent alerts, active security controls, and supports a flexible and powerful API framework. We continually update and refine our platform to ensure it meets the demanding needs of large, enterprise-scale deployments.

DarkCyber: One of the complaints I have heard about some of the analytic vendors is that no one listens to the person who has to figure out how to do something. Suggestions and complaints are just ignored. What’s the approach to feedback at DataWalk?

We listen carefully to our end-user community. We actively solicit their feedback and we prioritize their inputs. We try to solve problems versus selling licenses. When a new suggestion arrives, we match it against our road map and adjust to meet a wide variety of requirements and usage conditions from our user base. Traditionally, many investigations require users to access multiple systems, run different queries, and correlate all the output into a consolidated report or file. This is labor intensive, takes considerable time, and is subject to inconsistencies, omissions, and errors. Therefore, an active area of “development” at DataWalk is focused on interfacing to a wide range of data providers and other technology companies. We want to create a seamless user experience that maximizes the utility of the system in the context of our client’s operational environments.

DarkCyber: At the June 2018 TechnoSecurity & Digital Forensics Conference, I heard a number of complaints about interfaces. Systems are too difficult to use, even for investigators who interact with a system a couple of times a week. What is DataWalk’s interface philosophy?

We’re also innovating in a number of areas to include creating alternative interfaces to conduct federated searches, discovering new ways to visually represent complex networks, establishing methods to correlate independent events, working on advanced techniques for entity-resolution, and delivering active entity-deconfliction among investigations and cases. With several existing patents, and many more filed, DataWalk uniquely overcomes the limitations and restrictions encountered in many other commercial systems through innovation and teamwork.

DarkCyber: What are the benefits to a partner or commercial organization working with your firm?

I strongly advocate partnering and prefer working with vendors, consultants, technologies, and companies where each contributes their expertise and experiences. I specifically want to partner with System Integrators (SIs) to deploy DataWalk into vertical marketplaces throughout the public sector, servicing law enforcement, homeland security, accountability offices (risk assessment), and the intelligence community. SIs provide the infrastructure, personnel, contract vehicles, security clearances, past performance, and qualifications while DataWalk delivers the analytical platform and support necessary to ensure success. Our roles are clear, complementary, and transparent. Together, we’ll provide a working solution to solve some of the hardest problems facing our client base.

DarkCyber: Do you have a partner program in place?

Yes, we have an established partnership program where we can offer our combined products and services to deliver complete end-to-end solutions. The DataWalk Partner Program significantly increases revenues by leveraging complementary strengths, while serving and supporting customers within the government, commercial and international communities. This provides incremental revenue opportunities through discounted software licenses while expanding a firm’s overall corporate capabilities, experience, and presence in the ever-growing analytical marketplace.

DarkCyber: Is there an option for a third-party to license your system and create a solution for a specific market sector similar to an original equipment manufacturer providing parts to Ford Motor Company?

Yes, we are also white-labeling our system for “special projects” with companies working in very specific verticals where adopting the DataWalk framework is more economical than building an in-house capability from scratch. It alleviates the upfront costs, overhead, dedicated resources, and infrastructure needed to implement a custom solution while allowing the company to remain focused on their core strengths and offerings. They receive a modern, commercial-grade, fully-supported analytical system configured to meet their specific needs at a fraction of the cost. We think this is a win-win-win situation. Our goal is to create success for the users, the partners, and DataWalk.

DarkCyber: Let’s talk about the competition. What’s your view of the current marketplace for law enforcement and intelligence systems?

I’ve read some horror stories from the published media regarding some of the mainstream and status-quo systems currently in the marketplace. Certainly, there are some “real” differences between our platforms. For one competitive system, all data must be converted into their internal format, which requires additional time, effort, and costs to reformat the data into their proprietary ontology. This is also perceived as a type of “vendor lock-in,” which is why a number of agencies discontinued the use of their platform.

DarkCyber: And what about the cost of some systems available as commercial off the shelf solutions?

Pricing is an issue for some of the companies in this market sector. If you review published GSA Schedules, the cost of a single processing core-license for one competitor is more than four times as expensive as a DataWalk core-license. Often some of these vendors deliver installations which require millions of dollars in license fees to stand up a basic system. Additionally, some vendors require extensive consulting time for their forward deployed engineers to configure the system to meet basic customer needs. Plus, their ongoing maintenance costs require a proportional level of funding.

DarkCyber: Are the law enforcement, intelligence, security, and financial services customers able to procure systems from newcomers like DataWalk?

Yes, there is significant interest in alternatives. Many system integrators operating in the government space compete to deliver an end-to-end solution, thereby limiting a client’s choice of what vendor can provide the licenses, on-site services, and related support. Our business model at DataWalk is to work with integrators, not compete with them. Since DataWalk is a COTS or commercial off the shelf platform with feature road maps, APIs, manuals, training guides, and regular updates, we work very well with clients and their integrators to get them proficient on our platform. This approach saves significant resources and costs for ongoing operations and maintenance.

DarkCyber: What is the competitive fence around DataWalk’s systems and methods?

As for competitive barriers, a major factor is the unique capabilities enabled by our patented technologies, as they out-perform and out-deliver the competition. In general, we’ve set the bar very high in terms of analytical functionality, usability, quality of support, and the affordability of our system. The DataWalk team is very experienced and dedicated to the mission. We are continually innovating, improving, and always moving forward.

DarkCyber: Tell me about the application programming interface for DataWalk, please?

As a COTS platform, DataWalk maintains APIs and open-standards to ensure data can be imported/exported without restriction. From an integration perspective for machine-to-machine import/export, DataWalk consists of several transfer interfaces for use by external processes for more seamless and automated integration.

External interfaces are documented for use by integrators, clients, or third-party platforms. Any interface schema change which impacts an existing installation is published in the change log. Interfaces are also versioned to help minimize any impact to existing operations.

DarkCyber: Would you share a use case for DataWalk?

Absolutely. DataWalk could be used by an immigration agency to “risk score” people entering the country. The datasets loaded into DataWalk could include arrival/departure data, criminal records, financial transactions, social media posts, public records, intelligence reports, and a host of other content including several watch lists. The name and passport information could be pushed from the inspector’s station (via a passport reader) onto a message queue where DataWalk APIs could then process the content against the defined workflows and produce their respective scores. The results sent back, also via the API, to the inspector would be a simple pass/fail score to enter the country or go to a secondary inspection, plus any additional relevant information.
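The message-queue flow in this use case can be sketched with Python’s standard `queue` module. This is an illustration under stated assumptions only: it does not use or represent DataWalk’s actual API, and the lists, weights, threshold, passport numbers, and the functions `score_traveler` and `process` are hypothetical stand-ins for the defined workflows.

```python
import queue
from typing import Dict, List

# Hypothetical reference data standing in for loaded datasets.
WATCH_LIST = {"X1234567"}
PRIOR_OVERSTAY = {"Y7654321"}
SECONDARY_THRESHOLD = 10.0  # assumed cut-off for secondary inspection


def score_traveler(message: Dict[str, str]) -> Dict[str, object]:
    """Run each (hypothetical) workflow against one passport message."""
    score = 0.0
    reasons: List[str] = []
    if message["passport"] in WATCH_LIST:
        score += 10.0
        reasons.append("watch_list")
    if message["passport"] in PRIOR_OVERSTAY:
        score += 4.0
        reasons.append("prior_overstay")
    decision = "secondary" if score >= SECONDARY_THRESHOLD else "pass"
    return {
        "passport": message["passport"],
        "decision": decision,
        "score": score,
        "reasons": reasons,
    }


def process(inbound: queue.Queue, outbound: queue.Queue) -> None:
    """Drain the inbound queue and publish one result per message."""
    while True:
        try:
            message = inbound.get_nowait()
        except queue.Empty:
            break
        outbound.put(score_traveler(message))


# Messages as a passport reader might publish them (made-up values).
inbound: queue.Queue = queue.Queue()
outbound: queue.Queue = queue.Queue()
for passport in ("X1234567", "Y7654321", "Z0000000"):
    inbound.put({"name": "REDACTED", "passport": passport})

process(inbound, outbound)
results = [outbound.get_nowait() for _ in range(outbound.qsize())]
for result in results:
    print(result["passport"], result["decision"], result["reasons"])
```

In a real deployment the in-process queues would be replaced by a message broker, and the hard-coded checks by calls to the scoring platform; the inspector only needs the final decision plus the supporting reasons.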

DarkCyber: Is DataWalk limited to law enforcement, security, and intelligence applications?

No, we’ve also created streamlined web-interfaces to perform enterprise searches on particular content where the results are shown as a basic HTML table. And, there has been increasing interest in creating mobile apps to deliver specific features and functionality to field operations via a mobile device. DataWalk is designed for adaptability and is embeddable as part of a system of systems. It also makes “white labeling” the platform very straightforward because it’s easy to configure and control the functionality.

DarkCyber: Do you have any partnerships in place which you can talk about?

Yes, we have an ever-expanding foundation of relationships with consulting firms, system integrators, technology companies, and commercial data-providers. Our goal is not to reinvent the wheel, especially when another company has already created a viable capability, dataset, or technology for a specific purpose; we’d rather partner. Companies like Whooster and ShadowDragon have been great partners and very accommodating with providing API access to their platforms. It’s something we want to promote and show in our demonstrations. This also removes perceived “risk” from a client’s perspective since the integration is already in place. Creating new connections to other providers is very straightforward and we’ve built methods to intake content from systems such as Cellebrite, Rosoka, Rosette, Blockchain, Webhose.io, Whitepages.com, and many others. We are also well positioned to accommodate content from Thomson Reuters, TransUnion, and LexisNexis.

DarkCyber: Can clients integrate their data into DataWalk?

Yes, and this “bring your own data” approach also touches again on white-labeling the DataWalk system, as companies can create their own ecosystems driven by different connectors, content, workflows, and interfaces. The intellectual property they create is in how they structure, combine, and define its operations and deliver a very capable and mature product to their marketplace. In concept, it is very similar to the strategy used by the Unreal Engine adopted throughout the gaming industry.

DarkCyber: Can you share some of the DataWalk visualizations with me?

Sure, here’s a visualization of the universe viewer, depicting all the data sets available to the user based on that user’s access permissions. The user simply clicks on the sets, applies filters, and immediately sees the results.

DataWalk's Universe Viewer.

This allows the users to understand quickly what type of questions they can ask of the data. It is very easy to see. Plus, the approach follows a more natural way to represent connections among different sources.

This is a geo-location output showing content using a basic heat map icon. The geo-fence is placed in an area in which the activity and suspects are concentrated. Once again, these are hot linked, so the user can drill down into the underlying data; for example, latitude and longitude information.

DataWalk's geo-location output.

DarkCyber: These are crisp and uncluttered.

Thank you. Without a doubt, visualization is a key element of any modern-day analytical platform, and within DataWalk there are many ways to present data including network diagrams, time lines, flow visualizations, maps, graphs, histograms, dossiers, reports, and tables. Each modality is dependent on the analytical context for depicting certain types of situations or conveying content. For example, a network diagram can show transfers of money between bank accounts, whereas a geo-spatial map can show the locations of illicit massage parlors.

DarkCyber: What do you do to make it easy to know what’s going on under the DataWalk hood?

That’s a key question. Analysis is a process that often requires several iterations over the data. So DataWalk retains all the steps used to query and construct the results, like chapters in a book. Each step is presented as part of a story showing how the final results were derived. The user can review any step, return to a previous decision point, or jump right to the end. It makes it easier to understand the sequence used to derive the results and collaborate with other users. Also, within DataWalk, the interfaces for controlling the content are very consistent so users can quickly transition from a link chart to a dossier to a table-view to an object folder; navigating data in DataWalk is very easy and intuitive.

DarkCyber: Is your technology conforming to proprietary standards or open standards?

Absolutely. Open standards. Outputs are presented using open standards, depending on the content, including XLSX, CSV, PNG, PDF, and others. The APIs (previously discussed) are unbounded and content is easily shared, converted, or presented using any type of interface or format to meet operational needs. Within DataWalk, data is easy to get in and easy to get back out; there’s no vendor lock-in.

DarkCyber: Put on your wizard hat. What are the three most significant technologies that you see having an influence on the developments at DataWalk?

There are many factors that impact DataWalk’s development activity and road map planning. It’s hard to pick only three, but these are probably the most relevant to discuss: ML, NLP, and GA.

DarkCyber: Okay, let’s start with machine learning.

Machine Learning (ML) is a very popular topic of discussion across many industries and certainly has utility within the DataWalk platform. We’ve already interfaced with specific ML methods for fraud-detection (insurance) and are looking at techniques to deliver better entity resolution for names and addresses, data quality processes for standardizing content, and schema mapping to automatically align data formats. We’re also looking to apply ML resources for detecting pattern-of-life (POL) behaviors (sense making) from very large and diverse graphs for exposing the recurring, sequential, and/or anomalous patterns associated with event-based (temporal and geo-spatial) sources.

DarkCyber: And natural language processing?

NLP integration is another important area for incorporating unstructured content. We have already interfaced with several platforms, including Rosoka and Rosette, and there are many facets to consider: not only entity resolution, but also categorization, reference terms, and the use of certain constructs to help define the meaning of the content. Utilizing these dimensions helps us provide better risk-scoring and interpretation for documents, narratives, and reports.

DarkCyber: Graph algorithms?

Yes, Graph Algorithms or GA continue to be a hot-topic with respect to discovering insights into how entities are interconnected and their interpretation within specific contexts; especially if they can scale in performance. DataWalk is actively looking at all sorts of new approaches for extracting patterns, showing commonalities, defining collusion, and detecting pathways among the data. We’re researching Graph Convolutional Networks (GCNs) to leverage the information contained in the nodes and edges of the assimilated graph data and researching topology metrics for describing the geometry of targeted subgraphs (behavioral factors). Our technology enables us to execute graph algorithms very quickly, even across massive amounts of data.

DarkCyber: Thanks for speaking with me. One final question: Where does a reader get more information about your firm?

There is a considerable amount of information on our website www.datawalk.com. Also take a look at the videos since there are different scenarios covering lawful intercept, human trafficking, various frauds, intelligence operations, and crypto currency investigations. Each video presents a different viewpoint and covers distinctive functionality. Additionally, I’m always available to talk with anyone about analytics, functionality, partnering, teaming, or anything else! Feel free to drop us an email at: info @ datawalk.com

DarkCyber Comment

DarkCyber believes that DataWalk has leveraged analytics and data federation technology. The firm’s focus on the user is refreshing. However, the key competitive advantage DataWalk has seized upon is its integration of workflow capabilities within its analytics platform. Other vendors include some workflow elements. But in our list of 50 intelligence-centric platforms, integrated workflow capabilities are immature or non-existent. Therefore, DarkCyber believes that DataWalk is a vendor to watch and to invite to participate in request-for-information processes and to bid on security, law enforcement, and intelligence-related projects.

Stephen E Arnold, publisher of “Dark Cyber Annex” and producer of DarkCyber, a weekly video news program for law enforcement and intelligence professionals. Access these information sources at www.arnoldit.com/wordpress.

July 2, 2019