Interview with Danny Rogers
For decades (if not centuries), correlating data about people, places, organizations, and events has been a manual job. Specialized training and on-the-job experience made it possible for an investigator to learn that confidential or classified information has been compromised.
The problem with slow discovery of a data loss is that significant damage may have been done and been unknown for weeks, months or longer. Terbium Labs is in the business of making it easier and faster to experience the thrill of a previously-unknown connection between an organization and an event and a person of interest. Terbium is one of a very small number of high-technology companies able to use a digital equivalent of fingerprinting to identify compromised data, whether on the Clear Web (that’s the Internet most people use each day) or the Dark Web (the hidden content and services requiring special software to access).
It is becoming increasingly obvious to those engaged in investigations and intelligence activities for government entities or financial services firms that automation is long overdue. One problem is the volume of data that must be processed, analyzed, and converted to actionable information. The other challenge is the critical shortage of analysts, investigators, and intelligence professionals who can convert actionable intelligence into an arrest or an operation with feet on the street.
Terbium has developed technology that makes use of advanced mathematics—what I call numerical recipes—to perform analyses for the purpose of finding connections. The firm’s approach is one that deals with strings of zeros and ones, not the actual words and numbers in a stream of information. By matching these numerical tokens with content such as a data file of classified documents or a record of bank account numbers, Terbium does what strikes many, including myself, as a remarkable achievement.
Terbium can identify highly probable instances of improper use of classified or confidential information. Terbium can identify where the compromised data reside, whether on the Clear Web, another network, or the Dark Web. Terbium can alert the organization about the compromised data and work with the victim of Internet fraud to resolve the matter in a satisfactory manner.
There are other applications of Terbium’s patented system and method. In this exclusive interview, Danny Rogers provides information about this remarkable technology and its benefits to law enforcement, security, and intelligence professionals. The full text of the interview appears below.
What’s the history of Terbium?
We founded Terbium Labs in late 2013 with the thesis that defense, while still necessary, is no longer sufficient – security is increasingly a risk management problem, as much or more than it is an IT problem. We felt that companies needed to be more proactive about detecting breaches and understanding where their data was appearing, so we started Terbium to fill that need.
What information challenges does this search-related service seek to resolve for its clients?
Right now, most data breaches are discovered by third parties such as journalists or law enforcement organizations. Companies are finding out after the fact, often at the same time as the public, that their sensitive data is out in the wild being exploited by criminals or competitors. We work to give clients the ability to know immediately and automatically when their data appears in unexpected places on the internet, without needing to share the contents of that data with us or anyone else for monitoring.
When and how did you become interested in indexing Dark Web content and what got you interested in this technical sector?
My co-founder Michael Moore and I had been doing research and development for the government, which is well supported when it comes to information security technologies. While there, we realized that the commercial sector was woefully underserved and was where the immediate damage was occurring. We felt there was a significant opportunity to bring our understanding and imagination regarding advanced threats to the commercial space.
Terbium appears to have been investing in content intelligence and advanced analytics for a considerable period of time. What are the general areas of your development activities; for example, algorithms or user facing applications?
We spent a significant amount of time working on both the private data fingerprinting protocol and the infrastructure required to privately index the dark web. We pull in billions of hashes daily, and the systems and technology required to do that in a stable and efficient way are extremely difficult to build. Right now we have over a quarter trillion data fingerprints in our index, and that number is growing by the billions every day.
How did you develop your interest and expertise in processing Dark Web content using the technology that you hope to patent?
Our product was inspired by a conversation with an information security officer at a large financial institution. He indicated that he wanted to be able to find out immediately if a high-profile client list was ever leaked to the internet, but he could never give us that list. My background is in cryptography and Michael’s is in large-scale computing and advanced analytics. Together, we built Matchlight to meet precisely that need.
Almost every day there is news of some new hacker intrusion and theft of sensitive data from government or private business. What is your view of the current state of data vulnerability and hacker expertise? Are we in the midst of a cyber war?
I wouldn’t say we’re in a cyber war; war implies death and destruction, and right now the damage that is occurring is primarily financial and reputational. That said, what we are seeing is an unprecedented level of sophistication in the threats being turned on commercial industry. Traditionally, these capabilities were reserved for government-to-government activities. Now, they are going after commercial entities, which have never had to respond to this level of sophistication or aggression. I’d liken the current period of insecurity on the internet much more to the high seas of the 18th century or the wild west of the 19th century, where the rule of law was tenuous and crime was rampant. Some day we will develop something akin to the law of the sea for the internet, especially as more and more international commerce comes to depend on it. But until then, we have a lot to keep us busy.
What’s your opinion on the scope and effectiveness of the responses by governments and corporations to these continuing assaults? Which side is winning the war?
I think I have to say that the adversaries are winning right now. Despite billions being spent on information security, breaches are happening every single day. Currently, the best the industry can do is be reactive. The adversaries have the perpetual advantage of surprise and are constantly coming up with new ways to gain access to sensitive data. Additionally, the legal system has a long way to go to catch up with technology. It really is a free-for-all out there, which limits the ability of governments to respond. So right now, the attackers seem to be winning, though we see Terbium and Matchlight as part of the response that turns that tide.
Terbium’s key product is Matchlight. Without divulging your firm’s methods or clients, will you characterize the key innovations that Matchlight embodies?
Matchlight is the world’s first truly private, truly automated data intelligence system. It uses our data fingerprinting technology to build and maintain a private index of the dark web and other sites where stolen information is most often leaked or traded. While the space on the internet that traffics in that sort of activity isn’t intractably large, it’s certainly larger than any human analyst can keep up with. We use large-scale automation and big data technologies to provide early indicators of breach in order to make those analysts’ jobs more efficient. We also employ a unique data fingerprinting technology that allows us to monitor our clients’ information without ever having to see or store their originating data, meaning we don’t increase their attack surface and they don’t have to trust us with their information.
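The idea of monitoring data without seeing it can be illustrated with a minimal sketch. The interview does not describe Terbium's actual fingerprinting method, so everything here is an assumption made for illustration: the window width, the normalization step, the use of SHA-256, and the function names `fingerprints` and `match_ratio` are all hypothetical. The client hashes overlapping character windows of its sensitive text and shares only the hashes; the monitoring side hashes crawled pages the same way and compares the two sets.

```python
import hashlib


def fingerprints(text: str, width: int = 14) -> set[str]:
    """Hash every overlapping window of `width` characters (hypothetical scheme)."""
    # Normalize case and whitespace so trivial reformatting does not hide a match.
    normalized = " ".join(text.lower().split())
    if len(normalized) < width:
        return {hashlib.sha256(normalized.encode("utf-8")).hexdigest()}
    return {
        hashlib.sha256(normalized[i:i + width].encode("utf-8")).hexdigest()
        for i in range(len(normalized) - width + 1)
    }


def match_ratio(client_prints: set[str], page_prints: set[str]) -> float:
    """Fraction of the client's fingerprints that also appear in a crawled page."""
    if not client_prints:
        return 0.0
    return len(client_prints & page_prints) / len(client_prints)


# Client side: only the hashes leave the organization, never the text itself.
secret = "Account 4929-1182-7734-0091, holder J. Example"
client_prints = fingerprints(secret)

# Monitoring side: crawled pages are fingerprinted the same way.
leak_page = "dump follows: account 4929-1182-7734-0091, holder j. example -- enjoy"
clean_page = "weather report: sunny with a chance of rain in the afternoon"

leak_score = match_ratio(client_prints, fingerprints(leak_page))
clean_score = match_ratio(client_prints, fingerprints(clean_page))
```

A plain hash of short, predictable values such as card numbers can be guessed by brute force, so a production system would need stronger protections than this sketch provides.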
How does Matchlight compare to other tools, defensive or offensive, that are available to target organizations?
There are many options for network and endpoint defense, and yet breaches are happening every day. As such, we take the position that it’s not a matter of if an organization is going to be breached, but when. In fact, often it’s not even a matter of when, but a matter of what might have already happened that an organization doesn’t know about. Our goal with Matchlight is to complement defensive technologies such as DLP with a tool that looks outside an organization’s network for the appearance of stolen data, alerting them when their defenses may have missed something and doing so in a fully automated, fully private way.
What are the benefits to a commercial organization or a government agency when working with your firm? What are the payoffs your solution delivers?
Typically, breaches are discovered by third parties such as journalists or law enforcement. In fact, according to Verizon’s 2014 Data Breach Investigations Report, that was the case in 85% of data breaches. Furthermore, discovery, because it is by accident, often takes months, or may not happen at all when limited personnel resources are already heavily taxed. Estimates put the average breach discovery time between 200 and 230 days, an exceedingly long time for an organization’s data to be out of their control. We hope to change that. By using Matchlight, we bring the breach discovery time down to between 30 seconds and 15 minutes from the time stolen data is posted to the web, alerting our clients immediately and automatically. By dramatically reducing the breach discovery time and bringing that discovery into the organization, we’re able to reduce damages and open up more effective remediation options.
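To put those figures side by side, the short calculation below compares the quoted discovery times. It uses only the numbers stated in the answer above (200 to 230 days versus 30 seconds to 15 minutes); the variable names are illustrative.

```python
from datetime import timedelta

# Figures quoted in the interview.
typical_fast = timedelta(days=200)
typical_slow = timedelta(days=230)
matchlight_fast = timedelta(seconds=30)
matchlight_slow = timedelta(minutes=15)

# Most conservative comparison: shortest typical time vs. slowest alert.
conservative_speedup = typical_fast / matchlight_slow

# Most favorable comparison: longest typical time vs. fastest alert.
best_case_speedup = typical_slow / matchlight_fast

print(f"Discovery is roughly {conservative_speedup:,.0f}x to {best_case_speedup:,.0f}x faster")
```

Even on the conservative comparison, the claimed alert time is about four orders of magnitude shorter than the typical discovery time.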
What’s in the future for the service? How might the technology be applied to other content domains outside the Dark Web?
While Terbium’s immediate focus is on building Matchlight and collecting dark web data intelligence, our data fingerprinting protocol has a number of potential uses outside of our system. It’s an example of what the cryptography community calls a Private Set Intersection protocol. These are useful any time one wants to do private queries on a database, and our protocol is useful specifically in cases where the database itself doesn’t need to be cryptographically secured. As such, there are examples in the world of eDiscovery, intellectual property enforcement, or privacy-protected data mining where our protocol could be useful.
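For readers unfamiliar with Private Set Intersection, the sketch below shows a textbook Diffie-Hellman-style variant. It is not Terbium's protocol, which the interview does not describe. The modulus, the helper names, and the single-function simulation of both parties are assumptions chosen for brevity, and the 127-bit prime is far too small for real use. Each party blinds its hashed items with a secret exponent; because exponentiation commutes, doubly blinded values match only for items that both parties hold.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, for illustration only


def hash_to_group(item: str) -> int:
    """Map an item to an integer in [1, P-1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Pick a secret exponent coprime to P-1 so blinding is a one-to-one map."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values: list[int], key: int) -> list[int]:
    return [pow(v, key, P) for v in values]


def private_intersection(client_items: list[str], server_items: list[str]) -> set[str]:
    """Simulate both parties in one process; comments mark what crosses the wire."""
    a = random_exponent()  # known only to the client
    b = random_exponent()  # known only to the server

    # Client -> server: the client's items, blinded with a.
    client_blinded = blind([hash_to_group(x) for x in client_items], a)

    # Server -> client: the same values blinded again with b (order preserved),
    # plus the server's own items blinded with b.
    client_double = blind(client_blinded, b)
    server_blinded = blind([hash_to_group(y) for y in server_items], b)

    # Client: blind the server's values with a and compare.
    server_double = set(blind(server_blinded, a))
    return {x for x, d in zip(client_items, client_double) if d in server_double}


client_list = ["alice@example.com", "bob@example.com", "carol@example.com"]
crawled = ["bob@example.com", "dave@example.com", "carol@example.com"]
shared = private_intersection(client_list, crawled)
```

In this variant the client learns which of its own items appear on the other side, while neither party sees the other's unblinded items.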
Terbium is focused on encrypted text. Does this focus help or hinder your service’s growth?
Many search engine technologies mine user searches for information the search provider can monetize. This makes them wholly inappropriate for security applications, where the queries are usually highly sensitive. As such, our focus on providing provably blind, privacy-protected search capabilities makes our tool uniquely useful to those who want to search for sensitive data such as credit card numbers or client lists. We can search for data our clients would never be able to search for otherwise, and we do it in a way that clients can audit. That kind of transparency has been very appealing to our customers.
Can you describe the methods you are using to deliver relevant results to your users? How does a customer use your system’s output?
Matchlight alerts can be delivered in a number of formats, but usually our clients choose to interact with Matchlight via the API. Alerts can then be incorporated into larger automated processes within the organization. How our customers respond to alerts depends on both the types of data involved and the client’s internal breach remediation procedures. For example, when we detect a compromised credit card number, our banking clients can immediately and automatically disable that account. Other information, like trade secret information, is rarer to encounter but demands a more forceful response, often in the form of a legal or international trade action. Generally, we act as an early warning system, allowing our clients to initiate whatever breach remediation processes they already have in place more quickly and effectively.
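The kind of automated response described above could be wired up along the following lines. The interview does not document the Matchlight API, so the `Alert` fields, the data type labels, and the handler functions are all hypothetical; the sketch only shows how an incoming alert might be routed to different remediation steps depending on the type of data involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert shape; not taken from any published API."""
    record_id: str
    data_type: str
    source_url: str


def disable_account(alert: Alert) -> str:
    # A bank might call its card management system here.
    return f"disabled account linked to record {alert.record_id}"


def escalate_to_legal(alert: Alert) -> str:
    # Trade secret exposure usually needs human review and legal action.
    return f"escalated record {alert.record_id} to legal, source {alert.source_url}"


def open_ticket(alert: Alert) -> str:
    # Fallback for data types without an automated playbook.
    return f"opened ticket for record {alert.record_id}"


HANDLERS = {
    "credit_card": disable_account,
    "trade_secret": escalate_to_legal,
}


def route(alert: Alert) -> str:
    """Send an alert to the handler for its data type, or open a ticket."""
    handler = HANDLERS.get(alert.data_type, open_ticket)
    return handler(alert)


card_result = route(Alert("r-001", "credit_card", "http://example.onion/dump"))
other_result = route(Alert("r-002", "employee_list", "http://example.onion/paste"))
```

Keeping the mapping in a dictionary lets an organization plug in whatever breach remediation procedures it already has without changing the routing logic.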
What’s unique about Terbium’s approach and what competitive barriers have you erected to prevent another company from duplicating your service?
As I mentioned previously, Matchlight is the only fully automated, fully private data intelligence system. It allows clients to immediately identify elements of their sensitive data out on the dark web without having to reveal to us what it is they’re searching for. To our knowledge, there is no other comparable service out there, and we not only have patents pending on various aspects of our system, but we also have an existing index that would be hard to replicate. This past week our index topped a quarter-trillion items, and that is growing by billions per day.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume problem?
This is precisely the challenge in searching the dark web for stolen data and exactly the reason we built Matchlight. Right now, most stolen data is found by hand, much like when Yahoo catalogued the internet by hand, which is why it takes so long to discover breaches and why that discovery is often accidental. With Matchlight, we use cutting-edge big data technologies to automate this process, giving our clients access to a sophisticated, private, automated search engine in order to help them sift through such large volumes of data more efficiently.
What is the latency for index updates? How long does it take for your system to process a typical flow of content? What are you doing to reduce latency between indexing content and making that refreshed index available to your users?
Right now a typical alert takes about 15 minutes to filter through our system. While that’s significantly better than the average of 200 or more days it typically takes to discover a breach, we’re constantly improving our system to speed it up and make it more efficient.
What features does Matchlight offer regarding alerts to users, report generation, and personalization within a work flow? Do you have an application programming interface? Do you support work flow components? Could you share an example or screenshots of the user interface?
The primary way to interact with Matchlight in an enterprise setting is via the API. The API allows us to easily integrate into other automated security processes such as breach notification or remediation, as well as with other systems such as DLP or DevOps tools. There is also a web-based user interface and email alerting, and we’re rolling out some more detailed reporting features over the next quarter that will make Matchlight even more useful.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are “cut off” from access to more robust systems. How does your firm see the computing world over the next 12 to 18 months? Is Terbium able to deliver outputs to mobile devices?
We definitely see mobile becoming more important, especially in the security space, where response times are key. Since Matchlight is a SaaS offering, the data is perfectly accessible via tablets and other mobile platforms.
Put on your wizard hat. What are the three most significant technologies that you see affecting your content processing system? How will your company respond?
On the positive side, big data technologies are still in their early days and only getting better. Many of the cloud and big data technologies that enable services like Matchlight didn’t exist two years ago, and we expect what we’re doing to get easier and more efficient as time goes on. Of course, we’re always worried about this sort of activity going deeper underground, but honestly, if we end up making it that much more difficult for the perpetrators to operate, that would actually be a win for our clients and for the community.
What is the entry-level pricing for your firm’s service? (A ballpark figure is okay).
We’re still in the pilot stage, so pricing is somewhat in flux as we get a better idea of how much data clients typically want to monitor, and how heavily their workflows hit our API. Right now we charge a monthly fee for a given amount of data under monitoring.
Where does a reader get more information about your firm and your products?
The best place to start is the website (https://terbiumlabs.com/). Now that we’re out of stealth mode, there is a lot of good information up there. There is also a contact form on the page; that’s a great way to get in touch with us.
Terbium Labs LLC is a company directly addressing the time, cost, and usability of sophisticated automated intelligence systems. The company has attracted the attention of a number of commercial enterprises and US government entities. The reality is that human-centric intelligence processes are too cumbersome to cope with the flood of digital information. Human analysts and investigators may not be able to explore the Dark Web without compromising an investigation. Terbium’s ability to acquire Clear and Dark Web content, make sense of those data, and then convert the results of the analyses into actionable information is important. The fact that the company offers its technology as a service provides an alternative to on-premises installations and their costs. Also, Terbium operates at high speed. Instead of waiting days, weeks, or months for information about a loss of high value content, Terbium can deliver that information in near real time. This is a company worth learning more about.
Stephen E Arnold, July 28, 2015