Updated: January 1, 2021
Companies build software products. They release said software products into the wild. People start using them, and sometimes, there be problems. But how can companies really know that their products are working as expected? User interaction? Unreliable. Some form of automated mechanism that tracks software usage and reports data back? Yes.
Software telemetry is a relatively transparent way of collecting data about human-software interactions, with the noble goal of improving software products. Alas, if only the reality was so cuddly and naive. As it happens, humans quickly realized that data collected in this manner can be used for more than just improvements in the product. And thus, overnight, a noble goal became ignoble. So the question is, should you, as an end user, allow, encourage or accept telemetry in the software you use?
There's no harm in a little bit of spying, is there?
The answer to this question is both simple and complicated. Let's analyze the situation through an example. Say you have a word processor. No one will argue that this is a complex piece of software, with tons of menu options, buttons and functions. Give it to a hundred different users, and you will have a hundred different interaction stories, complete with their own efficiency, errors and usage patterns.
If there is a way to capture these individual experiences in an aggregate way, the company behind the word processor could then perhaps analyze the data and figure out if there are ways to improve their code. Now, the word improve is general. It could mean a bunch of things. For example, efficiency. This would mean perhaps repositioning menu options and buttons to make them more visible, accessible. Reduce the number of mouse clicks or finger taps. Allow faster usage of common options.
Another example would be to remove ambiguity in the UI and/or reduce errors. Prevent users from making obvious or non-intuitive mistakes while using the software. Prevent accidental deletion of data. Prevent actions that have a wrong outcome. And so forth.
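To make this concrete, here is a minimal sketch of what such interaction telemetry might look like, and how a vendor could aggregate it. All event names and fields here are hypothetical, invented for illustration - real products define their own schemas:

```python
from collections import Counter

# Hypothetical UI interaction events, as a word processor might record them.
# Note what is absent: no document content, no user identity - just which
# controls were used, and in what way.
events = [
    {"action": "menu_open", "target": "Format"},
    {"action": "click", "target": "Bold"},
    {"action": "menu_open", "target": "Format"},
    {"action": "click", "target": "Bold"},
    {"action": "click", "target": "Undo"},
]

# Aggregate: how often is each control used?
usage = Counter((e["action"], e["target"]) for e in events)

# A vendor might flag controls that take too many steps to reach, or
# actions frequently followed by Undo (a likely user mistake).
for (action, target), count in usage.most_common():
    print(f"{action} {target}: {count}")
```

Multiply this by a few million users, and the kind of UI analysis described above becomes a matter of sorting a table.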
Right now, none of what we wrote above is in any way wrong per se. It does not identify the user, and it's all about technical, even mechanical usage of the software (word processor), without paying any attention to the actual content written in the main window. So why would anyone object to this, right?
I have a need, a need for greed
As it happens, data gains value as it becomes more abundant. Because large data sets offer insights that may not be visible or possible when looking at small, individual sets of data. For example, you can't know that a function or a button in a UI is wrong unless you observe the activity of hundreds or thousands of users. Patterns start to emerge once you reach a certain scale, and then, you can have patterns of patterns, and so on. At some point, it becomes more than just what the software does.
At some point, telemetry becomes a mathematical understanding of human behavior.
In other words, if a million people all do this one thing in the software - if you control that one thing, in essence, you control what people do, including - just a WILD example - how they spend money.
And this is, unfortunately, where the model breaks apart. It's the moment when the understanding of the vast potential in indirect data measurements gives you the opportunity to directly shape the usage, and by proxy, govern activities that have nothing to do with the software itself. When you attach this potential to one of the most potent human emotions - greed - all good intentions just disappear.
This has happened countless times before. Technically-shaped products slowly morph into software designed to maximize profit (the numero uno goal of every company, especially publicly traded ones), where the primary decisions behind software changes are no longer just functional but also (mostly) financial. This is something that happens slowly, organically, even imperceptibly. The product still remains, of course, and companies do want to have happy users (arguably), of course, but that's not the only thing that matters anymore. There's a hidden force of profit changing things. As a banal example, you don't need 100% happy users, right? You can have just 77% happy users, because the cost of losing 23% unhappy users is less than investing the time and effort in satisfying their every need.
And telemetry? Telemetry becomes the pen that writes the story.
Why should you care?
You could rightly say, as an end user, why should you worry about what the company does with this data? After all, you're using a product, you're happy, the company occasionally introduces improvements and changes, most of them are okay or at least they don't affect you that badly. No biggie. So some of your UI interactions are translated into numbers and vectors and whatnot and then sent to a computational machine somewhere that spits out recommendations. So what.
Well, now we get into the realm of privacy - and profit once again.
The main issue with your data (whatever it is) being sent to a remote location is that once it leaves your environment (your computer), you have no more control over what happens to the data. None. It could be stored and never used, aggregated and anonymized ten times over, deleted, used to calculate the mouse pointer velocity, who cares. You have zero control over that entire sequence.
Data also has a tendency to become bigger, more complex - and thus less private. While most people think that the more data there is, the less visible they are (a drop of water in an ocean), the opposite is true. Every bit of data helps create a higher-resolution picture of what's happening out there. In fact, you can be "profiled" by other people's data. That is, if you know there are 100 people in a building, and you know what 99 people are doing except that one (privacy) freak, you also know what that one person is NOT doing. And that also helps you understand a great deal.
Data also has a tendency to be analyzed and analyzed and analyzed yet again - and often, the purpose of data analysis changes. We already have a bunch of laws and regulations designed to limit the processing of personally identifiable information - GDPR and CCPA. But there's much more to it. You don't need personal data. You just need lots of any data. And with lots, you can create an accurate understanding of systems (and people) without ever having observed them or been in contact with them. Great, if somewhat extreme, examples: WWII cryptography efforts, like the breaking of German and Japanese ciphers, and the reconstruction of the encryption tools used.
No need to go that far, though. Technically speaking, if you look at a person's browsing habits, even if you exclude any personal information from the equation - IP address, browser signature, browser window size, computer hardware, whatever - you can still identify people by their rather predictable usage patterns. This could be simple things like the order in which they open and access websites, or the fine timing of their mouse and keyboard actions when using software. This data is not associated with a name or a face, but it is associated with the human entity that so uniquely generates these signals. Think of your own browsing patterns. When you sit down behind a computer in the morning, you probably visit the same 5-11 sites every single time, often in a very predictable order.
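A toy sketch of that idea, with made-up profile names and site addresses. Real fingerprinting uses much fuzzier matching (partial orders, timing distributions), but even naive exact matching shows how "anonymous" habits betray identity:

```python
# Habitual site-visit orders previously observed, keyed by an arbitrary
# internal label. No names, no faces - just behavior.
known_profiles = {
    "user_a": ("news.example", "mail.example", "forum.example"),
    "user_b": ("mail.example", "social.example", "news.example"),
}

def match(session, profiles):
    """Return the labels whose habitual visit order matches this session."""
    return [name for name, order in profiles.items()
            if tuple(session) == order]

# A fresh, "anonymous" browsing session - yet the order alone gives it away.
session = ["news.example", "mail.example", "forum.example"]
print(match(session, known_profiles))  # only user_a fits this pattern
```

Link that internal label to a real identity just once - a single login, a single purchase - and every past and future "anonymous" session is yours.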
We already have three problems - no control over data sent to a remote location, data analysis changes beyond the original intended purpose, and non-personal data can be used to create unique personal profiles.
And then, there's the last, the biggest problem - data leaks.
The IT Boogaloo
Welcome to the modern Wild West - the Incompetence Technology (IT) industry.
So. Data leaks fall into two categories - intentional and unintentional.
And the first one is not what you think.
The intentional leak is not someone uploading files to a share somewhere for everyone to see. No, it's the people who control the data deciding that the data can be used for purposes other than the intended usage, because why not, profit, greed, it's anonymized (technically), so there's no real harm in it, innit. There ain't no amount of technology that can prevent this - only regulations and laws.
Then, we get to the second leak - unintentional, and this is when data escapes the boundaries of its original containment, whether by accident, hacking, malpractice, or whatever. And this covers both personal and non-personal data, both data that has been analyzed for purpose and data that has been taken through a long chain of profit-shaping algorithms.
And THIS is where the root of the biggest problem lies.
The initial usage of data can be completely fine - voluntary, even desired, benevolent, intentional. The end state is that it ends up used on the black, gray, red, purple market somewhere somehow, because someone did or didn't do something. You might consider this the IT version of Godwin's Law:
The longer the data exists out there, the closer the probability of it being leaked comes to one.
We are all well aware of data breaches and leaks. They happen all the time. It's beyond absurd. And it will get worse and worse and worse and worse.
As I mentioned before - you have zero control over what other companies do - and what they do with your data. The only thing you can control is what data you share, with the awareness that once it leaves your computer, it will inevitably be used and re-used a billion times over, even in ways that companies themselves don't always know, understand or predict.
But telemetry shouldn't matter, should it?
I want a piece of that sweet pie
Well it does matter. Because, one, you are effectively allowing the company to do research based on your "work" - should you not enjoy a slice of that? Of course. Hence, you have "free" products. Now, not all free products are there just to put you into a comfy virtual lab, but many are. However, if you're paying for a product, why should you then not be financially compensated for telemetry gathering - or allowed not to be telemetrized at all? Alas, quite often, even paid software contains telemetry, and you can't always disable it, or get a discount or whatnot if you allow it.
Two, the data you provide - even if it's 100% non-personal - may end up being used to improve other things than just the raw functionality of the product. On its own, this is fine. But if you look at what happened in the IT industry over the past two decades, the quality of software hasn't fundamentally improved - unless such changes directly improve the bottom line. In fact, we're often seeing the opposite: functional regressions, made because they are profitable.
Three, the data you provide - even if it's 100% non-personal at face value - may one day be leaked or lost and end up being used for something neither you nor the people who want to collect the data could ever have envisioned.
So what's the solution?
Optional telemetry, of course, but again - this creates a selective and unpredictable reality. Ordinary people don't care either way, and nerds will always make deliberate choices that often have nothing to do with the product, profit, or anything else.
But if you ask me, paid products should have the cost of testing and usage rolled into the price, and/or offer actual cash reward for people who do enable/allow telemetry. Otherwise, there should be no telemetry in paid products. Simple. Before software, companies would pay for market research, or have people sit down with pen and paper, and check what "users" did - hiring those people cost money. Well.
Free products - well, you gotta give something. Understandable. But as we've all seen, things have changed a great deal in the past fifteen years or so. For instance, what big tech companies used to do is not what they do today, even if their products remain seemingly unchanged. That's always a good indicator of what to expect.
Lastly, privacy - and long-term implications of data collection and sharing. This is something that does not have a price really. You could try to translate it into a monetary figure, but that would be just a guess. So the simpler reasoning here is - if you really care about privacy, and given the public track record of pretty much every company out there, then the prudent and logical thing is not to allow telemetry, even if you may end up with a product that is of somewhat lesser quality than it might otherwise be. But that's the price you pay for asserting control over your data.
From my article, you may wrongly assume that if we have no telemetry, products (including websites) would suffer greatly, i.e. the quality of service would go down as companies would not know what to do. That would be a good argument IF THE QUALITY OF SERVICE WAS GOOD TO BEGIN WITH. But it is not. Even with data collection and artificial intelligence and whatever else you want to name drop, most products are awful, badly designed, and shoddily tested. Because even if there's a framework of data in place, remember greed. It's easier to have a cheap good-enough product than an expensive excellent product.
This is why really high-quality, premium products cost a lot. A lot. This is something you can appreciate in day-to-day life, but is less evident in the software world. But the thing is, the Internet and by proxy, most of the software out there, isn't some elite club for connoisseurs. It's a cesspit, the lowest common denominator of news and socializing, in digital form. So don't be surprised by the dynamic of forces in this field.
Telemetry, like most digital solutions, started from a (noble) idea. But like most digital solutions, it's changed beyond recognition. And because the track record shows zero correlation between data collection and quality you get as a user (if anything, there's negative correlation as software is getting worse all the time), then there's no real reason for you to enable or allow it, privacy needs notwithstanding. And this be the end of this article. May the Borg be with you.