We live in interesting times, when it comes to user privacy and protection of data. Access to large amounts of our personal data has increasingly become the price we pay to use many online services — all too often without much knowledge of what data we’re giving up, or how it’s being used.
This has largely been unregulated, as the law has struggled to keep up with technological change. GDPR is the latest in a line of regulations that attempt to make the use of personal information more transparent. In many respects, it looks like GDPR may be the most successful.
In this post we’ll take a look at some of the changes that have come into effect, along with the approach we took as a company to assess and work towards compliance.
What is GDPR?
The General Data Protection Regulation (GDPR) is legislation from the EU which specifies the rights of every EU citizen when they interact with websites and services. That sentence alone includes an incredibly significant nuance: the regulation is applied to all companies that have EU citizens as users or customers. This includes companies, such as Razeware, that are based in the US.
The core tenets of GDPR can be summarized as follows:
- Collect only the data you need.
- Obtain consent to collect it.
- Allow users to download their data.
- Allow users to delete their data.
- Store all personal data securely.
- Use collected data only for the intended purpose.
- Don’t keep personal data longer than necessary.
- Take responsibility for 3rd-parties collecting and processing your users’ data.
These all seem like perfectly reasonable requirements for collecting personal data. But what actually constitutes personal data?
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
— article 2, GDPR
Most significantly, anything that can uniquely identify a user is considered personal information, including things such as an IP address.
Handling GDPR Compliance
We started planning our GDPR compliance project at Razeware in March 2018, which gave us over two months to establish and execute an approach. We’d scheduled a week of engineering time to provide the highest-priority implementations, in an attempt to reduce the overall cost.
The first problem was establishing exactly what GDPR meant in the context of raywenderlich.com. This involved an extensive literature review of official sources and (surprisingly) informative blog posts. From this, we planned to start with a review of the personal data we currently stored and the processing tasks we engaged in.
There were three areas where we needed to audit the customer data Razeware collected:
- Operational data. Data in production databases required to run the current versions of the raywenderlich.com sites.
- Archived data. Backups of old databases from previous versions of the site. This also included full backups of our pre-AWS servers.
- Ad-hoc data. Lists of user details stored in Dropbox or Google Drive that were collected for a specific purpose, such as competition winners, or conference attendees.
We reviewed each of these data stores against the following simple criteria:
- Do we need it? Has this data outlived its usefulness? Keeping personal data longer than is required carries a huge risk with very little reward. This resulted in the deletion of a lot of archived databases and backups.
- Is it encrypted? Personal data must be encrypted, both while at rest and in transit. Some of our operational databases weren’t encrypted on disk, but since we use AWS RDS to host all our datastores, transitioning to an encrypted solution wasn’t too difficult.
- Should we have collected it? This is focused on the operational data. Are we collecting too much information from users? Having audited what we have, we were happy that we were being responsible about collecting just what we need to provide our services?
- How long do we need to keep it? If we’ve finished using data, we should delete it — which goes against every fiber in my body. Wasn’t cheap and widely available storage supposed to mean we’d never have to delete anything? However, this again plays on the unnecessary risk mantra. We updated our server log retention to 90 days, our database backups to 21 days, and now cleanse superfluous personal data from payment transaction records after 30 days.
We formalized this information into a register of processing activities and a privacy impact assessment. But far from being a complete and finished document, these documents were full of TODO items, notes, and questions. As we worked on the various sticky bits of implementing GDPR, we answered those questions and filled in the blanks as we went along.
It was quite an exploratory process; almost “design-by-discovery” if you will, and involved a good dose of pragmatism at every stage. The key challenge was to balance business decisions and goals, which we clearly have a vested interest in, with the spirit of the legislation.
GDPR includes a set of required functionality that should be offered to users of your service:
- Right to be forgotten. At any stage, a user should be able to delete their account, and all data associated with it. We already offered functionality very similar to this, which anonymized data. We’ve automated this manual process as an admin function, which deletes all non-critical transaction data. We have to retain records of transactions to maintain data integrity of our own records, but all personal user information will be stripped out.
- Right of data portability. A user should be able to request to download all their data and take it to another service provider. The service provided by raywenderlich.com isn’t really transferable, but downloading your data is a sensible functionality to offer. We’ve added this as a self-service function on the accounts site. It pulls in data from our store records, in addition to the account data itself. In the near future we’ll be enhancing this with additional data from our new raywenderlich.com frontend, and hopefully the forums in the future. The download is a ZIP file containing several JSON files, each of which represents a different source of personal data.
- Right to see all collected data. Although this is specified in the legislation, we consider the download to be a suitable facsimile for our users.
- Ability to correct incorrect personal data. Since we don’t collect a vast amount of personal data, this is limited to the user profile. We’ve had functionality to update this ever since we introduced the concept.
- Restricting visibility of personal data. Within the small scope of the raywenderlich.com sites, we interpret this as the ability for a user to make their profile page private. This is currently not possible, although users do have precise control over what appears on their profile page. Choosing to hide your user profile will be a feature of the upcoming front-end relaunch.
These features are all things we’re very glad to add. They symbolize a level of transparency we’re very proud of at Razeware. It is worth noting that we don’t consider these features complete, and they will continue to evolve as we redefine the data we collect from users and add features to improve the user experience across raywenderlich.com.
One of the most challenging aspects of our compliance journey was our audit of third-party services. GDPR is very clear in that it expects first-parties (i.e., us) to take responsibility for compliance of the third-party data collectors and processors.
The first aspect to this audit was to simply list the third-party components we use — which was more challenging than it should have been. As raywenderlich.com has evolved over the years, we’ve added bits and pieces to better serve our users. In fact, we’ve had a long-running project to unpick complexity on our WordPress install for the past two years. But still, it took some time to dig in and figure out exactly what components were underneath all of our sites.
At this point we took advantage of legal advice to help clarify our use of third-parties. This resulted in three distinct categories of usage:
- Consent. We can collect personal data – provided we have explicit consent. This is stricter than it first sounds; it’s no longer acceptable to have blanket consent for all personal data. Instead, you need specific categories of consent for different data collection and sharing purposes. Our use of Google Analytics and Facebook Pixel fall under this banner: we don’t require them to provide the services on our site, and therefore users should be given the choice to accept or reject them. This was a challenging piece of technical work, since we use Google Tag Manager to handle our tracking tags. We’ll write about the solution in an upcoming blog post.
- Contractual Necessity. We require these providers so that we can fulfill a contract that we have with our users. This forms the basis of our integration with our payment provider Paddle.
- Legitimate Interest. This is a nuanced category, and refers to data sharing required for the site to function. For example we use BugSnag, Skylight and AWS to ensure the smooth functioning of the site. If we removed them the site would cease working. We therefore had to review each of our providers on a case-by-case basis, establishing exactly what data they collected, before making a decision to keep them or toss them.
As part of this process, we actually decided to cut back on some of our third-party providers, to reduce the sharing of our users’ personal data:
- OneSignal. This service provides browser notifications of new posts. Unfortunately, their business model involves far wider access and reselling of user browsing behavior than we are happy with. We’ll be removing the browser notification functionality for the time being, and investigating alternative options in the near future.
- External advertising. We use an ad-provider to serve third-party ads on raywenderlich.com. We’ve decided that once we transition to the new frontend in the near future we’ll be removing these ads to provide a better experience for users, and to reduce the data our site provides to third-parties.
This GDPR compliance process has resulted in some user-facing changes that you’ll see across the raywenderlich.com sites:
- Consent-based tracking. You’ll also see a popover box asking you for consent to track you with Google Analytics and Facebook Pixel. We use this to work out what content is popular, and also to try and sell our great products. We’d very much like to continue collecting this data; signifying your consent is as simple as dismissing the box.
- Download your data. If you visit your account management pages at accounts.raywenderlich.com you’ll see a new button that will allow you to request to download all your data. We’ll be adding features to this download as we improve raywenderlich.com.
It’s all too easy to criticize the introduction of such a wide-reaching law, especially when your company isn’t even based in the appropriate jurisdiction. But the spirit of this legislation is wholly in the right place, and deserves attention. Company compliance results in customer personal data being handles with the respect it deserves.
We had to make a lot of judgment calls as part of our compliance exercise, balancing our understanding of the law with business decisions. We’ll continue to exercise this pragmatic approach as we continue to develop the site, and as we learn over the next few months how other providers are handling their own GDPR compliance.
Writing code, solving problems and entertaining the masses