what if it was your data
As consumers, we are guardians (or not) of our own data - so why, as engineers and designers, are we so often negligent in safeguarding the data of others? What if you could mitigate the potential threats to a system, and the unnecessary or unauthorized exposure of its data, before even knowing what they are?
Key Takeaways
- Privacy by Design means building privacy and security into systems from the start, not as an afterthought.
- Minimize data collection — only gather what is critical for user journeys or business processes.
- Mask or hash sensitive data wherever possible to reduce risks of identification.
- Shifting left on privacy involves everyone, encouraging proactive security thinking early in development.
- Evaluating the real need for each piece of data leads to simpler, safer software.
Threat modelling is a fantastic approach to identify, prioritize and then mitigate the potential threats to a system and unnecessary or unauthorized exposure of its data.
But what if you could minimize the risks before even knowing what they are? What if you built your software in such a way that its separation of concerns gave bad actors less room to exploit your system, and data privacy and protection was not an option presented to the user, but a capability your system was built with from the ground up?
I bet you’d be more inclined to entrust such a platform with your data.
Privacy by Design is often born of an understanding of the boogeyman of all stakeholder maps: regulatory bodies. For me that means (as a onetime citizen of the EU, and since the UK government essentially copied GDPR and signed it into law as the new Data Protection Act) GDPR, the General Data Protection Regulation, which is what I’ll be focusing on.
For the sake of brevity, I won’t go into the definitions of personal data, PII vs SPI and so on - but teams building software should be aware of the basic classifications the data their software collects falls into and, at the very least, the legal responsibilities and implications of handling, processing and storing that data.
Regulation is a key consideration in enterprise software design, across industries. Most of the regulations the public will have heard of focus on security and privacy, and there’s good reason for that. Big companies have a horrible track record of over-collecting data, not even using it, then letting someone break in twenty years later and ransom it back to them for eye-watering sums, release it to the dark web for anyone who knows where to look, or inform the company of a vulnerability in return for some cash to stay quiet until it’s fixed (that’s probably the difference between chaotic good and chaotic evil).
Compliance within this context sounds very legal and jargon-y, but it’s not so far from what we as engineering teams do: we use automated tests to show that our code is compliant with our functional and non-functional requirements, and use monitoring to help visualize and demonstrate that compliance with our goals and objectives.
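As a sketch of what “tests as compliance evidence” might look like in practice, here’s a hypothetical check (names and log format are my own assumptions, not from the original) that fails a build if raw IPv4 addresses leak into application logs:

```python
import re

# Hypothetical compliance check: treat "no raw IPs in logs" as a testable
# non-functional requirement, just like any other automated test.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def assert_no_raw_ips(log_lines):
    """Fail if any log line contains an unmasked IPv4 address."""
    offenders = [line for line in log_lines if IPV4.search(line)]
    assert not offenders, f"raw IPs found in logs: {offenders}"

# A line using a hashed user token and no raw IP passes the check.
assert_no_raw_ips(["GET /api/cake 200 user=hash:ab12f9"])
```

Run in CI, a check like this turns “we don’t log personal data” from a policy statement into something continuously demonstrated.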
As with most objectives enabled by the shift-left approach and the Garage Method, this usually means taking a pragmatic approach to the subject. There are plenty of ways to formally measure data protection (you might’ve come across a DPIA if you’ve worked with GDPR at that level before) - but these don’t enable pragmatism; they’re usually reactionary, and they’re always done way too late in the lifecycle (even in the design lifecycle) to be properly impactful and effective in influencing how the system is built.
Embed thinking about privacy not just from a system point of view, but from an end-user point of view (nb: those end “users” might be other systems). There are a few key principles of Privacy by Design, and they support my thought process so far:
- Take a proactive and preventative approach to data protection
- Make privacy a default setting through UX design choices
- Embed privacy into products, rather than as an add-on
- Secure your entire data lifecycle
- Make privacy-related decisions user-centric
Ideally, you should follow the “avoid this hazard” signs on the road when it comes to collecting personal data. Don’t. Unless it’s absolutely critical to a user journey or business process. Of course there are requirements where personal data is on the critical path - delivery addresses are an easy example. You need to capture one and use it to satisfy a business process (package delivery) as part of a user journey (a customer has ordered a cake) - but do you need to store it? Can you use it as part of that journey and forget it? Can you send it on to your fulfilment partners (with your user’s consent, of course) and then store a reference number for the delivery, perhaps? This sort of thinking lets you avoid the headaches of dealing with user data in the first place.
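To make that pattern concrete, here’s a minimal sketch (every name here is hypothetical - a stub stands in for the real fulfilment partner API) of passing a delivery address straight through and persisting only a reference:

```python
import uuid

class StubFulfilmentClient:
    """Stand-in for a real fulfilment partner's API (names are hypothetical)."""
    def create_consignment(self, address: str) -> dict:
        # The partner receives the address (with the user's consent) and
        # returns an opaque reference we can track the delivery by.
        return {"reference": uuid.uuid4().hex}

def place_order(order: dict, delivery_address: str, client) -> dict:
    consignment = client.create_consignment(address=delivery_address)
    # Persist only the reference - the address never enters our own data store.
    order["delivery_ref"] = consignment["reference"]
    return order

order = place_order({"item": "cake"}, "1 Example Street", StubFulfilmentClient())
```

The address passes through `place_order` but is never written to our records; if our database is ever breached, there is simply no address to steal.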
You might also want to take a step back and think about what personal data means - what could identify a person or a user who visits your site? Sure, you might not store their delivery details as part of fulfilment - that’s someone else’s problem now. But what about your web server fronting an API, for example? It has access logs, right? And those access logs record the source IP address, right? And a token that associates a given request with a user (maybe a JWT), right?
Of course it does - so now you’re collecting personal information. A bad actor could potentially look up the user associated with that request and use IP geolocation to map users to addresses. It might sound far-fetched, but it has happened - and it is an example of “storing” personal information (let’s not even open the wormhole that is browser fingerprinting, but hopefully you get the idea). You can mitigate this by masking the data: for example, storing the IP address with the last octet masked, or hashing the IP address and storing/using the hashed form (as long as it’s a one-way, cryptographically secure hash). In most cases, masking the data and storing or processing it without a link to the user’s individual record, or to any other data which could identify them, puts you in a much better position.
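Here’s a minimal sketch of both mitigations. I’ve assumed a keyed hash (HMAC) rather than a plain one, since the IPv4 address space is small enough that an unkeyed hash can be brute-forced; the secret “pepper” value is hypothetical and would live in a secrets manager in practice:

```python
import hashlib
import hmac

# Hypothetical secret key; load from a secrets manager and rotate it in practice.
PEPPER = b"rotate-me-regularly"

def mask_ip(ip: str) -> str:
    """Zero the last octet so the address identifies a network, not a host."""
    octets = ip.split(".")
    octets[-1] = "0"
    return ".".join(octets)

def hash_ip(ip: str) -> str:
    """One-way keyed hash (HMAC-SHA256): lets you correlate requests from the
    same source without ever storing the raw IP."""
    return hmac.new(PEPPER, ip.encode(), hashlib.sha256).hexdigest()

print(mask_ip("203.0.113.42"))  # 203.0.113.0
```

The masked form still supports coarse analytics (per-network traffic), while the hashed form supports rate limiting and abuse detection, and neither maps back to an individual on its own.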
Masking data once collected is a solution, but try to identify the need (or the problem) first - do you really need to store that IP address? Use the “5 why’s” method and really ask yourself and your team if it is critical to the scenario.
Not every team (or even every organization… eek) has security engineers or security architects embedded. While that’s not ideal, it’s also not essential. Pragmatic thinking about security and privacy can be enabled in other ways (a Security Champion, maybe?), but it is always achieved through collaboration. Shift left and accept that developing software and systems that handle data responsibly, and keep it safe from unauthorized actors, must begin at the earliest stages of development - not as an afterthought.
You should always look at the why when it comes to collecting data - add a procedure to your development process that evaluates the scope of the data your application thinks it needs access to. That evaluation should help you arrive at simpler data models during development, and that achieves data minimization: building your software using the absolute least amount of sensitive data that you can. Think about data as part of your feature breakdowns, and when discussing your behaviour-driven scenarios, think from the point of view of a user of your system or product and how they might react to requests for data. Even if it seems sane to you as a designer or developer, it might not be clear to a user why that data is required (e.g. “does an app really need my driving licence number to sign me up? Isn’t there a less intrusive way to confirm identity?”). Capturing and reacting to these thoughts will decrease your application’s attack surface for free and give you fewer things to harden in your code.
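At the code level, data minimization can be as simple as an allowlist at the boundary - a sketch, with a made-up minimal signup schema:

```python
# Hypothetical minimal signup schema: only the fields the journey truly needs.
ALLOWED_FIELDS = {"email", "display_name"}

def minimise(payload: dict) -> dict:
    """Drop anything outside the allowlist before it reaches storage."""
    return {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}

signup = {"email": "a@example.com", "display_name": "A", "driving_licence": "XYZ123"}
print(minimise(signup))  # {'email': 'a@example.com', 'display_name': 'A'}
```

An allowlist (rather than a blocklist) means a new over-sharing field added upstream is dropped by default instead of silently persisted.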
Altering the thought process to put data protection and privacy first isn’t hard. We all do it every day - every time we fill out a questionnaire, or make up a fake email address to sign up to a Wi-Fi hotspot. As consumers, we are guardians (or not) of our own data - so why, as engineers and designers, are we so negligent in safeguarding the data of others?
By shifting left, everyone will better understand the security and privacy concerns of your application. Technical and non-technical folk alike will start to think about how they would interact with the system if it were their own data, and everyone can contribute in small but meaningful ways to enhancing user privacy and creating a safer digital world for all. At a time when society’s reliance on digital channels is so extreme, and when data is (sometimes literally) worth more by weight than gold, it’s a common goal that everyone should share.