A magical promise of releasing your data while keeping everyone's privacy

Differential privacy is one of those ideas that sound impossible. It describes mechanisms that output information about an underlying dataset while protecting the individuals in the data against identification attempts by any means [1]. At a time when big data is so hyped on one hand and data breaches seem rampant on the other, why aren't we hearing more about differential privacy (DP)?

I quote Moritz Hardt from his blog:

To be blunt, I think an important ingredient that’s missing in the current differential privacy ecosystem is money. There is only so much that academic researchers can do to promote a technology. Beyond a certain point businesses have to commercialize the technology for it be successful.

So what is differential privacy? First of all, DP is a constraint. If your data release mechanism satisfies that constraint, then you can be assured that your data is safe from de-anonymization. DP initially came out of Microsoft Research and has since been applied in many different ways. There are DP implementations for machine learning algorithms, data release, etc. Here's an explain-like-I'm-5 description, courtesy of the Google Research Blog, from their RAPPOR project, which is based on DP.

To understand RAPPOR, consider the following example. Let’s say you wanted to count how many of your online friends were dogs, while respecting the maxim that, on the Internet, nobody should know you’re a dog. To do this, you could ask each friend to answer the question “Are you a dog?” in the following way. Each friend should flip a coin in secret, and answer the question truthfully if the coin came up heads; but, if the coin came up tails, that friend should always say “Yes” regardless. Then you could get a good estimate of the true count from the greater-than-half fraction of your friends that answered “Yes”. However, you still wouldn’t know which of your friends was a dog: each answer “Yes” would most likely be due to that friend’s coin flip coming up tails.
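The coin-flip scheme in the quote can be sketched in a few lines of Python. This is a toy simulation of the dog example only, not RAPPOR itself; the function names and parameters (`randomized_response`, `estimate_true_fraction`, `simulate`, and their defaults) are made up for illustration. Since about half of the friends say "Yes" no matter what, the fraction of "Yes" answers is roughly 0.5 + 0.5 * p, where p is the true fraction of dogs, so p can be estimated as 2 * (fraction of "Yes") - 1.

```python
import random
from typing import List


def randomized_response(is_dog: bool, rng: random.Random) -> bool:
    """Answer truthfully on heads; always answer "Yes" (True) on tails."""
    heads = rng.random() < 0.5
    return is_dog if heads else True


def estimate_true_fraction(answers: List[bool]) -> float:
    """Invert P(yes) = 0.5 + 0.5 * p to estimate p, clamped to [0, 1]."""
    frac_yes = sum(answers) / len(answers)
    return min(1.0, max(0.0, 2.0 * frac_yes - 1.0))


def simulate(n_friends: int = 100_000,
             true_fraction: float = 0.3,
             seed: int = 0) -> float:
    """Simulate n_friends noisy answers and return the estimated fraction of dogs."""
    rng = random.Random(seed)
    # Hypothetical ground truth: each friend is a dog with probability true_fraction.
    truths = [rng.random() < true_fraction for _ in range(n_friends)]
    answers = [randomized_response(t, rng) for t in truths]
    return estimate_true_fraction(answers)


if __name__ == "__main__":
    print(f"estimated fraction of dogs: {simulate():.3f}")
```

With many friends the estimate should land close to the true fraction, even though any single "Yes" says little about the friend who gave it.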

Google's Chrome uses RAPPOR to collect some sensitive data that even Google doesn't want to store because of end-user privacy risks [2]. With DP, they can get access to useful data that they otherwise wouldn't have been able to collect.

By this point, I hope you have a sense of what DP is and why it's useful. But how does it work? Luckily, I found out that Moritz open sourced his MWEM algorithm on GitHub. I then spent a couple of weekends deploying his Julia package and building a web application around it.

masking.io homepage

The site is live at masking.io (note the unsecured HTTP). Give it a try! It doesn't do much yet though.

masking.io is a weekend hack so I'm not sure if I'll do anything more with it. Email me if you think this can be useful to you. For now the app only takes binary values. So pretend your data are all Yes/No responses. The app can be patched to take in any numeric data. Moritz describes how to do that in his paper [3] and he's open to sharing his existing C# code for reference.

The web application works by exposing Moritz's package as a RESTful API using Morsel.jl. The frontend is built with ClojureScript's Reagent. I couldn't find any PaaS that can run Julia applications, so I containerized the Julia part in Docker and deployed it. That was a bit annoying to do, as I kept finding bugs and had to submit a few patches along the way. I guess not many people are deploying Julia applications yet.

The whole stack is open sourced: web application, DP micro-service, and MWEM algorithm. Let me know your thoughts.

References:

  1. Ji, et al., Differential Privacy and Machine Learning: a Survey and Review, arXiv:1412.7584 [cs.LG]
  2. Erlingsson, et al., RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response, arXiv:1407.6981 [cs.CR]
  3. Hardt, et al., A simple and practical algorithm for differentially private data release, arXiv:1012.4763 [cs.DS]

With thanks to Chris Diehl for bringing DP to my attention.

Difference Between Never Did It and Did It

I spent my whole Saturday designing a new homepage for this blog. Pretty proud of the result, I showed it to my wife. She couldn't stop laughing at how bad it was. Not willing to admit defeat yet, I took it to /r/design_critiques/ and asked for help with this post: I've been told this looks like shit already. What can I do to un-shit it? A particular comment there (quoted below) led me to write this article:

The design is very boring and it makes you look boring as a result. In a situation like this I'd recommend to just use a good looking template: https://www.tumblr.com/theme/36149 is a good one for your needs. It's not productive at all to try and do something you're 1. not great at, and 2. not looking for work in. Hope this helps.

Is gaining knowledge by trying new things not considered productive anymore?

Before diving into that though, you're probably wondering what my awful experimental homepage looks like. I wanted it to convey a clean and succinct message but it came out more like a careless job done in 2 minutes. Here's a screenshot.

new homepage prototype

An excerpt from another comment in the same Reddit thread:

Just remember that just as you might make engineering look easy, designers make design look easy. Having Excel doesn't make me a data analyst just as having server access doesn't make someone a web designer.

From my startup experience, I learned fast to be a jack of all trades. User experience design, customer development, product management ... I am not so naive as to think that I can just jump in and take over anything. That's not the point. The point is that having some hands-on experience opened my eyes to how wrong I was in thinking I knew what other roles actually entailed.

Take user experience design, for example. I used to think that you just use common sense, right? No, you need to understand the user, understand the system, then somehow bridge that gap between the two. Or sales. You just talk to a lot of people, right? No, sales is about understanding user demand and discovering how their needs overlap with what you can offer.

Everything looks easy from 30,000 feet up because you don't see the details. When you've never done something, you really don't know what you don't know. We don’t realize how we automatically make assumptions and over-simplify things we don’t fully understand as a coping tactic to fill in the gaps. That's a useful tactic in everyday life, as I really can't be bothered with all the details around me. But it’s not so useful when bootstrapping a business as you can easily get blindsided. Once you've done a new job or solved a problem once, you don’t guess or handwave your way around the details anymore. You become aware of what you are unaware of. That is the difference.

Anyway, I answered my own question on whether gaining knowledge by trying new things can be productive. Yes, it can. After that, you just need a bit of practice.

What is water: Avoiding a common pitfall in customer discovery

I've been doing customer discovery for a new venture that I'm investigating. That means going out there and talking to dozens of people in my target audience. The goal is to understand their needs and identify their pain points. I start the conversation with two questions: "What do you do?" and "What are the most painful parts of your work?" I learn something new every time I ask these questions. But there's a caveat to these customer interviews. Asking people who have been immersed in their problems day in and day out to describe those problems is like asking a fish to describe water.

goldfish

We fell into this trap with our first product at Spokepoint. My co-founder spoke to almost a hundred target customers. Everybody said they had the problem we were targeting and would pay for a solution once it was ready. We spent a few weeks developing a prototype to get user feedback. We made improvements. We then tried to sell it ... Nobody bought. "But if only you had these extra features..." We pivoted away from that product soon afterward.

The obvious solution is to dig into the real, underlying problems that people have, as opposed to the problems that they think they have. Unfortunately, I don't have a magical 3-step guide to reading between the lines and knowing what people really want to say. This comes down to communication skills, experience, and hard work. We cannot solve this fundamental problem, but we did find ways to mitigate it.

One of the most useful methods we found, with credit to our lean startup mentor, Spike, is a couple of simple follow-up questions that screen out problems that don't really matter to people.

When people tell you they have a problem, ask them what their
current solution is and when they last looked for a better
solution.

For example, I don't enjoy having to think about what to make for dinner. My current solution is to make permutations of the same things. I never bothered to find a better solution because I don't really think about it anymore. It has become my water. More often than not, people will say that they haven't looked for another solution. Some aren't even doing anything about it at all! Look for the itch that people are actually scratching. Don't ask people to describe water.

Moving on

When do you give up on a venture that's neither succeeding nor failing? One thing that I learned from my stock trading days is to plan for the scenarios before you jump into a trade and then stick with the plan. Never make decisions in the heat of the moment. So I am sticking with my plan. I gave myself one year to try building a startup when we moved from London to Boston. The time is up and I have decided to move on.

Looking back, this is the most incredible and brutal experience that I've had so far. I am especially amazed at how much help we got, considering that Spokepoint is a bootstrapped company. I am very grateful for our unicorn designer+hacker Isaac Chansky, our brilliant software engineering intern Elizabeth Hagearty, and our ClojureScript/Reagent guru Tony Tam. Of course, the person that I learned the most from is my business co-founder, Dan Siegel.

Here's a screenshot of the current Spokepoint web application.

Screenshot of Spokepoint search

It has been a roller coaster year for me and the company. My co-founder will continue to soldier onward with the company. We both believe that there is something to what we're doing with Spokepoint. We just don't know how long that might take. I am moving on personally and wish him the best of luck.

I don't know exactly what I'll do next yet. I am exploring a couple of directions at the moment. One lesson that I learned from this is to keep moving, one baby step at a time.
