Monday, November 06, 2017

There's More to Sound Than Speech

Update: I got some beautiful comments on this, and I worked in some of the insights so I am not immediately out of date.

The combination of Big Data and an enormous user base is creating so much serendipity that Facebook has had to deny they are listening in to users' conversations to explain some spooky content appearing in users' News Feeds. Of course, what would literally have been a paranoid delusion up until five years ago--I have a copy of a friend's drug-fuelled diary of how a consortium of Disney and the CIA were bugging their home--is now reality: I have another friend who recently found out that they were the victim of a recurring home invasion by noticing new entries in the online log of queries to their Amazon Echo, made while they were not at home with the device.
I have interesting friends.

Our devices are now listening to us in order to serve us better and deliver us to more targeted advertisers and products, but I find it interesting how little they listen to: until recently it has been only some magic word and then whatever the user says next. That's not real service, that's being dumb and robotic. Real service is about depth and anticipation, so if the next generation of listening robots is going to delight us, they should be able to deal with queries like:
  • What kind of bird was that?
  • Are foxes mating or is someone being killed outside?
  • Was that my car starting?
  • Holy crap, is that noise coming from the basement? (This may require having two or more listening devices in the home to do triangulation, but that is a) already true when people have a phone as well and b) technically a solved problem)
    Is it an emergency? Do you know well-rated emergency plumbers?
  • How many emergency vehicles just came by the house?
  • Was that test of the Emergency Broadcast System scheduled?
  • Should the washing machine ever sound like that?
Especially for travellers who are new to the fauna, the city, and the Airbnb with its appliances and basements, these are the kinds of questions that need answers at 3 AM.
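
To make the "noise from the basement" case a bit more concrete: the core of locating a sound with two devices is working out how much later it reached one microphone than the other. Below is a minimal sketch of that step using brute-force cross-correlation. It assumes both devices share a clock and a sample rate, which is a big assumption for a phone and a smart speaker; the function names and the 343 m/s figure for the speed of sound are mine, for illustration only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: roughly the speed of sound in room-temperature air


def estimate_delay_samples(a, b, max_lag):
    """Return the lag (in samples) at which recording b best lines up with a.

    A positive result means the sound reached microphone b later than
    microphone a. This is plain brute-force cross-correlation.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference_m(delay_samples, sample_rate_hz):
    """Convert a delay in samples into how much farther the sound travelled."""
    return delay_samples / sample_rate_hz * SPEED_OF_SOUND_M_S


# Hypothetical example: the same short bang, heard 7 samples later by device b.
mic_a = [0.0] * 100
mic_b = [0.0] * 100
mic_a[20], mic_a[21] = 1.0, 0.5
mic_b[27], mic_b[28] = 1.0, 0.5

lag = estimate_delay_samples(mic_a, mic_b, max_lag=20)
extra_distance = path_difference_m(lag, sample_rate_hz=1000)
```

One delay only tells you the source is farther from one device than the other by that distance; pinning down an actual spot takes more microphones, or more delays.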

Technologically, this requires acquiring and storing a huge library of data, something that FaceGoogAmApple excel at, and really good pattern matching, which is a goal they are mercilessly chasing as well. Is there a business model? Well, the easiest way to solve most problems these days is throwing money at them, and every one of these companies is trying to seek rent from enabling successful and efficient Money-throwing At Problems.

Evolution has run the experiment for a couple of hundred million years and has quite a definitive conclusion: sound is so superior as a warning system compared to sight that sensing vibrations is more ubiquitous than sensing light, and we don't get to switch sensing vibrations off the way we can close our eyes. Yet all the news coming from machine-learning pattern matching doesn't just make the community look like a collection of Family Guy cut scenes--Muffin or Chihuahua? Is my face gay? Behold my latest art nightmare--but it is also all visual. We're ignoring the best warning systems.

A decade ago I was exploring a service using mobile devices to keep children safe: upon encountering danger the mobile phone would go into recording and broadcast mode, notifying parents of location and danger. In order to switch to that mode the user would have to enter a specific key chord, but the necessity for that action was based on where technology was ten years ago. These days it shouldn't require touching the device at all, maybe just a keyword. But why even a keyword? A sudden spike in sound like a car crash or a raised voice of any kind or a gunshot, a sudden increase in heartbeat as recorded by the smart watch, sudden acceleration outside of habits like running or falling, all that should immediately trigger recording through all sensors, and broadcasting to trusted contacts or emergency services if shit gets serious (prolonged screaming or crying, a keyword uttered by the user, maybe even complete silence after an event?) Something that audio-recognisers can be trained on. 
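
The "sudden spike in sound" trigger is the easiest part of this to sketch. What follows is a toy version under my own assumptions: audio arrives as frames of samples between -1.0 and 1.0, loudness is measured as RMS, and a frame counts as a spike when it is several times louder than the recent average. The class name, the ratio of 4 and the history length are all made up for illustration; a real trigger would be a trained recogniser, not a threshold.

```python
import math
from collections import deque


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


class SpikeDetector:
    """Flags frames that are much louder than the recent background level."""

    def __init__(self, history_frames=50, ratio=4.0, floor=1e-4):
        self.history = deque(maxlen=history_frames)  # recent quiet levels
        self.ratio = ratio    # how many times louder than background counts as a spike
        self.floor = floor    # keeps total silence from making everything a spike

    def update(self, frame):
        level = rms(frame)
        if self.history:
            baseline = max(sum(self.history) / len(self.history), self.floor)
            spiked = level > self.ratio * baseline
        else:
            spiked = False  # nothing to compare against yet
        # Only non-spike frames feed the baseline, so one bang does not hide the next.
        if not spiked:
            self.history.append(level)
        return spiked


detector = SpikeDetector(history_frames=5)
quiet = [0.01] * 160
loud = [0.5] * 160
background = [detector.update(quiet) for _ in range(5)]
bang = detector.update(loud)
```

Heartbeat and acceleration could go through the same shape of check, each with its own baseline, before anything gets broadcast.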

It's not just the gunshot or the crash that is relevant; the moments leading up to them matter just as much. So shouldn't our mobile devices--battery allowing--be recording anyway, so that afterwards there is a record they keep safe and, if necessary, upload and notify about? When something bad happens, when a car has hit you on your bike, pulling the phone out to record is hard, if it is possible at all. It should be recording everything already. Which is basically what helmet cams are about, but they are missing the safekeeping aspect of uploading and are not always with you.

Our batteries and networks can't sustain these perma-vigilance models yet, not to mention the video will be mostly of the inside of pockets, but often audio is enough: I can't count the number of times I wish I could have just tapped my phone inside my pocket in some specific way, maybe hard with four fingers, or just said something like "Phone, keep!" to preserve the hysterical dialogue that had just happened between me and my friends. I also don't always have time to pull out a phone and Shazam; I'd rather tap hard or speak to keep the audio moment and ask the phone to identify the audio later, and not just for music. Google has noticed this need and released Now Playing on its Pixel 2 phones: the phone now shows on the home screen what music it hears, all the time.
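
The "Phone, keep!" idea does not need the phone to store everything forever. A rolling buffer that only ever holds the last few seconds, and hands them over when asked, would do. Here is a minimal sketch of that, assuming audio arrives in fixed-size frames; the class and method names are hypothetical, and the integers stand in for real audio frames.

```python
from collections import deque


class PreRollBuffer:
    """Holds only the most recent audio frames; older ones fall off the end."""

    def __init__(self, seconds, frames_per_second):
        # deque with maxlen silently discards the oldest frame when full
        self._frames = deque(maxlen=seconds * frames_per_second)

    def push(self, frame):
        """Called for every incoming frame while the phone is listening."""
        self._frames.append(frame)

    def keep(self):
        """Called on the tap or the keyword: snapshot what is in the buffer."""
        return list(self._frames)


# Hypothetical example: keep the last 2 seconds at 5 frames per second.
buffer = PreRollBuffer(seconds=2, frames_per_second=5)
for frame_number in range(25):
    buffer.push(frame_number)
kept = buffer.keep()
```

Until keep() is called nothing is saved, which is roughly the difference between listening and recording.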

Being recorded is not by itself a negative. It only gets negative if you lose control of the recordings and they get used against you. Yet recordings can also keep you safe or exonerate you. The only time I did jury duty, the CCTV from the cameras pointed at the street contradicted the testimony of multiple police officers who swore they were telling the truth. It is also in the public record that the suspects were, after deliberations, found guilty by the jury of lesser charges than were supported by the police testimony alone. It proved to me that if you make mistakes, sometimes being recorded accurately is better than not being recorded at all and having to rely on human testimony. Being recorded is no guarantee of redress, as we see in the US where video of police officers killing or brutalising People of Color does not lead to convictions, but it is better than only having falsified police reports painting the worst picture of the victim. The existence of these videos creates a better chance of progress than not having been recorded at all.

We've always lived in a 'he said, she said' world (in any combination of genders), with certain he's and she's always having more power and being more believed than other she's and he's, until enough powerless voices show up with #metoo. Continuous recording should really be able to bring some equalisation to this state of affairs. If I am giving care and feeding to a recording device that is permanently on my body, I really want in return for it to actually have an answer to the up-till-now mostly rhetorical question "They did not really say that, did they?" without me having to ruin the moment by having to actually pull things out of pockets and find the right app and enter the right mode.

A huge hurdle to this is that, in many locations of the US at least, this would run afoul of wiretap laws. These laws, in short, come down to this: you can't really record people without their permission, often not even in public. Now Playing made pretty sure to stay on the right side of the law by loading 10000 song fingerprints on your phone instead of sending what it hears to Google for identification, so this is pretty serious. I personally expect that the legal boundaries of 'expectation of privacy' are going to change and we are just going to get used to being recorded in public, but we are definitely not there yet for this. Maybe our phones for now will get to listen, but not record.

The biggest hurdle here is not going to be technological, but legal and social. Which makes total sense, as there are some real privacy issues here. Reality is that we are giving up on that privacy anyway for utility, as Siri and Alexa know, and I can see more unexplored utility I would like. Because I really wanted to know if the washing machine in my Airbnb should ever have made that sound.