A witty and mind-expanding exploration of data, with mathematician Dr Hannah Fry. This high-tech romp reveals what data
Category: Short film
Transcript
00:00 [MUSIC PLAYING]
00:03 What it means to receive a signal
00:10 is to be less uncertain than you were before.
00:13 And so another way to think of measuring or quantifying
00:16 a signal is in that change in uncertainty.
00:21 Using Shannon's mathematics to quantify signals
00:24 is common in the world of complexity science.
00:27 But it's rather less familiar to historians.
00:31 I love maths.
00:32 I love its precision.
00:33 I love its beauty.
00:34 I absolutely love its certainty.
00:46 And that Simon can bring that mathematical world
00:52 view, that mathematical certainty, to what I work with.
00:57 The reason behind this remarkable marriage
00:59 between history and science is the analysis
01:03 of the largest single body of digital text
01:06 ever collated about ordinary people.
01:10 It's the Proceedings of London's Old Bailey,
01:13 the central criminal court of England and Wales,
01:15 which hosted close to 200,000 trials between 1674 and 1913.
01:23 There are 127 million words of everyday speech
01:28 in the mouths of orphans and women and servants
01:32 and ne'er-do-wells, of criminals, certainly,
01:36 but also people from every rank and station in society.
01:40 And that made them unique.
01:43 What's exciting about the Old Bailey
01:45 and the size of the data set, the length,
01:47 the magnitude of it, is that not only can we detect a signal,
01:51 but we're able to look at that signal's emergence over time.
01:57 Shannon's mathematics can be used
01:59 to capture the amount of information
02:02 in every single word.
02:04 And like the alphabet, the less you expect a word,
02:07 the more bits of information it carries.
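The idea that a less expected word carries more bits can be sketched with Shannon's surprisal, -log2(p), where p is how often the word occurs. The word counts below are invented for illustration and are not taken from the Old Bailey data; `surprisal_bits` is a hypothetical helper name.

```python
import math
from collections import Counter


def surprisal_bits(word, counts):
    """Return -log2(p) for a word, where p is its relative frequency."""
    total = sum(counts.values())
    p = counts[word] / total
    return -math.log2(p)


# Toy corpus (made-up counts, chosen so the probabilities are powers of two).
counts = Counter({"the": 8, "and": 4, "purse": 2, "coin": 1, "struck": 1})

for word in ["the", "and", "purse", "coin"]:
    print(f"{word}: {surprisal_bits(word, counts):.1f} bits")
```

In this toy corpus "the" makes up half of all words and carries 1 bit, while "coin" appears once in sixteen words and carries 4 bits. This measures how surprising a word is in general, which is a different question from what it reveals about a particular crime.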
02:11 So, imagine that you walk into a courtroom at the time
02:14 and you hear a single word.
02:16 The question we ask is, how much information
02:19 does that word carry about the nature of the crime being tried?
02:25 You hear the word "the."
02:28 It's common across all trials,
02:30 and so it gives you no bits of information.
02:34 Most words you hear are poor signals of what's going on.
02:39 But then you hear "purse."
02:42 It conveys real information.
02:46 Then comes "coin," "grab," and "struck."
02:51 The rarer a word, the more bits of information it carries,
02:56 the stronger the signal becomes.
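One way to put a number on how much a single word tells you about the crime being tried is to compare your belief about the crime after hearing the word with your belief before it. The sketch below uses the Kullback-Leibler divergence, in bits, between those two beliefs. This is an assumption made for illustration, not necessarily the measure the researchers used, and the two crime categories, the word counts, and the name `bits_about_crime` are all invented.

```python
import math


def bits_about_crime(word, word_counts, prior):
    """KL divergence, in bits, between P(crime | word) and the prior P(crime)."""
    # Weight of each crime after hearing the word: prior * P(word | crime).
    joint = {}
    for crime, counts in word_counts.items():
        total = sum(counts.values())
        joint[crime] = prior[crime] * counts.get(word, 0) / total

    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError(f"word {word!r} never occurs in the toy data")

    bits = 0.0
    for crime, weight in joint.items():
        posterior = weight / evidence
        if posterior > 0:
            bits += posterior * math.log2(posterior / prior[crime])
    return bits


# Made-up counts of words heard in two kinds of trial.
word_counts = {
    "theft": {"the": 50, "purse": 40, "struck": 10},
    "assault": {"the": 50, "purse": 10, "struck": 40},
}
prior = {"theft": 0.5, "assault": 0.5}  # assumed equally likely beforehand

for word in ["the", "purse", "struck"]:
    print(f"{word}: {bits_about_crime(word, word_counts, prior):.3f} bits")
```

With these toy numbers "the" is equally common in both kinds of trial, so hearing it leaves your belief unchanged and it scores zero bits. "Purse" shifts the odds toward theft and scores roughly 0.28 bits, which equals the drop in uncertainty from an even split to an 80/20 split.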
03:00 One of the clearest signals that we see in the Old Bailey,
03:03 one of the clearest processes that comes out,
03:04 is something that is known as the civilizing process.
03:08 It's an increasing sensitivity to and attention to
03:13 the distinction between violent and nonviolent crime.
03:20 If, for example, somebody hit you and stole your handkerchief,
03:25 in an 18th-century context, in 1780,
03:28 you would concentrate on the handkerchief,
03:30 more worried about a few pence worth of dirty linen
03:35 than the fact that somebody just broke your nose or cracked a rib.
03:38 The fact that 100 years later, by 1880,
03:43 every concern, every focus,
03:45 both in terms of the words used in court,
03:48 but also in terms of what people were brought to court for,
03:51 focused on that broken nose and that cracked rib,
03:54 speaks to a fundamental change in how we think about the world
03:59 and how we think about how social relations work.
04:03 Look at the strongest word signals for violent crime across the period.
04:09 In the 18th century, the age of the highwaymen,
04:12 words relating to property theft dominate.
04:16 But by the 20th century, it's physical violence itself
04:20 and the impact on the victim that carry the most weight.
04:27 That notion that one can trace change over time
04:30 by looking at language and how it's used,
04:32 who deploys it, in what context,
04:34 that I think gives this kind of work its real power.
04:38 There are billions of words.
04:39 There's all of Google Books.
04:41 There is every printed newspaper.
04:43 There is every speech made in Parliament,
04:46 every sermon given at most churches.
04:49 All of it is suddenly data and capable of being analyzed.