Shelf Love

What Books Do, Empirically Speaking with Dr. Andrew Piper


Short Description

Data scientist Dr. Andrew Piper joins Shelf Love to share how data science can help the romance community answer the big questions that close reading can't. Andrew is the director of McGill University's .txtlab, a laboratory that uses machine learning to ask questions like: why do people enjoy the works they love? And once we can empirically quantify what's going on, he asks us to think about what we'd like to do about it.


Tags

business of books, scholarly


Show Notes

Data scientist Dr. Andrew Piper joins Shelf Love to share how data science can help the romance community answer the big questions that close reading can't. Andrew is the director of McGill University's .txtlab, a laboratory that uses machine learning to ask questions like: why do people enjoy the works they love? And once we can empirically quantify what's going on, he asks us to think about what we'd like to do about it.

 

Guest: Dr. Andrew Piper

Website | Twitter | Enumerations: Data and Literary Study

Andrew Piper is Professor and William Dawson Scholar at McGill University. He is the director of .txtlab, a laboratory that uses machine learning and data science to understand literature and culture.

Shelf Love:

Join the Conversation on Discord: https://www.patreon.com/ShelfLove

Tweets discussed in this episode:

@katrinaJax: “is it me or are there so many more white romances this year and being announced? like... a lot...”

@momonoki8: Who is critiquing the use of blonde, pink lips, thinness, small waists and blushing cheeks etc in contemporary white-led romance novels? @ShelfLovePod


Transcript

Andrew Piper: [00:00:00] What's happening when prestigious authors give other prestigious authors prizes? They're trying to circumvent the economic system. They're saying: our stuff isn't being bought, but it's still important. And romance is an interesting case because it's stuff that is very widely read, very popular. And for reasons that, as you were saying, will feel very frustrating to the community (it's like, why does nobody like us?), the system of consecration is essentially designed to run away from popularity.

If you're popular, you're going to lose the consecration game. It's a resentment system that says: well, a lot of people bought it. It must therefore be bad.

Andrea Martucci: Hello, and welcome to Shelf Love, a podcast and community that explores romantic love stories in fiction across media, time, and cultures. Shelf Love is for the curious and open-minded who joyfully question as they consume pop culture. I'm your host, Andrea Martucci, and on this episode, academic and data scientist Dr. Andrew Piper is joining me to discuss fiction's functions and what machines can learn from the books people love.

Normally this is where I would ask you to introduce yourself and give some context for who you are today, but I'd like to impose a narrative discourse on this conversation. So I'm actually going to ask you instead to introduce us to who Dr. Andrew Piper was prior to the turning point that led you to explore data science.

Andrew Piper: Good question. And in the spirit of your question, I think I'd take it way back. If we're going to do narrative, I'd go back almost to the beginning. So, one of the things you should know about me is that I was not a great reader as a kid. There's a kind of famous family photograph of me as a little kid, sitting sulkily on a couch with a book held high over my head, where I was clearly being punished and being forced to read.

Stories, an interest in literature: that was not something that came naturally to me when I was growing up. Besides being active and sporty, I was much more of a math and science kid. And then at a certain point when I got to university, I started to explore lots of different kinds of courses and different options.

And I ended up in courses where I started to understand that the point of literature wasn't exclusively to figure out what it meant. The thing that I'd always struggled with was these kinds of imposed meanings, that there was something important about this great book that I was supposed to know. And it was never at all clear to me how I was supposed to know that in advance.

But I started to be taught in ways that let me see literature as more of a set of patterns, an art form in which artists and authors are doing things to draw our attention to stories in particular ways.

And what I was learning was that we as readers could really pick up on those patterns and start to appreciate them. And that made a lot of sense to me and my pattern-recognition brain, as someone who loves to be outdoors. What I find so enjoyable about being outdoors is feeling all of those patterns around you, the colors and the sounds.

And so that's just to give you some sense that my entry into literature, even before data science, was complicated and not straightforward. And I think that will explain some of my embrace of data science when we get further into how this happened. But the very first thing to [00:03:00] know is that it took me a while to get into literature.

And then when I did, what really interested me about it was the relationship between technology and literature; I was a historian of the book in particular. And so I was drawn to this very long history of the way writing and the technologies of writing impacted how we express ourselves.

So computers just seemed like a natural next step. But in many ways it was also a nice way of closing the circle. I could take that early math and science brain of mine that I had cultivated as a kid and join it up with this later adult interest in reading and literature.

And for me, it was a kind of wonderful opportunity to join the two back together.

Andrea Martucci: That actually makes a lot of sense. And I love that you're talking about pattern recognition here. So you find yourself doing your undergrad work, graduate work in, remind me of the field of study, something about romanticism and perhaps Germanic...

Andrew Piper: Yeah. Yeah. So I was technically trained as a Germanist as well as a comparatist. I'd always worked in a kind of comparative European literary tradition, and my particular historical specialty was the 18th and 19th centuries. In particular, this kind of pivotal, transitional moment that we call Romanticism.

And the thing that drew me to Romanticism was not actually the traditional way that people talk about that period, as a sort of revolt against the Enlightenment. It was far more that, to me, it was a real transformative moment in the history of reading and the history of writing technology. This is when you move from hand presses to steam presses, when you start to get penny newspapers and lots of people reading, and you get a lot of new kinds of readers, in particular young women reading voraciously, books that people don't actually want them reading. And so you get a lot of social changes around that time.

And that became really interesting to me. So that was sort of my lens on the history of reading, focused on that kind of pivotal moment.

Andrea Martucci: And so you focused a lot on the technology and how it was impacting accessibility for new readers, which then started to impact the work that was being published. But in the field that you studied in, I got the sense from some of your writing that you noticed the field approached literature in a way that was either different from how you thought about it, or that you started to notice some weaknesses with that approach.

You spoke on a podcast about how you noticed that charisma seemed to be rewarded over empirical certainty and that there were these really generalized aggregated summaries in literary studies. And you noted that there were both ideological and methodological barriers that seemed to be creating that particular way of analysis in literary studies.

Andrew Piper: Yeah, I think the other part of the story, you know, if we step back to my narrative of transformation: how do I go from being someone who studied the history of books and the history of literature in very traditional ways? I would spend time in archives. I would look at lots and lots of books and gradually build up insights and conclusions about what I thought was happening during that period.

And as the expression goes, not that there's [00:06:00] anything wrong with that. But one of the reasons I later became excited about data science was not actually what drew me to it initially. When I first started working with computers and machine reading, what I was really excited about was the fact that they were reading in the way that I read, or so I thought. To me, computers were never a strange way of reading.

They really were about detecting these patterns and trying to draw attention to what authors were doing in ways that we might not notice, and might not be able to support with our own reading. You know, when you pick up a book, you don't see all the words, right? You focus on certain ones versus others. And machines could help bring into sharper relief what was going on on the page.

And that seemed very exciting to me. But over time, I began to see that one of the values of data science for a lot of fields is that it gives you this much stronger empirical foundation to make judgments and assessments about the world, whether it's about whether to take a vaccine or whatever it is.

Data science can be a very useful tool for making judgments about the world. And I had come to this realization, and I think it's something a lot of people in academia struggle with, which is these problems of prestige and charisma that seem to drive a lot of the conversation. And I'm sure that's true in so many industries, in so many people's lives.

And it's obviously incredibly frustrating, you know, when you're a new entrant to the field, or if you're coming from a place that doesn't have as much cultural capital as someone else. How do you get attention? How do your ideas enter into the conversation? And now we can look back and say, empirically, there are enormous amounts of bias in the academic system towards institutions that are more prestigious and people from those more prestigious institutions.

But that was just an intuition I'd always had in the field. And it always made me feel, ew. It was not a happy space. And what data science lets you do is say: look, the thing I'm pointing out needs to be true for everybody, and not just for me because I said it.

The real pivot is this: the history of criticism is people who are eloquent, who have interesting ideas, saying, this is what really matters, and you should believe me, you should trust me, and you should follow me. What data science says is: this is what really matters because that's what's actually happening out in the world, and we can observe that empirically, and we can use these interesting new tools and models to study that.

And you don't have to take my word for it. You too can do the same thing. And if our findings and results don't match, well, now we've got a problem, right? Now we've got something to discuss and debate: what went wrong there? Rather than it being my opinion versus your opinion, it's really about building up a consensus about what the evidence says about how people have written, today, in the past, over the millennia of human writing and expression. So it's a really powerful way of getting us to ground our observations in data. And for me, as an academic who's always been a bit frustrated by the charisma system in my field, it was very welcome in that regard.

Andrea Martucci: I think that what you were saying there about being able to ground your arguments or your points with data, it sounds like what you're enabling is a more productive conversation. You may disagree with somebody's [00:09:00] methods, but you can look at each other's data, you can look at each other's methodology, and you can have a conversation about those things, instead of just saying, you're wrong because you can read this passage and it says this. That just starts turning into something unproductive or pedantic, where it's very hard to see eye to eye.

Whereas at the very least, if you have a methodology and you want to have a discussion about, well, I think you're biasing the model this way because of the way you do this, at least it's grounded: see, when you do it this way, you get this result; when you do it that way, you get this other result; see how that makes an impact. It's moving out of pure opinion.

Andrew Piper: Yeah, that's really its strength. It really tries to pull us into a world of consensus. What do we believe collectively is happening in the world? What can the data show us, and what can't it show us? We spend a lot more time in data science thinking about limitations.

If you read traditional humanist work, they make awfully large claims and never hesitate. There is no doubt, when you read famous critics, that what they believe, what they are saying, is absolutely true. Whereas the culture of data science is to really start from: here's what we know, as far as we can tell, with these conditions and these caveats. Let's work on those, let's reduce the amount of error or uncertainty, but let's also accept that there is still a great deal of uncertainty or error in our methods, and that we can improve them.

So I really like that it puts the focus on collaboration and consensus rather than: my argument is better than your argument, or my voice is louder than your voice.

Andrea Martucci: And just for a little terminology 101, can you define what you mean by empirical?

Andrew Piper: Sure.

Empirical just means an evidence-based form of presenting information, where you're measuring something out in the world and trying to use quantitative, data-driven methods to make judgments and inferences about how things are happening in the world.

So when we talk about empirical methods, we're using experimental methods, we're using data and different kinds of setups, so that we can be confident about what we see happening. The example I always trot out in class, given that we're in the middle of the pandemic, is: how do we have confidence that vaccines help reduce the risk of severe illness or death?

The way we do that is we get two groups together, we give one group the vaccine and we don't give the other group the vaccine, and then we see how they do down the line. And that gives us confidence, because we can observe what's happening given a sample of people, right? So the way you think about empirical methods is you start with a sample of what we call a population, and you say: okay, given this group, are they doing something?

Is something having an effect on this group that I'm trying to study? We can get more concrete later, but the basics of empiricism are that you're trying to use observations and data to arrive at judgments about the world.
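To make that concrete, here is a minimal sketch in Python of the two-group comparison Andrew describes. The counts are invented for illustration, and Fisher's exact test is just one reasonable way to ask whether the difference between the groups is bigger than chance alone would produce.

```python
# Hedged sketch: hypothetical counts, not real trial data.
from scipy.stats import fisher_exact

# Each row is [severe illness, no severe illness] for 1,000 people.
treated = [12, 988]     # got the vaccine
untreated = [97, 903]   # did not

# Fisher's exact test asks: how surprising is this split if the
# treatment actually had no effect on outcomes?
odds_ratio, p_value = fisher_exact([treated, untreated])
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")
```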

Andrea Martucci: One of the methodological limitations of literary studies is that if you are talking about a corpus, a body of work, it's very hard for one person or even a group of people to closely examine a huge [00:12:00] volume of books, right?

You're talking about a book that's 80,000 to 120,000 words long, times a sample size significant enough that you can draw meaningful conclusions doing deep analysis. And thinking about my own experience reading romance novels, it's really clear how much the sampling bias of the work that I've encountered impacts what I believe romance is and what the boundaries of the genre are; what patterns I see are heavily influenced by what subgenres I read, which authors I read, et cetera.

And so I think that this isn't necessarily a failing of individuals. It's just that without the technology to do large-scale analysis of texts, you are necessarily limited as a researcher to the work that you encounter or the work that you select. And I've seen lots of debates about the best way to select the body of work that you're going to work with.

And honestly, there's just always so much inherent bias that goes into that. What are you privileging and what are you leaving out?

Andrew Piper: Yeah. It's a massive problem, in some sense, when we get down to it. For starters, yes, this is one of the fundamental limitations of traditional methods in the humanities, and literary studies in particular.

If you're a critic and you want to say something about what's going on in romance today, you will read as many as you can. And to be fair to the readers out there, I'm one of them: we read a lot, right? And we say, okay, I've read a lot and I'm an expert in this domain, and so I have a pretty good sense of what's going on. But as you just pointed out, the risk is that the moment I start to say, this is what's going on generally with romance, everything I know is from the bits I've read, and the bits I focused on of what I've read. As you pointed out, if a book is a hundred thousand words long, it's very hard to give your attention to all hundred thousand words, right? It's very hard to see everything in this highly multidimensional, highly complex object.

So your biases are kicking in too: what are you noticing? What are you focusing on? And so it's just bias on top of bias on top of bias. And ultimately what comes out is something that you feel like you want to say, which may or may not be a good representation of what's going on out in the world. And there's really no way to know. That's the problem. You could be right, you could be accurate, you could be completely biased, and this is just the thing you came up with this morning, so why should we listen to you? What data science does is it basically says: okay, let's make everything as transparent as possible.

Let me tell you exactly which books I'm going to test, how we collected them, where they came from, what we think they may or may not represent, what biases may be included in what we call the sample, right? Sample bias is a classic topic that people love to talk about in data science. And then let me study something of interest.

Someone may have made an argument about something, and so we'll go ahead and look more closely at it and see if we can detect it reliably across our whole sample. Is that something that's really going on, or is it just located in a few books? Is it located in a few specific books, the ones that you happened to notice, but when you go to the other ones, not so much?

A really nice example I [00:15:00] came across recently, and this stuff happens all the time in the press: if you read an article about what's going on in fiction today, in the New Yorker or something like that, these highbrow outlets, you'll find someone saying, oh, everything's about trauma narrative.

I suspect it's kind of a relevant topic for romance in different ways. But this critic was really upset that everything they seemed to be coming across was invested in a kind of trauma narrative, that that was the driving narrative of the day. And I thought to myself: maybe. Sure.

When I read Twitter, when I read stuff, it seems like the word trauma comes up a lot, for sure. And there are certainly some shows I can think of that foreground this, and certainly some books. But would it be the case, if I just randomly went around to all the big publishing houses' lists and grabbed books off those lists, that a significant portion, a significant minority of them, would be classified as having something to do with trauma, as a trauma narrative? And the answer there is probably no. I think that what this person was seeing was probably something very specific to their Netflix feed and their reading lists. Which comes back to: where are you getting what you're getting, right?

Your recommendation environment is really going to impact that sample. And so this is a classic case: sure, in that person's life, he felt like he was seeing a lot of it, but that had everything to do with what was being fed to him, or what he was picking up, and it's not necessarily a good description of the larger field of fiction, which is obviously massive.

And so that's just a generalization that has really no, what we call, empirical basis, because if we go out and fairly sample the world, we're probably not going to see what that person is seeing.
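As a sketch of what "fairly sampling the world" might look like in code: draw a random sample from some catalog of books, run a detector over each one, and report the estimated prevalence along with its uncertainty. The catalog, sample size, and trauma-theme detector below are hypothetical stand-ins, not the lab's actual pipeline.

```python
import random
from statsmodels.stats.proportion import proportion_confint

# Hypothetical population: 50,000 books published this year.
catalog = [f"book_{i}" for i in range(50_000)]
sample = random.sample(catalog, 500)  # a fair random sample

def has_trauma_theme(book_id):
    # Placeholder: in practice this would be a trained classifier or
    # keyword detector run over the book's full text.
    return random.random() < 0.1

hits = sum(has_trauma_theme(book) for book in sample)
low, high = proportion_confint(hits, len(sample), alpha=0.05)
print(f"estimated prevalence: {hits / len(sample):.1%}"
      f" (95% CI {low:.1%} to {high:.1%})")
```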

Andrea Martucci: And even if you could measure how many were out this year, you'd then have to also measure: is that more than in previous years? The same? Less? Because to actually make a point about "why is everything suddenly blank," you have to have a comparison point.

Andrew Piper: Yeah, it's a classic "this is really going on," and that's a totally underspecified claim. "Really going on" meaning what? There are more this year than there were last year? Literally everything is a trauma narrative? A majority? A significant minority, right?

And this is a classic case of critics just wanting to say something's happening, so then they can tell you why it's important, without having to do the hard work of measuring it and really assessing whether it's going on out in the world.

Andrea Martucci: So bring us to the point at which you are actively starting to investigate how you can use machine learning and data analysis to ask the questions that you're interested in asking.

Andrew Piper: Yeah, it's been a long road. It's been a very steep learning curve for me: how do you translate the things you're interested in when you read into computational models? That's been really hard. In many ways, a lot harder than I thought it was going to be.

Wonderfully, there's an enormous adjacent field out there called natural language processing, which people may or may not have familiarity with. That's the world of computer science that is trying to build out algorithms and software and systems that can understand language in ways that can be helpful for us analytically.

Some people may have seen examples of the [00:18:00] new text generation tools, where you can get a bot to write a news article, for example. They're getting increasingly convincing, and it's an interesting question: how close are they? What are they good at? What are they not good at in terms of generating texts?

That's a sign of the ability of these models to understand human behavior when it comes to language use. So what that indicates to us is that our models keep getting better and better at picking up on how we make sense with language.

And so that's a useful way of seeing: oh, then we should be able to use those models to make inferences and judgments about texts that we care about, that we find interesting. It's been a really steep curve figuring out what we can measure and what we can't.

And I think over time, the things we're able to find out will get increasingly sophisticated. The stuff my lab is really interested in now is measuring narrative itself: trying to figure out whether there are patterns, predictable ways in which people tell stories when they set out to do so. How do they reveal information?

How do they handle things like surprise and suspense? Even as a reader, you can feel that a book is being more narrative versus descriptive, investing more in telling you events that are happening. We're trying to get to the root of that: what does that mean for us as human beings?

Like, why do we tell stories? And when we tell them, why do we tell them in this particular way, right? What are the affordances of setting something in the past, at a distance from ourselves, of describing concrete worlds, of focusing on people's agency? These are some of the bigger, more universal questions we're going after in the lab right now: what AI can tell us about why people tell stories.

And I think that's coming out of more specific work that I presented a couple of years back, looking at specific kinds of behavior in different genres and at questions about bias within the publishing industry. That's where a lot of our work in the lab started: we were very interested in who gets to tell a story, and what kinds of stories get told about what kinds of people. And for us, that was an important place to start, because, as most people intuit, there are a lot of biases built into the industry.

And so we started off on that road to really figure out how different groups get represented, whether in movies or in books. And we found a lot of what we would expect to find, which is that there's a tremendous amount of bias, a tremendous amount of exclusion. And to the industry's credit, there's obviously a lot of discussion and effort to try and change that. That work is ongoing, and hopefully data science can be a useful tool there: to help editors make good choices, to help publishers think about their lists and think about diversity, and to help them see things that they can't see when they're just counting things by themselves.

Andrea Martucci: And I actually had the pleasure of seeing you and Eve Kraicer speak back at TechForum in Toronto. I wrote this in the email to you: was this in 2018 or thereabouts? And the presentation you gave was essentially about how frequently in texts multiple women were speaking to each other in dialogue in a scene, and how infrequently multiple women (two women or more, three [00:21:00] women together) were able to speak to each other directly, compared to the frequency with which women spoke only with men, or men spoke with men.

And I found it incredibly interesting at the time, as somebody who had not yet started a romance novel podcast. Even then, with very little specific interest in this, I was like, oh, that's interesting. That is something that would be incredibly difficult to make a case for without being able to look at such a large volume of texts, a volume you could start to say is representative of what is available, as opposed to such a narrow slice that you're completely unable to draw any helpful or, dare I say, interesting conclusions that have any empirical basis.

Andrew Piper: Yeah, that was a fun project. Eve was a great student of mine. She's gone on to do really cool stuff out in the world with gender and tech issues, so she's leading a great career right now. And we had a fun time working on that project, and we found a few interesting things.

The first was, if you look at the gender balance of main characters (we were using a sample of about 1,200 novels across six or seven different genres, to really cover the field), the balance, if you just use a binary system of male and female characters, was almost exactly 50-50.

And we were like, oh, that's interesting. But when you look at all the rest of the characters, and this is on the order of 25,000 or more characters that we analyzed, the balance flips very quickly to 60-40 in favor of men, as would be expected. And it was a really good indicator of how, when someone's paying attention, editors can keep track of protagonists.

They can keep track of that main character and say, I don't want too many books on my list that are just about men, or just about women, vice versa, and things get balanced. But when you can't keep track of stuff, which is the case with all the characters in books, when something gets into the order of tens of thousands of things, all these unconscious social norms kick right back in.

We conditioned on authors as well. If you just look at women authors, while yes, women do tend to write more about women, they still had an imbalance towards male characters in their books. We were like, that is nuts. And we just thought, why are people doing that? These are unconscious reflexes, and no one is catching this, right? You absolutely need data to see this, but once you see it, you kind of can't un-see it.

And then there's the other result you were talking about, the relationships between characters in books. One of the things we talk about in that paper is this idea of what we call heteronormativity. There's a very intentional, strong bias towards putting male and female characters together in plots, much more so than if you just randomly connected people in a book, where you would get an expected set of random relationships between the characters, a balance between men and women, again using this rough binary. What you see in actual books is this huge skew towards ensuring that men and women spend more time talking to each other than they do to their own gender.
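A minimal sketch of the kind of interaction counting behind this result, assuming an upstream pipeline (BookNLP, for example) has already extracted who addresses whom in dialogue and assigned each character a binary gender label. The exchanges below are invented, and the shuffled baseline is a simple stand-in for the random-connection comparison Andrew describes.

```python
import random
from collections import Counter

# (speaker_gender, addressee_gender) for each dialogue exchange,
# as produced by a hypothetical upstream extraction step.
exchanges = [("F", "M"), ("M", "F"), ("F", "F"), ("M", "M"), ("F", "M")]

def pair_counts(pairs):
    # Collapse direction: (F, M) and (M, F) count as the same pair type.
    return Counter(tuple(sorted(p)) for p in pairs)

observed = pair_counts(exchanges)

# Null model: shuffle who is being addressed. This preserves the overall
# gender distribution but breaks any systematic pairing by gender.
speakers = [p[0] for p in exchanges]
addressees = [p[1] for p in exchanges]
random.shuffle(addressees)
baseline = pair_counts(zip(speakers, addressees))

print("observed:", observed)
print("shuffled baseline:", baseline)
```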

And it is interesting in the context of a romance podcast, because there was a really interesting, [00:24:00] kind of latent romance structure, almost a heterosexual romance structure, to contemporary fiction writ large. We didn't see differences by genre.

It just didn't matter: authors across the board have this disposition to tell stories where it's much more likely than random that men and women are going to be interacting with each other. And that for us was not only interesting, it was something to make us pause and say: if you're an author, if you're an editor, this is really something to think about. There's an interesting predisposition towards straight relationships, towards foregrounding straight relationships.

And that's a limitation, right? That's a problem, if you're thinking about stories representing the diversity of human sexuality.

And so that was something we just wanted to put in front of people and say: look, here's information about what's going on in your books.

Give some thought to talking with your authors about how these plots get set up. And to us this was important: it's not just about throwing in, "we need some more women characters," or "we need some gay characters," or something like that.

It's really about the larger structure of relations you're building into your novel. You can't tokenize that, right? You can't just throw a character at it and fix it. This huge bias can only be undone when you fully commit to investing in different kinds of stories.

And this is one of the mantras we have in the lab: yeah, we're just measuring this very simplistic thing, the number of times men and women interact in a book, and you think, what is that telling us? But there's a bigger picture there that it's picking up on, the very nature of storytelling going on in contemporary fiction, which that simpler measure can give us an indicator of. And then we can go to people and say: look, we've got data. We think that you really should be an advocate for changing how you think about stories, on your lists, in your publishing house, or, if you're an author, in the kind of books you're going to write.

Andrea Martucci: You have this lab at McGill called the .txtlab, and you've spoken a little bit about the larger field of natural language processing. And I know that in industry there are companies that are using this to make chatbots, so there are commercialized ways of using this technology.

I worked at a company that was using, I believe, supervised machine learning to try to abstract texts. So there are reasons that companies specifically are pushing the industry forward. But when it comes to using these methodologies, or this field, to explore literary texts or written fiction, how far evolved is this field?

It sounds like it's fairly new, not just because the technology is fairly new, but because there are very few places and academics really working exclusively on this. Can you give the context and place the .txtlab within that context?

Andrew Piper: Yeah.

I think you're right in the way you're describing the field, in the sense that it's a very small world. The intersection of data science and literature is still filled with weirdos like me, whose brains really like these two ways of thinking about the world, numbers and letters, and smushing them together. And people with [00:27:00] strange biographies. I think that's going to change over time.

What I see with my students is, I'm teaching a class right now, an introduction to literary data science, and it's students who are all coming from the humanities, who have very little programming experience, and they're throwing themselves into it.

They're really trying to learn, and they're making progress. They're beginning to think like young data scientists, and yet they're avid readers. One generation at a time, as we get this more normalized, there are more and more courses at the university you can take, and it's not just this niche thing over in the corner. I think it will become more normalized.

I think the reason it's not as widespread, or hasn't been as commercialized as other applications of AI and NLP, comes down to two things. One is that it requires this diverse double skill set: you have to be really interested in both computer science and culture. And those are actually not as far apart as people think, right? Twenty-year-olds are generally really interested in movies and books. One of the reasons we're growing popular is that there are students out there who are tired of doing computer science just for machine learning's sake. They're like: yeah, I'm an avid consumer of movies, or an avid consumer of music, and I want to know about my cultural world, and I want to study it in ways that aren't just anecdotal and appreciative, and don't have that kind of great-books aspect to it.

Part of the frustration I had as a student learning to read literature, so to speak, was that it was so foreign to me. I just didn't have the family background to teach me, intrinsically, what I was supposed to be looking for. It was so alienating and so hard to get access to. And there's a lot of shame that goes with that, when people are faced with books that are supposed to be great and they just don't understand why.

And so it pushes a lot of people away from these fields, I think, even though they're avid consumers of this stuff otherwise. The people who shy away from the great-books or deep-reading approach to literature, who are really turned off by it, are the very same people who love to listen to music, love to watch TV, and actually like to read fiction, things like romances, and really enjoy that stuff.

And so there's a disconnect between the way we've taught reading and literature and what I think people like to do and what they value about it. And hopefully data science can bring us back to that, because it tries to figure out what books do, as opposed to what you're supposed to do with books. And for me, that's a really powerful distinction.

So this is all by way of saying that our lab is really about trying to bring people who have that double interest together, and to move away from this culture of "what are you supposed to say" towards: how can we better understand this culture that I consume and love and am passionate about? Why am I so passionate about it? Why are all these people reading this and really fascinated by it?

And it's taking time, because it requires skill sets that there's no pre-existing feeder for. There are tracks in secondary school if you're a great reader and you love to read; there are tracks if you're great at math and science.

There are not a lot of tracks for people who do both really well, or who want to try both. So we have to start over when they get to university and come into my classes, and we kickstart that process. But it's going to take a while, I think. [00:30:00]

But the nice part is there are going to be industry applications for this. It's important to emphasize to folks that, yes, the dream of chatbots that can do customer service better for companies: there are massive amounts of money to be made there, and everybody goes there first, sure. But it's very clear in conversations that I've had with places like Netflix, with places like Apple, with places like Wattpad, that they are looking for people who are tech savvy but also culturally savvy. They're looking for the readers who can code. They don't just want another data scientist or computer scientist or engineer. They really want someone who understands culture, who understands reading, but who has this fluency with data and AI, because you can't function in the modern workplace in the culture industry today without that knowledge. And so we see ourselves as really positioning students to go into those industries. And I'm hearing that's the kind of student and worker profile they really love. What they're looking for is not just another cookie-cutter person who knows how to run a machine learning algorithm, but someone who can really think deeply about culture and run a machine learning algorithm.

Andrea Martucci: Yeah, I think it's really hard to answer these questions productively if you have people coming from diametrically opposed ways of speaking about these things. There literally is a language for speaking about, how did you put it before, pattern recognition, and thinking about reading in that way, versus thinking about reading as this extremely qualitative experience.

And I can totally see what you're talking about, where, as somebody who understands culture, you have to have at the very least a foundation in that language for understanding the data side. Because otherwise it's not even like, hey, I'm the culture expert and I'm going to come in and talk to the data people. They cannot talk to each other; they're not going to get anywhere.

Andrew Piper: Yeah, that bridge is really hard to build, and it takes a lot of work. My experience has been, I've been collaborating for several years now with engineers, with data scientists, really adopting and adapting to their way of thinking. And I think it's gradually going to start going the other direction as well. What we've seen over the years is that a lot of the failures of data science and AI have come precisely from being too far divorced from human and cultural questions.

And they're realizing that this has led to failed systems. They're beginning to pick up on the fact that if you don't know about human behavior, and don't understand how cultures work, you're going to build systems that interface with those cultures in poor ways, and get outcomes that you're not proud of.

So, yeah, everybody's figuring out that we need to join these two things together. But it does take work. It takes a lot of work to reformulate how we look at problems, to give us a shared vocabulary. But it's also the fun part, I have to say. The joy I get is when I see them starting to rethink, when my lit students start to think like data scientists, or a computer scientist really starts to ask an interesting cultural question, and I'm like, okay, it's clicking.

It's happening.

Andrea Martucci: It's happening. To talk about one of the projects that you worked on in the .txtlab: you've said as much in other words, but [00:33:00] essentially what you're talking about here is, how can you use these tools, natural language processing, AI, et cetera, to extract meaning on a large scale from texts?

And I think that we all understand intrinsically, perhaps, that when we consume a piece of literature, we are extracting some sort of meaning. But when you get granular about how you're actually going to do that, you start running into these issues of: in my brain, I can make these associations where this word is synonymous with that word, or in this context it means this, and you may not be aware of how much work your brain is doing to make meaning and make sense of what is going on.

You have your own experience, you have the cultural cues, the cultural context, et cetera, the proximity to other words, the specifics of an individual story. And so that seems to be the real challenge with data analysis: asking the question in the right way to be able to extract meaning with enough nuance to get at what you're really trying to do.

And a practical example of this: I know that AI is being used to analyze sales calls, where a salesperson calls a prospect they're working with, and they have a recording of the conversation and an AI-generated transcript. And what they're trying to do is extract meaning from it: what is the likelihood that this person is actually interested in the product, what is the likelihood to close, all of these questions that have business implications.

A human can perhaps parse, oh, the tone, and they seemed into this, and they talked about implementation, which they wouldn't talk about if they were not actually interested. You know, the human is really making meaning of that conversation. But how do you then get a machine to understand that?

So in the .txtlab, you published the paper How Cultural Capital Works.

And what I noticed as a romance reader is that the romance community talks a lot about how there is a lack of respect for romance, how it is considered either middlebrow or lowbrow, and how it is considered lesser than things that are considered highbrow. And so there's this perceived dichotomy between what is popular and what is revered.

And so you set this study up to examine works that seem to have economic capital versus those that have cultural capital. Could you talk a little bit more about that study, the meaning that you were trying to extract from the works in it, and how you constructed a scenario that would reveal those things?

Andrew Piper: Yeah, it's a good question. And it goes to the heart of how we structure the questions and our research in the lab. The French philosopher who is best known for this is Pierre Bourdieu. This idea of cultural capital is something that he introduced. So, what is cultural capital?

It's a form of value that isn't intrinsically economic but may be related to economic capital. Economic capital is the easy stuff: how much money do you have, how much money do you make. That's where we spend a lot of [00:36:00] our arguments: who's making what, and is that equal across different parts of society?

But what Bourdieu tried to do was say that cultural capital is important too. It's the stuff of consecration. It's that thing that gives you power that isn't necessarily money, and a lot of it is education-based. And so he saw the cultural field as this battle for positions, right?

If a type of writing makes money, then people who feel excluded from that will want to create another system of value that says: oh, this is really important because it's good, or it's hard, or it's special, or whatever. And the simplest way we break this down in the lab is through this idea of prizewinning things, as a form of consecration, versus best-selling things, the things that people buy.

And what we do, and this is where I think AI can be really valuable: we don't start from a place of saying, let's define all of the literary features we could ever imagine that may or may not differ between these worlds. We start from a neutral position and ask: are these worlds different from one another at a stylistic level, in the ways these stories are told?

That's where we get these very strong indications of behavior. That is to say, books that are bestsellers, that sell really well, and books that are given prizes behave very differently. And that, again, isn't too surprising, except it then allows us to zoom in on what those differences are, right?

What's happening when prestigious authors give other prestigious authors prizes? They're trying to circumvent the economic system. They're saying: our stuff isn't being bought, but it's still important, so let's give each other recognition, right? Let's create this para-economy of value. And romance is an interesting case because it's obviously much more on the consumer side. It's stuff that is very widely read, very popular. And for reasons that, as you were saying, will feel very frustrating to the community (it's like, why does nobody like us?), the system of consecration is essentially designed to run away from popularity.

And so in many ways it's a default: if you're popular, you're going to lose the consecration game. The consecration game is designed to make popular things lose, right? It's a resentment system that says: well, a lot of people bought it, it must therefore be bad. And that is the premise the .txtlab is designed to pop. If I had a dream of what I could achieve over the course of my career, it would be to get people to stop thinking like that, and to ask the question in the opposite way, which is to say: why do people enjoy this so much? And once you start there, everything's interesting, right? Romances are super interesting because a lot of people like them. Fan fiction is super interesting because a lot of people write it and also consume it.

Bestsellers are interesting because a lot of people buy them. Prizewinning books are interesting insofar as they tell me something about elites; I don't think that necessarily makes them interesting in themselves. But things that people love are interesting, because we can learn about what people love.

And so in any case, that's where we start, right? These AI systems can be very good at helping us see what those differences are. Rather than me [00:39:00] saying, you know what, I think the difference is that romances are bad because blah, blah, blah, what it actually does is say: here are the features, here are the qualities that help us predict this kind of book versus that other kind of book.
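A minimal sketch of that predictive setup, assuming a labeled corpus of bestsellers and prizewinners. The texts here are placeholders, and TF-IDF features with logistic regression are a simple stand-in for the richer models the lab works with; the point is that the classifier's most heavily weighted features are what get inspected.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder corpus: one full novel text per entry, with its label.
texts = ["...full text of novel one...",
         "...full text of novel two...",
         "...full text of novel three..."]
labels = ["bestseller", "prizewinner", "bestseller"]

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)

# With alphabetical class ordering, positive coefficients lean toward
# "prizewinner" and negative ones toward "bestseller".
coefs = clf.coef_[0]
words = np.array(vectorizer.get_feature_names_out())
print("prizewinner-leaning:", words[np.argsort(coefs)[-10:]])
print("bestseller-leaning:", words[np.argsort(coefs)[:10]])
```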

And so while romances were in our sample, we didn't delve into what makes romance special in that paper. What we found was that what makes prizewinning books, what makes this sort of elite culture, special is an obsession with nostalgia and nature. And what we think is happening is that this is a sort of reaction against bestsellers' investment in technology and action.

In other words, bestsellers seem to be driven by this idea of narrative propulsion: what's happening next, what's happening right now. They're telling gripping stories that way. And so the way prizewinners distinguish themselves, make themselves feel better for not having sold a lot of books, is to invest in slow narratives: backward-looking narratives, nostalgic narratives, nature, mountains, trees, rivers, right? And it's funny: once you start to see this data, every time I pick up something shortlisted for the Man Booker Prize, I guarantee you, you will find a field of grass somewhere, or a vista with some mountains.

It is very funny; you really start to notice these things. And that's where computation is very valuable, because it can start to pinpoint what makes this stuff so predictable. And then we get to ask ourselves that bigger question, which is: how do we feel about that difference?

And what I finished the piece with is: is nostalgic, conservative, backward-looking storytelling really what we want to consecrate as the most valuable kind of storytelling out there? I personally live in a world of urgency. We have urgent social problems we're confronting, and I would like our storytelling to reflect some of that urgency. Why can't things that are consecrated, things of complex literary value, do that too? And that's something we should think about, in the same way that we should think about telling stories that a lot of people love and may buy that are about the natural world, because that's obviously our real crisis point right now.

I think data can get us to see these behaviors that are happening, that we wouldn't otherwise be able to measure, and then give us entry points to think about what we can do about it. And that comes back to the point we were discussing, the kind of heteronormativity of fiction.

There's always a limitation to what these measures can tell us, but we can use the ones we have to try and target things that we think might matter: how people are represented, the types of stories being told about different kinds of people. And that's one place we gravitate towards in the lab, to say: look, these stories are overwhelmingly biased towards straight relationships. Surely you don't want all of your stories doing that.

You asked a question in the email you sent me in advance, one that someone had posted online, about why all the women in romances have these particular traits.

And that's something that computers are very good at detecting: is that happening? And if it is, okay, now we have information that we can prime authors with and say, let's maybe not have all of the heroines of romance have red lips [00:42:00] or blonde hair or whatever it is.
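A minimal sketch of this kind of trait detection: count how often a fixed set of appearance descriptors shows up near a character's name. The descriptor list, context window, and character name are illustrative choices, not the lab's actual method.

```python
import re
from collections import Counter

# Hypothetical descriptor list drawn from the kinds of traits the
# tweet mentions; a real study would build this far more carefully.
DESCRIPTORS = {"blonde", "blushing", "thin", "slender", "pink", "pale"}
WINDOW = 10  # words of context on each side of a character mention

def trait_counts(text, character_name):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == character_name.lower():
            context = words[max(0, i - WINDOW): i + WINDOW + 1]
            counts.update(t for t in context if t in DESCRIPTORS)
    return counts

sample = "Elena tossed her blonde hair, her pale cheeks blushing."
print(trait_counts(sample, "Elena"))
```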

So data can be good at detecting these biases and these skews. We like to use the term representation: how is representation working, right? How are groups of people, how are individuals, being represented? And then what can we change about that, depending on how we feel about those representations once we know more about them?

Andrea Martucci: Yeah. And this is something I've talked a lot about on the podcast, and I'll ground this in reading romances: one of the challenges of talking about texts is that you can read one romance, look at it, and say, oh, I kind of don't love that it does X. I think it's very easy to jump to: oh, so this writer can't write about this situation? This is a thing that happens, this is a lived experience. And the problem isn't necessarily that one person did it. The problem is that it's happening so widely that it's considered the norm, the default. And when you look at all of the texts together, they are fundamentally privileging certain experiences or certain people over a more representative representation of people in general, right? So it's not necessarily a problem that one text does it; it's that the wider pattern is problematic. And I think it's so powerful to have a way to actually look at that. If you get out of the anecdotal, out of the "I don't know, that's just my lived experience, so that's what I'm writing," it forces individuals and systems and industries, hopefully, to really examine that all of these individual choices we think we're making completely without bias are actually continuing to reinforce, systemic... oh my God, what's the word I'm looking for?

Andrew Piper: It's just a systemic bias.

Andrea Martucci: Systemic bias. Yes. Yeah.

Andrew Piper: Yeah. The insight and the value of data science, and this is a very hard point to get across in my world, and why I like to spend time talking with people about it, is that we're very concerned that data flattens the richness of a book. That's the first thing pretty much most people say to me when I tell them what I do: aren't you worried that data is going to flatten the richness of the book? And I think: yes, it absolutely does.

Andrea Martucci: How many times out of 10 do they say it exactly like that? Flatten the richness of a book.

Andrew Piper: It's almost literally word for word. You can swap one or two words in and out, but you get pretty close. And yeah, they say that, and then I say: look, yes, that's absolutely the case, but that's not our primary question. Our primary question is: what are large amounts of books doing? Because the theory is that that has an impact on us as a society, right?

As you were saying, it's the systemic representations, right? It's the representations you encounter over and over again that I would almost call ideological. They're the ones that give us our ideas; they're the ones that turn into ideas that turn into beliefs. And so the more we see something, the more we come to believe it.

And there's this wonderful tagline from the Geena Davis [00:45:00] Institute on Gender in Media: "If she can see it, she can be it." It's all about trying to undo the bias in Hollywood around gender representation. And I think you can take that to the next level and say: if you keep seeing it, you will believe it. So what we're trying to identify is these larger-scale patterns, which means that, yes, with each individual book we're losing a lot of interesting information about what makes it special. But instead, what we're seeing is the kind of patterns that everybody's engaging in, which then allows us, as members of the community, to decide: how do we feel about that?

Now, you may be like: sure, I don't care, that's fine, that's what we want to do, that's the message we want to send. But often, and this has been my experience with publishers, they're like: oh, that is definitely not what I was trying to do. Apparently I was making poor choices, but I didn't know.

And so a lot of what we're doing right now is putting information in people's hands and saying: look, here's what's happening. You probably didn't know this, because you don't have the tools or technologies to see it, but you should be aware, and it probably has an effect at the social level that you don't want to be contributing to.

The next question is a much harder one, which is: what do we do about it? Okay, we can measure a lot of stuff and we can hand that to stakeholders, whether it's authors or editors or publishers, but what's the right answer? What's the best way forward? And as you say, if you're an author, you don't want a quota machine where you're like: I have to say this, or I can't say this, right? That doesn't feel creative. The first thing you'd be doing is limiting someone's creativity, which is the opposite of what you want creative people to feel.

So, you know, how do you get behavior to shift? That's something I don't have a good answer on, other than, again, our other mantra: just make some noise. Once you know that there's a pattern out there, try and imagine how you can move it.

That is to say, what can you do to alter those patterns, right? What can you do to re-imagine your main characters, to re-imagine your plots, to re-imagine your relationships, so that you're getting out of those expectations and those expected norms to create something that's novel, surprising, and interesting?

I think there's an immediate tension for authors who are thinking: well, my readers expect something, I can't go too far out of bounds, or they're going to be like, that's not why I came to this book. And so that is challenging. That is a very hard thing to solve: reader expectations, audience wishes and needs, versus, are we being fair?

Are we embracing dignity across our whole community? And can we get those goals to be not mutually exclusive, but actually complementary? Can we find more readers? Can we find where readers are, and at the same time tell stories that are more expansive, more diverse, more surprising?

Andrea Martucci: Right. And that actually feels really relevant to the tension within the romance genre specifically, where a lot of the appeal of romance texts is the sameness: there are certain structures and tropes and archetypes that are considered really attractive to authors and readers, where a reader reads something and enjoys it, and then says, I want [00:48:00] something that's like this.

And so how do you decide which parts are important in that likeness? You know, Regency England: it has to take place in Regency England, that is the part I enjoyed about this. Instead of being able to drill down into: well, actually, I think the part they really enjoyed was something that could be defined more broadly, something that then translates less one-to-one, or less hegemonically, from what has been presented in excess and abundance in the past, or even what everybody else is doing.

So this brings me then to the other challenge that I see, where the data that your lab is creating, studying, analyzing, and publishing on is one piece of the puzzle of, I guess, helping people understand what's really going on. And so it sounds like what your lab is doing is, okay, we're actually going to get the data to show what's actually happening, so that we can say, hey, we know that anecdotally people think this, but this is what's really happening.

The other, interrelated step there is the study that I think generally takes place in the social psychology world: how do texts impact us? And I know my anecdotal impression, within the romance genre and among its readers and writers, is that there's this tension with the idea of: do texts impact us as readers and consumers of those texts?

And there's a lot of research showing absolutely, yes, they do. And when you have large scale exposure to certain narratives, those narratives not only seep into your belief systems, you also internalize those beliefs as true, and you no longer associate them with, well, I read this in a lot of fictional stories.

You say, this is just the way it is. And you no longer have that separation between, I experienced this exclusively in fiction, versus my real experience. It all bleeds together. And the data is there on that. But how do you create this awareness at large scale, for a public who are not going to read academic work, that A, this is really happening and we have the data to prove it, and B, it matters? And then, the part you're talking about: so what do we do about it?

Andrew Piper: I mean, I think my take has always been that it's very important to start where readers are. When we were doing these studies, we were often trying to initiate them from the perspective of: what could be wrong here? What would be some concerns we might weigh in on?

And so when we did, for example, a study of racial bias in Hollywood screenplays, we started from the #OscarsSoWhite movement. That is to say, there were audience members, people in a community, who said, we have concerns about this thing, and we don't have the data to back it up, but we don't like what we're seeing, for the following reasons.

And then we were like, okay, here you go. Here's the data. We can help you with that. And so I think it's really important to start all of one's analysis [00:51:00] from what readers are concerned about, what they're interested in, and what they have reservations about.

And then seeing if that's something that's measurable, something that's observable, and then trying to ground those beliefs in the data, and then leaving it up to the community to begin to think about, okay, how can we change this?

And there are two ways we go about that. One is to say, okay, we need to get closer to readers.

Like, we need ways to hear more from community members who say, this is stuff I believe about this world, I don't like this, this is a limitation to me. And then we can say, okay, good question. We'll get to work on that.

The next would be: how do we empower authors to make informed choices? Because my guess is a lot of what's going on is just this intuitive, instinctual process, and that's what creative writing is in some deep-down way, right? It's not highly reflective, it's not highly analytical.

And so part of the challenge is to add an analytical filter into that creative process, because otherwise what you're going to do is reproduce the norms that are around you. As we saw with the distribution of gender in books, that just comes out regardless of who you are.

But what we're thinking about is, can we develop tools that authors can use to assess their manuscripts, so that when they're having an editorial pause and thinking about their work, they can be told, oh, hey, by the way, the gender balance in your book is 70/30. Maybe you want to do that, but maybe you don't, right? Here's the information you need to make an informed choice about what you're doing. Or, it turns out all your women have red lips and blue eyes; are you sure?
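To make that kind of check concrete, here is a minimal sketch in Python, assuming we approximate "gender balance" by counting gendered pronouns. This is a crude stand-in offered for illustration, not .txtlab's actual tool:

```python
# Sketch of an author-facing gender-balance check. Counting pronouns
# is a rough proxy for real character detection.
import re
from collections import Counter

FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}

def gender_balance(manuscript: str) -> dict:
    """Return the share of feminine vs. masculine pronouns in a text."""
    words = re.findall(r"[a-z']+", manuscript.lower())
    counts = Counter(w for w in words if w in FEMININE | MASCULINE)
    fem = sum(counts[w] for w in FEMININE)
    masc = sum(counts[w] for w in MASCULINE)
    total = fem + masc or 1  # guard against an empty manuscript
    return {"feminine": fem / total, "masculine": masc / total}

# gender_balance(open("draft.txt").read()) might return
# {'feminine': 0.3, 'masculine': 0.7} -- the 70/30 editorial pause
# described above.
```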

So that's the place of intervention I'd like to get to: pretty seamless systems and software that can really help either editors or authors take a reflective pause on their writing and do some self-assessment. And I think that can be a boon to creativity, right?

Because in that moment of seeing the norm, you're like, yeah, that wasn't very creative of me. How can I revisit that? How can I do better there? How can I do something different? And now you're tapping into your creativity, because you're saying, well, how would I do it if I didn't do it like everybody else?

And so I think the key is really beginning to prioritize: what are the things that our community wants to know? And how can we then have them do more of the things that they're proud of and excited about, and less of the things that they're concerned about?

Andrea Martucci: Yeah. And so, to speak about some of the questions that are coming up in the romance community: on one day, two questions came out on Twitter that I sent you, and I want to talk about the first one here. This was a question Katrina Jackson put out there. She said, "is it me, or are there so many more white romances this year and being announced? Like... a lot."

And I was in that thread a little bit, talking to Katrina, and one of the thoughts I had was, how could you answer this question? So you could look at Publishers Marketplace announcements; that's a fairly good way of looking at what books are being announced by traditional publishers in a given time period.

And thinking through how you could answer that question: what is a white romance? [00:54:00] Now, I happen to know from my own observations, sorry, I don't know this, but in my own observations I think I know, that with white romances, the meaning I pick up from reading these Publishers Marketplace listings is: if they don't mention the race, they're white.

And if they do mention the race, well then, okay, then you know what the racial identity of the characters is. And my hypothesis was, well, if you were able to pull in a large sampling of Publishers Marketplace listings, for romance specifically, then every time a race isn't mentioned, catalog it as white, or assumed white, and then catalog other racially identifying language.

And I would hypothesize that you could probably get a fairly good understanding of the pairings, and let's be honest, it's mostly pairings, it's going to be very rare to find more than two main characters in a traditionally published romance, and a fairly good understanding of the racial identity of the characters in new books being announced.

Critique my method here. Like what do you think of this? Is there a better, different way? How would you approach this?

Andrew Piper: No, I think you've got it. I think you're thinking like a data scientist, first of all, because you're not actually worried too much about the books, let's say, inside the covers. In many ways, the summary or description may be the most important information that we care about right now.

And so all of those extra details, we can just brush them aside, which, again, makes those who really want to study literature in the traditional way gasp a little bit: oh no, what have you done? Our question really, as you said, is: is there a problem of racial diversity when it comes to the genre of romance?

And you highlight very quickly one of the analytical problems, which is that whiteness is often a negative signifier: it's an absence, and it signifies through not having to name itself. And there's a lot of great work on this; Toni Morrison has written some beautiful books on this exact topic. So it makes it hard to teach a computer to predict the racial identity of a character if a big swath of them don't have marked identities. But as you said, you could take that absence as a positive indicator.
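As a sketch, that cataloging rule is simple enough to write down. The marker list below is purely hypothetical, and a real study would need a far more careful lexicon plus human validation:

```python
# Sketch of the "absence as a positive indicator" heuristic for
# Publishers Marketplace-style deal announcements. The marker list
# is an illustrative stand-in, not a validated lexicon.
RACE_MARKERS = ["black", "latina", "latino", "korean american",
                "chinese american", "indigenous", "biracial"]

def catalog_listing(listing: str) -> str:
    """Label a listing by any racially identifying language found;
    if none appears, catalog it as 'assumed white'."""
    text = listing.lower()
    found = [m for m in RACE_MARKERS if m in text]
    return ", ".join(found) if found else "assumed white"

listings = [
    "A marriage of convenience between two Black entrepreneurs...",
    "A grumpy duke meets his match in Regency England...",
]
print([catalog_listing(entry) for entry in listings])
# -> ['black', 'assumed white']
```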

And yeah, it wouldn't surprise me if there's a lack of representativeness there, in the sense that the distribution of main characters in romance does not line up well with the racial and ethnic distributions in North America, if you look at North American published books, for example. That wouldn't surprise me at all.

It would raise the question of readership, and whether that's more reflective of the readership. That's data I don't have, but that's data you'd want to think about. Because, again, this goes between the social goals that we have of telling stories that represent the world we actually inhabit, versus one that we probably shouldn't inhabit, versus appealing to what readers would like to see, which may be stories about themselves. And if that readership is highly skewed, then that's something, as the author or publisher, you're still going to have to think about nonetheless, right?

Nonetheless, as my students love to point out in class: it's fiction. [00:57:00] You can do whatever you want.

Andrea Martucci: You can do whatever you want, but why do people keep choosing to do very similar things?

Andrew Piper: So their argument is: do the different thing. Whether it's representative or not of whatever, if there's a concern that there's a racial bias here, why not tell stories with more diverse characters and see what happens? See what happens with your audience.

Andrea Martucci: This question, actually, there has been an attempt to answer it, or, I'm going to say, to answer an approximation of it quantitatively. There is a romance-only bookstore called The Ripped Bodice. They undertook a study, and there has been some public conversation about the methodology of that study.

I personally find some issues with it, not least of all that the way they're trying to measure diversity within the romance genre is by guessing the race of the authors and measuring that. So there are some methodological problems there, but one aspect of that is assuming that diversity in the genre can be measured by the authors, as opposed to the characters on the page, which I would argue is probably the more salient data point for readers, because you're consuming the characters, not necessarily the author. And I think that in that public conversation, people were like, well, how else are you supposed to do this? There are a lot of methodological concerns about how to answer this question, but I actually found that if you say, hold on, let's back up: these things get announced at some point, so there's some data. Or is that data, or metadata?

Andrew Piper: If it's about the author, yeah, we would refer to that as metadata, right?

Andrea Martucci: I guess, like, a Publishers Marketplace listing, that's data, right?

Andrew Piper: Yeah, I guess it would be metadata insofar as it's about a book.

But when you go into the book's description, now that's technically data, because it's unstructured text and all that. So it's data inside of metadata, I guess, if we wanted to make this as complicated as possible.

Andrea Martucci: Let's do that. That's what I seek to do on this podcast as much as possible. We don't need to go too far into this other example, but just to note the contrast with the example we just talked about: hey, what if you were trying to look at book announcements to understand the racial identities of the characters in the books that are being published or about to be published?

You could go a step farther and try to guess how lucrative those deals are by analyzing the deal language. There are a lot of things you could do there. @MomoNoki8 asked the question about who is critiquing the use of blonde hair, pink lips, thinness, small waists, and blushing cheeks, et cetera, in contemporary white-led romance novels.

And I talked to them a little bit in direct messages on Twitter, because I was like, first of all, I'm talking to somebody soon who I can talk to about this. But just to point out how different these two questions are methodologically: on the one hand, if you're asking which characters are represented in texts, that's an analysis of how often certain kinds of character characteristics are appearing.

Whereas this [01:00:00] now is getting into a question of which language is being used, more often than not in an over-representative way, to describe characters, and then, what is the meaning of that? So I guess this is getting into a much more nuanced look at the words being used to describe characters, and the possible meanings of those words and their associations.

Andrew Piper: Yeah. And what's nice is, it's all words. This is what I love about my job: at the end of the day, everything is words, and words are very measurable. And despite, again, many of my colleagues' frustrations, it's actually the case that they're very discrete, very measurable, and there are very distinctive patterns. We use language to signal to people what we mean, and generally we do that in very overt, very clear ways. And so this is one of the reasons why these methods work so well: when people set out to tell stories, they're not trying to obfuscate, they're not trying to be subtle. They want you to know they're telling you a story, they want you to know that it's a made-up story, and they want you to know what kind of made-up story it is. Write a romance, and you're going to know it's a romance very quickly.

Andrea Martucci: It's very important that you know it's a romance, so that you can come in with certain expectations, because people are looking for this particular thing. Yeah.

Andrew Piper: So it's not magical or mystical in any way. In many ways it's a human behavior we've underplayed, which is our signaling capacity: we're signaling to each other what we're doing, in intentional ways. It's very different from how good humans are at lying, right?

That's a different problem. But this is one where we want you to know what we're doing, and we want to make sure you know. And so those patterns are going to come off very detectably, and they're going to be in the words, in the language, because that's what books are.

You're describing two related problems with different goals. One is to say, how do we know a character's racial identity? Because of the words used to describe that character, or, as you were pointing out before, the words not used. And the nice thing about machines is they take both of those things seriously. That is to say, when they don't see things, that's as meaningful to a machine as when they do see things. I think that's true for humans too, right? The absence of information signals to us as much as information does.

And so you're doing something in a more binary, what we call classificatory, way: using those descriptive words around characters to say that character is X or that character is Y, right? That character belongs to a racial minority, or that character does not.

But at the end of the day, you're making that inference, you're making that judgment, by the words used to describe that character, and that's how readers are doing it, right? Like, how do you know it's a white heroine or a Black heroine or an Indian heroine or whatever? It's because the author has told you in some way, right? They've used the words to let you know that, so you can form that representation in your head.
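To illustrate that classificatory move, here is a toy sketch of a bag-of-words character classifier. The snippets and labels are invented for illustration; this is not the lab's actual model or data, and a real version would need many human-labeled examples and careful evaluation:

```python
# Toy sketch: classify characters by the words used to describe them,
# including explicit racial markers and their absence. Training data
# here is made up purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "her blonde hair fell over pale blue eyes",
    "the heroine, a Black chef opening her first restaurant",
    # ...many more human-labeled character descriptions...
]
labels = ["assumed white", "marked Black"]  # human-assigned labels

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(descriptions, labels)
print(model.predict(["a freckled redhead with green eyes"]))
```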

With the other question, about this overuse of descriptive words, or what we call synecdoche, the part for the whole (why do we focus on this body part, or this quality of this body part?), that falls probably more under the traditional heading of cliché, right? The thing we rely on to do that work. And [01:03:00] again, machines are very good at finding that behavior and showing what those trends are, which then stops at the point of saying, okay, what are we going to do about it?

But that project is really not hard. The two require different kinds of information. To do the second one, to say, why are thin lips always showing up in romances, we would need a lot of romances, a big pile of published romances. And then we use our techniques to identify where the characters are.

And then we use our techniques to identify the words that are modifying those characters, which, again, is just picking up grammatical relationships, right? We know adjectives modify nouns, so we pick up those words, and then we just use very familiar statistical techniques to say it's much more likely, given these characters, or these types of characters, or these types of books, that you're going to see these things mentioned to describe them. And so amassing that kind of evidence is actually pretty straightforward. That question is very doable. Often people want to study things that I'm like, that's impossible,

Andrea Martucci: (laughs) Yeah,

Andrew Piper: the two you've mentioned are both very doable.

We could address each of these things, and then we're left with: what next? What would people want to do about it? But we could definitely measure those things easily.
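As a rough sketch of that descriptor pipeline, assuming spaCy's small English model for the parsing step; the corpora, the smoothing, and the example descriptor are stand-ins for what a real study would do at scale:

```python
# Sketch: pick up adjective-noun pairs via the "amod" dependency
# relation, then ask which descriptors are over-represented in one
# pile of books versus another. Assumes spaCy is installed along with
# its small English model (python -m spacy download en_core_web_sm).
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def descriptor_counts(texts):
    """Count (adjective, noun) pairs such as ('thin', 'lips')."""
    counts = Counter()
    for doc in nlp.pipe(texts):
        for tok in doc:
            if tok.dep_ == "amod" and tok.head.pos_ == "NOUN":
                counts[(tok.lemma_.lower(), tok.head.lemma_.lower())] += 1
    return counts

def relative_rate(pair, romance_counts, baseline_counts):
    """Smoothed rate of a descriptor in the romance pile vs. a baseline;
    values far above 1 flag an over-represented descriptor."""
    rate_r = (romance_counts[pair] + 1) / (sum(romance_counts.values()) + 1)
    rate_b = (baseline_counts[pair] + 1) / (sum(baseline_counts.values()) + 1)
    return rate_r / rate_b

# e.g., with hypothetical corpora `romances` and `other_fiction`:
# relative_rate(("thin", "lips"), descriptor_counts(romances),
#               descriptor_counts(other_fiction)) >> 1
```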

Andrea Martucci: Right. Hey, guess what, publishers? We've got the data. We've shown that there's an over-representation of blonde characters, or characters that are described in a very particular way that is not representative, and we believe X. What would you like to do about it? Or what are some tools? You mentioned earlier, potentially, a manuscript analysis tool for authors or publishers. So, practically speaking, how could you operationalize addressing that concern?

Andrew, this has been fascinating. Obviously this is something I'm super interested in, so this has been a lot of fun. Where can people learn more about you and your work and the lab, and where can people find you online?

Andrew Piper: The easiest place is just to follow the blog, txtlab.org, where we post new work, the big updates. And then if you want a more daily feed of what's coming out of my brain and things like that, which you may or may not actually want, I'm on Twitter; @_akpiper is my Twitter handle, and people can follow me there.

Andrea Martucci: Great. And that's T-X-T-lab.org.

Andrew Piper: Yes. Named after the file format that is my favorite, which, and this is the nerdy part of me, is the .txt file: the simplest way you can represent a text as a text file. And I just love that, because it gets at this idea of the beauty of simplicity. We scrape away all the crap that Microsoft Word adds to something, and all these baggy software systems.

And we just get down to the words themselves. So that's why we named the lab after that.

Andrea Martucci: Love it. Please, everybody, check out Andrew's work. And actually, Andrew, you've written several books. So if somebody was interested in the conversation we had today, is there a book in particular of yours that you would suggest they check out?

Andrew Piper: I think if you want to get started, Enumerations is a good place. It's subtitled Data and Literary Study, [01:06:00] and it tries to introduce how we can start to think about the importance of quantity in literature, that is to say, these patterns and the way they have meaning.

Andrea Martucci: Thank you for being here today.

Andrew Piper: Thank you.

Andrea Martucci: Thank you so much for spending time with me today. If you enjoyed today's episode, please subscribe, rate, or review on your favorite podcast app or tell a friend. Check out ShelfLovePodcast.Com for transcripts and other resources.

If you want to join the conversation about the topics that we discuss on Shelf Love, I'd encourage you to check out Shelf Love's Patreon at Patreon.com/ShelfLove. Thank you to Shelf Love's $20 a month supporters: Gail, Copper Dog Books, Frederick Smith, and John Jacobson.

See your name listed as a Patreon supporter on the Shelf Love website if you join at any level. That's Patreon.com/ShelfLove. That's all for today. Thanks so much. Bye.