Science Fiction Categories: A Proposal

Interesting reply, Kat, but this statement is crazy:
But Stephenson is not considered to have made any real significant predictions in his fiction
The list is long so I'll just reel a few off the top of my head:

In Snow Crash he predicted massively multiplayer online games, including technical problems like object collision for a branch of software that hadn't even been coded yet. He also predicted online real estate, and about 40 other things related to online gaming. Snow Crash also predicts DRM, modern cell phones, and Google Glass. Matter compilers in The Diamond Age are advanced 3D printers. Mediatronic paper is an advanced Kindle. Neither of those devices existed in 1995. The science for all the nanodevices in The Diamond Age was there in 1995 (I actually worked on a few); the only issue is that there still isn't a power supply that small and powerful. The list goes on.

Are some of his inventions 'magic'? Sure. A lot of his devices were thrown in without much thought about how to really make them. But for the ones he does work out, he blows Arthur Clarke out of the water as far as predictions go.
 
What degree of accuracy and certainty do you think I am trying to get?

I haven't a clue.

This is about probability.

Probability of what?

But I have already found that the words I search on significantly affect the results.

Well yes, but do the results really show hard SF in the story, or just sciency words that could be used any which way? And then there are the tricky ones, like cell, that mean more than one thing, and not all of them about science. I'm just not certain how accurately the scores you are generating reflect actual hard science content in SF.

then added the word "castle" to the list. That word is totally irrelevant to most science fiction.

So why would you add it to the list of science words you're testing for, which are supposed to be the relevant words of science fiction? The idea is to measure the amount of hard science content, no? The presence of the word castle in a story is indeed irrelevant.

But it turns up more than 50 times in Bujold's fantasy works, Curse of Chalion and Paladin of Souls.

Yeesss, because she's writing about knights in her fantasy fiction. Knights and castles kind of go together. Paladin is kind of a clue there. The question remains: why are you analyzing Bujold's fantasy novels, set in a medieval secondary world, when you already know they aren't science fiction novels and therefore should not be in your data set at all? It's like saying, "I'm going to measure the hair length of dogs. And now I will put cats in my data set and lo, my analysis is that these things are indeed cats and not dogs and so their hair length is irrelevant to my study."

So I figure specifying a number is not enough. The prospective reader should be told the 5 or 10 most commonly used words.
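For what it's worth, here is a minimal Python sketch of the kind of report described above: a density figure plus the few search words that matched most often. The word list, the per-10,000-words normalization, and the names `score` and `SCIENCE_WORDS` are assumptions made for illustration; this is not the program under discussion.

```python
import re
from collections import Counter

# Hypothetical search list for illustration only; the thread never gives the real list.
SCIENCE_WORDS = {"quantum", "photon", "planet", "gravity", "orbit", "reactor"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def score(text: str, search_words: set[str], top_n: int = 5):
    """Return (search-word hits per 10,000 words, the top_n most frequent matched words)."""
    words = tokenize(text)
    if not words:
        return 0.0, []
    hits = Counter(w for w in words if w in search_words)
    density = 10_000 * sum(hits.values()) / len(words)
    return density, hits.most_common(top_n)


if __name__ == "__main__":
    sample = "The photon left the planet. Gravity bent the photon's path; quantum noise blurred it."
    density, top = score(sample, SCIENCE_WORDS)
    print(f"{density:.1f} hits per 10,000 words")
    print(top)
```

Ambiguous words like "cell" would still count every time they appear; this sketch does no word-sense disambiguation, so the choice of search words matters as much as the counting.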

The five or ten most commonly used words in fiction: and, the, I, we, you, but, or, was, said, a, an. Also me, they, their, your, and my. You are not analyzing what word is used most often in a fiction work; you're trying to determine how often science words are used in the fiction. So again, "castle" is not a science word; quantum, planet, gravity, etc. are.

Since Bradbury said his Martian Chronicles is not science fiction

And Margaret Atwood didn't think the science fiction novels she wrote were science fiction novels either, even though they were. She wanted to call them speculative novels until enough people gave her grief that she threw up her hands. Martian Chronicles are science fiction stories, but they aren't hard SF, which is what most of Bradbury's buddies were doing at the time. Martian Chronicles would get a low score for sciency words, which would tell prospective readers that it is not hard SF (although really I think anyone could figure that out from the brief description of the collection). There is really only a small percentage of stories for which you would need some sort of numerical score to determine whether or not they are hard SF: those books that are close to hard SF, but not so much with the science.

I am going to compare it to the Mars Trilogy by Robinson, which everyone agrees is hard sci-fi to the point that many accuse it of being boring.

Why? Everyone knows that Martian Chronicles is not hard SF and everyone knows that Robinson's terraforming stories qualify as hard SF (although just barely, since some of his science, as we've discussed in the past, is not that great, and he gets obsessed with the political and sociological philosophy of Mars as an emerging nation more than with the hard science terraforming stuff). So comparing them on that front seems rather fruitless. Robinson's books should have a fairly high score on sciency words and Bradbury's stories collectively a low one. They are just points on a spectrum, not related.

Data does not have to be precise and certain to be useful.

Yes, but data that you already know isn't particularly useful to collect in a different form. You already know that fantasy novels are not hard SF. You already know that Martian Chronicles is not hard SF. You know that Bujold's Miles novels are not hard SF. You know that William Gibson doesn't write hard SF. The ones that are science fiction, though not hard SF, you can certainly test to see if the scale works. But what would potentially be more useful is to run stories that you are not certain are hard SF through an analysis of sciency words (and not fantasy words).

Basically, you're using your control groups (works that are definitely not hard SF and works that are definitely hard SF) as your main test group instead of as your control groups.

phil geo said:
In Snow Crash he predicted massively multiplayer online games, including technical problems like object collision for a branch of software that hadn't even been coded yet. He also predicted online real estate, and about 40 other things related to online gaming. Snow Crash also predicts DRM, modern cell phones, and Google Glass. Matter compilers in The Diamond Age are advanced 3D printers. Mediatronic paper is an advanced Kindle. Neither of those devices existed in 1995. The science for all the nanodevices in The Diamond Age was there in 1995 (I actually worked on a few); the only issue is that there still isn't a power supply that small and powerful. The list goes on.

Ah, no, he didn't "predict" them. Multiplayer online games began in the seventies running on limited networks. The first commercial massively multiplayer online game (although "massive" then obviously meant much smaller groups) was in the mid-1980's on CompuServe. As computer and video games developed back then, and as the Internet and then the Web became viable, widespread networks in the late eighties and early nineties, it was widely expected that there would be further, larger multiplayer online games. It was not a new concept. Likewise, virtual reality and simulated life concepts were long-established ideas in SF well before anything close to VR got invented, much less the online gaming industry. (See the movie The Lawnmower Man, adapted from Stephen King's work, among many others.) DRM encryption became a significant issue once computers could transmit data files commercially; an early version of DRM was the Software Service System in the 1980's. And the cellphone claim is just silly. SF writers have been doing versions of cellphones, smartphones and tablets since the 1950's at least. Same with Google Glass. The appeal of Google Glass is in fact that it looks like something envisioned by early SF writers, who had all sorts of tech spectacle inventions.

Electronic paper of various kinds was invented in the 1970's-1990's; it had been a long-time development goal in the science/tech field and was often used in science fiction. E-reader devices using CD-ROM's were commercially developed in the 1980's. They did not succeed as a market because the tech companies were largely uninterested in developing them. The development of the Web led to the move from CD-ROM's to smaller drives with data files and online transmission. (Large publishers had all developed CD-ROM publishing divisions in the late 1980's and had to sell off or disband them by the early 1990's.) Better electronic paper/ink techniques were expected to develop (the world was supposed to go paperless in the 1980's), and they did in the late 1990's, leading eventually to Sony putting out their e-reader, as you know, in the early oughts for general commercial use.

Replication printing was a long-time goal of science fiction. (See Star Trek's replicator.) Early 3D printing efforts were made in the late 1970's and the 1980's. What's happening now is that 3D printing devices are getting smaller and able to make more kinds of things, and are poised, possibly, for wider commercial use, which the tech industry has wished for for quite a while. And nanotechnology was the favorite toy of science fiction writers in the 1990's, not always in hard SF stories.

So no, he didn't "predict" these things. You cannot say that an SF author predicted something that many other SF writers, scientists and engineers had already been predicting for years, and in the latter case developing, including before he wrote the novel. Some of the best predictors were SF writers who were respected at the time (1930's-1970's) but are not remembered as well as the bigger stars in the field. (Like some of the ones Psikey spits out links to.) Some of the really good predictors weren't hard SF writers, like Bradbury, who is credited with the concept of the robotic house and wall televisions, and Aldous Huxley in Brave New World. Jules Verne has a frightening record from the nineteenth century. Clarke, though, had a lot of advantages because of what he worked on in science. He is considered to have predicted orbital satellite communications, space elevators, etc.

Now, is there some particular detail in Snow Crash and Diamond Age that did develop in gaming, say? Possibly. But I'm not aware of him being credited with anything major. Has he inspired designers in tech? Very possibly; that's not uncommon with SF writers, and as we know, Gibson did quite a lot, for instance. But he simply isn't regarded in the field as a hard SF guru. I am comfortable calling Diamond Age hard SF cyberpunk; others are not, but that's certainly the one where he went into some detail, albeit he skipped a lot. Anathem has a lot of interesting quantum stuff and neurology related to the idea of the mind as a quantum device, which he got from science theorists, but a lot of the stuff in the book is just regurgitated basic maths and physics, and some of the science-backed action is logistically dicey. And in that book, he's obsessed with Plato and religion. I haven't read Reamde; maybe he does hard stuff there, but it sounds more like VR adventure (which is definitely one of his wheelhouses).

I think as a tech and science writer outside of SF, Stephenson very much has his finger on the pulse of what's going on out there and often synthesizes it well. He's got a good eye for cultural stuff, how we might react to tech, and he's interested in human evolution and neurology. He's good at blending alternate history/history with science concepts in some of his work. He's perfectly capable of writing hard SF, but most of his work just hasn't gone in that direction. And a writer like Bruce Sterling probably has a better track record than Stephenson of predicting and coining terms and concepts out of cyberpunk.

But it would be interesting to see what scores Stephenson's books get through Psikey's vocabulary exercise. Stephenson's works are the kind of books that would be in the main test group of what Psikey is doing. You could compare them against the hard SF control group to see if the scale is working.
 
The five or ten most commonly used words in fiction: and, the, I, we, you, but, or, was, said, a, an. Also me, they, their, your, and my. You are not analyzing what word is used most often in a fiction work; you're trying to determine how often science words are used in the fiction. So again, "castle" is not a science word; quantum, planet, gravity, etc. are.

You are being absurd. I am talking about the commonly used words that I am searching on. So saying that "sword" and "castle" are the most-used search words would indicate to the potential reader that it is not science fiction. I did not use such obviously ambiguous words as "cell". I am adding "quantum" and "photon".

I said the system might only be of use for readers inclined toward hard SF. If you are not then ignore it.

Yes, but data that you already know isn't particularly useful to collect in a different form. You already know that fantasy novels are not hard SF.

But there are people who say that fantasies like Star Wars are science fiction.

And Margaret Atwood didn't think the science fiction novels she wrote were science fiction novels either, even though they were.

Did Atwood admit any of her novels were SF? Bradbury said that Fahrenheit 451 was his only science fiction novel. It satisfied his definition of SF.

https://www.youtube.com/watch?v=wnN6inSFdwA

psik
 
Kat, you are confusing imagining with predicting. Your argument is spurious because I could use the same argument to say that in the 1600s men imagined travelling in space, so no one could 'predict' satellites or manned spaceflight after that point. Leonardo da Vinci imagined men flying, so you can't predict airplanes. I'm talking about laying out the science, using existing technology with only a minor assumption (like small power supplies or stronger materials) to predict the direction technology will go. That is the accepted definition of hard science fiction. And the hard part is being right: 3D printing competed for the last 50 years with a hundred other ways to mass-produce custom items, but Stephenson predicted that 3D printing was the way society would trend. That's amazing because he was right, and he's been right about dozens of things, so it's not a fluke.

At this point, we're arguing semantics again which I don't find very interesting, but suffice it to say that you will quickly realize that by your narrow definition, there is no such thing as hard scifi, nor predictions, there is only science or fiction.

EDIT:

I'll also point out that you keep saying that you, personally are unaware of Stephenson being recognized for predictions. Here is an article in the NYT that explains Stephenson's legendary status as a major predictive writer. The fact that Stephenson is humble and downplays it does not change the fact that he is legendary for it, as the article states several times.

http://www.nytimes.com/2011/12/06/s...sons-imagination-came-a-new-online-world.html
 
You are being absurd. I am talking about the commonly used words that I am searching on. So saying that "sword" and "castle" are the most-used search words would indicate to the potential reader that it is not science fiction. I did not use such obviously ambiguous words as "cell". I am adding "quantum" and "photon".

I said the system might only be of use for readers inclined toward hard SF. If you are not then ignore it.

There is a world of difference between a story called science fiction, such as a space opera novel, set in space in the far future, and at least making a pretense at a scientific explanation for what it is offering, and a novel that is already labelled a fantasy novel, is set in an imaginary world in pre-industrial times and where magic is real. You don't have to let people who want to read hard SF know that Bujold's fantasy novels are fantasy, because it says they are fantasy right there on the tin. You don't have to test for the words castle and sword in fantasy novels that advertise that they have castles, swords and magic, not science. You clearly already know that these are fantasy novels -- you call them Bujold's fantasy novels -- and nobody is saying they aren't, unlike the debates over Star Wars, or space opera, etc., and therefore, they don't have to be analyzed. Not only is there no chance that they are hard SF, they deliberately let you know they are not. No one looking for hard SF is ever going to crack open Bujold's fantasy novel in the fantasy section of the bookstore to find it. Your test is for science fiction stories. There is no reason to test stories that have never claimed to be any kind of science fiction.

Likewise, it's not the presence of words like castle and sword that say a story is not hard SF. (It doesn't even promise to find a fantasy novel -- historical stories with no fantasy in them may use the words castle and sword frequently; contemporary fantasy novels may not use the words castle or sword at all.) It's the lack of hard science in the story. Castle and sword are not science words. You said that your vocabulary list was a list of science words to test for the presence of science content, not regular words. If you're just going to throw random words in there, then it's not very useful. I could test for the presence of "shoe" or "deer" or "soda." Doesn't tell me how much science is in a story.

If I was trying to use your system to find hard SF and you told me that Bujold's fantasy novel has a very low score for science content, I'd tell you that I already know that because it's a fantasy novel. What I'd want to know is if a science fiction story was hard SF and how much science content it had. Analyzing Bujold's fantasy novels doesn't tell me that.

But there are people who say that fantasies like Star Wars are science fiction.

Star Wars is science fiction, specifically space opera. Some people think it's bad science fiction, which is not the same thing as its tie-ins being fantasy novels. Even that aside, again, Bujold's novels are labelled fantasy -- there is no dispute about those titles being fantasy. No one calls them science fiction. Therefore Star Wars is irrelevant to the question of whether to analyze Bujold's fantasy novels or not.

There are stories like Star Wars that are science fiction -- space opera, military SF, time travel, alternate history, sociological SF, cyberpunk, etc. -- that may or may not have hard SF content to one degree or another. There are stories that are called hard SF -- but how much hard science content do they have? So your system, if it sticks to science words, could attempt to test for that in these different science fiction stories to see which are hard SF and which are not. So again, why are you testing to see if fantasy stories are hard SF if you already know they are not and everybody else does too? If a book is labelled fantasy, it's already out of the pool -- you don't have to test it. It's already said that it is not a hard SF story.

Did Atwood admit any of her novels were SF?

She eventually gave in and agreed with people that yes, she wrote some SF. It wasn't so much an admission as that she felt near-future stories based on current science were called speculative fiction, and science fiction happened far in the future in space with totally new inventions, etc. Essentially, she just got her sub-categories wrong, and the field explained to her that no, near-future SF stories are still called SF. So did Bradbury. A number of the things he put into SF stories like Martian Chronicles were dead on about what would happen culturally with technology. But neither Bradbury nor Atwood is a hard SF writer. So you could run both through your system and they should come out with lowish scores. But they would be actual science fiction stories you're analyzing.

phil geo said:
Your argument is spurious because I could use the same argument to say that in the 1600s men imagined traveling in space, so no one could 'predict' satellites or manned spaceflight after that point.

Well no, that's not what I'm saying at all. The authors in the 1600's predicted space travel, imagining that it would be possible. But that doesn't mean that no one else could predict how various kinds of space travel specifically and scientifically would occur. It just means those later authors can't say "no one else ever thought we would travel to space until I said it!" You can't say that Stephenson "predicted" the existence of the Kindle when there were already e-readers and electronic paper in existence. He would be predicting something that already existed. Likewise, if dozens of scientists are already theorizing how to do something, you can't say that Stephenson then "predicted" it instead of them by imagining what will happen if they turn out to be right. If how he imagines it will be in the society is right on the money, that's a decent prediction, but most of the stuff you were talking about, other SF authors had already imagined accurately and specifically long before Stephenson. Every SF author took a crack at nanotechnology in the 1990's, including predictions about how it would be used in medicine particularly.

I'm not against the idea that Stephenson might have predicted something specific in 1995 or so that, fifteen years later, not only developed but did so culturally and physically as he said it would (that's what happened with Gibson). But claiming that he predicted the concept of cellphones, for instance, when versions of cellphones already existed, plans to develop them were already in the works, and several decades of SF had already predicted their specific eventual existence, is stretching the idea of a prediction way past its definition. If that's the criterion, I can make all sorts of predictions and be considered a genius. With all due respect to the New York Times, again, multiplayer online games did exist before he wrote the novel. CompuServe was not an imaginary creature. Crediting him with the idea of multiplayer online games ignores what game designers were doing with new networks and tech in the 1980's, and the fact that the Web was already functioning in the early 1990's, so I'm not really surprised Stephenson is backing away from that mantle. There were lots of cyberpunk novels well before Snow Crash predicting VR communities and online networks and games. There were a lot of SF stories about 3D replication and how it would develop in society, some fanciful, like the one where we could essentially copy versions of ourselves in a copy machine like a data file archive, and others that hit nearer the mark. If the claim is that he predicted the precise behavior of people on such online networks, I'm not so sure I'd buy that. Maybe I just read too much of that fiction.

And again, as we already discussed, even those who make spot-on predictions aren't necessarily writing hard SF, and hard SF writers aren't necessarily making tech predictions. Bradbury, Huxley, Gibson: they and others nailed stuff that eventually happened, especially culturally, but their stories weren't hard SF. Dick, for instance, had a pretty good record, and is probably the guy who really nailed VR and what could be done with it. But he wasn't a hard SF writer. Sociological SF may be "soft", but it quite often hits the nail on the head. Hard SF is a type of science fiction story that deals centrally with physics, biology or chemistry. That's why Diamond Age, with its partial focus on biological developments, is Stephenson's hardest novel, while Snow Crash is mainly noir suspense, like most cyberpunk. Stephenson is more interested in culture, philosophy, and the impact of history and politics.

But again, you are setting up the problem that Psikey is trying to solve -- for work that may be in dispute about its degree of "hardness", would testing a list of science words for an overall score tell you anything? I'm not so sure it would, but testing Stephenson's works on that idea at least makes more sense than testing Bujold's fantasy novels. Maybe Stephenson would have a high score. :)
 
Star Wars is science fiction, specifically space opera. Some people think it's bad science fiction, which is not the same thing as its tie-ins being fantasy novels. Even that aside, again, Bujold's novels are labelled fantasy -- there is no dispute about those titles being fantasy. No one calls them science fiction. Therefore Star Wars is irrelevant to the question of whether to analyze Bujold's fantasy novels or not.

There are stories like Star Wars that are science fiction -- space opera, military SF, time travel, alternate history, sociological SF, cyberpunk, etc. -- that may or may not have hard SF content to one degree or another. There are stories that are called hard SF -- but how much hard science content do they have? So your system, if it sticks to science words, could attempt to test for that in these different science fiction stories to see which are hard SF and which are not. So again, why are you testing to see if fantasy stories are hard SF if you already know they are not and everybody else does too? If a book is labelled fantasy, it's already out of the pool -- you don't have to test it. It's already said that it is not a hard SF story.

Psikey is using a good scientific method by testing his theory on things that should not produce results - hence the use of fantasy. At least he is trying and testing an idea out, rather than saying this doesn't work or that doesn't work.

Science fiction is notoriously difficult to define at the best of times. And what exactly is 'hard science'? Science is neither difficult nor solid.

It would not surprise me if you can define something as belonging to hard science fiction by the number of scientific words in the story, but you cannot say that the low number of science words indicates it is not hard science fiction.
 
Psikey is using a good scientific method by testing his theory on things that should not produce results - hence the use of fantasy. At least he is trying and testing an idea out, rather than saying this doesn't work or that doesn't work.

Thank you Rosie.

It seems so peculiar that anti-scientific attitudes are so prevalent among science fiction fans.

I am considering having the next version of the program use two different word lists, one for SF and one for fantasy. So fantasy books should get a higher score on fantasy density and SF works a higher score on SF density.
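A rough sketch of the two-list idea, again in Python and again hypothetical: the same text is scored once against an SF list and once against a fantasy list, each as matches per 10,000 words. Both word lists here are placeholders, and `two_list_profile` is a made-up name, not part of any existing program.

```python
import re

# Placeholder word lists for illustration; not the lists any real program uses.
SF_WORDS = {"quantum", "photon", "orbit", "gravity", "reactor", "planet"}
FANTASY_WORDS = {"magic", "wizard", "dragon", "spell", "sorcery"}


def density(words: list[str], vocab: set[str]) -> float:
    """Number of words found in vocab, per 10,000 words of text."""
    if not words:
        return 0.0
    return 10_000 * sum(1 for w in words if w in vocab) / len(words)


def two_list_profile(text: str) -> dict[str, float]:
    """Score one text against both lists so the two densities can be compared."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "sf_density": density(words, SF_WORDS),
        "fantasy_density": density(words, FANTASY_WORDS),
    }


if __name__ == "__main__":
    print(two_list_profile("The wizard cast a spell and the dragon fled."))
    print(two_list_profile("The reactor pushed the ship into orbit around the planet."))
```

Whether words like castle and sword belong on either list is exactly what is being argued in this thread; the sketch leaves them off both.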

Science without experimentation and analysing data? How can these people analyse science fiction if they don't even respect scientific methods?

psik
 
Thank you Rosie.

It seems so peculiar that anti-scientific attitudes are so prevalent among science fiction fans.

I am considering having the next version of the program use two different word lists, one for SF and one for fantasy. So fantasy books should get a higher score on fantasy density and SF works a higher score on SF density.

Science without experimentation and analysing data? How can these people analyse science fiction if they don't even respect scientific methods?

psik

The anti-science attitudes seem to have extended into the publishing industry (to be fair they are only reacting to what they perceive the fans want).

It seems bizarre to me that science magazines (e.g. Nature) are coming to the rescue of science fiction heavily based on science by publishing short stories i.e. the scientists are coming to the rescue of the 'sciencey' science fiction.

The reasons? The science community are looking for inspiration for new ideas and want to see how people might react to up and coming science so the marketeers know where to put their investment. (Which of course means that the science fiction publishing industry is not catering for a particular sector of the potential market.)

Good luck with your study - I hope you get an article published from the work you are doing.
 
Okay, I'm not anti-science. I was not telling Psikey not to use his system, although I am not sure the results will yield data useful to what he wants. What I was doing was questioning his methodology and scope of his data collection re his aims. And as a scientist, he should welcome this as it is part of the scientific method. We do not simply accept a scientist's methodology as gospel if it seems to us to be faulty in execution. That way the study can be improved.

Yes, you can use fantasy novels as a control group. That's why I called them a control group. But you could also use science fiction novels that you know will have low scores, such as Dune and Anne McCaffrey's Pern books, as a better control group, because they are science fiction and the amount of science in them is therefore an issue. The amount of science in fantasy novels is not an issue and doesn't need to be measured. If you're going to measure fantasy novels for science content, you might as well measure non-speculative mystery novels, romances and westerns for their science content. By testing fantasy novels as well as science fiction, Psikey isn't establishing a good control group. He's including oranges with his apples and creating too many variables. He's making the fantasy novels, with their predictable results, the main group he's studying instead of a control group. Low-tech science fiction and space opera, which are science fiction, would make a better control group.
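A rough Python sketch of using two control groups this way, assuming each book has already been reduced to a single science-word score: the low-science SF controls anchor one end of the scale, the hard SF controls anchor the other, and a test book is placed between them. The function name, the linear 0-to-1 scale, and the numbers in the usage block are all hypothetical placeholders, not measurements of any real book.

```python
from statistics import mean


def scale_position(score: float, low_controls: list[float], high_controls: list[float]) -> float:
    """Place a score on a scale where 0.0 is the mean of the low-science SF
    control group and 1.0 is the mean of the hard SF control group.
    Results below 0 or above 1 fall outside the span between the two means."""
    low, high = mean(low_controls), mean(high_controls)
    if high <= low:
        # If the hard SF controls don't outscore the low-science controls,
        # the word list isn't separating the groups and the scale is meaningless.
        raise ValueError("control groups do not separate; revisit the word list")
    return (score - low) / (high - low)


if __name__ == "__main__":
    # Placeholder numbers only, not measured scores.
    low_science_sf = [20.0, 30.0]  # stand-ins for books like Dune or the Pern series
    hard_sf = [120.0, 140.0]       # stand-ins for accepted hard SF
    print(scale_position(80.0, low_science_sf, hard_sf))
```

The check on the control means is the point of the exercise: if the two control groups don't come out clearly apart, the test group's scores can't be interpreted.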

Likewise, to effectively measure the science content of the stories, you need science words for the list, not words like castle and sword, which are not science terms. They have no relevance to what he is actually measuring and therefore the methodology including them wasn't very scientific.

"Hard" sciences is a common term in academia and elsewhere for natural sciences such as physics, biology and chemistry. Social sciences refers to other scientific disciplines such as political science, sociology, and psychology. These sciences are sometimes referred to as the "soft" sciences in contrast to the "hard sciences." The field of science fiction took up these terms to refer to different kinds of science fiction stories: hard science fiction for stories about physics, chemistry, biology and sometimes engineering issues, and sociological SF or "soft" SF for SF stories that dealt with political, cultural and psychological issues in reference to the future and science developments. (For instance, Ursula K. Le Guin.) These are long-standing terms in SF, and the debate about how "hard" the science is in a particular story, meaning how much it is about physics, chemistry or biology, is a central one for many fans and critics. Some fans feel that only science fiction that is heavy with the hard sciences is real SF.

Psikey is attempting to measure how much hard science content there is in SF stories -- the presence of physics, chemistry, engineering and biology material, through searching for scientific terms. Therefore the issue of how "hard" a SF story is -- how much of those things it has -- is central to what he is trying to measure. Psikey is not interested in measuring for social science content in the stories, such as sociology.

Psikey is now proposing that he remove the non-science words from the list for measuring hard science in SF stories and create a new list for measuring fantasy novels for their "fantasy density." The amount of fantasy in fantasy novels is seldom much of an issue for fantasy fans. They do not have this central issue like SF fans do. (They argue over other issues.) Words like castle and sword can't measure the fantasy density of a fantasy story as they are not fantasy words, but regular thing words. A large percentage of fantasy novels have no castles or swords. You would need words like magic, wizard, dragon, etc. that referred strictly to fantasy elements, and it would have to be a very long list. And it would be very hard to actually measure the fantasy content that way.

But there is no reason for Psikey to measure this as it is not of interest to fantasy readers or his original goal. By trying to measure both fantasy content in fantasy novels and SF novels that have no fantasy, and science content in both SF novels and in fantasy novels that have no or little science content, Psikey is again throwing in too many variables and too many search words. The methodology is weak. If he limits his word search to science terms, and if he limits his data set to SF stories, dumping the fantasy ones, he should have, theoretically, a much leaner, more efficient measuring stick for hard science content in SF stories. My suggestion has never been that he abandon his study or not analyze SF data, but that he refine his parameters to exclude fantasy novels as not relevant to his data set, hypothesis and stated aims, concentrating only on SF stories.

I was in fact being super sciency. :)
 
"Hard" sciences is a common term in academics and elsewhere to refer to physical sciences such as physics, biology and chemistry. Social sciences refers to other scientific disciplines such as political science, sociology, and psychiatry. These sciences are sometimes referred to as the "soft" sciences in reference to the "hard sciences." The field of science fiction took up these terms to refer to different kinds of science fiction stories -- hard science fiction to refer to stories about physics, chemistry, biology and sometimes engineering issues, and sociological SF or "soft" SF to refer to SF stories that dealt with political, cultural and psychological issues in reference to the future and science developments. (For instance, Ursula LeGuin.) These are long standing terms in SF, and the debate about how "hard" the science is in a particular story -- how much it is about physics, chemistry or biology -- is a central one for many fans and critics. Some fans feel that only science fiction that is heavy with the hard sciences is real SF.

Psikey is now proposing that he remove the non-science words from the list for measuring hard science in SF stories and create a new list for measuring fantasy novels for their "fantasy density." The amount of fantasy in fantasy novels is seldom much of an issue for fantasy fans. They do not have this central issue like SF fans do. (They argue over other issues.) Words like castle and sword can't measure the fantasy density of a fantasy story as they are not fantasy words, but regular thing words. A large percentage of fantasy novels have no castles or swords. You would need words like magic, wizard, dragon, etc. that referred strictly to fantasy elements, and it would have to be a very long list. And it would be very hard to actually measure the fantasy content that way.

By trying to measure both fantasy content in fantasy novels and SF novels that have no fantasy, and science content in both SF novels and in fantasy novels that have no or little science content, Psikey is again throwing in too many variables and too many search words.

I was in fact being super sciency. :)

I already said:
I am considering having the next version of the program use two different word lists, one for SF and one for fantasy.

Curiously, I decided to test some of Le Guin's works before I read that post, since she wrote both science fiction and fantasy, including her Wizard of Earthsea trilogy. I read that long ago but admittedly have not read a large amount of fantasy. But the program is the experiment, since I DO NOT KNOW what results it will turn up.

Here are the results for Lathe of Heaven from two different versions of the program. The second version gives a Fantasy density in addition to an SF density.

radiation 3
pressure 5
radio 5
science 5
experiment 6
nuclear 6
scientist 6
scientific 7
research 18
brain 27

Totalled file: UKL.LathoHeavn.txt for SF word test.
total number of words was: 44 used 144 times
total document length: 345K SF word density 0.418
total number of words was: 44 used 144 times
document length: 345K SF word density 0.415 Fant word density 0.003

===============================================

This is for Wizard of Earthsea:

Scanned file: UKL.WizardoEarthsea.txt for SF word test.

alien 2
castle 3
sword 4
language 9
magic 22
dragon 31

Totalled file: UKL.WizardoEarthsea.txt for SF word test.
total number of words was: 8 used 71 times
total document length: 327K SF word density 0.217
total number of words was: 8 used 71 times
document length: 327K SF word density 0.034 Fant word density 0.184

==================================================

Neuromancer with only the new version:

nerve 6
DNA 7
magnetic 7
laser 8
vacuum 9
brain 10
computer 15
gravity 21
virus 21
program 34

Scanned file: WG.Neuromancer.txt for SF word test.
total number of words was: 68 used 260 times
total document length: 461K SF word density 0.540 Fant word density 0.024

==================================================

The Sorcerer's Stone:

brain 2
logic 2
magical 3
computer 5
nerve 5
knight 6
dragon 18
castle 22
magic 34
wand 51

Scanned file: JKR-HP1-SorcerSton.txt for SF word test.
total number of words was: 26 used 160 times
total document length: 442K SF word density 0.362
total number of words was: 26 used 160 times
total document length: 442K SF word density 0.052 Fant word density 0.310

Notice how the old SF word density is the sum of the new SF and Fantasy word densities. But this fantasy novel gives a much higher fantasy density, which seems to be the pattern.
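The sum works out because both densities share one divisor. Judging from the printouts, density looks like total hits divided by document length in K (for example, 160 / 442 ≈ 0.362 for The Sorcerer's Stone), and it only adds up if no word sits on both lists. Here is a minimal Python sketch of that arithmetic, assuming that reading of "density". The word lists are short hypothetical stand-ins, not psik's actual lists:

```python
import re
from collections import Counter

# Hypothetical, abbreviated word lists; psik's real lists are much longer.
SF_WORDS = {"radiation", "laser", "gravity", "brain", "computer"}
FANTASY_WORDS = {"magic", "wand", "dragon", "elf", "castle"}


def count_hits(text, words):
    """Count exact (case-sensitive) whole-word matches for each listed word."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(t for t in tokens if t in words)


def densities(text):
    """Return (combined, sf, fantasy) densities as hits per K of text."""
    # Assumption: 'K' means thousands of characters; the original may use 1024.
    length_k = len(text) / 1000
    sf_hits = sum(count_hits(text, SF_WORDS).values())
    fant_hits = sum(count_hits(text, FANTASY_WORDS).values())
    combined = (sf_hits + fant_hits) / length_k  # the old single-list figure
    return combined, sf_hits / length_k, fant_hits / length_k


sample = "The wand glowed. A dragon circled the castle while the computer hummed. " * 20
combined, sf, fant = densities(sample)
print(f"old {combined:.3f} = SF {sf:.3f} + Fant {fant:.3f}")
```

With disjoint lists and a shared divisor, `combined` always equals `sf + fant` up to floating-point rounding. Putting a word on both lists would break that identity.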

====================================================

and Order of the Phoenix:

telescope 7
theory 7
thrust 11
dragon 21
elf 30
magical 35
brain 36
castle 44
magic 48
wand 266

Scanned file: JKR-HP5-OrderothPhenx.txt for SF word test.
total number of words was: 45 used 528 times
total document length: 1496K SF word density 0.353
total number of words was: 49 used 562 times
total document length: 1496K SF word density 0.070 Fant word density 0.306

Adding elf and elves to the fantasy word list had a significant effect on the results.

==================================================

So far the Fantasy density is considerably higher than the SF density for fantasy works, even though the SF word list has a much greater number of different words. Rowling uses the word "wand" fewer than 100 times in the very first book. It occurs over 400 times in Deathly Hallows. It seems to get used more and more through the series, at a faster rate than the length of the books increases.
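One way to check whether "wand" really grows faster than book length is to compare its rate per K instead of its raw count. The sketch below uses only the two Harry Potter runs posted above. Deathly Hallows is left out because its length isn't given, and `rate_per_k` is an illustrative helper, not part of psik's program:

```python
def rate_per_k(count, length_k):
    """Occurrences of one word per K of document length."""
    return count / length_k


# 'wand' counts and document lengths copied from the two runs above.
first = rate_per_k(51, 442)     # Sorcerer's Stone
fifth = rate_per_k(266, 1496)   # Order of the Phoenix

print(f"Book 1: {first:.3f} per K, Book 5: {fifth:.3f} per K")
print(f"length grew {1496 / 442:.2f}x, 'wand' count grew {266 / 51:.2f}x")
```

With these figures, the rate rises from about 0.115 to about 0.178 uses per K. Over the same span, the length grows about 3.38x and the count about 5.22x, which is consistent with the observation.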

I can only test fantasy books I know about and can get the text for. So if you are thinking about any fantasy books I have never heard of then my perspective would have to be skewed relative to yours. You are free to suggest any of course.

psik
 
But this fantasy novel gives a much higher fantasy density, which seems to be the pattern.

I am shocked, shocked to hear that a fantasy novel has a high fantasy density.

Adding elf and elves to the fantasy word list had a significant effect on the results.

ROTFLOL!

So far the Fantasy density is considerably higher than the SF density for fantasy works, even though the SF word list has a much greater number of different words. Rowling uses the word "wand" fewer than 100 times in the very first book. It occurs over 400 times in Deathly Hallows. It seems to get used more and more through the series, at a faster rate than the length of the books increases.

You are killing me here. Do some more! Do some more!

See if you can find an adult contemporary fantasy novel (that's a fantasy novel set on modern Earth or a version thereof, often involving a detective-like character) that has been published within the last ten years. It would be interesting for you to see if a contemporary fantasy for the adult market got you a different result than Bujold and LeGuin, who wrote secondary world pre-industrial fantasy novels, and Harry Potter, which is YA.* You might also want to test a horror fantasy novel, say a Stephen King title, and maybe a horror SF novel, such as zombie stories. You might want to also test World War Z, because it's a SF novel done as an oral history (social science) including covering some science/biological areas, so it would be interesting to see what score it gets. It should be low, as a sociological SF novel about a zombie outbreak, but it might skew higher than expected because of how Brooks tackles the subject matter.

On SF, you would need to test titles from the different sub-categories. Lathe of Heaven is sociological SF, so you would expect the novel to have a score of hard SF "density" in mid to low range. You might get a higher score on a military SF novel, even though it doesn't have a lot of hard science material, because it might have a lot of tech. So you could test that. And then a space opera, a hard science fiction title such as works by Greg Egan, Greg Bear, Peter Watts, etc., an alternate history SF novel and a time travel novel (say Connie Willis' work.) That would give you a wider range on the SF. And for Geo's sake, you should probably try a Neal Stephenson novel.

(*The reason the mentions of the word "wand" go up in the Harry Potter series is that Rowling has her wizards do nearly all their magic with or through wands, including battle scenes where the wand is the weapon. As the series progresses, the number of battle action scenes increases, especially in the last book of the series. Therefore the mentions of the word "wand" increase just as if it were the word "gun" in a crime thriller or a war story. This indicates the potential problems with your data gathering: is Harry Potter really more full of fantasy elements than other fantasy books, or did it just get a boost from using one particular word from the list a lot?)
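That question can be checked directly. Measure how much of the total comes from a single word, and what the density would be without it. The sketch below uses the top-ten counts posted for Order of the Phoenix (562 hits in 1496K, with elf and elves added). The helper names are hypothetical:

```python
def top_word_share(counts, total_hits):
    """Return the most frequent word and its fraction of all hits."""
    word, n = max(counts.items(), key=lambda kv: kv[1])
    return word, n / total_hits


def density_without(word_count, total_hits, length_k):
    """Density per K after removing one word's hits from the total."""
    return (total_hits - word_count) / length_k


# Top-ten counts posted for Order of the Phoenix (49 words, 562 hits, 1496K).
phoenix_top = {
    "telescope": 7, "theory": 7, "thrust": 11, "dragon": 21, "elf": 30,
    "magical": 35, "brain": 36, "castle": 44, "magic": 48, "wand": 266,
}
word, share = top_word_share(phoenix_top, total_hits=562)
print(f"'{word}' supplies {share:.0%} of all hits")
print(f"density with it: {562 / 1496:.3f}, "
      f"without it: {density_without(phoenix_top[word], 562, 1496):.3f}")
```

Here "wand" alone supplies about 47% of the 562 hits. Removing it would cut the combined density from about 0.376 to about 0.198.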

The fantasy list of words again includes words that are not fantasy words, nor are they science words. Therefore, they aren't really measuring either density. I would again suggest that you remove such words, such as castle and sword, from the list of words, and replace them with words that are clearly fantasy only, such as wizards, magic, wand, dragon, elf, dwarf, ghost, demon, angel, etc.
 
I already said:

So far the Fantasy density is considerably higher than the SF density for fantasy works, even though the SF word list has a much greater number of different words. Rowling uses the word "wand" fewer than 100 times in the very first book. It occurs over 400 times in Deathly Hallows. It seems to get used more and more through the series, at a faster rate than the length of the books increases.

I can only test fantasy books I know about and can get the text for. So if you are thinking about any fantasy books I have never heard of then my perspective would have to be skewed relative to yours. You are free to suggest any of course.

psik

Interesting this is... thank you for the insight from your studies.

If you continue with your studies, I expect you might (I emphasise MIGHT) find something else interesting. I don't want to in any way influence your work, so won't say. But I will say I have a short science fiction story doing the rounds at the moment on the very topic. I'll let you know if it ever gets published, which considering how oddball it is will be a miracle.

I look forward to the next instalment of your results - and seriously, consider trying to get an article published on this.
 
"Hard" sciences is a common term in academics and elsewhere to refer to physical sciences such as physics, biology and chemistry. Social sciences refers to other scientific disciplines such as political science, sociology, and psychiatry. These sciences are sometimes referred to as the "soft" sciences in reference to the "hard sciences." The field of science fiction took up these terms to refer to different kinds of science fiction stories -- hard science fiction to refer to stories about physics, chemistry, biology and sometimes engineering issues, and sociological SF or "soft" SF to refer to SF stories that dealt with political, cultural and psychological issues in reference to the future and science developments. (For instance, Ursula LeGuin.) These are long standing terms in SF, and the debate about how "hard" the science is in a particular story -- how much it is about physics, chemistry or biology -- is a central one for many fans and critics. Some fans feel that only science fiction that is heavy with the hard sciences is real SF.

Hard sciences used to be a fairly well-understood term. However, its meaning is shifting over time, because the (ex-)softer sciences are becoming more rigorous in their understanding of their subject matter. If I may borrow from famous people's sayings:

All Biology is Chemistry
All Chemistry is Physics
All Physics is Maths

I believe there is also one with All (one of the 'soft' sciences) is Biology - not sure which 'soft' science.

Whether the acknowledged divide in science fiction ever catches up with science is another matter.
 
I find Psik's experiment interesting. Several things bother me with it though, in a sciency way. :rolleyes:

Using the word count of a book to create the final number seems like it could lead to skewed results. Older works were shorter, while newer works could be high in science content but have their density understated because they are bloated books running 600 pages. Perhaps reporting the total number of science words along with the density would normalize for this.
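This suggestion amounts to reporting the raw hit count next to the density, rather than replacing one with the other. Here is a minimal sketch using the two Le Guin totals posted above. The `Result` class is illustrative. Since the program rounds lengths to whole K, recomputed densities can differ from the printout in the last digit:

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    hits: int        # total occurrences of list words
    length_k: int    # document length in K, as the program reports it

    @property
    def density(self) -> float:
        return self.hits / self.length_k


# Totals copied from the two Le Guin runs above.
results = [
    Result("Lathe of Heaven", 144, 345),
    Result("Left Hand of Darkness", 197, 486),
]
for r in results:
    print(f"{r.title}: {r.hits} hits, density {r.density:.3f}")
```

Left Hand of Darkness has 53 more hits than Lathe of Heaven but a slightly lower density (about 0.405 vs 0.417). A single number would hide that kind of difference.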

There is also the number of false positives: "The gravity of the situation threatened to overwhelm her." "His spirit soared like a Saturn V rocket launching into space." I'm not sure there is any possible way to account for this kind of thing; you just have to accept that there is some deviation in the results, though it would always have to be a round-down situation. At least only some words are like this, and nobody uses 'nucleotide' in a common sentence.

No matter; Psik's project seems like a first-generation experiment, and further work, tests and refinements could perhaps lead to something that could seriously be used for the express purpose of identifying hard science. Still funny that "hard SF" seems to be about the only subgenre in search of its soul, if you will.
 
I find Psik's experiment interesting. Several things bother me with it though, in a sciency way. :rolleyes:

There is also the number of false positives: "The gravity of the situation threatened to overwhelm her." "His spirit soared like a Saturn V rocket launching into space."

That is why I made a version of the program to ignore a word if it is only used once.
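The code for this version isn't shown, but the idea can be sketched as a minimum-count threshold applied after counting. The helper and its `min_uses` parameter are hypothetical. The sample counts are a few taken from the Left Hand of Darkness list:

```python
from collections import Counter


def drop_single_uses(counts, min_uses=2):
    """Keep only words that occur at least min_uses times."""
    return Counter({w: n for w, n in counts.items() if n >= min_uses})


# A few of the Left Hand of Darkness counts posted in this thread.
sample = Counter({"aluminum": 1, "quantum": 1, "orbit": 6, "radio": 33})
kept = drop_single_uses(sample)
print(dict(kept), sum(kept.values()))
```

Applied to the full Left Hand of Darkness list, this would drop the 28 words that appear once. That leaves 169 of 197 hits, or about 0.348 per K. The threshold also discards genuine single uses, such as a lone "quantum", so it trades some real hits for fewer false positives.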

psik
 
Hard sciences used to be a fairly well-understood term. However, its meaning is shifting over time, because the (ex-)softer sciences are becoming more rigorous in their understanding of their subject matter.

I'm not particularly sure that the social sciences used to be less rigorous in their studies than they are currently, except perhaps for archeology very early on. Tools and technology have developed for the social sciences as well as for the natural sciences. Certainly some of the social sciences are directly related to biology -- sociology and psychology. However academia tends to separate them somewhat as two different but related disciplines, both of them science.

Sociological SF as the term developed, however, is not directly correlated to the social sciences. It's a broader category. Put simply, it consists of science fiction stories that concentrate not necessarily on problems for social scientists, but on cultural, political and individual psychology issues. These issues are often related to, directly created by or affected by technology and natural science factors, which may have a role in the story to one degree or another. But the main focus of the story is the societal issues, not studying scientific phenomena and solving the problems they raise. Hard SF stories may also look at cultural and political issues, but those are not the main focus of the story.

So for example Flowers for Algernon is about a biological/psychological experiment in raising the I.Q. of a mentally disabled young man. A hard SF story would focus on the drug they used to increase the intelligence of the man and on them dealing with the problems that develop with the experiment. Instead, the story focuses on the man's experience, the individual psychology of his intelligence and world view changing, with commentary on our society. It's a sociological SF story. Ursula LeGuin's The Left Hand of Darkness is about a planet which only has the season of winter and its inhabitants have no gender. A hard SF novel would be focused on these issues and dealing with problems or discoveries in the planet's biology and climate/physics. Instead, the novel is sociological SF -- it is concerned with a diplomat who has to deal with the politics and culture of this society without dualities in order to solve a political problem/goal. It's all related, but the focus is different -- one is the data of the natural sciences and the other is cultural data and individual psychological experience. A hard SF novel may have a lot about the psychology of its characters and some political and social issues, but the main focus is on dealing with an exploding sun or fixing a rocket ship or unraveling the riddle of alien biology.

So the terms simply developed to be shortcuts communicating what the general focus of the story was, what kind of science would have dominance. But while hard SF stories concentrate on natural science issues, the amount of "tech" language they use about it -- how the science is presented -- varies. I'm not sure that Psikey's system can actually measure how "hard" a hard SF story may be. It might be able to measure how much natural science terminology there is in a sociological SF story compared to another sociological SF story. And I find the methodology completely problematic for fantasy novels re random fantasy words and continue to suggest that all the fantasy material be jettisoned from the data set in favor of a narrower set of science fiction with fewer variables, measuring one main factor, that would get more accurate results.

Mylinar said:
Still funny that "hard SF" seems to be about the only subgenre in search of its soul, if you will.

It's not really in search of its soul because it doesn't have one. It's just a general label. But there is a contingent of fans who want anything not hard SF jettisoned from being considered part of the SF field. Therefore, for that goal, the issue of who gets to stay in the hard SF pool and who does not, what gets to be called science fiction or not, is very important and leads to arguments with other fans. Ultimately, the term will be developed or discarded to the degree that it is considered useful by enough people. There continues to be a deep emotional belief that hard SF is constantly under threat by other types of science fiction stories, and this is why hard SF always becomes the issue of SF sub-genre discussions.

Psikey, however, is trying to measure the presence of natural sciences as a reading guide factor. You might read a hard SF story or a sociological SF one, but you'll know how much natural science is in it before going in, theoretically.
 
Ursula LeGuin's The Left Hand of Darkness is about a planet which only has the season of winter and its inhabitants have no gender. A hard SF novel would be focused on these issues and dealing with problems or discoveries in the planet's biology and climate/physics. Instead, the novel is sociological SF -- it is concerned with a diplomat who has to deal with the politics and culture of this society without dualities in order to solve a political problem/goal.

Here is the data on Left Hand of Darkness:


Scanned file: UKL.LeftHandoDarkness.txt for SF word test.

aluminum 1        atmosphere 1
biological 1      chemical 1
circuit 1         computer 1
computers 1       dimension 1
ecliptic 1        ecology 1
engineer 1        experimental 1
fusion 1          genetic 1
hydrogen 1        hypothesis 1
logic 1           wavelength 1
magnetic 1        mammal 1
metabolic 1       meteor 1
mutation 1        quantum 1
symptoms 1        technology 1
vacuum 1          sword 1
engineers 2       psychology 2
radiation 2       satellite 2
singularity 2     technological 2
theory 2          thrust 2
castle 2          brain 3
physiology 3      psychological 3
relay 3           scientist 3
astronomy 4       scientific 4

experiment 5
physiological 5
pressure 5
orbit 6
science 8
solar 8
electric 12
language 13
planet 16
alien 17
radio 33

Totalled file: UKL.LeftHandoDarkness.txt for SF word test.
total number of words was: 57 used 197 times
total document length: 486K SF word density 0.405
total number of words was: 57 used 197 times
total document length: 486K SF word density 0.399 Fant word density 0.006

I seem to be accused of being an advocate of hard science fiction. Psychology is just as much a science as physics. Check out Hell's Pavement by Damon Knight.

psik
 
Psikey -- You think you're being accused of that because you are not reading closely enough. I was explaining to Mylinar why there seems to be a battle over what is hard SF in the SF field in general. I was talking about the history of these debates in the field regarding the term hard SF. I then said that you, however, are doing something different than that debate and simply attempting to measure science content. So I was saying that you should not be accused of that, actually.

However, while you may be including some psychology words in reference to biology, I'm not sure you're trying to measure for psychological material, and certainly not for political science material (social sciences). There are words on the list you showed for The Left Hand of Darkness that would seem likely to skew your results for scientific material.

Aluminum -- an element, yes, but also a commonly used metal. So it could very well be used more than once in a contemporary fantasy story.

Electric, radio -- common words in our current society. So again, a contemporary fantasy novel would skew higher but you'd only be measuring what is perfectly clear already -- that it's a post-industrial setting.

Logic, theory -- words used routinely for non-scientific purposes, so again would skew results.

Castle, sword, language -- again, words that are neither science words nor fantasy words.

So there is a question of what your criteria are for selecting the vocabulary words. As you saw, adding one or two words significantly changed the results. Are you really measuring science content? That's the big question for the system. It's a more complicated thing to measure than the syllables, pronunciation and word length used to gauge reading levels.
 
Aluminum -- an element, yes, but also a commonly used metal. So it could very well be used more than once in a contemporary fantasy story.

It is so common it is not mentioned at all unless it is important. Most works that do mention it only have it once or twice.

But Jules Verne's From the Earth to the Moon has it 9 times.

14 times in Tom Godwin's Space Prison.

That does not count any uses where the word starts with a capital A. There is a flaw in the program, which I have not tried to fix yet, that must cause undercounts: if a word is not all lowercase, it is not found, except for planets like Mars and Venus. So I am undercounting a bit. But since all works will be undercounted, I doubt that it affects relative comparisons much.
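A common fix for the capitalization undercount is to lowercase both the text and the word list before matching. The sketch below assumes a simple letters-only tokenizer, so "aluminum's" still yields "aluminum". The trade-off is that proper nouns like "Mars" will now also match the lowercase verb "mars":

```python
import re
from collections import Counter


def count_words(text, word_list):
    """Case-insensitive whole-word counts for each word in word_list."""
    wanted = {w.lower() for w in word_list}
    # Lowercase the text, then split on anything that is not a letter a-z.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in wanted)


text = "Aluminum hulls. The aluminum gleamed. ALUMINUM everywhere. Mars rose."
counts = count_words(text, ["aluminum", "Mars"])
print(counts)
```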

I should add "werewolf" to the list. I should also redesign the program so I don't have to recompile when I change the list, and have it read the list from a file.
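Reading the lists at run time could look like the sketch below. It expects one word per line and ignores blank lines and `#` comments. The file format and the name `fantasy_words.txt` are assumptions for illustration:

```python
import tempfile
from pathlib import Path


def load_word_list(path):
    """Read one word per line; skip blank lines and lines starting with '#'."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words


# Demo: write a small hypothetical list to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "fantasy_words.txt"
    list_file.write_text("# fantasy list\nmagic\nwand\n\nwerewolf\n", encoding="utf-8")
    demo_words = load_word_list(list_file)

print(sorted(demo_words))
```

Changing a list then means editing a text file, with no recompile.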

psik
 
I was wondering if your analysis program could be expanded, or a different version created, that would use phrases as opposed to single words. I was thinking that using this on the same work might provide some additional insight to the hard science content.

For example, from a previous post of mine, "Gravity" and "Gravity well". The latter almost certainly would reflect hard science content.

However, creating these phrases would seem to be a non-trivial task because there could be many interesting combinations of phrases with just Gravity alone.
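Phrase matching can be sketched with a regular expression that allows any whitespace, including a line break, between the words of a phrase. The sketch makes one assumed design choice: it subtracts phrase hits from the single-word count, so that "gravity well" is not also scored as plain "gravity". Idioms like "the gravity of the situation" are still counted, so this does not remove false positives on its own:

```python
import re


def count_phrase(text, phrase):
    """Case-insensitive count of a word or phrase, matched on whole words."""
    parts = [re.escape(w) for w in phrase.lower().split()]
    pattern = r"\b" + r"\s+".join(parts) + r"\b"
    return len(re.findall(pattern, text.lower()))


text = ("The gravity of the situation was clear. They fell down the gravity well. "
        "Another Gravity\nWell loomed.")
all_gravity = count_phrase(text, "gravity")
gravity_well = count_phrase(text, "gravity well")
plain_gravity = all_gravity - gravity_well  # uses not already scored as the phrase
print(all_gravity, gravity_well, plain_gravity)
```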

Even if nothing ever truly comes of this program, I find it interesting nonetheless. I am a scientist wannabe.
I did get a Computer Science degree, so technically I did achieve my childhood dream, but not in the way I foresaw.
 
