What degree of accuracy and certainty do you think I am trying to get?
I haven't a clue.
This is about probability.
Probability of what?
But I have already found that the words I search on significantly affect the results.
Well yes, but do the results really show hard SF in the story, or just sciency words that could be used any which way? And then there are the tricky ones, like cell, that mean more than one thing, and not all of those meanings are about science. I'm just not certain how accurately the scores you are generating reflect actual hard science content in SF.
then added the word "castle" to the list. That word is totally irrelevant to most science fiction.
So why would you add it to the list of science words you're testing for, which are supposed to be the relevant words of science fiction? The idea is to measure the amount of hard science content, no? The presence of the word castle in a story is indeed irrelevant.
But it turns up more than 50 times in Bujold's fantasy works, Curse of Chalion and Paladin of Souls.
Yeesss, because she's writing about knights in her fantasy fiction. Knights and castles kind of go together. Paladin is kind of a clue there. The question remains: why are you analyzing Bujold's fantasy novels, set in a medieval secondary world, when you already know they aren't science fiction novels and therefore should not be in your data set at all? It's like saying, "I'm going to measure the hair length of dogs. And now I will put cats in my data set and lo, my analysis is that these things are indeed cats and not dogs, and so their hair length is irrelevant to my study."
So I figure specifying a number is not enough. The prospective reader should be told the 5 or 10 most commonly used words.
The five or ten most commonly used words in fiction: and, the, I, we, you, but, or, was, said, a, an. Also me, they, their, your, and my. You are not analyzing which words are used most often in a work of fiction; you're trying to determine how often science words are used in it. So again, "castle" is not a science word. Quantum, planet, gravity, etc. are science words.
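A minimal Python sketch of that distinction, assuming a hypothetical hand-picked science word list (`SCIENCE_WORDS` is made up for illustration, not Psikey's actual list) and a toy sample sentence: ranking all words surfaces function words, while counting only words on the list surfaces the science vocabulary.

```python
import re
from collections import Counter

# Hypothetical science vocabulary; the real word list is an assumption.
SCIENCE_WORDS = {"quantum", "planet", "gravity", "orbit", "fusion"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=5):
    """The n most frequent words overall (mostly function words in fiction)."""
    return [w for w, _ in Counter(tokenize(text)).most_common(n)]


def top_science_words(text, n=5):
    """The n most frequent words that are on the science list, with counts."""
    return Counter(w for w in tokenize(text) if w in SCIENCE_WORDS).most_common(n)


sample = ("She said that the ship was in orbit and that the planet was cold, "
          "and she said the gravity was low.")
print(top_words(sample))          # ['the', 'was', 'she', 'said', 'that']
print(top_science_words(sample))  # [('orbit', 1), ('planet', 1), ('gravity', 1)]
```

A word like "castle" contributes nothing here unless someone puts it on the list in the first place.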
Since Bradbury said his Martian Chronicles is not science fiction
And Margaret Atwood didn't think the science fiction novels she wrote were science fiction novels either, even though they were. She wanted to call them speculative novels until enough people gave her grief that she threw up her hands. Martian Chronicles are science fiction stories, but they aren't hard SF, which is what most of Bradbury's buddies were writing at the time. Martian Chronicles would get a low score for sciency words, which would tell prospective readers that it is not hard SF (although really, I think anyone could figure that out from the brief description of the collection). Only a small percentage of stories would really need some sort of numerical score to determine whether they are hard SF: the books that come close to hard SF but are lighter on the science.
I am going to compare it to the Mars Trilogy by Robinson which everyone agrees is hard sci-fi to the point that many accuse it of being boring.
Why? Everyone knows that Martian Chronicles is not hard SF, and everyone knows that Robinson's terraforming stories qualify as hard SF (although just barely, since some of his science, as we've discussed in the past, is not that great, and he gets more obsessed with the political and sociological philosophy of Mars as an emerging nation than with the hard science terraforming stuff). So comparing them on that front seems rather fruitless. Robinson's books should have a fairly high score on sciency words and Bradbury's stories collectively a low one. They are just points on a spectrum, not related.
Data does not have to be precise and certain to be useful.
Yes, but data that only tells you what you already know isn't particularly useful to collect in a different form. You already know that fantasy novels are not hard SF. You already know that Martian Chronicles is not hard SF. You know that Bujold's Miles novels are not hard SF. You know that William Gibson doesn't write hard SF. The ones that are science fiction, though not hard SF, you can certainly test to see if the scale works. But what would potentially be more useful is to run stories you are not certain about through the analysis of sciency words (and not fantasy words) to see whether they come out as hard SF or not.
Basically, you're using your control groups (works that are definitely not hard SF and works that definitely are hard SF) as your main test group instead of as your control groups.
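Roughly, that control-group setup could be sketched in Python as follows. Everything in it is an assumption for illustration: the hypothetical `SCIENCE_WORDS` list, the hits-per-1,000-words score, the midpoint cutoff, and the toy one-line "texts" standing in for whole books. The controls are used only to check that the scale separates known hard SF from known non-hard SF and to set a cutoff; the uncertain works are what actually get classified.

```python
import re
from statistics import mean

# Hypothetical science vocabulary (an assumption, not Psikey's actual list).
SCIENCE_WORDS = {"quantum", "planet", "gravity", "orbit", "fusion", "reactor"}


def science_density(text):
    """Science-word hits per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in SCIENCE_WORDS)
    return 1000 * hits / len(words)


def calibrate(hard_controls, soft_controls):
    """Check the control groups separate, then return a cutoff halfway between their means."""
    hard = [science_density(t) for t in hard_controls]
    soft = [science_density(t) for t in soft_controls]
    if min(hard) <= max(soft):
        # If known hard SF doesn't outscore known non-hard SF, the scale isn't working.
        raise ValueError("control groups overlap; the word list needs work")
    return (mean(hard) + mean(soft)) / 2


def classify(text, cutoff):
    """Label an uncertain test work against the calibrated cutoff."""
    return "hard SF" if science_density(text) >= cutoff else "not hard SF"


# Toy stand-ins for whole books (hypothetical, for illustration only).
hard_controls = ["the fusion reactor held the ship in orbit",
                 "gravity on the planet bent the orbit"]
soft_controls = ["the knight rode to the castle at dawn",
                 "she walked home in the rain"]

cutoff = calibrate(hard_controls, soft_controls)
print(classify("the reactor failed as they left orbit", cutoff))  # hard SF
```

The midpoint between group means is just one simple choice of cutoff; with real books you'd want several control works per group and a look at how widely their scores spread.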
phil geo said:
In Snow Crash he predicted Massively Multiplayer Online Games, including technical problems like object collision, for a branch of software that hadn't even been coded yet. He also predicted online real estate, and about 40 other things related to online gaming. Snow Crash also predicts DRM, modern cell phones, and Google Glass. Matter compilers in The Diamond Age are advanced 3D printers. Mediatronic paper is an advanced Kindle. Neither of those devices existed in 1995. The science for all the nanodevices in Diamond Age was there in 1995 (I actually worked on a few); the only issue is that they still don't have a power supply that small and powerful. The list goes on.
Ah, no, he didn't "predict" them. Multiplayer online games began in the seventies, running on limited networks. The first commercial massively multiplayer online game (although "massive" then obviously meant much smaller groups) appeared on CompuServe in the mid-1980's. As computer and video games developed, and as the Web became a viable, widespread network in the late eighties and early nineties, it was widely expected that larger multiplayer online games would follow. It was not a new concept. Likewise, virtual reality and simulated life were long-established ideas in SF well before anything close to VR was invented, much less the online gaming industry. (See the movie The Lawnmower Man, adapted from Stephen King's work, among many others.) Encryption DRM was a significant issue once computers could transmit data files commercially. An early version of DRM was the Software Service System in the 1980's. And the cellphone claim is just silly. SF writers have been doing versions of cellphones, smart phones and tablets since the 1950's at least. Same with Google Glass. The appeal of Google Glass is in fact that it looks like something early SF writers used to envision, since they had all sorts of tech spectacle inventions.
Electronic paper of various kinds was invented in the 1970's-1990's; it had long been a development goal in the science/tech field and often appeared in science fiction. E-reader devices using CD-ROM's were commercially developed in the 1980's. They did not succeed as a market because the tech companies were largely uninterested in developing them. The development of the Web led to the move from CD-ROM's to smaller drives with data files and online transmission. (Large publishers had all developed CD-ROM publishing divisions in the late 1980's and had to sell off or disband them by the early 1990's.) The techniques for better electronic paper/ink were expected to develop (the world was supposed to go paperless in the 1980's), and they did in the late 1990's, leading eventually to Sony putting out its e-reader, as you know, in the early oughts for general commercial use.
Replication printing was a long-time goal of science fiction. (See Star Trek's replicator.) Early 3D printing efforts were made in the late 1970's and the 1980's. What's happening now is that 3D printing devices are getting smaller and more adept at more kinds of creations, and are poised, possibly, for wider commercial use, which the tech industry has wished for for quite a while. And nanotechnology was the favorite toy of science fiction writers in the 1990's, and not always for hard SF stories.
So no, he didn't "predict" these things. You cannot say that an SF author predicted things that many other SF writers, scientists, and engineers had already been predicting (and, in the latter case, developing) for years, including before he wrote the novel. Some of the best predictors were SF writers who were respected at the time (1930's-1970's) but have not been remembered as well as the bigger stars in the field. (Like some of the ones Psikey spits out links to.) Some of the really good predictors weren't hard SF writers at all: Bradbury, who is credited with the concept of the robotic house and wall televisions, and Aldous Huxley in Brave New World. Jules Verne has a frightening record from the nineteenth century. Clarke, though, had a lot of advantages because of his own scientific work. He is considered to have predicted orbital satellite communications, space elevators, etc.
Now, is there some particular detail in Snow Crash or Diamond Age that did develop in, say, gaming? Possibly. But I'm not aware of him being credited with anything major. Has he inspired designers in tech? Very possibly; that's not uncommon with SF writers (as we know, Gibson did quite a lot of that, for instance). But he simply isn't regarded in the field as a hard SF guru. I am comfortable with calling Diamond Age hard SF cyberpunk; others are not, but that's certainly the one where he went into some detail, albeit he skipped a lot. Anathem has a lot of interesting quantum stuff and neurology related to the idea of the mind as a quantum device, which he got from science theorists, but a lot of the material in the book is just regurgitated basic maths and physics, and some of the science-backed action is logistically dicey. And in that book, he's obsessed with Plato and religion. I haven't read Reamde; maybe he does hard stuff there, but it sounds more like VR adventure (which is definitely one of his wheelhouses).
I think as a tech and science writer outside of SF, Stephenson very much has his finger on the pulse of what's going on out there and often synthesizes it well. He's got a good eye for cultural stuff, such as how we might react to tech, and he's interested in human evolution and neurology. He's good at combining alternate history/history with science concepts in some of his work. He's perfectly capable of writing hard SF, but most of his work just hasn't gone in that direction. And a writer like Bruce Sterling probably has a better track record than Stephenson of predicting things and coining terms and concepts out of cyberpunk.
But it would be interesting to see what scores Stephenson's books get through Psikey's vocabulary exercise. Stephenson's works are the kind of books that would belong in the main test group of what Psikey is doing. You could compare them against the hard SF control group to see if the scale is working.