From Dear Author:
When readers talk to other readers about books, they speak in a language of tropes, character types, and hot button issues. They ask for spoilers and in depth details. They want to know if a character in book B is like the character in book A that they love.
A quick run down of forum topics at a popular romance gathering includes:
- Fight club
- Shy awkward / alpha
- Books where heroine almost dies
When I gather books for the DA New Releases site, Calibre pulls down the metadata for a book including the “category” a book is placed in. Categories, generally speaking, are the BISAC codes assigned by the publisher (whether it be a publishing house or author). A BISAC is a classification system that “BISG develops and maintains a number of classification systems for both physical and digital products.
. . . .
Online discovery isn’t happening at the retailers, like it happens in the bookstore, because retailer sites aren’t set up for discovery beyond the front page. BN’s romance page has featured 50 Shades above the scroll for months. Scrolling down, you get horizontal scrollbars for things like “Coming Soon”, “Bestsellers”, “New releases” but a quick scroll through both and you begin to see repetitive titles. This is not hand curated like a table in the retail store (or if it is, the curation is poor because the constant repetitive nature of the titles reduce visibility of other titles). Further, little information is imparted about the book unless the reader can guess from the title and the cover exactly what the genre or subgenre is on it. No wonder romance cover artists rely so heavily on the naked chest. BN offers no advanced search function.
Amazon hews closely to the BISAC codes, as does BN. Amazon does offer an advanced search wherein you can filter by subject matter and keyword as well as publisher and date published. However, how many users a) realize that it is available and b) can figure out how to use it?
. . . .
None of the three major retailers allow you to exclude titles. For instance, maybe I want to see all contemporary romances but none with a title of billionaire. But these complaints address just the existing flawed search functions.
Beyond how rudimentary and unhelpful the search features are at these retailers is the fact that the search terms are designed to speak to readers. Amazon has tried to address this by allowing readers to add “tags” to books but the tag feature has been sorely abused. Many of the books at AllRomance have no tags either.
Link to the rest at Dear Author and thanks to Matthew for the tip.
Passive Guy comes to computer-based searching with past work experience at LexisNexis, the company which pioneered computerized legal research and, later, computerized business and political information research.
For legal research, you can’t miss anything. Even a single court opinion can blow up your case or, if used well, blow up the opposing party’s case. On the other hand, a search that generates a list of 200 case opinions averaging 15 pages each doesn’t help because you’re swamped with case opinions that are not really relevant to the legal issue you’re looking at.
If you know what you’re doing, you can use the Lexis search engine with great precision to pull needles from haystacks composed of collections of cases and statutes or millions of academic papers. It’s miles more powerful than Google. Unfortunately, it’s also very expensive to use and PG was sad when his Lexis password eventually expired several years after he left the company.
PG thinks that one of Amazon’s big advantages over other etailers is more sophisticated methods of discovery. That said, he agrees with Dear Author that, given the range of possibilities for computer-based searching, Amazon is far from ideal for discovery. He also thinks that BISAC codes are primitive holdovers from a mainframe world that work much better for bookstores than they do for readers.
Given the rapid proliferation of new genres and sub-genres, rigid categories such as BISAC are always going to be outdated and of marginal value. A category of 200,000 books is essentially useless for a reader. She usually wants a category of something like 25 books she hasn’t already read. And the 25 best-selling romance books are not the kind of subjective category that is useful for most romance readers. Romance readers recognize many more categories of books than BISAC or Amazon do.
If anybody at Amazon asked PG for advice (they haven’t and he doesn’t expect them to do so), he would tell them to develop their search function to be much more dynamic, flexible and accessible to customers.
Amazon has the data to provide Frequently Bought Together and Customers Who Bought This Item Also Bought information for a particular book and should expose more levers to allow customers to use these tools. If a customer is so inclined, allow him/her to perfect a search or multiple searches that bring up the kind of books he/she wants.
At LexisNexis, gobs of legal case opinions swarm into the computers every day. To gather cases into useful libraries (criminal law, real estate law, entertainment law, etc.), the computers perform content analysis and automatically sort the opinions into many different libraries. Then, an attorney can limit searches to opinions that reside in the real estate library without seeing irrelevant criminal cases. As the law changes, the case-sorting algorithms change.
Another way of helping an Amazon customer to find books to purchase would be to permit the customer to list his/her 25 favorite books, then ask Amazon’s search engine to apply the Frequently Bought Together and Customers Who Bought This Item Also Bought data to those books and generate a list of five or ten or twenty-five other books that are most similar to the 25 favorites. Give the customer a lever that allows him/her to limit the list to books published in the last two years or five years to remove moldy oldies.
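As a rough illustration of the favorites idea, and not a description of anything Amazon actually runs, the sketch below counts how many of a customer's favorites each candidate book was co-purchased with, then applies the publication-year lever. The function name `recommend`, the `also_bought` mapping, and the `pub_year` mapping are all hypothetical stand-ins for data only the retailer would have.

```python
from collections import Counter


def recommend(favorites, also_bought, pub_year, limit=10, min_year=None):
    """Rank books by how many of the customer's favorites they were bought with.

    favorites   -- set of titles the customer listed (hypothetical input)
    also_bought -- dict: title -> list of co-purchased titles (assumed data)
    pub_year    -- dict: title -> publication year (assumed data)
    min_year    -- optional cutoff that drops older books
    """
    scores = Counter()
    for fav in favorites:
        for other in also_bought.get(fav, ()):
            if other not in favorites:  # never recommend a listed favorite
                scores[other] += 1

    # Highest co-purchase count first; ties broken alphabetically for stability.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    results = [
        title
        for title, _ in ranked
        if min_year is None or pub_year.get(title, 0) >= min_year
    ]
    return results[:limit]


favorites = {"A", "B", "C"}
also_bought = {"A": ["X", "Y"], "B": ["X", "Z"], "C": ["X", "Y", "A"]}
pub_year = {"X": 2012, "Y": 2005, "Z": 2011}

print(recommend(favorites, also_bought, pub_year))
print(recommend(favorites, also_bought, pub_year, min_year=2010))
```

A real system would presumably weight the co-purchase data rather than simply counting it, but even a plain count shows how a list of favorites can be turned into a short, filtered list of suggestions.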
Then allow the customer to save that search and ask Amazon to send an email that includes updated search results every couple of weeks. Since Amazon knows what books the customer has purchased, screen out books the customer already owns from the search results and give the customer a button to indicate that she’s already read a recommended book. The fact that a customer already owns or has already read a book is an important fact that should be rolled into the suggestion algorithm. If a customer buys a book from the recommended list, that fact similarly provides information for the suggestion algorithm. Amazon could also include a button the customer could click on for a listed book that was definitely not what the customer was looking for and that negative rating would be further information for the suggestion algorithm.
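The screening step could be sketched along these lines. This is a minimal illustration under stated assumptions: `ReaderFeedback` and `refresh_saved_search` are hypothetical names, and the three sets stand in for purchase history, the "already read" button, and the "not what I was looking for" button described above.

```python
from dataclasses import dataclass, field


@dataclass
class ReaderFeedback:
    """Signals one customer has given about books (hypothetical structure)."""
    owned: set = field(default_factory=set)         # known from purchase history
    already_read: set = field(default_factory=set)  # "already read" button
    not_for_me: set = field(default_factory=set)    # negative-rating button

    def hidden(self):
        """Every title that should never be suggested again."""
        return self.owned | self.already_read | self.not_for_me


def refresh_saved_search(candidates, feedback):
    """Re-run a saved search, keeping order but dropping hidden titles."""
    hidden = feedback.hidden()
    return [title for title in candidates if title not in hidden]


feedback = ReaderFeedback(owned={"X"})
feedback.not_for_me.add("Z")
print(refresh_saved_search(["X", "Y", "Z", "W"], feedback))
```

The sketch only filters. Feeding the same signals back into the ranking itself, as suggested above, would be a further step.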
If PG were playing with Amazon’s data, he would take advantage of the fact that the company has the full text of every ebook it sells sitting on its hard drives and use artificial intelligence techniques to derive data from that text. For non-fiction books, full-text search could be valuable to customers on its own as another way to sell them more books – allow a customer to perform a search of the text of all information systems books for the terms, “denial of service attacks” and “foreign intelligence services” then provide a list of books that include those terms.
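In its simplest form, that kind of full-text search is just a check that every requested phrase appears somewhere in a book's text. The sketch below assumes a hypothetical `library` dictionary mapping titles to full text; a production system would use an index rather than scanning every book on each query.

```python
def normalize(text):
    """Lowercase and collapse whitespace so line breaks don't split phrases."""
    return " ".join(text.lower().split())


def books_containing(phrases, library):
    """Return titles whose text contains every phrase (case-insensitive).

    library -- dict: title -> full text (hypothetical stand-in for ebook files)
    """
    wanted = [normalize(p) for p in phrases]
    return sorted(
        title
        for title, text in library.items()
        if all(p in normalize(text) for p in wanted)
    )


library = {
    "Network Defense": "A chapter on denial of service\nattacks launched by "
                       "foreign intelligence services.",
    "Intro to SQL": "Joins and indexes.",
}
print(books_containing(
    ["denial of service attacks", "foreign intelligence services"], library
))
```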
However, full-text search is old stuff that LexisNexis and its competitors have been doing for ages. It would be much more interesting to use full-text to conduct content analysis of fiction books to help group similar books together into new genres regardless of BISAC categories or author-provided tags.
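One very simple stand-in for content analysis is to compare books by the words they use. The sketch below is an assumption-laden illustration, not what LexisNexis or Amazon does: it builds word counts for each text and uses cosine similarity to find the closest match, with the hypothetical `texts` dictionary holding a few words in place of full novels.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words in a text (a crude bag-of-words model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(title, texts):
    """Return the other title whose text is closest to the given one."""
    target = word_counts(texts[title])
    scored = [
        (cosine_similarity(target, word_counts(text)), other)
        for other, text in texts.items()
        if other != title
    ]
    return max(scored)[1] if scored else None


texts = {
    "A": "viking vampire longship blood",
    "B": "vampire blood viking raid",
    "C": "ballroom estate marriage proposal",
}
print(most_similar("A", texts))
```

Grouping books into new genres would take a clustering step on top of a similarity measure like this one, and real text would need far more careful handling of common words.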
Mix sales data in with content analysis and you’ll make it more likely that customers will find books they’ll enjoy – Viking vampire romances or Jane Austen look-alikes. As an additional benefit, this kind of information would give Amazon unparalleled insight into emerging book trends. Among other beneficiaries, the folks at Amazon Publishing could use this kind of information for acquisition strategies.
PG will stop now because he knows very few people get turned on by the many possibilities presented by mining large data sets or enjoy discussions of computerized search and artificial intelligence.