A new look at site search


Introducing indexisto.com: search for websites and mobile applications.
The project is in alpha, so please be understanding (and poke it gently). The test results currently use English-language content on musical topics. We also really need early adopters; if the search interests you, write a PM.



Chronicle

The story began a couple of years ago when I moved from Windows to Ubuntu, and then continued with a move to Mac. Such a switch could give rise to a dozen stories, but mine is just one: I suddenly started using the operating system's search as my primary navigation tool.

In both systems the search is deeply integrated, organized into categories (files, programs, ...), very fast, and has some nice features, such as taking previously entered queries into account in the results. Over time, the search learned to understand me from the first letter typed.

I also began to notice many other scenarios where search is a real time saver: searching for "settings" in Chrome, searching for contacts in Skype, jumping to people through search on Facebook, URL suggestions in the Firefox address bar that take into account how often you visit each site...

At the same time, the state of search on websites is depressing 99% of the time. It seems that no one takes the search box seriously or spends time thinking about it. Yes, yes, on Habr too.
That is where it all started )

Preparation

Having assembled a team of like-minded people, we decided that the situation with the "dead" search box on websites could be radically changed )

I started by looking for examples of good and bad search. Whenever I visited a new website, the first thing I looked at was how its search worked. In the end, after about half a year, requirements for the search box began to emerge, influenced by Windows 8 (pleasant feel), new.myspace.com (boldness), Vkontakte (speed and local search throughout the service), and many other smaller examples.

Example of a "revolutionary" search that overlays the main screen on new.myspace.com



Requirements from the user

These are the requirements a user has for search:
  • speed: we are talking about tens of milliseconds from query to result
  • a minimum of extra clicks: instant search and jumping to the desired item straight from the drop-down results
  • convenience: if a person has made it to the search box, there is no need to squeeze them into a cramped input 100px wide
  • the ability to quickly set up two search boxes per page: one global across the whole site, the second for the current section
  • additional search parameters: categories, facets (tags), sorting
  • smart search: it should remember what the person searched for before, where other people click in the results, and so on
  • very smart search: the ability to handle "semi-semantic" queries such as "big red sofa"

I think we have coped with the first points; the last two are still in progress )

Requirements on the part of the programmer/administrator:

It must be noted that most site owners are not enthusiastic about search and allocate programmers' time to it on a leftover basis.
  • integration like Google Site Search: insert some JS and it works. Despite the existence of high-level search servers such as Solr and Sphinx, even a simple setup takes time, not to mention the many lovely options with names like dis_max, tie_breaker, cutoff_frequency, slop, etc.
  • less digging in the console, reading logs, and catching slow queries
  • if the manager asks "so what are people searching for?", you should not have to scramble to build homemade statistics
  • no duplicate work when the task comes in to build a couple more searches

Not all of these points have been achieved. In particular, our search is harder to set up than Google Site Search, but easier than Solr or Sphinx.

In the end, http://indexisto.com was born.

What is Indexisto?

  • Full-text search in the cloud. The project is built on Lucene and Elasticsearch and is written entirely in Java.
  • No need to install, configure, and monitor a full-text search server such as Sphinx or Solr.
  • Data is imported directly from your database. For example, a PHP agent that we provide executes queries such as SELECT title,body FROM posts... In your database you should definitely create a user with read-only rights, and only on specific tables. Each request is signed with a private key.
  • A ready-made, instant JS search box with lots of features (widgets, facets, histograms, sorting). It is inserted asynchronously and weighs 50 KB.
  • Images are downloaded and compressed automatically. You can then insert them into the results template.
  • A simple admin panel where you write the data-extraction queries and configure the search box and the search queries
  • Search reports, logs, import reports
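
The signed import request can be pictured with a short sketch. The article does not say which signature scheme is used, so this example swaps in HMAC-SHA256 with a shared secret as a stand-in. The names `sign_request`, `verify_request`, and `SHARED_SECRET` are hypothetical and are not part of Indexisto.

```python
import hashlib
import hmac

# Hypothetical shared secret (an assumption for this sketch);
# a real deployment would keep it outside the code.
SHARED_SECRET = b"example-secret"


def sign_request(query: str, secret: bytes = SHARED_SECRET) -> str:
    """Return a hex HMAC-SHA256 signature for the query text."""
    return hmac.new(secret, query.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(query: str, signature: str, secret: bytes = SHARED_SECRET) -> bool:
    """Accept the query only if the signature matches (constant-time compare)."""
    expected = sign_request(query, secret)
    return hmac.compare_digest(expected, signature)


query = "SELECT title, body FROM posts"
signature = sign_request(query)
print(verify_request(query, signature))               # True
print(verify_request("DROP TABLE posts", signature))  # False
```

The point of the check is that the agent on the site's side runs only queries that arrive with a valid signature, which complements the read-only database user.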


Indexisto admin panel:



Indexisto is now a cloud full-text search with a user-friendly admin panel. Along the way we solved many problems to make life easier for the administrator. For example, you can configure and experiment with search results in the admin interface, but the changes only go live on the search box after you press the Activate button. This is very useful when making any kind of change.
You can easily clone an index's settings and produce different results, for example for a site section. There are also subtle but complex problems that we have solved. For example, in Elasticsearch you cannot simply change a String field to an Int field in an already indexed document type within the same index: the mapping will be incompatible. We solve this transparently for the administrator: a new index is created under a different internal name, the external name stays the same, and all settings are preserved.
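
The idea of keeping a stable external name while swapping the internal index can be sketched in plain Python. This is an in-memory illustration only; it is not Elasticsearch code and not Indexisto's implementation, and the class `IndexRegistry` and its method names are hypothetical.

```python
from typing import Callable, Dict, List


class IndexRegistry:
    """Maps a stable external index name to a versioned internal index."""

    def __init__(self) -> None:
        self.aliases: Dict[str, str] = {}         # external name -> internal name
        self.indexes: Dict[str, List[dict]] = {}  # internal name -> documents
        self.versions: Dict[str, int] = {}        # external name -> current version

    def create(self, external: str, docs: List[dict]) -> None:
        self.versions[external] = 1
        internal = f"{external}_v1"
        self.indexes[internal] = list(docs)
        self.aliases[external] = internal

    def change_field_type(self, external: str, field: str, convert: Callable) -> None:
        """Rebuild into a new internal index, then repoint the external name."""
        old_internal = self.aliases[external]
        self.versions[external] += 1
        new_internal = f"{external}_v{self.versions[external]}"
        # Reindex every document with the converted field value.
        self.indexes[new_internal] = [
            {**doc, field: convert(doc[field])} for doc in self.indexes[old_internal]
        ]
        # Swap the alias only after the rebuild has succeeded.
        self.aliases[external] = new_internal
        del self.indexes[old_internal]

    def search(self, external: str) -> List[dict]:
        return self.indexes[self.aliases[external]]


registry = IndexRegistry()
registry.create("products", [{"name": "Sofa", "price": "7000"}])
registry.change_field_type("products", "price", int)
print(registry.aliases["products"])  # products_v2
print(registry.search("products"))   # [{'name': 'Sofa', 'price': 7000}]
```

Callers keep using the name `products` throughout, so settings tied to the external name survive the rebuild.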

Moving toward smart search

We now record clicks in the results, and in the near future it will be possible to boost results based on user behavior and on what was previously found through search.
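
As a rough illustration of what click-based boosting could look like, here is a minimal sketch. The formula (text score multiplied by a logarithmic click boost) and the function name `rerank` are assumptions made for this example; the article does not describe how Indexisto will compute boosts.

```python
import math
from typing import Dict, List, Tuple


def rerank(results: List[Tuple[str, float]], clicks: Dict[str, int]) -> List[str]:
    """Order document ids by text score multiplied by a click-based boost."""

    def boosted(item: Tuple[str, float]) -> float:
        doc_id, score = item
        # log1p dampens the effect so heavily clicked documents do not dominate.
        return score * (1.0 + math.log1p(clicks.get(doc_id, 0)))

    return [doc_id for doc_id, _ in sorted(results, key=boosted, reverse=True)]


results = [("a", 1.0), ("b", 0.9), ("c", 0.5)]
clicks = {"b": 10, "c": 1}
print(rerank(results, clicks))  # ['b', 'a', 'c']
```

Document "b" has a slightly lower text score than "a" but moves to the top because users clicked it more often.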
Another interesting capability is "semi-semantic" search. Since we take data directly from the database, you can do fairly interesting things, for example indexing tags into text fields. In our sample results, try typing DISCO 80. You will see the relevant bands that played disco in the 80s:


It's certainly not rocket science, but you can do more interesting things. For example, when indexing a product:
  • item name: Sofa "Svetlana 5"
  • product type: sofa
  • price: 7000 rub
  • color: red
  • length: 2400 mm

you can define rules:
  • if length > 2000 mm, add synonyms: BIG, HUGE, LONG
  • if price < 10000 rub, add synonyms: CHEAP, DISCOUNT, SALE

This gives us "semi-semantic" search, and queries like these will work:
  • BIG CHEAP SOFAS
  • CHEAP RED SOFAS
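
The rule idea above can be sketched in a few lines of Python. This is a minimal illustration, not Indexisto's rule engine; the field names `length_mm` and `price_rub` and the helpers `index_terms` and `matches` are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Each rule pairs a predicate over the product with the synonyms it adds.
Rule = Tuple[Callable[[Dict], bool], List[str]]

RULES: List[Rule] = [
    (lambda p: p["length_mm"] > 2000, ["big", "huge", "long"]),
    (lambda p: p["price_rub"] < 10000, ["cheap", "discount", "sale"]),
]


def index_terms(product: Dict) -> Set[str]:
    """Collect searchable terms: type, color, name words, plus rule synonyms."""
    terms = {product["type"].lower(), product["color"].lower()}
    terms.update(product["name"].lower().split())
    for predicate, synonyms in RULES:
        if predicate(product):
            terms.update(synonyms)
    return terms


def matches(query: str, product: Dict) -> bool:
    """A product matches when every query word is among its indexed terms."""
    return set(query.lower().split()) <= index_terms(product)


sofa = {
    "name": "Sofa Svetlana 5",
    "type": "sofa",
    "price_rub": 7000,
    "color": "red",
    "length_mm": 2400,
}

print(matches("BIG CHEAP SOFA", sofa))  # True
print(matches("CHEAP RED SOFA", sofa))  # True
print(matches("SMALL RED SOFA", sofa))  # False
```

The sketch matches exact words only; a real full-text engine would also apply stemming so that plural forms such as "sofas" match.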


Moving toward even smarter search

I do not know how closely Habr follows projects like Freebase, DBpedia, and other attempts to structure information, but there has been progress there that can be freely used to your advantage. Without going into details, these projects let you extract structured information.
If you sell operating systems and have a Microsoft Windows product, you can enrich its description with a variety of additional data, which in general is what you find in the right-hand column on Wikipedia:

That way, queries like this will work:
OS FOR ARM
The project is in active development right now, but the basic functionality is ready, and we are prepared to connect early adopters for free. Write to us.
Article based on information from habrahabr.ru
