The latest Squirro Release 3.5.8 is now generally available on our mirror. In order to upgrade your existing Squirro installation or install a new version of Squirro, please head over to the Getting Squirro page for relevant instructions.
To download the latest Squirro toolbox, the latest build of our lib/nlp library or the project templates of our applications (Cognitive Search, Sales Insights), please head over to the downloads space.
This release highlights the progress since the previous LTS release 3.4.7. Most of these additions were already introduced in the intermediate releases of the 3.5.x series and have been documented in those release notes as well.
As a long-term support (LTS) release, Squirro 3.5.8 will receive updates for security issues and important bug fixes for the next two years. See the Squirro Release Process document for details on Squirro’s versioning.
Over the last few years, Squirro’s Cognitive Search solution has been deployed at many small and large organizations, and those projects have taught us a lot about users’ expectations towards search and knowledge management. Given this, we went back to the drawing board and reflected on what a modern Cognitive Search experience should consist of.
The result is a true back-to-front rethink that is strikingly visible in this release. Many usability improvements have been implemented throughout, and new dashboard functionality makes it much easier to access search results and community information from anywhere. Question answering and updated query processing workflows improve the quality of the search results for users. And, just as important, the result looks beautiful and is enjoyable to work with every single day.
To access more screenshots and discover the revamped search experience, please check the following documentation: https://squirro.atlassian.net/wiki/spaces/DOC/pages/2729312289.
In the background these improvements were driven by the following platform additions:
- New Items widget
- New Search Bar widget including its Advanced Search dialog used in Global Search
- Contextual Concept search
- Question Answering
- Similar Searches
- Improved query processing workflow
- Community 360 Dashboard
- Native Display of Office Documents
Some of these are outlined in more detail below. To find out about the others, please see the linked documentation pages above.
With the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2728427651 add-in, Squirro can be integrated directly into the Microsoft Outlook mailbox. This works for Microsoft 365 installations, as well as on-premises Microsoft Exchange setups.
With this add-in any Squirro dashboard can be exposed to your users in Microsoft Outlook. A typical use case for this add-in is to enable searches that are linked to current messages.
For information on how to enable and deploy this functionality please see: https://squirro.atlassian.net/wiki/spaces/DOC/pages/2728427651.
The Cognitive Search application is often connected to document management systems where all types of documents are stored. This most prominently includes office documents, such as: Microsoft Word, Excel, or PowerPoint.
With this release, these document types are displayed better and look exactly as they would if opened in their original application.
For information on how this is achieved, and how this can be enabled in a Squirro project very quickly, please see: https://squirro.atlassian.net/wiki/spaces/DOC/pages/2728067169.
The best search experience does not expect users to scroll through a large number of search results. Instead, what users really want is a solution that provides them with the right answers to their questions. This has been Squirro’s vision since the first product version.
This release includes new functionality that highlights further progress towards this vision: https://squirro.atlassian.net/wiki/spaces/DOC/pages/2693595182.
By providing short summarized answers on top of the search results, users are able to get their answers more quickly.
Concept Search allows users to retrieve search results without having to formulate a complex query. This is done by giving the system some input, from which it can learn a “concept”. While Squirro has always supported this using https://squirro.atlassian.net/wiki/spaces/DOC/pages/69009481, this version now introduces the same functionality in a very easy to use manner for end users. This new functionality is https://squirro.atlassian.net/wiki/spaces/DOC/pages/2727673878 and is made available to users using a search icon whenever they highlight some text.
When this search is run the result is a concept which is displayed in the search bar.
Similar Searches are automatically suggested based on the user’s current search. The suggestions are calculated from semantically similar searches that all users have run in the past. This helps users discover other ways of finding the data they are looking for.
Please see https://squirro.atlassian.net/wiki/spaces/DOC/pages/2729017818 for how to start using this functionality.
Communities are a central component of any Squirro Cognitive Search project and these communities allow extremely easy content consumption of specific topics. Common community types are: a company’s products, clients, or research topics.
With this release, we are introducing the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2727641208, a feature that makes it even easier for users to find the data associated with a given community.
Combined with the https://squirro.atlassian.net/wiki/spaces/DOC/pages/12156986 capabilities this lets Squirro users design compact but powerful dashboards that present all essential information centrally. The new https://squirro.atlassian.net/wiki/spaces/DOC/pages/2728624129 widget was also introduced for this purpose.
Please see https://squirro.atlassian.net/wiki/spaces/DOC/pages/2727641208 to learn how to start using this functionality in Squirro projects.
The https://squirro.atlassian.net/wiki/spaces/DOC/pages/2727444575 has been introduced and will eventually fully replace the previous Confluence and Confluence widgets. The Items widget unifies their functionalities in one widget and provides a few additional features (such as the ability to show starred items or last read items).
The https://squirro.atlassian.net/wiki/spaces/DOC/pages/1024655387 widget has also been completely overhauled and provides a much more enriched type-ahead experience.
This is part of a technical platform change going on in the background. The user interface is moving to a new technology called React (previously Squirro was fully built on Backbone.js). With this change, the development of custom widgets becomes much easier for frontend engineers and is now based on a modern framework.
For more information on this transition and how to use the new React-based widgets, please see the separate document page: React Custom Widgets.
Not all data is equal. Some data sources should be shown in Squirro with a higher priority than others. Additionally, some pipeline processing steps are less urgent and it’s more relevant to quickly show the initial processing result to the users.
To satisfy these use cases, Squirro now provides more control over the priority of data as it flows through the Squirro pipeline. Priorities can be set at ingestion time or during processing in the pipeline.
For information on how to use this new functionality, please refer to https://squirro.atlassian.net/wiki/spaces/DOC/pages/2713813031.
Labeling of Ground Truths for AI Studio model training should be shared with domain experts who know most about the concepts being trained. To facilitate this, we have introduced the ability to share a Ground Truth labeling view with end users. This allows Squirro Model Creators and Data Scientists working with https://squirro.atlassian.net/wiki/spaces/DOC/pages/2220458018 to quickly get high quality labels from their users.
Please see https://squirro.atlassian.net/wiki/spaces/DOC/pages/2729574474 to learn how to get started with this new functionality.
Model-as-a-Service (MaaS) based on microservices opens the Squirro platform to custom machine learning (ML) models and also accelerates the prototyping phase for ML projects in Squirro through standardization and decoupling. The process is based on MLflow models standardization, a standard format for packaging machine learning models that can be used in a variety of downstream tools.
To make use of that feature, self-trained models or already existing (pre-trained) models need to be converted into the structure of an MLflow model, uploaded via squirro assets or via scp to a Squirro instance, deployed and added to a workflow to be used in the Squirro platform.
The framework created by MaaS thus unifies the deployment process of external ML models, significantly reducing the complexity of current and future ML projects. Furthermore, by incorporating a variety of new ML models, MaaS is an integral part of taking the AI Studio beyond its current capabilities.
A more detailed description of MaaS and MLflow and its dependency packages can be found in: https://squirro.atlassian.net/wiki/spaces/DOC/pages/2696249556.
Next to the bigger new additions listed above, we have also worked on a lot of smaller improvements.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/2470283239 now accepts a folder ID to limit the data that is retrieved.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/2478113223 and the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2485649600 support handling of file deletions in incremental loading.
- The data loading user interface allows mapping of the thumbnail URL from a connector field.
- Processing in the https://squirro.atlassian.net/wiki/spaces/DOC/pages/50855952 was made more resilient by retrying the initial retrieval of batches. This protects against some of the internal https://squirro.atlassian.net/wiki/spaces/DOC/pages/13598782 not being run. The maximum number of those retries is controlled by the server config option ingester.stream.max-dequeue-retries (default value is 3).
- A new option Force OCR was added to the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2678096427 pipeline step. This runs the text extraction even if existing text was already found in the PDF file. This is useful to extract text for documents that have a mix of machine-readable content as well as scanned pages. It also allows extracting of additional text from embedded charts.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/2678096427 now supports password-less encrypted PDFs.
- Known Entity Extraction that has been set up from Communities (see How to set up Communities Using KEE) is now regularly refreshed to maintain consistency between the updated communities and the KEE configuration.
- Overall usability improvements on the AI Studio screens.
- When giving feedback to an entity, the label select menu now shows all labels with the model name for the same tagging level.
- In the AI Studio model templates the fastText templates have been changed to use model compression by default. This reduces the storage footprint by an order of magnitude but comes at the cost of longer training duration. More information is available in the Model compression section of the fastText web site.
- For proximity filter rules it is now possible to define a rule_field field which will contain the rule that triggered the match. If no rule matches, it will save the value of the no_rule_matched_label field which will then default to “NO_RULE_MATCHED”.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/2675999597 functionality underwent many improvements that aid the search quality:
- The query strategy can be changed per project using a project configuration setting (topic.search.query-strategy). The default query pipeline can no longer be edited or deleted; instead, it has to be cloned to customize it. In turn, the default query pipeline is automatically updated with new Squirro releases, thus always reflecting the current state of the art.
- The default search behavior no longer requires that all query terms are found in a document. Instead, a minimum-should-match strategy has been implemented that requires a minimum number of words to match based on the query length.
- Improved search precision for large documents by merging relevant chunks of terms into loose phrases. For example, in the user’s query “will EU extend Brexit deadline”, this detects that Brexit and deadline often occur together and reformulates the query as eu extend (“brexit deadline”~15 OR (brexit deadline)). This ensures that results where “brexit” and “deadline” are found within 15 words of each other are scored higher than results where they simply appear anywhere in the document.
- Significantly faster search speed by disabling some functionalities that do not add any value in modern Squirro projects. This includes the deactivation of near duplicate merging, searching on summary, and PDF searching functionalities in non-PDF projects. Handling of synonym matching was also improved.
- The query processing workflow is now aware of term-sequences and term-phrases to allow fine-grained tuning of the user’s query text.
- Popular Query Suggestions default to suggesting popular queries only from the current project. The scope can be changed with the server:topic.typeahead.popular.enabled configuration option.
- As a performance improvement, the scan API (used to navigate large query result sets) defaults to returning results unsorted. This can be changed using the preserve_scroll_order argument to return results in sorted order.
- Searching for widgets in the dashboard editor also respects additional widget keywords. For example, searching for “link” will show the “Actions” widget. This was introduced to make the transition to some of the new widget names easier for experienced project creators.
- Added option to hide dashboard layers when the widgets inside are empty.
- Added “View all” buttons to many widgets. Where available, this can be linked to a separate dashboard that lets users explore the same data set in more detail. For example, a horizontal community list on the homepage can use the View all option to link to the communities dashboard.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/1024819275 widget has a new mode for stacked bar charts.
- The Communities widget can be changed to show relevant communities for the current result set; in that mode it also has a new horizontal visualization mode.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/1024786458 widget horizontal view mode was extended to a carousel view like other horizontal modes. The widget also has a new option that lets you decide if you want to show the number of filters, or not.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/1024884755 widget stores its selected tab in the page’s URL and also retrieves it from there on load. This is very helpful in combination with the View all functionality to deep link to specific parts of a dashboard. This also allows retaining of the tab when the page is reloaded or when a link to the dashboard is shared.
- Added search bar to the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2682356092 documents selector.
- Atlassian Jira issue collectors can be configured for the Send Feedback option in the help menu.
- Detailed information for individual communities is exposed in the API. This new endpoint includes information such as the number of followers and the number of items belonging to the community.
- The Python version used by Squirro has been upgraded from Python 3.6 to Python 3.8.
- The https://squirro.atlassian.net/wiki/spaces/DOC/pages/531398691 now supports project and source statistics.
- Single Sign-On integrations can now return session data as well as user data. This is used in https://squirro.atlassian.net/wiki/spaces/DOC/pages/184811752 to store additional user profile data. One current use case is the storage of Microsoft Exchange IDs for the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2728427651 integration.
- PostgreSQL is now supported as a storage backend for Squirro’s metadata database. Setups can now choose MariaDB, MySQL, or PostgreSQL as the database backend. Setup and upgrading of Squirro systems with a PostgreSQL database is currently manual. If this is something you need for your environment please reach out to Squirro support.
- Increased the maximum packet size that can be sent to the MariaDB server. This resolves problems where large configuration data would break the connection. Please see https://squirro.atlassian.net/wiki/spaces/DOC/pages/2717745156 for more information.
- Simplified synonym handling to support synonyms on managed ElasticSearch. For this, the configuration was moved from being file-based to ElasticSearch’s inline settings.
- Uploaded files are de-duplicated using Unix hardlinks. This reduces the disk space consumed when processing documents.
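Two of the query processing changes listed above (minimum-should-match and loose phrases) can be illustrated with a minimal Python sketch. This is not Squirro's implementation: the function names, the 75% ratio, and the rule that short queries require all terms are assumptions chosen for illustration. Only the slop of 15 and the shape of the rewritten query come from the example above.

```python
import math


def min_should_match(num_terms: int, ratio: float = 0.75) -> int:
    """Return how many query terms a document must contain.

    Hypothetical rule: the ratio and the short-query cutoff are assumptions,
    the release notes only state that the minimum depends on query length.
    """
    if num_terms <= 0:
        return 0
    # Very short queries still require every term.
    if num_terms <= 2:
        return num_terms
    # Longer queries tolerate some missing terms.
    return max(2, math.ceil(num_terms * ratio))


def loose_phrase(terms, phrase, slop=15):
    """Rewrite a query so that the words in `phrase` form a loose phrase.

    `terms` is the already normalized query (lowercased, stop words removed).
    The phrase clause is appended after the remaining terms, so a phrase in
    the middle of the query moves to the end in this simplified version.
    """
    phrase_words = set(phrase)
    rest = [t for t in terms if t not in phrase_words]
    joined = " ".join(phrase)
    # Prefer documents where the words are close, but still accept both words anywhere.
    clause = f'("{joined}"~{slop} OR ({joined}))'
    return " ".join(rest + [clause])


if __name__ == "__main__":
    print(min_should_match(4))
    print(loose_phrase(["eu", "extend", "brexit", "deadline"], ["brexit", "deadline"]))
```

With the normalized terms of “will EU extend Brexit deadline” as input, `loose_phrase` produces the reformulated query shown in the list above.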
The biggest potentially breaking change comes from the upgrade to Python 3.8. This will affect all Python-based plugins, such as https://squirro.atlassian.net/wiki/spaces/DOC/pages/7077924 or https://squirro.atlassian.net/wiki/spaces/DOC/pages/85460174. Especially if custom dependencies have been installed into Squirro’s Python environment, custom steps need to be taken. For plugins that have their dependencies declared using requirements.txt a re-install of the dependencies is automatically attempted.
Be aware that you should carefully verify whether all of the plugins still work after the upgrade.
A few changes were made to Squirro’s default https://squirro.atlassian.net/wiki/spaces/DOC/pages/2949295. These changes may cause issues for users who are used to the previous syntax or if they have been configured as default dashboard or widget queries.
The relevant changes are:
- The lowercase terms “and”, “or”, and “not” no longer have any special significance. To use the boolean operators the terms need to be written in full uppercase: “AND”, “OR”, “NOT”.
- Phrase searches, such as “squirro product” no longer require those words to be in the exact order, nor directly next to each other. Instead they are converted into proximity searches, and the words merely have to appear closely together. This change was introduced as it results in better recall for most users. Users can revert to the old behavior by manually entering a proximity search (“squirro product”~1). Project creators can change the behavior in the project’s Query Strategy configuration (see https://squirro.atlassian.net/wiki/spaces/DOC/pages/2675999597).
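Stored dashboard or widget queries that relied on the previous syntax may need to be adjusted by hand. As a starting point, the following Python sketch rewrites a query string so that it keeps its pre-upgrade meaning under the two changes above. The helper `upgrade_query` and its parameter are hypothetical and not part of Squirro. It assumes straight double quotes for phrases and whitespace-separated terms, and it does not handle operators that are attached to parentheses, such as `(not foo)`.

```python
import re

# A token is either a quoted phrase (with an optional field prefix such as
# title: and an optional ~N proximity suffix) or a run of non-space characters.
_TOKEN = re.compile(r'(?P<phrase>[^\s"]*"[^"]*")(?P<slop>~\d+)?|(?P<word>\S+)')
_OPERATORS = {"and", "or", "not"}


def upgrade_query(query: str, keep_tight_phrases: bool = False) -> str:
    """Return `query` rewritten for the new default query syntax.

    - Lowercase and/or/not outside of phrases become AND/OR/NOT.
    - With keep_tight_phrases=True, phrases without a proximity suffix
      get ~1 appended, as suggested in the release notes.
    """
    parts = []
    for match in _TOKEN.finditer(query):
        phrase = match.group("phrase")
        if phrase is not None:
            slop = match.group("slop") or ""
            if keep_tight_phrases and not slop:
                slop = "~1"
            parts.append(phrase + slop)
        else:
            word = match.group("word")
            parts.append(word.upper() if word in _OPERATORS else word)
    return " ".join(parts)


if __name__ == "__main__":
    print(upgrade_query('squirro and "squirro product"'))
    print(upgrade_query('"squirro product" not beta', keep_tight_phrases=True))
```

Words inside a quoted phrase are left untouched, so a phrase such as "cats and dogs" is not turned into a boolean expression. Review the output before saving it back to a dashboard.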
The Calendar widget was removed. Existing dashboards containing this widget should be edited and the existing “Calendar” widgets deleted.
Labels in item detail views are now displayed before the actual document, not at the end. This is in preparation for a larger upcoming change where document-level AI Studio feedback will be implemented by interacting with those labels.
The term Labels is now used throughout Squirro to refer to any of the concepts otherwise known as: keywords, entities, facets, etc. The documentation has not yet been fully updated to reflect this, so the terms are sometimes going to be used interchangeably. This also changes the names of a few widgets in the dashboard editor.
Activity tracking is used more extensively throughout Squirro, especially for some of the new search features (such as popular queries and similar queries). For this, a few changes to the activity logging were required:
- The /activity API endpoint now enforces authorization.
- Delete all indexed items in the projects where activity sources with a non-analyzed query facet exist.
- Activity files are no longer stored in the same folder as the frontend log files but in a sub-folder which includes the server name. This allows better handling of multi-node setups where the activity log files may be stored on a shared file system.
Other changes have been implemented that should have no effect on most users:
- Removed the --facet-delimiter parameter from the data loader command line. This may cause existing command line load scripts to fail.
- In the AI Studio, a proximity filter rule cannot contain more than 20 words. This was introduced to avoid performance problems with very large proximity filters.
- The pipeline steps https://squirro.atlassian.net/wiki/spaces/DOC/pages/2949443 and https://squirro.atlassian.net/wiki/spaces/DOC/pages/2949444 now only process the first file found in the files list of any processed item. This does not affect any standard setup, as no built-in Squirro data loader or pipeline step would ever result in items with multiple files attached. Custom data loader plugins or pipelets could however have resulted in such a scenario.
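To check whether a custom data loader plugin or pipelet is affected by the last change above, the items it produces can be inspected for multiple attached files. The following Python sketch assumes that each item is a dictionary with an `id` key and an optional `files` list. The helper name is hypothetical.

```python
def items_with_multiple_files(items):
    """Return the ids of items whose files list has more than one entry."""
    return [
        item.get("id")
        for item in items
        # A missing or empty files list counts as zero files.
        if len(item.get("files") or []) > 1
    ]


if __name__ == "__main__":
    sample = [
        {"id": "a", "files": [{"name": "one.pdf"}]},
        {"id": "b", "files": [{"name": "one.pdf"}, {"name": "two.pdf"}]},
        {"id": "c"},
    ]
    print(items_with_multiple_files(sample))
```

For any item reported by this check, only the first file is processed by the two pipeline steps after the upgrade.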
See the intermediate release notes for a list of all the bugs fixed since the last LTS release:
For new installations please follow the Setup on Linux instructions.
To upgrade an existing installation, please consult the https://squirro.atlassian.net/wiki/spaces/DOC/pages/2696118826 guide.