The latest Squirro Release 3.5.3 is now generally available on our mirror.
In order to upgrade your existing Squirro installation or install a new version of Squirro, please head over to the Getting Squirro page for relevant instructions.
To download the latest Squirro toolbox, head over to the downloads space.
The latest SquirroClient is also now available on PyPI.
Features
Frontend and other UI related enhancements:
- Add JIRA feedback support.
- Adjust the styling of the search widget based on the new designs
- Update the React search widget to be compatible with the spellcheck and reset filters widgets.
- Fix spellcheck widget behaviour.
- Show creation time for user history searches in the React search widget.
- Change query for candidate set items to match the new query strategy.
- Add scroll to job id from the URL params functionality to the ML jobs page.
- [AIS] Add tooltip in model validate page.
- [AIS] Adds a tip in the publish dashboard that points the user to the enrich tab.
- [AIS] Adds demo of model templates from tip in the models section.
- [AIS] Add animation to the tooltip and prop to close it properly.
- [AIS] Add tooltip in labeling view if there is no highlight.
- [AIS] Added AI studio model validation error page with link to ML Job details page.
- Add support for highlighting on Labels Table widget.
- Extend communities headlines widget to support entities feedback.
- When rerunning a single step, the Linked steps option is now checked by default.
Backend related enhancements with tags which represent their area of work:
- [Query Strategy] New default query-strategy for Search-Pipeline: improve search precision for large documents by enabling re-scoring of top N result-documents based on users’ query term-sequences.
- [Communities] Suggest image URLs relevant to a term using the Squirro client, retrieve the byte image using an image URL.
- [Pipeline Steps] The PDF OCR step now supports password-less encrypted PDFs.
- [Typeahead] Expose query search time. This allows the frontend to display the time when the query was typed within the search history suggestions.
- [ML Monitoring] Add a new Elasticsearch index for the ML monitoring.
- [ML Monitoring] Add structured logs to the ML jobs execution.
- [Proximity Filters] It is now possible to define a rule_field field which will contain the rule which triggered the match. If no rule matches, it will save the value of the no_rule_matched_label field, which defaults to NO_RULE_MATCHED.
- [Platform] Complete a first iteration on supporting PostgreSQL as the metadata storage system of Squirro. This iteration aimed to support running Squirro on PostgreSQL in a fresh, single-node deployment with a pre-configured PostgreSQL server.
- [Data Loader Plugins] Specify a list of folders in which the Google Drive plugin searches for files.
- [Communities] The backend supports a new dashboard role called communities360 which follows the same principles as the search role, meaning that there can only be one Communities 360 dashboard at a given point of time.
- [Data Sources] The Data Sources now feature a Priority field, which can be defined during data source creation or edited during data source update.
- [Activity Tracking] The Activity Dataloader now searches for activity files also in the folder with corresponding hostname.
- [Query Strategy] Search-Pipeline: Significantly faster search speed by disabling legacy feature (fold_near_duplicates, can be enabled again if needed) and producing a light-weight query-clause structure (don’t search on summary, improved handling of synonym matching, don’t do expensive PDF search for non-pdf-indices). See performance improvements for more details.
- [Studio Plugins] The Monitoring plugin has been updated to display statistics for batches of any priority.
- [Smart Results] QA returns an excerpt with highlighted answer span that respects sentence boundaries and displays additional context.
- [Smart Results] QA uses a larger abstract from retrieved documents to find an answer span by default
- [Activity Tracking] Activity path is now specified in the .ini configuration file using the path option in the activity section.
- [Query Strategy] Search-Pipeline: Implemented configurable default query-strategy as project-configuration setting topic.search.query-strategy. Default search is now more lenient and does not require that all query terms have to match in a document (minimum should match in place).
- [Auth] API update for the extauth API. In the first version, we had user_information, which persisted in the session. We have now redesigned this to support two different keys:
  - session_data: is stored with the user’s session. This is the old user_information. It’s used in query templates to have additional information resolved for the user.
  - user_data: additional key/value pairs persisted on the user in the UserValue table. Existing keys are overwritten, but missing keys are not deleted (except if they are specifically set to None).
  If any of the new keys (session_data, user_data) is present, the new behaviour kicks in. Otherwise, it falls back to user_information.
- [Query Strategy] Search-Pipeline: Make Squirro Query Syntax Parser aware of term-sequences and term-phrases to allow fine-grained tuning of the user’s query text.
- [Communities] Shipped a local placeholder image for communities to support a placeholder image in instances with no internet access. In addition to this, the placeholder image URL is now configurable from the configuration service and can be set to any image hosted on a server.
- [Communities] Image suggestion given a term and image retrieval using a URL.
- [Activity Tracking] Make the activity path argument optional in the Activity Dataloader and by default load the path from the .ini configuration.
- [Query Processor] When query processing merges phrases, now it also keeps the original terms. It makes the search more precise and decreases side effects where the original query would return the results, but the enriched query doesn’t.
- [Smart Results] QA returned answer span and textual context is truncated if necessary.
- [Typeahead] Implement cache for typeahead suggestions based on communities and enable suggestions by default.
- [Smart Results] QA widget relays all NLP parameters including user terms, i.e. presumable natural language queries.
- [Smart Results] QA favours user terms over the unprocessed query as the question for answer span extraction. For example, this allows facet parameters to be ignored during question answering.
- [Platform] Squirro has been upgraded from Python 3.6 to Python 3.8.
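The key handling described in the [Auth] item above can be illustrated with a minimal Python sketch. The function name apply_extauth_response and the plain dictionaries standing in for the session and the UserValue table are assumptions made for illustration only; they are not part of the Squirro API. The sketch models just the rules stated in the release note: new keys take precedence, user_data overwrites existing values, missing keys are left alone, and an explicit None deletes a key.

```python
from typing import Any, Dict, Tuple


def apply_extauth_response(
    response: Dict[str, Any],
    stored_user_values: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (session_data, updated_user_values) for an extauth response.

    Hypothetical helper: it only models the behaviour described in the
    release note, not Squirro's actual implementation.
    """
    new_keys_present = "session_data" in response or "user_data" in response
    if not new_keys_present:
        # Fallback: only the old user_information key is honoured and it
        # lives in the session; stored user values stay untouched.
        legacy = dict(response.get("user_information") or {})
        return legacy, dict(stored_user_values)

    session_data = dict(response.get("session_data") or {})
    updated = dict(stored_user_values)
    for key, value in (response.get("user_data") or {}).items():
        if value is None:
            # Explicitly set to None: the stored key is removed.
            updated.pop(key, None)
        else:
            # Existing keys are overwritten, new keys are added.
            updated[key] = value
    # Keys absent from user_data are kept as they were.
    return session_data, updated
```

For example, a response carrying user_data of {"team": "sales", "title": None} applied to stored values {"team": "ops", "title": "analyst", "office": "zurich"} leaves {"team": "sales", "office": "zurich"}.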
Bug Fixes
- Fix a bug on Engagement Map widget where clicking the arrows would not re-fetch the data.
- Do not disable any existing loggers while setting up structured logging.
- Fixes model change not triggering collection change in the react version of ItemsWidget.
- Ensure Link widget is in the right color (theming highlight color).
- Fixes a bug where matching terms were not highlighted in the PDFs.
- Custom Assets: now requirements.txt installation will proceed properly even if it failed after previous asset upload.
- Remove unused compact mode for Insights widget.
- The first retried batch will no longer get moved to the inputstream/processed directory.
- [Query Processor] Fix issue where Query Processing workflows were not cached on instances with multiple ML Workflows running. This allows query processing to work on instances overloaded by other models.
- Fixed a bug that caused an incorrect tab to be marked as active in the settings navigation for studio plugins.
- Improve the slow loading time of the Data Sources page and the Monitoring plugin when inputstream includes many batches.
- Fixes a bug where Insights widget has different size if the header is hidden.
- [Facet List Widget] Fix multi selection when using a modifier key (Cmd/Ctrl).
- [AIS] Fixed a bug that prevented labeling if the label was already matching an existing proximity rule.
- [AIS] In labeling, fixed active tab changing back to list view on page refresh.
- [AIS] Fixes newly created rules being in edit mode when going to rules overview tab.
- [AIS] In focus view, fixes multiple label requests firing when clicking the same label multiple times.
- Fixes missing react error where tooltip is rendered.
- Fixed a bug with sort selector redirecting to 404 after clicking on it.
- Fix entities with similar names not filtered correctly.
- Ensure priority gets set during source creation.
- Entities in the sidebar will now be grouped only if they refer to the same sentence at the same location in the document. Fixes a bug where entities were grouped for different sentences with the same text content.
- Fix handling of items with None body values in the cleanup pipeline step.
- [Query Processor] Search will now fall back to using the original user-entered query if natural language query processing workflow fails or is not available.
- Ensure chart color theming works for Word Cloud widget.
- [Query Strategy] Search-Pipeline: Always produce a correct and valid item-preview text (abstract) by creating it from the body alone (previously it was created from the summary, which caused issues when the summary differed from the body).
- Keep new lines in the body highlights. This prevents new lines and white space from being removed when searching for the highlights in the item body.
- Fix a bug where some buttons were not visible in the Setup Space. Affects communities pages and project properties.
- Fixed an issue where pikepdf and orjson packages were not being installed on production machines.
- Question Answering & Query-Parsing endpoints were only accessible for project-admins, hence feature worked only for admins. Changed permissions to allow access for project-readers too.
- Fix Entities Widget ignoring additional widget query.
- Fix Map Widget not rendering if starting on a hidden layer.
- PDF OCR no longer enforces PDF/A conversion, which avoids errors and slightly improves performance.
- [Query Strategy] Search-Pipeline: Don’t break query on searches that contain standalone reserved characters, e.g. wildcard character or question mark.
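The entity grouping fix listed above (grouping only when both the sentence and its location in the document match) can be sketched as below. This is an illustrative Python sketch, not the sidebar's actual code: the field names sentence and sentence_start are assumed for the example, with sentence_start standing in for whatever location information the real implementation uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_entities(entities: List[dict]) -> Dict[Tuple[str, int], List[dict]]:
    """Group entities by (sentence text, sentence location).

    Using the text alone as the key would merge entities from different
    sentences that happen to share the same wording; adding the location
    keeps them apart. Field names here are hypothetical.
    """
    groups: Dict[Tuple[str, int], List[dict]] = defaultdict(list)
    for entity in entities:
        key = (entity["sentence"], entity["sentence_start"])
        groups[key].append(entity)
    return dict(groups)
```

Two entities found in the same sentence at offset 0 end up in one group, while an entity in an identically worded sentence at a different offset gets a group of its own.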
Breaking Changes
- [Activity Tracking] Delete all indexed items in the projects where activity sources with the query facet that is not analyzed exist.
- [Activity Tracking] Activity files are now saved to the path specified in the common.ini file. Additionally, a folder named after the hostname is appended to the path. This separates activity files per node, so each node saves data to its own dedicated folder.
- [Query Strategy] Matching all documents (match_all item query) via a standalone wildcard * is not possible anymore, i.e. wildcard is matched literally. Instead, perform an empty query search to retrieve all items.
- [Platform] The upgrade from Python 3.6 to Python 3.8 prompted for updated instructions on how to upgrade your Squirro instance. Please follow the updated instructions included on this page.
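The Activity Tracking change above (a path option in the activity section of the .ini file, with a per-host folder appended) can be sketched in Python as follows. This is a rough illustration under stated assumptions: the helper name resolve_activity_dir and the example path are made up, and using socket.gethostname() for the folder name is a guess at how the hostname is obtained, not confirmed Squirro behaviour.

```python
import configparser
import os
import socket
from typing import Optional

# Example configuration fragment; the path value is a placeholder.
EXAMPLE_INI = """
[activity]
path = /tmp/squirro-activity-example
"""


def resolve_activity_dir(ini_text: str, hostname: Optional[str] = None) -> str:
    """Return the per-node activity folder derived from .ini content.

    Hypothetical helper: reads the path option from the activity section
    and appends a folder named after the host.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    base_path = parser.get("activity", "path")
    # Assumption: the local hostname is what names the per-node folder.
    node_name = hostname or socket.gethostname()
    return os.path.join(base_path, node_name)


if __name__ == "__main__":
    print(resolve_activity_dir(EXAMPLE_INI))
```

Passing an explicit hostname makes it easy to check which folder another node in the cluster would write to.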
Known Issues
- Empty image suggestion is not well handled with community creation, which results in a stack trace in the topicproxy/stderr.log file. However, this does not block the community creation as the communities are created with a default image.
- PyTorch is not automatically installed in release 3.5.3. This package is needed for the Question Answering to work. It can be installed manually with the command: sudo yum install squirro-python38-torch
Installation and Upgrade
You will have to resolve conflicts in at least the following config files when upgrading from Squirro 3.3.0:
/etc/nginx/conf.d/ssl.inc
/etc/squirro/common.ini
For new installations, please follow the Setup on Linux instructions.
To upgrade an existing installation, please consult the Upgrades for Squirro 3.5.3 and later guide.