Adding proxies for the Squirro frontend data loader

In which config file do we need to add proxies? I am using the Squirro plugin dataloader.

Hi @adarsh

Proxies can be configured through the configuration files. Specifically, you can edit /etc/squirro/common.ini and add the proxy configuration as follows:

[proxy]
proxy = http://proxy.mycorp.com:8080
no_proxy = 127.0.0.1,localhost,127.0.0.1:81,localhost:81

This configuration routes all requests through the indicated proxy, except requests to the hosts listed in no_proxy (the localhost addresses in this case).

Note that all services must be restarted after any change to this configuration. You can use the squirro_restart command to restart the Squirro services.

Options

Option        Description
proxy         Proxy used for HTTP and HTTPS requests.
http_proxy    Proxy used for HTTP requests. Only used if proxy is not specified.
https_proxy   Proxy used for HTTPS requests. Only used if proxy is not specified.
no_proxy      Comma-separated list of hostname suffixes for which the proxy should not be consulted. This usually contains data-centre domain names for which the proxy is either unnecessary or a hindrance.
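To make the precedence in the table concrete, here is a minimal Python sketch that reads a [proxy] section and turns it into a requests-style proxies mapping. It is not Squirro's code: resolve_proxies is a hypothetical helper that models only the rules stated in the table (proxy covers both schemes; http_proxy and https_proxy apply only when proxy is absent). It splits no_proxy into a list but does not attempt to reproduce Squirro's actual matching logic.

```python
import configparser


def resolve_proxies(ini_text: str) -> dict:
    """Hypothetical helper: map a [proxy] section to a requests-style proxies dict.

    Models the options table only: `proxy` applies to both HTTP and HTTPS;
    `http_proxy` / `https_proxy` are consulted only when `proxy` is absent.
    """
    # interpolation=None so '%' in proxy URLs (e.g. encoded passwords) is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["proxy"] if parser.has_section("proxy") else {}

    proxies = {}
    common = section.get("proxy")
    if common:
        # A single proxy key wins for both schemes.
        proxies["http"] = common
        proxies["https"] = common
    else:
        if section.get("http_proxy"):
            proxies["http"] = section["http_proxy"]
        if section.get("https_proxy"):
            proxies["https"] = section["https_proxy"]

    # no_proxy is a comma-separated list; drop empty entries and whitespace.
    no_proxy = section.get("no_proxy") or ""
    exclusions = [entry.strip() for entry in no_proxy.split(",") if entry.strip()]
    return {"proxies": proxies, "no_proxy": exclusions}


# The configuration from the reply above.
example = """
[proxy]
proxy = http://proxy.mycorp.com:8080
no_proxy = 127.0.0.1,localhost,127.0.0.1:81,localhost:81
"""
result = resolve_proxies(example)
```

Keeping the scheme-specific keys as fallbacks mirrors the table's wording: as long as proxy is set, any http_proxy or https_proxy value is simply ignored.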

@pneff, I have encountered a similar issue.

I have updated common.ini with the settings above. However, when I try to fetch the document preview from the data loading screen, I get the following error: ValueError: check_hostname requires server_hostname

The full stack trace can be found below:

MainThread datasourced[2060] 2022-03-08 07:15:14,742 INFO     Start load from squirro_plugin
MainThread datasourced[2060] 2022-03-08 07:15:14,770 ERROR    Exception: (None, ValueError('check_hostname requires server_hostname',))
Traceback (most recent call last):
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro_client/base.py", line 276, in _perform_authentication
    r = session.post(url, data=data, timeout=self.timeout_secs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connection.py", line 359, in connect
    conn = self._connect_tls_proxy(hostname, conn)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connection.py", line 506, in _connect_tls_proxy
    ssl_context=ssl_context,
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 453, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 495, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 773, in __init__
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro/dataloader/sq_data_load.py", line 165, in load_from_source
    source.connect(self.config.incremental_column, max_inc_value)
  File "/var/lib/squirro/topic/assets/dataloader_plugin/_global/squirro_plugin/squirro_plugin.py", line 113, in connect
    self.client.authenticate(refresh_token=self.args.source_token)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro_client/base.py", line 241, in authenticate
    self._perform_authentication(dict(base, **data))
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro_client/base.py", line 281, in _perform_authentication
    raise ConnectionError(None, ex)
squirro_client.exceptions.ConnectionError: (None, ValueError('check_hostname requires server_hostname',))
MainThread squirro_plugin 2022-03-08 07:15:14,772 INFO     The max inc value is stored in MySQL as 2022-03-01T14:48:05
MainThread squirro.service.datasource.background 2022-03-08 07:15:14,772 ERROR    Exception invoking dataloader
Traceback (most recent call last):
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro_client/base.py", line 276, in _perform_authentication
    r = session.post(url, data=data, timeout=self.timeout_secs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connection.py", line 359, in connect
    conn = self._connect_tls_proxy(hostname, conn)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/connection.py", line 506, in _connect_tls_proxy
    ssl_context=ssl_context,
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 453, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 495, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 773, in __init__
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro/service/datasource/background.py", line 240, in _process_dataloader_task
    max_inc_value=source["max_inc_value"],
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro/service/datasource/dataload.py", line 95, in fetch_process_and_upload_items
    max_inc_value=max_inc_value, cli_mode=False
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro/dataloader/sq_data_load.py", line 511, in execute_load_only
    upload_rows=True,
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro/dataloader/sq_data_load.py", line 165, in load_from_source
    source.connect(self.config.incremental_column, max_inc_value)
  File "/var/lib/squirro/topic/assets/dataloader_plugin/_global/squirro_plugin/squirro_plugin.py", line 113, in connect
    self.client.authenticate(refresh_token=self.args.source_token)
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro_client/base.py", line 241, in authenticate
    self._perform_authentication(dict(base, **data))
  File "/opt/squirro/virtualenv3/lib/python3.6/site-packages/squirro_client/base.py", line 281, in _perform_authentication
    raise ConnectionError(None, ex)
squirro_client.exceptions.ConnectionError: (None, ValueError('check_hostname requires server_hostname',))

Any tips on how best to proceed? I have confirmed that the server can reach the proxy with a simple curl request (after exporting the HTTPS_PROXY variable), and the hostname in use is in a normal format.
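One thing worth checking: the innermost frames show urllib3 calling _prepare_proxy and then _connect_tls_proxy. In urllib3 1.26, that TLS-to-the-proxy path is taken when the proxy URL itself uses the https:// scheme, so the client tries to open TLS with the proxy before tunnelling. This thread does not confirm that this was the cause here, but it is cheap to rule out. A minimal sketch; tls_proxy_keys is a hypothetical helper, not part of Squirro:

```python
import configparser
from urllib.parse import urlsplit


def tls_proxy_keys(ini_text: str) -> list:
    """Hypothetical check: list [proxy] keys whose URL uses the https:// scheme.

    With urllib3 1.26+, such a proxy URL makes the client open TLS to the
    proxy itself; an http:// proxy URL still tunnels HTTPS traffic via CONNECT.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("proxy"):
        return []
    flagged = []
    for key in ("proxy", "http_proxy", "https_proxy"):
        value = parser.get("proxy", key, fallback="")
        # urlsplit lowercases the scheme, so HTTPS:// is caught as well.
        if value and urlsplit(value).scheme == "https":
            flagged.append(key)
    return flagged
```

If this flags a key, one option to try is the same proxy address with an http:// scheme, assuming the proxy accepts plain HTTP CONNECT requests.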

Following up on the above: I was able to resolve the issue by using the https_proxy key instead of the proxy key. Note that when you specify https_proxy, you must also specify http_proxy.
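The fix described above can be turned into a quick pre-restart check. This is a minimal sketch: check_proxy_section is hypothetical, the "https_proxy needs http_proxy" rule comes only from this report rather than from Squirro documentation, and the proxy URL in the sample configuration is the placeholder from the earlier reply (the http:// scheme for https_proxy is an assumption, since the poster did not share their value).

```python
import configparser


def check_proxy_section(ini_text: str) -> list:
    """Hypothetical helper: return problems found in the [proxy] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("proxy"):
        return ["missing [proxy] section"]
    section = parser["proxy"]
    problems = []
    # Rule reported in this thread: https_proxy only worked with http_proxy also set.
    if section.get("https_proxy") and not section.get("http_proxy"):
        problems.append("https_proxy is set but http_proxy is not")
    # Per the options table, the scheme-specific keys are ignored while proxy is set.
    if section.get("proxy") and section.get("https_proxy"):
        problems.append("proxy is set, so https_proxy will be ignored")
    return problems


# Configuration shaped like the reported fix (placeholder proxy address).
working = """
[proxy]
http_proxy = http://proxy.mycorp.com:8080
https_proxy = http://proxy.mycorp.com:8080
no_proxy = 127.0.0.1,localhost,127.0.0.1:81,localhost:81
"""
```

Running check_proxy_section(working) returns an empty list; after editing common.ini, the services still need a restart (for example with squirro_restart).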

Thank you for the update @peter.brejza :muscle: