  1. IGB
  2. IGBF-2855

Investigate activating UCSC DAS source from External View tab

    Details

    • Story Points:
      2
    • Sprint:
      Spring 4 2021 May 3 - May 14, Spring 5 2021 May 17 - May 28

      Description

      If the UCSC Distributed Annotation Service (DAS) data source is not active (see Preferences > Data Sources) then External View does not work. It shows a message stating that the genome is not available.

      This message is very cryptic because it does not explain to the user that they need to activate the UCSC DAS data source in order to show data in the External View tab.

      Investigate: Can we simply add a button or other user interface component that would let the user activate the required data source within the External View tab?

          Activity

          aloraine Ann Loraine added a comment - - edited

          Before we design the interface, we need to look at the implementation of the panel that is shown when the user clicks "update". We need to know what the actual problem is here, and whether we can simply fix a bug and not have to change the UI.

          The current "External View" interface shows an empty panel with three components at the bottom of the panel:

          • A button - "settings"
          • A menu with options "UCSC" and "Ensembl"
          • A button - "update"

          When users click "update," some code in IGB sends requests to either UCSC or Ensembl genome browser systems to retrieve image files corresponding to the currently visible region in the IGB main window.

          However, as you know, this sometimes fails if (a) the genome being viewed in IGB does not exist in the requested external system or (b) the DAS data source is deactivated.

          DAS data source activation is controlled in Preferences > Data Sources tab.

          When the UCSC option is selected within the External View tab and the user clicks "update," probably what happens is that IGB then requests data from the UCSC DAS data source. (DAS stands for: Distributed Annotation Service.)

          If and when this call fails to retrieve the expected data, the External View interface then displays a message that reads:

          • "An external viewer is not available for this genome. If you have questions, please contact the IGB team."

          There are actually three (possibly more) situations when this message is displayed. These include:

          Situation 1: The user has launched IGB but has not yet selected a genome. If the user then opens the "External View" tab and clicks "update", the error message will be displayed.

          Situation 2: User has launched IGB and selected a genome for which there is no external view available from UCSC. If the user clicks the "update" button they will see the error message.

          Situation 3: User has launched IGB, selected a genome for which there is indeed an external view available at UCSC, but the DAS data source is not active. In this case, the error message will be displayed, as well.

          It is possible there is a very simple fix that does not require changing anything apart from how the External View code contacts and uses the UCSC genome browser resource. It depends on what information the External View needs to retrieve from the data source. It may be something that does not actually require the Distributed Annotation Service. Or it could be a piece of information we can get from a better, more robust source, such as the newer JSON REST service - see: http://genome.ucsc.edu/goldenPath/help/api.html

          omarne Omkar Marne added a comment - - edited

          My observations:

          By default, the UCSC Distributed Annotation Service (DAS) data source is enabled when IGB starts, which is good: situation 3 does not arise with the default settings.

          For situation 1, I have identified the code that displays the error message when the user has not yet selected a species and genome version. The code below is from the BrowserView.java file.

              public void actionPerformed(ActionEvent e) {
                  // Cancel any image request that is still running.
                  if (worker != null) {
                      worker.cancel(true);
                  }
                  final String msg = MessageFormat.format(ExternalViewer.BUNDLE.getString("updatingMessage"), getViewName());
                  igbService.addNotLockedUpMsg(msg);
                  final int pixWidth = scroll.getViewport().getWidth();
                  worker = new SwingWorker<Image, Void>() {

                      @Override
                      public Image doInBackground() throws ImageUnavailableException {
                          String ucscQuery = ucscViewAction.getUCSCQuery();
                          Loc loc = Loc.fromUCSCQuery(ucscQuery);
                          // Empty query or empty UCSC database name: show the generic "resolveError" image.
                          if (ucscQuery.length() == 0 || loc.db.length() == 0) {
                              return BrowserLoader.createErrorImage(ExternalViewer.BUNDLE.getString("resolveError"), pixWidth);
                          }
                          return getImage(loc, pixWidth);
                      }
                      // ... (the excerpt ends here; the rest of the worker is not shown)
          
          

          The error message text comes from the external.properties file:

          connError=Error: could not resolve connection
          serverError=Error: the server was not able to return the answer in the appropriate time
          error=Error: {0}
          resolveError=An external viewer is not available for this genome. If you have questions, please contact the IGB team.
          ensembleFileErrorTitle=Cannot open ensembl url mapping file
          
          

          When the user has not selected a genome version, the UCSC query string is empty (length 0), so the error message is displayed. SwingWorker calls the doInBackground() method on a background thread to look up the genome and retrieve the image. If the genome is not selected or not found, the query obtained through the UCSCView code remains empty, which triggers the same error.

          To avoid this, we can edit the error message to make it more descriptive and include clear instructions, so that the user is not confused.

          For situation 2, we can either edit the error message to say which genomes have no external view available, or change the code to handle those genomes and edit the error message accordingly.
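One way to make the message more specific is to work out which of the three situations applies before choosing the text. The following is a minimal sketch of that idea only. The class, enum, method, and message wording are hypothetical and are not taken from the IGB code base; how IGB would actually report whether the DAS data source is active is also an assumption.

```java
import java.util.Set;

/** Hypothetical sketch: pick a specific message for each failure situation. */
final class ExternalViewMessages {

    /** The three situations described in this ticket, plus the success case. */
    enum Status {
        NO_GENOME_SELECTED("Select a species and genome version, then click \"update\"."),
        DAS_SOURCE_INACTIVE("Activate the UCSC DAS data source in Preferences > Data Sources, then click \"update\"."),
        GENOME_NOT_AT_UCSC("The UCSC Genome Browser does not offer an external view for this genome version."),
        OK("");

        final String message;

        Status(String message) {
            this.message = message;
        }
    }

    /**
     * Classifies the current state. All three parameters are assumed inputs:
     * the selected genome version (empty or null if none), whether the UCSC
     * DAS data source is active, and the genome names known to UCSC.
     */
    static Status classify(String genomeVersion, boolean dasSourceActive, Set<String> ucscGenomes) {
        if (genomeVersion == null || genomeVersion.isEmpty()) {
            return Status.NO_GENOME_SELECTED;   // situation 1
        }
        if (!dasSourceActive) {
            return Status.DAS_SOURCE_INACTIVE;  // situation 3
        }
        if (!ucscGenomes.contains(genomeVersion)) {
            return Status.GENOME_NOT_AT_UCSC;   // situation 2
        }
        return Status.OK;
    }
}
```

The check for an inactive data source comes before the genome lookup because, in the current code, the list of UCSC genomes is itself obtained from the DAS data provider and would be empty when that provider is unavailable.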

          aloraine Ann Loraine added a comment - - edited

          Thank you Omkar Marne for tracking down the relevant code.

          It does indeed look like the code is trying to access a list of all available genome versions and their names from the UCSC Genome Browser site, using the following:

          First, a static (class) variable is declared (see top of the file):

          private static final Set<String> UCSCSources = Collections.synchronizedSet(new HashSet<>());
          

          Later, when a user clicks "update", the code attempts to retrieve a UCSC DAS data provider. If this is available, it then attempts to retrieve all the available genome names from the DAS REST endpoint. We don't see the formulation of the REST call below because it's handled behind the scenes by the DAS data provider object:

          
              /**
               * Returns the genome UcscVersion in UCSC two-letter plus number format,
               * like "hg17".
               */
              private String getUcscGenomeVersion(String version) {
                  initUCSCSources();
                  String ucsc_version = genomeVersionSynonymLookup.findMatchingSynonym(UCSCSources, version);
                  return UCSCSources.contains(ucsc_version) ? ucsc_version : "";
              }
          
              private void initUCSCSources() {
                  synchronized (UCSCSources) {
                      if (UCSCSources.isEmpty()) {
                          Optional<DataProvider> dasDataProvider = igbService.getAllServersList().stream().filter(dataProvider -> dataProvider.getUrl().equals(UCSC_DAS_URL)).findFirst();
                          if (dasDataProvider.isPresent()) {
                              Set<String> supportedGenomeVersionNames = dasDataProvider.get().getSupportedGenomeVersionNames();
                              UCSCSources.addAll(supportedGenomeVersionNames);
                          }
          
                      }
                  }
              }
          
          aloraine Ann Loraine added a comment -

          I think that we can make this a little more robust as follows:

          Instead of using the DAS service to retrieve the name of the required genome version, we can instead "hit" the new JSON API, which UCSC appears to be very committed to maintaining.

          As per the documentation, it looks like we can retrieve the data with this new REST endpoint from the new JSON API: https://api.genome.ucsc.edu/list/ucscGenomes

          The format of the returned data is JSON, which the code could then be modified to parse, using a JSON parser that is already included in the IGB project. I forget what it's called – we can track it down pretty easily by looking at the dependencies in the project-level POM or in some of the plugins/modules that use it, e.g., the preferences module.
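As a rough illustration of the idea, the sketch below downloads the endpoint's response and collects the genome names. It assumes the response contains a top-level "ucscGenomes" object whose keys are the genome names (for example "hg38"); that shape should be checked against the live response. The class and method names are hypothetical, and the small hand-written scanner is there only to keep the example self-contained. In IGB it would be better to use the JSON parser that the project already includes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: list the genome names offered by the UCSC JSON API. */
final class UcscGenomeList {

    static final String UCSC_GENOMES_URL = "https://api.genome.ucsc.edu/list/ucscGenomes";

    /** Downloads the response body as one string (requires network access). */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(10000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Returns the keys of the "ucscGenomes" object, assuming the response
     * looks like {"ucscGenomes": {"hg38": {...}, "mm10": {...}}}.
     * Returns an empty set if no such object is found.
     */
    static Set<String> genomeNames(String json) {
        Set<String> names = new LinkedHashSet<>();
        int keyPos = json.indexOf("\"ucscGenomes\"");
        int i = keyPos < 0 ? -1 : json.indexOf('{', keyPos);
        if (i < 0) {
            return names;
        }
        int depth = 0;
        boolean inString = false;
        StringBuilder current = new StringBuilder();
        for (; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character; escapes are not decoded
                } else if (c == '"') {
                    inString = false;
                    // A string directly inside the object and followed by ':' is a key.
                    if (depth == 1 && nextNonSpace(json, i + 1) == ':') {
                        names.add(current.toString());
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inString = true;
                current.setLength(0);
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                break; // end of the "ucscGenomes" object
            }
        }
        return names;
    }

    /** Returns the first non-whitespace character at or after index from, or '\0' if none. */
    private static char nextNonSpace(String s, int from) {
        for (int i = from; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return s.charAt(i);
            }
        }
        return '\0';
    }
}
```

With something like this in place, a call such as `UcscGenomeList.genomeNames(UcscGenomeList.fetch(UcscGenomeList.UCSC_GENOMES_URL))` could supply the set of names that the External View code currently obtains from the DAS data provider.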

          aloraine Ann Loraine added a comment -

          See attached file for a view of data from the UCSC genomes endpoint, displayed using the Firefox Web browser, which has built-in support for pretty printing of JSON data.

          nfreese Nowlan Freese added a comment -

          Current UCSC DAS genomes: https://genome.ucsc.edu/cgi-bin/das/dsn

          Current UCSC API genomes: https://api.genome.ucsc.edu/list/ucscGenomes

          aloraine Ann Loraine added a comment -

          We have successfully investigated the problem and planned a solution. Closing the ticket.


            People

            • Assignee:
              omarne Omkar Marne
              Reporter:
              aloraine Ann Loraine