SLiMEnrich Version 1.5.1


DMI File: Motifs and interacting Domain fields

Motifs File: Motif-containing proteins and their motifs

Domains File: Domain-containing proteins and their domains


Last updated: 18 Jul 2018

Uploaded Data

This tab shows the content of the four uploaded files used for analysis: PPIs, DMI, Motifs and Domains. (NOTE: for elmiprot and elmcprot strategies, the Motif and/or Domain file will have been made from the DMI file.) Unless user files are selected for upload, these will be the relevant files from data/. For human PPI data, the user need only load their PPI data and press LOAD DATA. In the absence of a PPI file, the example PPI data from data/ will be used. By default, the full file contents will be shown in the tables. Checking the Show parsed data columns box will display which data have been parsed from each input file for the later stages (e.g. what data is being used for mProtein, Motif, Domain and dProtein).


Instructions

Select PPI data, DMI strategy and optional data uploads. Click LOAD DATA. Uploaded data will be displayed below. Check the Show parsed data columns box to display which data have been parsed from each input file. See the HELP tab for more information.

Note: To analyse the example dataset, press LOAD DATA without uploading any files. The example files can be found in the data/ folder.

Note: To hide these information panels within the tabs, check the Hide tab info text box (below DMI Strategy selection).


Potential DMIs

This tab shows all possible DMIs for the loaded PPI data by mapping all possible mProtein-Motif-Domain-dProtein links from the loaded data, where mProtein and dProtein are both found in the PPI data, but not necessarily interacting with each other. This represents the overall pool of DMIs, given the proteins (but ignoring the actual interactions) in the PPI data. For more details of how mapping is done, refer to Input Data Mapping. By default, all possible links will be shown. To reduce to the set of non-redundant mProtein-dProtein pairs, check the Show NR potential DMI box.
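The mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SLiMEnrich implementation: the function names, the tuple layouts (mProtein, Motif), (Motif, Domain) and (dProtein, Domain), and the example identifiers are all assumptions made for the example.

```python
from collections import defaultdict

def potential_dmis(motif_rows, dmi_rows, domain_rows, ppi_pairs):
    """Return all mProtein-Motif-Domain-dProtein links whose proteins occur in the PPI data.

    Assumed (hypothetical) input layouts:
      motif_rows  : iterable of (mProtein, Motif)
      dmi_rows    : iterable of (Motif, Domain)
      domain_rows : iterable of (dProtein, Domain)
      ppi_pairs   : iterable of (mProtein, dProtein)
    """
    # Proteins found anywhere in the PPI data; who interacts with whom is ignored here.
    ppi_proteins = {protein for pair in ppi_pairs for protein in pair}

    domains_for_motif = defaultdict(set)
    for motif, domain in dmi_rows:
        domains_for_motif[motif].add(domain)

    dproteins_for_domain = defaultdict(set)
    for dprotein, domain in domain_rows:
        if dprotein in ppi_proteins:
            dproteins_for_domain[domain].add(dprotein)

    links = set()
    for mprotein, motif in motif_rows:
        if mprotein not in ppi_proteins:
            continue
        for domain in domains_for_motif.get(motif, ()):
            for dprotein in dproteins_for_domain.get(domain, ()):
                links.add((mprotein, motif, domain, dprotein))
    return links

def non_redundant(links):
    """Collapse full links to the set of unique mProtein-dProtein pairs."""
    return {(mprotein, dprotein) for mprotein, _motif, _domain, dprotein in links}
```

Calling `non_redundant(potential_dmis(...))` gives the kind of reduced set that the Show NR potential DMI box is described as displaying.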




Predicted DMIs

This tab shows the actual DMIs that are identified (when restricting to known ELM interactions) or predicted in the PPI dataset through mapping mProtein and dProtein PPI pairs to mProtein-dProtein pairs in the set of Potential DMIs (i.e. all possible DMIs). For more details of how mapping is done, refer to Input Data Mapping. By default, all possible links will be shown. To reduce to the set of non-redundant mProtein-dProtein pairs used for calculations, check the Show NR predicted DMI box.
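As a rough illustration of this step, the sketch below keeps only those potential links whose mProtein-dProtein pair is present in the PPI data. It is not the SLiMEnrich code: the function name and tuple layout are hypothetical, and it assumes that each PPI pair is given as (mProtein, dProtein) and is matched in that orientation only.

```python
def predicted_dmis(potential_links, ppi_pairs):
    """Filter potential DMI links down to those supported by an actual PPI.

    potential_links : iterable of (mProtein, Motif, Domain, dProtein) tuples
    ppi_pairs       : iterable of (mProtein, dProtein) tuples
    Assumption: pairs are directional, i.e. (a, b) does not imply (b, a).
    """
    ppi_set = set(ppi_pairs)
    return {link for link in potential_links if (link[0], link[3]) in ppi_set}

# Toy data with made-up identifiers.
potential = {
    ("P1", "LIG_A", "PF1", "P3"),
    ("P1", "LIG_A", "PF2", "P3"),
    ("P1", "LIG_A", "PF1", "P4"),
    ("P2", "LIG_B", "PF1", "P3"),
}
ppi = [("P1", "P3"), ("P2", "P4")]

predicted = predicted_dmis(potential, ppi)
# Non-redundant mProtein-dProtein pairs, the unit described as being used for calculations.
predicted_nr = {(link[0], link[3]) for link in predicted}
```

In this toy example two full links survive, but they collapse to a single non-redundant pair because both connect the same two proteins.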




Summary Statistics

This tab gives a graphical representation of the number of unique mProtein, Motif, Domain and dProtein identifiers involved in the predicted DMI dataset. The numbers are displayed when hovering the mouse over the bars. Colours match the Network output.
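The counts behind such a chart are simple to compute. The following is a minimal sketch, assuming predicted DMIs are held as (mProtein, Motif, Domain, dProtein) tuples; the function name and data layout are illustrative and not taken from SLiMEnrich.

```python
def summary_counts(predicted_links):
    """Count unique identifiers in each field of the predicted DMI links."""
    fields = ("mProtein", "Motif", "Domain", "dProtein")
    return {
        name: len({link[i] for link in predicted_links})
        for i, name in enumerate(fields)
    }
```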


SLiMEnrich Histogram

This tab is the main SLiMEnrich output. Clicking on this tab will trigger the PPI randomisation and generate the expected distribution of predicted DMIs based on purely chance associations between PPI mProteins and dProteins. In brief, the input PPI dataset is shuffled without replacement (i.e. keeping the original number of interacting partners per protein) and the number of predicted DMIs is calculated for the shuffled data. By default, this is done 1000 times; the number can be changed using the Number of randomisations setting (or --random=INTEGER on the commandline).
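One simple way to shuffle while keeping each protein's number of partners is to permute the dProtein column of the PPI table. The sketch below uses that approach as an assumption; the actual SLiMEnrich randomisation may differ in detail (for example in how duplicate or self-interacting pairs are handled). The function names and the `potential_nr_pairs` argument are hypothetical.

```python
import random

def shuffle_ppi(ppi_pairs, rng):
    """Permute the dProtein column, so every protein keeps its number of partners."""
    mproteins = [m for m, _ in ppi_pairs]
    dproteins = [d for _, d in ppi_pairs]
    rng.shuffle(dproteins)  # shuffling without replacement
    return list(zip(mproteins, dproteins))

def random_dmi_counts(ppi_pairs, potential_nr_pairs, n_random=1000, seed=None):
    """Count non-redundant predicted DMIs in each of n_random shuffled PPI datasets.

    potential_nr_pairs : set of (mProtein, dProtein) pairs from the potential DMIs
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_random):
        shuffled = set(shuffle_ppi(ppi_pairs, rng))
        counts.append(len(shuffled & potential_nr_pairs))
    return counts
```

The returned list of counts is the kind of expected distribution that the histogram displays.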

The histogram shows the expected distribution of predicted DMIs from these randomised PPI data, and marks the observed number of predicted DMIs in the real data. The following values are calculated and displayed:

  • P-value: this is an empirical p-value based on the proportion of randomised PPI datasets that equal or exceed the number of DMI observed in the real data.
  • Enrichment: this is the ratio of observed non-redundant DMI to the mean of the random non-redundant DMI.
  • FDR: this is the estimated proportion of observed DMI that are false positives, based on the mean random DMI count, excluding any random datasets exceeding the observed predicted DMI count.
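The three values listed above can be computed from the observed count and the list of random counts roughly as follows. This is one reading of the definitions, not the SLiMEnrich source: in particular, the FDR is taken here as the mean of the random counts that do not exceed the observed count, divided by the observed count. The function name is hypothetical.

```python
from statistics import mean

def enrichment_stats(observed, random_counts):
    """Empirical p-value, enrichment ratio and estimated FDR for an observed DMI count."""
    n = len(random_counts)
    # Proportion of randomised datasets that equal or exceed the observed count.
    p_value = sum(1 for c in random_counts if c >= observed) / n

    mean_random = mean(random_counts)
    enrichment = observed / mean_random if mean_random > 0 else float("inf")

    # Assumed interpretation: ignore random datasets that exceed the observed count.
    capped = [c for c in random_counts if c <= observed]
    fdr = mean(capped) / observed if capped and observed > 0 else float("nan")

    return {"p_value": p_value, "enrichment": enrichment, "fdr": fdr}
```

For example, with an observed count of 10 and random counts of 2, 4, 6 and 12, one of the four random datasets reaches the observed value, the mean random count is 6, and the mean of the three counts not exceeding 10 is 4.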

Clicking SETTINGS will open options for customising the histogram. The bin size (default 1) and x-axis extension can also be set for the commandline version (-b and -x, respectively). Note that the histogram x-axis will not truncate before the maximum real or random DMI count and any Extend X-axis End setting below this number will be ignored. The histogram can be saved as a PNG by clicking DOWNLOAD.

  • Normalise DMI counts will convert all DMI counts to a value relative to the mean random DMI count (i.e. divide all values by the mean random DMI count) to enable comparisons between datasets of very different sizes.
  • Convert to distribution of estimated real DMI will subtract the random DMI counts from the predicted DMI to generate a distribution of estimated real DMI.
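The two transformations above amount to simple arithmetic on the counts. The sketch below is illustrative only, with hypothetical function names, and assumes the mean random DMI count is greater than zero.

```python
def normalise_counts(observed, random_counts):
    """Express the observed and random DMI counts relative to the mean random count."""
    mean_random = sum(random_counts) / len(random_counts)  # assumed to be > 0
    return observed / mean_random, [c / mean_random for c in random_counts]

def estimated_real_dmi(observed, random_counts):
    """Subtract each random count from the observed count of predicted DMIs."""
    return [observed - c for c in random_counts]
```

After normalisation the random distribution is centred on 1, so the normalised observed value can be read directly as a fold difference.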



Motifs

This tab shows the frequency of each Motif involved in the predicted DMIs and the proportion of all predicted mProtein-dProtein DMIs that were linked via that motif. Frequencies can be visualised in a bar chart by clicking INTERACTIVE VIEW.
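A frequency table of this kind could be built as sketched below. The definition of frequency used here (the number of non-redundant mProtein-dProtein pairs linked via each motif) is an assumption, as are the function name and the (mProtein, Motif, Domain, dProtein) tuple layout. Passing index 2 instead of 1 gives the equivalent table for domains.

```python
from collections import defaultdict

def feature_frequency(predicted_links, index):
    """Map each motif (index=1) or domain (index=2) to (pair count, proportion of all pairs)."""
    pairs_by_feature = defaultdict(set)
    all_pairs = set()
    for link in predicted_links:
        pair = (link[0], link[3])  # mProtein-dProtein pair
        pairs_by_feature[link[index]].add(pair)
        all_pairs.add(pair)
    return {
        feature: (len(pairs), len(pairs) / len(all_pairs))
        for feature, pairs in pairs_by_feature.items()
    }
```

Because one protein pair can be linked by several motifs or domains, the proportions need not sum to 1.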




Domains

This tab shows the frequency of each Domain involved in the predicted DMIs and the proportion of all predicted mProtein-dProtein DMIs that were linked via that domain. Frequencies can be visualised in a bar chart by clicking INTERACTIVE VIEW.




Predicted DMI Network

This tab generates a visualisation of the predicted DMI network. Different network layouts can be selected and the network can be explored by dragging and moving nodes. Colours match those used in the summary tab.

NOTE: SLiMEnrich will display whatever identifiers have been used for DMI mapping (ELM, Pfam and UniProt by default). For accessible network labels, please use accessible input data, e.g. HGNC gene symbols for mProtein and dProtein fields. Due to the inherent risks of errors and of mappings becoming out of date, SLiMEnrich does not perform any identifier mapping.




SLiMEnrich README

SLiMEnrich is a Shiny App for assessing enrichment of domain-motif interactions in protein-protein interaction data. Please see the GitHub SLiMEnrich Wiki for details and documentation. SLiMEnrich is also available as a Shiny server at: http://shiny.slimsuite.unsw.edu.au/SLiMEnrich/. SLiMEnrich can also be run as an Rscript from the commandline. For commandline options, please run:

Rscript slimenrich.R -h

SLiMEnrich is free software, made available under a GNU General Public License. This program comes with ABSOLUTELY NO WARRANTY. See distributed LICENSE for details.

SLiMEnrich output tabs

See Wiki for details. Check the Hide tab info text box (below DMI Strategy selection) to hide this information in the tabs themselves.

