2.5 Regional scientific production profile

According to the RIS3 mapping exercise, bibliometric analysis and benchmarking are used in around half of the reviewed RIS3 processes (55% of the total). However, the mapping information also shows that the bibliometric analysis used by regions is usually simple. Very few regions take a forward view of emerging areas by using predictive analysis or by building on hypotheses about potential regional changes. There is a need to increase the understanding of the knowledge produced and available in regions. This is relevant for later linking that knowledge to the demand for it and for identifying emerging areas of activity and specialisation.

Scientific profiles and the regional benchmarking of these profiles are important for analysing the context of a region because they facilitate a comparison of all aspects of a region’s performance in science, including the main fields of science and the specialisation patterns of regional academic systems. When benchmarked against other regions, they can be a valuable tool to identify weaknesses and strengths, and to link them to overall regional performance.

Description of the method

The objective of a scientific production profile is to provide a bibliometric analysis of the scientific performance of regions. The scientific production profiles are generally based on a selected set of bibliometric indicators that aim to compare scientific performance across geographies (regions, but also countries). These indicators generally include (European Commission, 2013):

  • Number of publications. Number of peer-reviewed scientific publications written by authors located in a given geographical or organisational entity (e.g. the world, a country, a NUTS2 region, a university, a Research Performing Organisation (RPO) or a company). Publication counts are based on full and fractional counting at the level of author addresses.
  • Growth Index (GI). Measure of the increase in the number of publications or co-publications in a field obtained using fractional counting of publications. A GI value above 1 means that a given region experienced an increase in its output in this research area; an index value below 1 means the contrary. The GI value for a given region can be compared to the GI calculated for the world or other benchmark regions in this research area to ascertain whether the increase experienced by the region has kept pace with the increase of the world or benchmark regions in this research area.
  • Specialisation Index (SI). Indicator of research intensity of a given entity (e.g. a country, a NUTS2 region or an institution) in a given research area (e.g. a research field), relative to the intensity of a reference entity (e.g. the world, the EU28, or the entire output as measured by a database) in the same research area. In other words, when a region is specialised in a field, it places more emphasis on that field at the expense of other research areas. Specialisation is said to be a zero-sum game: the more a region specialises somewhere, the less it does elsewhere. An index value above 1 means that a region/entity is specialised relative to the reference entity, whereas an index value below 1 means the contrary.
  • Total number of citations. Total number of citations received by each publication counted from the year of publication plus a three-year citation window. The total number of citations for a NUTS2 region is obtained by totalling the number of citations of the publications that were assigned to the region.
  • Average of Relative Citations (ARC). A field-normalised direct measure of scientific impact (which also considers the publication year and document type of scientific contributions in the normalisation process) based on the citations received by an entity’s papers. To account for different citation patterns across fields and subfields of science, each publication’s citation count is divided by the average citation count of all publications that were published in the same year in the same subfield to obtain a Relative Citation count (RC). The ARC of a given region is the average of the RCs of the papers belonging to it. An ARC value above 1 means that a region is cited more frequently than the world average, while a value below 1 means the contrary.
  • Average of Relative Impact Factors (ARIF). A field-normalised measure of the scientific impact of publications produced by a given entity (e.g. a NUTS2 region) based on the impact factors of the journals in which they were published (also taking the publication year of scientific contributions into account in the normalisation process). The ARIF is an indirect impact metric reflecting the average citation rate of the publication venue instead of the actual publications. It serves as a proxy for the ‘quality’ of the research performed by a region. When the ARIF is above 1, it means that a region scores better than the world average; when it is below 1, it means that on average, an entity publishes in journals that are not cited as often as the world level.
  • Highly cited publications. Percentage of papers in the 10% most-cited papers in a given reference database, making use of the relative citation (RC) scores of publications computed using a three-year citation window.
  • Number of co-publications. Number of co-publications (full-counting) from a NUTS2 region in which co-authors are from at least two different regions. The collaborations of a NUTS2 region can be broken down by the regions in which the co-authors are located to draw collaboration maps or aggregated to count the total number of co-publications of a region with other regions within the EU28.
  • Collaboration Index (CI). Scale-adjusted metric of scientific collaboration comparing the observed number of co-publications of an entity (e.g. a NUTS2 region) to the number expected given the size of the scientific production of the region. When the indicator is above 1, a region produces more publications in collaboration than expected based on the size of its scientific production, while an index value below 1 means the contrary.
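
Three of the indicators above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the method used in the cited reports: the function names and sample figures are hypothetical, the GI is assumed to be the ratio of later-period to earlier-period output, and the ARC here normalises by subfield and year only (not by document type).

```python
from collections import defaultdict
from statistics import mean


def specialisation_index(region_field, region_total, ref_field, ref_total):
    """SI: the region's share of output in a field divided by the
    reference entity's share of output in the same field."""
    return (region_field / region_total) / (ref_field / ref_total)


def growth_index(pubs_earlier, pubs_later):
    """GI (assumed form): output in the later period divided by
    output in the earlier period, both fractionally counted."""
    return pubs_later / pubs_earlier


def average_relative_citations(region_papers, world_papers):
    """ARC: mean of each paper's citations divided by the world
    average for papers of the same subfield and year.

    region_papers must be a subset of world_papers. Papers whose
    (subfield, year) world average is zero are skipped, because
    their relative citation count is undefined.
    """
    groups = defaultdict(list)
    for paper in world_papers:
        groups[(paper["subfield"], paper["year"])].append(paper["citations"])
    baseline = {key: mean(values) for key, values in groups.items()}

    relative = []
    for paper in region_papers:
        expected = baseline[(paper["subfield"], paper["year"])]
        if expected > 0:
            relative.append(paper["citations"] / expected)
    return mean(relative)


# Hypothetical figures, for illustration only.
world = [
    {"subfield": "Optics", "year": 2010, "citations": 10},
    {"subfield": "Optics", "year": 2010, "citations": 30},
    {"subfield": "Ecology", "year": 2010, "citations": 2},
    {"subfield": "Ecology", "year": 2010, "citations": 6},
]
region = [world[1], world[2]]

si = specialisation_index(250, 1000, 125000, 1000000)
gi = growth_index(400, 500)
arc = average_relative_citations(region, world)
print(si, gi, arc)
```

In this example the region devotes a quarter of its output to the field against an eighth for the reference entity, so it counts as specialised (SI above 1). Its ARC sits exactly at the world average because one paper is cited above its subfield baseline and the other below it.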

Usability and impact

Effective and efficient research and innovation systems are those that succeed in producing strong scientific and technological outputs, both in terms of quality and relevance (European Commission, 2016). Understanding the scientific performance of regions, including impact and collaboration patterns, allows for a comprehensive analysis of the evolution, interconnectivity, performance and impact of regional research and innovation systems in the EU. It also helps to provide an overall view of regional strengths and weaknesses in knowledge production across fields and subfields of science.

Required data

The elaboration of scientific profiles is based on publication data. The most common sources of publication data are:

  • Scopus
  • Web of Science (WoS) and
  • Google Scholar.

Each of them has advantages and disadvantages. The most commonly used source in the European context is Scopus. Web of Science is known for its restricted coverage (see table below) and a marked predominance of papers in English. In contrast, Scopus has greater coverage and, through its tool ‘Scimago Journal and Country Rank’, allows free bibliometric analysis (although with a restricted set of indicators). Finally, Google Scholar has much wider coverage, no linguistic biases, and is free. However, it is not as sophisticated or accurate as WoS or Scopus for more advanced bibliometric analysis. The table below provides a comparison of Web of Science, Scopus and Google Scholar.

Table 2 Comparison of features in WoS, Scopus and Google Scholar

Source: Bakkalbasi et al., 2006

National sources, and those provided directly by universities and academic organisations, are also relevant. However, they do not necessarily allow for benchmarking, unless the benchmark is done at the level of organisations or specific disciplines.

Relevant data sources

Scientific production profiles at national and regional (NUTS2) level have been produced by DG Research and Innovation (DG RTD) of the European Commission (EC) for several years. Between 2010 and 2014, Science-Metrix was selected as the provider of bibliometric indicators for DG RTD. It collected, analysed and updated all bibliometric data that was integrated into the EC’s evidence-based monitoring of progress towards the objectives set by the Lisbon framework and the post-Lisbon Strategy of the European Research Area (ERA). The analyses provided by Science-Metrix to the EC focused on the scientific performance (including impact and collaboration patterns) of countries, regions and research performers such as universities, public research institutes and companies. The statistics produced by Science-Metrix are based on a series of indicators designed to consider national and sector specificities, as well as to allow an analysis of the evolution, interconnectivity, performance and impact of national research and innovation systems in the EU. They also provide an overall view of Europe’s strengths and weaknesses in knowledge production across fields and subfields of science. All Science-Metrix reports are available here: http://science-metrix.com/en/news/the-european-commission-publishes-six-reports-produced-by-science-metrix

Science-Metrix has also built a journal-based, mutually-exclusive classification scheme (i.e. taxonomy) to delineate the main fields and subfields of science. This taxonomy is available here: http://www.science-metrix.com/en/classification. Moreover, they have also matched subfields (and, where necessary, journals) to 17 FP7 thematic priorities and 22 industrial sectors, allowing for further analysis along EU priorities and economic activities.

For the period 2016-2018, CWTS and INCENTIM (KU Leuven) were selected by DG RTD to produce and update a wide range of bibliometric data and performance indicators[6]. In addition, dedicated methods, metrics and indicators will focus on ‘open access’ publications, gender equality and research mobility.

To our knowledge, the data for producing scientific profiles for several NUTS2 regions already exists and has been thoroughly processed by DG RTD’s contractors. In the Science-Metrix reports, the selected NUTS2 regions are the 50 regions that published the largest number of peer-reviewed publications over the 2000-2011 period. This work could potentially be extended to all EU28 NUTS2 regions, which would require further data collection and analysis. It would, however, be necessary to agree with DG RTD whether this data could be used in an online tool and updated by their current contractors on an annual or otherwise regular basis.

Additionally, and based on this data, DG RTD produces every year the “Science, Research and Innovation Performance of the European Union” report, which looks at the performance of EU countries in research and innovation. The report usually includes a chapter on science and technology outputs, in which country strengths and weaknesses in scientific production are highlighted[7].

The Essential Science Indicators from Clarivate Analytics[8] (formerly the IP & Science business of Thomson Reuters) is a tool that analyses top research output and research fronts, revealing emerging science trends as well as influential individuals, institutions, papers, journals and countries/territories in different fields of research. It produces a compilation of science performance statistics and science trends data based on journal article publication counts and citation data from Web of Science. It is a commercial tool that requires a paid subscription.

Finally, Publish or Perish (http://www.harzing.com/resources/publish-or-perish) is a free software programme that retrieves and analyses academic citations. It uses Google Scholar and Microsoft Academic Search to obtain raw citations, which it then analyses to produce several metrics. The metrics include the total number of papers and total number of citations, the average citations per paper, citations per author, papers per author and citations per year, as well as different variations of impact factor. This tool is useful for analysing individual focal authors of interest, or groups of authors in which scientific strengths have been revealed.

Implementation roadmap

As discussed above, the most efficient approach would be to work in collaboration with DG RTD and its contractors, who already produce scientific profiles for EU countries and regions, and to integrate their data into an online tool where regions can easily benchmark their performance against other EU regions of their choice. Below, we present a full roadmap of what would be necessary to develop and implement such a tool from scratch (assuming no collaboration with DG RTD is possible). If DG RTD makes available the data and analysis produced over the years for the science profiles, the roadmap presented below would start at Step 4.

  1. Data collection. Publication data for all EU28 regions should be collected from Scopus, Web of Science or Google Scholar, using either a single source of choice or all of them.
  2. Data regionalisation. All data collected should be regionalised to produce regional bibliometric indicators.
  3. Data analysis and production of indicators. All bibliometric indicators (or a selection of these) listed in the section “Description of the method” should be produced.
  4. Graphic representation of data. Once data is analysed, several types of graphical representation could be considered to ease the visualisation and interpretation of bibliometric data, such as radar graphs and dashboards.

An example of a radar graph from the EU scientific production profiles is presented below. Radar graphs are useful for interpreting relative strengths and weaknesses of a region/country.

Figure 6 Scientific specialisation index by main scientific fields for the ERA, China, Japan and the US, 2000-2010

Source: Science-Metrix based on Scopus data in European Commission, 2013

Dashboard tables could be used to present several indicators side-by-side, often in the form of micro charts, using small graphics inserted in the table. This type of table allows for rapid comparisons between entities, and/or for the easy interpretation of trends over time. An example of a dashboard table is presented below.

Figure 7 Publications in Telecommunications by country, 2000-2011

Source: Science-Metrix based on Scopus data in European Commission, 2013


  5. Online tool. Once the data is analysed and presented in a visually attractive way, it should all be collected and organised into an online tool that is accessible to the public. This tool should be interactive and allow the selection of benchmark regions. The science profiles of the focal region and its benchmarks should be shown together, allowing for comparisons and further analyses.
  6. Continuous data update. All data should be continuously updated and added to the online tool.
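
The regionalisation step of the roadmap, together with the full and fractional counting described under “Description of the method”, could be sketched as follows. This is a minimal illustration, not the procedure used by DG RTD’s contractors: the function name, the address-to-region lookup and the sample data are hypothetical, and it assumes that each publication’s credit is split equally across all of its author addresses, including addresses that cannot be mapped to a region.

```python
from collections import defaultdict


def regional_counts(publications, address_to_region):
    """Return (full, fractional, co_publications) counts per region.

    publications: list of publications, each a list of author addresses.
    address_to_region: hypothetical lookup from address to NUTS2 code.
    """
    full = defaultdict(int)
    fractional = defaultdict(float)
    co_publications = defaultdict(int)

    for addresses in publications:
        if not addresses:
            continue
        # Assumption: equal credit per address, mapped or not.
        share = 1 / len(addresses)
        regions = [address_to_region[a] for a in addresses
                   if a in address_to_region]
        for region in regions:
            fractional[region] += share

        distinct = set(regions)
        for region in distinct:
            # Full counting: one publication per region involved.
            full[region] += 1
            # Co-publication: co-authors in at least two regions.
            if len(distinct) > 1:
                co_publications[region] += 1

    return dict(full), dict(fractional), dict(co_publications)


# Illustrative data; addresses and region codes are placeholders.
address_to_region = {
    "Univ A": "AT13",
    "Inst B": "AT22",
    "Univ C": "FRK2",
}
publications = [
    ["Univ A", "Univ A"],
    ["Univ A", "Inst B"],
    ["Inst B", "Univ C", "Unmapped Lab"],
]

full, fractional, co_publications = regional_counts(
    publications, address_to_region
)
print(full)
print(fractional)
print(co_publications)
```

Splitting credit across all addresses keeps the fractional counts of a publication from summing to more than one, at the cost of leaving part of the credit unassigned when an address falls outside the regions covered. A different convention, such as splitting credit only among mapped addresses, would give different regional totals.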