The I-WIRE Project - A Repository Enhancement Project
I was fortunate enough to be able to take some time out of our busy testing phase last Friday to visit the Repositories Support Project's event in London on Open Access and the impact for libraries and librarians. I was drawn by the content and the speakers, and my instinct was right as it turned out to be compelling from start to finish. Here are the points that resonated with me and my world.
Bill Hubbard of SHERPA opened the event by defining Open Access, its background and drivers. Bill went on to give his view of where the academic community is with Open Access. The scope has always been more than outputs alone: it includes data, grey literature (e.g., lab note-books) and arts media amongst other things, but there is currently a big gap between 'open to read' - the focus of most repositories - and 'strong' Open Access that includes use and re-use of data.
While there have been a number of drivers on the road to Open Access, including the serials crisis, Open Access is a component of the overall shift in academic practices and is something that the community is shifting to 'because it can'. Change is coming, indicated by the academic use of Slideshare, Flickr, YouTube, Mendeley and personal web pages. Bill states that all three units that operate in the academic model - Academics, Funders and Institutions - are in favour of Open Access. The structures, services and processes are in place to support Open Access. Repositories add value to the processes by providing control and authority over content.
Alma Swan gave a very thorough overview of the JISC-funded and well-publicised economic case for Open Access, including John Houghton's modelling and the work that Alma has undertaken with institutions in the UK, Australia, Netherlands, Denmark and the US. Anyone familiar with the work will know that research-intensive institutions don't always come out well in the model; in fact, they can see a potentially negative financial impact where Gold Open Access (pay-to-publish) costs rise above a certain level. Alma acknowledged this and emphasised that the UK can save money overall and we need to discuss at a community level how the overall saving is managed so that individual institutions are not disadvantaged. These discussions are already taking place between research funders and institutions.
One of the questions at the end of Alma's talk highlighted that Subject Repository costs weren't included in the Houghton model as apparently no one can see their sustainability; whereas Institutional Repositories are sustainable due to the institutional imperative. In Alma's view, Subject Repositories should harvest their content from Institutional Repositories and not take direct ingest.
Wim van der Stelt of Springer provided a publisher's perspective on Open Access. He expressed his bemusement that Open Access is an academic's cause but championed primarily by librarians! He went on to assure the room full of (mostly) librarians that Springer is different, has an 'agnostic business model' and is driven by customer demand, and so is working with libraries on Open Access, pioneered the 'hybrid journal' and is a 'green' publisher on SHERPA's RoMEO database. The internet has helped change the publisher's role from purely distribution since the 1950s and Springer has adapted to this. However, the pay-to-view and pay-to-publish systems will need to co-exist for some time as the Gold route to Open Access is growing, but not quickly enough.
The development of the University of Glasgow's repository was covered in a case study from Susan Ashworth, providing a fascinating insight into the university's work since 2001 to create a culture of Open Access, not just an institutional repository, and to evolve the repository into a central publications management system.
Key drivers for this work have been increasing citations, presenting a public view of the university's research profile, demonstrating compliance with funders' mandates, managing publications and preparing for the REF.
Sue pointed out that building relationships with the university research office and academic departments has been fundamental to the repository's success. There is also a strong national initiative in the form of the Open Access Team for Scotland, and talk of a Scottish council to help create a climate of opinion on the importance of Open Access. A Scottish Open Access declaration was made in 2004, spurring all universities on to set up repositories and put mandates in place. Glasgow's mandate was issued in 2008, partly influenced by the library's experience of collecting publication data for the RAE. The mandate covers new publications from 2008, requesting bibliographic data as the minimum, and also providing a standard form of address to aid citation analysis.
The university clearly recognises the importance of its repositories. It has three, making the management of different types of outputs more straightforward, and all three are harvested by the university's library catalogue discovery tool.
The university's research and strategy committee is given regular reports, generated using ROAR and Google Analytics. These reports show that the full-text ratio is growing from the current ten per cent.
Sue talked briefly about the Gold route to Open Access. Glasgow has a pay-to-publish fund but anticipates this being difficult to argue for in the next financial year, and expects academics to cover these costs in their grant applications.
Glasgow is currently conducting a mini-REF exercise using a modified version of EPrints that allows academics to rank their top four outputs with some supporting text attached to the publication record, and to record 'esteem' and 'impact' information. Academics can change these records at any time but it provides a good view in the lead up to REF, and has also seen an increased rate of self-deposit, some with full text.
Important lessons for Glasgow have been the importance of advocacy, relationships, acknowledging the variety of user needs (the repositories support multiple deposit methods), making use of external influences and linking the work to central institutional requirements. Which led on nicely to a question about the better driver for self-deposit: the mandate or REF? In Sue's opinion, while the mandate was instrumental in triggering the Open Access debate at the university, the REF preparation has resulted in deposits.
David Carr of the Wellcome Trust provided a funder's perspective. Maximising access to outputs is central to the Wellcome Trust's mission, and it was recognised in the early 2000s that traditional academic models were not consistent with this goal. The trust made it mandatory in 2006 for their funded outputs to be made Open Access and is working with the major Scientific, Technical & Medical publishers to achieve this. There are challenges: improving compliance, persuading researchers of the benefits, improving payment mechanisms, clarifying publishers' policies and flipping the model from subscription to 'author-pays'. Questions at the end of David's presentation demonstrated a strong opinion that the funders need to take a much stricter line in enforcing their policies.
Chris Middleton of the University of Nottingham talked about the institution's approach to funding Open Access publishing. A survey in 2009 showed that 14% of institutions had a central Open Access fund and that there is generally a low awareness of such funds amongst academics. Chris also pointed out that it's difficult to budget for these funds as they are sensitive to author up-take.
A presentation of the role of professional librarians in repository management was given by Jackie Wickham of the RSP. There has been a phenomenal growth in this area in the last five years, partly driven by the global move to Open Access, central government support and JISC funding, preparations for REF and providing a service to academics. A recent survey conducted by the RSP identified that communication skills and perseverance are key skills for librarians working with repositories, as anyone involved in this area will know!
Paul Ayris of UCL provided the European perspective on things, and talked about some of the European initiatives such as the Open Access theses gateway DART-Europe, LIBER and LERU. Importantly, the Gold Open Access route has been acknowledged at a European level as being too expensive and difficult to justify in the current economic climate, so the community is at an interesting cross-roads for the Gold and Green routes. At the end of Paul's presentation, Ken Chad suggested that portals and aggregators should make more use of 'attention data' and that this could be a growth area where institutions' services could be developed to rival those of Google.
Bill Hubbard closed by stating that the community is moving to Open Access 'because it can'. The whole academic model can be changed. It is up to funders to set the direction through their funding and programmes, institutions to enable and facilitate, and researchers to research.
That's my take on the event, but you don't have to take my word for it as the presentations are now on-line.
Just back from showing the demo version of our quick deposit ORCA Lite portlet to a forum of Research Administrators. Great to recognise nearly half of the faces from our user needs analysis phase this time last year! As sustainability of the project's outcomes begins to shift to the forefront of our minds, I took the advocacy road and have hopefully started the Research Administrators thinking about how they can encourage Researchers in their Schools to use ORCA Lite. We'll be working with the Research Administrators more in the future under our longer term Advocacy project.
Questions at the end touched on the more subtle and complex challenges that the project has identified, such as the over-lap between ORCA and ResearcherID, and the approach to integration with already established publication management systems in some of the Schools; subjects that can't be addressed within the current project's timescales but are at the top of our list of potential future work.
One more demo scheduled before we let people get their hands on ORCA Lite for themselves: Alpha testing is planned for November and December.
Great fun with Wordle today. I'm pleased that our word cloud shows the user firmly at the centre of the project.
The I-WIRE High Level Design and Technical Specification has just gone to our Project Management Group and IT team leads for review and approval. Here are the key design decisions and overview of the solution that the project will be developing and testing over the Summer.
Portlet and Integration
The simplified deposit workflow will be surfaced in the MWE (Modern Working Environment) Portal via a portlet developed and hosted using IBM WebSphere. The portlet will be connected to ORCA using SWORD for deposit, and using the EPrints API and EPrints 3.2 REST interface for other functions.
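To make the SWORD connection a little more concrete, here is a minimal sketch of how a deposit request might be assembled. It assumes SWORD 1.3-style HTTP headers; the collection URL, packaging identifier, file name and user name are placeholders rather than ORCA's real configuration, and the request is only built, never sent.

```python
import urllib.request


def build_sword_headers(filename, packaging, content_type="text/xml", on_behalf_of=None):
    """Return HTTP headers for a SWORD 1.3-style deposit (illustrative only)."""
    headers = {
        "Content-Type": content_type,
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
    }
    if on_behalf_of:
        # Mediated deposit: the portlet deposits on behalf of the logged-in author.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers


def build_deposit_request(collection_url, payload, headers):
    """Build (but do not send) the POST request that carries the deposit package."""
    return urllib.request.Request(collection_url, data=payload, headers=headers, method="POST")


# Hypothetical values: neither the URL nor the packaging URI comes from the ORCA set-up.
EXAMPLE_COLLECTION = "https://repository.example.ac.uk/sword-app/deposit/review"
EXAMPLE_PACKAGING = "http://eprints.org/ep2/data/2.0"

example_headers = build_sword_headers("deposit.xml", EXAMPLE_PACKAGING, on_behalf_of="author1")
example_request = build_deposit_request(EXAMPLE_COLLECTION, b"<eprints/>", example_headers)
```

Pointing the deposit at a review collection rather than the live archive would fit the mediated approach, where every item is checked before it goes live.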
The portlet will contain the following tabs, which are explained further below:
- Quick Deposit
- DOI Deposit
- Web of Science Import
- Search ORCA
- Browse ORCA
- My Publications
The landing-tab (default) will be My Publications, and the portlet will initially carry the title ‘ORCA Lite’ in order to differentiate it from the full ORCA service which provides a different user experience.
All of the above is subject to end user feedback during the development phase, and further concept proofing in one of the MWE environments during the early part of the development phase.
All items deposited using the portlet will be reviewed by the University Library Service Cataloguing Team for completeness, and the bibliographic data supplemented where necessary, before being made available in live ORCA. The publisher’s copyright policy will also be checked to ensure compliance. This mediated approach to deposit is essential to ensure consistency and accuracy of data, particularly important as future projects are likely to integrate ORCA with other research related processes and systems, in particular for the REF.
Alternative approaches to the review process (e.g., review by exception) can be explored as a separate project if the volume of new deposits becomes too big to manage on a one-by-one basis.
The Quick Deposit tab will capture from the user the minimum bibliographic data required for the Cataloguing Team to supplement the data, whilst keeping effort to a minimum for the user. If a user decides that the minimal bibliographic data is not enough for a particular deposit and they wish to populate the full bibliographic data themselves (abstract, keywords, etc.), they will be able to select an Advanced Deposit link to ‘full’ ORCA that will take them to the extended deposit screens that are currently in existence and will continue to be available in parallel to the MWE Portal-based deposit route. Single Sign On will be implemented to avoid the user having to log in to ORCA if they choose the Advanced Deposit from the portlet.
The IdMan unique person identifier will be added to each Cardiff author automatically during the deposit process.
The DOI Deposit tab will allow the user to enter a DOI and retrieve the associated bibliographic data from the CrossRef service. The search result will be displayed to the user for checking before submitting to ORCA, and a full text file can be added by the user at this point if desired.
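As a rough illustration of the DOI lookup step, the sketch below flattens a CrossRef-style record into the fields a user would check before submitting. It assumes the JSON shape returned by CrossRef's public REST API (`https://api.crossref.org/works/<DOI>`), which is not necessarily the CrossRef interface the project will use; the function name, output field names and sample data are all made up for the example.

```python
def crossref_to_deposit(message):
    """Map a CrossRef-style 'message' dict to a flat record for the user to check."""

    def first(values):
        # Several CrossRef fields are lists; take the first entry if there is one.
        return values[0] if values else None

    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    date_parts = message.get("issued", {}).get("date-parts", [[None]])
    return {
        "doi": message.get("DOI"),
        "title": first(message.get("title", [])),
        "authors": authors,
        "container_title": first(message.get("container-title", [])),
        "year": date_parts[0][0] if date_parts and date_parts[0] else None,
        "publisher": message.get("publisher"),
        "issn": first(message.get("ISSN", [])),
    }


# Invented sample data in the assumed shape; not a real publication.
sample_message = {
    "DOI": "10.1234/example.5678",
    "title": ["An Example Article"],
    "author": [{"given": "Ann", "family": "Author"}, {"family": "Consortium"}],
    "container-title": ["Journal of Examples"],
    "issued": {"date-parts": [[2010, 5]]},
    "publisher": "Example Press",
    "ISSN": ["1234-5678"],
}
record = crossref_to_deposit(sample_message)
```

Missing fields come back as `None` or an empty list rather than raising an error, so the user (and later the Cataloguing Team) can see what still needs filling in.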
The Cardiff authors associated with items returned by the search will have to be identified and their email addresses added by the Cataloguing Team during the review process. The addition of email addresses will also trigger the addition of IdMan identifiers via a planned overnight automated process.
The project team will explore if duplicate items can be identified at this point by a lookup of the DOI in ORCA at the same time as retrieving the data from CrossRef, in which case the user would be prevented from depositing the duplicate item.
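One way the duplicate check could work is sketched below: normalise the DOI the user typed, then compare it with the DOIs already held. This is only an assumption about the approach; in practice the list of existing DOIs would come from a lookup against ORCA, which is represented here by a plain Python list.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Reduce common ways of writing a DOI to a bare, lower-case identifier."""
    value = doi.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.strip()


def is_duplicate(doi, existing_dois):
    """Return True if the DOI matches one already in the repository."""
    known = {normalise_doi(d) for d in existing_dois}
    return normalise_doi(doi) in known


# Hypothetical repository contents.
already_in_repository = ["10.1234/abc.1", "doi:10.1234/DEF.2"]
```

With this in place, `is_duplicate("https://doi.org/10.1234/ABC.1", already_in_repository)` would flag the item before the user gets as far as depositing it.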
Web of Science Import
The Web of Science Import tab will allow the user to search Web of Science using an author name and optional date range. The user can identify and select individual publications from the list presented, and a full text file can be added for each individual publication. This service could help a user to retrospectively populate ORCA with their entire publication lists, where their discipline is covered by Web of Science.
As with the DOI Deposit, the Cardiff authors associated with items returned by the search will have to be identified and their email addresses added during the review process. The addition of email addresses will also trigger the addition of IdMan identifiers via the planned overnight automated process.
The School(s) associated with a particular item will be selected by the user during deposit. It is not desirable to automatically populate this as the current affiliation for an author may not have been valid at the time the publication was written.
Research Centres and Research Groups will not be included in ORCA by the I-WIRE project due to the huge variations between them and the potential for inconsistent data. This will be kept as a candidate for a potential future project.
Due to the additional fields that would be needed to capture embargo type and date, a user that wishes to specify an embargo for a publication’s full text item, and optionally its bibliographic data, will need to select the Advanced Deposit link to ‘full’ ORCA. Embargo functionality will function as it does for the current ORCA, i.e., the Cataloguing Team will decide during the review process what steps are to be taken.
The My Publications tab will present the logged in user with a list of their publications. Long lists will be navigated by either a scroll-bar within the portlet, or via multiple navigable pages containing sub-sets of the full list. The preferred approach will be explored during user feedback in the development phase and could be steered by any constraints with the Portal technology.
The user will have the option of exporting their publication list to a file. Within the I-WIRE project, this will export the entire list. The ability to select a sub-set of publications for export is seen as complex due to the required caching of a user’s selection across multiple pages (if pages are used), and therefore is being kept as a candidate for a potential future project.
The user will also have the option of selecting their top publications on an individual publication basis. A marker will be available in the standard data feed and may be used by Schools to identify selected publications for profile web pages. Each selection - either checking or unchecking an item - will update the ORCA database in real-time. The portlet will not implement any constraints on the number of items selected as each School is likely to have a different requirement; it is therefore up to the School to decide how to deal with any constraint violations, e.g., ignoring anything after the first 6 selected publications. This function will require an extension to the ORCA database to introduce an attribute that allows marking of a publication on an author basis to cover scenarios where there is more than one Cardiff author associated with one publication.
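The per-author marker can be pictured with the small sketch below. It is not the planned ORCA database extension, just an in-memory stand-in with made-up names, showing why the selection has to be stored per author and how a School could apply its own limit when reading the data.

```python
class TopPublications:
    """In-memory stand-in for a per-author 'selected publication' marker."""

    def __init__(self):
        # author id -> publication ids, in the order the author selected them
        self._selected = {}

    def toggle(self, author_id, publication_id, checked):
        """Record a single check or uncheck for one author."""
        chosen = self._selected.setdefault(author_id, [])
        if checked and publication_id not in chosen:
            chosen.append(publication_id)
        elif not checked and publication_id in chosen:
            chosen.remove(publication_id)

    def selected_for(self, author_id, limit=None):
        """Return an author's selections; a School may pass its own limit."""
        chosen = self._selected.get(author_id, [])
        return list(chosen if limit is None else chosen[:limit])


store = TopPublications()
# Two Cardiff authors share 'pub-1' but only one of them marks it as a top publication.
store.toggle("author-a", "pub-1", True)
store.toggle("author-a", "pub-2", True)
store.toggle("author-b", "pub-2", True)
store.toggle("author-a", "pub-2", False)
```

The store itself imposes no maximum; a School that only wants six items on a profile page would call `selected_for(author_id, limit=6)` and ignore the rest.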
The Browse tab will offer browse by author name, item type, school and year. Item type and school will be further grouped by year due to the volume of items associated with them. A scroll-bar or hyperlinked grouping will be implemented to aid navigation. An icon will be included in the publication lists to easily indicate the presence of an associated full text file.
The Search tab will offer a keyword or phrase search. The tab will include a link to full ORCA for the Advanced Search, offering many more fields that help limit and focus the search results. However, the experience of the team has shown that the Simple Search will usually return the desired results.
Reuse of Bibliographic Data
The HTTP GET URL can include filters to limit the results by item type, school, author, email address and year.
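For anyone wondering what such a filtered request might look like, here is a minimal sketch. The base URL and the query parameter names (`type`, `school`, `author`, `email`, `year`) are assumptions made for illustration; the actual feed may name them differently.

```python
from urllib.parse import urlencode

# Assumed parameter names, one per filter mentioned above.
ALLOWED_FILTERS = ("type", "school", "author", "email", "year")


def build_feed_url(base_url, **filters):
    """Build a GET URL for the data feed, rejecting filters it does not know."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"Unsupported filter(s): {sorted(unknown)}")
    # Keep a fixed parameter order so the same filters always give the same URL.
    query = urlencode({k: filters[k] for k in ALLOWED_FILTERS if filters.get(k) is not None})
    return f"{base_url}?{query}" if query else base_url


# Hypothetical feed address.
FEED = "https://orca.example.ac.uk/feed"
school_url = build_feed_url(FEED, school="MEDIC", year=2010)
```

A School's web page script could then fetch `school_url` on a schedule and re-use the returned publication list without anyone re-keying the data.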
The IdMan unique person identifier will be included in the data feed. The data feed will also include the ‘selected’ publications marker introduced by the project to help Schools indicate top publications on authors’ profiles in their web pages.
Importing publication lists will continue to be supported by the current mediated process that takes publications lists from the Schools and validates the data before loading it into ORCA. Apart from the Web of Science tab, bulk import will not be surfaced in the portlet.
Deposit to Multiple Repositories
The ability to submit an item to other repositories at the same time as ORCA is being explored by the project team. While PubMed Central would be the most obvious choice, it does not offer a SWORD client. The benefit of a SWORD client has been notified to PubMed Central, but in the meantime a deposit to the arXiv subject repository will be explored by way of a drop down list in the Quick Deposit tab that allows the optional selection of arXiv.
Integration with Research Publication Management Systems
The User Needs Analysis phase of the project identified a number of Schools that have implemented - or are in the process of implementing - Publication Management systems. This provides opportunities to integrate ORCA with the Publication Management systems and associated processes that are in use in those Schools. However, the variety of systems and their different levels of maturity would be too wide for the project to tackle on an individual basis without putting it at severe risk of not delivering a usable solution within the project’s timescales.
Consideration has been given to the opportunity for integration, particularly as this could help the project with one of its key objectives of delivering an enhanced deposit workflow that is firmly embedded in the research management process. This will be balanced with delivering an enhanced workflow and toolset that meets the needs as far as possible of Schools that do not have any Publication Management system in place.
The I-WIRE Project Management Group agreed that the project team should explore integration with MEDIC’s implementation of the Symplectic Elements Publication Management system, as MEDIC are one of the project’s partners and a member of the Project Management Group. This is being conducted in parallel to the delivery of the I-WIRE solution.
ORCA management information reports will be used by the Cataloguing Team to identify duplicate items. At first a manual process, this will - over time - provide a set of rules that could be automated to run directly against the database in the future, and reduce the effort associated with this activity.
Citation Count Data
Access to citation count data is increasingly important for authors, and the portlet would be a convenient place to show this data alongside an author’s publications, without having to log-in to separate databases such as Web of Science. However, until there is more clarity on the citation data that the REF will be utilising, and the method of retrieving that data, this development will be kept out of scope of the I-WIRE project and kept on the list of potential projects that specifically support the REF. This will avoid the possibility of any re-work when this area is better understood.
A number of Thomson Reuters services are available to institutions to assist in the management of publication data and help improve and maintain the quality of the data. The ResearcherID service is one that is particularly high profile and relevant as it provides a globally unique researcher or author identifier which in turn can be used to retrieve publication lists and citation data from Web of Science, and a host of other Web of Science bibliometric data.
Initial discussions with Thomson Reuters have identified ResearcherID as an opportunity to improve data quality in ORCA as a separate activity to the deposit workflow, and it will therefore be explored outside the I-WIRE project. There may be an opportunity in the future to include this data in the deposit workflow but it is not yet understood enough to be able to be committed to within the I-WIRE project timescales.
Automated Email Reminders
One of the suggestions made during the latter part of the User Needs Analysis phase was for email reminders to authors that hadn’t updated their publication lists for some time. More analysis with authors would be needed to understand the timing and frequency of these reminders, as the time span between publications can vary immensely between disciplines and authors, and the function would therefore need to be configurable to meet these different user needs. Such user configuration would add a level of complexity to the portlet that should be addressed by a future project in order to keep the I-WIRE delivery as risk-free as possible, and to gain feedback from users on what the valuable future features would be.
The following diagram illustrates the scope of the solution being delivered by the I-WIRE project in terms of functions, and the systems and primary data flows supporting those functions.
As the I-WIRE Project is now half way through the Workflow and Toolset Design work package, it's worth taking some time out to summarise some of the design decisions that have been made by the team to date. These decisions relate to the Deposit User Stories, as they have been the primary focus of our initial design iterations.
Mediated Deposit Process
We will continue to review all deposits before moving them to the live repository. In the short term, this will require the same level of staff in the University Library Cataloguing team as we currently have. In the longer term, as volumes grow due to the launch of the enhanced deposit process, and a planned University mandate, the team will need to explore other review methods using management information reports and a 'review by exception' approach.
Due to the importance of the repository in supporting REF, and the potential future integration with other research systems and processes, the continuation of this approach is seen as key in maintaining the quality of the publication data.
Minimal Data Entry
Minimal data entry is a top priority requirement from Academics. We have agreed a minimal data set that will provide enough information for the Cataloguing team to check and enhance the data at the review stage:
Article Title, Author(s), Journal/Conference/Book Title, Year, School, URL
Plus the Publisher and ISBN for books and book sections, and the Patent Applicant for patents.
Publisher and ISSN will be automatically retrieved from the RoMEO database for journal articles.
Authors' email addresses will be auto-completed from a lookup table, and their associated Unique Identifier will be populated in the background.
The portal will include a link to the full ORCA service for Academics that want to populate more than the minimum data themselves.
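The minimal data set lends itself to a simple completeness check, sketched below. The field names and item type labels are our own shorthand for the list above and are not the names used in ORCA or EPrints.

```python
# Fields required for every item type, following the minimal data set above.
COMMON_FIELDS = ("title", "authors", "container_title", "year", "school", "url")

# Extra fields required for particular item types.
EXTRA_FIELDS = {
    "book": ("publisher", "isbn"),
    "book_section": ("publisher", "isbn"),
    "patent": ("patent_applicant",),
}


def missing_fields(item_type, record):
    """List the minimal fields that are absent or empty in a deposit record."""
    required = COMMON_FIELDS + EXTRA_FIELDS.get(item_type, ())
    return [name for name in required if not record.get(name)]


# Invented example deposit.
example_deposit = {
    "title": "An Example Chapter",
    "authors": ["A. Author"],
    "container_title": "A Book of Examples",
    "year": 2010,
    "school": "MEDIC",
    "url": "https://example.org/chapter",
}
```

Here `missing_fields("article", example_deposit)` comes back empty, while treating the same record as a book section would report that the publisher and ISBN are still needed.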
To make things even easier for Academics, we are aiming to provide two additional services in the portal:
- DOI Deposit will retrieve the full set of data from the CrossRef database using the DOI supplied by the Academic.
- Web of Science Import will enable an author to search for their publications, check those that are returned and import the basic metadata into the repository.
Both of these methods will also allow a full text file to be added to the deposit.
Research Centres and Groups
We have explored but decided not to pursue the inclusion of Research Centres or Groups in the I-WIRE Project:
"Any ORCA outputs designation other than School (which is the University's constitutional unit of currency) is problematic. There are huge variations in Research Centres and if we ask people to 'self-declare' attribution to a Research Centre we will get very very messy and inconsistent data. Alternatively, if we produce a drop down list of research centres for people to choose from it will not be complete and we have no way of validating that they are in fact members of the Centre. There is a level of complication here that would be hard to unpick if we get it wrong from the start, so we have agreed to have this as a developmental option after the implementation (of I-WIRE)."
This is the final in a series of blog entries that has summarised the findings of the I-WIRE project's User Needs Analysis phase, ending with this summary of the requirements captured during the Future State modelling group session, and during interviews with Research Administrators.
1 Guiding Principles
Considering the I-WIRE project’s objectives, the following guiding principles were applied when analysing the User Needs and assessing whether or not they should be accepted into the project’s scope and taken forward into the design phase.
The deposit process should be author centric. Authors are the best equipped people to provide all the metadata associated with their publications. The further removed the depositor is from the author themself, the bigger the overhead associated with populating the required metadata.
The deposit process must be as simple as possible, require minimal entry from the depositor and auto-complete as much of the data as possible on behalf of the depositor. Auto-completion will not only save time for the depositor, but also help improve the quality of the data captured.
Full text deposit is desirable but should not stand in the way of metadata deposit as a minimum, so that the repository can support the REF in parallel to its role as an Open Access repository.
Deposits should be made as early in the publication lifecycle as is reasonable, making the item available to as wide an audience as possible. This can be a lot earlier than the item appearing in one of the subject repositories or commercial databases. However, it depends on the publisher’s copyright policy, which can dictate which version of the item is deposited (pre-print, post-print or publisher’s final version).
Data should be re-usable for as many different purposes – by as many different processes and systems – as possible. Therefore, the data should be made available in a standard format that can be processed by its consumers.
With the above points in mind, the project team worked through each requirement (grouped into themes), and produced User Stories for the requirements that can be delivered in the project's timescales. The User Stories express the requirements in a neutral manner, written from the end user's point of view to ensure the requirements are linked to genuine value, and written with Acceptance Criteria that can be used when it comes to testing the solution and piloting it with the partner Schools.
2 Summary of Requirements
This section summarises the User Needs identified during the interviews and sessions, along with the outcome of the analysis that has been carried out and an indication of which User Needs are being taken forward to the Design phase.
This has been an objective from the inception of the I-WIRE project and is essential for its successful outcome. Integration means surfacing the deposit process, and the browse / search functions, in the MWE portal, enabling data and process integration. Including the more specialised admin and review EPrints functions in the portal is not feasible in the project’s timescales and is seen as lower value as it does not contribute to the primary objective of an enhanced deposit process.
All Research Administrators and authors have stated that the deposit process must be as simple as possible and require minimal effort from the depositor, as authors do not have the time to spend on this activity. If possible, metadata should be automatically retrieved from the DOI entered by the depositor when available.
Being able to upload publication data from other repositories and databases is seen as valuable in saving the authors from having to enter it themselves. Importing data into ORCA - Cardiff's repository - is the responsibility of a separate Retrospective project, and the I-WIRE project is focusing on enhancing the deposit process for authors to be able to deposit as early as permissible and not have to wait for publication data to appear on various databases. However, the project will explore the opportunities for an enhanced deposit process that this type of system integration may provide.
This function would need considerable changes to EPrints and cannot be achieved in the I-WIRE project timescales. However, Research Administrators will be able to identify new deposits for their School using the ORCA data feed that will be implemented by the project.
Research Administrators would like to be able to export publication data into their own databases (MS Access and Excel have been mentioned) for analysis and to use in various other processes such as reports, performance management and funding applications. Due to the supplementary data and rules that will vary by School, the data should be useable in a variety of different processes and systems.
Every School interviewed raised automatic web page updates as a priority need or agreed that it would provide value. The Schools currently have varying processes and systems for this, ranging from manually intensive to maintaining their own databases. The structure of the Schools' research organisations varies too, with data having to be re-used across a variety of pages that serve different Research Groups, Research Centres and Themes.
While discussing this user need, it emerged that authors and Schools will want to be selective over which publications are included in their web pages, as complete publication lists can be too exhaustive. This would require giving users the facility to manage deposits that are already live.
Authors have listed many different processes and systems that require re-use of their publication data, including CVs, web pages, performance management documents and funding applications. The design phase will explore providing an author’s top or most recent publications in the default Portal view, supported by an export facility that could allow check-box selection of individual items.
The I-WIRE project will explore ways that the Cardiff University unique person identifier can be added to ORCA during the deposit process in order to facilitate integration between ORCA and other MWE systems that want to access publication lists. Use of this identifier will be transparent to users as they will not generally be aware of it, using instead items such as email address and name for browsing and searching ORCA.
ORCA has very limited metadata for funding and is likely to be a component in the overall solution for management of funding data (and other research data), with links between each component system. Note that ORCA’s role in CRIS (Current Research Information Systems) is being looked at outside the I-WIRE project.
There are issues with including citation data in repositories that I-WIRE will not have time to resolve. The DSpace repository team attempted to include citation counts but encountered licensing constraints. However, the I-WIRE project will explore the various EPrints plugins that have been produced for this purpose. Note that citation data and other metrics are available from established databases (there is more than one) and are under consideration for REF support, but usage of these has not been clarified yet.
One Research Administrator would be happy to review authors’ deposits for completion, accuracy and copyright, before promoting them to the live archive, as long as there is confidence that authors are actually depositing their publications on a regular basis. The project will explore this. However, the ‘reviewer’ role is currently carried out within the Libraries Cataloguing team, and will remain in that team for the majority of Schools.
There was also a view that some Schools would want to vet all deposits before they became available in ORCA, and indeed be responsible for the deposit process on behalf of authors. However, this contradicts one of the guiding principles of an enhanced author deposit process.
Many Schools are already using their Staff Appraisal processes to capture and update publication data. The I-WIRE project will look at providing a standard data feed that can be used in this process.
ARCCA would like the deposit process to ask a mandatory question that identifies outputs of research that have made use of ARCCA facilities. This would save the ARCCA team from having to identify these themselves directly with the Schools when producing the annual report. While useful to ARCCA, this is not crucial to the deposit workflow and in fact could be seen as contradictory to the simplified deposit, particularly if the user is presented with a question about ARCCA when they are not familiar with it. However, this will be explored during design.
The School of Medicine have invested in the Symplectic publication management system and are currently integrating it with their School web pages. Symplectic harvests publication data from Web of Science and presents it to authors for validation, minimising the amount of effort required for deposit. However, Symplectic is not a repository of full texts, for which a connector is being produced by the Symplectic team to integrate Symplectic with EPrints. The I-WIRE project will explore the integration between Symplectic and EPrints with a view to identifying opportunities for enhanced deposit, bearing in mind that early deposit into the repository is one of the guiding principles.
It is noted also that the Business School is in the process of implementing Digital Measures which manually captures and stores publication data, alongside other School data such as performance metrics.
The varying copyright policies of different publishers, and the terminology around different versions of articles, are seen as confusing and as blockers to the deposit of full text and subsequently to use of the repository. Sherpa’s RoMEO database goes some way to helping identify publishers’ copyright policies, and will be looked at in the design phase, but the wider issue of understanding and negotiating copyright will be looked at outside the I-WIRE project.
Various types of links have been requested, including between web page publication lists and the full text articles, and from articles in the repository to the School’s web page. The standard data feed will provide enough data for the consuming system to be able to construct links to the article in the repository (and to highlight Cardiff authors). However, linking from the repository to elsewhere would require a complex configuration feature if it were to satisfy the many different and conflicting requirements that Schools would have, and would therefore not be achievable in the I-WIRE project’s timescales.
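To make the first kind of link concrete, here is a minimal sketch of how a consuming system might turn one record from the data feed into a web page entry that links to the repository and highlights Cardiff authors. The record layout (`authors`, `is_cardiff`, `repository_url` and so on) is an assumption made for illustration; the actual feed format had not been designed at this point.

```python
from html import escape


def render_publication(record):
    """Return an HTML list item for one (hypothetical) feed record.

    Cardiff authors are wrapped in <strong>; the title links to the repository.
    """
    authors = ", ".join(
        "<strong>{}</strong>".format(escape(a["name"]))
        if a.get("is_cardiff")
        else escape(a["name"])
        for a in record["authors"]
    )
    link = '<a href="{}">{}</a>'.format(
        escape(record["repository_url"], quote=True), escape(record["title"])
    )
    return "<li>{} ({}). {}</li>".format(authors, record["year"], link)


# Made-up record, for illustration only.
example = {
    "title": "An example paper",
    "year": 2009,
    "repository_url": "http://example.org/id/eprint/1",
    "authors": [
        {"name": "Jones K.", "is_cardiff": True},
        {"name": "Smith A.", "is_cardiff": False},
    ],
}
html_item = render_publication(example)
```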
Some publishers allow embargos. There would be value in the system storing these publications and automatically making them available at the embargo expiry date. The project will explore the current embargo function but believes it to be limited, and therefore that it would be more effective for depositors to submit metadata ahead of the full text in these scenarios.
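The behaviour being asked for can be sketched as a simple date check. This is an illustration only, not a description of how the EPrints embargo function works; the function and parameter names are hypothetical, and metadata is assumed to be visible throughout the embargo.

```python
from datetime import date


def full_text_is_public(embargo_expires, today):
    """Return True when the full text may be shown.

    embargo_expires is the date from which the publisher allows release,
    or None if there is no embargo at all.
    """
    if embargo_expires is None:
        return True
    return today >= embargo_expires
```

A scheduled job could run a check like this daily and release any full texts whose expiry date has been reached.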
Data that is related to the repository but is not within its scope includes funding bodies and sources, and the impacts of research needed by the REF. Note that ORCA will be just one component in the overall solution required to support the various calls for this type of data, but the I-WIRE project is not looking at solutions for identifying, modelling or storing this data.
3 User Stories
The User Needs that are being taken forward to the Design phase have been written as a set of User Stories, an example of which is shown below for illustrative purposes.
The full set of User Stories is available from the I-WIRE project team on request.
There is no guarantee that a particular User Story or Acceptance Criteria within the story will be fully delivered by the project, as the Design work package itself will have to conduct a deeper level of analysis in order to design a solution that is feasible to deliver within the project’s timescales. Each User Story has been prioritised with this in mind, to allow the project to continue with flexibility, and also to adapt to changes that are beyond the control of the project.
This is the second in a series of three blog entries that will summarise the findings of the I-WIRE project's User Needs Analysis phase, continuing with this summary of the Current State process that was captured during the group session with academic authors, and the issues associated with it.
1 Lean Group Session
During the Lean group session with academic authors, the management of research publications was examined end-to-end, from the initial research opportunity through to the various reports that are produced by the Schools. While the scope of the I-WIRE project is only one part of the end-to-end process - specifically around the deposit of items into the Repository - the end-to-end process was examined in order to identify any opportunities for improvement or automation of the deposit workflow by potentially integrating with other processes, and also to identify opportunities for re-use of the publication data and elimination of wasted effort.
The following two diagrams illustrate the main steps in the end-to-end process.
2 Summary of Issues with Current State Process
The following issues were raised during the group session and continually throughout the interviews, and are perceived as blockers to Open Access depositing:
- Complexity and uncertainty around the whole area of copyright policies and which version of articles can be deposited where. Some authors admit to often signing away copyright with little attention to it, and would like the School or University to manage copyright negotiations on their behalf, freeing them up to get on with the next piece of research. This could include retention of copyright (particularly for conference papers) or permission for deposit in Institutional Repositories.
- The final version of an article is 'king', and generally cannot be deposited anywhere other than with the publisher. And it's important to be published in the right journal that has a good impact and the right audience. However, there can be up to a year between submission and publication, and differences between on-line and paper versions.
- The perception that target audiences will have access to articles through journal subscriptions anyway, so depositing to an Institutional Repository is an unnecessary step.
- Academics don't have the time to deposit so anything beyond a single click is too time consuming, and there are databases that already collect this data (Web of Science, etc.) on behalf of the author.
The current process is seen as taking too long and requiring too much data to be keyed by the depositor. Academics want 'maximum output with minimum effort'.
Any new process needs to be flexible enough to cater for different approaches and requirements across the different Schools. For example, the School of Medicine are running Symplectic which gathers their publication data for them, and are happy that final versions are available from the publishers through existing subscriptions. In contrast, outputs from the School of Journalism, Media & Cultural Studies won't necessarily be covered by a single database, and therefore have to be deposited manually. Plus they have a lot of conference papers that could be included in the repository if an expert could help with the less-than-straightforward copyright policies.
It's not feasible for the repository to serve the needs of every School in terms of the reports that they have to produce. This is due to local organisational hierarchies and the application of local data and rules to enhance the publication data, and therefore there will be a need for ORCA to provide raw publication data for the Schools to enhance and manipulate themselves. However, the many different outputs proposed, and access to metrics, may not be as valuable for the humanities based Schools as the sciences.
Research Administrators often have to chase academics for up to date publication lists to be used in a variety of processes such as reports to School Boards, web site updates and appraisals. Academics get annoyed if they feel they are giving the same information to a variety of places.
Balancing the needs of:
- The science based Schools, focussed almost exclusively on final versions in journals made available through subscriptions; with little concern for copyright and wanting to have very little manual input (if any at all).
- The social science and humanities Schools, with a variety of output types and destinations and no single database giving them complete coverage, so more open to having to capture their publication data themselves, and therefore with more concern for copyright.
The next and final blog entry in this series will summarise the requirements captured during the Future State modelling group session, and during interviews with Research Administrators.
This is the first in a series of three blog entries that will summarise the findings of the I-WIRE project's User Needs Analysis phase, starting with this entry on the approach taken by the project team.
1 Project Overview
The I-WIRE project aims to develop a workflow and toolset, integrated into a Portal environment, for the submission, indexing, and re-purposing of research outputs in ORCA (Online Research @ Cardiff) - Cardiff University’s Institutional Repository. This will be based on requirements gathered from academic Schools and administrative Directorates in the University during a User Needs Analysis phase - the subject of this Requirements Report.
The impetus for the project comes from a decision by the University Research Committee to approve the mandatory deposit of new publications in ORCA, subject to detailed consideration as to the means by which staff up-load data to the repository.
The I-WIRE project’s primary aim is to develop a submissions and metadata workflow that minimises staff effort and offers other value added services that will encourage use and in particular author self deposit of new publication data. There is also an increasing need for the University to create and maintain an up to date, central publications database in preparation for the REF.
2 Stakeholder Identification
The following communities were identified by the project team as current or potential users of ORCA, and therefore key Stakeholders in the I-WIRE project:
- Researchers and academic authors in all Schools through their need to make their research outputs available on an Open Access basis.
- Research Administrators in all Schools through their need to provide reports on the School's research outputs.
- The Repository Manager and Subject Librarians through their responsibility to ensure the quality of metadata in ORCA.
Other Cardiff University staff in the Schools and Directorates who expressed an interest in either ORCA or the project itself during project initiation - but fell outside the above communities - were also included as Stakeholders. For example, interest in the project was expressed by Web Managers looking for ways to automate the population of publication lists on web pages.
3 User Needs Analysis Approach
By the time the I-WIRE project team was in place and the User Needs Analysis work package was able to start in October 2009, the project was six months into its two year fixed life cycle and was at risk of missing the scheduled delivery of the Requirements Report at the end of December 2009.
The project team gave full attention to the User Needs Analysis work package and agreed the methods that would be used to provide as much coverage of the stakeholders as possible in as short a time as possible. A schedule was drawn up that would see the majority of stakeholder interviews completed by the end of December 2009 and the analysis take place during January.
Given that the key stakeholders are spread across 29 Schools - each of which has its own systems and processes - and numerous supporting Directorates, and given the short window of opportunity, the project team agreed to carry out the following key activities in parallel through October to December 2009:
The project team arranged for a series of communications and agenda items at related meetings and committees to ensure awareness of the project and its objectives, and to seek support from key stakeholders where required.
3.2 Interview Research Administrators
A structured questionnaire was developed and piloted with the partner Schools in order to consistently capture requirements during each interview, and to steer the conversation to provide maximum coverage of all topics.
With the assistance of Subject Librarians, the project team arranged to interview the Research Administrator in each School. Some of these interviews also captured the requirements of authors through the attendance of Research Directors or Heads of Schools.
Due to the short window of opportunity, the Schools were scheduled in order to give a good cross-section of science based and social-science / humanities based Schools, with the intention of covering a minimum of one third of the Schools by the end of the work package, and more if feasible.
The project team agreed to capture all requirements raised during the interviews, regardless of whether the requirement was within the scope of the project or not, with the intention of passing out-of-scope requirements to the Repository Manager during the analysis phase to inform the longer term repository roadmap and other projects.
3.3 Group Sessions with Academic Authors
The project team brought together a group of authors from the partner Schools - identified by the Research Administrators - in two workshops facilitated by the Cardiff University Lean team, to:
capture and document the Current State process that authors follow for the management of their publications, along with associated issues. The process was looked at end-to-end from identification of research opportunities to production of reports for Schools management, in order to identify any opportunities for linking the deposit process with other processes such as Performance Management. The Current State process has been documented as the project baseline in order to assist with evaluation of the project’s outcomes; and
agree and document an enhanced and simplified Future State process that the same set of authors agreed would encourage them to self-deposit in the repository, along with other requirements that they may have.
3.4 Interview Other Interested Stakeholders
The project team also arranged to interview staff in the Schools or Directorates that expressed an interest in the project or ORCA through the communications sessions or other groups and committees.
Key learning from the Lean Group Sessions
The following diagram illustrates the stakeholders interviewed and the interview schedule.
4 User Needs Analysis
Analysis of the captured User Needs - as expressed during the interviews and group sessions - is critical to ensure the scope of the project is sized appropriately and to give the project the best chance of a successful outcome.
The project would be at risk of delivering a disjointed and overly complex solution if requirements were taken at face value without analysis. The analysis took place in the form of a series of group sessions during January 2010 made up of the project team and the technical development team, and focussed on:
the most frequently occurring requirements expressed during the interviews;
implementing the requirements in as neutral a manner as possible to ensure the most value can be obtained by multiple Schools being able to use the solution, and not delivering bespoke solutions that cater to the needs of just one School or stakeholder;
what can feasibly be delivered in the project's timescales, bearing in mind available budget and resource; and
solutions that utilise EPrints functionality and features as delivered ‘out-of-the-box’, and that remain aligned with the EPrints software roadmap rather than deviating from it to such an extent that EPrints upgrades become difficult and costly in the future.
With these points in mind, the project team worked through each requirement (grouped into themes), and produced User Stories for the requirements that can be delivered in the project's timescales. The User Stories express the requirements in a neutral manner, written from the end user's point of view to ensure the requirements provide value to the end user, and written with Acceptance Criteria that can be used when it comes to testing the solution.
Requirements that the team agreed are not in scope of the project have been notified to the Repository Manager for consideration in the longer term repository roadmap and other projects.
The next blog entry in this series will explain the Current State process that was captured during the group session with the academic authors, and the issues associated with it.
Our lead developer Jim got this week off to a good start by showing us the work he has done on our enhanced deposit workflow portal. This is a proof-of-concept in a development environment but a picture - as they say - is worth a thousand words: we can now start to see what the user experience will be, and so far it looks good. The portal has a very clean and crisp layout, and Jim demo'd a DOI lookup and a Web of Science search, alongside the minimal-metadata deposit screen.
The Web of Science search uses the WS Lite API that the Bibliosight project has been working with, and will help authors who want an efficient way of depositing retrospective publications into the repository in bulk. Of course, this approach will not give full coverage of all research outputs for all disciplines, but it will help many.
The DOI lookup retrieves metadata from CrossRef and will cut down the deposit time for authors who have been given a DOI by the publisher of their article.
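As a hedged illustration of this kind of lookup (not the code in Jim's proof-of-concept, which isn't described here), the sketch below builds a lookup URL and reduces a decoded response to a handful of deposit fields. It assumes the present-day public Crossref REST API and its JSON layout, which may well differ from the interface the project used; the sample response is made up and no network call is made.

```python
from urllib.parse import quote

# Assumption: the present-day public Crossref REST API endpoint.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the lookup URL for a DOI, percent-encoding unsafe characters."""
    return CROSSREF_WORKS_URL + quote(doi, safe="/")


def minimal_metadata(response):
    """Reduce a decoded Crossref-style JSON response to a few deposit fields."""
    message = response["message"]
    date_parts = message.get("issued", {}).get("date-parts") or [[None]]
    first = date_parts[0]
    return {
        "doi": message.get("DOI"),
        "title": (message.get("title") or [None])[0],
        "journal": (message.get("container-title") or [None])[0],
        "year": first[0] if first else None,
        "authors": [
            ", ".join(p for p in (a.get("family"), a.get("given")) if p)
            for a in message.get("author", [])
        ],
    }


# Made-up response in the assumed layout, for illustration only.
sample_response = {
    "status": "ok",
    "message": {
        "DOI": "10.1000/xyz123",
        "title": ["An example article"],
        "container-title": ["Journal of Examples"],
        "issued": {"date-parts": [[2010, 3]]},
        "author": [{"given": "Kim", "family": "Jones"}],
    },
}
record = minimal_metadata(sample_response)
```

The author would then only need to confirm the pre-filled fields rather than key them in.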
For scenarios where Web of Science and DOI can't help, we have been focussing on the very minimum metadata that saves the author time during deposit and is still enough for the reviewer to validate the item and add to the data.
Integration with Publication Management systems has also been on our minds as we have entered the design phase. The School of Medicine is in the process of rolling out the Symplectic system, and we are beginning to assess the I-WIRE project's options in this area. The I-WIRE pilot later this year is going to have to assess how our enhanced deposit process and associated bells and whistles stand up against Symplectic and other in-house Publication Management systems in their various forms.
The I-WIRE Requirements Report has been issued. I'll be discussing opportunities for dissemination of our findings with Andy McGregor, but in the meantime here is a high level summary of the top priority User Needs. Feel free to get in touch if you would like to discuss any of this.
Re-use of Data
- By academic authors. Publication data stored in one place and available to be re-used in multiple processes, including population of CVs and Performance Management documentation.
- By Web Site Managers. Easier and automated updates of publication lists on Schools’ and authors’ web pages.
- By Research Administrators. Research Administrators able to retrieve publication data to populate School reports and funding applications, and to support the REF.
A simplified deposit process with minimal effort and minimal data entry, and auto-completion of data as far as possible.
The deposit and retrieval processes should be flexible enough to cater to differing needs across the Schools.
These User Needs have been written as a set of User Stories that are being taken forward into the design phase. We have broken the design phase into a number of iterations, the first of which starts today, so watch this space for interesting snippets and issues as we start working through the detail of what we are going to develop, and how.
The I-WIRE project reached its first major milestone last Friday with the issue of our Requirements Report to the Project Management Group for their review and approval. While the PMG members are individually ploughing their way through the report - they have until next Friday to do this - we have a bit of breathing space to follow up on some of the threads that will be carried into the design phase.
Approval to proceed to design phase
A project proposal has been submitted to the Cardiff Information Services IT Programme Board for approval to proceed to the design phase. This is more of a formality at this stage, but does provide a check point that asks if we have the right stakeholder backing and the right technical people lined up for involvement in the design phase.
EPrints and 3.2
Our growing list of questions about how EPrints can help us meet our user needs has been shared with the EPrints Services team and will form the basis of planning a consultancy day with the team for late February or early March. The day will be an important element of our design phase. Questions fall mainly into the themes of: SWORD, EPrints 3.2 REST interface and our approach to unique author identification.
Analysis of APIs and plugins
I'm taking some time to explore the various databases, APIs and plugins that give access to bibliographic data and metrics, including the Web of Science Lite API (via the BiblioSight project) and Thomson Reuters' ResearcherID. This is not only to identify opportunities for our solution, but also to understand what I can only describe as 'the competition'!
Service Oriented Approach
One of the actions that came out of our meeting with Andy McGregor was to look at the e-framework and JISC Innovation Base. I'm trying to quickly assess how much time the project can give to adopting these and at what level, weighed up with the benefits. As these are new to me, I'd be interested to hear of anyone else's experiences.
Looking at next week, the Welsh Repository Network team are visiting us, and - of course - we will be working through the Requirements Report review comments, so plenty to do.
Uniquely identifying an author to ensure publication lists are complete and correct is not straightforward. When you consider real life situations such as change of name and email address due to marriage, an email address alone is not robust enough as the unique identifier for a person and could result in incomplete lists being found.
Add other factors such as Institutional policies on the re-use of email addresses for leavers, after a quarantine period, and you get another scenario where search results can be incorrect because they span more than one person.
There are ways around this, for example, using or introducing a genuinely unique identifier that will - given the right supporting processes - survive name and email address changes, and even cope with people who have left the Institution and may return at a later date.
However, this approach does introduce challenges of its own:
Generally, such a unique identifier will be internal to the institution's systems and processes, and not something that is recognisable to the person it identifies, let alone anyone else. We may not even want to expose such an identifier to the outside world. Therefore, we need to continue capturing the user's preferred search parameter such as email address, and translate it to the unique identifier behind the scenes before running a search.
For the scenarios where an email address has changed, this approach copes nicely as the search will return all the publications associated with that person, regardless of what their email address was at the time. The drawback to this approach, however, when you consider what the user sees in their search results, is that they may be confused that the email address they entered doesn't appear in some of the results. And of course, this approach requires us to populate the unique identifier against all publication records in the repository, which may call for a retrospective population exercise.
For the scenarios where email addresses are re-used after a quarantine period (we don't want to be allocating JonesK947...), this approach presents a bigger challenge. When the search returns more than one unique identifier against the email address, the user is going to have to select the right one, and we may need to present additional data to the user to help them make this decision, such as the associated School. However, a machine interface is unlikely to be able to process this additional interim step unless we make this selection process part of a very well-defined interface.
I'd be interested to hear from anyone who has tackled this area.
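To make the two scenarios concrete, here is a minimal sketch of the lookup. Everything in it is hypothetical (the names, the data layout, the status values): it simply shows an email address being translated to the internal identifier behind the scenes, and an explicit "ambiguous" result that either a person or a machine client could respond to without the identifier itself being exposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAssignment:
    person_id: str  # internal unique identifier, never shown to users
    email: str      # an address this person holds or once held
    school: str     # extra context to help choose between people


def resolve_people(email, assignments):
    """Return one entry per distinct person who has ever held this address."""
    wanted = email.strip().lower()
    matches = {}
    for a in assignments:
        if a.email.lower() == wanted:
            matches.setdefault(a.person_id, a)
    return list(matches.values())


def search_by_email(email, assignments, publications, choice=None):
    """Search publications by email; 'choice' picks one ambiguous candidate."""
    candidates = resolve_people(email, assignments)
    if choice is not None and 0 <= choice < len(candidates):
        candidates = [candidates[choice]]
    if not candidates:
        return {"status": "not_found", "candidates": [], "publications": []}
    if len(candidates) > 1:
        # Offer the School, not the internal identifier, to tell people apart.
        options = [{"choice": i, "school": c.school} for i, c in enumerate(candidates)]
        return {"status": "ambiguous", "candidates": options, "publications": []}
    person_id = candidates[0].person_id
    found = [p["title"] for p in publications if p["person_id"] == person_id]
    return {"status": "ok", "candidates": [], "publications": found}


# Made-up data: P1 changed address; P2 was later given P1's old address.
assignments = [
    EmailAssignment("P1", "jonesk@example.ac.uk", "ENGIN"),
    EmailAssignment("P1", "smithk@example.ac.uk", "ENGIN"),
    EmailAssignment("P2", "jonesk@example.ac.uk", "MEDIC"),
]
publications = [
    {"title": "A", "person_id": "P1"},
    {"title": "B", "person_id": "P1"},
    {"title": "C", "person_id": "P2"},
]

changed = search_by_email("smithk@example.ac.uk", assignments, publications)
reused = search_by_email("jonesk@example.ac.uk", assignments, publications)
picked = search_by_email("jonesk@example.ac.uk", assignments, publications, choice=1)
```

Here `changed` returns both of P1's publications regardless of the address used at the time, `reused` comes back as ambiguous with the School shown for each candidate, and `picked` repeats the search with a selection made.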
I make no apologies for mentioning the Christmas word in this blog entry! It's officially upon us. The BBC have started using their festive themed stings between programmes, so it cannot be denied.
There is still much to do on the I-WIRE project before we can take a well earned break. Priority activities for us over the next week and a half include:
- Use the information gathered during workshops and interviews with stakeholders to document the Current State process and associated issues. This document will give us our baseline for the project and subsequent evaluation activities.
- Take the evaluation plan from its very early draft to a more complete version, ready for review.
- Document the user roles and scenarios associated with ORCA to input to the design phase.
- Start a piece of work to forecast volumes and usage of ORCA post I-WIRE, to feed into the requirements report.
Plus we have a few final stakeholder interviews between now and the break. There are likely to be one or two interviews left for the new year too, so our analysis work in January will no doubt have to be iterative as the activities overlap.
On the technical side, while Jim progresses his paper on options for integration of EPrints with Cardiff's Modern Working Environment portal, we are also looking into the potential use of the Symplectic - EPrints connector that Symplectic have been working on with help from White Rose at Leeds. We need to understand if this connector will help us with some of the requirements coming from the School of Medicine, who are currently rolling out their own implementation of Symplectic.
We've just completed a couple of process mapping sessions with academics from three of the Schools. The first major deliverable from these sessions is a map of the Current State process that researchers follow, right from the identification of research opportunities through to accessing metrics associated with final publications. This gives us our process baseline, with all the associated issues, from which to work and also to evaluate the project's outcomes against.
Of course, our main focus was the area around depositing to ORCA, and the perceived blockers:
- Complexity and uncertainty around the whole area of copyright policies and which version of articles can be deposited where.
- The final version of an article is 'king', and generally cannot be deposited anywhere other than with the publisher.
- The perception that target audiences will have access to articles through journal subscriptions anyway, so depositing to an Institutional Repository is an unnecessary step.
- Academics don't have the time to deposit so anything beyond a single click is too time consuming, and there are databases that already collect this data (Web of Science etc.) on behalf of the author.
This morning's session walked through a proposed Future State process, much simplified from the Current State in terms of scope, allowing us to focus on the deposit area. Our Future State process proposed:
- Easier population of metadata through use of DOI where available.
- Clearer copyright policies on deposit, supported by a Take-Down policy.
- A sustainable process, providing deposit straight to the live archive, with central support for managing higher-risk output types - by exception.
- One input, many outputs, enabling re-use of publication data, for example: web site feeds.
Again we had a full and constructive discussion about the way things are, and what we are trying to achieve with Cardiff's Repository. The discussion covered:
The process needs to be flexible enough to cater for different approaches and requirements across the different Schools. For example, MEDIC are running Symplectic which gathers their publication data for them, and are happy that final versions are available from the publishers through existing subscriptions. In contrast, JOMEC's outputs won't necessarily be covered by a single database, and therefore have to be entered manually. Plus they have a lot of conference papers that could be included in the repository if an expert could help with the less-than-straightforward copyright policies.
Publication Data Outputs
It's not feasible for the repository to serve the needs of every School in terms of the reports that they have to produce. This is due to local organisational hierarchies and the application of local data and rules to enhance the publication data, and therefore there will be a need for ORCA to provide raw publication data for the Schools to enhance and manipulate themselves.
The many different outputs proposed, and access to metrics, may not be as valuable for the humanities based Schools as the sciences.
To truly achieve the goal of Open Access, resources need to be made available to help with the cultural shift and understanding copyright policies for individual publications and output types.
So, in conclusion (for today), I have in my mind a see-saw, with on one side:
The science-based Schools, focussed almost exclusively on final versions in journals made available through subscriptions, with little concern for copyright and wanting to have very little manual input (if any at all),
...and on the other side:
The social science and humanities Schools, with a variety of output types and destinations and no single database giving them complete coverage. They are therefore more open to capturing their publication data themselves, and so have more concern for copyright.
Comments and thoughts very welcome!
Things have been pretty busy so I feel it's time to sit back for 10 minutes and take stock of where we are, and what we're learning.
User Needs Analysis and Keeping Hold of Scope
We're in the thick of the User Needs Analysis phase. I say 'the thick'... the progress has been steady and sure to date, and things are starting to pick up now that Jessica Emerton is helping out with the interviews. We've already spoken with the Research Administrators from the Schools of ENGIN, JOMEC and MEDIC, and we have interviews scheduled with CPLAN and CARBS. Now we are chasing down the final 6 Schools that make up the high priorities for this iteration: PHARM, OPTOM, PSYCH, WELSH, ENCAP and CHEMY.
While this is going on, the list of priority Schools itself is under frequent discussion as people become aware of the project and express an interest, and as we ensure we have the right cross-section of Schools to adequately represent all Sciences and Humanities.
Next Tuesday - 24th November - will see the first of two Mapping Sessions being facilitated by the University Lean team, with selected academic authors from the Schools of ENGIN, MEDIC and JOMEC. As time is against us and we have only 4 hours in total with these key stakeholders (across 2 separate sessions), we are preparing a baseline Current State process to take into the first session for ratification and to get the discussion off the ground. This Current State process has been reviewed by the Research Administration team at MEDIC and will be looked over by two of the Subject Librarians on Friday 20th November, so that it is in the best possible shape for Tuesday. The second session will model the Future State process.
Depending on how the two Mapping Sessions go, it is likely we will run a follow-up session with authors from BIOSI and ENCAP to get as good a coverage as we can on the Future State process.
Research Management Systems
As we speak to people, what I'm becoming increasingly aware of is the real mixed bag of approaches across the different Schools, and the maturity of the solutions some of the Schools have in place. For example, MEDIC are using Symplectic to meet almost all of their needs - with the exception of a Repository, and this is where we need to understand the commercial aspects of Symplectic and the work being done to connect it with EPrints.
We've recently been given a lead on some commercially available Research Management Systems which will need looking into, particularly as one of the Schools we are going to interview is considering using one of these from early next year.
At the moment, I'm coming to the conclusion that a single approach will not suit all, and we will be looking at a variety of paths through the Research and Deposit processes, with the Repository being at the core.
One of the real blockers for one of the Schools is the copyright issue that gets in the way of Open Access. There is a perception that unless a high percentage of full text is available from the Repository, there is little value in going to the effort to deposit in the first place. Clearly a simplified deposit process could help counter this argument, but the bigger issue of Open Access is not something that will be solved overnight, or by this project.
Ok, back to it, get in touch if you have any questions or pointers.
© Tracey Andrews.