For Collectors (Born-Digital Images)
An important note before starting any digital preservation – as per LOCKSS (Lots of Copies Keeps Stuff Safe), before converting ANYTHING, be sure to create backups of all files you are converting.
Hey image collectors! This guide covers the basics of digital image collecting: how to preserve your digital images and what you can do to ensure you’ll get lots of use out of them in the future. We are going to start from a state of little or no knowledge, so if you already know something about a particular section, feel free to skip it! Each section includes links to places around the web that can help with more in-depth questions. Keep in mind that this guide covers only born-digital art images. There are great guides relating to digitizing images (making digital copies of physical images) available here: JISC Digital Media Digitising and Preservation. You can also look at the artist or curator sections, where more depth will be provided on various subjects related to born-digital images. So, let’s begin!
File Types/Formats
File types/formats are the fundamentals that define a file and tell a computer how to read it. In practice they are an interpretive layer that allows the computer to make decisions about how to process a file. The name of a given file format/type is entirely arbitrary but it is extremely important when a computer is trying to process a file. If a file is formatted incorrectly, it often will not be able to be read or will be read incorrectly. An easy way to fix digital image file type and format problems is often opening the image in a program called IrfanView. IrfanView is a program that can identify many different file types and formats and if it notices an image file is formatted incorrectly, will provide you with a simple prompt to change it to the correct file format.
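To make the idea of an "incorrectly formatted" file a little more concrete, here is a minimal sketch of how a program can tell what a file really is by looking at its first few bytes instead of trusting its name. This is only an illustration and not how IrfanView itself works; the function names are made up for this example, and the alias table is an assumption about which extensions to treat as equivalent.

```python
from pathlib import Path

# Leading "magic" bytes for the formats discussed in this guide.
SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
    b"II*\x00": ".tiff",  # little-endian TIFF
    b"MM\x00*": ".tiff",  # big-endian TIFF
    b"8BPS": ".psd",      # .psb files start with the same four bytes
}

# Assumption for this sketch: treat these extensions as the same format.
ALIASES = {".jpeg": ".jpg", ".tif": ".tiff", ".psb": ".psd"}


def sniff_format(header: bytes):
    """Return the extension implied by a file's first bytes, or None."""
    for magic, extension in SIGNATURES.items():
        if header.startswith(magic):
            return extension
    return None


def extension_matches(path) -> bool:
    """True if the file's name agrees with what its contents say it is."""
    path = Path(path)
    with path.open("rb") as handle:
        detected = sniff_format(handle.read(16))
    claimed = path.suffix.lower()
    claimed = ALIASES.get(claimed, claimed)
    return detected == claimed
```

If extension_matches returns False for a file, that is a hint the file was renamed or saved with the wrong extension and is worth opening in a program that can identify it properly.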
Common file types are as follows: .jpeg/.jpg (Joint Photographic Experts Group), .png (Portable Network Graphics), .TIFF (Tagged Image File Format), .gif (Graphics Interchange Format), .psd (Photoshop Document), .psb (Photoshop Big). There are a number of other digital file formats, but let’s assume we’re only talking about these for now to keep the conversation (somewhat) simpler. These file formats encompass the large majority of digital images we see, including digital art, photography and other creative works. Each of them has various advantages and there are definitely some that are better for purposes of collecting.
.jpeg/.jpg is the most common file type for images and is used in a variety of applications and interfaces. The reasons for such widespread use include compatibility, small file sizes and ease of use. It sees the widest use of any image file format, largely through the proliferation of images on the internet. Most images on the internet use the format because the small file size helps the organizations who serve the images save on bandwidth, which would get expensive quickly if they had to move around uncompressed images all the time. Though technically capable of acting as a preservation format, its primary flaws are its lack of support for transparency and its “lossy” compression. Transparency is effectively making parts of the image “invisible.” “Lossy” compression is compression that reduces the visual quality of parts of the image, or of the entire image, in order to make the file smaller. While the compression is rarely so aggressive as to make the image unrecognizable, it is not uncommon for an image in the .jpeg/.jpg file format to be of lower quality than one in another file format, even if it’s the exact same image. Thus, even in the case where you might have a .jpeg/.jpg as your only copy of an image, it is normally recommended to convert it to .TIFF for long-term preservation. Converting will not restore quality that has already been lost, but it prevents further loss each time the file is re-saved.
.png is a lossless compression format (it keeps all the information necessary to accurately reproduce the image) and has a transparency layer. While this may sound ideal, .png does not support CMYK (Cyan, Magenta, Yellow, Key/Black), which is the standard used when printing materials, and its files tend to be much larger than a standard .jpg/.jpeg. It has recently seen higher adoption on the internet as bandwidth and connections have become better and cheaper, but it is still not a preferred format due to the lack of certain important preservation options. CMYK is a color mixture technique used to create color, and while it has become the standard for print processes, RGB (Red Green Blue) is the standard most digital displays use. CMYK will not look correct on an RGB monitor, and the same is true of the reverse. It is not advisable to convert CMYK images to RGB images, or the other way around, as this is always a lossy process and should be handled by an artist or curator. Once again, the best way to preserve these for the long term is to convert them to .TIFF.
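The difference between the "lossy" compression described for .jpeg/.jpg and the lossless compression described for .png can be sketched with a handful of pixel values. This is a toy illustration only: the lossless half uses zlib (the DEFLATE family of compression that .png is built on), while the lossy half is a crude rounding step standing in for what .jpeg/.jpg does, not the real JPEG algorithm. The quantize function and the sample values are made up for this example.

```python
import zlib

# A tiny stand-in for image data: one row of 8-bit grayscale pixel values.
pixels = bytes([12, 13, 13, 14, 200, 201, 203, 255])

# Lossless: compressing and decompressing gives back exactly the same bytes.
restored = zlib.decompress(zlib.compress(pixels))


def quantize(data: bytes, step: int = 16) -> bytes:
    """Crude lossy step: round every value down to a multiple of `step`."""
    return bytes((value // step) * step for value in data)


# Lossy: nearby values collapse together and the originals cannot be recovered.
lossy = quantize(pixels)

print(restored == pixels)  # True
print(lossy == pixels)     # False
print(list(lossy))         # [0, 0, 0, 0, 192, 192, 192, 240]
```

Notice that after the lossy step the values 12, 13 and 14 have all become 0. Nothing in the saved data says which was which, and that is why a lossy save cannot be undone later.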
.gif is a file type primarily used on the internet for distributing animated images. While it supports animation very well, the format limits each frame to a palette of 256 colors, which noticeably reduces the quality of most images, and thus it is not recommended for preservation. If you have an animated .gif you want to preserve, it is better to look for a higher quality version that doesn’t use .gif as its primary streaming method (e.g. .webm). .TIFF will also support animated images for long term preservation.
.psd is a file type invented by Adobe as a container type for their Photoshop software. It is used primarily by artists and stores multiple images in what are defined by the software and most artists as “layers.” These layers are independent pieces of the image that can be moved around within their own layer without affecting images in other layers. .psd also supports transparency and a wide array of additional contextual information and image file formats, including vector images (mathematically scaled images). It can be easily converted into .TIFF for long term storage, though you should make sure all layers are properly preserved.
.psb is a newer file type invented by Adobe as a container for large Photoshop images that contains a compression layer to make the images easier to process for Adobe Photoshop software. Much like the .psd file type, this is a proprietary file type and thus cannot be used unless you own Photoshop software. It is better to also convert these to .TIFF, though the same warnings about making sure all layers save properly still apply, and you should be aware that .psb files are often much larger when converted than their .psd counterparts.
.TIFF is the file format we tend to want to convert images into. The reason for this is that .TIFF is supported by a wide array of programs, both closed and open-source (such as Photoshop and GIMP), supports most kinds of images (including animated, CMYK and RGB), supports layers and transparency, is lossless and allows for contextual tagging that can be embedded into the image to give more information about the image or the work, including information like creation date, artist information and more! It is worth noting that .TIFF is a fairly old file format, but has only seen widespread adoption across specialized image software. A free .TIFF image converter is available for Windows here.
There are many other file types that may also need to be considered when handling images for preservation but for now we are only covering these most basic types. For more image types and explanations on how to handle them, please head over to our post for artists, where we talk about some of the more specialized files you may encounter (these include .svg, .eps and .indd).
File Resolution
When handling a digital image file, it can often be vitally important to consider the image’s resolution before deciding whether or not it is an image you want to keep for long term storage. After all, low resolution images can appear very small by the standards of modern monitors, and this continues to become more true with time! That said, resolution has passed through fairly standard “phases,” and it can be a good idea to use these as guidelines for considering what images to keep based on the era the images themselves were from. In the 1980s, an image at 640×480 would fill up the entire screen, whereas today in 2015 it will fill up only a fraction of your total screen real estate. Resolution continues to increase at a steady but slow rate relative to much of technology, and as a result the best consideration you can give to an image’s resolution is relative to the time period in which it was released. As can be seen from these statistics on internet browsers’ screen resolutions, there is a marked difference from 2002 to 2007. In this time period, the 800×600 monitor resolution almost disappeared from use by internet browsers. It becomes apparent that as time passes and older machines are phased out, the average resolution of all computers increases. In order to continue providing high-quality images to people, artists also increased the resolution of their images. These are general numbers, but represent the majority of users’ screen resolutions at the time:
- 1980s: 640×480
- 1990s: 800×600
- 2000s: 1024×768
- 2010s: 1280×1024 (increasingly 1920×1080)
What this means is that generally you should consider images that are at or above the image resolution available at the time to be preservation quality. Many artists work with digital art files many, many times the size of the screen, even in the past, and so keeping an image preserved would likely entail handling an image of a quality at least the size of the screen, if not significantly larger.
Please note, this DOES NOT mean you should change the resolution of a file to fit the screen. It is always better to keep the resolution of an image at the largest it can be and let the image simply be scaled down to fit the screen, whenever possible!
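The decade-by-decade guideline above can be sketched as a small check. This is just one way to apply it, under a few assumptions that are not part of the guide itself: years outside the listed decades are clamped to the nearest one, portrait images are compared long side to long side, and the function names are invented for this example. It only reads the numbers; it never resizes anything.

```python
# Typical screen resolutions per decade, taken from the guide's list.
ERA_RESOLUTIONS = {
    1980: (640, 480),
    1990: (800, 600),
    2000: (1024, 768),
    2010: (1280, 1024),
}


def era_baseline(year: int):
    """Return the typical screen resolution for the decade containing `year`."""
    decade = (year // 10) * 10
    decade = max(1980, min(decade, 2010))  # assumption: clamp to the listed decades
    return ERA_RESOLUTIONS[decade]


def meets_era_baseline(width: int, height: int, year: int) -> bool:
    """True if the image is at least as large as a typical screen of its era."""
    base_long, base_short = sorted(era_baseline(year), reverse=True)
    image_long, image_short = sorted((width, height), reverse=True)
    return image_long >= base_long and image_short >= base_short


print(meets_era_baseline(800, 600, 1995))  # True
print(meets_era_baseline(640, 480, 2005))  # False
```

An image that fails this check is not automatically worthless; it just deserves a closer look before you decide it is the best copy available.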
File Quality
In the age of the internet it is vitally important to consider the digital quality of an image before making the decision of whether or not it is of preservation quality. Most changes made to digital files, without a master copy, are usually irreversible and thus while it is easy to corrupt or lose a digital file, it is even easier to damage the file through poor handling.
A few things that will cause irreversible damage (and are easy to do when handling images!):
- Image compression or layer merging. These are both issues that happen as a result of converting a layered file, such as a .TIFF, to a format that cannot support the layers, such as .jpg or .png. These formats will “merge” the layers, flattening them and preventing any access to the merged layers individually. Converting to .jpg will also lose significant image quality because of the compression algorithm run when saving an image as a .jpg. .jpg will also eliminate any transparency layer in the image, if there is one (the background will become white).
- Watermarking. Do not watermark images! If you absolutely must, only do it on a separate, non-master copy file and only as a layer separate from the original image. Watermarks are a frequent cause of irreversible damage on images and, depending on where they are placed on the image, may be impossible to fix.
- File corruption. File corruption can happen for a variety of reasons, but for images is most often the result of naming and saving files in incorrect formats (such as trying to save an animated file as .jpg) or on drives that don’t have space to store the complete image. On occasion image corruption can also occur due to the computer dumping memory while attempting to edit an image, though this is fairly rare as most image editing programs normally keep a history of edits and changes. Even so, corruption is often the result of internal faults in how the computer handles images, and should an image file be corrupted, it is rare that it can be fully recovered (partial recovery is often possible but extremely slow).
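To see why losing the transparency layer (the first item in the list above) cannot be undone, here is a minimal sketch of what "the background will become white" means for a single pixel. It uses the standard alpha blending arithmetic on one red, green, blue, alpha value; the function name and the sample pixels are made up for this illustration, and real converters may differ in the details.

```python
def flatten_on_white(rgba):
    """Blend one RGBA pixel (0-255 per channel) onto a white background."""
    red, green, blue, alpha = rgba
    opacity = alpha / 255
    return tuple(
        round(channel * opacity + 255 * (1 - opacity))
        for channel in (red, green, blue)
    )


fully_transparent_red = (255, 0, 0, 0)
fully_transparent_blue = (0, 0, 255, 0)

print(flatten_on_white(fully_transparent_red))   # (255, 255, 255)
print(flatten_on_white(fully_transparent_blue))  # (255, 255, 255)
print(flatten_on_white((0, 0, 0, 128)))          # (127, 127, 127)
```

Two different pixels end up as the same white pixel, so once the flattened copy is the only copy, there is no way to tell what was underneath.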
When thinking about an image’s quality, it is often important to closely examine the image to make a determination about whether or not it is an image you want to keep in the long term. There are many subtleties to cover when examining an image’s quality, including noise and bit-depth, among others.
File Copies
The acronym LOCKSS (Lots of Copies Keeps Stuff Safe) is used commonly by digital historians, librarians and others to refer to having lots of copies of things in order to prevent loss. Established as a digital initiative by Stanford, it is a general philosophy of having many copies of your information. To extend it, it is equally important to have data stored in many places, not just one place. After all, having three copies of the same image on the same hard drive will still cause you to lose the image for good if that hard drive fails! For images, the additional places can be an online cloud service or some other long-term storage service, of which there are many available that can store many types of files, including images. Storing digital images for the long-term will necessitate having at least one (preferably more) of these resources to back up to, and it is best to get into the habit of setting up a system that creates copies, preferably automatically (there is software that can do this for you, but because it is not free, this guide does not include it). Establishing long-term storage is one of the most important things you can do for the images you collect!
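As a rough idea of what "a system that creates copies" might look like, here is a minimal sketch that copies one file into several folders and checks that each copy is identical to the original. It is not a replacement for proper backup software, and everything about it is an assumption for illustration: the function names are invented, and the destination folders stand in for wherever your drives or synced cloud folders actually live.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path) -> str:
    """Return a fingerprint of the file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    verified = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also keeps file timestamps
        if sha256_of(copy) != expected:
            raise IOError(f"Copy at {copy} does not match the original")
        verified.append(copy)
    return verified
```

The destinations only help if they are really in different places, for example a second physical drive and a folder that syncs to an online service, rather than three folders on the same disk.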
Endnote
To end, I hope some of these explanations were useful to your understanding of what digital images are, how they work, why it is important to handle them carefully, as well as how to care for them in the long-term. In the other sections for artists and curators, more depth will be provided for issues specific to artists and curators in handling born-digital images.
For Artists (Born-Digital Images)
Hey artists! This guide is going to assume you have the knowledge from the post about collectors and are generally familiar with photo-editing software like GIMP or Photoshop. It will get into, to greater and lesser degrees, different ways to better preserve your work as well as discuss how to talk about long-term preservation with other parties who may be involved with your work! It seeks to be comprehensive to the specific needs artists may face in the course of their work, especially with regards to long-term preservation of their digital images. To begin, we are going to talk a little bit about lesser known formats and discuss special considerations necessary to properly handle works that use these formats.
File Formats
Even among all the formats explained in the collectors section, there are at least a few more that are worth understanding for purposes of better understanding born-digital images from an art and, particularly, design perspective. These formats are special for a variety of reasons, either because they do not tend to have standardized formatting and thus do not transfer easily to other formats, or because they do not have an agreed upon standard for long-term preservation.
.eps (Encapsulated PostScript) is a file format most often used as an alternative to .svg or .ai files, which are both also made primarily for storing scalable vector-based images. These files are different primarily for the additional information they can sometimes store, which includes file metadata that oftentimes will not be transferred to .ai or .svg formats. Due to .eps not being a common preservation format and support for it becoming rarer, it is recommended to convert these files to .svg when possible.
.ai (Adobe Illustrator) is a file format used specifically by Adobe to handle Adobe Illustrator files which are most often used in design contexts and include a wide variety of tools for making scalable vector-based images. In spite of this, .ai is a proprietary file format owned by Adobe and thus not recommended as a long-term storage format. Converting this to a .svg is recommended. It is known that some .ai files will not properly convert to .svg and thus must remain as .ai files until a proper conversion methodology is available.
.svg (Scalable Vector Graphics) is an image file format that can infinitely scale due to being an image created by math and then converted to scale appropriately to the size of the screen. These are ideal for certain formats and are often used with text and other elements that appear regularly in documents or designs. More complex images require more complex mathematics, such that it is not practical to use scalable vector images for many kinds of digital art. Most images are raster-based whereas .svg/.ai/.eps are vector-based, meaning they can scale as large or as small as computation power will allow. .svg is the recommended format for storing scalable images due to it being one of the few scalable vector formats that is open.
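To show what "an image created by math" looks like in practice, here is a minimal sketch that builds the same simple .svg drawing at two very different display sizes. The shape, colour and sizes are arbitrary choices for this illustration. The point is that the description of the circle never changes; only the size it is asked to display at does.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def circle_svg(display_size: int) -> str:
    """Return an SVG document drawn on a fixed 100x100 coordinate grid."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{display_size}" height="{display_size}" viewBox="0 0 100 100">'
        '<circle cx="50" cy="50" r="40" fill="teal"/>'
        "</svg>"
    )


small = circle_svg(32)
large = circle_svg(4096)

# The stored shape is identical at both sizes.
small_circle = ET.fromstring(small).find(f"{SVG_NS}circle").attrib
large_circle = ET.fromstring(large).find(f"{SVG_NS}circle").attrib

print(small_circle == large_circle)  # True
print(len(large) - len(small))       # 4
```

The 4096 pixel version is only four characters longer than the 32 pixel one, whereas a raster image at that size would need to store millions more pixels.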
.indd (InDesign Document) is an Adobe file specification with rather unique properties. In addition to being a proprietary file format, these files often act as intermediaries for design specifications of various files and images. The file format is typically used in publishing but has also recently seen more use in digital art spaces as well, primarily acting as a layout editor for things like books or magazines. There is no agreed upon preservation format in which these documents can be converted, though for now it is recommended that any scalable images be exported to .svg and raster images be exported to Photoshop (where they can then be converted again to .TIFF). This is not an ideal situation but at the moment, no open specification exists for dealing with these files in the long-term.
These are the more specialized formats likely to be come across while working as an artist, but if there are more don’t hesitate to email me and I’ll be happy to add any others that are relevant to the guide!
Image Ownership
This is perhaps the most meat and potatoes issue in the guide and much of it is going to be contingent upon actually owning the art. Effectively, as an artist you need to be very careful about the contracts you sign and while you do not necessarily need to be sure that the art is owned by you and not the paying organization (organizations should tell you how ownership of any work they purchase is handled), you should be aware if they are going to preserve your works or not. Depending on the organization, this may or may not be a viable step for them to take, but if they are not going to take responsibility for the long-term preservation and handling of the art, this means that it falls to you as the creator to be certain to preserve digital master copies of any art you give or sell to the organization you are working with (and be sure to do this within the bounds of the contract). Without any kind of guarantee for handling or preservation you cannot be sure that they will retain your works or in what condition they will retain them. This is vitally important to consider both as an artist and as a creator of intellectual works. Be sure that you have a contingency plan for works you create, whether it takes the form of a digital backup site or an organization guaranteeing the long-term holding of your works. Note that this is NOT legal advice and is more for edification of the importance of being aware how your works will be handled in the future. You should talk to a legal adviser in the case of any contract you sign for substantial work, as they will be able to provide region-specific information that cannot be covered in this guide.
Content Sharing
Content sharing has been recommended, even by preservationists, as one method of spreading digital work and ensuring it lasts by proliferation rather than just profundity. While this can be an excellent methodology for ensuring the long-term survival of work, it is important to remember that these copies are not master-copies and thus cannot reproduce the depth of material available from those master copies. As a result it is vitally important to ensure that if your content is shared, it is within the constraints of any applicable contracts and, just as importantly, is a copy as close to representing the master copy as possible. Content sharing is notorious for degrading the quality of work via simple distribution and the variety of changes that often happen in those exchanges. Oftentimes systems outside the control of either you or others will affect the quality of the material, and as a result it is vitally important to maintain master-copies in at least three reliable sources.
Metadata
Though perhaps often furthest from the mind of artists, metadata can serve as a vitally useful resource for contextualizing digital art pieces when they are being handled by curators and preservationists. Metadata is a form of contextual data often added to files either automatically or manually by the file creator. By providing some metadata with your digital files, you can help curators by making it easier for them to understand and contextualize your artworks. The more metadata the better, and even seemingly small things like the date of creation can often be extremely helpful for a conservator who may look at the file twenty years in the future. The more information the image is embedded with, the easier it is for future individuals to parse your creations!
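As one simple way to start recording metadata, here is a minimal sketch that writes a small text file of details next to an image. Note that this is a "sidecar" file rather than metadata embedded in the image itself, which is a different (and simpler) technique than the embedding mentioned above. The function name and the field names used in the usage note are hypothetical and do not follow any particular metadata standard.

```python
import json
from pathlib import Path


def write_sidecar(image_path, **fields):
    """Write the given details to '<image name>.json' beside the image."""
    image_path = Path(image_path)
    sidecar = image_path.with_name(image_path.name + ".json")
    record = {"file": image_path.name, **fields}
    sidecar.write_text(
        json.dumps(record, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar
```

For example, calling write_sidecar("sunrise.tiff", title="Sunrise", creator="A. Artist", date_created="2015-03-01") would produce sunrise.tiff.json, a plain text file that should stay readable long after any particular image program is gone.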
Endnote
Altogether, these tidbits represent a small fraction of the information that can improve your understanding of your works and your ability to own and preserve them. The information here is intended to act as a starting point to learning about preserving digital artworks, to give some contextual information and provide resources to further explore the depths of digital preservation.
For Curators (Born-Digital Images)
Hey curators! This section is going to assume you have the collective knowledge available from both the collectors and artists sections, so if there’s anything unfamiliar here, it may have already been covered in one of those sections. In curation we’re mostly going to cover how to establish a metadata plan for your images as well as legal concerns over ownership and copyright, complicated legal hurdles which must often be overcome before deeper preservation of artworks can even begin. Having both a metadata plan and a legal understanding of ownership over artworks that were not created by you or even necessarily your organization can be vitally important when working with an image, particularly with regards to how that image is to be treated. In addition, we will discuss storage media and what may be ideal versus what is actually possible and how to go about distributing materials so that others, particularly researchers and the interested public can get use out of your materials.
As a note, since it is used in this guide, ($$$) means the resource being linked to must be paid for in order to access it.
Metadata
Depending on the scale of your organization, much of the metadata handling may already be directed by others within your organization and thus this may not be a significant issue for you past the point of simply entering information into a processing application that formats the metadata. In spite of this, it is still vitally important to understand that metadata is itself data about data and that it is extraordinarily important in identifying files as well as what information they contain. Without metadata it is often impossible to parse a file, and in some cases the file cannot be processed at all. Metadata is constantly under scrutiny, being revised and changed, and as yet there has not been a widely agreed upon file format for digital artworks. However, fairly recently there has emerged a new process still in infancy from Cornell. Though still needing time to further develop, it is a heartening attempt to preserve new media which continues to bewilder and frustrate many organizations when attempting to handle and ingest digital artworks.
Legal Issues
Legal issues abound for the preservation of digital artworks, not the least of which are many powerfully restrictive laws that often limit even the ability of owners to exchange their work with a museum or library. Thus, it becomes imperative to acquire deeds of gift or some other legal form of disclosure that allows the organization to legally own the materials they receive from the creators or organizations they are working with. In many cases this can be a long and arduous process, but it is often a necessary step to avoid misunderstandings and potential issues for the preserving organization in the future. A chapter entitled Death by Law in Rinehart and Ippolito’s ($$$) wonderful book on handling new media explains the multitude of ways new legal restrictions powerfully limit the ability of preserving institutions to actively and accurately preserve digital material for long periods of time without being subject to legal action. It is important to be familiar with applicable laws regarding digital artworks, including region, state and federal laws where applicable. Due to the difficulty the legal system has had with digital materials, the laws often vary widely from place to place, so it is helpful to familiarize yourself with them as they relate to your particular locale.
Storage
Storage for digital materials is a difficult prospect primarily because the digital materials themselves are often products of their time. The result is that digital artworks, particularly those using proprietary file formats like Photoshop (.psd), may eventually require using legacy software and hardware in order to access those files. Thus it is important where possible to not just store digital art files in robust storage media (digital storage media that is resistant to damage), but also to ensure that either legacy systems are available to support the media or there is some way to convert the digital material into a format that can be used for a longer period of time. In Serexhe’s book, a chapter entitled Keeping the Bits Alive ($$$) discusses at length the different ways in which storage media can cause the end of digital material as well as how that might be avoided in some cases. What’s worse is that the death of a software or hardware can necessitate expensive updating that not only costs money, but also time, and with digital materials often being extremely fragile to begin with, this compounds the likelihood of permanent loss or irreparable damage. Thus it is vitally important to convert materials where possible to better storage formats and, where not possible, to preserve the materials necessary to access them.
Weeding
Weeding materials, or culling copies or damaged materials, is an important function of any digital repository and the same holds true for any organization handling a large set of digital information. Another possibility is also weeding to save space or handing materials off to organizations better able to handle the materials, as is the case with some digital artworks which require specialized handling. Making decisions about weeding tends to relate to a few important questions when handling digital artworks. It begins with asking whether or not the materials need to be weeded in the first place, and if so, why and what purpose the weed will serve to better provide materials to the general public. Generally, if there is a copy of a digital artwork that is not a stored master-copy, it is acceptable to get rid of this copy if it shows up in some place it doesn’t belong. The reasons behind a material’s appearance can be manifold, but it is not uncommon for copies of these materials to emerge as a result of public use or use by internal staff for various purposes, such as marketing. More complex are often the copies that emerge as a result of a database copying data or seeing an image copied multiple times for backup purposes and ending up with extra copies. More difficult still are the occasions where these result in lower-quality but visually similar copies, and these will need to be dealt with expediently to avoid cluttering storage spaces. It is best to make a note in a public space that these deletions are occurring, if this is not already done by whatever content management system is used by your organization. There are also a variety of programs that can help find image copies within a set of images, and though there are free options, their results tend to be unreliable. It is almost always better to go with a paid option and sort through images manually before making weeding decisions, as even the best paid image duplicate finders tend to have some false positives.
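For the simplest case, copies that are byte-for-byte identical, finding duplicates can be sketched in a few lines. This is a minimal illustration, not one of the duplicate finders mentioned above, and the function name is invented for this example. It will not catch the lower-quality but visually similar copies, which need specialized tools and a manual look.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(folder):
    """Group files under `folder` whose contents are exactly identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only fingerprints shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

The function only reports groups of matching files and deletes nothing, so the weeding decision itself stays with a person.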
Public Use
Allowing the public to access your digital artworks is almost always contingent on first clearing the pertinent legal hurdles associated with distributing visual copies of visual works. Once this is done, it becomes vitally important to create opportunities to provide materials online for purposes not just of public service, but also to attempt to find parties interested in the work your organization is pursuing. Notoriety with some degree of humility and caution has often led to many more opportunities for smaller organizations handling more specialized materials that previously had not been available or accessible to the public at large online. It is also important to restrict any access to master-copies to those on-site and where possible, separate the connection between the networks in some real way to prevent unauthorized access from the outside. As a security measure this simple separation can often prevent significant data loss, theft or damage from the outside. Where it is not possible to separate these networks, it then becomes important to enforce stricter security policies on networks with sensitive data, including master-copy digital artworks. What this will entail is likely to vary from organization to organization but will generally consist of making sure the network doesn’t have an outside, direct connection to the internet if possible.
Endnote
As an endnote, I hope this was a helpful guide to some of the issues requiring handling by a curator of digital artworks and that it answers some questions (and raises others!) about dealing with digital artworks. Digital is a medium that is still evolving and with it, curators must attempt to keep up with the myriad complex issues facing them when it comes to working with these new, beautiful materials. Please check out the resources for even more information on how to work with digital art materials!
Resources
Here is a list of resources you can refer to when gathering information on digital art preservation. There are some books as well as programs that are all separated from one another. Some of these require purchase, indicated by ($$$).
Books
Re-Collection by Rinehart & Ippolito ($$$) (Best Practices for Digital Collections)
Digital Art Conservation by Bernhard Serexhe ($$$) (Case Studies and Models for Digital Curation and Conservation)
Web Resources
JISC Digital Media Center (Guides Mostly For Curators)
Electronic Media Group Library (Old But Useful, Mostly For Curators)
American Institute For Conservation (Mostly for Curators and Artists – More for Physical than Digital Works)
Words in Space (Shannon Mattern’s weblog, Discussing Digital Preservation Aesthetics, Mostly for Curators)
Wikipedia (Great resource for lots of Generic and Some Specific Knowledge about Digital Preservation)
Programs
GIMP (Photo-editing)
Photoshop ($$$) (Photo-editing)
IrfanView (Batch Processing of Image Files)
SumatraPDF (PDF Reader)
FileZilla (FTP Program)
Other Web Resources
Most of these are guides, some of which helped to create the guides you see here!
https://www.jisc.ac.uk/guides/preserving-and-storing-your-information
http://jiscdigitalmedia.ac.uk/infokit/audiovisual-digitisation/long-term-considerations
https://www.jisc.ac.uk/guides/records-management/master-copy
https://www.jisc.ac.uk/guides/intellectual-property-rights-in-a-digital-world
https://www.jisc.ac.uk/guides/make-your-digital-resources-easier-to-discover
https://www.jisc.ac.uk/guides/make-your-collection-available-for-learning-and-teaching
http://www.dpconline.org/advice/preservationhandbook/institutional-strategies
http://bentley.umich.edu/about/what-we-do/digital-curation-strategies-and-procedures/680-2/