Proposal: Source Information
Posted: 02 Dec 2013, 00:17
We've had some discussion in one place or another about how to store source information (or, perhaps better, 'provenance') for our data. I have a proposal for a general way to handle it, which I think may be suitable. I will mix a few implementation details into my description, for which I apologize in advance.
The main thing I suggest is creating a new type of object (and consequently a new table in the database) called, for example, SourceInformation. Any other type of object for which we require source information can link to one of these. This object type will have several properties:
Code:
Screenshots [Links to Screenshot entities]
Box Scans [Links to box scan entities]
Uploaded Images [Links to generic image entities]
Text [Freeform description of the sources]
Source Tags [Links to tags--more info later]
The first three fields hold (lists of) links to other objects in the database. For example, if credits are sourced to screenshots, or if publisher info is sourced to a box scan, those fields will contain links to the relevant entities. The third field, in particular, will hold any uploaded source images that don't fit into the other categories, be they low-quality images not otherwise suitable for inclusion, screenshots of web pages, or whatever else.
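To make the shape of the object a bit more concrete, here is a minimal Python sketch of the five properties listed above. It is only an illustration: the attribute names, the use of integer IDs to stand in for links to other entities, and the use of plain strings for tags are all my assumptions, not a settled design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceInformation:
    """Illustrative sketch only; names and types are assumptions."""

    # Integer IDs stand in for real database links to other entities.
    screenshots: List[int] = field(default_factory=list)
    box_scans: List[int] = field(default_factory=list)
    uploaded_images: List[int] = field(default_factory=list)
    # Freeform description of the sources.
    text: str = ""
    # Tags are plain strings here; a real tag list would be its own entity.
    source_tags: List[str] = field(default_factory=list)


# Hypothetical example: credits sourced to two screenshots of the staff roll.
credits_source = SourceInformation(
    screenshots=[101, 102],
    text="The credits are sourced to the endgame staff roll, "
         "of which stills are provided.",
    source_tags=["Low Quality Screenshots"],
)
```

Every field defaults to empty, so a source that consists of nothing but a text description is still a valid object.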
The 'text' field holds a textual description of the source of the information. It should describe what the attached images show and where they come from (e.g. "The credits are sourced to the endgame staff roll, of which stills are provided."), along with anything else needed to explain the source. If, for example, there's reason to doubt that a source is accurate, text can be included here explaining why.
The 'source tags' field links to a tag list containing relevant descriptive information. For example, "Low Quality Screenshots" would indicate that we might wish to replace our source screenshots with better ones, while "Release Date from Retailer" or "Uncertain Release Date" could help us find dates that need more verification. "Source Links to Dead Website" could be useful as well, and there could be many other such tags. The primary purpose of this field is to help us find entries whose data has low-quality sources of one kind or another, by tagging them in a way that we can easily search for.
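As a rough illustration of the kind of search the tags are meant to enable, here is a small Python sketch. The entry names, the `entries_with_tag` helper, and the cut-down `SourceInformation` class are all hypothetical; in practice this would be a database query rather than a loop over a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceInformation:
    """Cut-down stand-in with only the fields needed for this example."""

    text: str = ""
    source_tags: List[str] = field(default_factory=list)


def entries_with_tag(sources: Dict[str, SourceInformation], tag: str) -> List[str]:
    """Return the names of entries whose source carries the given tag, sorted."""
    return sorted(name for name, src in sources.items() if tag in src.source_tags)


# Hypothetical entries, keyed by a made-up entry name.
sources = {
    "Game A release date": SourceInformation(
        "Date taken from a retailer listing.",
        ["Release Date from Retailer"],
    ),
    "Game B credits": SourceInformation(
        "Stills of the staff roll.",
        ["Low Quality Screenshots"],
    ),
    "Game C release date": SourceInformation(
        "Date from a fan site that is now offline.",
        ["Uncertain Release Date", "Source Links to Dead Website"],
    ),
}

# Entries whose screenshots we might want to replace.
needs_better_screenshots = entries_with_tag(sources, "Low Quality Screenshots")
```

The point is only that a fixed set of tags turns "which entries have weak sources?" into a simple lookup, which freeform text alone would not allow.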
These SourceInformation objects would be linked to, as I said, from any entities in the database which would require source information. By creating dedicated objects in the database to hold this information, we won't have to duplicate all these columns on every table, since essentially the same sorts of information will need to be stored whether the sources apply to a Release Group or a Game.
Importantly, this proposal does not allow individual properties to have separate source information in the database, since I am proposing a single extra column per table. (It would of course be possible to add one source column for each piece of data, but I think that would be overkill.) It also does not provide for a database of potential sources (e.g. there will be no object called 'the PlayStation Store' that release dates can be sourced to).
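To show what "a single extra column per table" might look like, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names are assumptions made up for the example, and the image links and tags are left out for brevity (they would presumably need link tables of their own).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared table for source descriptions (images and tags omitted).
    CREATE TABLE source_information (
        id   INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    -- Each sourced table gets exactly one extra column pointing at a source row.
    -- Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
    CREATE TABLE game (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        source_information_id INTEGER REFERENCES source_information(id)
    );
    CREATE TABLE release_group (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        source_information_id INTEGER REFERENCES source_information(id)
    );
""")

# Hypothetical rows: one source per entity, covering all of its properties.
conn.execute("INSERT INTO source_information (id, text) VALUES (1, 'Title read from the box scan.')")
conn.execute("INSERT INTO source_information (id, text) VALUES (2, 'Release date from a retailer listing.')")
conn.execute("INSERT INTO game (title, source_information_id) VALUES ('Example Game', 1)")
conn.execute("INSERT INTO release_group (name, source_information_id) VALUES ('Example Release Group', 2)")

# Looking up the source of a game is a single join on the extra column.
game_source = conn.execute(
    "SELECT s.text FROM game g "
    "JOIN source_information s ON s.id = g.source_information_id "
    "WHERE g.title = ?",
    ("Example Game",),
).fetchone()[0]

# The same join works unchanged for any other sourced table.
group_source = conn.execute(
    "SELECT s.text FROM release_group r "
    "JOIN source_information s ON s.id = r.source_information_id "
    "WHERE r.name = ?",
    ("Example Release Group",),
).fetchone()[0]
```

Because there is only one source column per row, the source text has to describe every sourced property of that entity together, which is exactly the limitation described above.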
The exact structure of the SourceInformation table that I've described above is, of course, just a preliminary example. Other fields may be needed: if Oregami stores information about magazine issues, we may well want to allow a magazine issue to be linked to as a source, for example.
The only really important thing I'm suggesting is: the descriptions of our sources should get a table in the database and be linked to when needed.
Given all these caveats, what do you think?