Open Registry of Game Information 

  • Workflow for data entry

  • Talk about specific features of our upcoming online game database.

Moderators: MZ per X, gene

 #38549  by gene
 29 Jul 2019, 16:46
Regardless of our individual technical contexts, I am currently thinking about the general workflow of data entry.

Since all our data entries are technically realized via "Event Sourcing", each individual entry can be traced (who entered which data and when?). In addition, I had planned right from the start to use this technical procedure to "undo" or "reject" entries, because we will use some kind of verification process to keep the quality of our data high. Now the time has come to elaborate this general idea in detail.

What exactly can happen?

A relatively simple case could be when someone adds a single detail to an existing record (e.g. the first release year of a video game system). The user then makes no further entries. Within our review process, this change will be reviewed by an "expert" to assess whether the information is correct and whether the data entered should really remain in the system. If the data should be "deleted" again for any reason, the last event in the chain of all entries could technically be deleted and the publicly viewable data would then be updated. So far, so good.
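
A minimal sketch of this simple case, assuming an append-only event list that is replayed to build the public data. The `Event` fields, the `project` and `reject_last` helpers, and the sample values are hypothetical and only illustrate the idea; they are not our actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One data entry; all field names here are illustrative assumptions."""
    event_id: int
    user: str         # who entered the data
    entered_at: str   # when it was entered (ISO date string)
    entity: str       # which record the entry belongs to
    field: str        # which detail was entered
    value: object     # the entered value


def project(events):
    """Replay the events in order to build the publicly viewable data."""
    state = {}
    for event in events:
        state.setdefault(event.entity, {})[event.field] = event.value
    return state


def reject_last(events):
    """Return a new chain without the last event; the input is not modified."""
    return events[:-1]


log = [
    Event(1, "alice", "2019-07-01", "system:1", "name", "Example System"),
    Event(2, "bob", "2019-07-29", "system:1", "first_release_year", 1990),
]

public_before = project(log)
public_after = project(reject_last(log))
```

Because the public data is only a projection, rejecting the entry amounts to replaying a shorter chain.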

But what if someone makes several entries directly one after the other and the data of one or more later steps depend existentially on the entries of the previous steps? It might then not be possible to simply remove one or more of the "middle" inputs from the chain of all unchecked inputs. This is because the subsequent events might not be "imported" without errors to update the public data.
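
The dependency problem can be sketched as follows, assuming each event creates an entity that may point to a parent entity created earlier in the chain. The tuple layout, `replay`, `dependents` and `ReplayError` are hypothetical names chosen for illustration.

```python
class ReplayError(Exception):
    """Raised when an event refers to a parent that no longer exists."""


def replay(events):
    """Import events in order; fail if a required parent is missing."""
    known = set()
    for kind, entity_id, parent_id in events:
        if parent_id is not None and parent_id not in known:
            raise ReplayError(f"{entity_id} needs missing parent {parent_id}")
        known.add(entity_id)
    return known


def dependents(events, rejected_id):
    """Collect all later entries that depend, directly or not, on rejected_id.

    Assumes parents always appear before their children in the chain.
    """
    affected = {rejected_id}
    for _, entity_id, parent_id in events:
        if parent_id in affected:
            affected.add(entity_id)
    return affected - {rejected_id}


# (kind, entity id, parent id) entered directly one after the other
chain = [
    ("game", "G1", None),
    ("release_group", "RG1", "G1"),
    ("release", "R1", "RG1"),
]

without_middle = [event for event in chain if event[1] != "RG1"]
```

Calling `replay(without_middle)` raises `ReplayError`, which is exactly the case where a "middle" input cannot simply be removed. `dependents(chain, "G1")` lists the entries that would have to be rejected or handled together with it.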

Unfortunately, this discussion is currently somewhat theoretical, as we have not yet worked out many concrete input processes. But we should already think about it now, because how we want to tackle such problems is quite central for all technical areas.

But perhaps we should look at the whole issue from a different angle, namely from a functional point of view, i.e. abstracted from technology. Questions that arise here: Is it really absolutely necessary for a user to enter several data consecutively? Or is it legitimate that he has to wait for an activation after each data entry? Assuming that the answers to these two questions are "Yes" and "No", one should think about the fact that there must be two different views of all data: a view that contains all entered data and a view that contains only the unlocked data. Is that what we need?

Who makes a proposal for our input process? Any thought is welcome!
 #38550  by MZ per X
 30 Jul 2019, 20:47
Some initial thoughts about this important issue.
gene wrote:Is it really absolutely necessary for a user to enter several data consecutively?
In our Discord channel, one very valuable answer to this question was: yes. And I tend to agree that consecutive contributions are critical for ease of contributing and user motivation.

Back in my time at MobyGames, one could not contribute to games with unapproved changes. To remedy the negative effects of that, contributions needed to be finished rather quickly (within 7 days, I think), or would be automatically deleted, so the database entry wouldn't stay in limbo forever. Of course, the contributor would be warned several times before deletion. Either way, it was very demotivating to try to contribute just to see someone else blocking the database entry.

Of course, we might be well advised to do both: forbid consecutive contributions for newbies so we can assess the quality of their edits, and give them the option to contribute consecutively later, once they have proven themselves somehow.
gene wrote: Assuming that the answers to these two questions are "Yes" and "No", one should think about the fact that there must be two different views of all data: a view that contains all entered data and a view that contains only the unlocked data. Is that what we need?
I'd say we need different data views regardless of the answers to the above questions, don't we?

If we do moderation, we inherently separate our data into moderated and unmoderated data, or, as I would like to call it, live data and staging data. Furthermore, we should give our users the option to "save" unfinished contributions so they can finish them later. So the data flow would be like this: unfinished ---> staging ---> live.

The question is where and how to show those data.

My proposal would be to show unfinished data only to the contributor themselves. MobyGames auto-deletes such data after they haven't been touched for 7 days, I think. We should do something similar so we don't pollute the database with unfinished items.

As regards staging data (unmoderated data), we should show them only to logged-in users, never to the public. And our users should be able to switch between staging and live data for their personal views. What I don't have a strong opinion about is whether users should have to make that choice at sign-up already.

And live data is what we show to the public and deliver through our API.
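
The visibility rules above could be sketched like this. It is only an illustration under assumptions: the `Stage` enum, the dictionary-based entries and the `show_staging` switch are hypothetical, and whether a contributor should always see their own staging entries is left open here.

```python
from enum import Enum


class Stage(Enum):
    UNFINISHED = "unfinished"   # saved draft, not yet submitted
    STAGING = "staging"         # submitted, but not yet moderated
    LIVE = "live"               # approved, shown publicly and via the API


def visible(entry, viewer=None, show_staging=False):
    """Decide whether `viewer` (None = not logged in) may see `entry`."""
    stage = entry["stage"]
    if stage is Stage.LIVE:
        return True                       # live data is shown to everyone
    if viewer is None:
        return False                      # the public never sees the rest
    if stage is Stage.UNFINISHED:
        return entry["author"] == viewer  # drafts only for their contributor
    return show_staging                   # staging only if the user opted in


def view(entries, viewer=None, show_staging=False):
    """Return the ids of all entries visible to the given viewer."""
    return [e["id"] for e in entries if visible(e, viewer, show_staging)]


entries = [
    {"id": "a", "author": "gene", "stage": Stage.LIVE},
    {"id": "b", "author": "gene", "stage": Stage.STAGING},
    {"id": "c", "author": "gene", "stage": Stage.UNFINISHED},
]
```

With these sample entries, `view(entries)` gives the public view, while `view(entries, "gene", show_staging=True)` gives the contributor's full personal view.
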
gene wrote:But what if someone makes several entries directly one after the other and the data of one or more later steps depend existentially on the entries of the previous steps? It might then not be possible to simply remove one or more of the "middle" inputs from the chain of all unchecked inputs. This is because the subsequent events might not be "imported" without errors to update the public data.
Yes, we will end up with stray events. IMHO, every new contribution is also a stray event as long as it is not saved as staging to the database, isn't it? If we give the contributor the possibility to save a draft contribution, its parent(s) might be gone by the time he/she finishes it.

To solve this situation, we might need something like git rebase. I'd say that an event is similar to a git commit in that it has one or more "parents". Would it be possible to offer contributors the "rebasing" of stray events?

Example:
Someone contributed a game (G), then two release groups (RGs) with one release each. The game entry is then rejected, because it already exists. What if we gave the user (or some expert) the possibility to give the RGs a new parent G entry and have them approved again? And if the respective RGs already existed under the already existing G entry as well, the possibility to rebase the releases, too, and delete the new RGs?
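
A rough sketch of such a "rebase", assuming every entry stores the id of its parent. The `Entry` class, its `status` values and the `rebase` function are hypothetical and only model the first half of the example (giving the RGs a new parent G entry).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Entry:
    entry_id: str
    kind: str                    # "game", "release_group" or "release"
    parent_id: Optional[str]     # None for top-level entries
    status: str = "staging"      # "staging", "live" or "rejected"


def rebase(entries, rejected_id, new_parent_id):
    """Reject one entry and re-attach its direct children to a new parent.

    Re-attached children go back to "staging" so they are approved again.
    """
    result = []
    for entry in entries:
        if entry.entry_id == rejected_id:
            result.append(replace(entry, status="rejected"))
        elif entry.parent_id == rejected_id:
            result.append(replace(entry, parent_id=new_parent_id, status="staging"))
        else:
            result.append(entry)
    return result


contribution = [
    Entry("G_new", "game", None),
    Entry("RG1", "release_group", "G_new"),
    Entry("RG2", "release_group", "G_new"),
    Entry("R1", "release", "RG1"),
    Entry("R2", "release", "RG2"),
]

# "G_old" stands for the game entry that already existed in the database.
rebased = rebase(contribution, rejected_id="G_new", new_parent_id="G_old")
```

The releases keep pointing at their release groups, so only the direct children of the rejected entry need a new parent.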

gene wrote:Who makes a proposal for our input process? Any thought is welcome!
One proposal would be upvoting and user levels.

Let's assume our users can have seven levels:
1 public / not logged in
2 newbie
3 contributor
4 voter
5 reviewer
6 approver
7 admin

And these seven levels come with the following rights, partly fine-grained per bounded context (BC):
1 can only view live data and comment in a special section of our forums

2 can choose between viewing live and staging data, can contribute non-consecutively, and can comment on own contributions

3 can contribute consecutively, and comment on other people's contributions

4 can vote with a +1 on other people's contributions in one or more BCs of expertise

5 can vote with a +2 or -1 on other people's contributions in one or more BCs of expertise

6 can vote with a +3 or -1 or -2 on other people's contributions all over the database

7 just can :)
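
As a sketch, the levels and rights listed above could be encoded like this. The names are hypothetical, the per-BC scoping of levels 4 and 5 is left out for brevity, higher levels are assumed to inherit the viewing and contributing rights of lower ones, and treating "just can" as "may cast any vote" is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 1
    NEWBIE = 2
    CONTRIBUTOR = 3
    VOTER = 4
    REVIEWER = 5
    APPROVER = 6
    ADMIN = 7


# Vote values each level may cast, taken from the list above.
ALLOWED_VOTES = {
    Level.VOTER: {+1},
    Level.REVIEWER: {+2, -1},
    Level.APPROVER: {+3, -1, -2},
}


def may_view_staging(level):
    """Newbies and above may switch between live and staging data."""
    return level >= Level.NEWBIE


def may_contribute_consecutively(level):
    """Contributors and above may stack contributions on unapproved data."""
    return level >= Level.CONTRIBUTOR


def may_cast(level, vote):
    """Check whether a user of the given level may cast this vote value."""
    if level is Level.ADMIN:
        return True  # assumption: admins are not restricted at all
    return vote in ALLOWED_VOTES.get(level, set())
```

For example, `may_cast(Level.REVIEWER, -1)` is allowed, while `may_cast(Level.VOTER, -1)` is not.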

With this user system in place, we could assign voting points to data types, similar to Gerrit. Let's assume a new G entry is no rocket science, so we just assign a +2 to it. That would mean that for this entry to go live, two voters would need to upvote it, or one reviewer, or one approver.

A new gaming environment is much more difficult, so we assign +10 to it, so it would, for example, need 2 approvers and 2 reviewers to push it through.
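
The arithmetic behind these thresholds can be sketched as below. The `REQUIRED_POINTS` table and `goes_live` are hypothetical names, and simply summing the votes (so that negative votes subtract points) is an assumption about how the proposal would work, not a description of how Gerrit itself evaluates votes.

```python
# Points a contribution needs before it goes live, per data type.
REQUIRED_POINTS = {
    "game": 2,
    "gaming_environment": 10,
}


def goes_live(data_type, votes):
    """Return True once the summed votes reach the threshold of the data type."""
    return sum(votes) >= REQUIRED_POINTS[data_type]


# A new G entry: two voters (+1 each), one reviewer (+2) or one approver (+3).
game_by_two_voters = goes_live("game", [1, 1])
game_by_one_reviewer = goes_live("game", [2])
game_by_one_voter = goes_live("game", [1])

# A new gaming environment: 2 approvers (+3 each) and 2 reviewers (+2 each).
environment_by_four = goes_live("gaming_environment", [3, 3, 2, 2])
environment_by_three = goes_live("gaming_environment", [3, 3, 2])
```

Two approvers and two reviewers give 3 + 3 + 2 + 2 = 10 points, which just reaches the threshold; dropping one reviewer leaves 8 points and the entry stays in staging.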

Of course, we could invent even more user levels to add some gamification. ;)

So much for some initial thoughts.