The pedigree of information is a history search
#3332 Created 04/26/2013 Updated 08/13/2019
August 10, 2011: We defined data pedigree as the certification of the creation of a piece of content, and the history of its adaptation, editing, or modification from its beginning to its present state.
One aspect of pedigree is "source detection", the observation of the creation of the item. While a copyright springs into existence at the moment content's creation, the documentation of that creation is less certain. Content that can achieve a certain pedigree of the moment of creation can be documented and attested.
As an example, the moment of creation of a newspaper story is at the time it it written, somewhere in the bowels of the newspaper production system, and the corporate copyright exists from that moment. Yet, it is the moment of publication -- electronic or physical -- which creates the start of a document trail.
For Web content, it is conceivably possible to document the path of pieces of text or video through various stages of appropriation and reproduction across the Web. While technically feasible, it would take the search capabilities of Google. The necessary action in this regard would be to fingerprint the original at creation, and watch the modification of the fingerprint over time as the text soaks through the web's aggregation and appropriation processes.
Similarly, concerning content created by consumers in, for example, a social media context, that content can be examined, and pattern-matching algorithms applied to detect probable matches between substrings in the postings and previously published products. When significant portions of previously published text match the consumer's post, and when those portions are unique in some identifiable way from commonly occurring patterns in ordinary language, then the content can be given a numeric rating which represents both the probability that the text was copied, and an indicator of the significance of the text in terms of uniqueness.
Variations on this concept can be used to construct reflexive histories of the penetration of stolen material into Web-wide content (assuming, of course, infinite free processing power).
This idea will be loved by content creators, distributors, and lawyers. This idea will be hated by Fox News, the Huffington Post,