[mythtv-users] hash

BP lists at qucae.com
Tue Mar 17 03:13:45 UTC 2020

On 3/15/20 4:44 PM, jam at tigger.ws wrote:
>> On 15 Mar 2020, at 8:00 pm, mythtv-users-request at mythtv.org wrote:
>>> We have a feature that I think is silly and that over the years has on many occasions caused me great grief: files with identical hashes are not listed more than once (so the scan fails to find a file).
>>> The chance of two unrelated files having the same hash is a million to one. 
> I had not thought of that (re-organize library)
> I've just been going through this hell:
> [sandypit] /store/Movies [1002]% hashDups
> ----------------------
> now doing names
> ----------------------
> [number of files] 1261
> --------------------------------
> now doing hash on files
> --------------------------------
> now doing hash check
> --------------------------------
> But two files *still* won't scan.
> Resolved by:
> * removing the files
> * making an empty directory
> * scanning
> * put the files in the empty directory
> * scanning
> At last, there they are. They were two slightly different versions of the same movie (cutpoint changed)
> * mv them where I want
> * scanning
> I will post a patch when I'm done, meanwhile you could have multiple slightly different copies in your library.
> James

If I recall from a comment years ago when the current hash was
implemented, it doesn't hash the entire file because that could take way
too long.  It hashes a portion of the beginning and a portion of the end.
Getting hash collisions on files that differ only in the middle
could make sense, assuming they were encoded with the same settings from
the same source and your container format doesn't have any header data
that would change the beginning bytes.
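
A rough sketch of how that kind of collision can happen, in Python. This
is not MythTV's code: the function name head_tail_digest, the 64 KiB
sample size, and the use of SHA-1 are all assumptions made up for
illustration. It only hashes the first and last chunk of each file.

```python
import hashlib
import os
import tempfile

CHUNK = 64 * 1024  # assumed sample size, not taken from MythTV


def head_tail_digest(path, chunk=CHUNK):
    """Hash only the first and last `chunk` bytes of a file (hypothetical helper)."""
    h = hashlib.sha1()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        h.update(f.read(chunk))
        if size > chunk:
            # Start of the tail sample; never re-read bytes already in the head.
            f.seek(max(size - chunk, chunk))
            h.update(f.read(chunk))
    return h.hexdigest()


def write_temp(data):
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


head = b"H" * CHUNK
tail = b"T" * CHUNK
a = write_temp(head + b"A" * 1000 + tail)
b = write_temp(head + b"B" * 1000 + tail)  # same size, different middle
c = write_temp(head + b"A" * 500 + tail)   # shorter middle, different size

same_size_collide = head_tail_digest(a) == head_tail_digest(b)
diff_size_collide = head_tail_digest(a) == head_tail_digest(c)
print(same_size_collide, diff_size_collide)

for p in (a, b, c):
    os.remove(p)
```

With a scheme like this, any two files that share their first and last
chunk get the same digest, whatever is in between and whatever their
lengths are.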

If the files are different sizes (since you mentioned different
cutpoints), it might make sense to look into incorporating the file size
into the hash check.
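
Something along these lines, as a sketch only. The name sized_digest,
the 64 KiB sample size, and SHA-1 are assumptions for illustration, not
what MythTV does. The file size is fed into the digest ahead of the head
and tail samples, so two files of different length no longer match even
if their sampled bytes are identical.

```python
import hashlib
import os
import tempfile

CHUNK = 64 * 1024  # assumed sample size, not taken from MythTV


def sized_digest(path, chunk=CHUNK):
    """Hash the file size plus the first and last `chunk` bytes (hypothetical helper)."""
    size = os.path.getsize(path)
    h = hashlib.sha1()
    # Mix the size in first so length differences change the digest.
    h.update(str(size).encode("ascii") + b":")
    with open(path, "rb") as f:
        h.update(f.read(chunk))
        if size > chunk:
            f.seek(max(size - chunk, chunk))
            h.update(f.read(chunk))
    return h.hexdigest()


def write_temp(data):
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


head = b"H" * CHUNK
tail = b"T" * CHUNK
long_cut = write_temp(head + b"A" * 1000 + tail)
short_cut = write_temp(head + b"A" * 500 + tail)   # different size
same_len = write_temp(head + b"B" * 1000 + tail)   # same size as long_cut

sizes_differ_distinct = sized_digest(long_cut) != sized_digest(short_cut)
same_size_still_collides = sized_digest(long_cut) == sized_digest(same_len)
print(sizes_differ_distinct, same_size_still_collides)

for p in (long_cut, short_cut, same_len):
    os.remove(p)
```

This only helps when the sizes actually differ. Two files of exactly the
same length that differ only in the unsampled middle would still
collide.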
