
HashDiff never finishes when comparing two arrays #49

@nbarrientos


Hi,

While debugging an issue with octocatalog-diff, we narrowed the problem down to HashDiff being unable to compare a pair of arrays when LCS is used. Here's a reproducer:

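```ruby
require 'hashdiff'

# Build 17,000 random hostnames of the form "<11 random letters>.example.org".
a = { x: [] }
17_000.times do
  a[:x].push((0...11).map { ('a'..'z').to_a[rand(26)] }.push('.example.org').join)
end

# b shares the very same array, so the expected diff is empty.
b = { x: a[:x] }

puts 'Without LCS'
diff = HashDiff.diff(a, b, use_lcs: false)
puts diff
puts 'Done!'

puts 'With LCS' # LCS is enabled by default
diff = HashDiff.diff(a, b)
puts diff
puts 'This should never be printed'
```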

When the two arrays are compared with LCS (which is on by default), the process starts consuming memory very quickly at 100% CPU until, in our case, the kernel's OOM killer terminates it, because the test machine has no swap:

```
$ ruby reproducer_synth.rb
Without LCS
Done!
With LCS
...
Killed
kernel: Out of memory: Kill process 18783 (ruby) score 526 or sacrifice child
kernel: Killed process 18783 (ruby) total-vm:1192540kB, anon-rss:1013324kB, file-rss:0kB, shmem-rss:0kB
```

Yes, that's more than 1 GiB of virtual memory!
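As a quick check on that figure: the kernel's OOM messages report sizes in units of 1024 bytes (printed as `kB`), so the two numbers from the "Killed process" line convert as follows. This is plain arithmetic on the log line, nothing HashDiff-specific; the method name `kib_to_gib` is just an illustrative helper.

```ruby
# Values copied from the "Killed process" kernel log line (units of 1024 bytes).
TOTAL_VM_KB = 1_192_540
ANON_RSS_KB = 1_013_324

# 1 GiB = 1024 * 1024 KiB
def kib_to_gib(kib)
  kib / (1024.0 * 1024.0)
end

puts format('total-vm: %.2f GiB', kib_to_gib(TOTAL_VM_KB))
puts format('anon-rss: %.2f GiB', kib_to_gib(ANON_RSS_KB))
```

This prints roughly 1.14 GiB of virtual memory, of which about 0.97 GiB was resident anonymous memory when the process was killed.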

```
# ruby --version
ruby 2.0.0p648 (2015-12-16) [x86_64-linux]
# gem list hashdiff

*** LOCAL GEMS ***

hashdiff (0.3.8)
```

I understand that LCS is O(n²), but I'm not sure this is the expected behaviour, given that the arrays are only moderately sized.
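To make the quadratic cost concrete, here is a minimal sketch of the textbook dynamic-programming LCS length computation. It is not HashDiff's code; it assumes, for illustration, an implementation that materializes the full (n+1)×(m+1) table. With n = m = 17,000 that table has 289,034,001 cells, and at 8 bytes per array slot on a 64-bit Ruby that is about 2.15 GiB for the slots alone. The method names `lcs_length` and `lcs_table_cells` are hypothetical.

```ruby
# Textbook O(n*m) dynamic-programming LCS length (illustrative sketch,
# not HashDiff's implementation). table[i][j] holds the LCS length of
# a[0...i] and b[0...j], so the table has (a.size + 1) * (b.size + 1) entries.
def lcs_length(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (1..a.size).each do |i|
    (1..b.size).each do |j|
      table[i][j] =
        if a[i - 1] == b[j - 1]
          table[i - 1][j - 1] + 1
        else
          [table[i - 1][j], table[i][j - 1]].max
        end
    end
  end
  table[a.size][b.size]
end

# Number of cells a full table needs for inputs of sizes n and m.
def lcs_table_cells(n, m)
  (n + 1) * (m + 1)
end

puts lcs_length(%w[a b c d], %w[a c d]) # small example: "a c d"
cells = lcs_table_cells(17_000, 17_000)
bytes = cells * 8 # assumes 8 bytes per array slot on a 64-bit Ruby
puts format('%d cells, ~%.2f GiB of slots', cells, bytes / 1024.0**3)
```

Keeping only two rows brings the length computation down to O(min(n, m)) memory, but recovering the actual alignment, which a diff needs, requires either the full table or a divide-and-conquer scheme such as Hirschberg's algorithm.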
