
[pkg/stanza/fileconsumer] - Duplication of logs #27037

Closed
@VihasMakwana


Component(s)

pkg/stanza

What happened?

Description

Consider the following scenario for fileconsumer:

  • During a poll cycle, fileconsumer reads a file and emits all of its logs. Say it emitted 10 lines.
  • Before the next poll cycle begins, the following happens in the background:
    - More logs are written to the file.
    - The file's contents are copied to a new file outside the include pattern, and the original file is truncated to 0 bytes.
    - More logs are written to the original file, more than the 10 lines already read (this is necessary to trigger the bug).
    - The next poll() is called.

With these steps, the lines written to the truncated file beyond the first 10 are emitted twice, i.e. duplicated.

Steps to Reproduce

  • The following test reproduces the issue.
  • Unfortunately, the test passes: its final assertion matches the current, duplicated output.
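// TestOutOfPattern reproduces log duplication after a file is copied out of
// the include pattern and then truncated and rewritten. It lives in
// pkg/stanza/fileconsumer and relies on that package's test helpers
// (buildTestManager, openTempWithPattern, writeString, waitForToken(s)).
func TestOutOfPattern(t *testing.T) {
	tempDir := t.TempDir()
	cfg := NewConfig()
	cfg.Include = append(cfg.Include, fmt.Sprintf("%s/*.log1", tempDir))
	cfg.StartAt = "beginning"
	operator, emitCalls := buildTestManager(t, cfg)
	operator.persister = testutil.NewMockPersister("test")

	// First poll: read and emit the single line in the file.
	temp := openTempWithPattern(t, tempDir, "*.log1")
	writeString(t, temp, "testlog1\n")
	operator.poll(context.Background())
	waitForToken(t, emitCalls, []byte("testlog1"))

	// Write more logs before the next poll() begins.
	writeString(t, temp, "testlog2\n")

	// Copy the file's contents to a file that does not match the include
	// pattern, i.e. rotate it out of the pattern.
	temp2 := openTempWithPattern(t, tempDir, "*.log2")
	_, err := temp.Seek(0, io.SeekStart)
	require.NoError(t, err)
	_, err = io.Copy(temp2, temp)
	require.NoError(t, err)

	// Truncate the original file, then write more bytes than the first
	// poll read.
	_, err = temp.Seek(0, io.SeekStart)
	require.NoError(t, err)
	require.NoError(t, temp.Truncate(0))
	_, err = temp.Write([]byte("testlog4\ntestlog5\n"))
	require.NoError(t, err)

	// Second poll.
	operator.poll(context.Background())

	// BUG: this assertion encodes the current, incorrect behavior. testlog5
	// is emitted twice; testlog4 and testlog5 should each be emitted once.
	waitForTokens(t, emitCalls, [][]byte{[]byte("testlog5"), []byte("testlog4"), []byte("testlog5")})
}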

Expected Result

Each log line should be emitted exactly once. In the test above, testlog4 and testlog5 should each be emitted once.

Actual Result

Duplication: testlog5 is emitted twice (the emitted tokens are testlog5, testlog4, testlog5).

Proposed fix

  • A possible fix: before reading more logs from a previously known file, compare its previous fingerprint with its current fingerprint, and only emit more logs if they match.
  • I can work on a PR if that sounds okay.
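A minimal sketch of the proposed check, assuming a file's fingerprint is its first N bytes (the filelog receiver's `fingerprint_size` setting controls N). `fingerprintSize`, `fingerprint`, `knownFile`, and `canResume` are hypothetical names, not the fileconsumer API. The sketch uses a prefix comparison instead of strict equality so that a file which has merely grown since the last poll is still recognized as the same file.

```go
package main

import (
	"bytes"
	"fmt"
)

// fingerprintSize is a hypothetical stand-in for the configured fingerprint
// size: the number of leading bytes used to identify a file.
const fingerprintSize = 1000

// fingerprint returns a copy of the first fingerprintSize bytes of content.
func fingerprint(content []byte) []byte {
	if len(content) > fingerprintSize {
		content = content[:fingerprintSize]
	}
	return append([]byte(nil), content...)
}

// knownFile is a hypothetical record of what a previous poll saw.
type knownFile struct {
	fp     []byte // fingerprint recorded at the last read
	offset int    // number of bytes already emitted
}

// canResume reports whether it is safe to keep reading from prev.offset:
// the recorded fingerprint must still be a prefix of the file's current
// fingerprint, meaning the file was appended to rather than rewritten.
func canResume(prev knownFile, current []byte) bool {
	return bytes.HasPrefix(fingerprint(current), prev.fp)
}

func main() {
	prev := knownFile{fp: fingerprint([]byte("testlog1\n")), offset: 9}

	// Appended to: the old fingerprint is still a prefix, so resume at offset 9.
	fmt.Println(canResume(prev, []byte("testlog1\ntestlog2\n"))) // true

	// Truncated and rewritten: the fingerprint changed, so the stale offset
	// must not be used.
	fmt.Println(canResume(prev, []byte("testlog4\ntestlog5\n"))) // false
}
```

When `canResume` returns false, the stale offset would be discarded and the file read as new, so testlog4 and testlog5 are each emitted once. An empty recorded fingerprint is a prefix of any content, so a real check would need to handle that case separately.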

Collector version

v0.85.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response
