sajith @sajith

It is Friday night and, like the idiot I am, I am trying to figure out why pclose() returns -1 in some extremely awful code I wrote months ago, but only when running on a machine in South Korea.

How is your night going?


@sajith Well now I'm curious. Would you like debugging suggestions that you probably already tried? I can also offer rubber duck services.

@jamey Yes please!

I have tried a small test (one that calls popen()/pclose()) to try to isolate the problem, but could not reproduce it. Probably something to do with the larger system.

I have tried strace, but I can't figure out why wait4() (which pclose() calls) fails with ECHILD. Clearly the child process has run, because I could read its stdout.

I'm just gonna look for patterns in said stdout and declare victory for tonight. But I too am curious to know what's going on.

@jamey Also Jamey you are a totally great person and you have my love and gratitude

@sajith As I expected, you have tried what I would have!

After checking the Linux man page for waitpid(2), the only other thing I can think of is whether the signal handling action for SIGCHLD is SIG_IGN, which signal(7) says is the default. (I feel like it ought to be possible to check that by digging around in /proc but I can't find anything about signals there.) I don't know how that's supposed to work, but maybe you have to set a signal handler to be able to call wait*?

@jamey So far my guess is that something somewhere else (that someone else wrote even before I joined this project) is leaking file handles.

I will need a good night's sleep to think about this with any clarity.

I will certainly peek inside /proc/. Thank you!
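
For the /proc route: proc(5) documents a `SigIgn` line in /proc/[pid]/status, a hexadecimal mask in which bit (signo - 1) is set for each ignored signal. A rough sketch of reading it for the current process follows, assuming Linux; `proc_says_ignored` is a made-up name for illustration. For another process, the same line can be read from /proc/PID/status.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if /proc/self/status lists `signo`
 * in the SigIgn mask, 0 if it does not, and -1 if the file or the
 * SigIgn line could not be read. */
int proc_says_ignored(int signo)
{
    assert(signo >= 1 && signo <= 64);

    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int result = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        unsigned long long mask;
        /* The line looks like "SigIgn:\t0000000000010000". */
        if (sscanf(line, "SigIgn: %llx", &mask) == 1) {
            result = (int)((mask >> (signo - 1)) & 1ULL);
            break;
        }
    }

    fclose(fp);
    return result;
}
```

Calling `proc_says_ignored(SIGCHLD)` just before the failing pclose() would show whether the disposition is SIG_IGN at that point. Only an explicit SIG_IGN shows up in this mask; the default disposition does not.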

@sajith I don't immediately see how leaked file descriptors would lead to that problem, but it seems as plausible as anything. 😅

I'm curious to hear how it turns out, if you find the bug!

@jamey Yeah, I am pretty sure I guessed wrong. :-)

Another guess was that glibc's pclose() is not reentrant, but that doesn't seem to be the case either.