Super User Asked by Chris Le Sueur on January 5, 2022
I have a machine that has been damaged by some careless couriers, and want to replace the damaged parts efficiently. I have limited opportunities to test components in other computers, so I’m trying to find out what is broken in other ways.
I have two main issues:
Graphical artifacts. These take the form of small grid-aligned squares which usually appear and then flicker from position to position. If the display driver doesn’t crash, they often settle down to a final position, and sometimes the contents of the squares themselves change. This says to me that VRAM is being corrupted. Occasionally there are other artifacts, like polygon spikes, in games.
This is affected by physically pushing on the graphics card: in particular, with the computer on its side, it usually goes away, which strongly suggests a graphics card error. However, it could also be the PCI-e slot or some part of the motherboard.
Twice since the problems started, Windows has somehow been rendered unbootable and unsalvageable: each time some boot files were corrupt and SFC could not fix them (different errors and files each time) so I had to reinstall. The first time this occurred was following a BSOD which occurred after graphical artifacts while playing a game. The second time, the computer BSODed while I wasn’t doing anything, but I wonder if it was installing updates.
The thing is, I would quite like it to be the case, for the sake of my wallet, that these are caused by the same underlying phenomenon. So, my question is: is it reasonable to believe that graphics card damage could somehow cause system corruption (presumably by the display driver doing something whacky in kernel mode?) and/or is it reasonable to believe that some other kind of system damage, presumably to the motherboard, could cause very specific graphical artifacts and occasionally more general breakage?
I should say that I strongly doubt the RAM is to blame (since we’re talking about physical damage and RAM is pretty resilient, and it passes everything but the extreme hammer test in memtest).
I have disabled the graphics card and tested with on-board graphics. This gets rid of the graphical artifacts but does not rule out the slot, or motherboard circuitry related to the card, of course.
I have checked for SMART errors on the disks but there are none. Of course that’s not the be-all and end-all. Temperatures are all quite reasonable (CPU gets a bit toasty but it always has) and definitely not correlated to the artifacts or BSODs. I can run furmark/prime95 quite happily for ages with no ill effects. Specific games are more likely to trigger artifacts and driver crashes, presumably because they use the faulty circuitry more.
I'll start by saying that I personally haven't seen any display issues that would lead directly to bad RAM. Whenever I've had bad RAM, I'd experience BSODs, computer not booting, or random crashes. Never artifacts on screen. I'm not saying this is not possible, I don't know enough to be sure. I'm just saying that in my experience, this never happened.
I have seen faulty GPUs produce artifacts and BSODs. But it's more likely in my opinion that the problem lies in the motherboard. I don't see how a broken GPU could corrupt system files. If you have a GPU riser card, you can test whether you still get the artifacts when your GPU is connected to another PCIe slot through it. Or you could install your GPU into another computer (maybe a friend could help you out with that?). That way you can confirm or rule out that the GPU works as intended.
The second issue you mention could be caused by the motherboard, or by a faulty disk. But since you checked the disk for errors, I'd say the motherboard is to blame.
This is, unfortunately, the hardest piece of equipment to test for errors, since you basically have to check whether every other piece of equipment works properly on another motherboard.
Answered by GChuf on January 5, 2022
Overview / Preliminary Discussion
RAM is almost certainly what should be blamed in this case.
(In theory, a bad bus (communication pathway on the motherboard) or a bad CPU could cause such things. However, in practice, bad RAM happens at far greater frequency than those things. The only way to test that would be if you tried different RAM chips and found that the same hardware keeps reporting tested-good RAM chips as bad. A bad PSU could also lead to certain types of troubles.)
It is not surprising at all that some software may trigger problems more than other software. This can often be the case due to reasons like how much parallel threading a program uses in its design. It is not uncommon at all for games to use hardware heavily, thereby making games particularly prone to exposing actual problems. The problems are often exacerbated by the internal design of the software, and different software creators may use different technical processes, so it is not uncommon at all for one game to show problems while another similar-looking game doesn't show the same problems. (What the game looks like, e.g. if the game is a "first person shooter", can be a good basis for forming conclusions about whether certain types of problems are likely to be similar, but it is not always a good basis.)
So, other than historical trends of RAM being more likely, why should we be prone to blame bad RAM? We have two reasons.
It matches the experienced problems (very well)
Bad RAM can affect what the computer understands when it reads from files. Worse, bad RAM can affect what the computer thinks should be written to disk, leading to corrupted system files. So this explains your second symptom.
Bad RAM could also affect what the video card thinks should be drawn, and explains your first symptom.
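As a rough illustration of the file-corruption point above, here is a minimal Python sketch that writes random data to a temporary file, reads it back several times, and compares SHA-256 hashes. The helper names (sha256_of_file, round_trip_check) are made up for this example and are not part of any standard tool. Keep in mind that re-reads are often served from the OS file cache, which itself lives in RAM, and that a clean result proves nothing: intermittent faults can easily slip past a test this small.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_check(size_bytes=8 * 1024 * 1024, reads=5, directory=None):
    """Write random data once, then re-read and re-hash it several times.

    Returns how many reads produced a hash different from the hash of the
    data as it was generated in memory. Hypothetical helper for illustration.
    """
    payload = os.urandom(size_bytes)
    expected = hashlib.sha256(payload).hexdigest()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
        return sum(1 for _ in range(reads) if sha256_of_file(path) != expected)
    finally:
        os.remove(path)


if __name__ == "__main__":
    mismatches = round_trip_check()
    print(f"{mismatches} mismatched read(s) out of 5")
```

Any non-zero count means the same bytes did not survive a trip through memory and storage intact, which is the kind of silent damage that could end up in system files.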
So RAM is highly suspect, but the clincher is this:
You have evidence that RAM is the culprit
You may be leaning against the idea of trusting this evidence. I disagree. I believe this evidence should be trusted.
"RAM is pretty resilient, and it passes everything but the extreme hammer test in memtest"
When I've had bad RAM (unfortunately for me, I have), Memtest86 usually picks up on it on the first pass. In some cases, it doesn't pick it up until the 3rd or 4th pass. Rarely, it's picked up RAM errors on larger pass numbers, like 78 or 81 or 133. If Memtest86 picks up any errors, I do consider the RAM to be bad. If I'm on a machine that stores any files that have data that I care about, then I consider the bad RAM unsuitable. (I don't want my files to have incorrect data.) In theory, I might use a machine with bad RAM for something like a media server, a printer server, etc., where stability is less important to me and where I don't store any data that I would mind losing. In practice, this limitation ends up meaning that I have no real use for bad RAM.
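To make the "multiple passes, any error counts" idea concrete, here is a toy Python sketch of a pattern test: fill a buffer with a known byte pattern, read it back, and count mismatches over several passes. This is only an illustration, and the names (PATTERNS, count_mismatches, run_passes) are invented for it. It runs in user space on virtual memory, cannot choose physical addresses or bypass CPU caches, and does not reproduce the hammer test, so it is in no way a substitute for Memtest86.

```python
# Classic fill patterns: all zeros, all ones, and two alternating-bit bytes.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def count_mismatches(buffer, pattern):
    """Count bytes in buffer that differ from the expected pattern byte."""
    return len(buffer) - buffer.count(pattern)


def run_passes(size_bytes=16 * 1024 * 1024, passes=3):
    """Fill and verify the buffer with each pattern, repeated `passes` times.

    Returns the total number of mismatched bytes seen across all passes.
    """
    buffer = bytearray(size_bytes)
    errors = 0
    for _ in range(passes):
        for pattern in PATTERNS:
            buffer[:] = bytes([pattern]) * size_bytes  # write phase
            errors += count_mismatches(buffer, pattern)  # verify phase
    return errors


if __name__ == "__main__":
    total = run_passes()
    print(f"{total} mismatched byte(s) across all passes")
```

The point of the loop is the same as with the real tool: an intermittent fault may stay hidden for many passes, so a single mismatch on any pass is enough to distrust the memory.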
However, I hadn't read Memtest86 documentation for a while, and wasn't familiar with this "extreme hammer test in memtest". So I checked it out.
Memtest86.com: Troubleshooting FAQ: "Why am I only getting errors during Test 13 Hammer Test?"
The text there is a bit of a lengthy answer (multiple screens), but I suggest reading it since it looks like this affects you. Most notably, I point out this sentence: "The errors detected during Test 13, albeit exposed only in extreme memory access cases, are most certainly real errors."
Answered by TOOGAM on January 5, 2022