A short interview with a guy from ATI explaining the process of making a new graphics card (very interesting):
Zardon: Thanks for taking the time, Eric, to answer some questions for the Driverheaven members. I really appreciate it. Can you tell us your position at ATI? Have you always worked there?
Eric: No problem. I love to talk :-). I'm currently a Hardware Engineering Manager for ASIC graphics in our Santa Clara, CA, office. My team and I have been focused on developing the newest products from ATI (9500s and above). Before that, we worked on the Flipper Chip. And before that, I was at a startup called ArtX, which was purchased by ATI in 2000. Before that, I was at SGI. If I go back further, my brain starts fogging up.
Zardon: Do you enjoy working for ATI?
Eric: It's a great job. On a day-to-day basis, it's really the people you work with that determine whether you enjoy working somewhere. I've known most of them a long time and, I must admit, they are the best engineers and just plain the best people around. I look forward to going to work every morning! As well, the introduction of our products has been great. It's nice to see the fruits of your work being enjoyed by people! We worked really hard, and I still remember the goose bumps I got when we introduced the 9700p (Pro) in San Francisco. That was awesome.
Zardon: I would like to go back in time to when the techs at ATI were designing the 9700 video card. Can you explain how the whole concept gets started on a typical implementation?
Can you explain the initial concepts before it goes to silicon? How does ATI go about designing a new chip?
Eric: Well, each project is a little different, but we generally start with an "architectural definition". It's an iterative process, where we look at all the new features we would like to have, estimate their costs (in terms of die area) and determine the required performance level. All of this is based on what the marketplace wants (i.e. what power users and ISVs are looking for in the "next generation" graphics chip). Once we've figured out which main features we want, we develop an architecture that is capable of addressing those requirements. Sometimes that architecture is an evolution of the previous one; sometimes it's brand new. Once it's clear what needs to be built (from a broad-stroke standpoint), we focus on the details. This means running "simulations" of the new features, which are used to evaluate performance and image quality. At the same time, we get a much better idea of the implementation details and of the chip area each feature will use. The breakdown of that area comes from figuring out the datapath, memory and control logic of the feature. The datapath is the logic that performs the operations (such as a pixel shader ALU); memory is simply local storage for all variables; finally, control is what operates all these things and allows the feature to consume and produce data. We then iterate over all of these. Sometimes features are cut, if they aren't "good enough" or if they cost too much (area-wise). Or features can evolve and change, if we find better ways of doing things. It sounds more complicated than it is. It's really just coming up with ideas, and then figuring out exactly how to make them work.
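To make the area bookkeeping concrete, here is a minimal Python sketch, assuming each candidate feature's cost is simply the sum of its datapath, memory and control area, and that simulations give each feature a single "value" score. The feature names, areas, scores and the greedy value-per-area rule are all hypothetical illustrations, not ATI's actual process or numbers.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A candidate feature; all numbers are hypothetical estimates."""
    name: str
    datapath_mm2: float  # logic that performs the operations (e.g. a shader ALU)
    memory_mm2: float    # local storage for the feature's variables
    control_mm2: float   # logic that sequences the datapath and memory
    value: float         # assumed benefit score from performance/IQ simulations

    @property
    def area_mm2(self) -> float:
        return self.datapath_mm2 + self.memory_mm2 + self.control_mm2


def fit_to_budget(features, budget_mm2):
    """Greedily keep features with the best value per mm^2 until the budget runs out.

    Returns (kept names, cut names, area used). The greedy rule is only a
    stand-in for the engineering judgment described in the interview.
    """
    kept, cut, used = [], [], 0.0
    for f in sorted(features, key=lambda f: f.value / f.area_mm2, reverse=True):
        if used + f.area_mm2 <= budget_mm2:
            kept.append(f.name)
            used += f.area_mm2
        else:
            cut.append(f.name)
    return kept, cut, used


if __name__ == "__main__":
    candidates = [
        Feature("wider pixel shader ALU", 10.0, 4.0, 2.0, value=8.0),
        Feature("AA color compression", 3.0, 5.0, 2.0, value=6.0),
        Feature("hypothetical feature X", 8.0, 6.0, 4.0, value=3.0),
    ]
    kept, cut, used = fit_to_budget(candidates, budget_mm2=30.0)
    print("kept:", kept, "cut:", cut, f"area used: {used} mm^2")
```

With these made-up numbers, the two features with the best value per mm² fit in the 30 mm² budget (26 mm² used) and feature X is cut; in practice, as Eric says, a cut feature might instead be reworked and simulated again.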
Zardon: What did you need to do for MSAA?
Eric: If we look at MSAA on the 9700p, we did tons of image quality simulations of AA algorithms. We varied sample positions, sample numbers, etc. This made us realize that we needed to gamma correct the samples; otherwise the intensity of the AA line would vary too much. Once we figured out what the AA algorithm was going to be, we then focused on how to make it fast. That's where the lossless compression came in. It reduced memory bandwidth back down to what it was without AA. That allows us to achieve free AA in a lot of cases, and only small losses in most of the remaining cases. Finally, we decided to make the sample positions and count programmable, which allowed us to wait until silicon was back before selecting the current sample positions.
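The gamma issue Eric mentions can be illustrated with a small Python sketch. It assumes a simple power-law display gamma of 2.2 and a 4-sample pixel on a white-on-black edge; the actual 9700 resolve hardware and its exact transfer curve are not described in the interview. Averaging the gamma-encoded values directly makes the displayed light fall well below the fraction of samples covered, so an edge's brightness varies with coverage; converting to linear light before averaging keeps the displayed intensity proportional to coverage.

```python
GAMMA = 2.2  # assumption: simple power-law display gamma


def to_linear(v: float) -> float:
    """Gamma-encoded value (0..1) to linear light."""
    return v ** GAMMA


def to_gamma(v: float) -> float:
    """Linear light (0..1) back to a gamma-encoded value."""
    return v ** (1.0 / GAMMA)


def resolve_naive(samples):
    """Average the gamma-encoded sample values directly."""
    return sum(samples) / len(samples)


def resolve_gamma_correct(samples):
    """Convert samples to linear light, average, then re-encode for display."""
    linear_avg = sum(to_linear(s) for s in samples) / len(samples)
    return to_gamma(linear_avg)


if __name__ == "__main__":
    # Edge pixel: white (1.0) object over black (0.0) background, 4 samples.
    for covered in range(5):
        samples = [1.0] * covered + [0.0] * (4 - covered)
        naive_light = to_linear(resolve_naive(samples))
        correct_light = to_linear(resolve_gamma_correct(samples))
        print(f"{covered}/4 covered: naive {naive_light:.3f}, "
              f"gamma-correct {correct_light:.3f}")
```

Running it shows the gamma-correct column tracking coverage (0.25, 0.5, 0.75), while the naive column comes out much darker at partial coverage.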
Zardon: What about the verification of the design process? I'm sure this is a critical part of the whole process.
Eric: Once we know what we want to build, and have the algorithms determined, we need to implement and verify the design. And verification is the key. We have a saying: "if a feature is not verified, it's broken". And generally, that's very true. It's the critical stage of the whole process. It's also probably the single longest part of the process (total verification time for our team on the 9700p was probably a year or so). Bugs in silicon (never in our products, but in other products, I've heard) are usually due to operations that were not fully covered in the tests, or to features that could not be properly cross-compared with the initial algorithmic tests. As well, testing a feature must include testing all possible variations in the API for that feature (either D3D or OGL). We use all these tests to verify our hardware model of the chip.
In the case of MSAA, all variations of the number of samples and the mixture of all possible surfaces and compression states must be tested. That's a huge test space. Lots of people sweated a long time to get that right.
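The size of that test space comes from multiplying independent axes together. The Python sketch below enumerates a hypothetical MSAA test matrix and cross-compares a hardware model against an algorithmic reference, in the spirit of the process described above. The axis values, surface labels, compression-state names and the model callables are illustrative assumptions, not ATI's real test suite.

```python
from itertools import product

# Hypothetical test axes (illustrative values only).
SAMPLE_COUNTS = [1, 2, 4, 6]
SURFACES = ["surface_a", "surface_b", "surface_c"]
COMPRESSION_STATES = ["uncompressed", "partially_compressed", "fully_compressed"]
APIS = ["D3D", "OGL"]


def msaa_test_matrix():
    """Yield one configuration per combination of the axes above."""
    for samples, surface, compression, api in product(
        SAMPLE_COUNTS, SURFACES, COMPRESSION_STATES, APIS
    ):
        yield {"samples": samples, "surface": surface,
               "compression": compression, "api": api}


def find_mismatches(hw_model, reference_model):
    """Return every configuration where the hardware model disagrees
    with the algorithmic reference ("if it's not verified, it's broken")."""
    return [cfg for cfg in msaa_test_matrix()
            if hw_model(cfg) != reference_model(cfg)]


def reference(cfg):
    """Toy algorithmic reference model."""
    return cfg["samples"]


def buggy_hw(cfg):
    """Toy hardware model that mishandles one corner of the test space."""
    if cfg["samples"] == 6 and cfg["compression"] == "fully_compressed":
        return 0
    return cfg["samples"]


if __name__ == "__main__":
    total = sum(1 for _ in msaa_test_matrix())
    print("configurations:", total)  # 4 * 3 * 3 * 2
    print("mismatches:", len(find_mismatches(buggy_hw, reference)))
```

Even this toy matrix has 72 configurations, and a bug confined to one sample count and one compression state shows up in only 6 of them, which is why tests that skip part of the space let such bugs through.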
Zardon: After you have verified the design, I take it the next stage is to follow through on what you have learned and actually start making some silicon?
Eric: Once the design is stable and is passing all the verification tests, you need to prepare things for manufacture. This means converting your hardware (which is written in a hardware description language, similar to C and other languages) into logic gates. Then you need to "place and route" your gates: you place each gate into a position on the silicon die, then you route all the wires between the gates. "Routing wires" means connecting wires between gates. In a typical CMOS process, this interconnect is made of many layers of metal (either aluminum or copper) with vertical connections between wires on each level. Once we've done this (which is not trivial when you are placing tens of millions of gates on a small piece of silicon the size of your fingernail), you need to "extract timing" back. This means you have to get the physical information on how long all the wires are and where they go. With that information, you can figure out how much time it takes for information (i.e. bits) to get from one memory element to another. Memory elements in the design are the only elements that receive the "clock". The clock is what synchronizes all the data movements. Data can go from one memory element to another, through gates and wires, between clock events. How fast that data moves through determines the fastest clock you can use. So, with our extracted information, we can figure out the slowest paths and what the fastest clock would be for each path. If that's enough to hit your performance targets, then you're done. If not, then you need to go and change things to make them faster (this includes new place and routes, new gate paths, etc.).
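The "extract timing" step boils down to a longest-path search: the clock period must cover the slowest register-to-register path through gates and wires. Below is a minimal Python sketch of that idea on a hypothetical five-node netlist. The gate and wire delays are made-up numbers, and real static timing analysis also accounts for hold times, clock skew and process corners, which are omitted here apart from an optional setup margin.

```python
# Hypothetical netlist. REG_IN / REG_OUT stand for clocked memory elements
# (registers); the other nodes are logic gates. Delays are in nanoseconds.
GATE_DELAY_NS = {"REG_IN": 0.0, "and1": 0.15, "xor1": 0.20, "mux1": 0.25, "REG_OUT": 0.0}

# Extracted wire delays for each (driver, receiver) connection, in ns.
WIRE_DELAY_NS = {
    ("REG_IN", "and1"): 0.30,
    ("REG_IN", "xor1"): 0.10,
    ("and1", "mux1"): 0.40,
    ("xor1", "mux1"): 0.05,
    ("mux1", "REG_OUT"): 0.20,
}


def arrival_times(gate_delay, wire_delay):
    """Latest time (ns after the clock edge) at which each node's output settles.

    This is a longest-path computation over an acyclic combinational netlist.
    """
    fanin = {node: [] for node in gate_delay}
    for (src, dst), delay in wire_delay.items():
        fanin[dst].append((src, delay))

    memo = {}

    def arrival(node):
        if node not in memo:
            latest_input = max(
                (arrival(src) + delay for src, delay in fanin[node]), default=0.0
            )
            memo[node] = latest_input + gate_delay[node]
        return memo[node]

    return {node: arrival(node) for node in gate_delay}


def max_clock_mhz(critical_path_ns, setup_ns=0.0):
    """Fastest clock whose period still covers the critical path (plus setup)."""
    return 1000.0 / (critical_path_ns + setup_ns)


if __name__ == "__main__":
    times = arrival_times(GATE_DELAY_NS, WIRE_DELAY_NS)
    critical = times["REG_OUT"]
    print(f"critical path: {critical:.2f} ns -> max clock {max_clock_mhz(critical):.0f} MHz")
```

Here the slow path is REG_IN, and1, mux1, REG_OUT (1.30 ns, roughly 769 MHz). If that misses the target, the fix is what Eric describes: re-place and re-route to shorten wires, or restructure the gates on that path.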
Once the timing is where you want it, and you've placed and routed all the gates and wires without problems, then you are ready to send that information to the manufacturer. There are quite a few of them out there (TSMC, UMC, IBM, NEC, just to name a few). These companies will take your data, and convert it to something that they can use to manufacture the silicon. It's rather involved and that could be an article all by itself.
Zardon: Can you explain how you go about working with Terry and the boys over at driver development, making sure the chip works correctly with the driver sets? This would be interesting for our readers.
Eric: We work with the driver guys from the beginning. They are the API experts, and are involved in determining what new features get added, as well as the method to access those features. During the whole design process, we work together to make sure that the SW drivers all work efficiently and correctly with our hardware. As well, they work on adding all the new SW features, or changing old features, to match them to the new hardware. This applies to the OpenGL, DirectX, 2D and multimedia drivers. The aim is to have working drivers before the new silicon comes back. This is done by running the drivers on hardware simulators. Once the chip comes back, the clock really starts running. Within a few months, the drivers and hardware must both be debugged, and the software must be optimized to allow the hardware to shine to its fullest. This is a job that never really finishes. Even after we ship, the software and hardware engineers continue to work together to optimize and to push the hardware even harder. I'm not sure how long it takes to hit any sort of plateau, but even years after the 8500 was introduced, the driver guys are still working on pushing that hardware harder. In the end, we all work together, since we really are shipping a system composed of boards and software. One cannot work without the other.
Zardon: Are there any stages after the physical design and driver development?
Eric: Well, there are quite a few stages. A fabrication house will manufacture silicon wafers, with hundreds of dies on them. The process involved in that creation is complex. But once it's done, wafer "sorting" is next. In that process, each die is tested and marked as "all passing", "½ passing" or "dead". The dead ones are discarded. The remaining ones are all packaged into chip packages. They are then tested again, on a different tester, and the same three results occur (all good, ½ good, bad). Again, the bad ones are discarded (it's a very small number at this sort). The ½ passing dies end up in the 9500 products, while any 100% passing die ends up in the 9700/9800 products.
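The two sorting passes can be sketched as a small binning routine. This Python sketch assumes, purely for illustration, that each die can be modeled as two halves tested independently, that a die's grade can only stay the same or get worse at the package test, and that dies are identified by hypothetical string IDs; the real tester flow is not described in the interview.

```python
from enum import Enum


class Grade(Enum):
    ALL_PASSING = "all passing"
    HALF_PASSING = "1/2 passing"
    DEAD = "dead"


# Product line for each grade, as described in the interview.
PRODUCT_FOR = {
    Grade.ALL_PASSING: "9700/9800",
    Grade.HALF_PASSING: "9500",
    Grade.DEAD: "discarded",
}


def grade(halves_passing: int) -> Grade:
    """Assumed model: a die has two halves, each of which passes or fails."""
    if halves_passing >= 2:
        return Grade.ALL_PASSING
    if halves_passing == 1:
        return Grade.HALF_PASSING
    return Grade.DEAD


def bin_dies(wafer_sort, package_test):
    """wafer_sort: die id -> halves passing at wafer sort.
    package_test: die id -> halves passing after packaging (only for dies
    that survived wafer sort). Returns product line -> list of die ids."""
    bins = {"9700/9800": [], "9500": [], "discarded": []}
    for die, halves in wafer_sort.items():
        if grade(halves) is not Grade.DEAD:
            # Only packaged dies are retested; a die cannot improve its grade.
            halves = min(halves, package_test[die])
        bins[PRODUCT_FOR[grade(halves)]].append(die)
    return bins


if __name__ == "__main__":
    wafer = {"d1": 2, "d2": 1, "d3": 0, "d4": 2}
    packaged = {"d1": 2, "d2": 1, "d4": 1}
    print(bin_dies(wafer, packaged))
```

In this example, d4 passes fully at wafer sort but loses a half at the package test, so it is binned as a 9500 part rather than a 9700/9800.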
With the initial silicon, at this point, we mounted the chips onto boards and spent months verifying all the features and making sure there were no silicon-related issues. For the MSAA case I gave above, we actually never found any problems with our MSAA.
With final silicon, after chip testing, the chips are ready to be mounted onto boards. Once the boards are assembled, they are again tested. There are quite a few tests there, to make sure that those boards are working properly, and up to our stringent quality level.
Zardon: That's a lot of work. Is the next stage product release and promotion?
Eric: Well, there are a lot of items related to product release. Once we have sufficient chips, we need to work with our add-in-board partners to get their production boards ready, so that they can introduce their products at around the same time as any ATI-branded boards.
During that time as well, engineering is spending time examining production. There are two main goals there. The first is to improve the silicon yield on the wafers. That mostly means working with the fabrication house. The second is to improve the yield of good packaged chips and boards. That usually means increasing testing (and the like) at earlier stages in the pipe.
Finally, once we have a sufficient number of boards all tested and working, and production is working full-tilt to produce more, we release the product. The release is a major affair. The marketing team works on it for a year before it's even released! All the publicity, the launch events, etc. must be organized and planned. As well, production-level boards must be given to you guys, so that reviews can come out at the same time as the product is introduced. ATI has a mandate that we introduce products within 30 days of reviews. We don't do paper launches!
Zardon: Thanks for doing this, anything you would like to add?
Eric: You're welcome. My pleasure. I would like to also thank Gordon Elder for his work in helping me shape the responses. I'd also like to add that I focused on the 9700p products, but pretty much all products go through similar cycles. I'd also like to thank everyone that's enjoying our products. We do it for you!
Src: http://www.driverheaven.net/ericdemers/
Message edited by krumli on 10-07-2003 at 10:06:41