Blame - llvm/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt - toolchain/llvm-project

blob: 81ca53919d421a151b987a7f389b301181995f24 [file] [log] [blame]

Chris Lattner	2f7c963	2001-06-06 20:29:01 +0000	[diff] [blame]	1	Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
				2	From: Chris Lattner <sabre@nondot.org>
				3	To: Vikram Adve <vadve@cs.uiuc.edu>
				4	Subject: Re: a few thoughts
				5
				6	Okay... here are a few of my thoughts on this (it's good to know that we
				7	think so alike!):
				8
				9	> 1. We need to be clear on our goals for the VM. Do we want to emphasize
				10	> portability and safety like the Java VM? Or shall we focus on the
				11	> architecture interface first (i.e., consider the code generation and
				12	> processor issues), since the architecture interface question is also
				13	> important for portable Java-type VMs?
				14
				15	I forsee the architecture looking kinda like this: (which is completely
				16	subject to change)
				17
				18	1. The VM code is NOT guaranteed safe in a java sense. Doing so makes it
				19	basically impossible to support C like languages. Besides that,
				20	certifying a register based language as safe at run time would be a
				21	pretty expensive operation to have to do. Additionally, we would like
				22	to be able to statically eliminate many bounds checks in Java
				23	programs... for example.
				24
				25	2. Instead, we can do the following (eventually):
				26	* Java bytecode is used as our "safe" representation (to avoid
				27	reinventing something that we don't add much value to). When the
				28	user chooses to execute Java bytecodes directly (ie, not
				29	precompiled) the runtime compiler can do some very simple
				30	transformations (JIT style) to convert it into valid input for our
				31	VM. Performance is not wonderful, but it works right.
				32	* The file is scheduled to be compiled (rigorously) at a later
				33	time. This could be done by some background process or by a second
				34	processor in the system during idle time or something...
				35	* To keep things "safe" ie to enforce a sandbox on Java/foreign code,
				36	we could sign the generated VM code with a host specific private
				37	key. Then before the code is executed/loaded, we can check to see if
				38	the trusted compiler generated the code. This would be much quicker
				39	than having to validate consistency (especially if bounds checks have
				40	been removed, for example)
				41
				42	> This is important because the audiences for these two goals are very
				43	> different. Architects and many compiler people care much more about
				44	> the second question. The Java compiler and OS community care much more
				45	> about the first one.
				46
				47	3. By focusing on a more low level virtual machine, we have much more room
				48	for value add. The nice safe "sandbox" VM can be provided as a layer
				49	on top of it. It also lets us focus on the more interesting compilers
				50	related projects.
				51
				52	> 2. Design issues to consider (an initial list that we should continue
				53	> to modify). Note that I'm not trying to suggest actual solutions here,
				54	> but just various directions we can pursue:
				55
				56	Understood. :)
				57
				58	> a. A single-assignment VM, which we've both already been thinking
				59	> about.
				60
				61	Yup, I think that this makes a lot of sense. I am still intrigued,
				62	however, by the prospect of a minimally allocated VM representation... I
Chris Lattner	0ab5e2c	2011-04-15 05:18:47 +0000	[diff] [blame]	63	think that it could have definite advantages for certain applications
Chris Lattner	2f7c963	2001-06-06 20:29:01 +0000	[diff] [blame]	64	(think very small machines, like PDAs). I don't, however, think that our
				65	initial implementations should focus on this. :)
				66
Chris Lattner	0ab5e2c	2011-04-15 05:18:47 +0000	[diff] [blame]	67	Here are some other auxiliary goals that I think we should consider:
Chris Lattner	2f7c963	2001-06-06 20:29:01 +0000	[diff] [blame]	68
				69	1. Primary goal: Support a high performance dynamic compilation
				70	system. This means that we have an "ideal" division of labor between
				71	the runtime and static compilers. Of course, the other goals of the
				72	system somewhat reduce the importance of this point (f.e. portability
				73	reduces performance, but hopefully not much)
				74	2. Portability to different processors. Since we are most familiar with
				75	x86 and solaris, I think that these two are excellent candidates when
				76	we get that far...
				77	3. Support for all languages & styles of programming (general purpose
				78	VM). This is the point that disallows java style bytecodes, where all
				79	array refs are checked for bounds, etc...
				80	4. Support linking between different language families. For example, call
				81	C functions directly from Java without using the nasty/slow/gross JNI
				82	layer. This involves several subpoints:
				83	A. Support for languages that require garbage collectors and integration
				84	with languages that don't. As a base point, we could insist on
				85	always using a conservative GC, but implement free as a noop, f.e.
				86
				87	> b. A strongly-typed VM. One question is do we need the types to be
				88	> explicitly declared or should they be inferred by the dynamic
				89	> compiler?
				90
				91	B. This is kind of similar to another idea that I have: make OOP
				92	constructs (virtual function tables, class heirarchies, etc) explicit
				93	in the VM representation. I believe that the number of additional
				94	constructs would be fairly low, but would give us lots of important
				95	information... something else that would/could be important is to
				96	have exceptions as first class types so that they would be handled in
				97	a uniform way for the entire VM... so that C functions can call Java
				98	functions for example...
				99
				100	> c. How do we get more high-level information into the VM while keeping
				101	> to a low-level VM design?
				102	> o Explicit array references as operands? An alternative is
				103	> to have just an array type, and let the index computations be
				104	> separate 3-operand instructions.
				105
				106	C. In the model I was thinking of (subject to change of course), we
				107	would just have an array type (distinct from the pointer
				108	types). This would allow us to have arbitrarily complex index
				109	expressions, while still distinguishing "load" from "Array load",
				110	for example. Perhaps also, switch jump tables would be first class
				111	types as well? This would allow better reasoning about the program.
				112
				113	5. Support dynamic loading of code from various sources. Already
				114	mentioned above was the example of loading java bytecodes, but we want
				115	to support dynamic loading of VM code as well. This makes the job of
				116	the runtime compiler much more interesting: it can do interprocedural
				117	optimizations that the static compiler can't do, because it doesn't
				118	have all of the required information (for example, inlining from
				119	shared libraries, etc...)
				120
				121	6. Define a set of generally useful annotations to add to the VM
				122	representation. For example, a function can be analysed to see if it
				123	has any sideeffects when run... also, the MOD/REF sets could be
				124	calculated, etc... we would have to determine what is reasonable. This
				125	would generally be used to make IP optimizations cheaper for the
				126	runtime compiler...
				127
				128	> o Explicit instructions to handle aliasing, e.g.s:
				129	> -- an instruction to say "I speculate that these two values are not
				130	> aliased, but check at runtime", like speculative execution in
				131	> EPIC?
				132	> -- or an instruction to check whether two values are aliased and
				133	> execute different code depending on the answer, somewhat like
				134	> predicated code in EPIC
				135
				136	These are also very good points... if this can be determined at compile
				137	time. I think that an epic style of representation (not the instruction
				138	packing, just the information presented) could be a very interesting model
				139	to use... more later...
				140
				141	> o (This one is a difficult but powerful idea.)
				142	> A "thread-id" field on every instruction that allows the static
				143	> compiler to generate a set of parallel threads, and then have
				144	> the runtime compiler and hardware do what they please with it.
				145	> This has very powerful uses, but thread-id on every instruction
				146	> is expensive in terms of instruction size and code size.
				147	> We would need to compactly encode it somehow.
				148
				149	Yes yes yes! :) I think it would be VERY useful to include this kind
				150	of information (which EPIC architectures implicitly encode. The trend
				151	that we are seeing supports this greatly:
				152
				153	1. Commodity processors are getting massive SIMD support:
				154	* Intel/Amd MMX/MMX2
				155	* AMD's 3Dnow!
				156	* Intel's SSE/SSE2
				157	* Sun's VIS
				158	2. SMP is becoming much more common, especially in the server space.
				159	3. Multiple processors on a die are right around the corner.
				160
				161	If nothing else, not designing this in would severely limit our future
				162	expansion of the project...
				163
				164	> Also, this will require some reading on at least two other
				165	> projects:
				166	> -- Multiscalar architecture from Wisconsin
				167	> -- Simultaneous multithreading architecture from Washington
				168	>
				169	> o Or forget all this and stick to a traditional instruction set?
				170
				171	Heh... :) Well, from a pure research point of view, it is almost more
				172	attactive to go with the most extreme/different ISA possible. On one axis
				173	you get safety and conservatism, and on the other you get degree of
				174	influence that the results have. Of course the problem with pure research
				175	is that often times there is no concrete product of the research... :)
				176
				177	> BTW, on an unrelated note, after the meeting yesterday, I did remember
				178	> that you had suggested doing instruction scheduling on SSA form instead
				179	> of a dependence DAG earlier in the semester. When we talked about
				180	> it yesterday, I didn't remember where the idea had come from but I
				181	> remembered later. Just giving credit where its due...
				182
				183	:) Thanks.
				184
				185	> Perhaps you can save the above as a file under RCS so you and I can
				186	> continue to expand on this.
				187
				188	I think it makes sense to do so when we get our ideas more formalized and
				189	bounce it back and forth a couple of times... then I'll do a more formal
				190	writeup of our goals and ideas. Obviously our first implementation will
				191	not want to do all of the stuff that I pointed out above... be we will
				192	want to design the project so that we do not artificially limit ourselves
				193	at sometime in the future...
				194
				195	Anyways, let me know what you think about these ideas... and if they sound
				196	reasonable...
				197
				198	-Chris
				199